The last day or two I've been learning how to prepare a module for CPAN distribution. There doesn't seem to be a good tutorial anywhere explaining how this all works. The two tutorials I found on Perl Monks are quite old. One was written in 2002 and the other in 2005. Neither discusses newer tools like Module::Build. Nor do they explain how CPAN works, let alone discuss how this process can be safely and reliably customized.
To customize the process safely, one needs to know (a) what happens only on the developer's machine, (b) what happens on the target machine and (c) how CPAN itself uses the data in a distribution package.
There is no shortage of information, of course, but it all seems to be scattered here and there. I thought I'd write up my learnings and practical observations in case they would help others. Once one becomes familiar with something it is all too easy to forget what was hard.
I'd very much appreciate feedback from experienced CPAN developers. Are there practical tips I've missed? Did I form a misimpression or jump to a false conclusion about something because of my brief experience?
Many thanks in advance for feedback. -- beth
1. CPAN Distribution - Technical Overview
1.1 Contents of a CPAN package
A CPAN package is a tarball that is expected to have the following contents.
1.2 Downloading and installing using a CPAN client
When a distribution file is downloaded from CPAN, the installation process includes seven steps:
Steps 1 and 2 are handled by a CPAN client. Steps 3-6 are handled by either ./Build test or make test. The final seventh step is handled by either ./Build install or make install.
To get a sense of how to install a CPAN tarball without benefit of a CPAN client, see perlmodinstall.
1.3 Build.PL vs. Makefile.PL
Build.PL generates a Perl script named "Build" but only works with newer versions of the CPAN client. Makefile.PL generates a make file that can be used even with older versions of the client. However, it is less portable because it assumes that make (or some related tool like nmake/dmake) is installed.
To complete the installation using the makefile generated by Makefile.PL, CPAN runs the commands make test and make install. This means, of course, that the installation process will fail if the new machine doesn't have make installed. This is one of the reasons why newer versions of the CPAN client use Build.PL if available. Since it is a Perl script, it can run on any machine where Perl is installed. No third party software is needed. Some systems, like Microsoft Windows, do not have make installed as a matter of course.
Even on systems that do have make, make's use of the command shell can cause problems. Each operating system has a different preferred implementation of the command shell: C, Korn, Bourne, Bash, Ash, to name a few. There are subtle syntax differences between these shells and it is quite possible that a make file that works well on one flavor of Linux/Unix will fail on another because it relies on a different flavor of Linux/Unix shell.
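The choice between the two routes can be sketched in a few lines of Perl. This is only an illustration of the decision described above, not what any real CPAN client does internally; the function name install_commands is made up for this example, and the command lists are the conventional sequences (perl Build.PL, ./Build, ./Build test, ./Build install and their make equivalents).

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: list the commands one would run inside an unpacked
# distribution directory, preferring Build.PL when it is present.
sub install_commands {
    my ($dir) = @_;
    if (-e "$dir/Build.PL") {
        # Pure-Perl route: nothing but perl itself is required.
        return ('perl Build.PL', './Build', './Build test', './Build install');
    }
    if (-e "$dir/Makefile.PL") {
        # $Config{make} names the make program perl was built with
        # (make, nmake, dmake, ...); fall back to plain "make".
        my $make = $Config{make} || 'make';
        return ('perl Makefile.PL', $make, "$make test", "$make install");
    }
    return;    # neither file found: nothing we know how to run
}

# Print the commands for the directory given on the command line (default ".").
my $target = @ARGV ? $ARGV[0] : '.';
print "$_\n" for install_commands($target);
```

The script only prints the commands. Actually running them is left out on purpose, since a real client also has to handle prerequisites and failures.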
1.4 Choosing packaging tools
These files are not magic. Both the Perl Build script and the make file can contain any instructions imaginable as long as they understand the commands 'test' and 'install'. Thus the Perl script generated by Build.PL must be able to be called like this: ./Build test and ./Build install. The generated make file must support make test and make install.
However, handcrafting the meta files (META.json, META.yml) and writing a build script/make file generator requires a great deal of domain knowledge. Most developers therefore rely on one of four main tool kits to package up their modules:
2. Building a module with Module::Build
2.1 Arranging your files
Module::Build expects that you will be developing your code in a project directory that looks like this:
The directories listed above should contain only the files that belong to your project. Module::Build doesn't have a good way of extracting files from a single common source tree shared by multiple projects. It assumes that all files in the lib directory belong in your project unless you specifically exclude them via a regular expression in the MANIFEST.SKIP file.
It is also essential that .pl files be placed in scripts/ and not lib/. When Module::Build sees .PL (or .pl in a case insensitive system) in the lib/ directory, it assumes that the file is meant to generate a module rather than be used as a script. It will run the script and put the output of the script into a file that has the same name as the script file, less the .PL suffix. Thus lib/foobar.pm.PL would be expected to generate lib/foobar.pm.
For portability reasons, each module name component should be 11 or fewer characters. The first 8 of these must be different from any other module on CPAN. This ensures that the module will behave well on operating systems that have very short file names.
The PAUSE documents recommend informative names over "cool" or poetic names. For more information, see the following links:
If you use an alternate organization for your projects
If you have an alternate arrangement of files, for example, storing all source code in a common tree rather than in per-project directories, you will have to move the files into place before beginning the build process. There are ways to automate this process, but they require subclassing Module::Build and adding an extra action, called 'makeproject' or 'import'.
2.2 Writing Build.PL
Build.PL is a file you write. At a minimum it contains three basic instructions: (a) loading Module::Build or a subclass (b) initializing a new builder object with project specific property values and (c) generating a Perl script named "Build".
You can get a full list of parameters to pass new in Module::Build::API.
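A minimal Build.PL covering the three instructions might look like the sketch below. The module name Exception::Lite is the one this write-up uses as its example; the version number and license value are placeholders I picked for illustration.

```perl
use strict;
use warnings;

# (a) load Module::Build (or a subclass of it)
use Module::Build;

# (b) create a builder object with project-specific properties
my $builder = Module::Build->new(
    module_name  => 'Exception::Lite',   # main module, found in lib/Exception/Lite.pm
    dist_version => '0.001',             # placeholder version number
    license      => 'perl',              # placeholder license
);

# (c) write the Perl script named "Build" into the current directory
$builder->create_build_script;
```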
Making your Build.PL file CPAN testers friendly
The page http://wiki.cpantesters.org/wiki/CPANAuthorNotes has some helpful pointers for making it easier for CPAN testers to work with your distribution. The key points related to Build.PL and Makefile.PL are:
Distribution version numbers
The dist_version property identifies the version number for your distribution package. All distributions MUST have a version number.
If you omit the dist_version property, Module::Build will try to guess the version number by looking for a variable named $VERSION in the 'module_name' module. For the example above, had 'dist_version' been omitted, Module::Build would have looked for $VERSION in 'lib/Exception/Lite.pm'.
The version number is an especially important parameter because CPAN uses it to track distribution files. It consists of three components: a major number indicating a collection of binary compatible releases; a 3 digit minor version number indicating feature enhancements within that binary compatible group, and a 3 digit patch or development release number.
If the third component is preceded by '_', CPAN counts the upload as a development release. The intended features for the minor version may be only partially implemented. Thus '0.099_001' would be the first development release for feature set '0.099'. It is meant to be available for testing but not as a published download.
This intention is enforced softly. The CPAN distribution page marks it with a label in big red letters saying "DEVELOPMENT RELEASE". CPAN clients are encouraged not to install it as the default version even if its version number is higher than any others. It should be downloaded only if the user requests that specific version, presumably for testing purposes.
If the patch number is preceded by a '.' then it will be published and available for downloading via CPAN. For more information, see Perl::Version.
No two uploads may have exactly the same version number. If you mess up and need to reupload a distribution file, you must change the patch or development release number.
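To make the numbering scheme concrete, here is a small sketch that splits a version string into the three components described above and reports whether it marks a development release. It assumes the strict MAJOR.MMM.PPP / MAJOR.MMM_PPP layout from this section; version strings in the wild are looser than that, and parse_dist_version is a name made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "MAJOR.MMM", "MAJOR.MMM.PPP" or "MAJOR.MMM_PPP".
# Returns a hash reference, or nothing if the string does not fit the scheme.
sub parse_dist_version {
    my ($version) = @_;
    my ($major, $minor, $sep, $patch) =
        $version =~ /\A(\d+)\.(\d{3})(?:([._])(\d{3}))?\z/
        or return;
    return {
        major  => $major,
        minor  => $minor,
        patch  => $patch,    # undef when there is no third component
        # An underscore before the third component marks a development release.
        is_dev => (defined $sep && $sep eq '_') ? 1 : 0,
    };
}

for my $v ('0.099_001', '0.099.001') {
    my $info = parse_dist_version($v) or next;
    printf "%s: %s\n", $v,
        $info->{is_dev} ? 'development release' : 'published release';
}
```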
Configuring documentation generation
Unfortunately, there don't seem to be many options to control this process. For HTML generation there is only one user-definable option, html_css:
my $oBuilder = Module::Build->new (....);
$oBuilder->html_css('MyLayout.css');
Another related issue concerns the content of pod files. The syntax and handling of the L<link_descriptor> has changed over time. Two changes in particular may cause problems:
Output of Build.PL
The script generated by this simple file contains a number of default commands. In addition to the test and install commands, there are several that are generally used only by developers preparing their code for packaging.
For a list of commands, see Module::Build
Advanced Build.PL files
You can also write much more elaborate Build.PL files. This one subclasses Module::Build on the fly and adds a routine that imports project files from a single codebase source tree. The routine is very simple and would benefit from many improvements (portable path name construction, checking for deleted files, validating the copy). It is meant only for illustration purposes:
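A rough sketch of such a file follows. The class name My::Builder, the action name makeproject, the source path ../common/lib and the version number are all assumptions chosen for illustration. The dist_version is given explicitly so that Module::Build does not need lib/Exception/Lite.pm to exist before the import action has run.

```perl
use strict;
use warnings;
use Module::Build;

# Create a subclass on the fly. Everything the new action needs must live
# inside the here document, because only that text ends up in the
# generated subclass file.
my $class = Module::Build->subclass(
    class => 'My::Builder',               # hypothetical class name
    code  => <<'END_CODE',
use strict;      # not applied automatically to this snippet
use warnings;
use File::Basename ();
use File::Copy ();
use File::Path ();

# Invoked as "./Build makeproject": copy one module from a shared source
# tree (assumed location) into this project's lib/ directory.
sub ACTION_makeproject {
    my $self = shift;
    my $src  = '../common/lib/Exception/Lite.pm';
    my $dest = 'lib/Exception/Lite.pm';
    File::Path::mkpath(File::Basename::dirname($dest));
    File::Copy::copy($src, $dest)
        or die "Cannot copy $src to $dest: $!";
}
END_CODE
);

my $builder = $class->new(
    module_name  => 'Exception::Lite',
    dist_version => '0.001',              # placeholder version number
    license      => 'perl',
);
$builder->create_build_script;
```

After perl Build.PL, the new action would be run as ./Build makeproject before any of the standard actions.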
Building a subclass with Module::Build->subclass(code=>...) is only practical for very short snippets of code. Code defined via the code property is compiled without benefit of strict or warnings, so it is especially easy for variable name misspellings to slip through. Also, syntax highlighting doesn't necessarily work in here documents (on Xemacs it all gets colored as a string), so the probability of mistakes is increased even further.
If you do choose to use Module::Build->subclass(code=>...), everything you plan to use must be placed within the here document assigned to the code property. The Build.PL file and code that is part of it is never used after Build.PL runs. In fact the code snippet that you define in the here document is simply used to generate a subclass definition file that is placed in the _build directory. Anything outside of that snippet will never make it into the generated subclass file. That is why you cannot do something like this in your Build.PL file:
If you need to define extensive amounts of code you are better off defining your specialist code in a dedicated subclass file and placing that file in the inc directory of your project directory. See Module::Build::Authoring for more information.
2.3 Running the Build.PL Command
As a developer there are two reasons you will want to run the Build.PL command. First, the generated Build file defines many commands that are useful to developers. Second, you will want to test your installation process and generating Build from Build.PL is part of that installation process.
To generate Build you simply type perl Build.PL in the top level of the project directory.
The Build.PL command must be run from the top level of the project directory. The script generation routines in Module::Build simply assume that "lib/", "inc/", etc. are in the current directory where the script was launched. They will complain about not being able to find modules if run from any other directory.
Generating both a build script and makefile
If you want your distribution to offer both the build script and the makefile, your Build.PL file can set the create_makefile_pl property in the parameter list passed to Module::Build->new(...).
Setting this parameter is the easiest way to generate a makefile and it will work for most simple installations. However, if your installation process is complex, you may need to take more control over this process. For details, see Module::Build::Compat and Module::Build::API's documentation on the create_makefile_pl parameter.
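As a sketch, the property is just one more entry in the parameter list. The value 'traditional' asks Module::Build to write a Makefile.PL based on ExtUtils::MakeMaker when the distribution is assembled; the documentation also lists 'small' and 'passthrough', which I have not tried. The module name is this write-up's example, and the version and license are placeholders.

```perl
use strict;
use warnings;
use Module::Build;

my $builder = Module::Build->new(
    module_name        => 'Exception::Lite',
    dist_version       => '0.001',         # placeholder version number
    license            => 'perl',          # placeholder license
    create_makefile_pl => 'traditional',   # also ship a MakeMaker-style Makefile.PL
);
$builder->create_build_script;
```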
Deleting the generated script and starting over
Running Build.PL adds two items to the top level of the project directory:
You can completely remove the Build script and the _build directory, by running the command ./Build realclean. The name of this action is a bit of a misnomer. It always removes the build script and the _build/ directory. It sometimes removes the blib/ directory, the distribution staging area, and temporary files produced during the html generation process. What determines when things are removed and when they are not is not at all clear.
It appears to never remove the following files:
If you want to regenerate these from scratch, you must manually remove them.
2.4 Packaging up your module for distribution
To package your module you must run the following commands in sequence:
The build script generated by Build.PL does not accept more than one action at a time so you can't combine the commands into one single action, such as "./Build manifest disttest dist". Only the first command will be run.
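If typing the actions one by one gets tedious, a small driver script can run them in order and stop at the first failure. This is only a sketch: packaging_commands is a name made up here, and the action list is the manifest, disttest, dist sequence mentioned above.

```perl
use strict;
use warnings;

# Build one command (as an argument list) per action, since the Build
# script only runs the first action it is given.
sub packaging_commands {
    my @actions = @_;
    # $^X is the perl binary running this script; Build is itself a Perl script.
    return map { [ $^X, 'Build', $_ ] } @actions;
}

# Run the sequence only if a generated Build script is present.
if (-e 'Build') {
    for my $cmd (packaging_commands(qw(manifest disttest dist))) {
        print "Running: @$cmd\n";
        system(@$cmd) == 0
            or die "Action '$cmd->[2]' failed; stopping\n";
    }
}
```

Run it from the top level of the project directory after perl Build.PL has generated the Build script.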
2.5 Additional testing options
The disttest routine only verifies that the module has the files needed to upload the module to CPAN, download it and run its tests. To make sure your module installs properly you will need to run additional tests. Additional testing may also be required to make sure that the released code fits your quality control standards.
Emulating what happens after the tarball is unpacked
To emulate what happens after the tarball is unpacked, you can run the following sets of commands:
The first set of commands builds blib/ as normal, tests the files and generates documentation as normal. However, instead of copying the files to their final destination it merely reports on what files it would have copied and to which locations.
The second set of commands does an actual fake installation to a directory other than the normal site directory. In this case the files are installed to /tmp/foo. You can verify this by running ./Build fakeinstall. Instead of the normal site locations, the copy destinations will all be in /tmp/foo/.
Please note that the second method requires rebuilding the Build script. The destination directory is hard coded into the script and there is no option for changing the destination directory on the build script itself.
To clean out generated files and start all over you can use. In theory this should clean out the blib/ directory generated by the 'test' action. It is best to double check that the file was in fact removed. For some reason, from time to time, the "blib/" directory won't go away even when this command is run.
Testing installation on systems other than your own
There is very limited support for this. If you want to test the generation of documentation that would not normally be generated on your system, you can use the following two commands:
You can control the locations where files will be installed by using the --install_path and --installdirs options. See Module::Build for details.
However, this only begins to touch on the portability issues that can affect a module. By far and away the best option is to get your module working well on your own system and then upload it to CPAN where users of other systems can download and test it. See CPAN Author Notes for more information.
Quality control testing
Module::Build's generated Build script also contains several tools for checking the quality of code, tests, and documentation. Among them:
If you are particularly excited about quality metrics you might also want to consider using the Module::Build::Kwalitee subclass of Module::Build. For a description of the Kwalitee metrics and why they are important, see http://cpants.perl.org/kwalitee.html. Kwalitee metrics are tracked by CPANTS, an alternate testing service that should not be confused with CPAN testers.
2.6 Extensions to Module::Build
Module::Build was designed for subclassing and fortunately many developers have taken advantage of that and shared their work.
A number of extensions to Module::Build have been created to handle special application types: applications with embedded C/C++, applications with databases, applications with a web front end and so on. For a list of available modules, search CPAN.
3. Uploading your package to CPAN
4. Alternative distribution channels
The Build script generated by Module::Build also supports packaging for software distribution channels other than CPAN: