What are the files in a CPAN distribution?

by tobyink (Abbot)
on Dec 19, 2012 at 16:13 UTC

What files can you expect to find in a typical CPAN distribution? Why do they exist? What purpose do they serve?

This meditation is an answer to a blog post by Ovid, but I hope it's more generally useful. I'll keep updating this list with more info; please reply below if you can think of anything I've missed.

(Some of these might be automatically generated and others might be created manually; it all depends on your build process. This is not intended as a discussion of the best way to build distributions.)

What files are absolutely required?

None. There are no absolute requirements for the contents of a tarball uploaded to CPAN. However, the presence of certain files can be useful for the CPAN indexer, CPAN clients, search.cpan.org, metacpan.org and so on.

What files does the CPAN indexer want?

Although the CPAN indexer doesn't need either of these, it will handle them specially, unpacking them from the tarball so that people can download them without needing to download the whole distribution.

  • ./README - a plain text file containing general information about the distribution. Many people auto-generate this from the pod of the main module in their distribution, but you don't need to.

  • ./META.yml or ./META.json - metadata about the distribution, usually including authorship, licence, prerequisites and so on. See CPAN::Meta::Spec. Why have one file format when you can have two?
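To give a feel for what lives in the metadata file, here is a minimal sketch that loads a META.json with the core CPAN::Meta module and prints a few fields. The distribution "Foo-Bar", its author and its prerequisites are invented for illustration; only the CPAN::Meta method calls are real.

```perl
use strict;
use warnings;
use CPAN::Meta;    # in the Perl core since 5.14

# A hypothetical, hand-written META.json for an imaginary distribution.
my $json = <<'END_JSON';
{
   "abstract" : "An imaginary module used only for illustration",
   "author" : [ "A. U. Thor <author@example.com>" ],
   "dynamic_config" : 0,
   "generated_by" : "hand-written example",
   "license" : [ "perl_5" ],
   "meta-spec" : {
      "url" : "http://search.cpan.org/perldoc?CPAN::Meta::Spec",
      "version" : 2
   },
   "name" : "Foo-Bar",
   "prereqs" : {
      "runtime" : {
         "requires" : {
            "List::Util" : "1.29",
            "perl" : "5.008001"
         }
      }
   },
   "release_status" : "stable",
   "version" : "0.01"
}
END_JSON

my $meta = CPAN::Meta->load_json_string($json);

# Name, version and licence(s) of the distribution.
printf "%s %s (%s)\n", $meta->name, $meta->version, join ', ', $meta->license;

# Run-time requirements as a plain hash of module => version.
my $runtime = $meta->effective_prereqs
                   ->requirements_for('runtime', 'requires')
                   ->as_string_hash;
print "  requires $_ $runtime->{$_}\n" for sort keys %$runtime;
```

With a real distribution you would more likely call CPAN::Meta->load_file on the unpacked META.json or META.yml instead of embedding the text.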

What files do CPAN clients want?

  • ./MANIFEST - a list (one per line) of files contained within the tarball, optionally including comments. See ExtUtils::Manifest. Not actually required, but many CPAN clients will issue massive warnings if it's missing or incorrect.

  • ./Makefile.PL or ./Build.PL. If Build.PL is present, and the CPAN client is modern enough, the client will use the following commands to install the distribution:

    perl Build.PL
    ./Build
    ./Build test
    ./Build install

    The idea is that Build.PL is a script using Module::Build and that Build is another Perl script output by Build.PL. But of course, it doesn't have to be; Build.PL could be an empty script, and Build could be a shell script.

    If Build.PL is missing, or the client doesn't support Build.PL, the following steps are run:

    perl Makefile.PL
    make
    make test
    make install

    You should include at least one of these if you hope for people to be able to install your module via CPAN clients.

  • ./META.yml or ./META.json - CPAN clients read these to find out what must be installed before Makefile.PL or Build.PL can even be run (the configure-time prerequisites). Recent Makefile.PL or Build.PL files will also typically output ./MYMETA.yml and ./MYMETA.json, which hold the canonical list of build-time and run-time prerequisites for the machine doing the install (these should not be packaged in the distributed tarball). If no MYMETA files are generated, ./META.yml or ./META.json are used as fallbacks.

  • ./SIGNATURE - a list of checksums of the files in the MANIFEST (Module::Signature writes SHA1 or SHA256 digests rather than MD5), signed by GPG/PGP. Optional, but some CPAN clients will offer to verify the signature and check the digests.
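The choice between the two install sequences described above can be sketched as a small function. This is only an illustration of the decision, not code from any real CPAN client; the function name and its arguments are made up, and real clients also resolve prerequisites, handle failures and so on.

```perl
use strict;
use warnings;

# Given a flag saying whether the client understands Build.PL, and the
# names of the files at the top level of an unpacked distribution,
# return the commands that would be run (hypothetical helper).
sub install_commands {
    my ($supports_build_pl, @files) = @_;
    my %have = map { $_ => 1 } @files;

    if ($have{'Build.PL'} && $supports_build_pl) {
        return ('perl Build.PL', './Build', './Build test', './Build install');
    }
    if ($have{'Makefile.PL'}) {
        return ('perl Makefile.PL', 'make', 'make test', 'make install');
    }
    return;    # neither file: nothing the client knows how to run
}

print "$_\n" for install_commands(1, qw(Build.PL Makefile.PL MANIFEST));
```

A distribution that ships both files gets the Build.PL sequence from a modern client and the Makefile.PL sequence from an older one, which is why some authors include both.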

What files do CPAN websites want?

MetaCPAN will display the following documentation files on a release's front page if they exist:

  • ./README / ./README.md / ./README.pod
  • ./INSTALL
  • ./NEWS
  • ./FAQ
  • ./LICENSE / ./Copying
  • ./COPYRIGHT
  • ./TODO / ./ToDo / ./Todo
  • ./CONTRIBUTING
  • ./AUTHORS / ./CREDITS / ./THANKS
  • ./CHANGES / ./Changes / ./ChangeLog / ./Changelog / ./CHANGELOG (see CPAN::Changes::Spec)
  • It will similarly display (though they're not strictly documentation): ./MANIFEST, ./Makefile.PL, ./Build.PL, ./META.yml, ./META.json, dist.ini (see Dist::Zilla) and ./cpanfile (see Module::CPANfile).

XS stuff

These files are often found in XS packages:

  • ./ppport.h - C header file that many XS packages #include so that they compile against a range of Perl versions (see Devel::PPPort).
  • ./typemap - mapping between Perl and native C data types.

Other files you might see...

Though not used by the CPAN indexer or clients, you will often see some other files left as artefacts of the system used to package the distribution:

  • ./MANIFEST.SKIP - many authors use this to list the files they have in their development directory which they don't want to package. Most distribution building processes will honour this list. They should really include MANIFEST.SKIP in their MANIFEST.SKIP so that it doesn't get distributed!

  • ./dist.ini - config file for the popular distribution builder Dist::Zilla.
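For anyone who hasn't looked inside a MANIFEST.SKIP: it holds one regular expression per line, matched against file paths relative to the distribution root, with blank lines and # comments ignored. The following is a rough imitation of that filtering, not the real thing (ExtUtils::Manifest does the actual work); the patterns and file names are invented.

```perl
use strict;
use warnings;

# Hypothetical MANIFEST.SKIP content.
my $skip_text = <<'END_SKIP';
# version control
^\.git/
^\.gitignore$

# author-only material
^xt/
^cover_db/
\.bak$
END_SKIP

# Turn each non-blank, non-comment line into a compiled regex.
my @patterns;
for my $line (split /\n/, $skip_text) {
    $line =~ s/^\s+|\s+$//g;
    next if $line eq '' || $line =~ /^#/;
    push @patterns, qr/$line/;
}

# An imaginary development directory.
my @dev_tree = qw(
    lib/Foo/Bar.pm t/basic.t xt/pod.t .gitignore
    cover_db/coverage.html Makefile.PL Makefile.PL.bak
);

# Keep only the files that match none of the skip patterns.
my @packaged = grep {
    my $file = $_;
    !grep { $file =~ $_ } @patterns;
} @dev_tree;

print "$_\n" for @packaged;
```

Only lib/Foo/Bar.pm, t/basic.t and Makefile.PL survive the filter; everything else in the imaginary tree matches at least one pattern.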

Directories you might see...

  • ./lib/ - by convention, packaged modules are distributed here.

  • ./bin/ or ./script/ - by convention, packaged scripts are distributed here.

  • ./t/ - by convention, packaged test cases are distributed here.

  • ./xt/ - additional test cases run by the author only; there's no need to upload these to CPAN; they should be in MANIFEST.SKIP.

  • ./inc/ - bundled support files for Makefile.PL.

  • ./share/ - conventional place to distribute data files accompanying the distribution.

  • ./examples/ or ./eg/ - for example scripts, sample input and sample output. As of 2013-03-19, MetaCPAN displays a list of example files on a release's front page.

  • ./meta/ - used by Module::Package::RDF as the source of distribution metadata

What should be checked into version control?

My general rule is that if a file is automatically regenerated on a regular basis (e.g. META.yml is often automatically generated when packaging up the distribution) then it doesn't go into version control. Everything else does.

Update History

  • 2013-03-20: added info about ./examples/, ./cpanfile and ./CONTRIBUTING.
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

Re: What are the files in a CPAN distribution?
by Corion (Pope) on Dec 19, 2012 at 16:19 UTC
    ./MANIFEST.SKIP - many authors use this to list the files they have in their development directory which they don't want to package. Most distribution building processes will honour this list. They should really include MANIFEST.SKIP in their MANIFEST.SKIP so that it doesn't get distributed!

    I disagree with that point. Not having/knowing the MANIFEST.SKIP can make it highly tedious to rebuild a distribution. I think that the .gitignore and MANIFEST.SKIP and other such ephemeral parts of distribution building should be distributed so that other people can easily rebuild a distribution file if they are running their own inhouse CPAN mirror for example.

      OK, so you have a MANIFEST.SKIP file which lists "foo.txt". So somebody downloading the tarball knows that foo.txt should be skipped. But they don't have foo.txt, do they? foo.txt was never packaged.

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

        But foo.txt may be generated by the build process or by the tests, and might be machine/environment specific and shouldn't be redistributed.

        My MANIFEST.SKIP files also list files used by source code management as well as temporary files. These temp files might be generated during testing, running examples, etc. Why hand out additional work to another author if/when (s)he takes over the distribution or wants to put a modified/fixed version into a local darkpan repo?

        "I know what i'm doing! Look, what could possibly go wrong? All i have to pull this lever like so, and then press this button here like ArghhhhhaaAaAAAaaagraaaAAaa!!!"

      I agree with Corion here: I distribute MANIFEST.SKIP and think it's ok to distribute the /xt directory (or author-only tests). It just assures that if someone else wants to work on the module too, they've got everything that I would use if it were me working on it. It's easy enough to ensure that those ./xt tests only run if an environment variable is set, so at worst they're a little extra disk space, and at best they might help another contributor, bug-fixer, etc.


      Dave
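The environment-variable guard mentioned above might look something like this at the top of a file in ./xt/. AUTHOR_TESTING is a commonly used variable name but is an assumption here, and the single check is just a placeholder.

```perl
use strict;
use warnings;
use Test::More;

# Skip the whole file unless the author (or a contributor) opts in.
plan skip_all => 'author test; set AUTHOR_TESTING=1 to run'
    unless $ENV{AUTHOR_TESTING};

# A stand-in check; a real xt/ test would do something useful here.
pass('placeholder author check');

done_testing;
```

Run normally, the file reports itself as skipped; run with AUTHOR_TESTING=1 in the environment, the checks execute.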

        Most modules are now in some sort of reproducible VCS: svn or git, and most have some sort of preferred way of accepting contributions (github).

        If people want to work on a module/distribution, a git clone or svn copy will get you *all* files, including those that - I agree with tobyink here - are listed in .gitignore and MANIFEST.SKIP. I hope at least that the authors are sane enough to have these two in the repository anyway.


        Enjoy, Have FUN! H.Merijn

        and think it's ok to distribute the /xt directory (or author-only tests)

        I agree. Even if the tests are never run by someone else, they might still prove a valuable resource to another author. And if (somewhat unexpectedly, 'cause my code is always perfect) bug reports start flooding in, it is always a good thing to have all existing tests for the distribution already deployed. It's much easier to ask a user to "just run these commands and post the output" than first having to haggle about how to send the test scripts and set up the test environment.

        "I know what i'm doing! Look, what could possibly go wrong? All i have to pull this lever like so, and then press this button here like ArghhhhhaaAaAAAaaagraaaAAaa!!!"
      Not having/knowing the MANIFEST.SKIP can make it highly tedious to rebuild a distribution.

      +1

      The only Perl archive I trust for the long term is the CPAN. Every distribution should be rebuildable from itself, perl, and other CPAN distros.

      If an author abandons his module, we want to be able to continue his work from the CPAN data, not relying on another source that may have disappeared with the maintainer.

        My general position on this is that the CPAN distribution is not the development tree; the CPAN distribution is a product of the development tree.

        My repo may contain a bunch of extra files (author tests, Devel::Cover output, benchmarking scripts, etc) which are not included in the distro. And similarly the distribution will contain a bunch of auto-generated files (like META.yml, Changes, LICENSE, etc) that I have no desire to check into my repo.

        However if I go under a bus tomorrow and somebody else needs to pick up the distro, they don't actually need any of that stuff. They could just download the tarball, extract the files, make a few changes, bump the version numbers, and then use tar to create a new tarball, and upload it.

        OK, that might seem like a pain for them not to be able to use the author tests, etc I have in my repo, but actually if you think about it, it makes it easier for them to transition the distribution to their own build tools. They can start working on the distribution the way they like to work on software, not the way I like to work on software. (Especially considering I'm somewhat of an oddball in my development style!)
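The "extract, tweak, re-tar" route described above needs nothing beyond core Perl. Here is a rough sketch using Archive::Tar; the distribution name, version and file contents are invented, and the files are built in memory rather than read from an unpacked tarball.

```perl
use strict;
use warnings;
use Archive::Tar;                 # core module
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical distribution name and bumped version.
my $dist = 'Foo-Bar-0.02';

# Stand-ins for the files of the unpacked (and edited) distribution.
my %files = (
    'lib/Foo/Bar.pm' => "package Foo::Bar;\nour \$VERSION = '0.02';\n1;\n",
    'Makefile.PL'    => "use ExtUtils::MakeMaker;\n"
                      . "WriteMakefile(NAME => 'Foo::Bar', VERSION_FROM => 'lib/Foo/Bar.pm');\n",
    'MANIFEST'       => "lib/Foo/Bar.pm\nMakefile.PL\nMANIFEST\n",
);

my $tar = Archive::Tar->new;

# By convention everything lives under a single Name-Version/ directory.
$tar->add_data("$dist/$_", $files{$_}) for sort keys %files;

my $path = File::Spec->catfile(tempdir(CLEANUP => 1), "$dist.tar");
$tar->write($path) or die $tar->error;

# Read the archive back to confirm what was packed.
my $check  = Archive::Tar->new($path);
my @packed = $check->list_files;
@packed = sort @packed;
print "$_\n" for @packed;
```

Uploads to CPAN are normally gzip-compressed (.tar.gz); compression is left out here to keep the sketch short.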

        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: What are the files in a CPAN distribution?
by Tommy (Chaplain) on Jan 01, 2013 at 02:03 UTC

    Tobyink, Thanks for this Perl Meditation. It's really useful, as were several of the comments.

    Corion, do you think a .gitignore file is sufficient, or do you feel that a MANIFEST.SKIP is the only "right" way?

    --
    Tommy
    $ perl -MMIME::Base64 -e 'print decode_base64 "YWNlQHRvbW15YnV0bGVyLm1lCg=="'

      .gitignore and MANIFEST.SKIP serve different purposes.

      • MANIFEST.SKIP tells the distribution building process (as far as I know ExtUtils::MakeMaker, Module::Build and Module::Install all support it) to skip including certain files in the packaged distribution.

      • .gitignore (and equivalents for other versioning systems: .hgignore files or svn:ignore properties) tells your version control system to keep particular files out of version control.

      It's quite common to have files that are not in version control but are packaged with the distribution, such as META.yml, which might be built on the fly at packaging time. Conversely, there are occasions in which you'd want to keep files in your repository but not distribute them: a Devel::Cover database, editor config files (.vimrc), benchmarking scripts, etc. So it's fairly usual to have significant differences between the .gitignore and MANIFEST.SKIP lists.

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

        Thank you. I can see what you mean. It didn't occur to me I guess, because I haven't ever kept anything in my dev tree that didn't go into the package; I keep such things higher up in the tree than the directory that gets built.

        Now you've got me thinking... again.

        --
        Tommy
Re: What are the files in a CPAN distribution?
by arrestee (Novice) on Sep 17, 2013 at 18:30 UTC

    Since this is intended for newbs like me, you might want to point out that when one uses the MetaCPAN search function, the (first, main, most important-looking) link provided is to the "module" page, not the "release" page. The module page doesn't show most of the files you say MetaCPAN shows, which left me scratching my head for a while. Finally I spotted the link to the release page. That made it easier to understand this post, and to use existing distributions as helpful examples of how things are done.

Node Type: perlmeditation [id://1009586]
Approved by Corion
Front-paged by Old_Gray_Bear