What files can you expect to find in a typical CPAN distribution? Why do they exist? What purpose do they serve?

This meditation is an answer to a blog post by Ovid, but I hope it's generally useful. I'll keep updating this list with more info; please reply below if you can think of anything I've missed.

(Some of these might be automatically generated and others might be created manually; it all depends on your build process. This is not intended as a discussion of the best way to build distributions.)

What files are absolutely required?

None. There are no absolute requirements for the contents of a tarball uploaded to CPAN. However, the presence of certain files can be useful for the CPAN indexer, CPAN clients, search.cpan.org, metacpan.org and so on.

What files does the CPAN indexer want?

Although the CPAN indexer doesn't need either of these, it will handle them specially, unpacking them from the tarball so that people can download them without needing to download the whole distribution.

What files do CPAN clients want?

What files do CPAN websites want?

MetaCPAN will display the following documentation files on a release's front page if they exist:

XS stuff

These files are often found in XS packages:

Other files you might see...

Though not used by the CPAN indexer or clients, you will often see some other files left as artefacts of the system used to package the distribution:

Directories you might see...

What should be checked into version control?

My general rule is that if a file is automatically regenerated on a regular basis (e.g. META.yml is often automatically generated when packaging up the distribution) then it doesn't go into version control. Everything else does.

Update History

perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

Re: What are the files in a CPAN distribution?
by Corion (Pope) on Dec 19, 2012 at 16:19 UTC
    ./MANIFEST.SKIP - many authors use this to list the files they have in their development directory which they don't want to package. Most distribution building processes will honour this list. They should really include MANIFEST.SKIP in their MANIFEST.SKIP so that it doesn't get distributed!

    I disagree with that point. Not having/knowing the MANIFEST.SKIP can make it highly tedious to rebuild a distribution. I think that .gitignore, MANIFEST.SKIP and other such ephemeral parts of distribution building should be distributed, so that other people can easily rebuild a distribution file if, for example, they are running their own in-house CPAN mirror.

      OK, so you have a MANIFEST.SKIP file which lists "foo.txt". So somebody downloading the tarball knows that foo.txt should be skipped. But they don't have foo.txt, do they? foo.txt was never packaged.

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

        But foo.txt may be generated by the build process or by the tests, and might be machine/environment specific and shouldn't be redistributed.

        My MANIFEST.SKIP files also list files used by source code management as well as temporary files. These temp files might be generated during testing, running examples, etc. Why hand out additional work to another author if/when (s)he takes over the distribution or wants to put a modified/fixed version into a local darkpan repo?

        "I know what i'm doing! Look, what could possibly go wrong? All i have to pull this lever like so, and then press this button here like ArghhhhhaaAaAAAaaagraaaAAaa!!!"

      I agree with Corion here: I distribute MANIFEST.SKIP and think it's ok to distribute the /xt directory (or author-only tests). It just assures that if someone else wants to work on the module too, they've got everything that I would use if it were me working on it. It's easy enough to ensure that those ./xt tests only run if an environment variable is set, so at worst they're a little extra disk space, and at best they might help another contributor, bug-fixer, etc.
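A minimal sketch of that environment-variable guard, for anyone who hasn't seen one. The variable name AUTHOR_TESTING is a commonly used convention but is an assumption here (the post doesn't name one), and author_tests_enabled() is a hypothetical helper, not part of Test::More:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: true only when the author has opted in.
# AUTHOR_TESTING is an assumed variable name, not one given in the post.
sub author_tests_enabled { return $ENV{AUTHOR_TESTING} ? 1 : 0 }

SKIP: {
    # Without the variable set, the one test in this block is reported as skipped.
    skip 'author test; set AUTHOR_TESTING=1 to run', 1
        unless author_tests_enabled();

    # Placeholder for a real author-only check (POD coverage, spelling, etc).
    pass('placeholder for an author-only check');
}

done_testing;
```

Saved as something like xt/example.t (a made-up name), this is harmless for ordinary installers and runs fully with AUTHOR_TESTING=1 in the environment.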


      Dave

        Most modules are now in some sort of reproducible VCS (svn or git), and most have some sort of preferred way of accepting contributions (github).

        If people want to work on a module/distribution, a git clone or svn copy will get you *all* files, including those that - I agree with tobyink here - are listed in .gitignore and MANIFEST.SKIP. I hope at least that the authors are sane enough to have these two in the repository anyway.


        Enjoy, Have FUN! H.Merijn

        and think it's ok to distribute the /xt directory (or author-only tests)

        I agree. Even if the tests are never run by someone else, they might still prove a valuable resource to another author. And if (somewhat unexpectedly, 'cause my code is always perfect) bug reports start flooding in, it is always a good thing to have all existing tests for the distribution already deployed. It's much easier to ask a user to "just run these commands and post the output" than first having to haggle about how to send the test scripts and set up the test environment.

        "I know what i'm doing! Look, what could possibly go wrong? All i have to pull this lever like so, and then press this button here like ArghhhhhaaAaAAAaaagraaaAAaa!!!"
      Not having/knowing the MANIFEST.SKIP can make it highly tedious to rebuild a distribution.

      +1

      The only Perl archive I trust for the long term is the CPAN. Every distribution should be rebuildable from itself, perl, and other CPAN distros.

      If an author abandons his module, we want to be able to continue his work from the CPAN data, not relying on another source that may have disappeared with the maintainer.

        My general position on this is that the CPAN distribution is not the development tree; the CPAN distribution is a product of the development tree.

        My repo may contain a bunch of extra files (author tests, Devel::Cover output, benchmarking scripts, etc) which are not included in the distro. And similarly the distribution will contain a bunch of auto-generated files (like META.yml, Changes, LICENSE, etc) that I have no desire to check into my repo.

        However if I go under a bus tomorrow and somebody else needs to pick up the distro, they don't actually need any of that stuff. They could just download the tarball, extract the files, make a few changes, bump the version numbers, and then use tar to create a new tarball, and upload it.

        OK, it might seem like a pain for them not to be able to use the author tests, etc. that I have in my repo, but actually if you think about it, it makes it easier for them to transition the distribution to their own build tools. They can start working on the distribution the way they like to work on software, not the way I like to work on software. (Especially considering I'm somewhat of an oddball in my development style!)

        perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: What are the files in a CPAN distribution?
by Tommy (Chaplain) on Jan 01, 2013 at 02:03 UTC

    Tobyink, Thanks for this Perl Meditation. It's really useful, as were several of the comments.

    Corion, do you think a .gitignore file is sufficient, or do you feel that a MANIFEST.SKIP is the only "right" way?

    --
    Tommy
    $ perl -MMIME::Base64 -e 'print decode_base64 "YWNlQHRvbW15YnV0bGVyLm1lCg=="'

      .gitignore and MANIFEST.SKIP serve different purposes.

      • MANIFEST.SKIP tells the distribution building process (as far as I know ExtUtils::MakeMaker, Module::Build and Module::Install all support it) to skip including certain files in the packaged distribution.

      • .gitignore (and its equivalents for other versioning systems: .hgignore files or svn:ignore properties) tells your version control system to keep particular files out of version control.

      It's quite common to have files that are not in version control but are packaged with the distribution, such as META.yml, which might be built on the fly at packaging time. Conversely, there are occasions on which you'd want to keep files in your repository but not distribute them: a Devel::Cover database, editor config files (.vimrc), benchmarking scripts, etc. So it's fairly usual to have significant differences between the .gitignore and MANIFEST.SKIP lists.
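To make the MANIFEST.SKIP half of that concrete, here is a rough sketch using maniskip() from ExtUtils::Manifest, which turns a MANIFEST.SKIP file (one regex per line) into a filter. The patterns and file names below are invented for illustration and are not taken from any real distribution:

```perl
use strict;
use warnings;
use ExtUtils::Manifest qw(maniskip);
use File::Spec;
use File::Temp qw(tempdir);

# Write a small, made-up MANIFEST.SKIP into a scratch directory.
my $dir      = tempdir( CLEANUP => 1 );
my $skipfile = File::Spec->catfile( $dir, 'MANIFEST.SKIP' );
open my $fh, '>', $skipfile or die "Cannot write $skipfile: $!";
print {$fh} "$_\n" for (
    '^\.git/',       # version control internals
    '^cover_db/',    # Devel::Cover database
    '^bench/',       # benchmarking scripts
    '~$',            # editor backup files
);
close $fh or die "Cannot close $skipfile: $!";

# maniskip() returns a code ref that is true for paths matching any pattern.
my $skip = maniskip($skipfile);

# Hypothetical file names: some belong in the tarball, some do not.
for my $path (qw( lib/Foo.pm META.yml cover_db/index.html bench/speed.pl lib/Foo.pm~ )) {
    printf "%-22s %s\n", $path, $skip->($path) ? 'skipped' : 'packaged';
}
```

Note that META.yml is not matched by any of these patterns, so it would be packaged even if your .gitignore keeps it out of the repository; that is exactly the kind of difference between the two lists described above.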

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

        Thank you. I can see what you mean. It didn't occur to me I guess, because I haven't ever kept anything in my dev tree that didn't go into the package; I keep such things higher up in the tree than the directory that gets built.

        Now you've got me thinking... again.

        --
        Tommy
Re: What are the files in a CPAN distribution?
by arrestee (Novice) on Sep 17, 2013 at 18:30 UTC

    Since this is intended for newbs like me, you might want to point out that when one uses the MetaCPAN search function, the (first, main, most important-looking) link provided is to the "module" page, not the "release" page. The module page doesn't show most of the files you say MetaCPAN shows, which left me scratching my head for a while. Finally I spotted the link to the release page. That made it easier to understand this post, and to use existing distributions as helpful examples of how things are done.