http://www.perlmonks.org?node_id=699449

nathanhaigh has asked for the wisdom of the Perl Monks concerning the following question:

I'm writing for advice/guidelines on splitting large distributions of Perl modules into smaller sets with dependencies on each other.

In particular I'm referring to Bioperl (www.bioperl.org): http://search.cpan.org/~sendu/bioperl-1.5.2_102/

First ignoring the complexity of the dependencies between every module and how complex it may be to maintain for the Bioperl devs, I have a few questions about splitting up Bioperl into bite-sized chunks:

Is there any reason why Bioperl couldn't or shouldn't be split into separate modules on CPAN, each declaring its correct dependencies on the other Bioperl modules? Bundles could then group related modules into larger units for performing a set of tasks. In other words, is there any reason why it shouldn't be split down to the module level?

Bioperl has become very large. As a result, anyone developing Perl code, applications or scripts that use only a small number of Bioperl modules currently has to depend on the WHOLE set of Bioperl modules rather than the few they need. Asking someone to install all of Bioperl in order to run your script or application is a huge overhead.

Thanks for any advice you may be able to give, I'll report back to the Bioperl devs with responses.

Thanks,
Nathan

-------------------------------------------------------------
Dr. Nathan S. Watson-Haigh        (publish under Haigh, N.S.)
OCE Post Doctoral Fellow
CSIRO Livestock Industries
J M Rendel Laboratory
Rockhampton
QLD 4701                              Tel: +61 (0)7 4923 8121
Australia                             Fax: +61 (0)7 4923 8222
-------------------------------------------------------------
Re: Advice on Splitting Large Distributions in CPAN
by Khen1950fx (Canon) on Jul 23, 2008 at 00:05 UTC
    By all means, use bundles. Here's a bundle that I put together for a "basic bioperl":

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CPAN;

    our $FORCE;

    # Force-install the bundle plus a few extra modules through the CPAN shell.
    CPAN::Shell->force(
        'install',
        "Bundle::BioPerl",
        "Bio::Seq",
        "Bio::SeqIO::staden::read",
        "Bio::Factory::EMBOSS",
        "Bio::Tk::SeqCanvas",
        "Bio::DB::Annotation",
    );

    This covers a lot of ground, but it's enough to get up and running with bioperl.

      I think you misunderstand. Bioperl consists of tens if not hundreds of modules. As new modules are developed, they are simply added to the growing list of modules available under a single CPAN package. This makes it difficult for a third party to reuse Bioperl code in their own scripts/applications, as it would require the whole Bioperl package to be installed as a prerequisite. The devs are therefore looking to split Bioperl into smaller chunks, with dependencies correctly set up between these chunks. Then, if someone wants to read/write sequence files from their own script, they will only need to depend on a smaller subset of the Bioperl modules, i.e. Bio::SeqIO::*
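
      To make that concrete, here is a rough sketch of what a third party's Makefile.PL might look like once such a split exists. The distribution name My::SeqTool is hypothetical, and the sketch assumes Bio::Seq and Bio::SeqIO could be installed without the rest of Bioperl, which is not the case today.

      ```perl
      # Makefile.PL for a hypothetical third-party tool, My::SeqTool.
      use strict;
      use warnings;
      use ExtUtils::MakeMaker;

      WriteMakefile(
          NAME      => 'My::SeqTool',    # hypothetical distribution name
          VERSION   => '0.01',
          PREREQ_PM => {
              # Only the modules the tool actually loads; 0 means "any version".
              'Bio::Seq'   => 0,
              'Bio::SeqIO' => 0,
          },
      );
      ```

      With a single large distribution, those two prerequisites still resolve to the whole of Bioperl; after a split they would only pull in the smaller distributions that provide them.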

      The question is, how small can these chunks be? Is there any limitation, or is it simply a maintenance issue for the devs if they split things up too much?

      After the split, several bundles could be made to organise the smaller subsets of modules into larger bundles with similar functions.

        There is no limitation. It is all a maintenance issue.

        However if the cross-dependencies mean that a group of modules is realistically always installed together, then there is absolutely no real benefit to splitting them.
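
        One way to check this before splitting is to compute, for each module, everything it pulls in directly or indirectly. The sketch below does that over a small dependency map. The map is invented for illustration (Bio::Tools::Foo is a hypothetical module name) and is not measured from Bioperl; a real run would need the actual "use" relationships extracted from the source.

        ```perl
        use strict;
        use warnings;

        # Hypothetical dependency map: module => [modules it uses directly].
        # These edges are made up for illustration only.
        my %deps = (
            'Bio::SeqIO'      => ['Bio::Seq', 'Bio::Root::Root'],
            'Bio::Seq'        => ['Bio::Root::Root'],
            'Bio::Root::Root' => [],
            'Bio::Tools::Foo' => ['Bio::SeqIO'],    # hypothetical module
        );

        # Return the sorted list of every module reachable from $start,
        # excluding $start itself.
        sub closure {
            my ($deps, $start) = @_;
            my %seen;
            my @queue = @{ $deps->{$start} || [] };
            while (defined(my $mod = shift @queue)) {
                next if $seen{$mod}++;
                push @queue, @{ $deps->{$mod} || [] };
            }
            delete $seen{$start};    # only matters if the map contains a cycle
            my @needs = sort keys %seen;
            return @needs;
        }

        for my $mod (sort keys %deps) {
            my @needs = closure(\%deps, $mod);
            printf "%-16s pulls in %d module(s): %s\n",
                $mod, scalar @needs, join(', ', @needs);
        }
        ```

        If most modules turn out to pull in nearly the same set, splitting them into separate distributions would buy users very little.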

Re: Advice on Splitting Large Distributions in CPAN
by samtregar (Abbot) on Jul 22, 2008 at 23:06 UTC
    First ignoring the complexity of the dependencies between every module and how complex it may be to maintain for the Bioperl devs...

    I can't ignore that - it's the biggest problem! Dividing up the modules along functional lines is probably a win for users that just need a little piece but it will definitely cause problems for the developers.

    If the developers are willing, sure, there's no reason they couldn't create 20 distributions instead of one, with a spider's web of dependencies carefully set up to link them together as needed. They could also just start mailing you free beer when you get thirsty! Might as well ask for that while you're at it.

    -sam

      The reason I said "first ignoring..." was that there may be a misconception among the Bioperl devs that splitting a distribution up into lots of little pieces is frowned upon by the CPAN people. If this is not the case, then the decision becomes a trade-off between ease of maintenance and ease of installation/re-usability for the end user.

      Discussions are currently underway to split Bioperl into smaller chunks, and we need all the facts to help decide at what level it is best to split it.

      Thanks for your quick feedback,
      Nathan