http://www.perlmonks.org?node_id=696604

hangon has asked for the wisdom of the Perl Monks concerning the following question:

A recent node reminded me of a problem I had a while back - how to include some sort of periodically changing data into a module. These are some possibilities I came up with, though I'm sure there are others:

  1. Incorporate the data directly into the module as a data structure, and just update the module when the data changes.
  2. Same, but incorporate it in a __DATA__ section of the module.
  3. Inherit from a separate data module.
  4. Use a separate config data file (text, YAML, etc.) for the module in @INC. Provide a class method to update it.
  5. Provide methods to import and update the data in a config file local to the calling program.

My solution was #1, purely for expedience (though it didn't feel quite right). However, the data came from the PNG specification, which shouldn't be expected to change often. Other data, such as a list of PPM repositories, may be more of a moving target. Especially for a CPAN module, it would seem useful to provide a method to update the data rather than waiting for the module author or hacking the module yourself.
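
To make the options concrete, here is a minimal sketch that combines #1 (defaults compiled into the module) with #5 (merging updates from a file supplied by the caller). The package name My::PNG::Chunks, the update_from_file and describe methods, and the "name = description" file format are all assumptions made up for illustration; they are not taken from any existing module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

{
    package My::PNG::Chunks;    # hypothetical module name

    # Option 1: defaults compiled into the module.
    my %chunks = (
        IHDR => 'image header',
        IDAT => 'image data',
        IEND => 'image trailer',
    );

    # Option 5: merge "name = description" lines from a caller-supplied file.
    sub update_from_file {
        my ($class, $path) = @_;
        open my $fh, '<', $path or die "Cannot open $path: $!";
        while (my $line = <$fh>) {
            chomp $line;
            next if $line !~ /\S/ || $line =~ /^\s*#/;    # blank or comment
            my ($name, $text) = split /\s*=\s*/, $line, 2;
            $chunks{$name} = $text if defined $text;
        }
        close $fh;
        return scalar keys %chunks;    # number of entries after the merge
    }

    sub describe { my ($class, $name) = @_; return $chunks{$name} }
}

# Simulate a user-maintained file that adds one entry and overrides another.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "# local additions\ntEXt = textual data\nIEND = end of image\n";
close $out;

my $count = My::PNG::Chunks->update_from_file($path);
print "$count entries; IEND is '", My::PNG::Chunks->describe('IEND'), "'\n";
```

The built-in table keeps the module usable out of the box, while the update method lets a user patch the data without editing the installed module.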

I was wondering what others have done, and what might be considered better ways of handling data for modules such as PPM::Repositories, where it's likely that the user would want or need to update the module's data on their own.


Replies are listed 'Best First'.
Re: Including periodically changing data with modules
by dragonchild (Archbishop) on Jul 10, 2008 at 03:56 UTC
    A lot depends on what the data is. Number::Phone uses a DBM::Deep DB in the __DATA__ section to accomplish this. Personally, I prefer separate files. This way, I can work with multiple versions of the data in question.

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Including periodically changing data with modules
by pc88mxer (Vicar) on Jul 10, 2008 at 04:16 UTC
    I would have your module rely on a separate data provider object to return the data. That way you can encapsulate all of the policies about how the data is stored, versioned, updated, etc. in a user-selectable class. Of course, you can define convenience methods that use some default data provider, but you gain the most flexibility by handing off that responsibility to another object, and it is also a natural separation of concerns.
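
A minimal sketch of the data provider idea, assuming hypothetical class names (My::Provider::Static, My::Module) and a provider interface that consists of a single data() method returning a hash reference. A real provider could read a file, a database, or a URL behind the same method.

```perl
use strict;
use warnings;

{
    # Hypothetical provider: any object with a data() method will do.
    package My::Provider::Static;
    sub new  { my ($class, %data) = @_; return bless { data => {%data} }, $class }
    sub data { return $_[0]{data} }
}

{
    package My::Module;    # hypothetical consumer of the data

    sub new {
        my ($class, %args) = @_;
        # Fall back to a built-in default provider when none is supplied.
        my $provider = $args{provider}
            || My::Provider::Static->new(source => 'built-in');
        return bless { provider => $provider }, $class;
    }

    # All data access goes through the provider object.
    sub lookup { my ($self, $key) = @_; return $self->{provider}->data->{$key} }
}

my $default = My::Module->new;
my $custom  = My::Module->new(
    provider => My::Provider::Static->new(source => 'user-supplied'),
);

print $default->lookup('source'), "\n";
print $custom->lookup('source'),  "\n";
```

Because My::Module only ever calls data(), swapping the storage or update policy means passing in a different provider object rather than changing the module.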
      Data::Package
      How awkward. I'm playing around with doing this exact thing right now to exclude the translation stuff from Date::Manip.


      Evan Carroll
      I hack for the ladies.
      www.EvanCarroll.com
Re: Including periodically changing data with modules
by snoopy (Curate) on Jul 10, 2008 at 07:47 UTC
    I've got into the habit of using File::ShareDir along with the auto directory tree. This is useful for storing any supplementary read-only files such as templates or data files.
    1. Deposit the data files in the corresponding auto directory in the Perl library path. For example, reference files for lib/My/Module.pm are placed in the directory lib/auto/My/Module.
    2. Use File::ShareDir to locate the file at run-time:
        package My::Module;
        use warnings;
        use strict;
        use File::ShareDir;
        our $data_path = File::ShareDir::module_file(__PACKAGE__, 'mydata.yaml');
    By convention loaders and packagers look for and automatically include resources in auto in the module path. So both will be packaged and installed together.
      This didn't work for me at all, I'm finding this module to be more trouble than it is worth. Good idea, horrible docs and only slightly less complex than the task itself.
      This nonsensery comes in two components, neither makes sense. One of them wants your stuff to be in ./lib/auto, (the loader), while the installer wants it to be in ./share. Use both of the defaults, and "make test" won't work. It moves the files to ./blib/lib/auto, but it doesn't set up the testing environment to use it I suppose. The loader isn't of much help either, cloaking the full location of the failed open -- or whatever it is failing at.
      t/date_misc_b...........File 'y/XLang/English.yaml' does not exist in module dir at Date-Manip-5.54/blib/lib/Date/Manip.pm line 353
      I mean come on, where the fuck is the module dir? I've tried setting it to the module name (__PACKAGE__) and giving it the dist-name; neither works. It does work outside the harness, using "prove -l ./t/*.t".

      # Compile-testing for PITA::Report
      use lib ();
      use File::Spec::Functions ':ALL';
      BEGIN {
          $| = 1;
          unless ( $ENV{HARNESS_ACTIVE} ) {
              require FindBin;
              $FindBin::Bin = $FindBin::Bin; # Avoid a warning
              chdir catdir( $FindBin::Bin, updir() );
              lib->import(
                  catdir('blib', 'lib'),
                  catdir('blib', 'arch'),
                  'lib',
              );
          }
      }

      So you have to code like that above every test script? eww.

      really this module just needs some non-shitty docs, it isn't half bad


      Evan Carroll
      I hack for the ladies.
      www.EvanCarroll.com
Re: Including periodically changing data with modules
by roboticus (Chancellor) on Jul 10, 2008 at 12:36 UTC
    hangon:

    Keeping in mind that laziness is a virtue, for the type of data I think you're talking about (description of standard data structures that change infrequently, if at all), I think I'd probably use the simplest method (#1 or #2) to start with. In other words, get in, get the job done, and move on. Chances are very good in this case that you'll never have to go back and change it. If you never need to change it, any effort expended to generalize it would be wasted.

    If at some point in the future, you do need to change it, you'll be better positioned for deciding how to improve it. If it's five years later and you're making a very simple change that doesn't require you to handle multiple versions, you could just update the data structure and move on again. If there are more drastic changes, you might decide to switch to a more general solution (#3 .. #5).

    For stuff you know is going to change soon, all the options you listed are reasonable. If time is really tight right now, you might choose #1 with the intention of upgrading to a better method later. However, this would incur some "technical debt" for your project. If you're in a shop that never lets you pay down your technical debt, then do it the best way you can the first time to avoid that debt. Otherwise you'll dig yourself a hole you'll have trouble climbing out of.

    ...roboticus

    Just one robot's opinion...
Re: Including periodically changing data with modules
by brian_d_foy (Abbot) on Jul 10, 2008 at 22:10 UTC

    For Business::ISBN, which needs a list of group and publisher codes that change every so often, I created Business::ISBN::Data. This way, users can update their Business::ISBN::Data without messing with Business::ISBN, and I can update the data without making people re-install Business::ISBN.
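
The general shape of that split can be sketched as follows. This is not the actual layout of Business::ISBN::Data; the package names My::Codes and My::Codes::Data, the %groups table, and the method names are hypothetical, and the two packages would normally live in separate distributions rather than one file.

```perl
use strict;
use warnings;

{
    # Data-only package: released on its own schedule.
    package My::Codes::Data;    # hypothetical name
    our $VERSION = '20080710';  # illustrative date-style version
    our %groups  = ( 0 => 'English', 2 => 'French', 3 => 'German' );
}

{
    # Code-only package: uses whatever data package is installed.
    package My::Codes;          # hypothetical name
    our $VERSION = '1.00';

    sub group_name {
        my ($class, $code) = @_;
        return $My::Codes::Data::groups{$code};
    }

    # Lets callers check how fresh the installed data is.
    sub data_version { return $My::Codes::Data::VERSION }
}

print My::Codes->group_name(3), "\n";
print My::Codes->data_version,  "\n";
```

With this arrangement a new data release only touches My::Codes::Data, and the data version can be reported separately from the code version.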

    --
    brian d foy <brian@stonehenge.com>
    Subscribe to The Perl Review