Modules dealing with data files
by grinder (Bishop)
on Nov 10, 2006 at 18:30 UTC
grinder has asked for the
wisdom of the Perl Monks concerning the following question:
(Cross-posted from module-authors, because perl.org seems to be having difficulties today).
I have written a module that deals with France's INSEE codes, which allows one to look up postcodes and stuff like that. I've been toying with Geography::FR::Postcode as a name. (any other ideas?)
The thing is, it relies on a text file that is 750KiB zipped, updated periodically. So I'm looking at a reader package that knows how to pick apart a certain format (or formats) of the data file and answer questions (for instance, what towns have the postcode 66100). Reading the unzipped file on each run and producing hashes takes about a second, which is good enough for a first version.
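To make the reader idea concrete, here is a minimal sketch of building a postcode-to-towns hash. The record layout (INSEE code, postcode, town name, tab-separated) and the function name load_postcodes are assumptions made for illustration only; the real INSEE layout is not described here, and the sample rows are placeholders.

```perl
use strict;
use warnings;

# Hypothetical layout: INSEE code, postcode, town name, separated by tabs.
# This is an assumption for illustration, not the actual INSEE format.
sub load_postcodes {
    my ($fh) = @_;
    my %towns_for;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;    # skip blank lines
        my ($insee, $postcode, $town) = split /\t/, $line;
        push @{ $towns_for{$postcode} }, $town;
    }
    return \%towns_for;
}

# Demo on an in-memory file handle with placeholder rows.
my $sample = "00001\t66100\tTown A\n00002\t66100\tTown B\n00003\t66110\tTown C\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my $towns_for = load_postcodes($fh);
close $fh;

# Which towns have the postcode 66100?
print join(', ', @{ $towns_for->{66100} }), "\n";
```

A postcode can map to several towns, hence the hash of array references rather than a flat hash.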
One problem is that the INSEE web site doesn't make it easy to predict what the new filename will be, so I can't fetch the data from INSEE during the installation process. And I would like to avoid wrapping the data file itself up as a CPAN module. So I would create another package containing a solitary package variable that holds the URI of the data file on INSEE's web site, and just update that when new versions come out.
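A minimal sketch of such a package might look like the following. The URI is a placeholder (the real INSEE location is deliberately not guessed), and the version number and the uri accessor are assumptions added for illustration.

```perl
package Geography::FR::Postcode::Data;
use strict;
use warnings;

# Bumped each time INSEE publishes a new file (illustrative value).
our $VERSION = '0.01';

# Placeholder URI: the real INSEE address is not known here.
our $URI = 'http://example.invalid/insee/postcodes.zip';

# Hypothetical accessor so callers need not touch the package variable.
sub uri { return $URI }

package main;

print Geography::FR::Postcode::Data->uri, "\n";
```

Releasing a new data version then amounts to changing $URI and $VERSION and uploading the distribution.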
Something like this:
Installing Geography::FR::Postcode forces the dependency on Geography::FR::Postcode::Data to be resolved first. So Data is downloaded and, as part of its installation process, the data file is fetched and installed somewhere on the local system.
I suppose it will default to the site_perl directory if run in batch mode, but interactive installations allow the directory to be specified. OS distribution maintainers may wish to override the default (how? an environment variable à la PERL_G_F_P_PATH=/usr/local/share/doc/insee?)
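One way the directory choice could be resolved, sketched under assumptions: an explicit choice (for example from an interactive prompt) wins, then the PERL_G_F_P_PATH environment variable floated above, then a subdirectory of the site library taken from the core Config module. The function name data_dir and the subdirectory layout are hypothetical.

```perl
use strict;
use warnings;
use Config;        # core module: exposes $Config{sitelib}
use File::Spec;    # core module: portable path handling

# Hypothetical helper: decide where the data file should live.
sub data_dir {
    my (%opt) = @_;

    # 1. An explicit choice, e.g. from an interactive installation.
    return $opt{dir} if defined $opt{dir};

    # 2. An override for OS distribution maintainers.
    my $env = $ENV{PERL_G_F_P_PATH};
    return $env if defined $env && length $env;

    # 3. Fall back to a directory under site_perl (assumed layout).
    return File::Spec->catdir($Config{sitelib}, 'Geography', 'FR', 'Postcode');
}

print data_dir(), "\n";
```

If the reader module calls the same helper at run time, both sides agree on the location without any extra bookkeeping, as long as the environment is the same at install time and run time.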
After Geography::FR::Postcode::Data is installed, the installation of Geography::FR::Postcode goes forward (waving hands: knowing where Data put the damned file).
Next year, a new version of the INSEE file comes out. I test, and see that the current reader code can deal with it. I release a new version of Geography::FR::Postcode::Data. The client sees that there is an update for this, and installs it. New data file, everyone happy. (Assuming the installation causes the new file to overwrite the old one; otherwise Postcode will continue to run with the old file.)
The following year, a new version comes out, and surprise! they've added a new column in the file. So I release a new version of Geography::FR::Postcode as well, that knows how to read both formats, and a new version of Geography::FR::Postcode::Data.
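Reading both formats could be as simple as dispatching on the number of fields in each record. The sketch below assumes a tab-separated file whose old layout has three columns and whose new layout appends a fourth; the column names, including extra, and the function name parse_record are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical: old format has 3 tab-separated fields, new format has 4.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line, -1;    # -1 keeps trailing empty fields

    if (@f == 3) {
        return { insee => $f[0], postcode => $f[1], town => $f[2] };
    }
    if (@f == 4) {
        return { insee => $f[0], postcode => $f[1], town => $f[2], extra => $f[3] };
    }
    die "unrecognised record with " . scalar(@f) . " fields\n";
}

my $old = parse_record("00001\t66100\tTown A\n");
my $new = parse_record("00001\t66100\tTown A\tsomething\n");
print exists $old->{extra} ? "new format\n" : "old format\n";
print exists $new->{extra} ? "new format\n" : "old format\n";
```

Dying on an unknown field count means a format change that the reader does not understand is noticed immediately rather than silently producing bad lookups.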
Does that sound sane? Does anyone have some pointers on how to deal with the placement of datafiles on the local system with one module, and having the other module know where to find them?
Or am I making this unnecessarily complicated? (I could just bundle the data file with the distribution, but the size of the data file, and the fact that its format is unlikely to change, invite the above approach.)
• another intruder with the mooring in the heart of the Perl