PerlMonks  

Best Practice for Lots of Static Data

by SirBones (Friar)
on Dec 01, 2006 at 02:45 UTC (id://587103)

SirBones has asked for the wisdom of the Perl Monks concerning the following question:

Greetings all. I have a "best practices" question. What's the best way to deal with an amount of static text data that has become annoying from a source-file maintenance point of view?

I have a utility that does some data analysis on returned bit strings coming from various devices under test; one of the things it does is translate the bits that are "on" to their text meaning. For example, I might get a bit string from "device1" which looks like:

100110000000....

And so on, sometimes for several hundred bits. I've been keeping the meaning of the bits in various arrays:

my @device1_status = qw( Enabled Critical_fault Warning_fault
                         Voltage_level_1_on Voltage_level_2_on );   # . . . and so on

And when I walk through the bit string, it's simple enough to display the state of things:

foreach my $index (0 .. $#bits) {
    print "$device1_status[$index]\n" if $bits[$index];
}
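A self-contained version of the loop above might look like the following sketch. It assumes the bit string arrives as a plain string of '0' and '1' characters with bit 0 leftmost (the post does not say how the bits are stored), and `decode_bits` is a hypothetical helper name, not something from the post.

```perl
use strict;
use warnings;

# Labels from the post, truncated to the five names shown there.
my @device1_status = qw( Enabled Critical_fault Warning_fault
                         Voltage_level_1_on Voltage_level_2_on );

# Hypothetical helper: return the label of every set bit, in bit order.
# Assumes bit 0 is the leftmost character of the string; a set bit with
# no label yet is reported as "unknown_bit_N" instead of an empty line.
sub decode_bits {
    my ( $labels, $bit_string ) = @_;
    my @bits = split //, $bit_string;
    return map  { defined $labels->[$_] ? $labels->[$_] : "unknown_bit_$_" }
           grep { $bits[$_] } 0 .. $#bits;
}

print "$_\n" for decode_bits( \@device1_status, '10011' );
# prints Enabled, Voltage_level_1_on and Voltage_level_2_on, one per line
```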

There are a number of possible devices, and each device has a different set of information specific to it. So sometimes I've been combining the labels into a hash of arrays:

my %status = (
    'dev1' => [ qw( enabled crit_fault warning_fault ) ],                 # . . .
    'dev2' => [ qw( power_on OV_condition UV_condition phase_fault ) ],   # . . .
);
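The same decoding keyed by device name, as a sketch: `decode_device` is a hypothetical helper, and the label lists are the truncated ones from the post. A hash is filled from a list in parentheses; curly braces on the right-hand side would build a single hash reference and trigger a "Reference found where even-sized list expected" warning.

```perl
use strict;
use warnings;

# Hash of arrays keyed by device name (label lists truncated from the post).
# Parentheses supply a list of key/value pairs; braces would not.
my %status = (
    dev1 => [ qw( enabled crit_fault warning_fault ) ],
    dev2 => [ qw( power_on OV_condition UV_condition phase_fault ) ],
);

# Hypothetical helper: decode a bit string for a named device.
# Set bits beyond the end of that device's label list are skipped.
sub decode_device {
    my ( $device, $bit_string ) = @_;
    my $labels = $status{$device}
        or die "No label table for device '$device'\n";
    my @bits = split //, $bit_string;
    return map  { $labels->[$_] }
           grep { $bits[$_] && $_ <= $#$labels } 0 .. $#bits;
}

print join( ', ', decode_device( 'dev2', '0110' ) ), "\n";
# prints "OV_condition, UV_condition"
```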

In any case, things have become very unwieldy in my source file. I have hundreds of lines taken up with these lists, and they will eventually run to thousands. I know there must be a better way to organize this. I can think of some possibilities to separate the data and the code; I'm sure there are others:

  • Stick all the big data structures in a separate Perl source file and do a require 'filename' to suck it in. Something like header file inclusion in C. Coming from a C background this is my first inclination.
  • Put the labels in a flat text file and read them into an array when I need to do the translation. This will mean I will need separate text files for each device type (15 to 20). But it's simple and allows me to deal with the labels as a completely separate entity.
  • Stick these structures into a separate module and use an access method to pull the info out as needed. My first feeling is that this is overkill for what I need, but it may be the most flexible approach; for example, it will allow other developers to use these mappings without reinventing the process. I don't really see that as a likely possibility now, but one never knows. It's hard to know if the extra effort in formulating the separate module(s) will be worth it.
  • Throw the data into a database. Not practical due to portability and installation issues.
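The first option can be sketched with `do FILE`, which compiles a file and returns the value of its last statement. The data file name is hypothetical, and the demo writes it to a temporary file only so the sketch runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Demo only: write a stand-in for a hypothetical hand-maintained data file
# (say, device_labels.pl). Its last statement is the value do() returns.
my ( $fh, $data_file ) = tempfile( SUFFIX => '.pl', UNLINK => 1 );
print {$fh} <<'END_DATA';
my %status = (
    dev1 => [ qw( enabled crit_fault warning_fault ) ],
    dev2 => [ qw( power_on OV_condition UV_condition phase_fault ) ],
);
\%status;
END_DATA
close $fh or die "Cannot close $data_file: $!\n";

# do FILE compiles and runs the file, returning its last value.
# $@ holds a compile error; an undef return without $@ usually means
# the file could not be read, and $! says why.
my $status = do $data_file;
die "Cannot parse $data_file: $@" if $@;
die "Cannot read $data_file: $!\n" unless defined $status;

print "dev2 bit 2 is $status->{dev2}[2]\n";   # prints "dev2 bit 2 is UV_condition"
```

Unlike `require`, `do` re-reads the file on every call and does not die on failure, which is why both `$@` and the return value are checked. Since Perl 5.26 the current directory is no longer in `@INC`, so a relative name should be written as `./device_labels.pl`.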

I'm curious as to how those folks with more experience in production level Perl apps would handle this.

Thanks,
Ken

"This bounty hunter is my kind of scum: Fearless and inventive." --J.T. Hutt

Re: Best Practice for Lots of Static Data
by Firefly258 (Beadle) on Dec 01, 2006 at 02:55 UTC
    I prefer to use Storable and place big data structures into a Storable file. It's a bit like do/require FILE, but easier and faster. I suppose the main advantage, though, is its ability to compact data by freezing it, either for later use or for sharing data in inter-process communication.
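A minimal sketch of the Storable approach, using the core Storable module's `nstore` and `retrieve`. The file name `device_labels.sto` is hypothetical, and a temporary directory keeps the demo self-contained.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;

# Label tables as in the original post (truncated).
my %status = (
    dev1 => [ qw( enabled crit_fault warning_fault ) ],
    dev2 => [ qw( power_on OV_condition UV_condition phase_fault ) ],
);

# Hypothetical file name inside a throwaway directory.
my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'device_labels.sto' );

# nstore writes in network byte order, so the file can be read on other
# architectures; plain store() uses the native byte order.
nstore( \%status, $file ) or die "Cannot store to $file\n";

# Later, possibly in another program: retrieve returns a reference to a copy.
my $loaded = retrieve($file);
print "$loaded->{dev1}[1]\n";    # prints "crit_fault"
```

The stored file is binary, so the label lists still need a human-editable source from which the file is regenerated.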

    There's also the Perl Data Language (PDL), which lets one store data efficiently and, in the process, makes these operations faster. It is not readily intuitive and has a steep learning curve, but it is definitely worth considering.

    update: removed question about pickle files.


    perl -e '$,=$",$_=(split/\W/,$^X)[y[eval]]]+--$_],print+just,another,split,hack'er
Re: Best Practice for Lots of Static Data
by Util (Priest) on Dec 01, 2006 at 06:43 UTC

    Given that you are trying to improve source-code management, as opposed to speed or configurability, I would move the static code into a proper module. You say that it is unlikely that other developers will need your code, but I would be surprised if such a large amount of code went into only one of *your* programs. Surprise aside, your code will be more testable if it lives in a separate module that your tests can load with 'use'.

    And yes, I said 'code', not 'data'. Your technique of assigning ordered fieldnames to positional data is excellent, but since those fieldnames have no independent use outside your code, I would treat them as code (to be stored in modules) instead of data (to be stored in a database or flat files).

    Your first option, 'do/require', is not very different from a module with a bad interface, so I would skip the 'do' stage and go straight to a proper module. As long as you remain in single program/programmer mode, you can freely evolve the interface, piecemeal, from bad to great. See Perl Best Practices for great API advice.

    As a proof of concept, I built a simple module implementing a basic (and bad) interface. I used h2xs because it is core Perl, but would use ModuleMaker or Module::Starter in real life.

    $ h2xs -A -B -C -X -n SirBones::PerlMonks::Static_Data
    $ cd SirBones-PerlMonks-Static_Data
    $ edit lib/SirBones/PerlMonks/Static_Data.pm
    $ edit t/SirBones-PerlMonks-Static_Data.t
    $ prove -Ilib t/SirBones-PerlMonks-Static_Data.t

    In the .t file, I changed the planned test count from 'tests => 1' to 'tests => 2', and added:

    use SirBones::PerlMonks::Static_Data 'status_data';
    is( status_data('dev2')->[2], 'UV_condition', '%status dev2' );

    In the .pm file, I changed the EXPORT_OK line to:

    our @EXPORT_OK = ( 'status_data' );
    and added this just before the final '1;':
    our %status = (
        'dev1' => [ qw( enabled crit_fault warning_fault ) ],
        'dev2' => [ qw( power_on OV_condition UV_condition phase_fault ) ],
    );

    sub status_data {
        my ( $device ) = @_;
        return $status{$device};
    }

    All (OK, both) tests passed.
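One way the interface might evolve from "bad" toward "great", shown as a hypothetical revision of the same module: it keeps the table in a file-scoped lexical, returns a copy of the labels so callers cannot modify the shared table, croaks on an unknown device, and lists the known devices. `status_labels` and `known_devices` are illustrative names, not part of the module above. The package is inlined with `package main` so the sketch runs as one file; in the distribution it would live in lib/SirBones/PerlMonks/Static_Data.pm.

```perl
use strict;
use warnings;

package SirBones::PerlMonks::Static_Data;
use Carp qw(croak);
use Exporter qw(import);
our @EXPORT_OK = qw( status_labels known_devices );

# File-scoped lexical: code outside this file cannot reach the table directly.
my %status = (
    dev1 => [ qw( enabled crit_fault warning_fault ) ],
    dev2 => [ qw( power_on OV_condition UV_condition phase_fault ) ],
);

# Return a copy of the labels for one device; croak on unknown devices.
sub status_labels {
    my ($device) = @_;
    croak "Unknown device '$device'" unless exists $status{$device};
    return @{ $status{$device} };
}

# List the device names that have label tables.
sub known_devices { return sort keys %status }

package main;
# In one file the package is already compiled, so import by hand
# instead of writing "use SirBones::PerlMonks::Static_Data ...".
SirBones::PerlMonks::Static_Data->import(qw( status_labels known_devices ));

my @labels = status_labels('dev2');
print "$labels[2]\n";                         # prints "UV_condition"
print join( ' ', known_devices() ), "\n";     # prints "dev1 dev2"
```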

Re: Best Practice for Lots of Static Data
by bobf (Monsignor) on Dec 01, 2006 at 05:49 UTC

    I'd suggest going with some combination of the 2nd and 3rd ideas:

    • Put the labels in a flat text file and read them into an array when I need to do the translation. This will mean I will need separate text files for each device type (15 to 20).
    • Stick these structures into a separate module and use an access method to pull the info out as needed.

    I would probably keep the data in separate files so it is easier to maintain. 15-20 files isn't that bad. I'd also favor a flat text format over a database or a persisted data structure format (like Storable), since flat files are human-readable and can easily be read (with minimal dependencies) by any other application that needs the data.

    Writing accessors for each file/device could certainly streamline the interface. You could even create a (single) class that loaded the correct data as objects are instantiated (like a simplified factory design, if my OO terminology is correct). If the data really is as simple as you describe, though, objects may be overkill. Simply returning an arrayref may be just as easy.
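A minimal sketch of the flat-file-plus-accessor combination, in the "simply returning an arrayref" form rather than a full class. It assumes one hypothetical file per device named `<device>.txt` that holds one label per line in bit order; `labels_for` is an illustrative name. Each file is read once and the array reference is cached.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my %cache;    # label file path => arrayref of labels

# Hypothetical accessor: read "$dir/$device.txt" once, one label per line.
sub labels_for {
    my ( $dir, $device ) = @_;
    my $file = File::Spec->catfile( $dir, "$device.txt" );
    return $cache{$file} ||= do {
        open my $in, '<', $file or die "Cannot open $file: $!\n";
        chomp( my @labels = <$in> );    # line N holds the label for bit N-1
        close $in;
        \@labels;
    };
}

# Demo: create one label file in a temporary directory.
my $dir = tempdir( CLEANUP => 1 );
open my $out, '>', File::Spec->catfile( $dir, 'dev2.txt' ) or die "$!\n";
print {$out} "$_\n" for qw( power_on OV_condition UV_condition phase_fault );
close $out or die "$!\n";

my $labels = labels_for( $dir, 'dev2' );
print scalar(@$labels), " labels; bit 3 is $labels->[3]\n";
# prints "4 labels; bit 3 is phase_fault"
```

Because the bit position is the line number, a reserved or unused bit needs a placeholder line (for example `reserved`) rather than being left out.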

    More experienced monks will likely suggest better ways to do this. :-)

Re: Best Practice for Lots of Static Data
by wjw (Priest) on Dec 01, 2006 at 04:23 UTC
    I am wondering whether storing your status descriptions in an XML file and then using XML::XPath might work for you. I have found this to be a fairly handy way to store and retrieve properties that refer to 'devices' in a certain state. The XML is fairly easy to update, and XPath makes it easy to get what you want out of your XML.

    Just a thought.. :-)

    ...the majority is always wrong, and always the last to know about it...

      No, you got that wrong. He needs to leverage the power of XML. If you talk about XML and leave out the word leverage, how on earth can I win Perlmonks bingo?

      I'm just bitter because I thought I was leveraging the power of XML today, but it turned out I was using XML::Simple instead :)

Re: Best Practice for Lots of Static Data
by ides (Deacon) on Dec 01, 2006 at 15:56 UTC

    While I can see that these lists won't change often, I see this more as a configuration problem than a data storage problem. So I would use Config::General to store and lay out this data.

    It has the nice added benefit of separating out your "logic" and "data". So in the future, if you find yourself with a similar problem, but with different "static" data you don't have to make any code changes.

    Frank Wiles <frank@revsys.com>
    www.revsys.com

Node Type: perlquestion [id://587103]
Approved by davidj