Esteemed Monks,

Some months ago, Dominus posted a Meditation here about material he had released on "Lightweight Database Techniques" in Perl (i.e. using flat text files as simple databases).

As he explains, these techniques apply in the following situation: "When you don't have enough data to be bothered using a high-performance database, or when your data is simple enough that you don't want to bother with a relational database, you stick it in a flat file and hack up some file code to read it."

One of these techniques involves his Tie::File module, which lets you access and manipulate a text file as if it were a Perl array. One limitation of Tie::File is that each element of the resulting array corresponds to exactly one line of the tied file. You can use the "recsep" parameter to define what separates records, but this just changes $/ locally, so a record is still whatever lies between two occurrences of a fixed separator string.
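To make the "recsep" point concrete, here is a minimal sketch using only Tie::File and File::Temp from the core distribution. The helper name records_with_recsep and the ';' separated sample content are made up for illustration; the point is only that "recsep" swaps one fixed separator string for another.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical helper: write $content to a temporary file, tie it with
# the given record separator, and return the records as a plain list.
sub records_with_recsep {
    my ($content, $recsep) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} $content;
    close $fh or die "close: $!";

    tie my @records, 'Tie::File', $path, recsep => $recsep
        or die "Cannot tie $path: $!";
    my @copy = @records;    # copy the records out before untying
    untie @records;
    return @copy;
}

# Records separated by ';' instead of newlines.
print join('|', records_with_recsep('a;b;c;', ';')), "\n";
```

A separator like this cannot express "all consecutive lines that start with the same name", which is the kind of record discussed next.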

While working with this module I thought that it would be nice to be able to define "records" in a more complex way than just doing the 1 line <=> 1 record assignment.

For example, consider the following simple piece of data:

    Peter 3
    Peter 15
    Peter 5
    John 1
    John 7
    Mike 4

If you access a text file containing this data with Tie::File, you will find that the first record is "Peter 3", the second is "Peter 15", and so on. That may be what you want, but in many cases it would be more useful to get all "Peter" entries in the first record, all "John" entries in the second, and so on.
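The default behaviour described above can be checked with a short script. This is a sketch that writes the sample data to a temporary file (the helper name sample_lines is made up for illustration) and ties it with plain Tie::File.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical helper: write the sample data to a temporary file, tie it
# with plain Tie::File, and return the resulting records as a list.
sub sample_lines {
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} "Peter 3\nPeter 15\nPeter 5\nJohn 1\nJohn 7\nMike 4\n";
    close $fh or die "close: $!";

    tie my @lines, 'Tie::File', $path or die "Cannot tie $path: $!";
    my @copy = @lines;      # one element per line of the file
    untie @lines;
    return @copy;
}

my @lines = sample_lines();
print scalar(@lines), " records\n";
print "first:  $lines[0]\n";
print "second: $lines[1]\n";
```

With the sample data this reports six records, one per line, with the trailing newline removed from each element.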

With this in mind I wrote a small and simple module: Tie::File::AnyData, which adds this functionality to Tie::File. This module accepts the optional extra parameter "code" to its constructor. This must be a code reference (an anonymous subroutine) that must be able to read one record per call from the tied file.
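As an illustration of what "one record per call" can mean, here is a standalone sketch of a reader that groups consecutive lines sharing the same first field. It does not use Tie::File::AnyData at all. The names read_group and first_field are hypothetical, and it is an assumption of this sketch that the reader is handed an open, seekable filehandle; check the module's documentation for the interface its "code" parameter really expects.

```perl
use strict;
use warnings;

# Return the first whitespace-separated field of a line ('' if none).
sub first_field {
    my ($line) = @_;
    my ($field) = split ' ', $line;
    return defined $field ? $field : '';
}

# Hypothetical record reader: returns the next run of consecutive lines
# that share the same first field, or undef at end of file.
# ASSUMPTION: $fh is an open, seekable filehandle.
sub read_group {
    my ($fh) = @_;
    my $record = <$fh>;
    return unless defined $record;
    my $key = first_field($record);
    while (1) {
        my $pos  = tell $fh;        # remember where the next line starts
        my $line = <$fh>;
        last unless defined $line;
        if (first_field($line) ne $key) {
            # The line belongs to the next record: rewind to its start.
            seek $fh, $pos, 0 or die "seek: $!";
            last;
        }
        $record .= $line;
    }
    return $record;
}

# Demonstration on an in-memory file holding the sample data.
my $data = "Peter 3\nPeter 15\nPeter 5\nJohn 1\nJohn 7\nMike 4\n";
open my $fh, '<', \$data or die "open: $!";
while (defined(my $rec = read_group($fh))) {
    print "--- record ---\n$rec";
}
close $fh;
```

Rewinding with seek keeps the file position at the start of the next record after each call, so successive calls never skip or repeat a line.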

The source code and the documentation of this module can be obtained from http://lotka.uv.es/scriptome/data/attic/wiki/Tie-File-AnyData-0.01.tar.gz

One example of use could be:

    use Tie::File::AnyData;

    my $coderef = sub {
        ## Code to retrieve one by one the records from a file (one record per call)
    };
    tie my @data, 'Tie::File::AnyData', $file, code => $coderef;
    ## Use the tied array
    untie @data;

The module works by redefining the function "_read_record" in Tie::File (the function that reads records from the file). The rest of Tie::File's functionality remains intact, which means that if you don't provide the "code" parameter, you get the same results as with Tie::File:

    use Tie::File::AnyData;

    tie my @data, 'Tie::File::AnyData', $file;
    ## Use the tied array as with Tie::File

Because it can be tedious to write a new anonymous subroutine to parse a file's records each time you use the module, you can subclass it with predefined formats. For example, Tie::File::AnyData::CSV (which can be obtained from http://lotka.uv.es/scriptome/data/attic/wiki/Tie-File-AnyData-CSV-0.01.tar.gz) can correctly parse the kind of data given in the example above:

    Peter 3
    Peter 15
    Peter 5
    John 1
    John 7
    Mike 4

    use Tie::File::AnyData::CSV;

    tie my @arr, 'Tie::File::AnyData::CSV', $file or die;
    print "$arr[0]\n";

Prints:

    Peter 3
    Peter 15
    Peter 5

Another example is Tie::File::AnyData::Bio::Fasta (which can be obtained from http://lotka.uv.es/scriptome/data/attic/wiki/Tie-File-AnyData-Bio-Fasta-0.01.tar.gz). This module subclasses Tie::File::AnyData and reads a FASTA file as a Perl array in which each element corresponds to one FASTA sequence. One example of use could be:

    use Tie::File::AnyData::Bio::Fasta;
    use List::Util qw/shuffle/;

    ## $file holds the path to the FASTA file
    tie my @fastaArray, 'Tie::File::AnyData::Bio::Fasta', $file or die $!;

    # Substitute the 10th sequence:
    $fastaArray[9] = $newsequence;

    # Get 10 random sequences:
    my @out = (shuffle @fastaArray)[0..9];
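For readers unfamiliar with FASTA, here is a rough standalone sketch of what reading one sequence per call could look like. It is not the code from Tie::File::AnyData::Bio::Fasta; the name read_fasta_record and the two sample sequences are hypothetical. It relies on setting $/ to "\n>", so each read stops just before the next header line.

```perl
use strict;
use warnings;

# Hypothetical reader: returns the next FASTA record (header line plus
# sequence lines) from $fh, or undef at end of file.
sub read_fasta_record {
    my ($fh) = @_;
    local $/ = "\n>";       # a record ends where the next header begins
    my $rec = <$fh>;
    return unless defined $rec;
    chomp $rec;                             # drop the trailing "\n>"
    $rec = ">$rec" unless $rec =~ /^>/;     # restore the '>' eaten by the previous read
    $rec .= "\n"   unless $rec =~ /\n\z/;   # keep records newline-terminated
    return $rec;
}

# Demonstration on an in-memory file with two made-up sequences.
my $fasta = ">seq1\nACGT\nGG\n>seq2\nTTTT\n";
open my $fh, '<', \$fasta or die "open: $!";
my @records;
while (defined(my $rec = read_fasta_record($fh))) {
    push @records, $rec;
}
close $fh;
print scalar(@records), " sequences read\n";
```

Note that each read also consumes the '>' of the following header, so this version only suits plain sequential reading; a reader that must leave the file position exactly at the start of the next record would need a look-ahead with tell and seek instead.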

citromatik


In reply to RFC: Tie::File::AnyData Lightweight Databases in Perl by citromatik
