Item Description: A DBI driver for files and data structures
Review Synopsis:
Description
DBD::RAM
is Jeff Zucker's driver for the DBI
that allows you to import files into memory and treat them as relational tables,
queried with SQL.
Several tables can be created this way and SQL joins can be simulated
through loops.
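Such a simulated join could look something like the sketch below: loop over one table and do a keyed lookup in the other for each row. The table and column names (orders, customers, cust_id, etc.) are made up for the example, and it assumes both tables have already been imported:

```perl
# Sketch: simulate "orders JOIN customers" with a loop, since
# DBD::RAM itself does not do joins. Table and column names
# are hypothetical.
use DBI;

my $dbh = DBI->connect( "DBI:RAM:", undef, undef, {RaiseError => 1} );

# ... import the 'orders' and 'customers' tables here ...

my $orders = $dbh->prepare( "SELECT cust_id, item FROM orders" );
my $cust   = $dbh->prepare( "SELECT name FROM customers WHERE cust_id = ?" );

$orders->execute();
while( my( $cust_id, $item ) = $orders->fetchrow_array() )
{ # the "inner loop": one keyed lookup per outer row
  $cust->execute( $cust_id );
  my( $name ) = $cust->fetchrow_array();
  print "$name bought $item\n";
}
```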
Changes to the data can be reflected in the original file (if the table is
created using the catalog function) or the table can
be exported to one of the formats supported by DBD::RAM.
DBD::RAM can process the following formats:
- CSV (Comma Separated Values),
- Fixed-width records, using pack to define the format,
- Perl structures (arrays of arrayrefs or hashrefs), so you can create
the data in the Perl script and then use it as a relational table,
- XML,
- User-defined, which allows the user to define a subroutine that will
parse the input and return an array with the various fields,
- Other DBI databases, so you can load a table in memory, close the
connection and then process the data,
- MP3 headers from a group of directories.
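The Perl-structure format is handy for quick tests. A minimal sketch, modeled on the XML import example below: the data_type 'ARRAY' and passing the arrayref as data_source are my assumptions about the option names, so check the DBD::RAM documentation before relying on them:

```perl
# Sketch: turn an in-memory Perl structure (array of arrayrefs)
# into a relational table. Option names data_type => 'ARRAY' and
# data_source => $arrayref are assumptions; see the DBD::RAM docs.
use DBI;

my $dbh = DBI->connect( "DBI:RAM:", undef, undef, {RaiseError => 1} );

my $projects = [ [ 1, 'dbd-ram', 'a DBI driver for in-memory tables' ],
                 [ 2, 'anydata', 'its planned successor'             ],
               ];

$dbh->func(
    { table_name  => 'projects',
      data_type   => 'ARRAY',
      col_names   => 'pid,name,description',
      data_source => $projects,
    }, 'import' );

my $sth = $dbh->prepare( "SELECT name FROM projects WHERE pid = 1" );
$sth->execute();
```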
The data in all formats can be input either locally (from strings,
files, or pipes) or remotely through LWP.
Why use DBD::RAM
- you are dealing with existing data whose format you have no control over, but
you still want to access it through the DBI interface, using SQL,
- you want to use SQL without installing a relational DB,
- you want to prototype an application without a DB but think
that you might add one down the line,
- you want to convert data from one DBD::RAM supported format to
another,
- you want to use an XML file as a table (exporting it back might
not work for you though)
Why NOT use DBD::RAM
Essentially, you will not use DBD::RAM if you need a real database:
- you process huge amounts of data,
- the data is already in an existing DB.
Example
The obligatory XML example:
# connect to the DB
my $dbh= DBI->connect( "DBI:RAM:", undef, undef, {RaiseError => 1} );
# create the table
$dbh->func(
{ table_name => 'projects',
data_type => 'XML',
record_tag => 'projects project',
col_names => 'pid,name,description',
file_source => 'project.xml',
}, 'import');
# prepare the SQL statement
my $sth= $dbh->prepare( "SELECT * FROM projects");
# execute the statement
$sth->execute();
# output the result of the query
while( my $hashref= $sth->fetchrow_hashref())
{ foreach my $field ( keys %$hashref)
{ print "$field: $hashref->{$field}\t"; }
print "\n";
}
# export the table back as XML
$dbh->func(
{ data_type => 'XML',
data_target => "new_projects.xml",
data_source => "select * from projects",
record_tag => 'projects project',
col_names => 'pid,name,description',
}, 'export' );
Note on XML import: make sure you include the whole hierarchy
of tags in the record_tag argument, from the document
root to the record tag itself.
Personal Notes
I really like DBD::RAM. It allows treating lots of structured data as
a relational table, including XML (mostly for extracting data from an
XML file though). It also allows quick prototyping without having to
go through the pain of yet-another-mysql install.
The good
DBD::RAM is really flexible. For example in the CSV format the field and record separators
can actually be redefined so it's more like ASV (Anything Separated Values).
For most formats field definitions can also be extracted from the first line of the file
instead of being hard coded in the script.
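A sketch of both features together, using a pipe-delimited file: the field_sep option name and the idea of asking for the column names from the first line are assumptions on my part about the exact spelling of the options, so double-check them against the DBD::RAM documentation:

```perl
# Sketch: import a pipe-delimited ("Anything Separated Values") file,
# taking the column names from its first line. The field_sep and
# col_names settings are assumed option names; check the DBD::RAM docs.
use DBI;

my $dbh = DBI->connect( "DBI:RAM:", undef, undef, {RaiseError => 1} );

$dbh->func(
    { table_name  => 'phones',
      data_type   => 'CSV',
      field_sep   => '|',            # redefine the field separator
      col_names   => 'first_line',   # read headers from the file itself
      file_source => 'phones.psv',
    }, 'import' );
```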
The XML import allows a good deal of customization of the data you want to
extract from the XML file, including having records inherit attributes from
their parents. Encoding conversions, and especially latin-1 output, are
also handled.
The documentation is pretty good: it is comprehensive and includes lots of
examples that can be cut-and-pasted.
The not-so-good
The initial debugging of an application can be quite a pain though, as error
messages on import are not too helpful: they tell you that something
is wrong but not exactly where the error is.
Some XML data is difficult to extract (for example if several parent tags
have attributes with the same name you can't fold them properly) but this
can be fixed by a simple pre-processing of the data.
The syntax of the XML options is slightly confusing (a space-separated list
of tags for the record_tag argument, but a comma-separated list, with no spaces, for
the col_names argument).
This should improve with future versions of the module.
Related Modules
For XML processing you might want to have a look at
XML::RAX
or at DBIx::XML_RDB (to export XML from a RDB).
DBD::RAM should be replaced by AnyData and DBD::AnyData any time now, see
this
note for the planned architecture.
Update: after the author reviewed the... review, I
have fixed a couple of (embarrassing) mistakes: DBD::RAM does
file locking (through flock), but it does not do
joins. I have fixed the review accordingly (plus
a couple of other points Jeff mentioned).
Update (2): AnyData and DBD::AnyData are out and officially replace DBD::RAM.