Item Description: A DBI driver for files and data structures
Review Synopsis:
Description
DBD::RAM
is Jeff Zucker's driver for the DBI
that allows you to import files into memory and treat them as relational tables,
queried with SQL.
Several tables can be created this way and SQL joins can be simulated
through loops.
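Such a simulated join could look something like the sketch below: loop over one table and do a keyed lookup in the other for each row. The table and column names (orders, customers, cust_id, etc.) are made up for the example, and it assumes both tables have already been imported:

```perl
# Sketch: simulate "orders JOIN customers" with a loop, since
# DBD::RAM itself does not do joins. Table and column names
# are hypothetical.
use DBI;

my $dbh = DBI->connect( "DBI:RAM:", undef, undef, {RaiseError => 1} );

# ... import the 'orders' and 'customers' tables here ...

my $orders = $dbh->prepare( "SELECT cust_id, item FROM orders" );
my $cust   = $dbh->prepare( "SELECT name FROM customers WHERE cust_id = ?" );

$orders->execute();
while( my( $cust_id, $item ) = $orders->fetchrow_array() )
{ # the "inner loop": one keyed lookup per outer row
  $cust->execute( $cust_id );
  my( $name ) = $cust->fetchrow_array();
  print "$name bought $item\n";
}
```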
Changes to the data can be reflected in the original file (if the table is
created using the catalog function) or the table can
be exported to one of the formats supported by DBD::RAM.
DBD::RAM can process the following formats:
- CSV (Comma Separated Values),
- Fixed-width records, using pack to define the format,
- Perl structures (arrays of arrayrefs or hashrefs), so you can create
the data in the Perl script and then use it as a relational table,
- XML,
- User-defined, which allows the user to define a subroutine that will
parse the input and return an array with the various fields,
- Other DBI databases, so you can load a table in memory, close the
connection and then process the data,
- MP3 headers from a group of directories.
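The Perl-structure format is handy for quick tests. A minimal sketch, modeled on the XML import example below: the data_type 'ARRAY' and passing the arrayref as data_source are my assumptions about the option names, so check the DBD::RAM documentation before relying on them:

```perl
# Sketch: turn an in-memory Perl structure (array of arrayrefs)
# into a relational table. Option names data_type => 'ARRAY' and
# data_source => $arrayref are assumptions; see the DBD::RAM docs.
use DBI;

my $dbh = DBI->connect( "DBI:RAM:", undef, undef, {RaiseError => 1} );

my $projects = [ [ 1, 'dbd-ram', 'a DBI driver for in-memory tables' ],
                 [ 2, 'anydata', 'its planned successor'             ],
               ];

$dbh->func(
    { table_name  => 'projects',
      data_type   => 'ARRAY',
      col_names   => 'pid,name,description',
      data_source => $projects,
    }, 'import' );

my $sth = $dbh->prepare( "SELECT name FROM projects WHERE pid = 1" );
$sth->execute();
```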
The data in all formats can be input either locally (from strings,
files, or pipes) or remotely through LWP.
Why use DBD::RAM
- you are dealing with existing data whose format you have no control over, but
you still want to access it through the DBI interface, using SQL,
- you want to use SQL without installing a relational DB,
- you want to prototype an application without a DB but think
that you might add one down the line,
- you want to convert data from one DBD::RAM supported format to
another,
- you want to use an XML file as a table (exporting it back might
not work for you though)
Why NOT use DBD::RAM
Essentially, you will not use DBD::RAM if you need a real database:
- you process huge amounts of data,
- the data is already in an existing DB.
Example
The obligatory XML example:
# connect to the DB
my $dbh= DBI->connect( "DBI:RAM:", undef, undef, {RaiseError => 1} );
# create the table
$dbh->func(
{ table_name => 'projects',
data_type => 'XML',
record_tag => 'projects project',
col_names => 'pid,name,description',
file_source => 'project.xml',
}, 'import');
# prepare the SQL statement
my $sth= $dbh->prepare( "SELECT * FROM projects");
# execute the statement
$sth->execute();
# output the result of the query
while( my $hashref= $sth->fetchrow_hashref())
{ foreach my $field ( keys %$hashref)
{ print "$field: $hashref->{$field}\t"; }
print "\n";
}
# export the table back as XML
$dbh->func(
{ data_type => 'XML',
data_target => "new_projects.xml",
data_source => "select * from projects",
record_tag => 'projects project',
col_names => 'pid,name,description',
}, 'export' );
Note on XML import: make sure you include the whole hierarchy
of tags in the record_tag argument, from the document
root to the record tag itself.
Personal Notes
I really like DBD::RAM. It allows treating lots of structured data as
a relational table, including XML (mostly for extracting data from an
XML file though). It also allows quick prototyping without having to
go through the pain of yet-another-mysql install.
The good
DBD::RAM is really flexible. For example in the CSV format the field and record separators
can actually be redefined so it's more like ASV (Anything Separated Values).
For most formats field definitions can also be extracted from the first line of the file
instead of being hard coded in the script.
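A sketch of both features together, using a pipe-delimited file: the field_sep option name and the idea of asking for the column names from the first line are assumptions on my part about the exact spelling of the options, so double-check them against the DBD::RAM documentation:

```perl
# Sketch: import a pipe-delimited ("Anything Separated Values") file,
# taking the column names from its first line. The field_sep and
# col_names settings are assumed option names; check the DBD::RAM docs.
use DBI;

my $dbh = DBI->connect( "DBI:RAM:", undef, undef, {RaiseError => 1} );

$dbh->func(
    { table_name  => 'phones',
      data_type   => 'CSV',
      field_sep   => '|',            # redefine the field separator
      col_names   => 'first_line',   # read headers from the file itself
      file_source => 'phones.psv',
    }, 'import' );
```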
The XML import allows a good deal of customization of the data you want to
extract from the XML file, including having records inherit attributes from
their parents. Encoding conversions, and especially latin-1 output, are
also handled.
The documentation is pretty good: it is comprehensive and includes lots of
examples that can be cut-and-pasted.
The not-so-good
The initial debugging of an application can be quite a pain though, as error
messages on import are not too helpful: they tell you that something
is wrong but not exactly where the error is.
Some XML data is difficult to extract (for example if several parent tags
have attributes with the same name you can't fold them properly) but this
can be fixed by a simple pre-processing of the data.
The syntax of the XML options is slightly confusing (a space-separated list
of tags for the record_tag argument, but a comma-separated list, with no spaces, for
the col_names argument).
This should improve with future versions of the module.
Related Modules
For XML processing you might want to have a look at
XML::RAX
or at DBIx::XML_RDB (to export XML from a RDB).
DBD::RAM should be replaced by AnyData and DBD::AnyData any time now, see
this
note for the planned architecture.
Update: after the author reviewed the... review, I
have fixed a couple of (embarrassing) mistakes: DBD::RAM does
file locking (through flock), but it does not do
joins. I have fixed the review accordingly (plus
a couple of other points Jeff mentioned).
Update (2): AnyData and DBD::AnyData are out and officially replace DBD::RAM.