
Tie data hash to DBM file leaving massive cruft?

by punch_card_don (Curate)
on Mar 05, 2013 at 20:26 UTC ( #1021900=perlquestion )
punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:

Motional Monks,

For depressing-to-explain reasons, we're using an old DBM file (data_dbm_file) as a backup to a small SQL database (data_db) of name/value pairs. Each time data_db is updated, the following subroutine is run:

    # low-precedence 'or' so that die applies to tie's return value
    tie %data_hash, "SDBM_File", "data_dbm_file", O_RDWR|O_CREAT, 0666
        or die "Cannot open file 'data_dbm_file': $!\n";

    # empty the existing DBM file before refilling it
    %data_hash = ();

    $sqlSelect = "SELECT field1, field2 FROM $data_db";
    $sth = $dbh->prepare($sqlSelect) || die "Cannot prepare: " . $dbh->errstr();
    $sth->execute() or die "Cannot execute: " . $sth->errstr();

    while (@data = $sth->fetchrow_array()) {
        my $data_to_store = crypt($data[1], $private_key);
        $data_hash{$data[0]} = $data_to_store;
    }

    untie %data_hash;
    $sth->finish();

And this is re-run every time there's a change to the data_db database. So, essentially, the DBM file gets completely re-written every time there's any change to the data_db database.

It seems to work just fine - the DBM file is checked on occasion and always mirrors the sql database.

BUT - there's always a but - if I ftp a copy of the data_dbm_file from the server onto my (Windows) PC, and open it in Notepad, among the long list of data I find this sort of thing:

bogA4dxaWfeMo:Namen_ bogA4dxaWfeMo:Namen_ bogA4dxaWfeMo:Namen bogA4dxaWfeMo:Name bogA4dxaWfeMo:Namen_0 bogA4dxaWfeMo:Namen_0 bogA4dxaWfeMo:Namen bogA4dxaYbogA4dxaWfeMo:Name bogA4dxaWfeMo:Name bogA4dxaWfeMo:Namen acbogA4dxaWfeMo:Namen abogA4dxaWfeMo:Namen
Multiple entries with slight variations of old name/value pairs that were thought to have been deleted long ago. There can be dozens of them for a single name. But they don't appear in a listing of the contents of the DBM file if we do:

    tie %h, "DB_File", "data", O_RDWR|O_CREAT, 0666, $DB_HASH
        or die "Cannot open file 'data': $!\n";

    print "Content-type:text/html\n\n";
    foreach my $key (sort keys %h) {
        print "<br>$key -> $h{$key}\n";
    }

This outputs just the expected name/value pairs.

What is all this extra cruft? Is it due to something in our rewriting subroutine? Or is it a typical result in a DBM file? Or something else?


Time flies like an arrow. Fruit flies like a banana.

Re: Tie data hash to DBM file leaving massive cruft?
by moritz (Cardinal) on Mar 05, 2013 at 21:00 UTC

    I haven't been able to find a specification of the dbm file format, so I can only guess. It could be either an artifact of storage format (some sort of tree), or it could be that instead of deleting entries, they are simply marked as deleted.

    One way to find out is to compare a copy of the file you produce now with another one you get by inserting all the key/value pairs into an empty file.
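
A minimal sketch of that comparison, assuming both files are SDBM files opened by base name as in the original post. The helper name dbm_report() is hypothetical; it only reports the number of live entries and the on-disk size of the .pag file, which is where SDBM_File keeps the key/value data.

```perl
use strict;
use warnings;
use Fcntl;       # O_RDONLY
use SDBM_File;

# Summarise an SDBM file: how many live entries it holds and how many
# bytes its .pag file occupies on disk. dbm_report() is a hypothetical
# helper name for this sketch, not part of SDBM_File.
sub dbm_report {
    my ($base) = @_;

    tie my %h, 'SDBM_File', $base, O_RDONLY, 0666
        or die "Cannot open file '$base': $!\n";
    my @keys    = keys %h;
    my $entries = @keys;
    untie %h;

    # SDBM_File stores the key/value data in "$base.pag".
    my $pag_bytes = (stat "$base.pag")[7];

    return { entries => $entries, pag_bytes => $pag_bytes };
}
```

Calling dbm_report() on the file that is rewritten in place and on a freshly built copy gives two summaries to compare. If the entry counts match but the byte counts differ, that would point to space left behind by old entries rather than to live data.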

    In fact I guess it's more efficient anyway to simply unlink the file instead of opening it, emptying it and then refilling it.
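
A sketch of the unlink-and-rebuild approach, under the assumption that the file is an SDBM file, which SDBM_File stores as a pair of files ending in .pag and .dir. The name rebuild_dbm() and its arguments are hypothetical; in the original subroutine the pairs would come from the SQL query instead of a hash reference.

```perl
use strict;
use warnings;
use Fcntl;       # O_RDWR, O_CREAT
use SDBM_File;

# Rebuild an SDBM file from scratch from a hash ref of name/value pairs.
# rebuild_dbm() and its arguments are hypothetical names for this sketch.
sub rebuild_dbm {
    my ($base, $pairs) = @_;

    # Remove both halves of the old SDBM file so nothing is carried over.
    for my $file ("$base.pag", "$base.dir") {
        next unless -e $file;
        unlink $file or die "Cannot remove '$file': $!\n";
    }

    # O_CREAT makes tie create a brand-new, empty file pair.
    tie my %data_hash, 'SDBM_File', $base, O_RDWR|O_CREAT, 0666
        or die "Cannot open file '$base': $!\n";
    $data_hash{$_} = $pairs->{$_} for keys %$pairs;
    untie %data_hash;

    return;
}
```

Note that between the unlink and the end of the refill, anything reading the backup sees a missing or partial file. If that matters, building under a temporary name and renaming afterwards may be the safer variant.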

Re: Tie data hash to DBM file leaving massive cruft?
by Anonymous Monk on Mar 05, 2013 at 21:30 UTC

Node Type: perlquestion [id://1021900]
Approved by moritz