Dealing with corrupt db_file files

by gossamer (Sexton)
on Jan 16, 2013 at 03:03 UTC

gossamer has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm a novice Perl programmer and have written a set of functions that use DB_File to write to a hash keyed by filename, storing some of each file's contents. The files are quarantine files from amavisd-new, and the stored info includes things like the subject, spam score, etc.

I've been using these routines for quite some time, but every once in a while, searching through one of the db files causes my scripts to just hang.

Is there a known corruption problem with DB_File, perhaps caused by locking?

For example, when I use the typical routines to scan through the hash:

#!/usr/bin/perl -w
use strict;
use DB_File;
use Fcntl;                        # for the O_RDWR flag
use File::Basename qw(basename);

my @nochosts = qw();
my $qdir = '/var/www/noc.mydomain.com-80/';
push @nochosts, 'nocmail01';

foreach my $noc (@nochosts) {
    my $file = sprintf('%s/%s/%02x.db', $qdir, $noc, 170);
    print "file: $file\n";
    tie(my %hash, 'DB_File', $file, O_RDWR, 0600, $DB_HASH)
        || die "Cannot tie $file\n";
    print "tie finished\n";
    foreach my $key (keys %hash) {
        print "processing key: $key\n";
    }
    untie %hash;
}

The 170 renders as 'aa', so in this case I'm trying to read aa.db. Normally a loop iterates through all 256 db files; I've hard-coded that one bucket here for brevity, since aa.db is the file with the problem.

On occasion, the script just hangs at the foreach line, right after tying the db. I don't know how the file gets corrupted, but recreating it from the source amavisd-new quarantine entries for that bucket fixes the problem.

So, what would cause the script to hang when trying to process the foreach line?

I've tried several other ways of walking the hash, including while loops, and they all lock up at the same point.

The script that creates the hash is much more involved, so I've not posted that here for now.

Any ideas greatly appreciated.
Thanks,
Dave

Re: Dealing with corrupt db_file files
by grondilu (Friar) on Jan 16, 2013 at 06:52 UTC

    When you call keys %hash, all the keys are loaded into memory at once. That can take quite some time if your database is not small. Are you sure your program is hung dead? How long have you waited? A look at a process watcher such as top could help.

    Normally, to loop through the keys of a tied database, you should use the tied object (not the tied hash) and a cursor, via the seq method:

    my $X = tie my %hash, 'DB_File', $filename;
    my ($key, $value);
    for (my $status = $X->seq($key, $value, R_FIRST);
         $status == 0;
         $status = $X->seq($key, $value, R_NEXT))
    {
        ...
    }

    I advise you to do something like that if you really need sequential access.
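
    Fleshed out against a file opened the way you do it, that might look like the sketch below. The path is a placeholder and the error handling is only illustrative; one advantage of seq is that it hands you back a status you can check on every step.

    use strict;
    use warnings;
    use DB_File;
    use Fcntl;

    my $file = '/path/to/aa.db';    # placeholder; substitute your real path
    my $X = tie(my %hash, 'DB_File', $file, O_RDWR, 0600, $DB_HASH)
        or die "Cannot tie $file: $!\n";

    my ($key, $value) = ('', '');
    for (my $status = $X->seq($key, $value, R_FIRST);
         $status == 0;
         $status = $X->seq($key, $value, R_NEXT))
    {
        print "processing key: $key\n";
    }

    undef $X;      # drop the object reference first...
    untie %hash;   # ...so untie doesn't warn about lingering references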

      > When you call keys %hash, all the keys are loaded into memory at once. That can take quite some time if your database is not small. Are you sure your program is hung dead? How long have you waited? A look at a process watcher such as top could help.

      No, I'm sure the process is hung. I've let it sit overnight.

      It only happens once every few weeks. Rebuilding the corrupt db on the originating server fixes the problem, but the only way I find out I need to do that is by discovering my script has been running all night. Then I have to isolate which db file is causing the issue and rebuild that one file.
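
      In the meantime, I'm thinking of probing each bucket in a child process with a timeout, so the bad file shows up right away instead of after an overnight hang. A rough, untested sketch (the 30-second timeout is an arbitrary choice, and I fork rather than use alarm because a hang inside the C library may never let Perl's own signal handling run):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use DB_File;
      use Fcntl;
      use POSIX ':sys_wait_h';

      my $qdir    = '/var/www/noc.mydomain.com-80/nocmail01';
      my $timeout = 30;   # seconds; arbitrary

      for my $bucket (0 .. 255) {
          my $file = sprintf('%s/%02x.db', $qdir, $bucket);
          next unless -e $file;

          my $pid = fork();
          die "fork failed: $!\n" unless defined $pid;
          if ($pid == 0) {                       # child: do the risky scan
              tie(my %hash, 'DB_File', $file, O_RDONLY, 0600, $DB_HASH)
                  or exit 2;                     # couldn't even tie
              my @keys = keys %hash;             # hangs here if corrupt
              untie %hash;
              exit 0;
          }

          my $waited = 0;                        # parent: poll with timeout
          while (waitpid($pid, WNOHANG) == 0) {
              if (++$waited > $timeout) {
                  kill 'KILL', $pid;
                  waitpid($pid, 0);
                  print "SUSPECT: $file (scan timed out)\n";
                  last;
              }
              sleep 1;
          }
      }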

      I'll also work on implementing your seq changes. Do you think that will deal better with corrupt entries in the db file?

      Thanks,
      Dave

        I don't know if using seq will solve the issue then. It's definitely the correct way to access your database sequentially, though.

        I'd also suggest you add some lines to log the execution of your script, so that you can know at which point exactly it hangs.
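
        Even something this small would show which file and which step it was on (a sketch; adapt the messages to your script):

        use strict;
        use warnings;

        $| = 1;   # unbuffered output, so the last line before a hang isn't lost

        sub logmsg { print scalar(localtime), " - @_\n" }

        logmsg("tying aa.db");
        # ... tie as in your script ...
        logmsg("tie done, scanning keys");
        # ... key loop ...
        logmsg("scan done");

        With output unbuffered, the last line in the log is exactly the step it hung on.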

        Other than that, I don't see much else to do, especially considering it is difficult to reproduce.

Re: Dealing with corrupt db_file files
by Anonymous Monk on Jan 16, 2013 at 09:02 UTC
