BerkeleyDB blowing up on multiple, concurrent processes

by m0r0n (Initiate)
on Jun 14, 2006 at 19:11 UTC
m0r0n has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I have a program that's meant to stress the BerkeleyDB::Hash module to see how well it can perform with multiple, concurrent readers/writers. I am new to Perl _and_ BerkeleyDB, so go easy on me ;-)

Running the program below mostly yields the following errors:

db_put failed: DB_RUNRECOVERY: Fatal error, run database recovery ( / Successful return: 0)
    main::spawn_child('BerkeleyDB::Env=ARRAY(0x82e5248)','BerkeleyDB::Hash=ARRAY(0x80639b4)') called at douji.pl line 115
    main::spawn_child('BerkeleyDB::Env=ARRAY(0x82e5248)','BerkeleyDB::Hash=ARRAY(0x80639b4)') called at douji.pl line 115

One instance of the above for each child process. Sometimes I get some errors like this, too:

db_put failed: Cannot allocate memory (Cannot allocate memory / realloc: Cannot allocate memory: 4294967190)
    main::spawn_child('BerkeleyDB::Env=ARRAY(0x82e5248)','BerkeleyDB::Hash=ARRAY(0x80639b4)') called at douji.pl line 115

Running db_recover doesn't help, however; the next run fails with the same errors.

Does anyone have any ideas what I'm doing wrong? I was under the impression that using DB_INIT_CDB was enough to protect against concurrent writes, but do I need transactions instead? I tried using BerkeleyDB's transaction stuff earlier and that failed with the same error messages.
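For reference, the transactional attempt looked roughly like this (a sketch from memory, not the exact code that failed; the -Home location is just the directory next to the db file):

use strict;
use warnings;
use BerkeleyDB;

# transactional environment: lock, log, and txn subsystems instead of DB_INIT_CDB
my $env = new BerkeleyDB::Env(
    -Home  => '/junk',                     # assumed environment home
    -Flags => DB_CREATE() | DB_INIT_MPOOL() | DB_INIT_LOCK()
            | DB_INIT_LOG() | DB_INIT_TXN())
    or die "env: $BerkeleyDB::Error";

my $db = new BerkeleyDB::Hash(
    -Filename => '/junk/test.db',
    -Env      => $env,
    -Flags    => DB_CREATE())
    or die "db: $BerkeleyDB::Error";

# each write wrapped in its own transaction
my $txn = $env->txn_begin();
$db->Txn($txn);                            # associate the handle with this txn
my $status = $db->db_put('some_key', 'some_value');
if ($status == 0) {
    $txn->txn_commit();
}
else {
    $txn->txn_abort();
}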

TIA :)

P.S. I am using BerkeleyDB CPAN module version 0.28 on RedHat 9 using the db4 4.0.14 packages.

#!/usr/bin/env perl
# vim:set ts=4 sw=4 ai:

use strict;
use warnings;
use Carp;
use POSIX ":sys_wait_h";
use Data::Dumper;
use lib '/junk/lib';
use BerkeleyDB;
use BerkeleyDB::Hash;
use Time::HiRes qw(gettimeofday tv_interval);

use constant RUNS         => 1000000;
use constant DBPATH       => '/junk/test.db';
use constant NUM_CHILDREN => 10;

$|++;

# print backtrace info
sub bt {
    croak "DIED ($.)\n";
}

# generate random integer
sub rand_num {
    4278190080 + int(rand(1048576));
}

# read an entry from a BDB
sub read_db {
    my ($db, $key) = @_;
    my $val;
    $db->db_get($key, $val);
    return (defined($val)) ? eval($val) : {};
}

# increment an entry in a BDB, inserting if not there
sub write_db {
    my ($db, $key, $stuff) = @_;
    my $val = read_db($db, $key);
    $val->{timestamp} = time;
    $val->{events}++;
    $val->{stuff}++ if ($stuff);
    my $str = Dumper(\$val) or croak "failed to serialize row data: $!\n";
    return $db->db_put($key, $str);
}

# fork off a worker process that will mutate the BDB independently of
# the others
sub spawn_child {
    my ($env, $db) = @_;
    my $pid = fork;
    croak "Could not fork: '$!'\n" unless (defined($pid));
    return unless ($pid == 0);

    # child process starts here
    my ($start, $finish, $key, $ip, $rv, $val, $reads, $writes, $deletes);
    print "Starting worker $$...\n";

    # perform a number of operations using BDB transactions
    $start = [ gettimeofday ];
    for (1..RUNS) {
        $key = int(rand(101));
        $ip  = rand_num();
        if ($key <= 49) {
            # do random update/insert
            $rv = write_db($db, $ip, ($key % 2 == 0));
            croak "db_put failed: $rv ($! / $BerkeleyDB::Error)\n"
                unless ($rv == 0);
            $reads++;
            $writes++;
        }
        elsif ($key <= 98) {
            # do random read
            read_db($db, $ip);
            $reads++;
        }
        else {
            # do random delete
            $db->db_del($key);
            $deletes++;
        }
    }
    $finish = [ gettimeofday ];

    # print summary info for timing
    my $sum     = ($reads || 0) + ($writes || 0) + ($deletes || 0);
    my $elapsed = tv_interval($start, $finish);
    printf "$$: %d reads, %d writes, %d deletes (%d total) in %0.2f seconds: %0.2f ops/sec\n",
        ($reads || 0), ($writes || 0), ($deletes || 0),
        $sum, $elapsed, $sum / $elapsed;
    exit 0;
}

# install signal handlers
$SIG{__WARN__} = \&bt;

# create BDB environment so we can enable transactions
my $env = new BerkeleyDB::Env(
    -Cachesize => 4 * 1024 * 1024,
    -Flags     => (DB_CREATE() | DB_INIT_CDB() | DB_INIT_MPOOL()),
    -SetFlags  => DB_CDB_ALLDB())
    or croak "Failed to create DB env: '$!' ($BerkeleyDB::Error)\n";

# open database
my $db = new BerkeleyDB::Hash(
    -Filename => DBPATH,
    -Env      => $env,
    -Flags    => DB_CREATE())
    or croak "Failed to open DBPATH: '$!' ($BerkeleyDB::Error)\n";

# spawn concurrent DB mutators and just wait for them to finish
for (1..NUM_CHILDREN) {
    spawn_child($env, $db);
}
for (1..NUM_CHILDREN) {
    my $pid = wait;
}

Re: BerkeleyDB blowing up on multiple, concurrent processes
by ioannis (Curate) on Jun 14, 2006 at 19:31 UTC
    The first thing I would do is read the logs; enable logging with DB_INIT_LOG.
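    Something along these lines, maybe (just a sketch; the -ErrFile target, -Home location, and flag combination are guesses on my part, and as far as I know DB_INIT_CDB can only be combined with DB_INIT_MPOOL, so enabling the log subsystem means switching to a locking/transactional environment):

    use strict;
    use warnings;
    use BerkeleyDB;

    # environment with the logging subsystem enabled; BDB's own error
    # messages are redirected so they can be read afterwards
    my $env = new BerkeleyDB::Env(
        -Home    => '/junk',               # assumed env home next to the db file
        -ErrFile => *STDERR,               # or a log filehandle of your choice
        -Flags   => DB_CREATE() | DB_INIT_MPOOL() | DB_INIT_LOCK()
                  | DB_INIT_LOG() | DB_INIT_TXN())
        or die "env: $BerkeleyDB::Error";

    The log records themselves should then be dumpable with the db_printlog utility.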
Re: BerkeleyDB blowing up on multiple, concurrent processes
by perrin (Chancellor) on Jun 14, 2006 at 19:32 UTC
Re: BerkeleyDB blowing up on multiple, concurrent processes
by m0r0n (Initiate) on Jun 14, 2006 at 19:58 UTC
    Figured out what was wrong: the $env and $db handles have to be created inside each child process, after the fork, or the CDB locking doesn't work right (the environment handle can't be shared across fork()). Here's the updated, working script:
    #!/usr/bin/env perl
    # vim:set ts=4 sw=4 ai:

    use strict;
    use warnings;
    use Carp;
    use POSIX ":sys_wait_h";
    use Data::Dumper;
    use lib '/junk/lib';
    use BerkeleyDB;
    use BerkeleyDB::Hash;
    use Time::HiRes qw(gettimeofday tv_interval);

    use constant RUNS         => 1000000;
    use constant DBPATH       => '/junk/test.db';
    use constant NUM_CHILDREN => 10;

    $|++;

    # print backtrace info
    sub bt {
        croak "DIED ($.)\n";
    }

    # generate random integer
    sub rand_num {
        4278190080 + int(rand(1048576));
    }

    # read an entry from a BDB
    sub read_db {
        my ($db, $key) = @_;
        my $val;
        $db->db_get($key, $val);
        return (defined($val)) ? eval($val) : {};
    }

    # increment an entry in a BDB, inserting if not there
    sub write_db {
        my ($db, $key, $stuff) = @_;
        my $val = read_db($db, $key);
        $val->{timestamp} = time;
        $val->{events}++;
        $val->{stuff}++ if ($stuff);
        my $str = Dumper(\$val) or croak "failed to serialize row data: $!\n";
        return $db->db_put($key, $str);
    }

    # fork off a worker process that will mutate the BDB independently of
    # the others
    sub spawn_child {
        my $pid = fork;
        croak "Could not fork: '$!'\n" unless (defined($pid));
        return unless ($pid == 0);

        my $env = new BerkeleyDB::Env(
            -Cachesize => 4 * 1024 * 1024,
            -Flags     => (DB_CREATE() | DB_INIT_CDB() | DB_INIT_MPOOL()))
            or croak "Failed to create DB env: '$!' ($BerkeleyDB::Error)\n";

        my $db = new BerkeleyDB::Hash(
            -Filename => DBPATH,
            -Env      => $env,
            -Flags    => DB_CREATE())
            or croak "Failed to open DBPATH: '$!' ($BerkeleyDB::Error)\n";

        # child process starts here
        my ($start, $finish, $key, $ip, $rv, $val, $reads, $writes, $deletes);

        # perform a number of operations using BDB transactions
        $start = [ gettimeofday ];
        for (1..RUNS) {
            $key = int(rand(101));
            $ip  = rand_num();
            if ($key <= 49) {
                # do random update/insert
                $rv = write_db($db, $ip, ($key % 2 == 0));
                croak "db_put failed: $rv ($! / $BerkeleyDB::Error)\n"
                    unless ($rv == 0);
                $reads++;
                $writes++;
            }
            elsif ($key <= 98) {
                # do random read
                read_db($db, $ip);
                $reads++;
            }
            else {
                # do random delete
                $db->db_del($key);
                $deletes++;
            }
        }
        $finish = [ gettimeofday ];

        # print summary info for timing
        my $sum     = ($reads || 0) + ($writes || 0) + ($deletes || 0);
        my $elapsed = tv_interval($start, $finish);
        printf "$$: %d reads, %d writes, %d deletes (%d total) in %0.2f seconds: %0.2f ops/sec\n",
            ($reads || 0), ($writes || 0), ($deletes || 0),
            $sum, $elapsed, $sum / $elapsed;
        exit 0;
    }

    # install signal handlers
    $SIG{__WARN__} = \&bt;

    # spawn concurrent DB mutators and just wait for them to finish
    for (1..NUM_CHILDREN) {
        spawn_child();
    }
    for (1..NUM_CHILDREN) {
        my $pid = wait;
    }
