Hi.
Well as I stated earlier I had serious reservations about the validity of your benchmark. When I messaged you about it you replied "The point was to use all the modules' simplest interfaces". This is a specious argument. There is no difference between the interface for a DB_HASH or a DB_BTREE. The difference is underneath at the implementation level. And the difference is huge.
I reran your benchmark without CDB_File, it wont build cleanly on W32, and produced the following results which incidentally do not completely support my earlier coments ( the file size differences are not what I said they would be.) However they do show that your comparison of DB_HASH against DBD::SQLite is misleading (SQLite gets _creamed_ by a $DB_TREE), and they also show that if you are looking for high speed with a BerkeleyDB then you should definately go with a $DB_BTREE instead of a $DB_HASH.
My results were as follows:
Benchmark: timing 30000 iterations of
I B+Tree, I B+Tree (A), I Hash, I Hash (A), I SQLite, I SQLite 2...
I B+Tree: 1 wallclock secs ( 0.88 usr + 0.11 sys = 0.98 CPU) @ 30487.80/s (n=30000)
I B+Tree (A): 1 wallclock secs ( 0.95 usr + 0.02 sys = 0.97 CPU) @ 30959.75/s (n=30000)
I Hash: 8 wallclock secs ( 2.25 usr + 4.38 sys = 6.63 CPU) @ 4528.30/s (n=30000)
I Hash (A): 7 wallclock secs ( 2.31 usr + 4.58 sys = 6.89 CPU) @ 4353.50/s (n=30000)
I SQLite: 76 wallclock secs (14.84 usr + 49.66 sys = 64.50 CPU) @ 465.12/s (n=30000)
I SQLite 2: 4 wallclock secs ( 3.30 usr + 0.09 sys = 3.39 CPU) @ 8849.56/s (n=30000)
Rate I SQLite I Hash (A) I Hash I SQLite 2 I B+Tree I B+Tree (A)
I SQLite 465/s -- -89% -90% -95% -98% -98%
I Hash (A) 4354/s 836% -- -4% -51% -86% -86%
I Hash 4528/s 874% 4% -- -49% -85% -85%
I SQLite 2 8850/s 1803% 103% 95% -- -71% -71%
I B+Tree 30488/s 6455% 600% 573% 245% -- -2%
I B+Tree (A) 30960/s 6556% 611% 584% 250% 2% --
Benchmark: timing 30000 iterations of
S B+Tree, S B+Tree(A), S Hash, S Hash(A), S SQLite, S SQLite 2...
S B+Tree: 1 wallclock secs ( 0.77 usr + 0.09 sys = 0.86 CPU) @ 34965.03/s (n=30000)
S B+Tree(A): 1 wallclock secs ( 0.95 usr + 0.05 sys = 1.00 CPU) @ 30000.00/s (n=30000)
S Hash: 2 wallclock secs ( 1.03 usr + 1.63 sys = 2.66 CPU) @ 11295.18/s (n=30000)
S Hash(A): 3 wallclock secs ( 1.47 usr + 1.36 sys = 2.83 CPU) @ 10608.20/s (n=30000)
S SQLite: 11 wallclock secs ( 6.80 usr + 3.69 sys = 10.49 CPU) @ 2861.23/s (n=30000)
S SQLite 2: 4 wallclock secs ( 3.89 usr + 0.09 sys = 3.98 CPU) @ 7530.12/s (n=30000)
Rate S SQLite S SQLite 2 S Hash(A) S Hash S B+Tree(A) S B+Tree
S SQLite 2861/s -- -62% -73% -75% -90% -92%
S SQLite 2 7530/s 163% -- -29% -33% -75% -78%
S Hash(A) 10608/s 271% 41% -- -6% -65% -70%
S Hash 11295/s 295% 50% 6% -- -62% -68%
S B+Tree(A) 30000/s 948% 298% 183% 166% -- -14%
S B+Tree 34965/s 1122% 364% 230% 210% 17% --
Resulting File Sizes:
bdb.hash.test :5226496
bdb.btree.test :6701056
bdb.hash.a.test :5095424
bdb.btree.a.test :3579904
sqlite.test :3845120
sqlite2.test :2043904
And the code I used to produce this benchmark is
use DB_File;
use DBI;
use Benchmark qw(cmpthese);
use strict;
use warnings;
# do some setup
my $id_BDB_hash = 0;
my $id_BDB_hash_a = 0;
my $id_BDB_btree_a = 0;
my $id_BDB_btree = 0;
my $id_lit2 = 0;
my $id_lit = 0;
my $txt = 'a' x 100;
unlink(
"bdb.hash.test", "bdb.btree.test", "bdb.hash.a.test",
"bdb.btree.a.test", "sqlite.test", "sqlite2.test"
);
my $dbh =
DBI->connect( 'dbi:SQLite:dbname=sqlite.test', '', '', { AutoCommi
+t => 1, RaiseError => 1 } );
my $dbh2 =
DBI->connect( 'dbi:SQLite:dbname=sqlite2.test', '', '', { AutoComm
+it => 0, RaiseError => 1 } );
$dbh->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100))
+');
$dbh2->do('CREATE TABLE test (id integer PRIMARY KEY, foo varchar(100)
+)');
$dbh->do('PRAGMA default_synchronous = off');
my $sth_ins = $dbh->prepare('INSERT INTO test (id, foo) values (?, ?)
+');
my $sth_sel = $dbh->prepare('SELECT foo FROM test where id=?');
my $sth2_ins = $dbh2->prepare('INSERT INTO test (id, foo) values (?, ?
+)');
my $sth2_sel = $dbh2->prepare('SELECT foo FROM test where id=?');
tie my %bdb_btree_a, "DB_File", "bdb.btree.a.test", O_RDWR | O_CREAT,
+0640, $DB_BTREE
or die "Cannot open file 'bdb.btree.a.test': $!\n";
tied(%bdb_btree_a)->filter_fetch_key( sub { $_ = unpack( "N", $_ || 0
+) } );
tied(%bdb_btree_a)->filter_store_key( sub { $_ = pack( "N", $_ || 0 )
+} );
tie my %bdb_hash_a, "DB_File", "bdb.hash.a.test", O_RDWR | O_CREAT, 06
+40, $DB_HASH
or die "Cannot open file 'bdb.hash.a.test': $!\n";
tied(%bdb_hash_a)->filter_fetch_key( sub { $_ = unpack( "N", $_ || 0 )
+ } );
tied(%bdb_hash_a)->filter_store_key( sub { $_ = pack( "N", $_ || 0 ) }
+ );
tie my %bdb_hash, "DB_File", "bdb.hash.test", O_RDWR | O_CREAT, 0640,
+$DB_HASH
or die "Cannot open file 'bdb.hash.test': $!\n";
tie my %bdb_btree, "DB_File", "bdb.btree.test", O_RDWR | O_CREAT, 0640
+, $DB_BTREE
or die "Cannot open file 'bdb.btree.test': $!\n";
sub insertLite {
$sth_ins->execute( $id_lit++, $txt );
}
sub insertLite2 {
$sth2_ins->execute( $id_lit2++, $txt );
}
sub insertBDB_btree_a {
$bdb_btree_a{ $id_BDB_btree_a++ } = $txt;
}
sub insertBDB_hash_a {
$bdb_hash_a{ $id_BDB_hash_a++ } = $txt;
}
sub insertBDB_btree {
$bdb_btree{ $id_BDB_btree++ } = $txt;
}
sub insertBDB_hash {
$bdb_hash{ $id_BDB_hash++ } = $txt;
}
sub selectLite {
$sth_sel->execute( $id_lit++ );
1 while $sth_sel->fetch;
}
sub selectLite2 {
$sth2_sel->execute( $id_lit2++ );
1 while $sth2_sel->fetch;
}
sub selectBDB_btree_a {
$bdb_btree_a{ $id_BDB_btree_a++ };
}
sub selectBDB_hash_a {
$bdb_hash_a{ $id_BDB_hash_a++ };
}
sub selectBDB_btree {
$bdb_btree{ $id_BDB_btree++ };
}
sub selectBDB_hash {
$bdb_hash{ $id_BDB_hash++ };
}
cmpthese(
30_000,
{
'I SQLite' => \&insertLite,
'I SQLite 2' => \&insertLite2,
'I Hash' => \&insertBDB_hash,
'I B+Tree' => \&insertBDB_btree,
'I Hash (A)' => \&insertBDB_hash_a,
'I B+Tree (A)' => \&insertBDB_btree_a,
},
);
$id_BDB_hash = 0;
$id_BDB_hash_a = 0;
$id_BDB_btree_a = 0;
$id_BDB_btree = 0;
$id_lit2 = 0;
$id_lit = 0;
cmpthese(
30_000,
{
'S SQLite' => \&selectLite,
'S SQLite 2' => \&selectLite2,
'S Hash' => \&selectBDB_hash,
'S B+Tree' => \&selectBDB_btree,
'S Hash(A)' => \&selectBDB_hash_a,
'S B+Tree(A)' => \&selectBDB_btree_a,
},
);
foreach my $file (
"bdb.hash.test", "bdb.btree.test", "bdb.hash.a.test", "bdb.btree.a
+.test",
"sqlite.test", "sqlite2.test"
)
{
print "$file :" . ( -s $file ) . "\n";
}
As Matts said, please no comments about the benchmarking code, I know its could be a lot nicer. (Its only a benchmark... :-)
Yves / DeMerphq
--
Benchmarking things isn't as easy or as straightforward as many people seem to think.... |