http://www.perlmonks.org?node_id=440045

perrin has asked for the wisdom of the Perl Monks concerning the following question:

I'm benchmarking the latest SQLite for use in sharing data between mod_perl processes. The use case is a shared hash, which is modeled as a single table with columns "a_key" and "value". However, my code is running pretty slow. Simple MySQL code doing the same thing goes three times as fast, and BerkeleyDB goes about 17 times as fast. I'm wondering if SQLite is really this slow, or if I'm just not aware of some important tuning step with it. Removing the commit after storing data makes it three times as fast, but then it's no longer doing something comparable to the MySQL and BDB code since other processes would not be able to see that data.

The SQLite part of my test code is below. I just bang on this with a couple of hundred thousand hits to store() and fetch() and see how long it takes. I'm using the latest DBI and DBD::SQLite.

package IPC::SharedHash::DBDSQLite; use strict; use warnings; use DBI; use DBD::SQLite; use Storable qw(nfreeze thaw); my $DB_NAME = 'SQLite'; sub new { my $class = shift; my $self = {}; my %options = @_; $self->{'SCALARS_ONLY'} = 1; my $dbh = DBI->connect_cached( 'dbi:SQLite:dbname=' . $options{'DIRECTORY'} . '/' . $DB_NAME, '', '', { AutoCommit => 0 }) || die; $dbh->{'RaiseError'} = 1; $dbh->do('PRAGMA synchronous = OFF'); unless ( $dbh->tables( '', '', 'shared_hash', '' ) ) { $dbh->do( 'create table shared_hash ( a_key text primary key, value +text )'); } my $fetch_sth = $dbh->prepare('select value from shared_hash where a_key = ?'); my $store_sth = $dbh->prepare('replace into shared_hash (a_key, value) values (? +, ?)'); $self->{'dbh'} = $dbh; $self->{'fetch'} = $fetch_sth; $self->{'store'} = $store_sth; bless $self, $class; return $self; } sub fetch { my $self = shift; my ($key) = @_; $self->{'fetch'}->execute($key); my $value; $self->{'fetch'}->bind_columns( \$value ); $self->{'fetch'}->fetch(); $self->{'fetch'}->finish(); if ( $self->{'SCALARS_ONLY'} ) { return $value; } else { if ( defined $value ) { return thaw($value); } } } sub store { my $self = shift; my ( $key, $value ) = @_; if ( !$self->{'SCALARS_ONLY'} ) { warn $self->{'SCALARS_ONLY'}; $value = nfreeze($value); } $self->{'store'}->execute( $key, $value ); $self->{'dbh'}->commit(); } sub DESTROY { my $self = shift; $self->{'dbh'}->disconnect(); } 1;
... and here's a small sample of calling it:
#!/usr/bin/perl use strict; use warnings; use IPC::SharedHash::DBDSQLite; my $hash = IPC::SharedHash::DBDSQLite->new( DIRECTORY => '/tmp', SCALARS_ONLY => 1, ); for ( 0 .. 10000 ) { $hash->store( "key$_", $_ ); my $return = $hash->fetch("key$_"); die "bad return: $return" unless $return == $_; }

Replies are listed 'Best First'.
Re: DBD::SQLite tuning
by dragonchild (Archbishop) on Mar 16, 2005 at 18:04 UTC
    It appears, from cursory examination, that commits on SQLite are implied to be slow.

    From the DBD::SQLite documentation:

    SQLite is fast, very fast. I recently processed my 72MB log file with it, inserting the data (400,000+ rows) by using transactions and only committing every 1000 rows (otherwise the insertion is quite slow), and then performing queries on the data. (Emphasis added)

    From http://www.sqlite.org/cvstrac/wiki?p=PerformanceConsiderations, it says:

    When doing lots of updates/inserts on a table it is a good idea to contain them within a transaction, . . . This will make SQLite write all the data to the disk in one go, vastly increasing performance. (Emphasis added)

    Apparently, what's happening is there is a disk flush of the actual changes to the table upon commit. This apparently is to allow for preservation of committed data if the database crashes. Oracle and MySQL (using InnoDB) have REDO logs which capture the commands that made the change, then actually apply the changes when it's convenient or necessary. (MySQL, using MyISAM tables, doesn't have this guarantee. This is why you can have corruption of MyISAM tables, but not InnoDB tables.)

    This constant flushing of the tables (and the attendant indices) looks to be why you have the performance issues.

    Note: I don't have proof this is what's happening, but it seems like a reasonable guess.

    Being right, does not endow the right to be rude; politeness costs nothing.
    Being unknowing, is not the same as being stupid.
    Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
    Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

      As I said in the intro, leaving out all commits makes it go about three times as fast, but then it wouldn't work for sharing the data between multiple processes. At least I'm assuming that other processes can't see uncomitted data.
        I understand all of that. You were asking "Why is this slow?" (which is the necessary precursor to "How do I make this faster?"). I replied "Commits are implied to be slow in SQLite."

        I suspect that you are using SQLite correctly, but not for its intended purpose. It's like using Oracle for an online forum or BDB for a data warehouse. You could do that, but it wouldn't be recommended as best.

        The person who claimed Foo is Nx faster than Bar is correct, for his/her needs. And, for your needs, Bar is Nx faster than Foo. We've all seen situations where a semi-tuned MySQL 4.1 database on a single CPU blew away an tuned Oracle 9.2 database running on a quad-Xeon. Yet, we're not going to claim that MySQL 3.x is faster than Oracle in all situations.

        For your usage, BDB is clearly better. SQLite is probably better than BDB if the following is true:

        • Reads * LargeNumber >= Writes
        • Reads involve complex queries
        • Different reads may use different indices
        Otherwise, it's probably best to use BDB.

        Of course, I'm still wondering why SQLite is good when embedded MySQL exists. (Though the weird license is probably part of it ...)

        Being right, does not endow the right to be rude; politeness costs nothing.
        Being unknowing, is not the same as being stupid.
        Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence.
        Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.

Re: DBD::SQLite tuning
by jZed (Prior) on Mar 16, 2005 at 18:01 UTC
    I'd be curious to know how DBD::DBM on top of BerkeleyDb would compare. With help from Dan Wright and Dean Arnold I'm currently working on optimizing SQL::Statement so if DBD::DBM is comparable now, it will be even better soon. My guess is that even before the optimization, it will be as fast or faster than SQLite on primary key equality tests and inserts but lag behind on more complex queries and deletes. -- Jeff
      I appear to be abysmally wrong :-(. DBD::DBM is faster than DBD::SQLite at inserts by a factor of 10, but is slower on queries (even primary key equality) by a factor much higher than that. Oh well, we'll see what optimization brings.
Re: DBD::SQLite tuning
by jdv79 (Sexton) on Mar 16, 2005 at 19:30 UTC
    I'd like to know why you'd use SQLite for this application instead of BerkleyDB. If all you are storing is key value pairs then SQLite is overkill. Sounds like a perfect fit for BerkleyDB to me. Its not as cumbersome as MySQL and is just as easy to install and use as SQLite. Did you choose SQLite because of some other factor?
      I wouldn't. I'm benchamrking it because someone on the mod_perl list claimed that SQLite 3 was faster than BDB and I wanted to see if it was true.
Re: DBD::SQLite tuning
by Anonymous Monk on Mar 16, 2005 at 19:37 UTC
    in my experience, writing has been 3x the speed of mysql, but reading has been 1/10 or quicker
Re: DBD::SQLite tuning
by zakzebrowski (Curate) on Mar 17, 2005 at 02:43 UTC
    SQLite is a database, and you need to define indexes for columns that you will be selecting. (You may want to read a good database book for examples of when, and when NOT to use an index... for that matter, when you want to use a database and when to use a delimited text file...) Adding an index improves performace, at least at select time. Not tested for replace time, but I leave that as an exercise to the user...
    Example code:
    #!/usr/local/bin/perl use strict; use DBI; use DBD::SQLite; use Benchmark; my $dbh = DBI->connect("dbi:SQLite:dbname=testdb","",""); print "Connected to Database OK.\n"; print "Creating Tables && INSERTING\n"; my $sql = qq{CREATE TABLE A (A INT, B VAL)}; my $sth = $dbh->prepare($sql); $sth->execute(); my $i=0; while ($i<10000){ my $sql = qq{INSERT INTO A(A,B) VALUES (?,?)}; my $sth = $dbh->prepare($sql); $sth->execute($i,$i); $i++; } print "DONE\n"; sub doSelect{ # Yes, I could pre cache the query, but this won't affect the result +s my $sql = qq{SELECT A FROM A WHERE B = ?}; my $sth = $dbh->prepare($sql); my $i =int( rand 10000); $sth->execute($i); my @ary = $sth->fetchrow_array(); } timethese (10000,{'MethodOne'=>'&doSelect'}); my $sql2 = qq{CREATE INDEX FOO ON A(B)}; my $sth2 = $dbh->prepare($sql2); $sth2->execute(); timethese (10000,{'MethodOne'=>'&doSelect'});

    Results:
    Connected to Database OK. Creating Tables && INSERTING DONE Benchmark: timing 10000 iterations of MethodOne... MethodOne: 100 wallclock secs (71.48 usr + 18.18 sys = 89.66 CPU) @ 1 +11.53/s (n=10000) Benchmark: timing 10000 iterations of MethodOne... MethodOne: 5 wallclock secs ( 2.83 usr + 1.11 sys = 3.94 CPU) @ 25 +38.07/s (n=10000)


    ----
    Zak - the office
      My code has an index. Take a closer look at the create statement. Marking a column as the primary key causes SQLite to create a unique index on that column.

      Just for fun, I tested creating the index separately, and making it a unique index or not. There was no significant change.

        D'oh... I was running on a caffine low at that point in the afternoon yesterday... Where did you hear the 3x claim?


        ----
        Zak - the office