Ah, thank you! The exact string "cache=shared" gave some Google hits. This was fixed for DBD::SQLite this summer, it seems.
For a start, retrieving a given number of records from a single thread&connection is nearly 2 orders of magnitude quicker than retrieving those same records via a connection made via a single connection in a separate thread.
Two orders of magnitude slower using threads?! Sigh. Does it help to set "read uncommitted" on? (You do that in the connect string too?)
A few questions: What OS are you targeting? What does your data (schema) look like? What do your queries look like?
Honking big, multiprocessor Linux servers with Oracle and more memory than you can shake a stick at. :-) I'd like to replace an old C hack using a few GBytes of RAM for querying some subsets of a few tables really quick (quite a lot of work is required for some queries). The queries are data structures easily converted to SQL.
| [reply] |
Does it help to set "read uncommitted" on? (You do that in the connect string too?)
Not in my experiments on windows. (No, it is a pragma. See below.)
I do not have any great faith that my tests on windows will be reflected on *nix; so here is my test harness:
#! perl -slw
use strict;
use threads ( stack_size => 4096 );
use threads::shared;
use threads::Q;
use List::Util qw[ shuffle min ];
use Time::HiRes qw[ time ];
use DBI;
use constant {
CONNECT=> 'dbi:SQLite:dbname=file:memdb2?mode=memory&cache=shared'
+,
# CONNECT=> 'dbi:SQLite:dbname=mydb',
CREATE => 'create table if not exists DB ( ID integer(8),'
. join(',', map "F$_ text(15)", 1..9) . ')',
INSERT => 'insert into DB ( ID , '
. join( ',', map "F$_", 1..9 ) . ') values (' . '?,' x 9 . '?)
+',
INDEX => 'create index if not exists I1 on DB ( ID )',
QUERY => 'select * from DB where ID = ?',
};
sub timeit (&@) {
my $code = shift;
my $start = time;
$code->();
sprintf "Took %f seconds %s", time()-$start, @_ ? "(for @_)" : '';
}
sub thread {
my $tid = threads->tid;
my( $Q ) = @_;
my $dbh = DBI->connect( CONNECT, '', '' ) or die DBI::errstr;
my $sth = $dbh->prepare( QUERY ) or die DBI->errstr;
while( my $id = $Q->dq ) {
$sth->execute( $id ) or die DBI::errstr;
my $r = $sth->fetch or warn( "No data for $id" ) and next;
## do something with record.
#printf "[$tid] %5u %s %s %s %s %s %s %s %s %s\n", @{ $r };
}
$sth->finish;
$dbh->disconnect;
}
my @chars = ( 'a'..'z' );
sub dummy {
my $n = shift;
join '', @chars[ map int( rand @chars ), 1 .. $n ];
}
our $T //= 4;
our $N //= 100;
our $R //= min( 1000, $N );
our $Qsize //= 10;
my $dbh = DBI->connect( CONNECT, '', '', { AutoCommit =>0 } ) or die D
+BI::errstr;
$dbh->do( 'PRAGMA synchronous = off' );
$dbh->do( 'PRAGMA cache_size = 1073741824' );
$dbh->do( 'PRAGMA read_uncommitted = on' );
$dbh->do( CREATE ) or die DBI::errstr;
print timeit {
my $ins = $dbh->prepare( INSERT ) or die DBI->errstr;
for my $n ( 1 .. $N ) {
my @fields = ( $n, map dummy( 15 ), 1 .. 9 );
$ins->execute( @fields )or die $ins->errstr;
# $n % 10 or $dbh->commit
}
$ins->finish;
$dbh->commit;
} "Populate DB with $N records";
print timeit {
$dbh->do( INDEX ) or die DBI::errstr;
} "Create primary index";
print timeit {
my $sth = $dbh->prepare( QUERY ) or die DBI->errstr;
for my $id ( 1 .. $N ) {
$sth->execute( $id ) or die DBI::errstr;
my $r = $sth->fetch() or warn( "No data for $id" ) and next;
## do something with record.
}
$sth->finish;
} "Retrieve the whole lot";
print $dbh->selectrow_array( 'SELECT count(*) from DB' );
$dbh->disconnect;
print timeit {
my $Q = threads::Q->new( $Qsize );
my @threads = map threads->create( \&thread, $Q ), 1 .. $T;
$Q->nq( $_ ) for (shuffle 1 .. $N)[ 0 .. $R ];
$Q->nq( (undef) x $T );
$_->join for @threads;
$dbh->disconnect;
} "Read all $N records using $T threads";
unlink 'file'; ## Only needed on windows?
I'd love to see the results from a few runs done' on other OSs.
You can get threads::Q (my own, efficient, self-limiting queue implementation) from here.
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] [select] |
UPDATE: numbers are partly wrong; see further discussion lower down in this thread
I experimented a bit with this, on Centos 6.5. I had only a threaded perl 5.19.7 handy.
Initially I got similar (disappointing) results, but then I found two problems with the test program:
1. It turns out that after the "Create primary index"; block, a $dbh->commit is needed to get the index to be used/committed.
2. A (much) higher rowcount $N gives more interesting results. From around N$ = 10_000 and upwards, the threaded version *is* (much) faster.
For instance, for rowcount $N = 100_000:
$ perl 1067204.pl
Using minimum thread stack size of 16384 at /home/aardvark/perl-5.19/l
+ib/5.19.7/x86_64-linux-thread-multi/threads.pm line 49.
Took 6.262891 seconds (for Populate db with 100000 records)
Took 0.121979 seconds (for Create primary index)
Took 0.905479 seconds (for Retrieve the whole lot)
100000
Took 0.099463 seconds (for Read all 100000 records using 4 threads)
So, it was a good approach after all.
| [reply] [d/l] [select] |