Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

DBD::Sqlite queries slow - and gives wrong results

by astroboy (Chaplain)
on Dec 11, 2017 at 19:42 UTC ( #1205302=perlquestion: print w/replies, xml ) Need Help??

astroboy has asked for the wisdom of the Perl Monks concerning the following question:

Ok, this is pretty weird IMHO. I have some Perl code written to create and populate a SQLite database. It's very simple - a list of users and their AD groups in two different tables. It's been running ok for at least 18 months. I have some other Perl code that will query the database. This querying code has started running slowly, and in some cases returns no results where I know there are records - I can test it by pasting the SQL in to the SQLiteStudio v2.1.5 editor. It returns the results instantly.

Here's an example of some Perl query code:

#!/usr/bin/perl -w use strict; use DBI; my $dbfile = 'C:/db/employee.db'; my $dbh = DBI->connect( "dbi:SQLite:dbname=$dbfile", "", "", { RaiseError => 1, AutoCommit => 0, } ); my $sql = q{ select e.* from employees e, groups g where e.sam_account_name = g.sam_account_name and g.group_name = 'Group Name' order by last_name, first_name }; foreach my $emp (@{$dbh->selectall_arrayref($sql, {Slice => {}})}) { printf( "\t\t%-15s %-15s: (%s)\n", $emp->{first_name}, $emp->{last_name}, $emp->{sam_account_name}, ); } $dbh->rollback;

The query in $sql is copied and pasted into the SQLiteStudio editor. Regardless of the group name I choose, the editor returns rows in approximately 0.001 seconds. Perl takes several seconds may return no rows, even where there are matching candidates. If I change the group name, it may return the same rows as the editor but it can take 10+ seconds. The result set is always small 2 - 20 rows depending on the group. The database is 520MB.

Note the code above was written to simplify my problem. The actual code has the group name as a placeholder, and I simply fetch each row rather than returning everything into an array as I do above. Regardless, the results are the same.

This is running on Windows 7. I was using DBD::SQLite 1.54 - As a test this morning I upgraded to 1.55_04 to see if there were any fixes in the developer version. I recreated the database, but the results are still the same

Replies are listed 'Best First'.
Re: DBD::Sqlite queries slow - and gives wrong results
by NetWallah (Canon) on Dec 11, 2017 at 21:47 UTC
    Assuming you have fewer GROUPS than EMPLOYEES, and/or the GROUPS table is indexed by NAME, you could get a slightly faster query by reversing the order of the tables:
    select e.* from groups g, employees e ....
    FYI - this recommendation is confirmed from https://sql-tuning.com/

                    All power corrupts, but we need electricity.

      Thanks, but I run exactly the same query against exactly the same database from SQLiteStudo, and the time goes from many seconds to milliseconds. Also, that doesn't explain why the results are different between Perl and SQL
Re: DBD::Sqlite queries slow - and gives wrong results
by erix (Parson) on Dec 12, 2017 at 05:45 UTC

    update: See last lines

    I tried to repeat this performance by making an SQLite database of those two tables + indexes, together 600 MB, employees with 18M rows, groups with just 3 rows.

    "select * from employees where sam_account_name = '<some unique value>'" takes 3 seconds. Your query takes even longer.

    So, it seems your results are normal. SQLite is not a particularly fast database. (Then again, I know next to nothing of SQLite; it's always slow-ish in my hands so maybe I do something wrong.)

    FWIW, postgres runs this same query in less than a millisecond, with the same data.

    Could sqllitestudio be caching the query, or the data? It seems likely; sqllitestudio is just a wrapper, from what I read on its website.

    How fast is a SQLite commandline query? Maybe you can try something like:

    time sqlite3 ~/employee.db "select e.* from employees e, groups g where e.sam_account_name = g.sam_account_name and g.group_name = 'whatever' order by e.last_name, e.first_name"

    (Sorry, I don't know the windows invocation.)

    update: Mea culpa: I botched the indexing (didn't commit..), and with the indexes fixed it worked all right: fast, and just as fast as postgres. So I take all the above back.

    update 2: now ran it on windows 7 too and it was PDQ there as well.

Re: DBD::Sqlite queries slow - and gives wrong results
by Discipulus (Abbot) on Dec 11, 2017 at 20:20 UTC
    Hello astroboy,

    why do you  $dbh->rollback; ?

    L*

    PS then I suppose you need transactions and commit when you've done.

    PPS have you checked for database and table fragmentation? To compact the db with VACUUM can help in such cases.

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: DBD::Sqlite queries slow - and gives wrong results
by rminner (Chaplain) on Dec 14, 2017 at 06:53 UTC

    I do not have the answer for you, but if i had your problem, then i would try to identify the cause through benchmarking the code with Devel::NYTProf. Seeing which line of code consumes most time, might help you in identifying the root cause, and as such fixing it.

    Some random thought based on your description (might not apply):

    "Perl takes several seconds may return no rows, even where there are matching candidates."
    From the behaviour this makes me think of a locking/timeout issue. E.g. the file is locked through another process, and your call times out, while trying to acquire a lock. Googling i stumbled across http://blogs.perl.org/users/timm_murray/2013/02/sqlite-and-writes.html which mentions locking in combination with sqlite_busy_timeout. One thing contradicting this theory however is, that you use
    RaiseError => 1
    , and as such you should get an exception when you fail to acquire the lock. An empty result, where there should be a result, is however many times the symptom of an unhandled exception or timeout.

    Update: Googling further i also stumbled over the following article http://beets.io/blog/sqlite-nightmare.html which mentions that in certain circumstances sqlite might sleep for whole seconds instead of microseconds when trying to acquire a lock. If you have concurrent processes/threads accessing the same sqlite file, then this might be another possible theory. Can you reproduce your issue with a copy of the sqlite database, with no other process accessing the file?
Re: DBD::Sqlite queries slow - and gives wrong results
by vagabonding electron (Deacon) on Dec 17, 2017 at 15:02 UTC

    Just my two cents - this is rather a question than an answer :-)

    I played with this query using the list of the highly esteemed monks, inserting each line 10_000 times in the sqlite table employees and using the distinct values of the column "Level" for the table groups. The database is 626 MB. (Update: created index on Level in both tables, without index the script below is slow. ) The query below was executed by SQLiteStudio 3.1.1 on Windows 10 (32 bit) in 0.174 - 0.180 sec. It does show the list as output (1000 per page), however since there are 40_000 lines in it, it takes extra time wenn you switch to the next page of the output. If you change the output preferences to put 40_000 lines on the same page it takes 0.231 sec then. The following script:

    #!/perl use strict; use warnings FATAL => qw(all); use DBI; use File::Spec; use Time::HiRes qw (time); my $MEMORY = 1; my $start = time; my $PATH = "C:/TMP/___TMP/_TRASH"; my $dbfile = File::Spec->catdir($PATH, 'Big_DB.db'); my $dbh; if (1 == $MEMORY) { $dbh = DBI->connect('dbi:SQLite:dbname=:memory:',"","",{RaiseError + => 1}) or die "Couldn't connect to database: " . DBI->errstr; } else { $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile","","",{RaiseError +=> 1}) or die "Couldn't connect to database: " . DBI->errstr; } print "Time till connect: ", time() - $start, $/; $start = time; if (1 == $MEMORY) { $dbh->sqlite_backup_from_file($dbfile); } print "Time till load: ", time() - $start, $/; $start = time; # Marker. my $sth = $dbh->prepare("select Name, Writeups from employees e, group +s g where e.Level = g.Level and g.Level = 'Pope (28)' order by e.Writ +eups desc "); print "Time till prepare: ", time() - $start, $/; $start = time; $sth->execute(); print "Time till execute: ", time() - $start, $/; $start = time; my $answer = 0; while ( my ($name, $writeups) = $sth->fetchrow_array ) { $answer += 1; # print join(';', ($name, $writeups)), $/; } print "Answer: $answer\n"; print "Time after loop: ", time() - $start, $/; $start = time; $dbh->disconnect;

    ...produced the following output (run many times, the output was similar):

    Time till connect: 0.0150408744812012 Time till load: 4.3910698890686 Time till prepare: 0.000697851181030273 Time till execute: 0.110021829605103 Answer: 40000 Time after loop: 0.16477108001709

    What I noticed then is that you use selectall_arrayref in the original script. If I change the above script after # Marker to:

    my $sql = "select Name, Writeups from employees e, groups g where e.Le +vel = g.Level and g.Level = 'Pope (28)' order by e.Writeups desc "; my $answer = 0; for my $emp (@{$dbh->selectall_arrayref($sql, {Slice => {}})}) { $answer += 1; } print "Answer: $answer\n"; print "Time after loop: ", time() - $start, $/; $start = time; $dbh->disconnect;

    ... then the output was as follows (again, run many times with similar output):

    Time till connect: 0.0147781372070313 Time till load: 4.42248010635376 Answer: 40000 Time after loop: 0.415067911148071

    The time needed is of the same order of magnitude though SQLiteStudio does seem to be a bit faster. Now the next question: 0.11 + 0.16 sec is somehow shorter than 0.41 sec. I tried to benchmark this.

    #!/perl use strict; use warnings FATAL => qw(all); use DBI; use File::Spec; use Benchmark qw(:all) ; my $PATH = "C:/TMP/___TMP/_TRASH"; my $dbfile = File::Spec->catdir($PATH, 'Big_DB.db'); my $MEMORY = 1; my $dbh; if (1 == $MEMORY) { $dbh = DBI->connect('dbi:SQLite:dbname=:memory:',"","",{RaiseError + => 1}) or die "Couldn't connect to database: " . DBI->errstr; } else { $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile","","",{RaiseError +=> 1}) or die "Couldn't connect to database: " . DBI->errstr; } if (1 == $MEMORY) { $dbh->sqlite_backup_from_file($dbfile); } cmpthese(-3, { 'PrepExec' => sub { my $sth = $dbh->prepare("select Name, Writeups from employees +e, groups g where e.Level = g.Level and g.Level = 'Pope (28)' order b +y e.Writeups desc "); $sth->execute(); my $answer = 0; while ( my ($name, $writeups) = $sth->fetchrow_array ) { $answer += 1; # print join(';', ($name, $writeups)), $/; } }, 'selectall' => sub { my $sql = "select Name, Writeups from employees e, groups g wh +ere e.Level = g.Level and g.Level = 'Pope (28)' order by e.Writeups d +esc "; my $answer = 0; for my $emp (@{$dbh->selectall_arrayref($sql, {Slice => {}})}) + { $answer += 1; } }, }); $dbh->disconnect;

    ... and the output is

    Rate selectall PrepExec selectall 2.88/s -- -26% PrepExec 3.90/s 35% --

    for the in-memory database and

    Rate selectall PrepExec selectall 2.69/s -- -25% PrepExec 3.57/s 33% --

    for the on-disk database. Is it just me or is it a reasonable assumption that the prepare - execute sequence can be faster that selectall_arrayref in some cases? Btw. I did not see a notable difference in quering the database from memory vs. from disk in my tests. I did not get false results in these tests comparing the output of SQLiteStudio and Perl either.

    Thanks for reading this.

    Disclaimer: I am an amateur, not a professional programmer (though it does look anyway :-) ).

Re: DBD::Sqlite queries slow - and gives wrong results
by Anonymous Monk on Dec 11, 2017 at 23:24 UTC
    have you tried fetchrow_hashref instead of selectall_arrayref?
A reply falls below the community's threshold of quality. You may see it by logging in.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1205302]
Approved by Discipulus
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (6)
As of 2020-08-06 12:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Which rocket would you take to Mars?










    Results (39 votes). Check out past polls.

    Notices?