Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical

Re: SQLite Slow insert

by Anonymous Monk
on Feb 17, 2013 at 23:19 UTC ( #1019233=note: print w/replies, xml ) Need Help??

in reply to SQLite Slow insert

Replies are listed 'Best First'.
Re^2: SQLite Slow insert
by afoken (Monsignor) on Feb 18, 2013 at 06:01 UTC

    Mostly right, but don't use DBI's quote() method. Use placeholders. The quote() method should be considered private to DBI and DBD::*, because it is not needed outside DBI and DBD::*. See Re^5: Variable interpolation in a file to be read in.

    Using placeholders allows DBI, DBD::whatever, and the database behind DBD::whatever to cache the SQL statement passed to prepare (and prepare_cached) and especially its parsed form. Interpreting SQL with placeholders is needed exactly once, no matter how often you execute the SQL command. Interpreting SQL polluted with actual values is required every time you execute that SQL command. So for one million inserts, you can avoid 999999 runs of the SQL interpreter by using the results from the first interpreter run by using placeholders. Simply not running complex code in 99.9999% of all cases makes your program run faster.


    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Thank you for your replay guys!!! I think the issue is trying to check if data exists first before inserting, so it seems I have to do one row of data at a time, unless there is a way to test if data row exists then how I was doing it. I am reading all of the suggested links everyone has passed along, This is my first crack on using a DB like this.

      my $sth = $dbh->prepare("INSERT INTO HLDdata (Data) VALUES (?)"); foreach $txt (@TXT) { print "$txt\n"; open(IN,"$txt") || warn("cant open $txt"); while(<IN>){ $line = $_; my $sth = $dbh->prepare( "select count(*) from HLDdata where Data += '$line'"); $sth->execute(); my ($data) = $sth->fetchrow(); print "$data \n"; if ($data ne 0){ print "record exist not adding\n"; &Logit("record exist not adding"); }else{ my ($Data) = $line; $sth->execute(); $line =~s/\|/\t/g; # print "$line\n"; print OUT "$line"; } }

        There were a few issues in your code. I fixed them below.

        my $sth = $dbh->prepare("INSERT INTO HLDdata (Data) VALUES (?)"); my $count = $dbh->prepare( "select count(*) from HLDdata where Data = +?"); foreach $txt (@TXT) { print "$txt\n"; open(IN,"$txt") || warn("cant open $txt"); while(<IN>){ $line = $_; $count->execute($line); my ($data) = $count->fetchrow(); print "$data \n"; if ($data != 0){ print "record exist not adding\n"; Logit("record exist not adding"); next; } my ($Data) = $line; $sth->execute($Data); $line =~s/\|/\t/g; # print "$line\n"; print OUT $line; } }

        Yes, you're probably correct that the checking is taking the most of the time. This is because you're missing an index on HLDdata (data) -- sqlite has to scan the whole table every time you check for a row's existence. You can create the missing index with CREATE UNIQUE INDEX HLDdata_uniq ON HLDdata (Data);

        There's a trade-off: you'll now use double the disk space. If $Data tends to be big (above a hundred bytes or so), you are better off storing a hash (sha1, md5, or similar) of $Data in the table in a second column and checking for its existence instead. (SQLite does not appear to have support for functional indices or a hashing function, so you'll have to do it on the Perl side.)

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1019233]
and snow settles gently...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (8)
As of 2017-02-27 23:12 GMT
Find Nodes?
    Voting Booth?
    Before electricity was invented, what was the Electric Eel called?

    Results (393 votes). Check out past polls.