Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Re^2: SQLite Slow insert

by afoken (Parson)
on Feb 18, 2013 at 06:01 UTC ( #1019276=note: print w/ replies, xml ) Need Help??


in reply to Re: SQLite Slow insert
in thread SQLite Slow insert

Mostly right, but don't use DBI's quote() method. Use placeholders. The quote() method should be considered private to DBI and DBD::*, because it is not needed outside DBI and DBD::*. See Re^5: Variable interpolation in a file to be read in.

Using placeholders allows DBI, DBD::whatever, and the database behind DBD::whatever to cache the SQL statement passed to prepare (and prepare_cached) and especially its parsed form. Interpreting SQL with placeholders is needed exactly once, no matter how often you execute the SQL command. Interpreting SQL polluted with actual values is required every time you execute that SQL command. So for one million inserts, you can avoid 999999 runs of the SQL interpreter by using the results from the first interpreter run by using placeholders. Simply not running complex code in 99.9999% of all cases makes your program run faster.

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)


Comment on Re^2: SQLite Slow insert
Select or Download Code
Re^3: SQLite Slow insert
by mmittiga17 (Scribe) on Feb 18, 2013 at 17:43 UTC

    Thank you for your replay guys!!! I think the issue is trying to check if data exists first before inserting, so it seems I have to do one row of data at a time, unless there is a way to test if data row exists then how I was doing it. I am reading all of the suggested links everyone has passed along, This is my first crack on using a DB like this.

    my $sth = $dbh->prepare("INSERT INTO HLDdata (Data) VALUES (?)"); foreach $txt (@TXT) { print "$txt\n"; open(IN,"$txt") || warn("cant open $txt"); while(<IN>){ $line = $_; my $sth = $dbh->prepare( "select count(*) from HLDdata where Data += '$line'"); $sth->execute(); my ($data) = $sth->fetchrow(); print "$data \n"; if ($data ne 0){ print "record exist not adding\n"; &Logit("record exist not adding"); }else{ my ($Data) = $line; $sth->execute(); $line =~s/\|/\t/g; # print "$line\n"; print OUT "$line"; } }

      There were a few issues in your code. I fixed them below.

      my $sth = $dbh->prepare("INSERT INTO HLDdata (Data) VALUES (?)"); my $count = $dbh->prepare( "select count(*) from HLDdata where Data = +?"); foreach $txt (@TXT) { print "$txt\n"; open(IN,"$txt") || warn("cant open $txt"); while(<IN>){ $line = $_; $count->execute($line); my ($data) = $count->fetchrow(); print "$data \n"; if ($data != 0){ print "record exist not adding\n"; Logit("record exist not adding"); next; } my ($Data) = $line; $sth->execute($Data); $line =~s/\|/\t/g; # print "$line\n"; print OUT $line; } }

      Yes, you're probably correct that the checking is taking the most of the time. This is because you're missing an index on HLDdata (data) -- sqlite has to scan the whole table every time you check for a row's existence. You can create the missing index with CREATE UNIQUE INDEX HLDdata_uniq ON HLDdata (Data);

      There's a trade-off: you'll now use double the disk space. If $Data tends to be big (above a hundred bytes or so), you are better off storing a hash (sha1, md5, or similar) of $Data in the table in a second column and checking for its existence instead. (SQLite does not appear to have support for functional indices or a hashing function, so you'll have to do it on the Perl side.)

        Thank you for your help, it got me sorted. I appreciate the help and have learned from it. Cheers!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1019276]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (4)
As of 2014-08-21 04:36 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (127 votes), past polls