Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re^3: SQLite Slow insert

by mmittiga17 (Scribe)
on Feb 18, 2013 at 17:43 UTC ( [id://1019378]=note: print w/replies, xml ) Need Help??


in reply to Re^2: SQLite Slow insert
in thread SQLite Slow insert

Thank you for your replay guys!!! I think the issue is trying to check if data exists first before inserting, so it seems I have to do one row of data at a time, unless there is a way to test if data row exists then how I was doing it. I am reading all of the suggested links everyone has passed along, This is my first crack on using a DB like this.

my $sth = $dbh->prepare("INSERT INTO HLDdata (Data) VALUES (?)"); foreach $txt (@TXT) { print "$txt\n"; open(IN,"$txt") || warn("cant open $txt"); while(<IN>){ $line = $_; my $sth = $dbh->prepare( "select count(*) from HLDdata where Data += '$line'"); $sth->execute(); my ($data) = $sth->fetchrow(); print "$data \n"; if ($data ne 0){ print "record exist not adding\n"; &Logit("record exist not adding"); }else{ my ($Data) = $line; $sth->execute(); $line =~s/\|/\t/g; # print "$line\n"; print OUT "$line"; } }

Replies are listed 'Best First'.
Re^4: SQLite Slow insert
by Anonymous Monk on Feb 19, 2013 at 10:50 UTC

    There were a few issues in your code. I fixed them below.

    my $sth = $dbh->prepare("INSERT INTO HLDdata (Data) VALUES (?)"); my $count = $dbh->prepare( "select count(*) from HLDdata where Data = +?"); foreach $txt (@TXT) { print "$txt\n"; open(IN,"$txt") || warn("cant open $txt"); while(<IN>){ $line = $_; $count->execute($line); my ($data) = $count->fetchrow(); print "$data \n"; if ($data != 0){ print "record exist not adding\n"; Logit("record exist not adding"); next; } my ($Data) = $line; $sth->execute($Data); $line =~s/\|/\t/g; # print "$line\n"; print OUT $line; } }

    Yes, you're probably correct that the checking is taking the most of the time. This is because you're missing an index on HLDdata (data) -- sqlite has to scan the whole table every time you check for a row's existence. You can create the missing index with CREATE UNIQUE INDEX HLDdata_uniq ON HLDdata (Data);

    There's a trade-off: you'll now use double the disk space. If $Data tends to be big (above a hundred bytes or so), you are better off storing a hash (sha1, md5, or similar) of $Data in the table in a second column and checking for its existence instead. (SQLite does not appear to have support for functional indices or a hashing function, so you'll have to do it on the Perl side.)

      Thank you for your help, it got me sorted. I appreciate the help and have learned from it. Cheers!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1019378]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others lurking in the Monastery: (3)
As of 2024-04-19 20:15 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found