Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re^2: Efficient search through a huge dataset

by pg (Canon)
on Oct 20, 2004 at 04:44 UTC ( [id://400755]=note: print w/replies, xml ) Need Help??


in reply to •Re: Efficient search through a huge dataset
in thread Efficient search through a huge dataset

DBD::SQLite might be the answer to this particular case, but it is slow when you try to insert into a table with index. (Not sure what happens without index, general speaking insert is slower with index, when select is faster with index. That's OT)

I compared DBD::SQLite, with ODBC, the same table structure and index. insert 1000 rows, and then select. It took ODBC 2 seconds to insert 1000 rows, and 0 (which means less than 1) second to select; but it took SQLite 310 seconds to insert (way to big), adn 0 second to select (which is virtually the same):

use DBI; use Data::Dumper; use strict; use warnings; my $dbh = DBI->connect("dbi:SQLite:dbname=dbfile","",""); #my $dbh = DBI->connect("dbi:ODBC:everything","",""); =document #$dbh->do('create table table1(col1 number(10), col2 number(10))'); $dbh->do('create table table1(col1 int, col2 int)'); $dbh->do('create index index1 on table1(col1)'); #$dbh->do('create table table2(col1 number(10), col2 number(10))'); $dbh->do('create table table2(col1 int, col2 int)'); $dbh->do('create index index2 on table2(col1)'); =cut $dbh->do('delete from table1'); $dbh->do('delete from table2'); my $st1 = $dbh->prepare('insert into table1(col1, col2) values(?, ?)') +; my $st2 = $dbh->prepare('insert into table2(col1, col2) values(?, ?)') +; print time, "\n"; for my $i (1..1000) { $st1->execute($i, $i * 2); if ($i % 2) { $st2->execute($i, $i * 3); } } print time, "\n"; { my $st3 = $dbh->prepare('SELECT t1.col1 FROM table1 t1 LEFT OUTER +JOIN table2 t2 ON (t1.col1 = t2.col1) WHERE t1.col1 IS NOT NULL AND t +2.col1 IS NULL'); $st3->execute(); my $ref3 = $st3->fetchall_arrayref; #print Dumper($ref3); } print time, "\n"; $dbh->disconnect();

Replies are listed 'Best First'.
Re^3: Efficient search through a huge dataset
by Caron (Friar) on Oct 20, 2004 at 08:12 UTC
    it took SQLite 310 seconds to insert

    The SQLite docs say what you should do to improve speed: group instructions into transactions and eventually modify the "syncronous" pragma.

    If you add $dbh->do('begin'); before your loop and $dbh->do('commit'); at the end, the insertion will take less than one second, as it should be.

    use Time::HiRes qw(gettimeofday tv_interval); my $start_time = [gettimeofday]; $dbh->do('begin'); for my $i (1..1000) { $st1->execute($i, $i * 2); if ($i % 2) { $st2->execute($i, $i * 3); } } $dbh->do('commit'); my $elapsed = tv_interval ( $start_time, [gettimeofday]); print "creation time: $elapsed\n"; __END__ insertion time: 0.263462
    A reply falls below the community's threshold of quality. You may see it by logging in.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://400755]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (5)
As of 2024-03-19 08:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found