Re^2: Efficient search through a huge dataset

by pg (Canon)
on Oct 20, 2004 at 04:44 UTC


in reply to •Re: Efficient search through a huge dataset
in thread Efficient search through a huge dataset

DBD::SQLite might be the answer to this particular case, but it is slow when you insert into a table with an index. (I'm not sure what happens without an index; generally speaking, inserts are slower with an index, while selects are faster. That's off-topic, though.)

I compared DBD::SQLite with ODBC, using the same table structure and index: insert 1000 rows, then select. It took ODBC 2 seconds to insert the 1000 rows and 0 seconds (meaning less than 1) to select; it took SQLite 310 seconds to insert (way too long) and 0 seconds to select (virtually the same):

use DBI;
use Data::Dumper;
use strict;
use warnings;

my $dbh = DBI->connect("dbi:SQLite:dbname=dbfile","","");
#my $dbh = DBI->connect("dbi:ODBC:everything","","");

# The POD block below comments out the one-time schema setup:

=document
#$dbh->do('create table table1(col1 number(10), col2 number(10))');
$dbh->do('create table table1(col1 int, col2 int)');
$dbh->do('create index index1 on table1(col1)');
#$dbh->do('create table table2(col1 number(10), col2 number(10))');
$dbh->do('create table table2(col1 int, col2 int)');
$dbh->do('create index index2 on table2(col1)');
=cut

# start from empty tables
$dbh->do('delete from table1');
$dbh->do('delete from table2');

my $st1 = $dbh->prepare('insert into table1(col1, col2) values(?, ?)');
my $st2 = $dbh->prepare('insert into table2(col1, col2) values(?, ?)');

print time, "\n";

# insert 1000 rows into table1 and 500 into table2
for my $i (1..1000) {
    $st1->execute($i, $i * 2);
    if ($i % 2) {
        $st2->execute($i, $i * 3);
    }
}

print time, "\n";

{
    # rows in table1 that have no matching row in table2
    my $st3 = $dbh->prepare('SELECT t1.col1 FROM table1 t1 LEFT OUTER JOIN table2 t2 ON (t1.col1 = t2.col1) WHERE t1.col1 IS NOT NULL AND t2.col1 IS NULL');
    $st3->execute();
    my $ref3 = $st3->fetchall_arrayref;
    #print Dumper($ref3);
}

print time, "\n";

$dbh->disconnect();
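A general technique for bulk loads (separate from the transaction point raised in the replies below) is to load the rows first and build the index afterwards, so the index is not maintained on every insert. A minimal sketch, assuming a fresh table1 that does not yet have index1:

# Sketch only: load the rows with no index in place, then create the
# index once at the end instead of updating it row by row.
my $bulk = $dbh->prepare('insert into table1(col1, col2) values(?, ?)');
for my $i (1..1000) {
    $bulk->execute($i, $i * 2);
}
$dbh->do('create index index1 on table1(col1)');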


Re^3: Efficient search through a huge dataset
by Caron (Friar) on Oct 20, 2004 at 08:12 UTC
    it took SQLite 310 seconds to insert

    The SQLite docs say what you should do to improve speed: group statements into transactions and, if necessary, adjust the "synchronous" pragma.

    If you add $dbh->do('begin'); before your loop and $dbh->do('commit'); after it, the insertion takes less than one second, as it should:

    use Time::HiRes qw(gettimeofday tv_interval);

    my $start_time = [gettimeofday];

    $dbh->do('begin');
    for my $i (1..1000) {
        $st1->execute($i, $i * 2);
        if ($i % 2) {
            $st2->execute($i, $i * 3);
        }
    }
    $dbh->do('commit');

    my $elapsed = tv_interval( $start_time, [gettimeofday] );
    print "insertion time: $elapsed\n";

    __END__
    insertion time: 0.263462
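    The pragma part of that advice is a one-liner; a minimal sketch, assuming the application can tolerate weaker durability:

    # Sketch only: relax durability so each commit skips the fsync.
    # Trade-off: an OS crash or power loss can lose recent writes,
    # and with OFF the database file can even end up corrupted.
    $dbh->do('PRAGMA synchronous = OFF');    # or NORMAL as a middle ground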

      This comparison is not that fair. You would be right if those inserts could indeed be wrapped in one transaction.

      Think about it: what if those rows are transaction records for different online consumers? Then you have to commit after each insert, and in that situation SQLite will be quite slow. Well, I think it all depends on the application.

      Update:

      I can accept merlyn's tradeoff argument below.

        You don't have to group EVERYTHING. Grouping inserts into chunks of a reasonable size makes them faster, even if you have to insert one million records (see the chunked-commit sketch below).

        Please refer to the SQLite docs.
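        A minimal sketch of chunked commits, reusing the $st1 handle from the code above and assuming a hypothetical @rows list of [col1, col2] pairs, with a commit every 100 rows being acceptable for the application:

        # Sketch only: commit every 100 inserts instead of per row
        # (too slow) or all at once (too coarse for some applications).
        # @rows is a hypothetical array of [col1, col2] array refs.
        $dbh->begin_work;
        my $count = 0;
        for my $row (@rows) {
            $st1->execute(@$row);
            if (++$count % 100 == 0) {
                $dbh->commit;
                $dbh->begin_work;
            }
        }
        $dbh->commit;    # flush whatever is left in the final chunk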

        True, it's not fair. But then, if you want to be fair, you also have to include the "overhead" of the rest of the ODBC database... namely, that you have to use Windows. {grin}

        SQLite is small, simple to install, and does the job quickly. Yes, it sucks at transaction start/end, because it's very simple... it's just locking the file as its communication system. But in nearly every other way, it fits this problem space quite nicely.

        Life is a series of tradeoffs.

        -- Randal L. Schwartz, Perl hacker
        Be sure to read my standard disclaimer if this is a reply.
