
Re^3: Efficient search through a huge dataset

by Caron (Friar)
on Oct 20, 2004 at 08:12 UTC ( #400771 )

in reply to Re^2: Efficient search through a huge dataset
in thread Efficient search through a huge dataset

it took SQLite 310 seconds to insert

The SQLite docs say what you should do to improve insert speed: group statements into transactions and, if necessary, adjust the "synchronous" pragma.

If you add $dbh->do('begin'); before your loop and $dbh->do('commit'); after it, the insertion takes less than one second, as it should.

    use Time::HiRes qw(gettimeofday tv_interval);

    # $dbh, $st1 and $st2 are assumed to have been prepared earlier
    my $start_time = [gettimeofday];

    $dbh->do('begin');
    for my $i (1 .. 1000) {
        $st1->execute($i, $i * 2);
        if ($i % 2) {
            $st2->execute($i, $i * 3);
        }
    }
    $dbh->do('commit');

    my $elapsed = tv_interval($start_time, [gettimeofday]);
    print "insertion time: $elapsed\n";

    __END__
    insertion time: 0.263462
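For the other knob the docs mention, here is a self-contained sketch of relaxing the "synchronous" pragma for a bulk load (the table and column names are made up for illustration; the in-memory database is just to keep the example runnable):

```perl
use strict;
use warnings;
use DBI;

# With synchronous = OFF, SQLite skips its fsync() calls, so a power
# failure or OS crash during the load can corrupt the database file.
# Use it only for data you can rebuild, and restore the default after.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('PRAGMA synchronous = OFF');
$dbh->do('CREATE TABLE load_test (n INTEGER)');

$dbh->do('begin');
my $st = $dbh->prepare('INSERT INTO load_test (n) VALUES (?)');
$st->execute($_) for 1 .. 1000;
$dbh->do('commit');

$dbh->do('PRAGMA synchronous = FULL');   # back to the safe default

my ($rows) = $dbh->selectrow_array('SELECT COUNT(*) FROM load_test');
print "loaded $rows rows\n";
```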

Replies are listed 'Best First'.
Re^4: Efficient search through a huge dataset
by pg (Canon) on Oct 20, 2004 at 08:22 UTC

    This comparison is not entirely fair. You would be right only if those inserts can indeed be wrapped in one transaction.

    Think about it: what if those are transaction records for different online consumers? Then you have to commit after each insert, and in that situation SQLite will be quite slow. In the end it all depends on the application.


    I can accept merlyn's tradeoff argument below.

      True, it's not fair. But then, if you want to be fair, you also have to include the "overhead" of the rest of the ODBC database... namely that you have to use Windows. {grin}

      SQLite is small, simple to install, and does the job quickly. Yes, it sucks at transaction start/end, because it's very simple... it's just locking the file as its communication system. But in nearly every other way, it fits this problem space quite nicely.

      Life is a series of tradeoffs.

      -- Randal L. Schwartz, Perl hacker
      Be sure to read my standard disclaimer if this is a reply.

      You don't have to group EVERYTHING. Grouping inserts into chunks of reasonable size makes them faster, even if you have to insert one million records.

      Please refer to the SQLite docs.
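      The chunking idea above can be sketched as follows: commit every $chunk_size rows instead of once per row or once for the whole load (the table, the row count and the chunk size are illustrative, not from the original post):

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE t (k INTEGER, v INTEGER)');
my $st = $dbh->prepare('INSERT INTO t (k, v) VALUES (?, ?)');

my $chunk_size = 1000;
$dbh->do('begin');
for my $i (1 .. 10_000) {
    $st->execute($i, $i * 2);
    unless ($i % $chunk_size) {    # end one transaction, start the next
        $dbh->do('commit');
        $dbh->do('begin');
    }
}
$dbh->do('commit');               # flush the (possibly empty) last chunk

my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM t');
print "$count rows inserted\n";
```

      A middle ground like this keeps the speed benefit of transactions while bounding how much work is lost if the load is interrupted partway.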
