PerlMonks
Loading bulk data into SQLite

by hangon (Deacon)
on Nov 29, 2008 at 16:50 UTC ( #726809=perlmeditation )

My first attempt at loading a log file into SQLite took 20 minutes of disk thrashing. A Super Search mostly turned up the fact that databases typically include a bulk-loading utility, but SQLite doesn't appear to have one. RTFM didn't help much, and I was about to post the speed question in SoPW when I found the answer on SQLite's benchmark page. The trick is to do all the inserts in a single transaction. The speed improvement was over 200x for loading 50k rows in my test case. Here it is in case someone else needs to know:

    # first attempt
    ...
    my $start = time;
    my $dbh = DBI->connect("DBI:SQLite:$dbfile") or die;
    my $sth = $dbh->prepare( qq(
        INSERT INTO 'logentries' ('col1', 'col2', 'col3', 'col4')
        values (?,?,?,?)
    ));
    $sth->execute($_->[0], $_->[1], $_->[2], $_->[3]) for @rows;
    print "et: ", time - $start, " sec\n";   # et: 1082 seconds
    # single transaction
    ...
    my $start = time;
    my $dbh = DBI->connect("DBI:SQLite:$dbfile") or die;
    $dbh->do('BEGIN');
    my $sth = $dbh->prepare( qq(
        INSERT INTO 'logentries' ('col1', 'col2', 'col3', 'col4')
        values (?,?,?,?)
    ));
    $sth->execute($_->[0], $_->[1], $_->[2], $_->[3]) for @rows;
    $dbh->do('COMMIT');
    print "et: ", time - $start, " sec\n";   # et: 5 seconds

Update:
Setting AutoCommit => 0 as suggested by Cristoforo gives the same speed improvement. As I suspected, there are limits on transaction size, as noted by Jenda.

Re: Loading bulk data into SQLite
by Cristoforo (Deacon) on Nov 29, 2008 at 17:28 UTC
    I think you could see an improvement (I wasn't aware of the method you provided above) by setting AutoCommit to 0. For example,

    my $dbh = DBI->connect("dbi:SQLite:dbname=pedro.lite","","", {AutoCommit => 0}) or die "Can't connect";
    Then at the end (or after 'x' amount of inserts), commit your inserts.

    $dbh->commit() or die $dbh->errstr;

    Chris

Re: Loading bulk data into SQLite
by Jenda (Abbot) on Nov 29, 2008 at 18:23 UTC

    The trick is to do things like this in fewer, bigger transactions. One huge transaction may be too much, though, especially if the logfile you want to import gets bigger. I'm surprised you did not find any references to this; batching inserts is old hat. How big to make the transactions is something you usually have to find out by testing.
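    A minimal sketch of the batched-commit idea (using the column names from the original post; the batch size of 10_000 is an arbitrary starting point to tune by testing, and the sample rows are made up):

    ```perl
    use strict;
    use warnings;
    use DBI;   # needs DBD::SQLite installed

    my $BATCH = 10_000;   # commit every $BATCH inserts; tune by testing

    my $dbh = DBI->connect("DBI:SQLite:dbname=:memory:", "", "",
        { RaiseError => 1, AutoCommit => 0 });
    $dbh->do(q{CREATE TABLE logentries (col1, col2, col3, col4)});

    # stand-in for rows parsed out of a real log file
    my @rows = map { [ $_, "a$_", "b$_", "c$_" ] } 1 .. 50_000;

    my $sth = $dbh->prepare(
        q{INSERT INTO logentries (col1, col2, col3, col4) VALUES (?,?,?,?)});

    my $n = 0;
    for my $row (@rows) {
        $sth->execute(@$row);
        $dbh->commit if ++$n % $BATCH == 0;   # end one batch, start the next
    }
    $dbh->commit;   # flush the final partial batch

    my ($count) = $dbh->selectrow_array(q{SELECT COUNT(*) FROM logentries});
    print "loaded $count rows\n";
    $dbh->disconnect;
    ```

    Each commit bounds how much journal the database has to hold at once, so memory stays flat no matter how big the log file grows.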

Re: Loading bulk data into SQLite
by Limbic~Region (Chancellor) on Nov 29, 2008 at 23:57 UTC
    hangon,
    If you are doing bulk loading from a text file, it is a bit complicated. SQLite 2.x used to have the COPY command - you can see where I used it in Fastest Rising Monks - Revisited (old ugly code - but look at Build_DB()). It disappeared in 3.x but there are a number of ways to accomplish the same thing. See this for instance (search for .import). You will need to do this through the SQLite shell. If memory serves, I once did this using DBI::Shell.

    Cheers - L~R

      Thanks L~R. For some reason I was fixated on "loading" and missed the ".import" command. I installed a standalone version of SQLite to try out its command shell. Still had to process the file through Perl to get it into a suitable format, but afterwards loading it using .import was extremely fast.

        hangon,
        Still had to process the file through Perl to get it into a suitable format

        I forgot to mention that the .mode command can switch to various different formats such as tabs, a la

        .mode tabs
        .import <file> <table>

        (Dot-commands in the SQLite shell go one per line and are not terminated with semicolons.)
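        Since the file has to be massaged through Perl anyway, one way to wire it all together is to write the rows as tab-separated text and pipe the dot-commands into the shell. A sketch (assumes the sqlite3 binary is on your PATH and a matching logentries table already exists in logfile.db; the filenames and sample rows are made up):

        ```perl
        use strict;
        use warnings;

        # stand-in for rows produced by your log-parsing code
        my @rows = ( [ 1, 'GET', '/index.html', 200 ],
                     [ 2, 'GET', '/style.css',  304 ] );

        # dump the rows as tab-separated text, the format .mode tabs expects
        open my $fh, '>', 'logentries.tsv' or die "open: $!";
        print {$fh} join("\t", @$_), "\n" for @rows;
        close $fh;

        # feed the dot-commands to the sqlite3 shell on stdin, which works
        # even with older shell versions that only take a single argument
        open my $sqlite, '|-', 'sqlite3', 'logfile.db' or die "sqlite3: $!";
        print {$sqlite} ".mode tabs\n.import logentries.tsv logentries\n";
        close $sqlite or die "sqlite3 exited with status $?";
        ```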

        Cheers - L~R

Re: Loading bulk data into SQLite
by Anonymous Monk on Nov 30, 2008 at 02:21 UTC
    RTFM didn't help much, and I was about to post the speed question in SOPW when I found the answer on SQLite's benchmark page.
    It's the same way that DBD::mysql's docs are no substitute for http://mysql.com. The web is part of the manual.
Re: Loading bulk data into SQLite
by andreas1234567 (Vicar) on Dec 01, 2008 at 12:35 UTC
    You should also consider adding an index and seeing how that affects your performance. This simple benchmark indicates that
    • Turning off autocommit roughly doubles performance.
    • Turning off autocommit and adding an index roughly triples performance.
    Many factors work together here, YMMV.
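    A common pattern worth benchmarking alongside the above (a sketch, not andreas1234567's actual benchmark) is to build the index only after the bulk load, so the inserts don't pay per-row index maintenance while lookups afterwards still benefit:

    ```perl
    use strict;
    use warnings;
    use DBI;   # needs DBD::SQLite installed

    my $dbh = DBI->connect("DBI:SQLite:dbname=:memory:", "", "",
        { RaiseError => 1, AutoCommit => 0 });
    $dbh->do(q{CREATE TABLE logentries (col1, col2, col3, col4)});

    my $sth = $dbh->prepare(q{INSERT INTO logentries VALUES (?,?,?,?)});
    $sth->execute($_, "a$_", "b$_", "c$_") for 1 .. 20_000;
    $dbh->commit;   # one transaction for the whole load

    # index built after the load: the inserts above skipped per-row
    # index maintenance, and queries below can still use the index
    $dbh->do(q{CREATE INDEX idx_col1 ON logentries (col1)});
    $dbh->commit;

    my ($val) = $dbh->selectrow_array(
        q{SELECT col2 FROM logentries WHERE col1 = ?}, undef, 12345);
    print "col2 for 12345 is $val\n";
    $dbh->disconnect;
    ```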
    --
    No matter how great and destructive your problems may seem now, remember, you've probably only seen the tip of them. [1]
Re: Loading bulk data into SQLite
by lachoy (Parson) on May 20, 2010 at 15:37 UTC
