
dynamic bulk insert in sqlite

by sroux (Acolyte)
on Sep 22, 2012 at 08:02 UTC ( #995075=perlquestion )
sroux has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, this is a question about sqlite and/or perl.

I'd like to bulk insert a delimited text file into a table, but I'd like to do it dynamically, specifying only a table name. I want the columns to be created dynamically according to the number of fields in the text file.

I checked the sqlite3 .import command, but it requires a table that already has the right number of columns. I also checked the DBD and DBI modules but could not find anything related. Is my question related to what are called virtual tables?

Many thanks for your guidance. sebroux

Replies are listed 'Best First'.
Re: dynamic bulk insert in sqlite
by erix (Parson) on Sep 22, 2012 at 08:29 UTC

    Assuming there is a header line: just chop off the first line (the header line) and split that into an array. With that array, build a create table statement (it's often necessary to sanitize the column names), and send it to SQLite. After that, just send the whole file *except* the first line.

    If there is no header line, generating the column names + CREATE TABLE statement is obviously even easier.

    (The whole procedure is not SQLite specific either, the bulk loader used will differ per DBMS)
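    The recipe above can be sketched in a few lines of Perl. This is a minimal, assumption-laden version: a tab-delimited file with a header line, every column created as TEXT, and the names `sanitize`, `load_delimited`, and `people` are illustrative, not from the thread.

```perl
use strict;
use warnings;
use DBI;

# Lower-case the header fields and replace anything non-word with "_"
sub sanitize {
    my $name = lc shift;
    $name =~ s/\W+/_/g;
    $name =~ s/^_+|_+$//g;
    return $name;
}

# Read the header, build CREATE TABLE from it, then insert the rest
sub load_delimited {
    my ($dbh, $table, $fh, $sep) = @_;
    $sep //= "\t";
    chomp(my $header = <$fh>);
    my @cols = map { sanitize($_) } split /\Q$sep\E/, $header;

    # Everything TEXT for simplicity; adjust types as needed
    $dbh->do("CREATE TABLE $table ("
             . join(', ', map { "$_ TEXT" } @cols) . ')');

    # One placeholder per column; a row with a different field
    # count will fail the execute (which is what you want to notice)
    my $ins = $dbh->prepare(
        "INSERT INTO $table VALUES (" . join(',', ('?') x @cols) . ')');
    while (my $line = <$fh>) {
        chomp $line;
        $ins->execute(split /\Q$sep\E/, $line, -1);
    }
}

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
# Read from an in-memory string here just to keep the demo self-contained
open my $fh, '<', \"Name\tAge\nalice\t30\nbob\t25\n" or die $!;
load_delimited($dbh, 'people', $fh);
```

    For a real file you would `open my $fh, '<', $path` instead, and wrap the `load_delimited` call in a BEGIN/COMMIT pair as discussed further down the thread.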

Re: dynamic bulk insert in sqlite
by Anonymous Monk on Sep 22, 2012 at 08:30 UTC
Re: dynamic bulk insert in sqlite
by Marshall (Abbot) on Sep 23, 2012 at 00:46 UTC
    I actually haven't used the bulk import to SQLite - I don't even have the command-line tool installed on my machine! I don't need it! I just installed the Perl DBD::SQLite module and that was it.

    There is a Firefox add-on, SQLite Manager, and it is easier to use than the command line. I highly recommend this gizmo! I am not a super experienced SQL guy, but I do have a MySQL server and its command-line utils on my machine. SQLite is way easier to use (no admin hassle), and I've found this graphical Firefox add-on to be very helpful for viewing an existing database.

    I "hand code" the table creation and inserts because it is easy (a page or so of code), runs very fast (about the same as the fastest bulk-insert utility), and I'm often fiddling with the column names and making decisions about which columns to index - that's important. Some of my DBs have literally millions of rows.

    I've only made about a dozen DBs, so I am inexperienced, but I have learned a few things...

    First tip: when I create the table, I use lower case and "_" instead of spaces in the column names. DBI has a fetch variant that lowercases the column names it returns, and if everything is lower case to begin with, this makes the data easier to mess with later (gurus' corrections to this observation are welcome, but that appears to be the case from what I've seen so far). A space is allowed in a hash key, but eliminating spaces also simplifies things later in actual use.
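    A small demo of that lowercase fetch behavior in DBI, for the curious. The table and column names here are made up; the relevant pieces are the `'NAME_lc'` argument to `fetchrow_hashref` and the handle-wide `FetchHashKeyName` attribute.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('CREATE TABLE t ("Mixed Case Col" TEXT)');
$dbh->do(q{INSERT INTO t VALUES ('x')});

my $sth = $dbh->prepare('SELECT * FROM t');
$sth->execute;
# Ask DBI to force the hash keys to lower case for this fetch
my $row = $sth->fetchrow_hashref('NAME_lc');
# $row now has the key 'mixed case col', not 'Mixed Case Col'

# Or set it once so every fetchrow_hashref on this handle lowercases:
$dbh->{FetchHashKeyName} = 'NAME_lc';
```

    If the columns are lower-cased (and space-free) at CREATE TABLE time, none of this is needed, which is Marshall's point.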

    I guess you will start with perhaps some CSV file? The first line is typically the column names; I adjust them as explained above. In the CREATE TABLE statement you have choices about type (VARCHAR, INTEGER, etc.) as well as DEFAULT and whether NULL is allowed. This is application specific and you will have to decide what you want to do. These choices affect future table inserts.

    There are two main things that affect the speed of the inserts. The first of these is:

    $dbh->do('PRAGMA synchronous = 0'); # Non transaction safe!!!
    If this is a new DB being created from scratch, this is perfectly fine: each write proceeds without waiting for confirmation that the previous one reached disk. A typical speed increase might be from 45 seconds down to 30 seconds. If you are adding to an existing DB, this is not a good idea, because it is not transaction safe.

    By far and away the LARGEST speed increase is achieved by making the entire import a single transaction!

    $dbh->do("BEGIN"); import_data($dbh, $data_dir); $dbh->do("COMMIT");
    I have some actual benchmarks on Monks but wasn't able to find them easily with my first search; basically this can be the difference between a couple of minutes and a couple of days! This is the single most important thing to do!

    SQLite's cache size can be varied dynamically as the DB runs. When doing inserts, this won't make much difference; it will make a large difference when indexing the tables.

    $dbh->do('PRAGMA cache_size = 200000'); # 200 MB dynamic increase
    I think the default is something like 20 MB; increasing this limit will dramatically affect index creation on large tables! In SQLite you can actually move this parameter up and down dynamically, but for a bulk import program, just set it to 200 MB or 400 MB and don't worry about it. A time difference might be 15 seconds instead of 3 minutes. Worth doing.

    It is actually possible to "over index" the tables. Typically a lot of the columns will not need an index. The query optimizer can become "confused" if you give it too many options and this slows down queries. So "more" is not necessarily "better". What to index or not is application specific and I can't guide you further on that without more information.

    Oh, and if this is a new table: create the indices after you have loaded the data! This also has a huge impact on insert performance!
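    Putting Marshall's tips together in one runnable sketch - pragmas first, one big transaction around the whole load, index creation last. The table, column, and index names (and the 10,000-row loop standing in for a real import) are illustrative.

```perl
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1 });
$dbh->do('PRAGMA synchronous = 0');      # fine for a brand-new DB; NOT transaction safe
$dbh->do('PRAGMA cache_size = 200000');  # big cache mainly helps the indexing step
$dbh->do('CREATE TABLE people (name TEXT, age INTEGER)');

# One transaction around the entire import - the single biggest win
$dbh->do('BEGIN');
my $ins = $dbh->prepare('INSERT INTO people VALUES (?, ?)');
$ins->execute("name$_", $_) for 1 .. 10_000;
$dbh->do('COMMIT');

# Index only after the data is in; inserting into an
# already-indexed table is much slower
$dbh->do('CREATE INDEX idx_people_name ON people (name)');
```

    The same ordering applies with a real data file: swap the loop for whatever reads and inserts your rows, and index only the columns your queries actually use.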

    Update: I found my post with some benchmarks: Re^3: Efficient way to handle huge number of records?. Look at the links that this refers to for some example code. This DB has more than a million rows.

Node Type: perlquestion [id://995075]
Approved by Corion