
DBI: speed up insertions in generating database

by punch_card_don (Curate)
on Dec 04, 2004 at 13:49 UTC

punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:

Malkovitchian Monks, I am populating a MySQL database programmatically, analyzing a giant text file and then inserting into one of the many tables.
pseudo code:

    open FILE
    while ($line = <FILE>) {
        analyze $line to extract ($table, $value) pairs
        foreach ($table, $value) pair {
            $sql = "INSERT INTO ".$table." VALUES(".$value.")"
            prepare
            execute()
        }
    }
This works, but is very slow. I can't get around the repeated prepares because the table changes with each iteration and there is way too much data (~1Gb) to hold it all in table-keyed hashes ready for a table-wise insertion at the end.

One option I'm considering is doing it in blocks, storing for a while, then purging:

    open FILE
    $index = 0;
    while ($line = <FILE>) {
        analyze $line to extract ($table, $value) pairs
        push each found $value onto $values_hash{$table}, so that
            $values_hash{$table} = a list of $values
        $index++;
        when $index reaches, say, 10,000 or 100,000 {
            foreach $key (keys %values_hash) {
                $sql = "INSERT INTO ".$key." VALUES(?)"
                prepare
                foreach $value in list $values_hash{$key} {
                    execute($value)
                }
            }
            clear %values_hash and reset $index, ready to start storing again
        }
    }
I figure this should help, but I'm not sure how much I will actually gain by it.

Are there any other slick speed tricks? I vaguely remember something to do with pre-generating CSV files and then importing....

Replies are listed 'Best First'.
Re: DBI: speed up insertions in generating database
by gmax (Abbot) on Dec 04, 2004 at 14:34 UTC

    You may use the INSERT extended syntax for inserting several rows at once. In addition, if your table has several indexes (besides the primary key), you may use the (MySQL-specific) command "DISABLE/ENABLE KEYS".

    Here is a framework for your insertion.

    $dbh->do(qq{ALTER TABLE $table DISABLE KEYS});

    my $base_query = qq{INSERT INTO $table VALUES };
    my $query      = $base_query;
    my $count      = 0;
    my $max_rows   = 10000;   # adjust it to your needs

    while ($line = <FILE>) {
        my @values = something_from($line);
        $query .= "," if $count++;
        $query .= "(" . join(",", map { $dbh->quote($_) } @values) . ")";
        if ($count > $max_rows) {
            $dbh->do($query) or die "something wrong ($DBI::errstr)";
            $query = $base_query;
            $count = 0;
        }
    }
    $dbh->do($query) if $count;
    $dbh->do(qq{ALTER TABLE $table ENABLE KEYS});

    See the manual for INSERT and for ALTER TABLE, and Speeding up the DBI for some more on related subjects.

     _  _ _  _  
    (_|| | |(_|><
     _|   
    
Re: DBI: speed up insertions in generating database
by VSarkiss (Monsignor) on Dec 04, 2004 at 15:04 UTC

    One thing that may help is to prepare all your insert statements beforehand. Something like this:

    my %prepares = (
        table1 => $dbh->prepare("insert table1 (...) values (?, ...)"),
        table2 => $dbh->prepare("insert table2 (...) values (?, ...)"),
        # and so on
    );

    # Now process the file
    open FILE, '<', $wherever;
    while (<FILE>) {
        # set $table and @values from $_
        $prepares{$table}->execute(@values);
    }
    This may or may not help, depending on lots and lots of factors.

    As for using CSV files, you're probably thinking of using mysql loader. That may be a win, again depending on lots of factors. Essentially, instead of executing the insert statement in your program, you'd write the data to one of several files (one for each table), then use loader to populate one table at a time.
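    Not the poster's code, but a rough sketch of what that could look like (untested; the /tmp file locations, the per-table filehandle bookkeeping, and $db_name are assumptions made for illustration):

    my %fh;   # one output filehandle per table
    open FILE, '<', $wherever;
    while (<FILE>) {
        # set $table and @values from $_
        unless ($fh{$table}) {
            # mysqlimport derives the table name from the file name
            open $fh{$table}, '>', "/tmp/$table.txt" or die $!;
        }
        print { $fh{$table} } join("\t", @values), "\n";
    }
    close $_ for values %fh;

    # then load each file; --local reads the files on the client side
    system('mysqlimport', '--local', $db_name, glob('/tmp/*.txt'));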

    It's very hard to say whether any of these will be a win. The factors to consider include the amount of data, the number of tables, how they're indexed, whether you're doing this just once or on a regular basis, and so on.

Re: DBI: speed up insertions in generating database
by eric256 (Parson) on Dec 04, 2004 at 15:59 UTC

    In a similar situation I have done INSERT DELAYED + running things in batches of 100. Just remember to catch the last batch if your data isn't divisible by 100. The DELAYED part of the insert tells MySQL to do that insert when it has a moment. The upside is that your program gets control back immediately; the downside is that you lose error reporting on your inserts.
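    A rough sketch of that idea (untested; the single-column table layout, $table, and extract_value() are assumptions, not something from the original post):

    my @batch;
    my $sql = "INSERT DELAYED INTO $table VALUES " . join(',', ('(?)') x 100);
    my $sth = $dbh->prepare($sql);
    while (my $line = <FILE>) {
        push @batch, extract_value($line);   # extract_value() is hypothetical
        if (@batch == 100) {
            $sth->execute(@batch);
            @batch = ();
        }
    }
    # don't forget the last, partial batch
    if (@batch) {
        my $rest = "INSERT DELAYED INTO $table VALUES " . join(',', ('(?)') x @batch);
        $dbh->do($rest, undef, @batch);
    }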

    Another option might be to create an intermediate table that records which table each piece of data should go in, so you can import into this table in large batches. Then go through that table, moving the rows to their correct locations. I'm not sure how much that will gain you, since you then have to do two inserts, but it would allow you to do both operations in large batches.
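    One possible shape for that staging-table idea (untested; the staging table, its columns, and the @tables fan-out loop are all invented for illustration):

    # staging table, e.g.: CREATE TABLE staging (target_table VARCHAR(64), val TEXT);
    my $ins = $dbh->prepare("INSERT INTO staging (target_table, val) VALUES (?, ?)");
    $ins->execute($table, $value);   # called from the parsing loop, in big batches

    # later, once per target table, move the rows where they belong
    for my $target (@tables) {
        $dbh->do("INSERT INTO $target SELECT val FROM staging WHERE target_table = ?",
                 undef, $target);
        $dbh->do("DELETE FROM staging WHERE target_table = ?", undef, $target);
    }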

    A final possibility, which can be added to either of the above, is to prepare a single SQL statement for each table at the beginning, then use those prepares to insert the data:

    $prepares->{$table}->execute(@data);

    ___________
    Eric Hodges
Re: DBI: speed up insertions in generating database
by zakzebrowski (Curate) on Dec 04, 2004 at 19:33 UTC
    See the High Performance MySQL book. It lists many ways to improve performance. However, one trick I recently learned was to drop all indexes before you do the insert. (Though you then need to re-generate the indexes after inserting, so depending on the amount of new inserts you do, that may be counterproductive.) Also, do a Google search for MySQL performance tricks. Alternatively, consider using SQLite, if it meets your needs. I was able to use that for a major project with no serious issues. (I did not need full-text search / the ability to connect to the database over the network...)
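    Something along those lines, perhaps (untested; the index and column names are made up, and the DISABLE/ENABLE KEYS shown above is the lighter-weight MySQL-specific alternative):

    $dbh->do("ALTER TABLE $table DROP INDEX idx_value");
    # ... do all of this table's inserts ...
    $dbh->do("ALTER TABLE $table ADD INDEX idx_value (value_col)");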
    Cheers.


    ----
    Zak - the office
csv way (as requested ;-)
by Luca Benini (Scribe) on Dec 04, 2004 at 17:04 UTC
    If you use Perl to parse the data (dividing it up first if you prefer/need to) until you obtain a single CSV file, then afterwards (assuming the file is mydata.csv):
    LOAD DATA LOCAL INFILE 'mydata.csv'
    INTO TABLE mytable
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    (list, of, your, table, field);
Re: DBI: speed up insertions in generating database
by punch_card_don (Curate) on Dec 04, 2004 at 16:11 UTC
    Thanks all for the suggestions - common thread = do all the inserts for one table at a time.

    So, I've re-written the parsing code to store each table's values in a separate text file temporarily. Then I'll run a separate import routine that imports table by table.

    The new text-file-storage parser is running right now at about 2,400 origin-file lines per hour (there are hundreds of values to extract from each line after comparison with criteria in another file - very convoluted), whereas the direct db-storage program was running at ~300 origin-file lines per hour.

      Yep, definitely write out a CSV file and then use mysqlimport or a 'LOAD DATA INFILE' statement for each table. If you do stick with DBI inserts, maybe try inserts with multiple VALUES lists:

      INSERT INTO $table (field1,field2) VALUES (value1,value2),(value3,value4);
      From the MySQL docs:

      When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using a lot of INSERT statements. See section 13.1.5 LOAD DATA INFILE Syntax.
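      For instance, driven from DBI, something like this per table might do (untested; the per-table file names and @tables are assumed):

      foreach my $table (@tables) {
          $dbh->do(qq{LOAD DATA LOCAL INFILE '/tmp/$table.csv'
                      INTO TABLE $table
                      FIELDS TERMINATED BY ','});
      }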

      mysqlmonks.org
Re: DBI: speed up insertions in generating database
by erix (Prior) on Dec 04, 2004 at 22:25 UTC

    Make sure to set autocommit off (MySQL's default is on, I believe), and commit not too often (every few hundred rows or so?).
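    A minimal DBI sketch of that advice (untested; the connection details and the batch size of 500 are placeholders):

    use DBI;
    my $dbh = DBI->connect("dbi:mysql:$database", $user, $pass,
                           { RaiseError => 1, AutoCommit => 0 });
    my $rows = 0;
    while (my $line = <FILE>) {
        # ... prepare/execute the insert(s) for this line ...
        $dbh->commit unless ++$rows % 500;   # commit every few hundred rows
    }
    $dbh->commit;   # don't forget whatever is left over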

      begin-commit do the magic

          begin;
          <some sql code>
          commit;
Re: DBI: speed up insertions in generating database
by mkirank (Chaplain) on Dec 06, 2004 at 06:01 UTC
    I had a similar situation (Link). I also had to insert into multiple databases. Like some people have suggested:

    1. Bulk insert is fastest: write to a CSV file and then do a LOAD DATA INFILE to import the data.
    2. Use SQLite; it is faster compared to MySQL.
    3. I am not sure, but you can try this: insert into SQLite, then export to CSV and import into MySQL (like I said, I'm not sure about this, it's just a suggestion).
    4. If it is speed you require, write a C program.
Re: DBI: speed up insertions in generating database
by Thilosophy (Curate) on Dec 06, 2004 at 03:17 UTC
    Do not use DBI and Perl to do this. Use MySQL's import utility to load the file. That will be A LOT faster than anything you can do with normal SQL.

    If the file structure is too complex for the import tool to handle, use Perl first to create one or more CSV files that are simple enough.

Re: DBI: speed up insertions in generating database
by Jeppe (Monk) on Dec 05, 2004 at 18:26 UTC
    Hah. This is simple - use a hash for storing prepared statements. Untested pseudocode:
    my $sth_hashref;   # cache of prepared statements, keyed by SQL

    while (my $line = <FILE>) {
        analyze $line to extract ($table, $value) pairs
        foreach ($table, $value) pair {
            my $sql_statement = "INSERT INTO $table VALUES(?)";
            my $sth = store_or_retrieve($sql_statement);
            $sth->execute($value);
        }
    }

    sub store_or_retrieve {
        my $sql_statement = shift;
        if (exists $sth_hashref->{$sql_statement}) {
            return $sth_hashref->{$sql_statement};
        }
        else {
            my $sth = $dbh->prepare($sql_statement);
            $sth_hashref->{$sql_statement} = $sth;
            return $sth;
        }
    }
    However, I would suggest you probably have more work to do regarding your database structure - do you really only have one row in each table?
