punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:
Malkovitchian Monks,
I am populating a MySQL database programmatically: analyzing a giant text file and then inserting each extracted value into one of the many tables.
pseudo code:
open FILE, '<', $file;
while (my $line = <FILE>) {
    analyze $line to extract ($table, $value) pairs
    foreach ($table, $value) pair {
        $sql = "INSERT INTO $table VALUES ($value)";
        $sth = $dbh->prepare($sql);
        $sth->execute();
    }
}
This works, but is very slow. I can't get around the repeated prepares because the table changes with each iteration, and there is far too much data (~1 GB) to hold it all in table-keyed hashes for a table-wise insertion at the end.
One option I'm considering is doing it in blocks, storing for a while, then purging:
open FILE, '<', $file;
$index = 0;
while (my $line = <FILE>) {
    analyze $line to extract ($table, $value) pairs
    push each found $value onto @{ $values_hash{$table} },
        so $values_hash{$table} is a list of values
    $index++;
    when $index reaches, say, 10,000 or 100,000 {
        foreach $table (keys %values_hash) {
            $sql = "INSERT INTO $table VALUES (?)";
            $sth = $dbh->prepare($sql);
            foreach $value (@{ $values_hash{$table} }) {
                $sth->execute($value);
            }
        }
        clear %values_hash and reset $index, ready to start storing again
    }
}
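For what it's worth, the block-and-purge loop above can be sketched as runnable Perl. Here flush_batch() only records what would be inserted; in the real thing it would hold the per-table prepare/execute loop against DBI (the name flush_batch, the sample rows, and the tiny batch size are made up for illustration):

```perl
use strict;
use warnings;

my %values_for_table;   # table name => array ref of pending values
my $pending  = 0;
my $max_rows = 3;       # tiny for illustration; use 10_000+ in practice
my @flushes;            # records what each flush would have inserted

# Stand-in for the real per-table insertion: prepare
# "INSERT INTO $table VALUES (?)" once per table, then
# execute() for each stored value.
sub flush_batch {
    for my $table (sort keys %values_for_table) {
        push @flushes, [ $table, @{ $values_for_table{$table} } ];
    }
    %values_for_table = ();
    $pending = 0;
}

# Stand-in for the ($table, $value) pairs parsed from the big file
my @parsed = ( [t1 => 'a'], [t2 => 'b'], [t1 => 'c'],
               [t1 => 'd'], [t2 => 'e'] );

for my $pair (@parsed) {
    my ($table, $value) = @$pair;
    push @{ $values_for_table{$table} }, $value;
    flush_batch() if ++$pending >= $max_rows;
}
flush_batch() if $pending;   # don't forget the final partial batch
```

Note the final flush outside the loop; without it, the last partial batch never reaches the database.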
I figure this should help, but I'm not sure how much I will actually gain by that.
Are there any other slick speed tricks? I vaguely remember something to do with pre-generating CSV files and then importing....
Re: DBI: speed up insertions in generating database
by gmax (Abbot) on Dec 04, 2004 at 14:34 UTC
You may use the INSERT extended syntax for inserting several rows at once. In addition, if your table has several indexes (besides the primary key) you may use the (MySQL specific) command "DISABLE/ENABLE KEYS".
Here is a framework for your insertion.
$dbh->do(qq{ALTER TABLE $table DISABLE KEYS});

my $base_query = qq{INSERT INTO $table VALUES };
my $query      = $base_query;
my $count      = 0;
my $max_rows   = 10000;   # adjust it to your needs

while (my $line = <FILE>)
{
    my @values = something_from($line);
    $query .= "," if $count++;
    $query .= "("
            . join(",", map { $dbh->quote($_) } @values)
            . ")";
    if ($count > $max_rows)
    {
        $dbh->do($query)
            or die "something wrong ($DBI::errstr)";
        $query = $base_query;
        $count = 0;
    }
}
$dbh->do($query) if $count;   # insert the last partial batch
$dbh->do(qq{ALTER TABLE $table ENABLE KEYS});
See the manual for INSERT and for ALTER TABLE, and Speeding up the DBI for some more on related subjects.
Re: DBI: speed up insertions in generating database
by VSarkiss (Monsignor) on Dec 04, 2004 at 15:04 UTC
my %prepares = (
    table1 => $dbh->prepare("INSERT INTO table1 (...) VALUES (?, ...)"),
    table2 => $dbh->prepare("INSERT INTO table2 (...) VALUES (?, ...)"),
    # and so on
);

# Now process the file
open FILE, '<', $wherever;
while (<FILE>)
{
    # set $table and @values from $_
    $prepares{$table}->execute(@values);
}
This may or may not help, depending on lots and lots of factors.
As for using CSV files, you're probably thinking of using mysql loader. That may be a win, again depending on lots of factors. Essentially, instead of executing the insert statement in your program, you'd write the data to one of several files (one for each table), then use loader to populate one table at a time.
It's very hard to say whether any of these will be a win. The factors to consider include the amount of data, the number of tables, how they're indexed, whether you're doing this just once or on a regular basis, and so on.
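The write-a-file-per-table step might look something like this in plain Perl (the parsed rows, the file naming, and the naive quoting are my assumptions; real data should go through Text::CSV or similar for proper escaping):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );
my %fh_for_table;   # table name => open output filehandle

# Open (once) and return the CSV file for a given table
sub csv_handle_for {
    my ($table) = @_;
    $fh_for_table{$table} //= do {
        open my $fh, '>', "$dir/$table.csv" or die "open: $!";
        $fh;
    };
}

# Stand-in for rows parsed from the big file: [table, values...]
my @parsed = ( [ t1 => 'a', 'b' ], [ t2 => 'x' ], [ t1 => 'c', 'd' ] );

for my $row (@parsed) {
    my ($table, @values) = @$row;
    my $fh = csv_handle_for($table);
    print {$fh} join(',', map { qq{"$_"} } @values), "\n";
}
close $_ for values %fh_for_table;

# Afterwards, one load per file, e.g.:
#   LOAD DATA LOCAL INFILE '$dir/t1.csv' INTO TABLE t1
#   FIELDS TERMINATED BY ',' ENCLOSED BY '"';
```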
Re: DBI: speed up insertions in generating database
by eric256 (Parson) on Dec 04, 2004 at 15:59 UTC
In a similar situation I have done INSERT DELAYED, running things in batches of 100. Just remember to catch the last batch if your data isn't divisible by 100. The DELAYED part tells MySQL to do that insert when it has a moment. The upside is your program gets control back immediately; the downside is you lose error reporting on your inserts.
Another option might be to create an intermediate table that records which table each row should go into, so you import into this table in large batches. Then go through that table moving rows to their correct locations. I'm not sure that will gain you much, since you will then have to do two inserts, but it would allow you to do both operations in large batches.
A final possibility, which can be added to either of the above, is to prepare a single SQL statement for each table at the beginning, then use those prepares to insert the data.
$prepares->{$table}->execute(@data);
Re: DBI: speed up insertions in generating database
by zakzebrowski (Curate) on Dec 04, 2004 at 19:33 UTC
See the High Performance MySQL book. It lists many ways to improve performance. However, one trick I recently learned was to drop all indexes before you do the insert. (You then need to regenerate the indexes after inserting, so depending on the number of new rows, that may be counterproductive.) Also, do a Google search for MySQL performance tricks. Alternatively, consider using SQLite, if it meets your needs. I was able to use it for a major project with no serious issues. (I did not need full-text search or the ability to connect to the database over the network.) Cheers.
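The drop-then-regenerate approach above might look like this (the table and index names are made up for illustration):

```sql
-- drop secondary indexes before the bulk load
ALTER TABLE mytable DROP INDEX idx_value;

-- ... run all the INSERTs here ...

-- rebuild the index once, at the end
ALTER TABLE mytable ADD INDEX idx_value (value);
```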
csv way (as requested ;-)
by Luca Benini (Scribe) on Dec 04, 2004 at 17:04 UTC
If you use Perl to parse the data (splitting the work up first, if you prefer or need to) until you obtain a single CSV file, you can then load it with (assuming the file is mydata.csv):
LOAD DATA LOCAL INFILE 'mydata.csv'
INTO TABLE mytable
FIELDS
    TERMINATED BY ','
    ENCLOSED BY '"'
(list, of, your, table, fields);
Re: DBI: speed up insertions in generating database
by punch_card_don (Curate) on Dec 04, 2004 at 16:11 UTC
Thanks all for the suggestions; the common thread is to do all the inserts for one table at a time.
So, I've re-written the parsing code to store each table's values in a separate text file temporarily. Then I'll run a separate import routine that loads table by table.
The new text-file-storage parser is running right now at about 2,400 origin-file lines per hour (there are hundreds of values to extract from each line, after comparison with criteria in another file; very convoluted), whereas the direct db-storage program was running at ~300 origin-file lines per hour.
INSERT INTO $table (field1,field2) VALUES (value1,value2),(value3,value4);
From the MySQL docs:
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using a lot of INSERT statements. See section 13.1.5 LOAD DATA INFILE Syntax.
mysqlmonks.org
Re: DBI: speed up insertions in generating database
by erix (Prior) on Dec 04, 2004 at 22:25 UTC
Make sure to set AutoCommit off (MySQL's default is on, I believe), and don't commit too often (every few hundred rows or so?).
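Here is a sketch of that commit cadence, with a mock handle standing in for $dbh so the counts are visible; with real DBI you would pass AutoCommit => 0 in the connect() attributes and call $dbh->commit for real (the batch size and row count are made up):

```perl
use strict;
use warnings;

# Mock DBI handle: counts inserts and commits instead of talking
# to a server. Real code: DBI->connect($dsn, $user, $pw,
#   { AutoCommit => 0, RaiseError => 1 });
package MockDBH;
sub new    { bless { inserts => 0, commits => 0 }, shift }
sub do     { $_[0]{inserts}++ }
sub commit { $_[0]{commits}++ }

package main;

my $dbh   = MockDBH->new;
my $batch = 500;     # commit every few hundred rows, as suggested
my $n     = 0;

for my $row (1 .. 1750) {          # pretend rows parsed from the file
    $dbh->do("INSERT ...");        # placeholder statement
    $dbh->commit if ++$n % $batch == 0;
}
$dbh->commit if $n % $batch;       # commit the final partial batch
```

With 1,750 rows and a batch of 500, the work lands in four transactions instead of 1,750 autocommitted ones.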
begin/commit do the magic:
begin;
<some sql code>
commit;
Re: DBI: speed up insertions in generating database
by mkirank (Chaplain) on Dec 06, 2004 at 06:01 UTC
I had a similar situation (Link); I also had to insert into multiple databases. Like some people have suggested:
1. Bulk insert is fastest: write to a CSV file and then do a LOAD DATA INFILE to import the data.
2. Use SQLite; it is faster compared to MySQL.
3. I am not sure, but you can try this: insert into SQLite, then export to CSV and import into MySQL (like I said, I'm not sure about this, just a suggestion).
4. If it is speed you require, write a C program.
Re: DBI: speed up insertions in generating database
by Thilosophy (Curate) on Dec 06, 2004 at 03:17 UTC
Do not use DBI and Perl to do this.
Use MySQL's import utility to load the file. That will be A LOT faster than anything you can do with normal SQL.
If the file structure is too complex for the import tool to handle, use Perl first to create one or more CSV files that are simple enough.
Re: DBI: speed up insertions in generating database
by Jeppe (Monk) on Dec 05, 2004 at 18:26 UTC
Hah. This is simple - use a hash for storing prepared statements. Untested pseudocode:
while (my $line = <FILE>) {
    analyze $line to extract ($table, $value) pairs
    foreach ($table, $value) pair {
        my $sql_statement = "INSERT INTO $table VALUES(?)";
        my $sth = store_or_retrieve($sql_statement);
        $sth->execute($value);
    }
}

my $sth_hashref;
sub store_or_retrieve {
    my $sql_statement = shift;
    if (exists $sth_hashref->{$sql_statement}) {
        return $sth_hashref->{$sql_statement};
    } else {
        my $sth = $dbh->prepare($sql_statement);
        $sth_hashref->{$sql_statement} = $sth;
        return $sth;
    }
}
However, I would suggest you probably have more work to do regarding your database structure - do you really only have one row in each table?