punch_card_don has asked for the wisdom of the Perl Monks concerning the following question:
Malkovitchian Monks,
I am populating a MySQL database programmatically: analyzing a giant text file and then inserting each extracted value into one of the many tables.
pseudo code:
open FILE, '<', $file;
while (my $line = <FILE>) {
    analyze $line to extract ($table, $value) pairs
    foreach ($table, $value) pair {
        $sql = "INSERT INTO $table VALUES ($value)";
        $sth = $dbh->prepare($sql);
        $sth->execute();
    }
}
This works, but is very slow. I can't get around the repeated prepares because the table changes with each iteration, and there is far too much data (~1 GB) to hold it all in table-keyed hashes for a table-wise insertion at the end.
One option I'm considering is doing it in blocks, storing for a while, then purging:
open FILE, '<', $file;
$index = 0;
while (my $line = <FILE>) {
    analyze $line to extract ($table, $value) pairs
    push each found $value onto @{ $values_hash{$table} },
        so $values_hash{$table} is a list of values
    $index++;
    when $index reaches, say, 10,000 or 100,000 {
        foreach $table (keys %values_hash) {
            $sql = "INSERT INTO $table VALUES (?)";
            $sth = $dbh->prepare($sql);
            foreach $value (@{ $values_hash{$table} }) {
                $sth->execute($value);
            }
        }
        clear %values_hash and reset $index, ready to start storing again
    }
}
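For what it's worth, the block-and-purge loop above can be sketched as runnable Perl. Here flush_batch() only records what would be inserted; in the real thing it would hold the per-table prepare/execute loop against DBI (the name flush_batch, the sample rows, and the tiny batch size are made up for illustration):

```perl
use strict;
use warnings;

my %values_for_table;   # table name => array ref of pending values
my $pending  = 0;
my $max_rows = 3;       # tiny for illustration; use 10_000+ in practice
my @flushes;            # records what each flush would have inserted

# Stand-in for the real per-table insertion: prepare
# "INSERT INTO $table VALUES (?)" once per table, then
# execute() for each stored value.
sub flush_batch {
    for my $table (sort keys %values_for_table) {
        push @flushes, [ $table, @{ $values_for_table{$table} } ];
    }
    %values_for_table = ();
    $pending = 0;
}

# Stand-in for the ($table, $value) pairs parsed from the big file
my @parsed = ( [t1 => 'a'], [t2 => 'b'], [t1 => 'c'],
               [t1 => 'd'], [t2 => 'e'] );

for my $pair (@parsed) {
    my ($table, $value) = @$pair;
    push @{ $values_for_table{$table} }, $value;
    flush_batch() if ++$pending >= $max_rows;
}
flush_batch() if $pending;   # don't forget the final partial batch
```

Note the final flush outside the loop; without it, the last partial batch never reaches the database.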
I figure this should help, but I'm not sure how much I will actually gain by that.
Are there any other slick speed tricks? I vaguely remember something to do with pre-generating CSV files and then importing....
Re: DBI: speed up insertions in generating database
by gmax (Abbot) on Dec 04, 2004 at 14:34 UTC
You may use the INSERT extended syntax for inserting several rows at once. In addition, if your table has several indexes (besides the primary key) you may use the (MySQL specific) command "DISABLE/ENABLE KEYS".
Here is a framework for your insertion.
$dbh->do(qq{ALTER TABLE $table DISABLE KEYS});

my $base_query = qq{INSERT INTO $table VALUES };
my $query      = $base_query;
my $count      = 0;
my $max_rows   = 10000;   # adjust it to your needs

while (my $line = <FILE>)
{
    my @values = something_from($line);
    $query .= "," if $count++;
    $query .= "("
            . join(",", map { $dbh->quote($_) } @values)
            . ")";
    if ($count > $max_rows)
    {
        $dbh->do($query)
            or die "something wrong ($DBI::errstr)";
        $query = $base_query;
        $count = 0;
    }
}
$dbh->do($query) if $count;   # insert the last partial batch
$dbh->do(qq{ALTER TABLE $table ENABLE KEYS});
See the manual for INSERT and for ALTER TABLE, and Speeding up the DBI for some more on related subjects.
Re: DBI: speed up insertions in generating database
by VSarkiss (Monsignor) on Dec 04, 2004 at 15:04 UTC
my %prepares = (
    table1 => $dbh->prepare("INSERT INTO table1 (...) VALUES (?, ...)"),
    table2 => $dbh->prepare("INSERT INTO table2 (...) VALUES (?, ...)"),
    # and so on
);

# Now process the file
open FILE, '<', $wherever;
while (<FILE>)
{
    # set $table and @values from $_
    $prepares{$table}->execute(@values);
}
This may or may not help, depending on lots and lots of factors.
As for using CSV files, you're probably thinking of using mysql loader. That may be a win, again depending on lots of factors. Essentially, instead of executing the insert statement in your program, you'd write the data to one of several files (one for each table), then use loader to populate one table at a time.
It's very hard to say whether any of these will be a win. The factors to consider include the amount of data, the number of tables, how they're indexed, whether you're doing this just once or on a regular basis, and so on.
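The write-a-file-per-table step might look something like this in plain Perl (the parsed rows, the file naming, and the naive quoting are my assumptions; real data should go through Text::CSV or similar for proper escaping):

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );
my %fh_for_table;   # table name => open output filehandle

# Open (once) and return the CSV file for a given table
sub csv_handle_for {
    my ($table) = @_;
    $fh_for_table{$table} //= do {
        open my $fh, '>', "$dir/$table.csv" or die "open: $!";
        $fh;
    };
}

# Stand-in for rows parsed from the big file: [table, values...]
my @parsed = ( [ t1 => 'a', 'b' ], [ t2 => 'x' ], [ t1 => 'c', 'd' ] );

for my $row (@parsed) {
    my ($table, @values) = @$row;
    my $fh = csv_handle_for($table);
    print {$fh} join(',', map { qq{"$_"} } @values), "\n";
}
close $_ for values %fh_for_table;

# Afterwards, one load per file, e.g.:
#   LOAD DATA LOCAL INFILE '$dir/t1.csv' INTO TABLE t1
#   FIELDS TERMINATED BY ',' ENCLOSED BY '"';
```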
Re: DBI: speed up insertions in generating database
by eric256 (Parson) on Dec 04, 2004 at 15:59 UTC
In a similar situation I have done INSERT DELAYED, running things in batches of 100. Just remember to catch the last batch if your data isn't divisible by 100. The DELAYED part tells MySQL to do that insert when it has a moment. The upside is your program gets control back immediately; the downside is you lose error reporting on your inserts.
Another option might be to create an intermediate table that records which table each row should go into, so you import into this table in large batches. Then go through that table moving rows to their correct locations. I'm not sure that will gain you much, since you will then have to do two inserts, but it would allow you to do both operations in large batches.
A final possibility, which can be added to either of the above, is to prepare a single SQL statement for each table at the beginning, then use those prepares to insert the data.
$prepares->{$table}->execute(@data);
Re: DBI: speed up insertions in generating database
by zakzebrowski (Curate) on Dec 04, 2004 at 19:33 UTC
See the High Performance MySQL book. It lists many ways to improve performance. However, one trick I recently learned was to drop all indexes before you do the insert. (You then need to regenerate the indexes after inserting, so depending on the number of new rows, that may be counterproductive.) Also, do a Google search for MySQL performance tricks. Alternatively, consider using SQLite, if it meets your needs. I was able to use it for a major project with no serious issues. (I did not need full-text search or the ability to connect to the database over the network.) Cheers.
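The drop-then-regenerate approach above might look like this (the table and index names are made up for illustration):

```sql
-- drop secondary indexes before the bulk load
ALTER TABLE mytable DROP INDEX idx_value;

-- ... run all the INSERTs here ...

-- rebuild the index once, at the end
ALTER TABLE mytable ADD INDEX idx_value (value);
```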
csv way (as requested ;-)
by Luca Benini (Scribe) on Dec 04, 2004 at 17:04 UTC
If you use Perl to parse the data (splitting the work up first, if you prefer or need to) until you obtain a single CSV file, you can then load it with (assuming the file is mydata.csv):
LOAD DATA LOCAL INFILE 'mydata.csv'
INTO TABLE mytable
FIELDS
    TERMINATED BY ','
    ENCLOSED BY '"'
(list, of, your, table, fields);
Re: DBI: speed up insertions in generating database
by punch_card_don (Curate) on Dec 04, 2004 at 16:11 UTC
Thanks all for the suggestions; the common thread is to do all the inserts for one table at a time.
So, I've re-written the parsing code to store each table's values in a separate text file temporarily. Then I'll run a separate import routine that loads table by table.
The new text-file-storage parser is running right now at about 2,400 origin-file lines per hour (there are hundreds of values to extract from each line, after comparison with criteria in another file; very convoluted), whereas the direct db-storage program was running at ~300 origin-file lines per hour.
INSERT INTO $table (field1,field2) VALUES (value1,value2),(value3,value4);
From the MySQL docs:
When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using a lot of INSERT statements. See section 13.1.5 LOAD DATA INFILE Syntax.
mysqlmonks.org
Re: DBI: speed up insertions in generating database
by erix (Prior) on Dec 04, 2004 at 22:25 UTC
Make sure to set AutoCommit off (MySQL's default is on, I believe), and don't commit too often (every few hundred rows or so?).
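Here is a sketch of that commit cadence, with a mock handle standing in for $dbh so the counts are visible; with real DBI you would pass AutoCommit => 0 in the connect() attributes and call $dbh->commit for real (the batch size and row count are made up):

```perl
use strict;
use warnings;

# Mock DBI handle: counts inserts and commits instead of talking
# to a server. Real code: DBI->connect($dsn, $user, $pw,
#   { AutoCommit => 0, RaiseError => 1 });
package MockDBH;
sub new    { bless { inserts => 0, commits => 0 }, shift }
sub do     { $_[0]{inserts}++ }
sub commit { $_[0]{commits}++ }

package main;

my $dbh   = MockDBH->new;
my $batch = 500;     # commit every few hundred rows, as suggested
my $n     = 0;

for my $row (1 .. 1750) {          # pretend rows parsed from the file
    $dbh->do("INSERT ...");        # placeholder statement
    $dbh->commit if ++$n % $batch == 0;
}
$dbh->commit if $n % $batch;       # commit the final partial batch
```

With 1,750 rows and a batch of 500, the work lands in four transactions instead of 1,750 autocommitted ones.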
begin/commit do the magic:
begin;
<some sql code>
commit;
Re: DBI: speed up insertions in generating database
by mkirank (Chaplain) on Dec 06, 2004 at 06:01 UTC
I had a similar situation (Link); I also had to insert into multiple databases. Like some people have suggested:
1. Bulk insert is fastest: write to a CSV file and then do a LOAD DATA INFILE to import the data.
2. Use SQLite; it is faster compared to MySQL.
3. I am not sure, but you can try this: insert into SQLite, then export to CSV and import into MySQL (like I said, I'm not sure about this, just a suggestion).
4. If it is speed you require, write a C program.
Re: DBI: speed up insertions in generating database
by Thilosophy (Curate) on Dec 06, 2004 at 03:17 UTC
Do not use DBI and Perl to do this.
Use MySQL's import utility to load the file. That will be A LOT faster than anything you can do with normal SQL.
If the file structure is too complex for the import tool to handle, use Perl first to create one or more CSV files that are simple enough.
Re: DBI: speed up insertions in generating database
by Jeppe (Monk) on Dec 05, 2004 at 18:26 UTC
Hah. This is simple - use a hash for storing prepared statements. Untested pseudocode:
while (my $line = <FILE>) {
    analyze $line to extract ($table, $value) pairs
    foreach ($table, $value) pair {
        my $sql_statement = "INSERT INTO $table VALUES(?)";
        my $sth = store_or_retrieve($sql_statement);
        $sth->execute($value);
    }
}

my $sth_hashref;
sub store_or_retrieve {
    my $sql_statement = shift;
    if (exists $sth_hashref->{$sql_statement}) {
        return $sth_hashref->{$sql_statement};
    } else {
        my $sth = $dbh->prepare($sql_statement);
        $sth_hashref->{$sql_statement} = $sth;
        return $sth;
    }
}
However, I would suggest you probably have more work to do regarding your database structure - do you really only have one row in each table?