Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Batch Upload/Insert – Row wise with Perl DBI

by shree (Acolyte)
on Sep 27, 2011 at 15:47 UTC ( #928107=perlquestion: print w/ replies, xml ) Need Help??
shree has asked for the wisdom of the Perl Monks concerning the following question:

Hi All, Can any one help me out by providing some examples for Batch upload/Insert row-wise

For example

$sth = $dbh->prepare(“insert into table values (?, ?, ?)”); @array1 = ( 1, 2, 3 ) # array that contains the values of column1 acro +ss all the rows. @array2 = ( 1, 2, 3 ) # array that contains the values of column2 acro +ss all the rows. @array3 = ( 1, 2, 3 ) # array that contains the values of column3 acro +ss all the rows. $sth->execute_array({}, \@array1, \@array2, \@array3);

This code will do the batch load column-wise it will insert all records in the array to the DB.

Any idea on how can I do the batch load row-wise insted of column-wise...?

Comment on Batch Upload/Insert – Row wise with Perl DBI
Download Code
Re: Batch Upload/Insert – Row wise with Perl DBI
by morgon (Deacon) on Sep 27, 2011 at 16:18 UTC
    $sth = $dbh->prepare(“insert into table values (?, ?, ?)”); for my $i (0..2) { # adjust the upper limit accordingly $sth->execute($array1[$i], $array2[$i], $array3[$i]); }
Re: Batch Upload/Insert – Row wise with Perl DBI
by runrig (Abbot) on Sep 27, 2011 at 16:33 UTC
    According to the docs, you can use the ArrayTupleFetch attribute in the execute_array() method, but I don't see how it's any better than what morgon provided above (unless your db really does some sort of batch/bulk load in this situation).

      FYI, some DBDs handle execute_array etc themselves often allowing many rows of data to be sent to the database in one go. DBD::Oracle is one. If you look at the tests in DBD::Oracle and 26exe_array.t you can find examples.

Re: Batch Upload/Insert – Row wise with Perl DBI
by Marshall (Prior) on Sep 28, 2011 at 14:26 UTC
    If you are inserting a lot of rows into the DB, combining all of the inserts into a single transaction will typically increase the performance dramatically. Batch inserts into the DB typically require a proprietary format of the data. Reducing the number of transactions usually speeds things up enough that further optimization is not necessary (YMMV).

    The big performance hit is not in the execute() method itself, but in all of the "housekeeping" that the DB does to finalize the transaction. When you connected, the default would have been to set autocommit=1, that causes each execute() to be an independent transaction. You can temporarily override the autocommit, by explicitly starting a transaction as show below. Assuming that you have set RaiseError=1,:

    my @data = ([11,12,13], [21,22,23], [31,32,33]); $sth = $dbh->prepare("insert into table values (?, ?, ?)"); # start new transaction # $dbh->begin_work(); #or perhaps $dbh->do("BEGIN"); foreach my $row (@data) { $sth->execute(@$row); } # end the transaction # $dbh->commit(); #or perhaps $dbh->do("COMMIT");

      Very nice. This resulted in a 20x speed up of some code I was working with, loading daily batches into a table with a large amount of history. Blocks of 10000 lines were taking 150s give or take and with this change now complete in 7s.

      I was looking for details on Oracle syntax for batch inserting so that I could really make my code look very ugly in the name of performance. This will do nicely instead.

        Thanks for sharing your success with us!

        I have seen similar or even greater performance increases, see: Re^3: Efficient way to handle huge number of records? for a more recent benchmark. In this benchmark, the code started on Friday afternoon and runs until early on Sunday morning. Improved code runs in 3 minutes! And that's just for one 1 million row table! This matters and I am very, very sure about it! I mean couple of minutes versus couple of days!

        Current version of this code imports 5,740,776 rows into 8 tables and runs in less than 10 minutes on my slow machine. The production machine will run this much faster. In my application, this is "fast enough".

        This is a case where there is a LOT of low hanging fruit and its easy to get it if you know how! 20x performance increase is nothing to be sneezed at!

        The next level of performance is how the DB deals with indicies. It is faster to create the indicies after the complete table is built. Getting that next level is harder than the first, most important level.

        I congratulate you for your success and also reporting back after some months with the results. This helps others and that is what this forum is about.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://928107]
Approved by toolic
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (11)
As of 2014-07-23 08:22 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (136 votes), past polls