PerlMonks
Re: How to optimise "csv into mysql" using Text::CSV_XS and DBD::mysql

by Corion (Patriarch)
on Aug 06, 2015 at 08:51 UTC ( [id://1137625] )


in reply to How to optimise "csv into mysql" using Text::CSV_XS and DBD::mysql

I would try to use MySQL's bulk load facility instead of sending single INSERT statements. Bulk loading is usually much, much faster. Maybe INSERT DELAYED is already fast enough, but the LOAD DATA INFILE syntax is what I'd aim for.
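A minimal sketch of issuing LOAD DATA INFILE through DBI/DBD::mysql; the table name, file name, and connection details here are illustrative assumptions, not from the original post:

```perl
#!/usr/bin/perl
# Sketch only: "mytable", "clean.tsv", and the DSN/credentials are
# assumptions. mysql_local_infile=1 must be enabled for LOCAL loads.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect(
    'DBI:mysql:database=test;mysql_local_infile=1',
    'user', 'password',
    { RaiseError => 1 },
);

# One statement replaces thousands of single-row INSERTs.
$dbh->do(q{
    LOAD DATA LOCAL INFILE 'clean.tsv'
    INTO TABLE mytable
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
});

$dbh->disconnect;
```

Note that the server must also permit local infile loads (the server-side local_infile setting), otherwise this statement is rejected.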

All of this advice comes from https://dev.mysql.com/doc/refman/5.5/en/optimizing-myisam-bulk-data-loading.html and https://dev.mysql.com/doc/refman/5.6/en/optimizing-innodb-bulk-data-loading.html.


Replies are listed 'Best First'.
Re^2: How to optimise "csv into mysql" using Text::CSV_XS and DBD::mysql
by taiko (Sexton) on Aug 06, 2015 at 09:15 UTC
    Thank you for the hint. I did try LOAD DATA before this solution, but as the input data is very dirty, I wanted more control over what's being imported. Maybe I can remove the garbage first and then use LOAD DATA.

      This is the approach I use with a different database that basically can only do bulk loads, because single INSERTs are very slow:

      1. Read input data using Text::CSV_XS
      2. Perform cleanup on each row
      3. Add relevant columns (source_file, source_date)
      4. Write output TSV file in the correct encoding (Text::CSV_XS, this time writing tab-separated output)
      5. Issue bulk load statement
      6. Check that the number of rows read in step 1, written in step 4, and retrieved from the database match
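        Steps 1–4 (and the counting half of step 6) might be sketched like this; the filenames, the cleanup rule, and the extra column values are illustrative assumptions:

        ```perl
        #!/usr/bin/perl
        # Sketch of steps 1-4: filenames, the cleanup rule, and the added
        # columns are assumptions, not from the original thread.
        use strict;
        use warnings;
        use Text::CSV_XS;
        use POSIX qw(strftime);

        my $in_file  = 'dirty.csv';
        my $out_file = 'clean.tsv';

        # Step 1: read the dirty CSV input.
        my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });
        # Step 4: write tab-separated output for the bulk loader.
        my $tsv = Text::CSV_XS->new({
            binary => 1, sep_char => "\t", eol => "\n", quote_char => undef,
        });

        open my $in,  '<:encoding(utf8)', $in_file  or die "$in_file: $!";
        open my $out, '>:encoding(utf8)', $out_file or die "$out_file: $!";

        my ($read, $written) = (0, 0);
        while (my $row = $csv->getline($in)) {
            $read++;
            # Step 2: example cleanup - trim whitespace, skip empty rows.
            s/^\s+|\s+$//g for @$row;
            next unless grep { length } @$row;
            # Step 3: add source_file and source_date columns.
            push @$row, $in_file, strftime('%Y-%m-%d', localtime);
            $tsv->print($out, $row);
            $written++;
        }
        close $in;
        close $out;

        # Step 6 (partial): counts to compare against the row count the
        # database reports after the bulk load (step 5).
        print "read $read rows, wrote $written rows\n";
        ```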

        As I don't need to add or delete columns, I guess I can just do the cleanup with an in-file edit as I'm doing now and then LOAD DATA as you proposed. There is no real added value in using Text::CSV_XS, or am I missing something? Actually, looking at my code, I'm using Text::CSV_XS for no reason :). Maybe I just wanted to be fancy :)
