
DBI or mysqlimport?

by js1 (Monk)
on Jun 07, 2004 at 21:06 UTC ( #362109=perlquestion )

js1 has asked for the wisdom of the Perl Monks concerning the following question:

Enlighten me if you will good monks.

When I come to parse 60GB of proxy logs with my perl script, I must forthwith get those parsed values into a MySQL database. What would be quicker/wiser?

1) Use DBI and insert the values as I parse each line


2) Parse the values into a delimited file and import this using mysqlimport ?


3) Other wisdom ?

Bless you all!

Replies are listed 'Best First'.
Re: DBI or mysqlimport?
by gmax (Abbot) on Jun 07, 2004 at 21:38 UTC
    Use DBI and insert the values as I parse each line

    Inserting records one by one is going to be the slowest approach.

    However, if you can use the extended insertion syntax that MySQL supports, this could be one of the best solutions.

    An extended insertion query looks like this:

    INSERT INTO tablename (col1, col2, col3) VALUES
        (value1, value2, value3),
        (value1, value2, value3),
        (value1, value2, value3),
        (value1, value2, value3);

    The bad news is that you need to figure out, with some benchmarks, how many records you should pass in each query, given the limit set by max_allowed_packet (1 MB by default).

    When I had to do something similar, with a max_allowed_packet of 3 MB, I had very good results building queries of up to 2 MB. As for the technique used to build these queries, there is a detailed example in Speeding up the DBI, but the basic idea is the following:

    my @values = (
        ["A", "B", "D", "0"],
        ["E", "F", "G", "1"],
        ["H", "I", "J", "2"],
        ["K", "L", "M", "3"],
        ["N", "O", "P", "4"],
        ["Q", "R", "S", "5"],
        ["T", "U", "V", "6"],
        ["W", "X", "Y", "7"],
    );
    my $query = "INSERT INTO tablename VALUES \n"
              . join(",\n",
                     map { "(" . join(", ", map { $dbh->quote($_) } @$_) . ")" }
                     @values);
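    To make that idea runnable without a live connection, here is a sketch that factors the query-building into a function taking a quoting coderef (in real use you would pass sub { $dbh->quote($_[0]) } from an open DBI handle). The table name proxy_log, the columns, and the sample rows are made up for illustration; the batch size is something you would tune against max_allowed_packet as described above.

    ```perl
    use strict;
    use warnings;

    # Build one extended INSERT statement for a batch of rows.
    # $quote is a coderef that SQL-quotes a single value.
    sub build_extended_insert {
        my ($quote, $table, $cols, $rows) = @_;
        return sprintf "INSERT INTO %s (%s) VALUES\n%s",
            $table,
            join(", ", @$cols),
            join(",\n",
                 map { "(" . join(", ", map { $quote->($_) } @$_) . ")" }
                 @$rows);
    }

    # Naive quoting, standing in for $dbh->quote:
    my $quote = sub { my $v = shift; $v =~ s/'/''/g; "'$v'" };

    my $sql = build_extended_insert(
        $quote, "proxy_log", [qw(host uri status)],
        [ [ "example.com", "/index.html", 200 ],
          [ "example.org", "/robots.txt", 404 ] ],
    );
    # With a real handle you would then issue one query per batch:
    # $dbh->do($sql);
    ```

    While parsing the logs, you would push each row onto a buffer and call this once the buffer reaches your tuned batch size, then once more for the leftovers at end of file.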
    Parse the values into a delimited file and import this using mysqlimport ?

    This is also feasible. mysqlimport is one of the most tested tools in the MySQL suite, and it is very efficient, especially when dealing with tab-delimited files. Personally, once I found the right balance with the extended insert, I almost always managed to outperform this method using multi-record queries. YMMV.
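    A sketch of this route, for the record: write the parsed fields to a tab-delimited file whose basename matches the target table (mysqlimport derives the table name from the file name), then shell out. The database name logs, the table proxy_log, the credentials, and the whitespace split are all placeholders for your own setup.

    ```perl
    use strict;
    use warnings;

    # Flatten one record into a TSV line, squashing stray tabs/newlines
    # so they cannot break the column layout.
    sub tsv_row {
        join("\t", map { my $v = $_; $v =~ s/[\t\n]/ /g; $v } @_) . "\n";
    }

    sub load_logs {
        my @log_files = @_;
        open my $fh, ">", "/tmp/proxy_log.txt" or die "open: $!";
        for my $file (@log_files) {
            open my $in, "<", $file or die "open $file: $!";
            while (my $line = <$in>) {
                chomp $line;
                print {$fh} tsv_row(split /\s+/, $line);  # adjust to your log format
            }
        }
        close $fh or die "close: $!";

        system("mysqlimport", "--local", "--fields-terminated-by=\t",
               "--user=someuser", "-p", "logs", "/tmp/proxy_log.txt") == 0
            or die "mysqlimport failed: $?";
    }

    load_logs(@ARGV) if @ARGV;   # e.g. perl load.pl access.log.*
    ```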

    Other wisdom ?

    Yes. If your table has indexes other than the primary key or unique keys, then you can use this pair of statements before and after the insertion:

    ALTER TABLE tablename DISABLE KEYS;
    # do the insertion
    ALTER TABLE tablename ENABLE KEYS;

    This trick will speed up the insertion considerably: instead of updating the indexes row by row, MySQL will rebuild them all at once at the end of the insertion.
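    In DBI terms the trick can be wrapped like this. The table name proxy_log is a placeholder, $dbh is an open DBI handle, and the inner loader is whichever bulk method you settled on; the eval guarantees ENABLE KEYS runs even if the load dies partway (remember this applies to MyISAM's non-unique indexes).

    ```perl
    use strict;
    use warnings;

    # Run $loader->($dbh) with the table's non-unique indexes disabled,
    # then rebuild them in one pass, re-raising any error afterwards.
    sub load_with_keys_disabled {
        my ($dbh, $loader) = @_;
        $dbh->do("ALTER TABLE proxy_log DISABLE KEYS");
        eval { $loader->($dbh) };            # do the insertion
        my $err = $@;
        $dbh->do("ALTER TABLE proxy_log ENABLE KEYS");
        die $err if $err;
    }
    ```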

Re: DBI or mysqlimport?
by dragonchild (Archbishop) on Jun 08, 2004 at 01:05 UTC
    I am doing something similar. I pull from an Oracle database (without foreign keys), validate the data, add keys, then load it into a MySQL database. This is for an online reporting app. The fastest solution I've found goes something like this:
    1. Dump each Oracle table into a tab-delim file. I do some validation here to limit the rows dumped. (This is actually the slowest part. Go figure.)
    2. Load each file into a MySQL table using LOAD DATA INFILE. I usually load about 25k rows/second.
    3. Create my foreign keys using UPDATE.
    4. Get rid of the bogus rows using DELETE.
    5. Issue OPTIMIZE TABLE to compact it (I'm using MyISAM).
    6. Add my indices.

    Before anyone says anything, I know MyISAM doesn't have FK's per se. I don't ever modify this database outside the above data load.

    I load about 10M rows in under 40 minutes this way, including all the FK relationships and indexing.
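    Step 2 above can be sketched from DBI as follows; LOAD DATA INFILE is the statement that mysqlimport wraps. The file path, table name, and field terminators are placeholders, and the LOCAL variant needs local_infile enabled on both client and server.

    ```perl
    use strict;
    use warnings;

    # Bulk-load a tab-delimited dump into a table via an open DBI handle.
    # Returns the number of rows reported by the server.
    sub load_table {
        my ($dbh) = @_;
        my $rows = $dbh->do(
            q{LOAD DATA LOCAL INFILE '/tmp/proxy_log.txt'
              INTO TABLE proxy_log
              FIELDS TERMINATED BY '\t'
              LINES TERMINATED BY '\n'}
        );
        return $rows;
    }
    ```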


Re: DBI or mysqlimport?
by runrig (Abbot) on Jun 07, 2004 at 22:36 UTC
    3) Other wisdom ?

    I pick 3 (i.e. try it and find out). Usually whatever native bulk loading facilities a database has will be faster than insert statements (whether it's perl/DBI or whatever), but I don't know about mysql, so the best option is to try it and find out. Or maybe the decision will be partly based on the error handling capabilities of the loading facility.

Re: DBI or mysqlimport?
by drewbie (Chaplain) on Jun 08, 2004 at 02:35 UTC
    If you can get the data into a format mysql can understand, you can simply do
    mysql < inputdata.sql
    which has always been blindingly fast for me. I don't remember off the top of my head whether mysqlimport does any extra optimizations, so, as a previous poster said, Just Try It. :-)
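    Generating that inputdata.sql from the parse is straightforward. A sketch, with the table proxy_log and the tab-split made up for illustration, and a naive quoting sub standing in for DBI's $dbh->quote:

    ```perl
    use strict;
    use warnings;

    # Double embedded single quotes -- a stand-in for $dbh->quote.
    sub sql_quote { my $v = shift; $v =~ s/'/''/g; "'$v'" }

    # Turn one parsed record into a single INSERT statement.
    sub insert_statement {
        my @fields = @_;
        "INSERT INTO proxy_log VALUES ("
            . join(", ", map { sql_quote($_) } @fields) . ");\n";
    }

    if (@ARGV) {   # e.g. perl gen_sql.pl parsed.tsv > handled below
        open my $out, ">", "inputdata.sql" or die "open: $!";
        while (my $line = <>) {
            chomp $line;
            print {$out} insert_statement(split /\t/, $line);
        }
        close $out or die "close: $!";
        # then: mysql < inputdata.sql
    }
    ```

    For 60GB of logs you would likely combine this with the multi-row INSERT syntax from the first reply rather than one statement per line.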