PerlMonks  

DBI or mysqlimport?

by js1 (Monk)
on Jun 07, 2004 at 21:06 UTC ( #362109=perlquestion )

js1 has asked for the wisdom of the Perl Monks concerning the following question:

Enlighten me if you will good monks.

I have 60GB of proxy logs to parse with my Perl script, and the parsed values need to go into a MySQL database. What would be quicker/wiser?

1) Use DBI and insert the values as I parse each line

OR

2) Parse the values into a delimited file and import it using mysqlimport?

OR

3) Other wisdom?

Bless you all!
js1.

Replies are listed 'Best First'.
Re: DBI or mysqlimport?
by gmax (Abbot) on Jun 07, 2004 at 21:38 UTC
    Use DBI and insert the values as I parse each line

    Inserting records one by one is going to be the slowest approach.

    However, if you can use the extended insertion syntax that MySQL supports, this could be one of the best solutions.

    An extended insertion query looks like this:

    INSERT INTO tablename (col1, col2, col3)
    VALUES
        (value1, value2, value3),
        (value1, value2, value3),
        (value1, value2, value3),
        (value1, value2, value3);

    The bad news is that you need to figure out - with some benchmarks - how many records you should pass to each query, considering the limits set by max_allowed_packet (1 MB by default).

    When I had to do something similar, with a max_allowed_packet of 3 MB, I had very good results building queries of up to 2 MB. As for the technique to build these queries, there is a detailed example in Speeding up the DBI, but the basic idea is as follows:

    my @values = (
        ["A", "B", "D", "0"],
        ["E", "F", "G", "1"],
        ["H", "I", "J", "2"],
        ["K", "L", "M", "3"],
        ["N", "O", "P", "4"],
        ["Q", "R", "S", "5"],
        ["T", "U", "V", "6"],
        ["W", "X", "Y", "7"],
    );
    my $query = "INSERT INTO tablename VALUES \n"
              . join(",\n",
                    map { "(" . join(", ", map { $dbh->quote($_) } @$_) . ")" }
                    @values);
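
    A minimal sketch of how this could look when streaming a large log file, assuming a connected $dbh and a parse_line() helper (both hypothetical here) that returns one row's column values; the batch is flushed whenever it reaches a row count you have benchmarked against max_allowed_packet:

    my $batch_size = 1_000;    # tune this against max_allowed_packet
    my @rows;

    sub flush_rows {
        my ($dbh, $rows) = @_;
        return unless @$rows;
        my $query = "INSERT INTO tablename VALUES \n"
                  . join(",\n",
                        map { "(" . join(", ", map { $dbh->quote($_) } @$_) . ")" }
                        @$rows);
        $dbh->do($query);
        @$rows = ();           # start the next batch empty
    }

    while (my $line = <LOG>) {          # LOG is your opened proxy-log filehandle
        push @rows, [ parse_line($line) ];
        flush_rows($dbh, \@rows) if @rows >= $batch_size;
    }
    flush_rows($dbh, \@rows);           # don't forget the final partial batch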
    Parse the values into a delimited file and import it using mysqlimport?

    This is also feasible. mysqlimport is one of the most tested tools in the MySQL suite, and it is very efficient, especially when dealing with tab-delimited files. Personally, once I found the right balance with the extended insert, I almost always managed to outperform this method with multi-record queries. YMMV.

    Other wisdom ?

    Yes. If your table has indexes in addition to the primary key or unique keys, then you can wrap the insertion in this pair of statements:

    ALTER TABLE tablename DISABLE KEYS;
    # do the insertion
    ALTER TABLE tablename ENABLE KEYS;

    This trick speeds up the insertion considerably: instead of updating the indexes row by row, MySQL rebuilds them all at once when the keys are re-enabled.
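
    From DBI this is just a pair of do() calls around the load; a minimal sketch, assuming the same $dbh as above:

    $dbh->do("ALTER TABLE tablename DISABLE KEYS");
    # ... run the multi-record INSERT queries here ...
    $dbh->do("ALTER TABLE tablename ENABLE KEYS");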

     _  _ _  _  
    (_|| | |(_|><
     _|   
    
Re: DBI or mysqlimport?
by dragonchild (Archbishop) on Jun 08, 2004 at 01:05 UTC
    I am doing something similar. I pull from an Oracle database (without foreign keys) and am validating, adding keys, then loading into a MySQL database. This is for an online reporting app. The fastest solution I've found is something along the lines of:
    1. Dump each Oracle table into a tab-delim file. I do some validation here to limit the rows dumped. (This is actually the slowest part. Go figure.)
    2. Load each file into a MySQL table using LOAD DATA INFILE (see the sketch below). I usually load about 25k rows/second.
    3. Create my foreign keys using UPDATE.
    4. Get rid of the bogus rows using DELETE.
    5. Issue OPTIMIZE TABLE to compact it (I'm using MyISAM).
    6. Add my indices.

    Before anyone says anything, I know MyISAM doesn't have FK's per se. I don't ever modify this database outside the above data load.

    I load about 10M rows in under 40 minutes this way, including all the FK relationships and indexing.
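
    A rough, untested sketch of what steps 2 through 6 could look like from DBI; the table and column names (orders, customers, customer_code) are invented for illustration, not dragonchild's actual schema:

    # step 2: bulk-load the tab-delimited dump
    $dbh->do(q{
        LOAD DATA INFILE '/tmp/orders.txt'
        INTO TABLE orders
        FIELDS TERMINATED BY '\t'
        LINES TERMINATED BY '\n'
    });

    # step 3: fill in the foreign keys with a multi-table UPDATE
    $dbh->do(q{
        UPDATE orders o, customers c
           SET o.customer_id = c.id
         WHERE o.customer_code = c.code
    });

    # steps 4-6: prune bogus rows, compact, then index
    $dbh->do("DELETE FROM orders WHERE customer_id IS NULL");
    $dbh->do("OPTIMIZE TABLE orders");
    $dbh->do("ALTER TABLE orders ADD INDEX (customer_id)");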

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

Re: DBI or mysqlimport?
by runrig (Abbot) on Jun 07, 2004 at 22:36 UTC
    3) Other wisdom ?

    I pick 3 (i.e. try it and find out). Usually, whatever native bulk-loading facility a database has will be faster than individual insert statements (whether issued from perl/DBI or anything else), but I don't know about mysql, so the best option is to try it and find out. Or maybe the decision will be based partly on the error-handling capabilities of the loading facility.
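
    If you do measure it, Perl's core Benchmark module makes the comparison cheap to set up; a minimal sketch, assuming insert_via_dbi() and load_via_mysqlimport() wrappers (both hypothetical) that each load the same sample file:

    use Benchmark qw(timethese);

    # truncate the target table between runs so each pass does the same work
    timethese(5, {
        dbi_inserts => sub { insert_via_dbi("sample.log") },
        mysqlimport => sub { load_via_mysqlimport("sample.log") },
    });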

Re: DBI or mysqlimport?
by drewbie (Chaplain) on Jun 08, 2004 at 02:35 UTC
    If you can get the data into a format mysql can understand, you can simply do
    mysql < inputdata.sql
    which has always been blindingly fast for me. I don't remember off the top of my head whether mysqlimport does any optimizations, so, as a previous poster said, Just Try It. :-)
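
    For reference, inputdata.sql here is just a plain SQL file fed to the client (with the database named in the file or on the command line); a small example, with the database, table, and columns invented for illustration:

    -- inputdata.sql
    USE mydb;
    INSERT INTO proxy_log (ts, client, url) VALUES
        ('2004-06-07 21:06:00', '10.0.0.1', 'http://example.com/'),
        ('2004-06-07 21:06:01', '10.0.0.2', 'http://example.org/');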
Re: DBI or mysqlimport?
by thor (Priest) on Jun 07, 2004 at 21:31 UTC
    I don't know anything about mysqlimport, but if I had to guess, I'd say that the DBI is faster...unless you can get mysqlimport to read from standard input (one way around the extra file is sketched below). The reason I say that is that any time you save on the actual insert will more than likely be lost to disk I/O (you have to write the parsed data out to disk, and then have mysqlimport read it back in).
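
    A named pipe is one way to feed mysqlimport without a second trip to disk; an untested sketch of the idea, assuming a table named proxy_log and a parse_line() helper (both hypothetical). Note that mysqlimport derives the table name from the file name, so the FIFO must be named to match:

    use POSIX qw(mkfifo);

    my $fifo = "/tmp/proxy_log.txt";    # basename must match the table name
    mkfifo($fifo, 0700) or die "mkfifo: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        # child: mysqlimport blocks until the parent opens the FIFO for writing
        exec "mysqlimport", "--local", "mydb", $fifo;
        die "exec mysqlimport: $!";
    }

    # parent: stream parsed, tab-delimited rows straight into the FIFO
    open my $out, ">", $fifo or die "open $fifo: $!";
    while (my $line = <STDIN>) {
        print {$out} join("\t", parse_line($line)), "\n";
    }
    close $out;
    waitpid($pid, 0);
    unlink $fifo;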

    thor
