Beefy Boxes and Bandwidth Generously Provided by pair Networks httptech
We don't bite newbies here... much
 
PerlMonks  

Re: DBI: How to update 150K records efficiently

by jhourcle (Prior)
on Mar 31, 2008 at 15:22 UTC ( #677564=note: print w/ replies, xml ) Need Help??


in reply to DBI: How to update 150K records efficiently

You might want to look into the documentation for your database, as there's likely a bulk-loading ability to handle this sort of replacement.

In Oracle, it's the command 'sqlldr'. For mysql, use the 'LOAD DATA LOCAL INFILE' command with the 'REPLACE' keyword. (note -- removing the LOCAL keyword will likely result in a failure on permissions).

There are also tricks that you can use when handling this size of table -- I copy the table, drop all indexes, load the data, then recreate the indexes and replace the original table. This saves the indexes needing to update for each insertion. With oracle, you can defer constraints, so they're done in mass. (but, if you screw up, you have to roll back the entire transaction).

For comparison ... I updated a mysql table this morning (6.2 million records), using 'LOAD DATA' in under 7.5 min:

mysql> LOAD DATA LOCAL INFILE '/tmp/HINODE_sizes.txt' REPLACE INTO TAB +LE hinode.files ( filepath, instrument, quicklook, mod_time, filesize + ); Query OK, 11884251 rows affected, 1 warning (7 min 26.99 sec) Records: 6264061 Deleted: 5620190 Skipped: 0 Warnings: 1

In your example, if your primary key is 'vc', you should be able to load it as a pipe delim file


Comment on Re: DBI: How to update 150K records efficiently
Download Code
Re^2: DBI: How to update 150K records efficiently
by graff (Chancellor) on Apr 01, 2008 at 04:54 UTC
    If I'm not mistaken, the use of LOAD DATA INFILE (or oracle "sqlldr") is only suitable for doing insertions to a table -- it doesn't do updates. (But maybe I'm mistaken?)

    In any case, since the OP is clearly talking about UPDATE, I think the "REPLACE" keyword would definitely not be good, unless the input to the process was in fact a complete set of replacement records for the table. And even then it might still be a really bad idea, if the table involves autoincrement primary keys that are involved in foreign key relations elsewhere. (Because reloading the whole table is apt to assign a completely different set of autoincrement keys.) Again, if I am wrong, I'd love to know...

      Although the primary use is insertion, they can be used for updates ... mysql directly, oracle in a more round-about way.

      The 'REPLACE' keyword in mysql's 'LOAD DATA' comment is record-by-record, based on the primary key of the table. If you're using autoincrement for for primary key, this isn't going to work for you. In the example given, however, there's a WHERE clause using the field 'vc', which I assumed was the primary key, or at least a unique index.

      You're right that sqlldr doesn't handle this case. (its REPLACE keyword will remove the entire table, as will TRUNCATE. The only other option for tables w/ existing data is 'APPEND'.) However, you can use sqlldr to get the data into a temp table, and then use a pl/sql command to replace the records as needed (or insert into the temp tables those that don't have a corresponding record already). You'd have to benchmark it to see what's the best method for your specific situation.

      If you're going to be designing a table that requires regular bulk updates, I'd highly recommend finding a suitable key (even if it's a composite key) and not use a sequence or autoincrement

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://677564]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (13)
As of 2014-04-17 10:26 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (444 votes), past polls