PerlMonks  

Re: When to use forks, when to use threads ...?

by tilly (Archbishop)
on Sep 04, 2008 at 17:50 UTC ( [id://709077]=note )


in reply to When to use forks, when to use threads ...?

My personal opinion is that in Perl I would need an extraordinary motivation to use multi-threading. Forking is almost always a better way to go on platforms that natively support it.

For a mass insertion into a database I would recommend neither approach. Instead I would suggest learning what native tools your database has for mass inserts. I would then use them. If the table you're inserting into has indexes I would strongly suggest dropping all indexes, doing the insert, then re-creating the indexes. (The reason for that is that maintaining indexes during inserts results in a lot of random seeks to disk. Seeking to disk is expensive. Throwing away the index and rebuilding it at the end avoids most of those seeks and is therefore much faster.)
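The drop, load, rebuild sequence described above can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for whatever database you actually run, and the table `items`, the index `idx_items_name`, and the helper `bulk_load` are hypothetical names invented for the example. A real mass load should use the database's native bulk loader rather than row-by-row inserts from a script.

```python
import sqlite3

def bulk_load(conn, rows):
    """Drop the secondary index, insert all rows, then rebuild the index once."""
    cur = conn.cursor()
    # Hypothetical index name; dropping it avoids per-row index maintenance.
    cur.execute("DROP INDEX IF EXISTS idx_items_name")
    # Insert everything in one batch.
    cur.executemany("INSERT INTO items (id, name) VALUES (?, ?)", rows)
    # Rebuild the index in a single pass over the loaded data.
    cur.execute("CREATE INDEX idx_items_name ON items (name)")
    conn.commit()

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_items_name ON items (name)")

bulk_load(conn, [(i, f"item-{i}") for i in range(10000)])
row_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
index_row = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND name = 'idx_items_name'"
).fetchone()
```

Whether this pays off depends on the size of the load relative to the existing table; for a small batch going into a very large table, rebuilding the index may cost more than maintaining it.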

Re^2: When to use forks, when to use threads ...?
by Krambambuli (Curate) on Sep 05, 2008 at 07:42 UTC
    For a mass insertion into a database I would recommend neither approach. Instead I would suggest learning what native tools your database has for mass inserts. I would then use them.
    ...If the table you're inserting into has indexes I would strongly suggest dropping all indexes,...

    That doesn't work as needed, unfortunately. It's not a simple insertion, but rather an insert-or-update-if-already-there process. So I cannot disable indexes either. Parallelizing the process in some way seemed the most appealing alternative to speed up things.

    Krambambuli
    ---
      If you're using MySQL, the fastest approach is to load the data into a temporary table and then use an INSERT ... ON DUPLICATE KEY UPDATE statement to copy it all into the existing table in one shot.
      Those are called upserts, and the SQL:2003 standard statement for them (which may not be implemented in your database) is called MERGE. If that is not implemented and no other variant exists, you can do an update followed by an insert of everything that was not found.

      In any case, a variant of Perrin's solution is the standard way to do it: load a temporary table, then do the update within the database. The primary key index should not be dropped, but all other indexes can be dropped and then re-created afterwards. With a good database that should be the most efficient way to go.
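The fallback mentioned in this thread (load a temporary table, update the rows that already exist, then insert everything that was not found) might look roughly like the sketch below. It assumes SQLite through Python's sqlite3 module purely so the example is runnable; the table `stock`, its columns, the temporary table `staging`, and the helper `upsert_via_staging` are all hypothetical. On MySQL the two statements could be replaced by a single INSERT ... SELECT ... ON DUPLICATE KEY UPDATE, and on databases that implement it, by MERGE.

```python
import sqlite3

def upsert_via_staging(conn, rows):
    """Load rows into a temporary table, then update-or-insert inside the database."""
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, qty INTEGER)")
    cur.executemany("INSERT INTO staging (id, qty) VALUES (?, ?)", rows)

    # Step 1: update the rows that already exist in the target table.
    cur.execute("""
        UPDATE stock
           SET qty = (SELECT s.qty FROM staging s WHERE s.id = stock.id)
         WHERE id IN (SELECT id FROM staging)
    """)
    # Step 2: insert everything that was not found.
    cur.execute("""
        INSERT INTO stock (id, qty)
        SELECT s.id, s.qty
          FROM staging s
         WHERE NOT EXISTS (SELECT 1 FROM stock t WHERE t.id = s.id)
    """)
    conn.commit()
    cur.execute("DROP TABLE staging")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (id INTEGER PRIMARY KEY, qty INTEGER)")
conn.executemany("INSERT INTO stock (id, qty) VALUES (?, ?)", [(1, 10), (2, 20)])
conn.commit()

# id 2 already exists and is updated; id 3 is new and is inserted.
upsert_via_staging(conn, [(2, 25), (3, 30)])
result = conn.execute("SELECT id, qty FROM stock ORDER BY id").fetchall()
```

Both steps run as set-based statements inside the database, so the primary key index on the target table stays in place and does the lookups; no row ever has to travel back to the client to decide between update and insert.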
