PerlMonks  

Re^2: problem of my multithreading perl script

by qingfengzealot (Initiate)
on Jan 11, 2013 at 12:46 UTC (#1012876)


in reply to Re: problem of my multithreading perl script
in thread problem of my multithreading perl script

Thanks, RichardK, for your kind reply. This is my first time using multiple threads in Perl, so I'm not sure whether I'm using them properly. Even if the script is likely to be I/O bound, the multithreaded version shouldn't run slower than the single-threaded one, am I right? Please correct me if I'm wrong. Thanks again for your help. Best regards, Qiongyi


Re^3: problem of my multithreading perl script
by MidLifeXis (Prior) on Jan 11, 2013 at 13:32 UTC

    There is extra management overhead in dealing with threads, so yes, a multithreaded program can run slower than a single-threaded one, especially if there is no work to do other than the I/O.

    --MidLifeXis

Re^3: problem of my multithreading perl script
by Anonymous Monk on Jan 11, 2013 at 13:33 UTC

    Actually it will run slower. All of the threads are writing to the same hash. For each write, the hash is locked, and any other threads have to wait until the current thread finishes its write before they can start.

    Try having each thread write to a separate hash, then merge them in the main thread once the reading is complete.
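
    A minimal sketch of that idea, assuming a perl built with ithreads support. The sub name count_words and the sample data are made up for illustration and are not from the original script; each inner array stands in for one input file.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: counts words using a hash private to its thread.
# In a real script this would read one input file instead of a list of lines.
sub count_words {
    my @lines = @_;
    my %local;                        # not shared, so no locking is involved
    for my $line (@lines) {
        $local{$_}++ for split ' ', $line;
    }
    return \%local;                   # copied back to the parent on join()
}

# Stand-in data: each inner array plays the role of one file.
my @chunks = (
    [ 'a b a', 'c' ],
    [ 'b b',   'a' ],
);

# One thread per chunk; ask for scalar context so join() returns the hash ref.
my @workers = map {
    threads->create({ 'context' => 'scalar' }, \&count_words, @$_)
} @chunks;

# Merge the per-thread hashes in the main thread once the workers are done.
my %total;
for my $thr (@workers) {
    my $partial = $thr->join();
    $total{$_} += $partial->{$_} for keys %$partial;
}

print "$_ => $total{$_}\n" for sort keys %total;
```

    The worker never touches a shared variable, so the only cross-thread cost is copying each result hash back when join() is called.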

Re^3: problem of my multithreading perl script
by roboticus (Canon) on Jan 11, 2013 at 13:48 UTC

    qingfengzealot:

    No. There are several ways a threaded program can be slower than a single-threaded one. I don't know exactly which one(s) you're hitting in this case, but I can think of two candidates:

    • Disk thrashing: When you're reading a file sequentially, the disk drive can normally read sector after sector without (many) intermediate seeks. If you're reading multiple files at the same time, your drive needs to seek frequently between file locations. In other words, reading files one at a time generally has fewer disk seeks than reading them in parallel, and seeks are slow operations.
    • Data structure locking: In a single-threaded program, you don't need to worry about locking your data structures. But in a multithreaded program, you need to lock a data structure when modifying it if another thread could also update it. Even when there's no contention for the data structure, you're spending time acquiring and releasing locks.
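
    To make the locking cost concrete, here is a rough sketch of the per-update pattern, assuming a perl built with ithreads. The names %seen and tally and the word list are invented for illustration; I don't know what the original script's hash looks like.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %seen :shared;                     # one hash visible to every thread

# Hypothetical worker: every single update takes and releases the lock.
sub tally {
    my @words = @_;
    for my $word (@words) {
        lock(%seen);                  # held until the end of this loop iteration
        $seen{$word}++;
    }
    return;
}

# Four threads all hammering the same shared hash.
my @workers = map { threads->create(\&tally, qw(x y x)) } 1 .. 4;
$_->join() for @workers;

print "$_ => $seen{$_}\n" for sort keys %seen;
```

    The result is correct, but the lock is taken once per word, so with real data the threads spend much of their time queueing for the hash rather than working.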

    Having said that, you can probably speed things up without going to a single-threaded program.

    If your program is experiencing disk thrashing, you might be able to avoid it by placing your files on different drives. That way you won't have as many disk seeks.

    If your program is spending too much time managing data structure locks, you might be able to rearrange your code a bit to reduce the number of locks. For example, you might have each thread read and process a file into a local (nonshared) data structure and then, once it has finished the file, merge the data into the shared structure.
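
    Something along these lines, as a sketch only and assuming a perl built with ithreads. The names %totals and process_file and the sample lines are hypothetical; the point is one lock per file instead of one per update.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

my %totals :shared;                   # the shared structure all threads feed

# Hypothetical worker: do the heavy work on a private hash, then merge once.
sub process_file {
    my @lines = @_;                   # stand-in for the lines of one file
    my %local;
    for my $line (@lines) {
        $local{$_}++ for split ' ', $line;
    }

    lock(%totals);                    # taken once per file, released on return
    $totals{$_} += $local{$_} for keys %local;
    return;
}

# Stand-in data: each inner array plays the role of one file.
my @files = (
    [ 'red green', 'red' ],
    [ 'green blue' ],
);

my @workers = map { threads->create(\&process_file, @$_) } @files;
$_->join() for @workers;

print "$_ => $totals{$_}\n" for sort keys %totals;
```

    The shared hash is still locked during each merge, but a thread now waits at most once per file rather than once per key it updates.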

    There are other possible causes and fixes; these are just the ones that came to mind. (I've run into both of them frequently enough...unfortunately.)

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.
