http://www.perlmonks.org?node_id=1012874


in reply to problem of my multithreading perl script

Reading large files is likely to be I/O bound, not CPU bound, so it's not surprising that making it multi-threaded didn't help.

You could try profiling your single threaded version to see where the time is being spent, and then try to improve that.

How do I profile my Perl programs?

But your code looks pretty simple, so it may not have much improvement to give.
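
One concrete way to profile it, assuming you use Devel::NYTProf (just one commonly used profiler, not something your post mentions, and "your_script.pl" is a stand-in name):

    perl -d:NYTProf your_script.pl input.txt
    nytprofhtml    # writes an HTML report under ./nytprof/

Then open nytprof/index.html and see which lines and subs dominate the run time.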

Re^2: problem of my multithreading perl script
by qingfengzealot (Initiate) on Jan 11, 2013 at 12:46 UTC
    Thanks RichardK for your kind reply. This is my first time using multiple threads in Perl, so I'm not sure whether I'm using them properly or not. Even if it is likely to be I/O bound, the multi-threaded version shouldn't run slower than the single-threaded one, am I right? Please correct me if I'm wrong. Thanks again for your help. Best regards, Qiongyi

      qingfengzealot:

      No, there are multiple ways that a threaded program can be slower than a single threaded program. In this case, I don't know exactly which one(s) you're hitting, but there are two candidates I can think of:

      • Disk thrashing: When you're reading a file sequentially, the disk drive can normally read sector after sector without (many) intermediate seeks. If you're reading multiple files at the same time, your drive needs to seek frequently between file locations. In other words, reading files one at a time generally has fewer disk seeks than reading them in parallel, and seeks are slow operations.
      • Data structure locking: In a single-threaded program, you don't need to worry about locking your data structures. But in a multithreaded program, you need to lock a data structure when modifying it if another thread could also be updating it. Even when there's no contention for the data structure, you're still spending time acquiring and releasing locks (see the small sketch after this list).
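
      To make that locking cost concrete, here is roughly what updating a shared hash looks like with Perl's ithreads (a minimal sketch with made-up names like %counts and tally(), not the OP's code):

          use strict;
          use warnings;
          use threads;
          use threads::shared;

          my %counts :shared;          # visible to every thread

          sub tally {
              my ($key) = @_;
              # Every update to the shared hash takes the lock, even when no
              # other thread happens to be touching the hash at that moment.
              lock(%counts);
              $counts{$key}++;
          }                            # the lock is released when it goes out of scope

          my @workers = map { threads->create(\&tally, "key$_") } 1 .. 4;
          $_->join for @workers;
          print "$_ => $counts{$_}\n" for sort keys %counts;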

      Having said that, you can probably speed things up without going to a single-threaded program.

      If your program is experiencing disk thrashing, you might be able to avoid it by placing your files on different drives. That way you won't have as many disk seeks.

      If your program is spending too much time managing data structure locks, you might be able to rearrange your code a bit to reduce the number of locks. For example, you might have each thread read and process a file in a local (non-shared) data structure, and then, once it has finished the file, merge the data into the shared structure.
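
      A rough sketch of that arrangement (the file format, the tab-separated parsing, and names like %totals and process_file() are placeholders, since the original script isn't shown in this thread):

          use strict;
          use warnings;
          use threads;
          use threads::shared;

          my %totals :shared;

          sub process_file {
              my ($file) = @_;
              my %local;               # private to this thread, so no locking needed
              open my $fh, '<', $file or die "Can't open $file: $!";
              while (my $line = <$fh>) {
                  chomp $line;
                  my ($key, $count) = split /\t/, $line;   # placeholder parsing
                  $local{$key} += $count;
              }
              close $fh;

              # One short locked section per file instead of one per input line.
              lock(%totals);
              $totals{$_} += $local{$_} for keys %local;
          }

          my @threads = map { threads->create(\&process_file, $_) } @ARGV;
          $_->join for @threads;
          print "$_\t$totals{$_}\n" for sort keys %totals;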

      There are other possible problems and solutions; these are just the ones that came to mind. (I've run into both of them frequently enough...unfortunately.)

      ...roboticus

      When your only tool is a hammer, all problems look like your thumb.

      There is extra management overhead dealing with the threads, so yes, a multi-threaded program can run slower than a single threaded program, especially if there is no work other than the I/O to do.

      --MidLifeXis

      Actually, it will run slower. All of the threads are writing to the same hash. For each write, the hash is locked, and any other threads have to wait until the current thread finishes its write before they can start.

      Try having each thread write to a separate hash, then merge them in the main thread once the reading is complete.
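
      One way to do that, again with placeholder file handling since the original code isn't visible from this reply, is to have each thread build its own plain (non-shared) hash and return a reference to it; join() hands the data back to the main thread, which does all the merging:

          use strict;
          use warnings;
          use threads;

          sub count_file {
              my ($file) = @_;
              my %counts;              # thread-local, so no locks at all
              open my $fh, '<', $file or die "Can't open $file: $!";
              while (my $line = <$fh>) {
                  chomp $line;
                  $counts{$line}++;    # placeholder per-line work
              }
              close $fh;
              return \%counts;         # handed back to the main thread via join()
          }

          my @threads = map { threads->create(\&count_file, $_) } @ARGV;

          my %merged;
          for my $thr (@threads) {
              my ($href) = $thr->join;     # each thread's hashref
              $merged{$_} += $href->{$_} for keys %$href;
          }
          print "$_\t$merged{$_}\n" for sort keys %merged;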