http://www.perlmonks.org?node_id=1012918


in reply to problem of my multithreading perl script

However, it seems that the multithread one ran even much slower than the single thread script.

The problem here is not the threading per se -- although it may be compounded by it. A big part of the slowdown is due to asking your hard disk to read from multiple files simultaneously, forcing the read head to dance all over the disk to fetch one block from one file; then one block from another; then one block from another ...; and then back to the first; and so on.

Regardless of the speed of your disk -- the same would be true for SSDs, though less so -- and regardless of whether you use threads or separate processes, reading 12 large files concurrently will be far slower than reading those same files sequentially.

A good analogy would be you trying to read 12 books by reading one page of one, then one page of the next, and so on. Even ignoring the effect that will have on your brain trying to keep all the stories straight, the simple need to constantly switch from one book to another to another will seriously slow down your throughput rate.
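
If you want to see the effect for yourself, a quick-and-dirty timing sketch along these lines will show it (this is illustrative only: the files come from @ARGV, and you must flush the OS file cache -- or use a second identical set of files -- before the concurrent pass, otherwise it reads from RAM and proves nothing):

    #!/usr/bin/perl
    # Illustrative only: time reading a set of files one after the other,
    # then again with one thread per file reading them all at once.
    use strict;
    use warnings;
    use threads;
    use Time::HiRes qw( time );

    sub drain {                       # read a file line by line and discard it
        my( $file ) = @_;
        open my $in, '<', $file or die "open $file: $!";
        1 while <$in>;
        close $in;
    }

    my @files = @ARGV;                # e.g. your 12 large input files

    my $start = time;
    drain( $_ ) for @files;           # sequential: one file at a time
    printf "sequential: %.2f s\n", time - $start;

    # NB: drop the OS file cache here, or the next pass measures RAM, not disk.
    $start = time;
    my @threads = map { threads->create( \&drain, $_ ) } @files;
    $_->join for @threads;
    printf "concurrent: %.2f s\n", time - $start;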

This can be somewhat mitigated by putting the files on different physical drives -- either multiple physical drives configured as multiple logical drives, or multiple physical drives RAIDed together as a single logical drive -- because one drive can actually be transferring data whilst the other(s) are moving their read heads. But even this will usually result in lower throughput than sequential reading, because of the extra context switches (thread or process), system bus and device bus contention, etc. It also creates extra load on the physical/virtual memory mapping, on the L1/L2/L3 caches, and on the system file cache.

In the past, I have had some success speeding up the reading of many files by serially slurping them into scalars and then handing those scalars off to threads to process line by line. I do that by opening each slurped scalar as a ram file and then using the familiar while( <$FH> ){ ... } loop on it. But even this requires considerable care to ensure that the (huge) slurped scalars don't get unnecessarily duplicated in the process of handing them off to the threads for processing.
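
Something along these lines -- a minimal sketch only, not the exact code I used; the worker count and the Thread::Queue hand-off are just one way of doing it, and note that enqueueing and dequeueing each copy the scalar, which is exactly the kind of duplication that needs care:

    #!/usr/bin/perl
    # Sketch: slurp the files serially in the main thread, then hand each
    # slurped scalar to a worker that re-opens it as an in-memory file.
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $WORKERS = 4;                       # illustrative
    my $Q = Thread::Queue->new();

    my @workers = map {
        threads->create( sub {
            while ( defined( my $content = $Q->dequeue() ) ) {
                # Open the slurped scalar as a "ram file" ...
                open my $FH, '<', \$content or die "in-memory open: $!";
                # ... and process it with the familiar loop.
                while ( my $line = <$FH> ) {
                    # ... do the real work on $line here ...
                }
                close $FH;
            }
        } );
    } 1 .. $WORKERS;

    # The main thread reads each file sequentially -- one big, cheap,
    # head-friendly read per file -- and queues the result.
    for my $file ( @ARGV ) {
        open my $in, '<', $file or die "open $file: $!";
        my $slurped = do { local $/; <$in> };
        close $in;
        $Q->enqueue( $slurped );
    }

    $Q->enqueue( (undef) x $WORKERS );     # one 'no more work' marker each
    $_->join() for @workers;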

Swapping doesn't help

I also note that you are building a shared hash containing (from what you said) 12 shared sub-hashes, each containing 21,000,000 key/value pairs.

On the basis of a simple experiment -- a shared hash containing 5,000,000 key/value pairs shared by just 2 threads requires 2.0GB on my 64-bit perl -- I therefore estimate that your shared hash will require at least 12 * 21/5 * 2.0GB = 100.8GB, and possibly much more if you have long keys or values. So, unless you have a very large amount of memory, loading that much data will push your system into swapping. Indeed, trying to load that amount into a non-shared hash is going to move you into swapping even if you read the files sequentially in a single-threaded process, unless you have circa 64GB of RAM in your system.
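
The experiment itself is trivial to reproduce if you want to check the numbers on your own build. Something like the following (a rough sketch; integer keys and values are the minimal case, real data will be bigger -- watch the process size in Task Manager, top, ps, etc. while it sleeps):

    #!/usr/bin/perl
    # Rough reproduction of the sizing experiment: a shared hash of
    # 5 million integer key/value pairs, visible from two threads.
    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my %h : shared;
    $h{ $_ } = $_ for 1 .. 5_000_000;

    # The second thread just touches the hash and then hangs around so
    # you can read the process memory from your tool of choice.
    my $t = threads->create( sub {
        my $count = keys %h;
        print "thread sees $count keys\n";
        sleep 60;
    } );

    $t->join();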

Basically, I think you need to re-think the way you are tackling this problem. Is it really necessary to have all that data in memory concurrently?

What is the data? What calculations are you performing?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: problem of my multithreading perl script
by sundialsvc4 (Abbot) on Jan 11, 2013 at 17:39 UTC
    -- the same would be true for SSDs, though less so --
    As an aside, I am quite intrigued by this comment. At your convenience, I would like to know more of your insights with regard to this assertion, perhaps best moved to a separate thread. I would intuit that such latencies could not exist with solid-state, although I do not dispute you; hence, “howcum?”

      Take a look at an SSD review.

      Note how the random 4k reads achieve a relatively low throughput (101.4MB/s in that particular case), whereas the throughput for sequential reads is considerably higher (431.8MB/s).

      This is common to all SSDs. They are much faster than spinning rust, but it is still considerably faster to read a file sequentially than randomly.

      If you are reading 12 files -- each sequentially, but all concurrently -- then from the viewpoint of the controller you are reading 4K blocks randomly.

