PerlMonks  

Re: No Performance gain with Parallel::ForkManager

by Laurent_R (Canon)
on Feb 23, 2014 at 22:00 UTC


in reply to No Performance gain with Parallel::ForkManager

Although there might be some explanations for it, I am a bit surprised that running at least a few processes in parallel does not bring you some performance improvement. I very often do intensive data extraction from a very large (split) database, which is mostly I/O-bound, and I usually get the best results with a maximum number of processes anywhere between one and two times the number of CPUs or CPU cores. The results might be very different with a different setup. On the other hand, I have come across some cases (badly written programs) where one process was locking data access for the others, which prevented any improvement from parallel processing (actually, it led to poorer performance). I wonder whether you are running into one of these cases.
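For readers less familiar with Parallel::ForkManager, the pattern under discussion (cap the number of live children, fork one child per task, reap children as slots free up) can be sketched without the module. The snippet below is a minimal illustration in Python rather than Perl, assuming a POSIX system with `os.fork`. The names `run_forked` and `suggested_max_procs` are hypothetical, and the 1x to 2x CPU-count range is only the rule of thumb from the post above, not a general law.

```python
import os


def suggested_max_procs(factor=1.0):
    """Rule of thumb from the post: for I/O-heavy work, try 1x to 2x the CPU count."""
    cpus = os.cpu_count() or 1
    return max(1, int(cpus * factor))


def _exit_code(status):
    """Translate a raw wait() status into an exit code (-1 if killed by a signal)."""
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1


def run_forked(tasks, max_procs):
    """Run each callable in its own forked child, with at most max_procs alive at once.

    Hypothetical helper, loosely mirroring Parallel::ForkManager's
    start / finish / wait_all_children cycle. Returns (results, peak), where
    results maps task index -> child exit code and peak is the largest number
    of children that were running at the same time.
    """
    running = {}  # pid -> task index
    results = {}
    peak = 0

    def reap_one():
        pid, status = os.wait()
        idx = running.pop(pid, None)  # ignore any child we did not start
        if idx is not None:
            results[idx] = _exit_code(status)

    for i, task in enumerate(tasks):
        # Block until a slot frees up, like ForkManager's max_procs limit.
        while len(running) >= max_procs:
            reap_one()
        pid = os.fork()
        if pid == 0:
            # Child: run the task, then exit immediately so it never
            # falls back into the parent's loop.
            code = 1  # reported if the task raises
            try:
                rc = task()
                code = rc if isinstance(rc, int) else 0
            finally:
                os._exit(code & 0xFF)
        running[pid] = i
        peak = max(peak, len(running))

    # Equivalent of wait_all_children: reap whatever is still running.
    while running:
        reap_one()
    return results, peak
```

In practice you would time the real workload at a few limits between `suggested_max_procs(1.0)` and `suggested_max_procs(2.0)` rather than trust any single number; and, as noted above, if one child holds a lock the others need, no limit will help.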

Re^2: No Performance gain with Parallel::ForkManager
by davido (Cardinal) on Feb 24, 2014 at 01:08 UTC

    The thing is that the files being read are big enough that two different files very likely sit on different physical tracks of the hard drive. If you read one file, then the next, then the next, sequentially, drive head movement is minimized. If you read two, three, four, or ten files in parallel, the drive head has to shift back and forth a lot. Buffering is also less effective: when the drive reads ahead to fill its buffer, that data is probably not useful to the next request, which is likely to come from a different forked child. So the forks actually work against each other, losing the benefit of buffering and even forcing the drive to seek back and forth repeatedly.

    When multiple processes use the same physical resource to fetch information that is spread all over that resource, in what amounts to an unpredictable order, it's no surprise that performance degrades.
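The head-movement argument can be made concrete with a deliberately crude model: treat the disk as a line of numbered blocks, lay each file out contiguously, and add up how far the head travels to serve requests in a given order. This is a toy sketch in Python, not a disk simulator; the functions are hypothetical, and the model ignores I/O scheduling, readahead, and on-drive caching, all of which can soften the effect on real hardware.

```python
def layout(num_files, blocks_per_file):
    """Assume each file occupies a contiguous run of blocks, one file after another."""
    return [list(range(f * blocks_per_file, (f + 1) * blocks_per_file))
            for f in range(num_files)]


def sequential_order(files):
    """One reader: finish each file before starting the next."""
    return [block for f in files for block in f]


def interleaved_order(files):
    """Several readers taking turns: one block from each file per round."""
    order = []
    for i in range(max(len(f) for f in files)):
        for f in files:
            if i < len(f):
                order.append(f[i])
    return order


def head_travel(requests, start=0):
    """Total distance (in blocks) the head moves to serve requests in order."""
    pos, total = start, 0
    for block in requests:
        total += abs(block - pos)
        pos = block
    return total


if __name__ == "__main__":
    files = layout(num_files=4, blocks_per_file=1000)
    print("sequential: ", head_travel(sequential_order(files)))
    print("interleaved:", head_travel(interleaved_order(files)))
```

For four 1000-block files this reports 3999 blocks of travel for the sequential order and 5996001 for the interleaved one, roughly 1500 times more in this idealized setting.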


    Dave
