
Re: Parallelization of multiple nested loops

by salva (Abbot)
on Feb 07, 2018 at 10:51 UTC ( #1208615=note )

in reply to Parallelization of multiple nested loops

You are probably forking too much.

Instead of forking in the innermost loop, do it at the second or third level, so that instead of launching 6**11 little processes you launch just a few hundred.
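A minimal sketch of that idea, using only core fork/waitpid rather than Parallel::ForkManager (the principle is the same either way). The loop bounds and the compute() sub are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @pids;
for my $i (0 .. 5) {
    for my $j (0 .. 5) {
        # Fork once per ($i, $j) pair: 36 child processes in total,
        # instead of one fork per innermost iteration.
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                     # child
            compute($i, $j, $_) for 0 .. 5;  # remaining loop runs serially
            exit 0;
        }
        push @pids, $pid;                    # parent keeps launching
    }
}
waitpid($_, 0) for @pids;                    # wait for all children

sub compute { my ($i, $j, $k) = @_; return $i + $j + $k }  # placeholder
```

Each child does a whole inner loop's worth of work, so the per-fork overhead is amortized over many iterations.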

Update: Oh, sorry, I didn't read your post fully. It seems you have already tried that. If your problem is getting the results back, the simplest solution in my experience is to have every process write its part of the computation to a file, and then have a final stage where all the partial outputs are merged. In most cases this is also good enough in terms of computational cost.
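That file-based approach might look like this sketch, again with core modules only; the number of chunks and the fake per-chunk result are hypothetical stand-ins for the real computation:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);   # scratch space for the partial files

my @pids;
for my $chunk (0 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {               # child: compute and dump to its own file
        open my $fh, '>', "$dir/part.$chunk" or die "open: $!";
        print $fh "$chunk: ", $chunk * 10, "\n";   # stand-in for real results
        close $fh;
        exit 0;
    }
    push @pids, $pid;
}
waitpid($_, 0) for @pids;

# Merge stage: the parent reads every partial file back in.
my @results;
for my $chunk (0 .. 3) {
    open my $fh, '<', "$dir/part.$chunk" or die "open: $!";
    push @results, <$fh>;
    close $fh;
}
print @results;
```

Because each child owns its own file there is no locking to worry about, and the merge only starts once every worker has exited.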

The alternative (which is what your code is really doing under the hood with the on_finish hooks) is to have the slave processes serialize their partial results and pipe them to the master process, which then merges all of them. You avoid the cost of writing and reading the intermediate data through the file system, but on the other hand everything has to fit in RAM.
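Here is a sketch of that pipe-based variant using core Storable for the serialization; Parallel::ForkManager's run_on_finish callback does essentially this for you when a child passes a reference to finish(). The chunk count and the stand-in results are invented for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(freeze thaw);

my @readers;
for my $chunk (0 .. 3) {
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                  # child: compute, serialize, exit
        close $reader;
        my %partial = ("key$chunk" => $chunk * 10);   # stand-in result
        print {$writer} freeze(\%partial);
        close $writer;
        exit 0;
    }
    close $writer;                    # parent keeps only the read end
    push @readers, $reader;
}

# Master merges every child's serialized hash into one result.
my %merged;
for my $reader (@readers) {
    my $frozen = do { local $/; <$reader> };   # slurp one child's output
    close $reader;
    my $partial = thaw($frozen);
    %merged = (%merged, %$partial);
}
1 while wait() != -1;                 # reap all children

print "$_ => $merged{$_}\n" for sort keys %merged;
```

Note the size caveat salva mentions: all the partial hashes end up in the master's memory at once, so this only wins over the file approach when the results comfortably fit in RAM.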


Re^2: Parallelization of multiple nested loops
by Eily (Prior) on Feb 07, 2018 at 11:04 UTC

    That contradicts this statement from the original post:

    The problem is that with the forking inside the last loop it only starts max 6 processes

    I did not notice this at first, but reading your post made me realize something might be wrong here. Maybe the inner loop is so fast that the first child finishes before the seventh even starts? That would make the loop a pretty bad candidate for parallelisation. Then again, this might be true of this test code, but not of the actual program.

      There may be at most 6 processes running in parallel. But my numbers refer to the total number of processes created and destroyed during the complete computation.

      My point is that starting and stopping processes has a non-negligible overhead. If you start too many, the total cost of forking may become a significant fraction of the total cost of the computation.

        Quite. If my reading of biosub's code (and my maths) is correct it forks 6^12 times (2,176,782,336 times). That's a tonne of overhead.
