http://www.perlmonks.org?node_id=1208610

biosub has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone,

I'm a bit of a newbie in Perl. I'm a biologist who codes for bioinformatics, and I don't come across Perl that often. But here I am...

I'm trying to add parallelization to someone else's code that searches through a very big parameter space (i.e. all possible combinations of the allowed values of the defined parameters).
The whole script (what follows is just a minimal example; the computation after the loops is excluded) takes around 4 days to run, and I need to run it with several different settings. Hence the need for parallelization.
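To make the size of that space concrete: the loops in the code I post here step 11 parameters through 6 values each, so there are 6**11 = 362,797,056 combinations. Here is a minimal sketch of the same enumeration written as a flat index decoded into one combination. It is not part of the original script; the names @levels and index_to_params are made up for illustration, and it assumes the grid is exactly 0, 0.2, ..., 1.

```perl
use strict;
use warnings;

# Assumed grid: the six values each loop visits (0, 0.2, ..., 1).
my @levels  = map { $_ / 5 } 0 .. 5;
my $base    = scalar @levels;           # 6 values per parameter
my $n_loops = 11;                       # $p[0] .. $p[10] are looped over
my $total   = $base ** $n_loops;        # 6**11 combinations

# Hypothetical helper: decode a flat index into one combination,
# with the last looped parameter varying fastest (as in nested loops).
sub index_to_params {
    my ($index) = @_;
    my @p;
    for my $pos (reverse 0 .. $n_loops - 1) {
        $p[$pos] = $levels[ $index % $base ];
        $index   = int( $index / $base );
    }
    return (@p, 1, 1);                  # $p[11] and $p[12] stay fixed at 1
}

print "$total combinations\n";
print join("\t", index_to_params(7)), "\n";
```

Written this way, any contiguous range of indices is an independent piece of work, which is what I would like to hand out to the cores.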

I managed to use Parallel::ForkManager for the rest of the code, but I'm stuck with these nested loops. Here's the code:

    use strict;
    use warnings;
    use Parallel::ForkManager;

    my $fork_manager = Parallel::ForkManager -> new ( 32 );
    my @p = split /,/, "1,0.6,0.4,0.1,0.6,0,0.4,0.4,1,0.5,1,1,1";
    my @param_arr = ();

    $fork_manager -> run_on_finish(sub {
        # retrieve data structure from child
        my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data_structure_reference) = @_;
        if (defined($data_structure_reference)) {
            # children pass a string (they all should here, and it should be the $temp_out)
            my $out = ${$data_structure_reference};
            push @param_arr, $out;    # append string passed by child to array
        } else {
            # issue warning if not having output as expected
            print "No message received from child process $pid!\n";
        }
    });

    # my $fm_outer_1 = Parallel::ForkManager -> new ( 6 );

    # loop through desired combinations
    for ($p[0]=0; $p[0]<=1; $p[0]+=0.2){
     for ($p[1]=0; $p[1]<=1; $p[1]+=0.2){
      for ($p[2]=0; $p[2]<=1; $p[2]+=0.2){
       for ($p[3]=0; $p[3]<=1; $p[3]+=0.2){
        for ($p[4]=0; $p[4]<=1; $p[4]+=0.2){
         for ($p[5]=0; $p[5]<=1; $p[5]+=0.2){
          for ($p[6]=0; $p[6]<=1; $p[6]+=0.2){
           for ($p[7]=0; $p[7]<=1; $p[7]+=0.2){
            for ($p[8]=0; $p[8]<=1; $p[8]+=0.2){
             for ($p[9]=0; $p[9]<=1; $p[9]+=0.2){
              # $fm_outer_1 -> start and next;
              for ($p[10]=0; $p[10]<=1; $p[10]+=0.2){
                  $fork_manager -> start and next;
                  $p[11]=1;
                  $p[12]=1;
                  #-------------
                  my $temp_out = "$p[0]\t$p[1]\t$p[2]\t$p[3]\t$p[4]\t$p[5]\t$p[6]\t$p[7]\t$p[8]\t$p[9]\t$p[10]\t$p[11]\t$p[12]\n";
                  #-------------
                  $fork_manager -> finish(0, \$temp_out);    # return $temp_out to parent
              }
              # $fm_outer_1 -> finish();
             }
            }
           }
          }
         }
        }
       }
      }
     }
    }

    $fork_manager -> wait_all_children();
    # $fm_outer_1 -> wait_all_children();

    # fun with @param_arr..

The problem is that with the forking inside the innermost loop it only starts a maximum of 6 processes, because with the current settings each loop runs 6 times. Although that is an improvement, it is not what I am aiming for.
I tried putting the forking higher up in the loop structure: it does run more processes, but it only returns the last combination of the loop.
I tried nested forking as suggested here https://www.perlmonks.org/?node_id=973304 (the commented-out code with the $fm_outer_1 variable), but it only produces an endless repetition of the warning:
Cannot start another process while you are in the child process at /usr/.../ForkManager.pm line 467
and a few interspersed warnings about children not returning anything to the parent.
I tried setting up data retrieval for $fm_outer_1 in the same way as for $fork_manager, and I tried shuffling around the positions of the wait_all_children() calls, but without luck.

I turn to the Monks for help in either writing a proper parallelization (I'm running on a 64-core server, so I'd like to use the power that's there ;) ) or, if one exists, pointing me to a package/module that can do the same thing in a cleaner way than writing a bunch of nested loops.
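To show the kind of structure I have in mind, here is a minimal sketch, not a working solution for the real script. It assumes the combinations can be addressed by a flat index, forks one child per contiguous chunk of indices instead of one per combination, and returns each chunk's rows through finish(). The grid is cut down to 3 parameters so that it runs instantly; $workers, %rows_for and index_to_params are made-up names.

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my @levels  = map { $_ / 5 } 0 .. 5;   # grid values 0, 0.2, ..., 1
my $base    = scalar @levels;
my $n_loops = 3;                       # demo size only; the real script loops over 11
my $total   = $base ** $n_loops;
my $workers = 4;                       # assumed value; would be larger on the server

# Hypothetical helper: flat index -> one combination (last varies fastest).
sub index_to_params {
    my ($index) = @_;
    my @p;
    for my $pos (reverse 0 .. $n_loops - 1) {
        $p[$pos] = $levels[ $index % $base ];
        $index   = int( $index / $base );
    }
    return @p;
}

my %rows_for;                          # chunk number => array ref of rows
my $pm = Parallel::ForkManager->new($workers);
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident, $exit_signal, $core_dump, $data) = @_;
    if (defined $data) { $rows_for{$ident} = $data }
    else               { warn "No data received from child $pid\n" }
});

my $chunk_size = int( ($total + $workers - 1) / $workers );   # ceiling division
for my $chunk (0 .. $workers - 1) {
    my $first = $chunk * $chunk_size;
    my $last  = $first + $chunk_size - 1;
    $last = $total - 1 if $last > $total - 1;
    next if $first > $last;            # nothing left for this chunk

    $pm->start($chunk) and next;       # parent: go on to launch the next chunk
    my @rows = map { join("\t", index_to_params($_)) } $first .. $last;
    $pm->finish(0, \@rows);            # child: hand its rows back and exit
}
$pm->wait_all_children;

# Reassemble in chunk order, since children may finish in any order.
my @param_arr = map { @{ $rows_for{$_} } } sort { $a <=> $b } keys %rows_for;
print scalar(@param_arr), " rows\n";
```

With the full grid, sending every row back to the parent would probably be too much data, so I assume the real computation would have to happen inside the child, with only a summary returned.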

I hope everything is clear!
Thanks!!!