http://www.perlmonks.org?node_id=1232781


in reply to Parallelization of multiple nested loops

I was able to produce the output file in just over 3 minutes on a crappy laptop. It's not parallel, it's not as simple as the module BrowserUK suggested, and it's certainly not as screamingly fast as marioroy's solution. But I wanted to share it in case someone finds it interesting. It should be straightforward to use Getopt::Std to make the values and the number of levels command-line options.

#!/usr/bin/perl
use warnings;
use strict;

# values
my @val = qw(0.0 0.2 0.4 0.6 0.8 1.0);
my $tiers = 11;

# map array indices to values
my $m = {};
{
    my $i = int 0;
    map { $m->{$i++} = $_ } @val;
}

# first tier
my $p = \@val;

# create each additional tier skipping first
# that's already in $p
for (my $i = 2; $i <= $tiers; $i++) {
    my $tmp;
    map { $tmp->[$_] = $p; } keys %{$m};
    $p = $tmp;
}

# output file
open(my $outfile, '>', '/tmp/output.txt') or die $!;

# use recursion to descend the huge matrix
# build up the string at each tier
my $fn;
$fn = sub {
    my ($aref, $str) = @_;
    for (my $i = int 0; $i < @{$aref}; $i++) {
        if(ref($aref->[$i])) {
            $fn->($aref->[$i], $str."\t".$val[$i]);
            next;
        }
        # end of the line, print last tier of values
        print $outfile $str."\t".$_."\t1\t1\n" for @val;
        last;
    }
};

# kick off the recursion, could do these in parallel
# at the top-most layer
for (my $i = int 0; $i < @{$p}; $i++) {
    $fn->($p->[$i], $val[$i]);
}
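The Getopt::Std idea mentioned above might look roughly like the sketch below. The option letters (-v for a comma-separated value list, -t for the number of tiers) and the parse_options helper are my own assumptions, not part of the script; the defaults are the values hard-coded in it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Hypothetical option letters (not from the original script):
#   -v  comma-separated list of values
#   -t  number of tiers
sub parse_options {
    my (@args) = @_;
    local @ARGV = @args;    # getopts() reads from @ARGV
    my %opt;
    getopts('v:t:', \%opt)
        or die "usage: $0 [-v v1,v2,...] [-t tiers]\n";

    # Fall back to the values hard-coded in the script.
    my @val   = defined $opt{v} ? split(/,/, $opt{v})
                                : qw(0.0 0.2 0.4 0.6 0.8 1.0);
    my $tiers = defined $opt{t} ? $opt{t} : 11;

    die "need at least one value\n" unless @val;
    die "tiers must be a positive integer\n"
        unless $tiers =~ /^[1-9]\d*$/;

    return (\@val, $tiers);
}

my ($val, $tiers) = parse_options(@ARGV);
print scalar(@$val), " values, $tiers tiers\n";
```

The rest of the script would then use @$val and $tiers in place of the hard-coded @val and $tiers.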

Unsurprisingly, the I/O seems to take up a good deal of the time. It's a 17.4GB file with 362797056 lines (6^11, one per combination), but perl only seems to take about 5MB of resident memory (27MB virtual) while running. I certainly wouldn't want to keep the output in memory, but the array-refs would be just fine to pass around.
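For anyone who wants to check those figures, here is a small sketch of the arithmetic. It assumes every value has the same width (3 characters here) and that each line ends with the fixed "\t1\t1\n" suffix; the output_size helper is a name I made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (lines, bytes per line, total bytes).
# Assumes all values have the same width.
sub output_size {
    my ($vals, $tiers) = @_;

    # One line per combination: (number of values) ** (number of tiers).
    my $lines = scalar(@$vals) ** $tiers;

    # Per line: one value per tier, a tab between adjacent values,
    # then the fixed "\t1\t1\n" suffix (5 bytes).
    my $bytes_per_line = $tiers * length($vals->[0])
                       + ($tiers - 1)
                       + length("\t1\t1\n");

    return ($lines, $bytes_per_line, $lines * $bytes_per_line);
}

my @val = qw(0.0 0.2 0.4 0.6 0.8 1.0);
my ($lines, $per_line, $bytes) = output_size(\@val, 11);

printf "%d lines, %d bytes per line, %.1f GB\n",
    $lines, $per_line, $bytes / 1e9;
# prints: 362797056 lines, 48 bytes per line, 17.4 GB
```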

So depending on what else is being done, and how many times the parameters are changed, it might make sense to just hold the initial huge matrix of array-refs in memory and pull combinations off for further processing in batches.
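One possible way to pull combinations off in batches is sketched below. Note that this is a different technique from the nested array-ref matrix in the script: it treats the combination number as a base-6 number and decodes it directly, so nothing has to be held in memory. The combination and batch helpers are hypothetical names of my own.

```perl
use strict;
use warnings;

# Decode combination number $n (0-based) into one value per tier by
# reading $n as a base-N number, most significant tier first.
sub combination {
    my ($n, $vals, $tiers) = @_;
    my $base = @$vals;
    my @combo;
    for (1 .. $tiers) {
        unshift @combo, $vals->[ $n % $base ];
        $n = int($n / $base);
    }
    return \@combo;
}

# Return combinations $start .. $start + $count - 1 as array-refs,
# stopping early at the end of the sequence.
sub batch {
    my ($start, $count, $vals, $tiers) = @_;
    my $total = scalar(@$vals) ** $tiers;
    my $end   = $start + $count;
    $end = $total if $end > $total;
    return [ map { combination($_, $vals, $tiers) } $start .. $end - 1 ];
}

my @val   = qw(0.0 0.2 0.4 0.6 0.8 1.0);
my $tiers = 11;

# The first three combinations; the last tier varies fastest,
# which is the same order the recursion produces.
print join("\t", @$_), "\n" for @{ batch(0, 3, \@val, $tiers) };
```

Because each batch is addressed by a start index and a count, independent batches could be handed to separate workers without any shared state.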

Re^2: Parallelization of multiple nested loops
by marioroy (Prior) on Apr 29, 2019 at 00:09 UTC

    Hi, cmk

    Very cool! Seeing my old parallel code made me want to try again for a 100% pure Perl solution not involving Inline::C. The serial code comes first, followed by the parallel code. Workers consume about 16 MB each. The parallel code runs 3 times faster than the serial code, which is made possible by having workers write to STDOUT directly.

    $ time perl serial.pl >/dev/null
    $ time perl parallel.pl >/dev/null

    Serial Demonstration

    use strict;
    use warnings;

    my @vals = qw( 0.0 0.2 0.4 0.6 0.8 1.0 );

    sub proc {
        for my $a ( @vals ) {
        for my $b ( @vals ) {
        for my $c ( @vals ) {
        for my $d ( @vals ) {
        for my $e ( @vals ) {
        for my $f ( @vals ) {
        for my $g ( @vals ) {
        for my $h ( @vals ) {
        for my $i ( @vals ) {
        for my $j ( @vals ) {
        for my $k ( @vals ) {
            print "$a\t$b\t$c\t$d\t$e\t$f\t$g\t$h\t$i\t$j\t$k\t1\t1\n";
        }}}}}}}}}}}
    }

    proc();

    Parallel Demonstration

    use strict;
    use warnings;

    use MCE;

    my @vals = qw( 0.0 0.2 0.4 0.6 0.8 1.0 );

    # Must autoflush because workers write to STDOUT directly.
    STDOUT->autoflush(1);

    sub proc {
        my $mce = MCE->new(
            max_workers => scalar(@vals),
            chunk_size  => 1,
            init_relay  => 1,
            user_func   => sub {
                my ($a, $b, $c) = @{ MCE->user_args };
                my ($buf, $d ) = ( '', $_ );  # $d is the input

                for my $e ( @vals ) {
                for my $f ( @vals ) {
                for my $g ( @vals ) {
                for my $h ( @vals ) {
                for my $i ( @vals ) {
                for my $j ( @vals ) {
                for my $k ( @vals ) {
                    $buf .= "$a\t$b\t$c\t$d\t$e\t$f\t$g\t$h\t$i\t$j\t$k\t1\t1\n";
                }}}}}}}

                # Relay is driven by the chunk_id value behind the scene.
                # The benefit is orderly output, one worker at a time.
                MCE::relay { print $buf };
            }
        )->spawn;

        for my $a ( @vals ) {
            for my $b ( @vals ) {
                for my $c ( @vals ) {
                    # MCE workers persist between each run. The user_args option
                    # is how to pass parameters to them.
                    $mce->process({
                        input_data => \@vals,
                        user_args  => [ $a, $b, $c ],
                    });
                }
            }
        }

        $mce->shutdown;
    }

    proc();

    Parallelization happens at the 4th level to minimize memory consumption: each worker buffers only the 6^7 = 279936 lines of its chunk before relaying them.

    Regards, Mario