http://www.perlmonks.org?node_id=957583

jmmitc06 has asked for the wisdom of the Perl Monks concerning the following question:

Hi All:

I ran into some unusual Perl behavior today at work and was wondering if anyone could explain why it occurs.

I was attempting to load a queue with a fairly large number of strings (200,000) that I needed to do some work on. I'm a big fan of using map in void context (that is, only for its side effects), so I loaded the queue inside two nested map statements. This loaded the queue and did the work I intended, but it used a very large amount of memory, approximately 10,000MB with 1 thread. I then rewrote the code to load the queue inside two nested foreach loops, and it used only 800MB.

Can anyone explain this behavior? I know it has to do with how the queue is loaded and not with the work being done on the queued strings, because the following code snippets show the same behavior on my machine. The input is a flat file composed of .mol files.

Nested Maps (uses 10,000MB)

use MolFile;
use threads;
use Thread::Queue;

my $database_compounds = ( MolFile->new( "File" => shift @ARGV )->parse_noHydrogens() );
my @names = ( keys %$database_compounds );

# Placeholder worker; the real program does actual work here.
sub doWork {
    return;
}

( our $THREADS, my $Qwork, my $Qresults ) = ( 1, new Thread::Queue, new Thread::Queue );
my @thread_pool = map { threads->create( \&doWork, $Qwork, $Qresults ) } 1..$THREADS;

#----
# Load the queue with every pair of names that share a formula.
map {
    my $i = $_;
    map {
        $Qwork->enqueue("$names[$_]!"."$names[$i]!")
            if ( $$database_compounds{$names[$i]}->Formula eq $$database_compounds{$names[$_]}->Formula );
    } ($i+1..$#names);
} (0..$#names);
#----

# One undef per thread marks the end of the work.
$Qwork->enqueue( (undef) x $THREADS );
map { $_->join(); } @thread_pool;

for (1..$THREADS) {
    while ( my $result = $Qresults->dequeue ) {
        print $result , "\n";
    }
}

Nested Foreach (uses 800MB)

use MolFile;
use threads;
use Thread::Queue;

my $database_compounds = ( MolFile->new( "File" => shift @ARGV )->parse_noHydrogens() );
my @names = ( keys %$database_compounds );

sub doWork {
    return;
}

( our $THREADS, my $Qwork, my $Qresults ) = ( 1, new Thread::Queue, new Thread::Queue );
my @thread_pool = map { threads->create( \&doWork, $Qwork, $Qresults ) } 1..$THREADS;

#----
foreach my $i (0..$#names) {
    foreach my $j ($i+1..$#names) {
        if ( $$database_compounds{$names[$i]}->Formula eq $$database_compounds{$names[$j]}->Formula ) {
            my $string = "$names[$i]"."!"."$names[$j]";
            $Qwork->enqueue($string);
        }
    }
}
#----

$Qwork->enqueue( (undef) x $THREADS );
map { $_->join(); } @thread_pool;

for (1..$THREADS) {
    while ( my $result = $Qresults->dequeue ) {
        print $result , "\n";
    }
}

These snippets are identical to the program I was working with, except that the real doWork subroutine actually does some work. I don't mind using the foreach version, but I would like to know why the nested maps produce this behavior.
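For anyone who wants to experiment without MolFile, here is a stripped-down sketch of the two loading styles. The %formula_of hash, the compound names, and the two sub names are made up for illustration, and a plain array stands in for the Thread::Queue, so this only shows that both styles build the same set of pairs. I have not checked whether it shows the same memory gap when scaled up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parsed MolFile data: name => formula.
my %formula_of = (
    cmpd_a => 'C2H6O',
    cmpd_b => 'C2H6O',
    cmpd_c => 'C6H6',
    cmpd_d => 'C2H6O',
);
my @names = sort keys %formula_of;

# Style 1: nested map used only for its side effect (the push).
sub pairs_with_map {
    my @queue;    # plain array standing in for the Thread::Queue
    map {
        my $i = $_;
        map {
            push @queue, "$names[$i]!$names[$_]"
                if $formula_of{ $names[$i] } eq $formula_of{ $names[$_] };
        } ( $i + 1 .. $#names );
    } ( 0 .. $#names );
    return @queue;
}

# Style 2: the same pairing written as nested foreach loops.
sub pairs_with_foreach {
    my @queue;
    foreach my $i ( 0 .. $#names ) {
        foreach my $j ( $i + 1 .. $#names ) {
            push @queue, "$names[$i]!$names[$j]"
                if $formula_of{ $names[$i] } eq $formula_of{ $names[$j] };
        }
    }
    return @queue;
}

my @from_map     = pairs_with_map();
my @from_foreach = pairs_with_foreach();
printf "map: %d pairs, foreach: %d pairs\n",
    scalar @from_map, scalar @from_foreach;
```

To compare memory use, the hash could be filled with a few thousand generated names and each sub run in a separate process while watching the process size.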

Any ideas would be appreciated.

Thanks