PerlMonks  

Thread::Queue memory issue with nested maps but not foreach loops...

by jmmitc06 (Acolyte)
on Mar 03, 2012 at 02:21 UTC ( #957583=perlquestion )
jmmitc06 has asked for the wisdom of the Perl Monks concerning the following question:

Hi All:

I ran into some unusual Perl behavior today at work and was wondering if anyone could explain why it occurs.

I was attempting to load a queue with a reasonable number of strings (200,000), with which I needed to do some work. I'm a big fan of using map in an anonymous context, so I loaded the queue inside of two nested map statements. Although this appeared to load the queue successfully and do the work I intended it to do, the amount of memory it used was very large, approximately 10,000MB with 1 thread. I then rewrote the code to load the queue inside of two nested foreach loops and only 800MB was used.

Can anyone explain this behavior? I know it has to do with how the queue is loaded and not with the work being done on the queued strings, because the following code snippets show the same behavior on my machine. The input is a flat file composed of .mol files.

Nested Maps (uses 10,000MB)

use MolFile;
use threads;
use Thread::Queue;

my $database_compounds = ( MolFile->new( "File" => shift @ARGV )->parse_noHydrogens() );
my @names = ( keys %$database_compounds );

# Stub worker; the real program does the work here.
sub doWork { return; }

( our $THREADS, my $Qwork, my $Qresults ) = ( 1, Thread::Queue->new, Thread::Queue->new );
my @thread_pool = map { threads->create( \&doWork, $Qwork, $Qresults ) } 1..$THREADS;

#----
# Load the queue with every pair of names that share a formula.
map {
    my $i = $_;
    map {
        $Qwork->enqueue("$names[$_]!"."$names[$i]!")
            if ( $$database_compounds{$names[$i]}->Formula eq $$database_compounds{$names[$_]}->Formula );
    } ($i+1..$#names);
} (0..$#names);
#----

# One undef per worker signals the end of the work.
$Qwork->enqueue( (undef) x $THREADS );
map {$_->join();} @thread_pool;
for (1..$THREADS) {
    while ( my $result = $Qresults->dequeue ) {
        print $result , "\n";
    }
}

Nested Foreach (uses 800MB)

use MolFile;
use threads;
use Thread::Queue;

my $database_compounds = ( MolFile->new( "File" => shift @ARGV )->parse_noHydrogens() );
my @names = ( keys %$database_compounds );

# Stub worker; the real program does the work here.
sub doWork { return; }

( our $THREADS, my $Qwork, my $Qresults ) = ( 1, Thread::Queue->new, Thread::Queue->new );
my @thread_pool = map { threads->create( \&doWork, $Qwork, $Qresults ) } 1..$THREADS;

#----
# Load the queue with every pair of names that share a formula.
foreach my $i (0..$#names) {
    foreach my $j ($i+1..$#names) {
        if ( $$database_compounds{$names[$i]}->Formula eq $$database_compounds{$names[$j]}->Formula ) {
            my $string = "$names[$i]"."!"."$names[$j]";
            $Qwork->enqueue($string);
        }
    }
}
#----

# One undef per worker signals the end of the work.
$Qwork->enqueue( (undef) x $THREADS );
map {$_->join();} @thread_pool;
for (1..$THREADS) {
    while ( my $result = $Qresults->dequeue ) {
        print $result , "\n";
    }
}

These are identical to the program I was working with, except that there the doWork subroutine is replaced with something that actually does work. I don't mind the foreach version, but I would like to know why the nested maps produce this behavior.

Any ideas would be appreciated.

Thanks

Re: Thread::Queue memory issue with nested maps but not foreach loops...
by BrowserUk (Pope) on Mar 03, 2012 at 04:08 UTC
    I would like to know why the nested maps produces the behavior?

    First. Take threads and Thread::Queue out of the equation. They are innocent bystanders in the issue.

    Using nested maps, this requires 49MB and 11.4 seconds of CPU time to complete:

    C:\test>perl -E"$c=0; map map( ++$c, 1..1e3 ), 1..1e3; say 'check mem';<>"
    check mem
    49 MB

    Whereas this, using nested for loops, requires just 2 1/2 MB and 0.014 seconds of CPU time:

    C:\test>perl -E"$c=0; for( 1..1e3 ) { ++$c for 1..1e3 }; say 'check mem';<>"
    check mem
    2.5MB

    As for why:

    1. map operates on lists -- so 1 .. 1e3 builds a 1000 item list on "the stack" -- and nesting them means that 1000 lists of 1000 items need to be built.
    2. for will(*) process 1 .. 1e3 as an iterator, grabbing one value at a time as it needs it.

    (*) for will also build a list in some circumstances, but far less frequently.
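
    A minimal sketch of the two shapes being contrasted, for anyone who wants to reproduce the comparison. The sub names count_pairs_map and count_pairs_for are hypothetical, and the work is reduced to a counter. Both visit the same pairs, so any difference in memory has to come from how the ranges are handled; memory itself is best checked from outside the process, as in the one-liners above.

    ```perl
    use strict;
    use warnings;

    # Nested maps: each range is expanded into a full list before map runs.
    sub count_pairs_map {
        my ($n) = @_;
        my $count = 0;
        map {
            my $i = $_;
            map { ++$count } $i + 1 .. $n - 1;
        } 0 .. $n - 1;
        return $count;
    }

    # Nested foreach: a counted range is stepped through one value at a time.
    sub count_pairs_for {
        my ($n) = @_;
        my $count = 0;
        for my $i ( 0 .. $n - 1 ) {
            for my $j ( $i + 1 .. $n - 1 ) {
                ++$count;
            }
        }
        return $count;
    }

    # Both forms visit n*(n-1)/2 pairs; for n = 1000 that is 499500.
    printf "map: %d  for: %d\n", count_pairs_map(1000), count_pairs_for(1000);
    ```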

    It pays to know (some of) the internal details of your language.

    (As an aside, filling Thread::Queues with huge numbers of items costs a lot in terms of memory and runtime. Better to limit how much you push into them at one time.)
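
    One way to keep the queue small is to have the producer back off whenever too many items are pending. The following is only a sketch under assumptions: the cap of 100 is an arbitrary illustrative value, the worker just counts items, and it needs a perl built with thread support. It relies on the pending, enqueue and dequeue methods of Thread::Queue.

    ```perl
    use strict;
    use warnings;
    use threads;
    use Thread::Queue;

    my $MAX_PENDING = 100;    # assumed cap; tune for the workload

    my $Qwork = Thread::Queue->new;

    # Worker: drain the queue until the undef sentinel arrives.
    my $worker = threads->create( sub {
        my $seen = 0;
        while ( defined( my $item = $Qwork->dequeue ) ) {
            ++$seen;    # the real work on $item would go here
        }
        return $seen;
    } );

    my $enqueued = 0;
    for my $i ( 1 .. 10_000 ) {
        # Give the worker a chance to catch up before adding more.
        threads->yield while $Qwork->pending > $MAX_PENDING;
        $Qwork->enqueue("item $i");
        ++$enqueued;
    }
    $Qwork->enqueue(undef);    # tell the worker to finish

    my $processed = $worker->join;
    print "enqueued $enqueued, processed $processed\n";
    ```

    The queue never grows much beyond the cap, so memory stays roughly flat no matter how many items are produced in total.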


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Everything in your post makes sense to me; however, I have a single-threaded version of the program that uses the same amount of memory with both the nested maps and the foreach loops. Also, aren't newer versions of perl optimized such that when map is used in a void context, the extra work that differentiates it from foreach and from for is optimized away? (Source: http://www.perlmonks.org/index.pl?node_id=296742)

      The threaded version and the single threaded version are basically identical except instead of loading a queue, I call &doWork on the string. If the Thread::Queue was not part of the problem, shouldn't nested map use more memory than the nested foreach loops? From my testing, the outrageous use of memory only occurs when I am using Threads and Thread::Queue, otherwise the two perform similarly.

      Thank you

        Also, aren't newer versions of perl optimized such that when map is used in a void context, the extra work that differentiates it from foreach and from for is optimized away? (Source: http://www.perlmonks.org/index.pl?node_id=296742)

        The optimisation relating to map in a void context is that map doesn't build the return list (what would be returned from the map) when it detects that it is called in a void context. It doesn't (I think "can't", but I'm not sure of that) stop the input list being built.

        From my testing, the outrageous use of memory only occurs when I am using Threads and Thread::Queue, otherwise the two perform similarly.

        Post -- or send me via the mail ID on my home page -- the real code. Then we'll see.



      for (EXPR .. EXPR)

      is an iterator, but

      for (CONSTANT .. CONSTANT)

      is actually

      my @anon;
      BEGIN { @anon = (CONSTANT..CONSTANT); }
      for (@anon)

      Now, for (@anon) is treated more efficiently than generic lists, but I don't remember exactly how.

        Sorry, but that simply cannot be true.

        At least not for large ranges:

        C:\test>perl -MTime::HiRes=time -E"$t=time; ++$c for 1..1e6; say time-$t; <>"
        0.0741260051727295
        5.2 MB

        C:\test>perl -MTime::HiRes=time -E"$e=1e6;$t=time; ++$c for 1..$e; say time-$t; <>"
        0.0663371086120605
        5.3 MB

        C:\test>perl -MTime::HiRes=time -E"$t=time; ++$c for 1..1e7; say time-$t; <>"
        0.635999917984009
        5.2 MB

        C:\test>perl -MTime::HiRes=time -E"$e=1e7;$t=time; ++$c for 1..$e; say time-$t; <>"
        0.645999908447266
        5.3 MB

        C:\test>perl -MTime::HiRes=time -E"$t=time; ++$c for 1..1e8; say time-$t; <>"
        6.22199988365173
        5.2 MB

        C:\test>perl -MTime::HiRes=time -E"$e=1e8;$t=time; ++$c for 1..$e; say time-$t; <>"
        6.46099996566772
        5.3 MB

        C:\test>perl -MTime::HiRes=time -E"$t=time; ++$c for 1..1e9; say time-$t; <>"
        61.9520001411438
        5.2 MB

        C:\test>perl -MTime::HiRes=time -E"$e=1e9;$t=time; ++$c for 1..$e; say time-$t; <>"
        64.4389998912811
        5.3 MB

        There isn't any evidence for it at small range sizes:

        C:\test>perl -MTime::HiRes=time -E"$t=time; ++$c for 1..1e2; say time-$t; <>"
        2.09808349609375e-005
        5.2 MB

        C:\test>perl -MTime::HiRes=time -E"$e=1e2;$t=time; ++$c for 1..$e; say time-$t; <>"
        1.9073486328125e-005
        5.3 MB

