PerlMonks  

threads: work crew memory leak

by rakzer (Novice)
on Oct 16, 2010 at 14:23 UTC (#865671=perlquestion)
rakzer has asked for the wisdom of the Perl Monks concerning the following question:

The following code should run at most 20 threads at any time, each processing some data. Running the code while watching memory consumption with top shows that the perl process keeps using more and more memory. I am using perl 5.12.2 with threads 1.75 under FreeBSD 8.1 (64-bit). I am not sure where the memory is leaking. Am I missing something?
#!/usr/bin/perl
$|=1;
use strict;
use warnings;
use threads;
use threads::shared;
use Data::Dumper;

# Global vars
# Maximum working threads
my $MAX_THREADS = 20;
# Flag to inform all threads that application is terminating
my $TERM:shared=0;
# Prevents double detach attempts
my $DETACHING:shared;

# Signal handling
$SIG{'INT'} = $SIG{'TERM'} = sub {
    print("^C captured\n");
    $TERM=1;
};

# thread
sub stuff_thr($) {
    my ($job)=@_;
    # My thread ID
    my $tid=threads->tid();
    # do some thread stuff
    print "Hi, I am thread: $tid, I need to do something with $job\n";
    # Detach and terminate
    {
        lock($DETACHING);
        threads->detach() if ! threads->is_detached();
    }
    return(0);
}

# main
sub main() {
    # Manage the thread pool until we run out of data or signalled
    # to terminate
    my @jobs=(1..100);
    while (@jobs && ! $TERM) {
        # Keep max threads running
        for (my $needed = $MAX_THREADS - threads->list(); $needed && ! $TERM; $needed--) {
            my $job = shift(@jobs);
            last if (! $job);
            # New thread
            threads->create('stuff_thr',$job);
        }
        # normally fetch a limited amount of data from a db to
        # process, at this point just make sure the job queue
        # is never empty
        if(scalar(@jobs) < 10) {
            @jobs=(1..100);
        }
    }
    while ((threads->list() > 0)) {
        # waiting for threads to finish
        sleep(1);
    }
}

# enter main()
main();

Re: threads: work crew memory leak
by zwon (Monsignor) on Oct 16, 2010 at 15:11 UTC
Re: threads: work crew memory leak
by BrowserUk (Pope) on Oct 16, 2010 at 15:52 UTC
    1. On Vista 64-bit / perl 5.10.1 / threads v1.76 / threads::shared v1.33 I see no signs of a memory leak.

      What versions of those modules do you have?

    2. Having run it for 100,000+ thread creation/deletion cycles, the memory usage wobbles a bit but stays pretty much fixed around the 35 MB mark. The occasional spikes just mean that, at the instant the value was taken, more threads had been created than destroyed in the previous second or two, a situation that corrects itself immediately.

      An app that creates 100,000 threads in 15 minutes, each to handle 1 number, is really badly designed. Like building a new train for every journey to work. Inefficient and unsustainable.

    3. All your shenanigans with a semaphore and detach and is_detached() are utterly redundant.

      Once the thread detaches itself (the last thing it does before returning), it ends. There is no possibility of attempting to "do it twice". And if you did attempt to detach the same thread twice, it does no harm at all.

      This code functions identically with more clarity:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use threads;
      use threads::shared;
      use Data::Dumper;
      $|=1;

      my $MAX_THREADS = 20;
      my $TERM :shared = 0;

      $SIG{'INT'} = $SIG{'TERM'} = sub {
          print("^C captured\n");
          $TERM=1;
      };

      sub stuff_thr($) {
          my ($job)=@_;
          my $tid=threads->tid();
          print "Hi, I am thread: $tid, I need to do something with $job\n";
      }

      sub main() {
          my @jobs = ( 1 .. 100 );
          while (@jobs && ! $TERM) {
              for( 1 .. $MAX_THREADS - threads->list() ) {
                  my $job = shift(@jobs);
                  last if (! $job);
                  threads->create('stuff_thr',$job)->detach;
              }
              @jobs = ( 1 .. 100 ) if @jobs < 10
          }
          sleep 1 while threads->list() > 0;
      }

      main();
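      One detail worth checking against the threads documentation before relying on either version: threads->list() counts only non-joined, non-detached threads, so threads that are detached right after creation are never counted against $MAX_THREADS. If the one-thread-per-job design is kept, a minimal sketch that bounds concurrency by joining finished threads instead of detaching them might look like this (the placeholder body of stuff_thr and the 10 ms polling interval are assumptions for illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(sleep);

my $MAX_THREADS = 20;
my @jobs        = ( 1 .. 100 );
my $done        = 0;

# Placeholder for the real per-job work (hypothetical body).
sub stuff_thr {
    my ($job) = @_;
    my $tid = threads->tid();
    print "Hi, I am thread: $tid, I need to do something with $job\n";
    return;
}

# Loop until every job has been started and every thread has been reaped.
while ( @jobs || threads->list() ) {
    # Join threads that have finished so their resources are released.
    for my $thr ( threads->list(threads::joinable) ) {
        $thr->join();
        $done++;
    }
    # threads->list() counts all non-joined, non-detached threads
    # (running or finished-but-unjoined), so this caps the live threads.
    if ( @jobs && threads->list() < $MAX_THREADS ) {
        threads->create( \&stuff_thr, shift @jobs );
    }
    else {
        sleep 0.01;    # brief pause instead of busy-waiting (assumed interval)
    }
}
print "Completed $done jobs\n";
```

      Joining keeps an exact count of live threads; the trade-off is that the main thread has to keep polling, which a reusable pool avoids entirely.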

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I am using perl v5.12.2, threads v1.75, threads::shared v1.32.

      My shenanigans with the semaphores and the detaching were a desperate attempt to plug the memory leak, but of course you're right, they were useless. As others replied this could be an underlying problem with pthread. For now I'll rewrite the code to reuse threads.
        I am using perl v5.12.2, threads v1.75, threads::shared v1.32.

        It might be worth your while trying threads v1.81 & threads::shared v1.34.
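        A quick way to confirm which versions are actually loaded after an upgrade; this small sketch only reads the modules' standard $VERSION variables and the interpreter's $^V:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use threads::shared;

# Print the interpreter and module versions that are actually loaded.
my $perl_version = sprintf '%vd', $^V;
print "perl v$perl_version, threads v$threads::VERSION, ",
    "threads::shared v$threads::shared::VERSION\n";
```

        To make a script refuse to run on older installs, a line such as threads->VERSION(1.81); can be added; the standard UNIVERSAL VERSION method dies if the loaded module is older than the requested version.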

        As others replied this could be an underlying problem with pthread.

        *nix/pthreads isn't something I know much about, but my gut feeling is that this is more likely to be a problem with (perl) memory allocation than with the underlying threading libraries.

        One thing that should be done, if this is going to get fixed, is to find a minimal test case that demonstrates the problem and raise a bug report. Something like:

        perl -Mthreads -wE"{sleep 1 while threads->list>20;async(sub{1})->detach;redo}"

        That's assuming it actually leaks on your system, because the memory usage is rock solid at 12.5 MB here.
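        A script form of such a test case, which reports memory use itself rather than relying on watching top, might look like the sketch below. It creates and joins trivial threads in batches and prints the process's resident set size as it goes. $CYCLES, $BATCH, $REPORT and rss_kb are hypothetical names, and reading RSS through ps -o rss= is an assumption about the platform's ps(1). Because it joins rather than detaches, it exercises a slightly different path from the question's code; if the leak only shows up with detached threads, the join step is the part to vary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

# Hypothetical test parameters.
my $CYCLES = 1_000;    # total threads to create
my $BATCH  = 20;       # threads alive at once
my $REPORT = 200;      # print memory use every this many threads

# Hypothetical helper: resident set size in KB via ps(1). Assumes a ps that
# understands "-o rss= -p PID" (FreeBSD and Linux do); undef on failure.
sub rss_kb {
    my $out = `ps -o rss= -p $$`;
    return ( defined $out && $out =~ /(\d+)/ ) ? $1 : undef;
}

my $created = 0;
while ( $created < $CYCLES ) {
    # Start a batch of trivial threads, then join them all so that each
    # one is completely finished and reclaimed before the next batch.
    my @batch = map { threads->create( sub { 1 } ) } 1 .. $BATCH;
    $_->join() for @batch;
    $created += $BATCH;

    if ( $created % $REPORT == 0 ) {
        my $rss = rss_kb();
        printf "%6d threads created, RSS %s KB\n",
            $created, defined $rss ? $rss : '?';
    }
}
```

        A steadily climbing RSS column across reports would point at a genuine leak; a figure that levels off after the first few batches would not.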

        For now I'll rewrite the code to reuse threads.

        Once you do, you'll never go back to using throw-away threads.
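        As a minimal sketch of that reuse approach, assuming the core Thread::Queue module is available: a fixed pool of $MAX_THREADS workers is created once and pulls jobs from a shared queue, and one undef per worker tells each one to exit. The names worker and $queue are illustrative, not from the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $MAX_THREADS = 20;
my $queue       = Thread::Queue->new();    # shared job queue (illustrative name)

# Worker (illustrative name): handles jobs until it dequeues the undef
# end-marker, then returns the number of jobs it processed.
sub worker {
    my $tid  = threads->tid();
    my $done = 0;
    while ( defined( my $job = $queue->dequeue() ) ) {
        print "Hi, I am thread: $tid, I need to do something with $job\n";
        $done++;
    }
    return $done;
}

# Create the pool once; the same threads are reused for every job.
my @pool = map { threads->create( { 'context' => 'scalar' }, \&worker ) }
    1 .. $MAX_THREADS;

# Queue the work, then one undef per worker so each one leaves its loop.
$queue->enqueue( 1 .. 100 );
$queue->enqueue( (undef) x $MAX_THREADS );

# Join rather than detach, so main knows when everything is finished.
my $processed = 0;
$processed += $_->join() for @pool;
print "Processed $processed jobs with $MAX_THREADS threads\n";
```

        Only 20 threads are ever created, however many jobs pass through the queue. The undef end-marker is used instead of newer Thread::Queue features so that the sketch should also work with the module versions that shipped around perl 5.12.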


      An app that creates 100,000 threads in 15 minutes, each to handle 1 number, is really badly designed. Like building a new train for every journey to work. Inefficient and unsustainable.
      In general (and in the specific case) I agree with you; however, threads are a lot less useful on today's platforms due to their high overhead.

      The idea of creating a separate thread for each iteration of a non-progressive for loop (one where no iteration depends on the previous one), so that each calculation on independent terms runs on its own, is horribly limited by the high overhead of threads.

      Conceivably, EVEN a single addition of 100,000 variables on a 100,000 processor system could have a 100,000X speedup if thread creation overhead was 0.

      Such a large gap between theory and practice...*sigh*....

        Conceivably, EVEN a single addition of 100,000 variables on a 100,000 processor system could have a 100,000X speedup if thread creation overhead was 0.

        Sorry, but that's just very naive.

        1. Firstly, even in C or assembler, the creation overhead can never be 0.

          The creation of a (kernel) thread requires, at minimum:

          • the allocation of a stack segment.
          • the allocation of a register set save area.
          • the allocation of a thread 'context' structure to hold stuff like priorities, permission etc.
          • linking that context into the scheduler dispatch queues and other control structures.

          And each of those requires a transition from ring 3 user space to ring 0 kernel space, which costs about 800 clock cycles on its own.

        2. Each time a thread runs, it requires that:
          • the current register contents be saved to the current thread's context structure;
          • the new thread's saved register set be loaded into the registers;
          • the processor pipelines be flushed;
          • the scheduler queues and tables be updated;
          • and, almost inevitably, some L1/L2/L3 cache lines be flushed to RAM and reloaded.

          All of those will mean hundreds if not thousands of cycles overhead.
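        Putting rough numbers on those costs: assume, as stated above, about 800 cycles for each ring 3 to ring 0 transition and at least one such transition per thread creation; assume about 1 cycle per addition (the figure used in the next paragraph); and assume the 100,000 threads are spawned one after another by a single parent. Under those assumptions, spawning alone outweighs the serial loop by a wide margin:

```latex
\begin{aligned}
T_{\text{spawn}} &\ge 10^{5} \times 800 = 8 \times 10^{7}\ \text{cycles} \\
T_{\text{loop}}  &\approx 10^{5} \times 1 = 10^{5}\ \text{cycles} \\
\frac{T_{\text{spawn}}}{T_{\text{loop}}} &\ge 800
\end{aligned}
```

        This is only a floor: it ignores stack allocation, scheduler bookkeeping and the context-switch costs listed above.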

        A single addition of 2 numbers happens in 1 clock cycle. Spawning a new thread for each addition of 100,000 numbers, even if you had 100,000 cores on your processor (which isn't going to happen any time in the next 10 years, if at all), will take far longer than just looping over all 100,000 on a single thread. And that's in C, never mind an interpreted language like Perl.
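        The same effect is easy to observe from Perl. This sketch, with a deliberately small and hypothetical $N, sums the numbers once in a plain loop and once by creating and joining one thread per number. It creates the threads one at a time, so it measures creation overhead rather than parallelism; absolute timings will vary by machine.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Time::HiRes qw(time);

my $N    = 200;                 # kept small: thread creation is slow
my @nums = ( 1 .. $N );

# Plain loop on the main thread.
my $t0        = time();
my $loop_sum  = 0;
$loop_sum += $_ for @nums;
my $loop_secs = time() - $t0;

# One thread per number, created and joined one at a time, so the time
# measured is almost all thread creation, interpreter cloning and teardown.
$t0 = time();
my $thread_sum = 0;
for my $n (@nums) {
    $thread_sum += threads->create( { 'context' => 'scalar' }, sub { $n } )->join();
}
my $thread_secs = time() - $t0;

printf "loop:   sum %d in %.6f s\n", $loop_sum,   $loop_secs;
printf "thread: sum %d in %.6f s\n", $thread_sum, $thread_secs;
```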

        However, if we get back to the real world and consider processing those 100,000 additions on a 4-, 16- or 64-core system, then by starting 1 thread per core and having each operate on 100,000/cores values, you can quite likely achieve real (and realistic) economies.
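        A sketch of that one-thread-per-core approach in Perl, assuming 4 cores (the $CORES value is an assumption; set it to the real core count): each thread sums one contiguous slice of the data and the main thread adds up the partial sums. Note that Perl's ithreads clone the interpreter, so each thread gets its own copy of @data; for arithmetic this cheap, the copying may well cost more than the parallelism saves, which is the same overhead argument in Perl form.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use threads;

my $N     = 100_000;
my $CORES = 4;                                      # assumption: set to the real core count
my @data  = ( 1 .. $N );
my $chunk = int( ( $N + $CORES - 1 ) / $CORES );    # values per thread

my @workers;
for my $w ( 0 .. $CORES - 1 ) {
    my $lo = $w * $chunk;
    my $hi = $lo + $chunk - 1;
    $hi = $N - 1 if $hi > $N - 1;
    last if $lo > $hi;

    # Each thread sums its own contiguous slice of @data. ithreads clone
    # the interpreter, so every thread works on its own copy of @data.
    push @workers, threads->create(
        { 'context' => 'scalar' },
        sub {
            my $sum = 0;
            $sum += $data[$_] for $lo .. $hi;
            return $sum;
        }
    );
}

# Combine the partial sums in the main thread.
my $total = 0;
$total += $_->join() for @workers;
print "Total: $total\n";    # 1 + 2 + ... + 100000 = 5000050000
```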

        Of course, the best approach today and in the immediate future would be to load the data onto a Tesla GPU and let its 448 cores loose on it in parallel.

        threads are a lot less useful on today's platforms due to their high overhead.

        I'd love to know which platforms you're talking about, where threading was cheaper.

        The only ones I'm aware of that might qualify are things like Java 1.1's 'green' threads. Whilst they were cheap to spawn (they run entirely in user space, so they avoid ring-level transitions), they were practically useless inasmuch as they only emulate true multitasking by embedding a scheduler within the VM. As such they don't scale across cores: while any one green thread is running, all the others are stopped, so there is no performance gain for CPU-bound processing.



Node Type: perlquestion [id://865671]
Approved by ww