
Parallel::ForkManager run_on_finish() bug?

by bot403 (Beadle)
on May 03, 2012 at 20:08 UTC (#968806=perlquestion)
bot403 has asked for the wisdom of the Perl Monks concerning the following question:

Using Parallel::ForkManager, I had a piece of code with a run_on_finish() callback that was never executed when I ran with the maximum number of processes set higher than 0. If max_procs was 0 (disabling fork() and using some emulation), then the run_on_finish() callback ran just fine.

After much head scratching and source reading, it looks like P::FM doesn't install any signal handler to catch $SIG{CHLD} and thus run the run_on_finish() callback in the parent. It does call waitpid() sometimes, but really only when it starts a new process (the start() method).

I believe that P::FM is losing its children to the default $SIG{CHLD} handler. However, the docs don't state that anything special is necessary to use run_on_finish().

The code below fixed my issue.

my $pm = Parallel::ForkManager->new(2);
$SIG{CHLD} = sub{ Parallel::ForkManager::wait_children($pm) };
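
For comparison, here is a minimal sketch of the same idea using only core Perl, with no P::FM involved. It is not P::FM's code; the names %exit_code and %label_for are made up for illustration. The handler reaps every child that has already exited, because a single SIGCHLD can stand for more than one exit, and the parent never calls waitpid() outside the handler.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %exit_code;    # pid => exit status, filled in by the signal handler

# Reap every child that has already exited; signals can be coalesced,
# so one SIGCHLD may stand for several finished children.
$SIG{CHLD} = sub {
    local ($!, $?);    # do not clobber the interrupted code's error state
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exit_code{$pid} = $? >> 8;
    }
};

my %label_for;    # pid => job label (hypothetical bookkeeping)
for my $n (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                              # child
        select(undef, undef, undef, 0.2 * $n);    # pretend to work
        exit $n;
    }
    $label_for{$pid} = "job$n";
}

# The parent never calls waitpid() here. sleep() returns early whenever
# a signal arrives, so the loop re-checks promptly after each exit.
sleep 1 while keys(%exit_code) < 3;

for my $pid (sort { $exit_code{$a} <=> $exit_code{$b} } keys %exit_code) {
    print "$label_for{$pid} exited with code $exit_code{$pid}\n";
}
```

One thing to keep in mind with this approach is that the handler interrupts calls such as sleep() in the parent, so any code that relies on a full sleep interval needs to cope with early returns.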

Shouldn't P::FM install a signal handler for SIGCHLD? Also, how does the out-of-the-box example (in the source) work at all without a signal handler?

I am not (to my knowledge) installing signal handlers or otherwise calling fork besides the odd backtick.


EDIT: Problem reproduction. The code works with max_procs set to 0 but not with any higher value, because run_on_finish() is never called.

In this code I'm not doing any crazy process management. I do manage my workload, but per the docs I expect run_on_finish() to be called when a process finishes. The key point here is that run_on_finish() generates new work. If I call 'ps' I see that the children are getting cleaned up by the system, but run_on_finish() was never called.

#!/usr/bin/env perl -w
use strict;
use Parallel::ForkManager;
use 5.012;

my $max_procs = 5;
my @name_queue = qw( bot403 );
my @future_names = qw( Fred Jim Lily Steve Jessica Bob Dave Christine Rico Sara );

my $pm = new Parallel::ForkManager($max_procs);

# Track children in progress
my %working;

# Setup a callback for when a child finishes up so we can
# get its exit code
$pm->run_on_finish(
    sub {
        my ($pid, $exit_code, $ident) = @_;
        print "** $ident just got out of the pool ".
              "with PID $pid and exit code: $exit_code\n";
        my $new = shift @future_names;
        # Done with this work unit.
        shift @name_queue;
        # Get future work to do (oversimplification of my real problem)
        push @name_queue, $new if $new;
        # Mark this entry as no longer in progress
        delete $working{$ident};
    }
);

$pm->run_on_start(
    sub {
        my ($pid,$ident)=@_;
        print "** $ident started, pid: $pid\n";
    }
);

$pm->run_on_wait(
    sub {
        print "** Have to wait for one children ...\n"
    },
    10
);

while( @name_queue || scalar keys %working ){
    # Screen out entries in progress so we don't duplicate work.
    my @to_process = grep { ! $working{$_} } @name_queue;
    if( !@to_process ){
        say "Waiting for children to finish to find new names";
        sleep 10;
        next;
    }
    foreach my $child (@to_process){
        $working{$child} = 1;
        $pm->start($child) and next;

        # This code is the child process
        say "This is $child";
        sleep 3+rand(5);
        $pm->finish(0); # pass an exit code to finish
    }
}

print "Waiting for Children...\n";
$pm->wait_all_children;
print "Everybody is out of the pool!\n";

Replies are listed 'Best First'.
Re: Parallel::ForkManager run_on_finish() bug?
by runrig (Abbot) on May 03, 2012 at 20:49 UTC

    Is this on Windows? There shouldn't be any $SIG{CHLD} handler necessary, as P::FM uses waitpid to determine when a process ends and whether to call run_on_finish(). I'm not sure if that would work on Windows.

    Or is there some other code calling wait() or waitpid()?

    Are you calling wait_all_children() after you're done?
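
To illustrate what waitpid-based reaping implies, here is a rough sketch in core Perl only. It is not P::FM's implementation, and reap_finished() and $on_finish are hypothetical names. The point it tries to show is that the callback cannot fire until the parent itself gets around to polling, no matter how long ago the child exited.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

$SIG{CHLD} = 'DEFAULT';    # make sure exited children stay waitable

my @log;                   # messages recorded by the callback

# Hypothetical stand-in for a run_on_finish() callback.
my $on_finish = sub {
    my ($pid, $exit_code) = @_;
    push @log, "pid $pid exited with $exit_code";
};

# Non-blocking poll: reap whatever has already exited and fire the
# callback for each one. Returns the number of children reaped.
sub reap_finished {
    my $reaped = 0;
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $on_finish->($pid, $? >> 8);
        $reaped++;
    }
    return $reaped;
}

my $pid = fork();
die "fork failed: $!" unless defined $pid;
exit 7 if $pid == 0;       # child: exit immediately

sleep 1;                   # the child has had plenty of time to exit...
my $before = scalar @log;  # ...but nobody has polled, so no callback yet

# Poll until the child has been reaped (normally on the first try).
select(undef, undef, undef, 0.1) until reap_finished();
my $after = scalar @log;

print "callbacks before poll: $before, after poll: $after\n";
print "$_\n" for @log;
```

If the parent only polls from inside something like start() or wait_all_children(), a loop that does neither will never see its callbacks run.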

      This is on AIX. I do call wait_all_children() at the end, but because of the way I'm performing work and receiving new work, I'm waiting neither in $pm->start() nor in wait_all_children(), so waitpid() in P::FM isn't called. I don't have new work for my classic for() loop until children finish and run_on_finish() is called. Basically, run_on_finish() is my only way to get new work.

      My code does not call wait() or waitpid(). I don't think I can post my code but let me see if I can get a reproducible small test case.

Re: Parallel::ForkManager run_on_finish() bug?
by bot403 (Beadle) on May 04, 2012 at 13:40 UTC

    After more head-scratching, I suppose my code is asking too much. I expect run_on_finish() to act asynchronously like the signal, but it's becoming obvious that P::FM relies on being inside P::FM::start() or P::FM::wait_all_children() to execute its callbacks.

    I would settle for a documentation update that clarifies that run_on_finish() is not asynchronous unless you link it to $SIG{CHLD}. An optional flag to run_on_finish() that installs the $SIG{CHLD} handler for you would be nice, though.

      wait_one_child is an undocumented method of P::FM, but it seems like that would be useful in your case. An attempt at pseudo-code:

      while ($stuff_to_do) {
          for my $job (@job_list) {
              $pm->start($job) and next;
              $pm->finish();
          }
          if ($still_doing_stuff) {
              # run_on_finish might push more stuff to @job_list
              $pm->wait_one_child();
          }
      }
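
The same loop shape can be tried out without P::FM at all. The following is only a sketch in core Perl under simplifying assumptions: on_finish(), @queue, @future and %running are made-up names, and a blocking waitpid() stands in for "wait for one child". When nothing is left to start, the parent blocks until a child exits, then runs the callback, which may release more work.

```perl
use strict;
use warnings;

$SIG{CHLD} = 'DEFAULT';    # make sure exited children stay waitable

my $max_procs = 2;
my @queue  = qw(bot403);           # work that is ready to start now
my @future = qw(Fred Jim Lily);    # work released as children finish
my %running;                       # pid => name of the job in that child
my @done;                          # names in the order they finished

# Hypothetical finish callback: record the result and release new work.
sub on_finish {
    my ($pid, $exit_code, $name) = @_;
    push @done, $name;
    my $next = shift @future;
    push @queue, $next if defined $next;
}

while (@queue || %running) {
    # Start as much ready work as the process limit allows.
    while (@queue && keys(%running) < $max_procs) {
        my $name = shift @queue;
        my $pid  = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {                          # child
            select(undef, undef, undef, 0.1);     # pretend to work
            exit 0;
        }
        $running{$pid} = $name;
    }

    # Nothing more to start right now: block until one child exits,
    # then run the callback, which may push more work onto @queue.
    my $pid = waitpid(-1, 0);
    last if $pid <= 0;
    on_finish($pid, $? >> 8, delete $running{$pid});
}

print "finished: @done\n";
```

Because each finished child releases exactly one new name in this toy setup, the jobs end up running one after another. With a longer initial queue, the inner loop would keep up to $max_procs children busy.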
        How would you use it in the example code given by anshumangoyal above?
        I have the same problem on perl 5.18 and P:FM 1.03.
        I have the same code running on production for years on 5.10 and P:FM 0.7.5 without any issues.
