PerlMonks  

Parallel::ForkManager run_on_finish() bug?

by bot403 (Beadle)
on May 03, 2012 at 20:08 UTC
bot403 has asked for the wisdom of the Perl Monks concerning the following question:

Using Parallel::ForkManager, I had a piece of code with a run_on_finish() callback that was never executed when I set the maximum number of processes higher than 0. If max_procs was 0 (which disables fork() and uses some emulation), then the run_on_finish() callback ran just fine.

After much head-scratching and source reading, it looks like P::FM doesn't install any signal handler to actually catch $SIG{CHLD} and thus run the run_on_finish() callback in the parent. It does call waitpid() sometimes, but really only when it starts a new process (the start() method).
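To illustrate the mechanism described above without P::FM, here is a core-Perl sketch. A finished child's exit status stays pending until the parent calls waitpid(), so any "on finish" hook can only run at that point, no matter how long ago the child exited. The `reap_one` helper and the `$on_finish` callback are hypothetical stand-ins for what P::FM does inside start() and wait_all_children(), not its actual code.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

$SIG{CHLD} = 'DEFAULT';    # keep exited children waitable (zombies)

# Hypothetical stand-in for a run_on_finish() callback.
my @finished;              # one [pid, exit code] entry per reaped child
my $on_finish = sub {
    my ($pid, $exit_code) = @_;
    push @finished, [ $pid, $exit_code ];
};

# Blocking reap of one child: the only place the callback can run.
sub reap_one {
    my $pid = waitpid(-1, 0);
    $on_finish->($pid, $? >> 8) if $pid > 0;
}

my $pid = fork();
die "fork failed: $!" unless defined $pid;
exit 3 if $pid == 0;       # the child finishes immediately

sleep 1;                   # the child is gone, but nobody has reaped it yet
my $before = scalar @finished;
print "callbacks before reaping: $before\n";               # prints 0

reap_one();
print "callbacks after reaping: ", scalar(@finished), "\n"; # prints 1
```

The parent could sleep for an hour and the count would still be 0; only the explicit waitpid() call moves it.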

I believe that P::FM is losing its children to the default $SIG{CHLD} handler. However, the docs don't state that anything special is necessary to use run_on_finish().

The code below fixed my issue.

    my $pm = Parallel::ForkManager->new(2);
    $SIG{CHLD} = sub { Parallel::ForkManager::wait_children($pm) };
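For reference, here is roughly what a self-installed $SIG{CHLD} handler amounts to, sketched in core Perl without P::FM. On each SIGCHLD the handler reaps every child that has already exited, calling waitpid() with WNOHANG in a loop because several exits can be coalesced into a single signal. As far as I can tell from the module source, P::FM's wait_children() performs a similar WNOHANG loop and fires run_on_finish() for each reaped child, which would explain why the snippet above works. The `%exit_code_of` hash and the three worker children are made up for the illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use POSIX ":sys_wait_h";    # WNOHANG

my %exit_code_of;           # pid => exit code, filled in by the handler

# Reap every child that has already exited; one SIGCHLD may stand for
# several exits, so loop until waitpid() reports nothing more to collect.
$SIG{CHLD} = sub {
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exit_code_of{$pid} = $? >> 8;
    }
};

my @kids;
for my $n (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        sleep $n;           # pretend to work
        exit $n;            # the exit code identifies the worker
    }
    push @kids, $pid;
}

# The parent is free to do other things; here it just idles.
# sleep() returns early when a signal arrives, so re-check in a loop.
sleep 1 while scalar(keys %exit_code_of) < scalar(@kids);

printf "child %d exited with %d\n", $_, $exit_code_of{$_} for @kids;
```

Since Perl 5.8, signals are deferred until a safe point between opcodes, so bookkeeping inside the handler is less hazardous than in C, though keeping the handler short is still a reasonable habit.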

Shouldn't P::FM install a signal handler for SIGCHLD? Also, how does the out-of-the-box example (callback.pl in the source distribution) work at all without a signal handler?

I am not (to my knowledge) installing signal handlers or otherwise calling fork besides the odd backtick.

Thoughts?

EDIT: Problem reproduction. The code below works with max_procs set to 0, but not with any value higher than 0, because run_on_finish() is never called.

In this code I'm not doing any crazy process management. I do manage my workload, but per the docs I expect run_on_finish() to be called when a process finishes. The key point here is that run_on_finish() generates new work. If I run ps, I can see that the children are getting cleaned up by the system, but run_on_finish() is never called.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use 5.012;
    use Parallel::ForkManager;

    my $max_procs    = 5;
    my @name_queue   = qw( bot403 );
    my @future_names = qw( Fred Jim Lily Steve Jessica Bob Dave Christine Rico Sara );

    my $pm = Parallel::ForkManager->new($max_procs);

    # Track children in progress
    my %working;

    # Set up a callback for when a child finishes up so we can
    # get its exit code
    $pm->run_on_finish(
        sub {
            my ($pid, $exit_code, $ident) = @_;
            print "** $ident just got out of the pool "
                . "with PID $pid and exit code: $exit_code\n";

            my $new = shift @future_names;

            # Done with this work unit.
            shift @name_queue;

            # Get future work to do (oversimplification of my real problem)
            push @name_queue, $new if $new;

            # Mark this entry as no longer in progress
            delete $working{$ident};
        }
    );

    $pm->run_on_start(
        sub {
            my ($pid, $ident) = @_;
            print "** $ident started, pid: $pid\n";
        }
    );

    $pm->run_on_wait(
        sub { print "** Have to wait for one child ...\n" },
        10
    );

    while (@name_queue || scalar keys %working) {
        # Screen out entries in progress so we don't duplicate work.
        my @to_process = grep { !$working{$_} } @name_queue;

        if (!@to_process) {
            say "Waiting for children to finish to find new names";
            sleep 10;
            next;
        }

        foreach my $child (@to_process) {
            $working{$child} = 1;
            $pm->start($child) and next;

            # This code is the child process
            say "This is $child";
            sleep 3 + rand(5);
            $pm->finish(0);    # pass an exit code to finish
        }
    }

    print "Waiting for Children...\n";
    $pm->wait_all_children;
    print "Everybody is out of the pool!\n";

Re: Parallel::ForkManager run_on_finish() bug?
by runrig (Abbot) on May 03, 2012 at 20:49 UTC

    Is this on Windows? There shouldn't be any $SIG{CHLD} handler necessary, as P::FM uses waitpid to determine when a process ends and whether to call run_on_finish(). I'm not sure if that would work on Windows.

    Or is there some other code calling wait() or waitpid()?

    Are you calling wait_all_children() after you're done?

      This is on AIX. I do call wait_all_children() at the end, but because of the way I'm performing work and receiving new work, the parent is inside neither $pm->start nor wait_all_children() while it waits, so P::FM never calls waitpid(). I don't have new work for my classic for() loop until children finish and run_on_finish() is called. Basically, run_on_finish() is my only way to get new work.

      My code does not call wait() or waitpid(). I don't think I can post my code but let me see if I can get a reproducible small test case.

Re: Parallel::ForkManager run_on_finish() bug?
by bot403 (Beadle) on May 04, 2012 at 13:40 UTC

    After more head-scratching, I suppose my code is asking too much. I expected run_on_finish() to act asynchronously, like a signal handler, but it's becoming obvious that P::FM relies on being inside P::FM::start() or P::FM::wait_all_children() to execute its callbacks.

    I would settle for a documentation update clarifying that run_on_finish() is not asynchronous unless you hook it up to $SIG{CHLD}. An optional flag to run_on_finish() that installs the $SIG{CHLD} handler for you would be nice, though.

      wait_one_child is an undocumented method of P::FM, but it seems like it would be useful in your case. An attempt at pseudo-code:

      while ($stuff_to_do) {
          # Consume the queue so the same job is never started twice
          while (my $job = shift @job_list) {
              $pm->start($job) and next;
              # ... child does the work for $job ...
              $pm->finish();
          }
          if ($still_doing_stuff) {
              # run_on_finish might push more stuff to @job_list
              $pm->wait_one_child();
          }
      }
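The same loop can be sketched in plain core Perl, which may make the control flow easier to follow: the parent starts whatever work is queued (up to a limit), then blocks in waitpid() for one child, and the "on finish" step is where new work gets queued. The names `@queue`, `@future`, `%running`, `on_finish` and `$max_procs` are invented for this sketch; with P::FM the blocking wait would be `$pm->wait_one_child()` and the finish step would live in run_on_finish().

```perl
#!/usr/bin/env perl
use strict;
use warnings;

$SIG{CHLD} = 'DEFAULT';    # keep exited children waitable

my $max_procs = 2;
my @queue  = qw(bot403);                 # work ready to start
my @future = qw(Fred Jim Lily Steve);    # work revealed as jobs finish
my %running;                             # pid => job name
my @done;                                # job names in completion order

# Runs in the parent after each child is reaped (the run_on_finish() role).
sub on_finish {
    my ($pid, $exit_code) = @_;
    push @done, delete $running{$pid};
    # Each finished job reveals one more piece of work.
    push @queue, shift @future if @future;
}

while (@queue || %running) {
    # Start as much queued work as the process limit allows.
    while (@queue && keys(%running) < $max_procs) {
        my $name = shift @queue;
        my $pid  = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            select(undef, undef, undef, 0.2);    # pretend to work
            exit 0;
        }
        $running{$pid} = $name;
    }
    # Block until some child exits, then run the finish step.
    my $pid = waitpid(-1, 0);
    last if $pid <= 0;    # no children left to wait for
    on_finish($pid, $? >> 8);
}

print "finished: @done\n";
```

Because the parent blocks in waitpid() instead of sleep(), the finish step is guaranteed to run as soon as a child exits, so new work is never stranded.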
        How would you use it in the example code given by anshumangoyal above?
        I have the same problem on Perl 5.18 and P::FM 1.03.
        I have had the same code running in production for years on Perl 5.10 and P::FM 0.7.5 without any issues.

Node type: perlquestion [id://968806]. Approved by Corion, front-paged by mwp.