Parallel ForkManager error with run_on_wait()

by sojourn548 (Acolyte)
on May 16, 2013 at 07:37 UTC (#1033780=perlquestion)
sojourn548 has asked for the wisdom of the Perl Monks concerning the following question:

When using Parallel ForkManager 0.7.5 (and the latest version 1.03) with run_on_wait(), I am seeing this error before the script terminates abnormally:

Use of uninitialized value in block exit at /lib/site_perl/5.8.7/Parallel/ line 365.
Use of uninitialized value in block exit at /lib/site_perl/5.8.7/Parallel/ line 365.
Unable to create sub named "" at /lib/site_perl/5.8.7/Parallel/ForkMan line 365.
Good-Bye...

This is a constantly running process that fails with the same message about once every 10 days, and the failures began after I updated the script to use run_on_wait(). run_on_wait() is used to call a routine every second; that routine keeps track of the forked processes and sends a TERM signal to any child process that has been running longer than 4 seconds. I am unsure how to go about debugging this, since the error occurs so rarely. I appreciate you taking the time to review this, and thanks in advance.

our %procs;
use constant LIMIT => 4;    # seconds a child may run before it is sent TERM

$pm->run_on_wait(\&term_process, 1);

$pm->run_on_finish(
    sub {
        my ($pid, $exit_code, $ident) = @_;
        my ($check_id, $host) = $ident =~ /^(.*?) on (.*)/s;
        print("run_on_finish: $ident (pid: $pid) exited with code: [$exit_code] host: [$host]\n");
        delete $procs{$pid};
        print("proc_mgmt: $ident: deleting (pid: $pid) from list\n");
    }
);

$pm->run_on_start(
    sub {
        my ($pid, $ident) = @_;
        print("** $ident started, pid: $pid\n");
        $procs{$pid} = time();    # remember when this child started
    }
);

# Called by run_on_wait: terminate any child that has exceeded LIMIT seconds.
sub term_process {
    my $debug_time;
    my $total_time;
    while (my ($pid, $started_at) = each %procs) {
        next unless time() - $started_at > LIMIT;
        $debug_time = time();
        $total_time = $debug_time - $started_at;
        print("[$pid] hung. time now: [$debug_time] - [$started_at] = [$total_time] sending KILL.");
        kill TERM => $pid;
        delete $procs{$pid};
    }
}

Looking at the ForkManager source around line 365:

357 sub run_on_wait { my ($s,$code, $period)=@_;
358   $s->{on_wait}=$code;
359   $s->{on_wait_period} = $period;
360 }
361
362 sub on_wait { my ($s)=@_;
363   if(ref($s->{on_wait}) eq 'CODE') {
364     $s->{on_wait}->();
365     if (defined $s->{on_wait_period}) {
366       local $SIG{CHLD} = sub { } if ! defined $SIG{CHLD};
367       select undef, undef, undef, $s->{on_wait_period}
368     };
369   };
370 };
Re: Parallel ForkManager error with run_on_wait() (in 10 year old version)
by Anonymous Monk on May 16, 2013 at 08:57 UTC

    When using Parallel ForkManager 0.7.5 ... and maybe I discovered a bug?

    Try the latest: cpan SZABGAB/Parallel-ForkManager-1.03.tar.gz

    0.7.5 is over 10 years old

      That was one of my suspicions, so before I posted on PerlMonks, I downloaded and viewed the source for Parallel-ForkManager-1.03. The subs run_on_wait() and on_wait() haven't changed, and I did not see any bug reports or changes related to the errors that I was seeing. So I decided to post here to see if it was something that I was doing incorrectly.

        I downloaded and viewed the source

        How about you try running the code?

Re: Parallel ForkManager error with run_on_wait()
by Anonymous Monk on Dec 12, 2013 at 10:29 UTC

    After having a look at the code and testing further, I think that the problem is the local scope of $SIG{CHLD} in ForkManager's sub on_wait. Evidently the empty subroutine is installed so that an arriving SIGCHLD yanks the process free of the following sleep (the select statement); otherwise IGNORE/undef would do just fine. When the process leaves sub on_wait, $SIG{CHLD} is reset to undef (the default disposition, under which SIGCHLD is ignored). However, if more signals hit the process at just the wrong moment (while it is exiting the sub and resetting $SIG{CHLD}), the error is triggered. I can reproduce it outside ForkManager, so it is nothing specific to ForkManager.
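The install-then-restore behaviour described above can be observed without ForkManager. The following is only a minimal sketch: `handler_kind_inside_guard` and `observe_unset_case` are hypothetical names, the stand-in omits the select() sleep, and it does not attempt to reproduce the race itself, only the scoping that opens the window for it.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the guarded statement in on_wait; it omits the
# select() sleep and needs no ForkManager. Returns ref($SIG{CHLD}) as seen
# while the sub body is running.
sub handler_kind_inside_guard {
    local $SIG{CHLD} = sub { } if !defined $SIG{CHLD};
    return ref $SIG{CHLD};
}

# Runs the stand-in starting from an unset handler and reports what
# $SIG{CHLD} held inside the sub and after it returned.
sub observe_unset_case {
    local $SIG{CHLD};                  # start from "no handler set"
    my $inside = handler_kind_inside_guard();
    my $after  = ref $SIG{CHLD};       # the temporary handler is gone again
    return ($inside, $after);
}

my ($inside, $after) = observe_unset_case();
printf "inside: %s, after: %s\n", $inside || 'none', $after || 'none';
```

Each call to on_wait with a period set repeats this cycle, so a long-running process passes through the restore step a very large number of times.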

    This error should only occur in ForkManager if the user has not set $SIG{CHLD}.

    I see two possible solutions: either remove the "local" clause on $SIG{CHLD} in ForkManager's sub on_wait, or simply set the following before you call ForkManager:

    $SIG{CHLD} = sub { };
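Why the one-line workaround helps can be checked in isolation. This is a sketch under assumptions: `handler_inside_guard` is a hypothetical stand-in for the guarded statement in on_wait (without the sleep), and `$noop` is just an illustrative name for the user's handler. With a handler already installed, the `if ! defined $SIG{CHLD}` guard is false, so nothing is localized and nothing has to be restored when the sub exits.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the guarded statement in on_wait (no sleep).
# Returns the handler that is active while the sub body runs.
sub handler_inside_guard {
    local $SIG{CHLD} = sub { } if !defined $SIG{CHLD};
    return $SIG{CHLD};
}

# The workaround: install a no-op handler once, before any forking starts.
my $noop = sub { };
$SIG{CHLD} = $noop;

my $seen_inside = handler_inside_guard();
my $seen_after  = $SIG{CHLD};

# If both are the user's own handler, the guard did not install a new one.
my $skipped = ($seen_inside == $noop && $seen_after == $noop);
print "guard skipped: ", ($skipped ? "yes" : "no"), "\n";
```

In a real script the assignment would sit near the top, before the ForkManager object is created, so that every later call to on_wait sees it.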

    Best regards, /Bjarne