#===================================
sub _REAPER {
#===================================
    my $params = shift;
    foreach my $pid ( keys %Children ) {
        my $res = waitpid( $pid, WNOHANG );
        if ( $res > 0 ) {
            $Children{$pid} = 0;
            die "Error in child" if $?;
        }
    }
    $SIG{'CHLD'} = \&_REAPER;
}
Note: in the reaper, I set $Children{$pid} = 0 instead of deleting the key, because deleting it was causing "panic: freed scalar" errors. I now clean up the %Children hash in the main loop of the parent.
The error I'm seeing occurs at the stage in the parent when I'm reading the 5,000 records from the source.
thanks
Clint
At the suggestion of moritz, I ran the script with strace, the relevant bits of which are as follows:
Here is where the parent child makes the request:
At this stage, my code catches the "select failed: no child processes" error in an eval, issues a warning, then sleeps before retrying:
I'm not sure what most of this means, but is the value of $! being set to "no child processes" by one of my waitpid calls, which is interfering with the code in LWP::Protocol::http? Would it help if I localised $! in my reaper sub?
select(8, [3], NULL, NULL, {172, 0}) = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn() = ? (mask now [])
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
waitpid(14232, 0xbfb45be8, WNOHANG) = 0
waitpid(14233, 0xbfb45be8, WNOHANG) = 0
waitpid(14225, 0xbfb45be8, WNOHANG) = -1 ECHILD (No child processes)
...
My interpretation of this would be (as you already figured) that $!
is being modified in the signal handler before the interrupted select call
gets a chance to be restarted, i.e. the redo SELECT doesn't execute
because of that very modification of $!.
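To make that concrete, here is a minimal sketch of the clobbering, not taken from the original script. The helper names clobbering_reaper and careful_reaper are made up for illustration, and assigning EINTR to $! only simulates an interrupted select. It re-waits on a child that has already been reaped, which is the situation the strace output shows for 14225:

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Errno qw(EINTR ECHILD);

# Hypothetical stand-ins for a reaper that polls a PID which is no longer
# a waitable child: one leaves errno modified, one restores it on return.
sub clobbering_reaper { waitpid( $_[0], WNOHANG ) }
sub careful_reaper    { local $!; waitpid( $_[0], WNOHANG ) }

# Fork a child and reap it fully, so a second waitpid on it must fail.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
exit 0 if $pid == 0;
waitpid( $pid, 0 );

$! = EINTR;                  # pretend select() was just interrupted
clobbering_reaper($pid);
my $after_plain = $! + 0;    # errno as the interrupted code would see it

$! = EINTR;
careful_reaper($pid);
my $after_local = $! + 0;    # errno after a handler that localizes $!

printf "plain handler left ECHILD: %s\n", $after_plain == ECHILD ? "yes" : "no";
printf "local handler kept EINTR:  %s\n", $after_local == EINTR  ? "yes" : "no";
```

If the first line reports "yes", the code that checks $!{EINTR} after the handler has run would see ECHILD instead and give up rather than retry.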
(Note that because of Perl's deferred (aka safe) signal handling,
the sigreturn() (which is called at the end of the "real" system/C-level
signal handler) happens immediately, before the Perl signal handler
runs all the waitpid calls. Still, they do run before the
next Perl opcode executes, which presumably means before if ($!{EINTR} || $!{EAGAIN}) is evaluated.)
What I find a little surprising is that the ECHILD does occur at
all, because your $Children{$pid} should've been set to zero in
the previous call to the signal handler
waitpid(14225, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 14225
where the waitpid did return 14225 (i.e. $res > 0). In other words, you shouldn't be
calling waitpid(14225,...) again thereafter, because 14225 is no longer supposed to be in the hash... (Update: err, wait, this is nonsense of course, as you're iterating over the keys, not the values. OTOH, this raises the question of what would happen if you set the values to the PIDs, too, and then iterated over the values instead, as you seem to be getting that panic when deleting the keys...)
Maybe you could try to figure out why this is — in addition
to trying to localize $! as a workaround, of course.
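For the workaround, a sketch along these lines might do, assuming the reaper otherwise stays as posted. It localizes $? as well as $!, since waitpid overwrites both, and it leaves out the unused $params and the handler re-installation purely to keep the example short:

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %Children;    # pid => 1 while running, 0 once reaped (as in the original post)

sub _REAPER {
    # Save errno and the child status; both are restored when the handler
    # returns, so the interrupted code still sees its own $! (e.g. EINTR).
    local ( $!, $? );
    foreach my $pid ( keys %Children ) {
        my $res = waitpid( $pid, WNOHANG );
        if ( $res > 0 ) {
            $Children{$pid} = 0;
            die "Error in child" if $?;
        }
    }
}
$SIG{CHLD} = \&_REAPER;
```

Inside the handler, $? still carries the status of the child just reaped, so the die check behaves as before; only the values seen by the interrupted code are protected.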
local'ising $! seems to have sorted out that issue, revealing the real error that is happening on the remote process.
Re your other point: yes, deleting keys in the hash causes a panic, but I'll change the loop to only call waitpid on those keys that have true values, which should help.
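Roughly what I have in mind, as an untested-in-production sketch: pending_pids and prune_reaped are names I've made up here, and the hash is passed by reference only to keep the example self-contained. The reaper would loop over pending_pids, and the parent's main loop would call prune_reaped outside the signal handler:

```perl
use strict;
use warnings;

# Hypothetical helpers. The hash maps pid => true while the child is
# running and 0 once the reaper has collected it.

# PIDs the reaper still needs to poll with waitpid.
sub pending_pids {
    my ($children) = @_;
    return sort { $a <=> $b } grep { $children->{$_} } keys %$children;
}

# Called from the parent's main loop, never from the signal handler:
# drop the entries that have already been reaped.
sub prune_reaped {
    my ($children) = @_;
    delete @{$children}{ grep { !$children->{$_} } keys %$children };
    return;
}

my %children = ( 14225 => 0, 14232 => 1, 14233 => 1 );
print join( ',', pending_pids( \%children ) ), "\n";    # 14232,14233
prune_reaped( \%children );
print scalar( keys %children ), "\n";                   # 2
```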
thanks