PerlMonks  

Re: No child processes - system limit?

by ikegami (Pope)
on Apr 01, 2010 at 16:52 UTC (#832343)


in reply to No child processes - system limit?

Do you have any signal handlers?

Are you using fork, system, threads or some means of parallelising?


Re^2: No child processes - system limit?
by clinton (Priest) on Apr 01, 2010 at 17:01 UTC
    Yes - in the parent process, I'm reading 5000 records from a source, then forking off a child to reindex each of those 5000 records. The parent forks $max_kids processes, recording the PIDs in a hash, then waits until there are fewer than $max_kids active.

    My reaper looks like this:

    #===================================
    sub _REAPER {
    #===================================
        my $params = shift;
        foreach my $pid ( keys %Children ) {
            my $res = waitpid( $pid, WNOHANG );
            if ( $res > 0 ) {
                $Children{$pid} = 0;
                die "Error in child" if $?;
            }
        }
        $SIG{'CHLD'} = \&_REAPER;
    }

    Note, in the reaper, I set $Children{$pid} = 0 instead of deleting the key, as that was causing panic: freed scalar errors. I now clean up the %Children hash in the main loop of the parent.

    The error I'm seeing occurs in the parent, at the stage where I'm reading the 5,000 records from the source.

    thanks

    Clint
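For readers following along, the "fork up to $max_kids, then wait for a free slot" scheme described above can be sketched roughly as follows. This is not the poster's code: it swaps the SIGCHLD handler for a plain blocking wait() in the main loop, and the names run_throttled, $work and %kids are hypothetical.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical driver: runs $work->($item) in a forked child for each item,
# never keeping more than $max_kids children alive at the same time.
# Returns the number of children that exited with a non-zero status.
sub run_throttled {
    my ( $max_kids, $work, @items ) = @_;
    my %kids;            # PIDs of children that are still running
    my $failures = 0;

    # Reap one child with a blocking wait(); returns false if none are left.
    my $reap_one = sub {
        my $pid = wait();
        return 0 if $pid == -1;
        $failures++ if $? != 0;
        delete $kids{$pid};
        return 1;
    };

    for my $item (@items) {
        # Block until a slot is free.
        while ( scalar( keys %kids ) >= $max_kids ) {
            last unless $reap_one->();
        }

        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # Child: do the work, then leave without running the parent's
            # END blocks or flushing its buffered output.
            my $ok = eval { $work->($item); 1 };
            POSIX::_exit( $ok ? 0 : 1 );
        }
        $kids{$pid} = 1;
    }

    # Drain the remaining children.
    while ( keys %kids ) {
        last unless $reap_one->();
    }
    return $failures;
}
```

Because all reaping happens synchronously in the main loop, nothing can change $! behind the back of other code in the parent. The price is that the parent cannot do other work while it waits for a slot.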

Re^2: No child processes - system limit?
by clinton (Priest) on Apr 01, 2010 at 17:45 UTC
    At the suggestion of moritz, I ran the script with strace, the relevant bits of which are as follows:

    Here is where the parent child makes the request:

    At this stage, my code catches the select failed: no child processes error in an eval, issues a warning, then sleeps before retrying:

    I'm not sure what most of this means, but is the value of $! being set to "no child processes" by one of my waitpid calls, which is interfering with the code in LWP::Protocol::http? Would it help if I localised $! in my reaper sub?

      Would it help if I localised $! in my reaper sub?

      I believe so. That's exactly where I was going with my question.

      select(8, [3], NULL, NULL, {172, 0}) = ? ERESTARTNOHAND (To be restarted)
      --- SIGCHLD (Child exited) @ 0 (0) ---
      sigreturn() = ? (mask now [])
      rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
      waitpid(14232, 0xbfb45be8, WNOHANG) = 0
      waitpid(14233, 0xbfb45be8, WNOHANG) = 0
      waitpid(14225, 0xbfb45be8, WNOHANG) = -1 ECHILD (No child processes)
      ...

      My interpretation of this would be (as you already figured) that $! is being modified in the signal handler before the interrupted select call gets a chance to be restarted, i.e. the redo SELECT doesn't execute because of that very modification of $!.

      (Note that because of Perl's deferred (aka safe) signal handling, the sigreturn(), which is called at the end of the "real" system/C-level signal handler, happens immediately, before the Perl signal handler runs all the waitpid calls. Still, they do run before the next Perl opcode executes, which presumably means before the if ($!{EINTR} || $!{EAGAIN}) check.)
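The effect described here can be reproduced in isolation. The sketch below is only an illustration under simplifying assumptions: the handlers are called directly instead of being triggered by a real SIGCHLD, the process's own PID stands in for an already-reaped child (waitpid on it should fail with ECHILD), and all sub names are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG EINTR);

# A handler that lets a failing waitpid() overwrite errno. Our own PID is
# never one of our children, so this waitpid() is expected to set ECHILD.
sub careless_handler { waitpid( $$, WNOHANG ); return; }

# The same handler, with errno saved on entry and restored on exit.
sub careful_handler { local $!; waitpid( $$, WNOHANG ); return; }

# Pretend a system call has just been interrupted (errno = EINTR), let the
# handler run before the caller looks at errno, and report what the caller
# would then see.
sub errno_seen_by_caller {
    my ($handler) = @_;
    $! = EINTR();
    $handler->();
    return $! + 0;
}

for my $name (qw(careless careful)) {
    my $handler = $name eq 'careless' ? \&careless_handler : \&careful_handler;
    my $retry   = errno_seen_by_caller($handler) == EINTR() ? 'yes' : 'no';
    print "$name handler: caller would retry? $retry\n";
}
```

With the careless handler the caller no longer sees EINTR, so a retry check of the if ($!{EINTR} || $!{EAGAIN}) kind falls through to the error branch, which would explain the "select failed: no child processes" message.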

      What I find a little surprising is that the ECHILD occurs at all, because your $Children{$pid} should have been set to zero in the previous call to the signal handler:

      waitpid(14225, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 14225

      where the waitpid did return 14225 (i.e. $res > 0). In other words, you shouldn't be calling waitpid(14225,...) again thereafter, because 14225 is no longer supposed to be in the hash... (Update: err, wait, this is nonsense of course, as you're iterating over the keys, not the values. OTOH, this raises the question of what would happen if you set the values to the PIDs, too, and then iterated over the values instead, as you seem to be getting that panic when deleting the keys.)

      Maybe you could try to figure out why this is — in addition to trying to localize $! as a workaround, of course.
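As a rough sketch of that workaround, here is the reaper from earlier in the thread with $! and $? localised. The sub name reaper and the %Exit_status hash are hypothetical. Recording the child's status instead of calling die inside the handler is an assumption made for this example, not something taken from the thread.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Stand-ins for the poster's bookkeeping:
# %Children:    PID => 1 while the child is running, 0 once reaped.
# %Exit_status: PID => raw wait status (hypothetical, for this sketch only).
our %Children;
our %Exit_status;

sub reaper {
    # Save errno and the child status seen by the interrupted code.
    # Both are restored automatically when this sub returns.
    local ( $!, $? );

    foreach my $pid ( keys %Children ) {
        my $res = waitpid( $pid, WNOHANG );
        if ( $res > 0 ) {
            $Children{$pid}    = 0;
            $Exit_status{$pid} = $?;
        }
    }
    return;
}

$SIG{CHLD} = \&reaper;
```

The main loop can then inspect %Exit_status at a point of its own choosing, which keeps the handler itself from throwing exceptions into whatever code happened to be interrupted.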

        local'ising $! seems to have sorted out that issue, revealing the real error that is happening on the remote process.

        Re your other point, yes: deleting keys in the hash causes a panic, but I'll change the loop to call waitpid only on those keys that have true values, which should help.

        thanks
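A minimal sketch of that last change, assuming the same %Children layout as in the reaper posted earlier (PID => true while running, 0 once reaped). The sub names reap_live_children and prune_children are hypothetical, and the pruning step is meant to run in the parent's main loop, not in the signal handler.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

our %Children;    # PID => 1 while running, 0 once reaped

# Poll only the children still marked as running, so an already-reaped PID
# is never passed to waitpid() again. Returns the number reaped this call.
sub reap_live_children {
    local ( $!, $? );
    my $reaped = 0;
    foreach my $pid ( grep { $Children{$_} } keys %Children ) {
        my $res = waitpid( $pid, WNOHANG );
        if ( $res > 0 ) {
            $Children{$pid} = 0;
            $reaped++;
        }
    }
    return $reaped;
}

# Main-loop cleanup, outside the handler: drop the entries marked as reaped
# and return how many children are still being tracked.
sub prune_children {
    delete @Children{ grep { !$Children{$_} } keys %Children };
    return scalar keys %Children;
}
```

Skipping the zeroed entries should avoid the repeated ECHILD results seen in the strace output, while keeping the key deletion out of the handler where it caused the panic.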
