Re^5: Best way to kill a child process

by Marshall (Abbot)
on Sep 22, 2011 at 00:50 UTC (#927257)

in reply to Re^4: Best way to kill a child process
in thread Best way to kill a child process

I think we also agree on the potential side issues related to portability :)


I was thinking about a signal problem that a guy had about 6 months ago related to Apple OS X. We were using C and not Perl.

Your one line of code will work at least 99% of the time!
My 2 lines of code may work with a slightly higher probability, but I don't think that it matters at all!

The "right way" to deal with this is to have a CHLD signal handler: set $SIG{CHLD} either to 'IGNORE' or to a coderef for a subroutine that runs a waitpid loop. $SIG{CHLD}='IGNORE'; is far superior to doing nothing with the CHLD signal.
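To make the 'IGNORE' option concrete, here is a minimal sketch. It assumes a POSIX-like system such as Linux, where ignoring CHLD tells the kernel to clean up exited children itself; other platforms may behave differently, which is the portability caveat again.

```perl
use strict;
use warnings;

# Assumption: a POSIX-like system (e.g. Linux) where ignoring SIGCHLD
# means exited children are cleaned up without the parent calling wait.
$SIG{CHLD} = 'IGNORE';

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: nothing to do, exit right away.
    exit 0;
}

# Parent: give the child time to exit.
sleep 1;

# No zombie is left behind, so there is nothing for waitpid to collect.
# On Linux it reports "no such child" by returning -1.
my $reaped = waitpid($pid, 0);
print "waitpid returned $reaped\n";
```

The trade-off is that the parent can no longer read the child's exit status, so this only suits children whose result you don't care about.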

Re^6: Best way to kill a child process
by flexvault (Monsignor) on Oct 10, 2011 at 15:16 UTC

    Hi Marshall

    . . . a coderef to a subroutine that causes a waitpid loop. . .

    Just something to think about! Over the past year I removed all uses of waitpid loops, since on some recent versions of AIX/Unix/Linux, especially on multi-core computers, the SIG handler hung forever if the child had already been reaped by another core. I replaced the code in the parent with:

    if ( kill 0 => $child ) { $children++; ... } else { my $ret = &make_child( ); ... }

    It works, but like you I prefer the sub. I don't know if this behavior is a bug or if it's intentional. The problem doesn't seem to happen if the child still exists, only when it has been reaped in a previous call.
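For anyone who hasn't used the kill 0 idiom, here is a small self-contained sketch of the liveness check. Signal 0 delivers nothing; kill just returns how many of the listed processes could be signalled. One caveat worth knowing: a child that has exited but not yet been reaped (a zombie) still counts as signalable, so the check only turns false once the child has been reaped one way or another.

```perl
use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: stay alive until the parent signals us.
    sleep 60;
    exit 0;
}

# Signal 0 sends nothing; it only asks whether $pid can be signalled.
my $alive_before = kill(0 => $pid);

# Terminate the child and reap it so that no zombie remains.
kill 'TERM' => $pid;
waitpid($pid, 0);

# The process is gone now, so the same check fails.
my $alive_after = kill(0 => $pid);

print "before: $alive_before, after: $alive_after\n";
```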

    I commented out the previous use, may need it again :-)

    Thank you

    "Well done is better than well said." - Benjamin Franklin

      This is odd and I don't know how to replicate this. If you can replicate this on a Linux system, I'd like to play with it.

      use POSIX ':sys_wait_h';   # provides WNOHANG
      $SIG{CHLD} = sub { while (waitpid(-1, WNOHANG) > 0) {} };
      This shouldn't hang if there are no children to reap, and it reaps however many children have exited by then (0, 1, 20, 150).
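If you want something to experiment with, here is a fuller sketch of the same reaper as a standalone script. The $reaped counter, the $wanted count and the 10 second deadline are only there for illustration; they are my assumptions, not part of anybody's production code.

```perl
use strict;
use warnings;
use POSIX ':sys_wait_h';   # exports WNOHANG

my $reaped = 0;   # illustration only: number of children collected

$SIG{CHLD} = sub {
    # Keep the handler from clobbering the main program's $! and $?.
    local ($!, $?);
    # Several children may exit before the handler runs once, so loop.
    # waitpid returns 0 if children exist but none has exited yet,
    # and -1 if there are no children at all; either way we stop.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
};

my $wanted = 5;
for (1 .. $wanted) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    exit 0 if $pid == 0;   # child: exit immediately
}

# sleep returns early when a signal arrives, so poll until every child
# is accounted for. The deadline only keeps the demo from waiting forever.
my $deadline = time + 10;
sleep 1 while $reaped < $wanted && time < $deadline;

print "reaped $reaped of $wanted children\n";
```

The non-blocking WNOHANG flag is the important part: a plain waitpid(-1, 0) inside the handler would block whenever other children are still running.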

      Somewhere in the last couple of threads about this, there was a question about grandchildren. There can be "big trouble in Dodge City" if children are creating grandchildren. Maybe the grandchild dies and expects its parent who was a "child" to reap it, but if that child dies, I suspect that there could be some race conditions about who reaps that - maybe the "step-child" is still running?

      The parent should only be responsible for reaping its own children. I highly recommend against the idea of children making further grandchildren. After a fork(), the child should close the passive socket.

      As a general "rule" children should not spawn grandchildren. I mean a child is supposed to do its job of talking to a client and then die. But things can get "out of whack" if that child has its own child. So the first "step child" is supposed to die and get reaped, but can that happen if it itself has a child that is supposed to be active? I think there is a problem here. We have a "dead" parent that can't be reaped because it has a child that can't be reaped.

      I think things will be ok if you do not allow children to make other children. First thing that a child should do is close the passive socket.

      First thing that a parent (server) should do after the fork() is close the active socket. Parents (servers) should only listen for new client connections. Children should only deal with their currently active socket (to the client).
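Here is a rough sketch of that division of labor, using the core IO::Socket::INET module. The helper name serve_one and the loopback demo around it are made up for illustration; a real server would call the helper in an accept loop and reap children with a CHLD handler instead of a blocking waitpid.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Hypothetical helper: accept one client and hand it to a forked child.
sub serve_one {
    my ($listener) = @_;
    my $client = $listener->accept or die "accept failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: it never accepts, so close the passive (listening) socket.
        close $listener;
        print {$client} "hello from child $$\n";
        close $client;
        exit 0;
    }

    # Parent: the child owns this conversation, so close the active socket.
    close $client;
    return $pid;
}

# Demo on a loopback port; LocalPort 0 lets the OS pick a free one.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1', LocalPort => 0,
    Listen    => 5,           Proto     => 'tcp',
) or die "listen failed: $!";

# Connect to ourselves so the sketch needs no separate client program.
my $conn = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1', PeerPort => $listener->sockport,
    Proto    => 'tcp',
) or die "connect failed: $!";

my $child = serve_one($listener);
my $reply = <$conn>;     # read the line the child wrote
waitpid($child, 0);      # reap the child
print $reply;
```

If the parent kept its copy of the active socket open, the client would never see end-of-file when the child finished, because the connection stays open as long as any process holds it.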

      Yes, there are models where parents coordinate children activities, but that is an advanced topic and I don't think that we are talking about that here. That is into the range of not only complicated, but darn complicated!

        Some points about this:

        • Problem happens in Debian, OpenSuse with perl5.10.1 and perl5.12.2
        • There are no grandchildren.
        • Does not seem to happen on single CPU, only multiple CPUs.
        • Use ratio of 4 children per CPU and parent only maintains that number
        • Children exit after 4 hours.
        • Usually takes 20-24 hours for problem to happen!

        I'm sure it's a timing issue, but I haven't been able to duplicate the problem in testing. I think it hangs in $SIG{CHLD}, since the problem goes away when I set the child $SIG{CHLD} to 'IGNORE'. It could be Perl, Linux or my program. If I find a way to duplicate it, I'll let you know. Regards.

        Thank you

        "Well done is better than well said." - Benjamin Franklin
