Re^6: Best way to kill a child process

by flexvault (Vicar)
on Oct 10, 2011 at 15:16 UTC (#930664)


in reply to Re^5: Best way to kill a child process
in thread Best way to kill a child process

Hi Marshall

. . . a coderef to a subroutine that causes a waitpid loop. . .

Just something to think about! Within the past year I removed all uses of waitpid loops, because on some recent versions of AIX/Unix/Linux, especially on multi-core computers, the SIG handler hung forever if the child had already been reaped by another core. I replaced the code in the parent with:

if ( kill 0 => $child ) { $children++; ... } else { my $ret = &make_child( ); ... }

It works, but like you I prefer the sub. I don't know if this behavior is a bug or if it's intentional. The problem doesn't seem to happen if the child still exists, only when it has already been reaped in a previous call.

I commented out the previous use, may need it again :-)
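
A minimal sketch of the kill 0 check, for anyone who wants to try it. The helpers make_child and child_alive here are assumptions for illustration only (the real make_child is not shown in this post). One thing to keep in mind: kill 0 only tests whether a signal could be delivered to the pid, so it still reports true for a child that has exited but has not been reaped yet. Something still has to reap the children, or $SIG{CHLD} has to be 'IGNORE'.

```perl
use strict;
use warnings;
use POSIX ();

# Make sure exited children stay in the process table until we reap them.
$SIG{CHLD} = 'DEFAULT';

# Hypothetical stand-in for make_child: fork a child that exits at once.
sub make_child {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    POSIX::_exit(0) if $pid == 0;    # child: leave without running END blocks
    return $pid;                     # parent: hand back the child's pid
}

# True if a signal could be delivered to $pid (running OR unreaped zombie).
sub child_alive {
    my ($pid) = @_;
    return kill(0 => $pid) ? 1 : 0;
}

my $child = make_child();

# Before reaping, the pid is still known to the kernel, whether the child
# is running or has already exited.
my $before_reap = child_alive($child);

waitpid($child, 0);                  # reap the child

# After reaping, the pid is gone and kill 0 fails.
my $after_reap = child_alive($child);

print "before reap: $before_reap, after reap: $after_reap\n";
```

In the parent loop, a false result from child_alive is the point where a replacement child would be started.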

Thank you

"Well done is better than well said." - Benjamin Franklin


Re^7: Best way to kill a child process
by Marshall (Prior) on Oct 15, 2011 at 21:18 UTC
    This is odd and I don't know how to replicate this. If you can replicate this on a Linux system, I'd like to play with it.

    $SIG{CHLD} = sub {while (waitpid(-1, WNOHANG) > 0){} };
    This shouldn't hang if there are no children to reap: with WNOHANG, waitpid returns immediately (0 or -1) instead of blocking. And it will reap however many children are available to reap (0, 1, 20, 150).
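
To make the handler above concrete, here is a small self-contained sketch that forks a few short-lived children and counts how many the handler reaps. The names $reaped and $wanted and the 10 second safety deadline are my own assumptions for the demo, not part of anyone's production code.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my $reaped = 0;

# Non-blocking reaper: collect every child that has exited, then return.
# waitpid returns 0 (children exist, none exited) or -1 (no children),
# so the loop always ends without blocking.
$SIG{CHLD} = sub {
    local ($!, $?);    # keep the main program's error/status values intact
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $reaped++;
    }
};

my $wanted = 3;
for (1 .. $wanted) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    POSIX::_exit(0) if $pid == 0;    # child exits right away
}

# Nap in short slices until every child is reaped (or we give up).
my $deadline = time + 10;
while ($reaped < $wanted && time < $deadline) {
    select(undef, undef, undef, 0.1);
}

print "reaped $reaped of $wanted children\n";
```

Several children can exit while only one CHLD signal gets delivered, which is why the handler loops instead of calling waitpid once.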

    Somewhere in the last couple of threads about this, there was a question about grandchildren. There can be "big trouble in Dodge City" if children are creating grandchildren. Maybe the grandchild dies and expects its parent who was a "child" to reap it, but if that child dies, I suspect that there could be some race conditions about who reaps that - maybe the "step-child" is still running?

    The parent should only be responsible for reaping its own children. I highly recommend against the idea of children making further grandchildren. After a fork(), the child should close the passive socket.

    As a general "rule" children should not spawn more grandchildren. I mean a child is supposed to do its job of talking to a client and then die. But things can get "out of whack" if that child has its own child. So the first "step child" is supposed to die and get reaped, but that can't happen if it itself has a child that is supposed to be active? I think there is a problem here. We have a "dead" parent that can't be reaped because it has a child that can't be reaped.

    I think things will be ok if you do not allow children to make other children. First thing that a child should do is close the passive socket.

    First thing that a parent (server) should do after the fork() is close the active socket. Parents (servers) should only listen for new client connections. Children should only deal with their currently active socket (to the client).
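
A rough sketch of that division of labor, assuming a plain IO::Socket::INET server on the loopback interface. The variable names ($listener, $conn, $client) are hypothetical, and the client lives in the same process only so the example can run by itself; a real server would sit in an accept loop.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX ();

# Passive (listening) socket on an ephemeral loopback port.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 5,
    Proto     => 'tcp',
    ReuseAddr => 1,
) or die "listen failed: $!";

# Demo-only client, so there is something to accept.
my $client = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $listener->sockport,
    Proto    => 'tcp',
) or die "connect failed: $!";

# Active socket for this one client.
my $conn = $listener->accept or die "accept failed: $!";

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: first thing, close the passive socket (and the demo client end).
    close $listener;
    close $client;
    print {$conn} "hello from child $$\n";
    close $conn;
    POSIX::_exit(0);    # do the job, then die; no grandchildren
}

# Parent: first thing after fork, close its copy of the active socket.
close $conn;

my $reply = <$client>;    # what the child sent to the client
waitpid($pid, 0);         # the parent reaps its own child

print "client received: $reply";
```

If the parent kept its copy of $conn open, the client would never see end-of-file after the child finished, because the connection would still have an open descriptor in the server.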

    Yes, there are models where parents coordinate children activities, but that is an advanced topic and I don't think that we are talking about that here. That is into the range of not only complicated, but darn complicated!

      Some points about this:

      • Problem happens on Debian and openSUSE, with perl 5.10.1 and perl 5.12.2
      • There are no grandchildren.
      • Does not seem to happen on single CPU, only multiple CPUs.
      • I use a ratio of 4 children per CPU, and the parent only maintains that number
      • Children exit after 4 hours.
      • Usually takes 20-24 hours for problem to happen!

      I'm sure it's a timing issue, but I haven't been able to duplicate the problem in testing. I think it hangs in $SIG{CHLD}, since the problem goes away when I set the child $SIG{CHLD} to 'IGNORE'. It could be Perl, Linux or my program. If I find a way to duplicate it, I'll let you know. Regards.

      Thank you

      "Well done is better than well said." - Benjamin Franklin

        Yes, if you find a way to duplicate this with a fairly simple program, I'd like to run it and see if I can replicate it also.

        I don't have a Linux machine myself, but I do have access to a remote 64-bit Linux machine with a 64-bit ActiveState Perl installation. I can hammer on this machine during the evenings and on the weekends.

        Setting $SIG{CHLD} = 'IGNORE' should cause the low level sigaction() structure to be set. Once that happens, Perl has nothing at all to do with this signal as Perl would never even see the signal. So if this causes a "reap" of the child, I can see why this doesn't cause a problem.
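
A short sketch of what that looks like from the Perl side, assuming a Linux or other POSIX system where ignoring SIGCHLD makes the kernel discard exited children automatically (perlipc describes this as the behavior on most Unix platforms, so treat it as platform dependent). The variable names are only for illustration.

```perl
use strict;
use warnings;
use POSIX ();

# Ask the kernel to discard exited children; no Perl handler is involved.
$SIG{CHLD} = 'IGNORE';

my $pid = fork();
die "fork failed: $!" unless defined $pid;
POSIX::_exit(0) if $pid == 0;    # child exits immediately

# With SIGCHLD ignored, waitpid blocks until the child is gone and then
# returns -1, because there is no exit status left to collect.
my $wait_result = waitpid($pid, 0);

# The pid has already been removed from the process table, so kill 0 fails.
my $still_there = kill(0 => $pid) ? 1 : 0;

print "waitpid returned $wait_result, child still there: $still_there\n";
```

The trade-off is that the parent never gets to see a child's exit status this way.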

        Anyway, if you can make this happen more often than once every 24 hours, then I have a way to make a couple of runs on the weekend when the college's server is at a low usage level.
