PerlMonks  

Re: Monitoring child processes

by sundialsvc4 (Abbot)
on Mar 13, 2012 at 21:08 UTC (#959451)


in reply to Monitoring child processes

The root problem, which is extremely difficult to deal with, is basically a race condition: the parent might “determine” the status of a child, but, before it can react to the status that it has thusly determined, the status of the child has changed. Strictly speaking, you don’t even know that your list of children is instantaneously correct.

Obviously, the most desirable thing to do would be to schlep the entire responsibility off to an existing known-good CPAN module, such as, say, Parallel::ForkManager. Can you find a way to do that?

Otherwise, I suggest arranging things so that the only role of the parent process/thread is “to run the nursery.” All of the other responsibilities, including checking whether a particular condition has occurred, ought to belong to children. If the child that watches for that condition informs you that, indeed, “ka-ka has occurred,” the parent should respond by sending a signal to every one of its children that asks them to “please die, as soon as you possibly can,” and then wait for them all to do so. It does not poll them to see if they are alive: it does not have to. If at all possible, it also does not kill them outright. (How messy ... and how unpredictable. Each child, once alive, is responsible for setting its own affairs in order upon the occasion of its own death ... timely or otherwise.)
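
For what it's worth, here is a minimal sketch of that "ask, then wait" arrangement using only core Perl. The names spawn_workers and shut_down_nursery are hypothetical, the sleep loop merely stands in for real work, and the TERM handler is installed before fork() on the assumption that you want every child to inherit it from its first instant. Treat it as an illustration, not a finished design.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: fork $count workers that idle until asked to stop.
sub spawn_workers {
    my ($count) = @_;
    my $stop = 0;
    # Installed before fork() so each child inherits it and cannot be
    # caught without a handler. The parent's own handler is restored
    # when this sub returns.
    local $SIG{TERM} = sub { $stop = 1 };
    my @pids;
    for my $n (1 .. $count) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: work until asked to stop.
            until ($stop) {
                select(undef, undef, undef, 0.1);   # stand-in for real work
            }
            # ... the child puts its own affairs in order here ...
            POSIX::_exit(0);
        }
        push @pids, $pid;
    }
    return @pids;
}

# Hypothetical helper: ask every child to stop, then wait for each one.
sub shut_down_nursery {
    my (@pids) = @_;
    kill 'TERM', @pids;                  # a request, not a SIGKILL
    my %status;
    for my $pid (@pids) {
        my $got = waitpid($pid, 0);      # blocks until this child is gone
        $status{$pid} = $? if $got == $pid;
    }
    return \%status;
}

my @kids   = spawn_workers(3);
my $status = shut_down_nursery(@kids);
printf "child %d exited with code %d\n", $_, $status->{$_} >> 8
    for sort { $a <=> $b } keys %$status;
```

Note that the parent never polls: it sends one request and then blocks in waitpid() until each child has finished its own cleanup.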

I have consistently found (and, maybe it’s just me ...) that if you try to give the parent process many responsibilities of its own to take care of in addition to “watching the kids,” the kids get into trouble in ways that you could not possibly have anticipated and could never reproduce. (This being a case in which computers imitate real life!!)

Re^2: Monitoring child processes
by Marshall (Canon) on Mar 14, 2012 at 00:14 UTC
    The root problem, which is extremely difficult to deal with, is basically a race-condition: the parent might “determine” the status of a child, but, before it can react to the status that it has thusly determined, the status of the child has changed. Strictly speaking, you don’t even know that your list-of-children is instantaneously correct.

    This is not correct. There is no "race condition" in properly implemented code. The OS handles some things "atomically" (I don't mean "automatically", which is different: "atomic" means in a single, indivisible operation) that you cannot do for yourself. SIGCHLD, like other standard signals, behaves as a level-sensitive thing (not edge triggered): pending instances are not queued, so when multiple children exit close to one another, you may get only one SIGCHLD signal.

    When the SIGCHLD is "delivered" (the handler starts running), the OS atomically blocks that signal. This is different from setting the sigprocmask yourself inside the handler. While you are busy in your handler, an additional SIGCHLD can therefore arrive and sit in a "pending" but "undelivered" state.

    The classic SIGCHLD handler processes all of the children via the waitpid() function (and there may very well be multiple children to process). If, say, 5 children exit while you are busy in the handler, this fact is noted by the OS and becomes yet another SIGCHLD (a single level-triggered signal) in the "pending but undelivered" state.

    When you exit the handler, this "pending" SIGCHLD is unblocked and you immediately get another SIGCHLD signal. Basically this ensures that you will not "miss one" - that is the important part that eliminates the "race condition". The OS has to do this and it does.

    I think that it is possible under certain circumstances for you to get a SIGCHLD where there is "nothing to do" because the child has already been handled (while you were in the signal handler the previous time).

    Basically, the "race condition" is handled by the OS and there is not a possibility of "missing a SIGCHLD event" as long as you process all available children while you are in the SIGCHLD handler.
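
    A minimal sketch of such a handler, using only core Perl, might look like the following. The names reap_children and %exit_status are made up for illustration, and the five children that exit at once exist only to provoke several exits close together. The point is the loop around waitpid(-1, WNOHANG): one delivered SIGCHLD may stand for several exited children, and a call that finds nothing to reap is harmless.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %exit_status;    # pid => raw wait status, filled in by the handler

# Reap every child that is ready right now, not just one.
sub reap_children {
    local ($!, $?);     # do not clobber the interrupted code's $! and $?
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exit_status{$pid} = $?;
    }
}

$SIG{CHLD} = \&reap_children;

my $children = 5;
for my $n (1 .. $children) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    POSIX::_exit($n) if $pid == 0;    # child: exit at once with code $n
}

# Parent: nap until the handler has accounted for every child.
while (keys(%exit_status) < $children) {
    select(undef, undef, undef, 0.05);
}

printf "pid %d exited with code %d\n", $_, $exit_status{$_} >> 8
    for sort { $a <=> $b } keys %exit_status;
```

    However the signals happen to coalesce, all five children end up recorded, because the handler keeps calling waitpid() until nothing more is ready.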

    Use the waitpid() function to reap children. Let the OS do the job of deciding which children are ready to be reaped. There is no need for the parent to maintain its own "children" list, if that is what you meant.
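
    As a rough illustration of reaping without any bookkeeping, the sketch below forks a few children and deliberately keeps no list of their pids. It assumes the parent has nothing else to do and can simply block; the staggered sleeps are only a stand-in for real work.

```perl
use strict;
use warnings;
use POSIX ();

# Start a few short-lived children; the parent records no pids.
for my $n (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        select(undef, undef, undef, 0.1 * $n);   # pretend to work
        POSIX::_exit(0);
    }
}

# waitpid(-1, 0) blocks until any child exits, and returns -1
# once there are no children left to wait for.
my $reaped = 0;
while ((my $pid = waitpid(-1, 0)) > 0) {
    printf "reaped %d, exit code %d\n", $pid, $? >> 8;
    $reaped++;
}
print "all $reaped children reaped\n";
```

    The kernel already knows which children exist and which have exited, so the loop ends on its own when the last one has been collected.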

      Thank you very much (and also many thanks to the others who contributed to this thread) for the elaborate responses. It really helped me a lot!

      -- 
      Ronald Fischer <ynnor@mm.st>
