My program (running on Solaris and Linux only, no Windows compatibility needed) is designed to start child processes and "monitor" them in the following way:
- If all the child processes have exited, the parent should also terminate
- If the parent process detects a certain condition, it should kill all children and terminate
My first approach for the parent (which was flawed) went like this:
- Creating the children with the usual fork/exec mechanism and storing the child PIDs in a list.
- In a loop, testing the aforementioned condition, and killing the children if the condition is met.
- In the same loop, testing whether the children are still alive (by sending them the pseudo-signal 0) and terminating the loop if no child is running anymore.
It was the last item that did not work: the parent did not notice when a child exited. Inspecting the processes, I found that the exited child was marked as defunct, but still present in the process table. I guess that this is the reason why kill(0,...) still pretended that it could send the signal. Am I right so far?
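To make the observation concrete, here is a minimal Python sketch (not the program itself; the names probe and demo are made up for illustration). It assumes a POSIX system with the default SIGCHLD disposition: the child exits at once, the parent probes it with signal 0 while it is still defunct, then reaps it with waitpid and probes again.

```python
import os
import signal
import time

def probe(pid):
    """Return True if signal 0 can be delivered to pid (the PID still exists)."""
    try:
        os.kill(pid, 0)
        return True
    except ProcessLookupError:
        return False

def demo():
    # Assumption: default SIGCHLD disposition, so exited children stay defunct
    # until the parent waits for them.
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)
    pid = os.fork()
    if pid == 0:
        os._exit(0)          # child: exit immediately
    time.sleep(0.1)          # give the child time to exit; it is not reaped yet
    before = probe(pid)      # defunct child: the PID is still in the process table
    os.waitpid(pid, 0)       # reap the child, removing its process table entry
    after = probe(pid)       # the PID is gone now
    return before, after

if __name__ == "__main__":
    print(demo())
```

If my guess is right, the first probe should succeed and the second should fail.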
I then thought that maybe the child could not deliver its SIGCHLD on exiting, so I added the following line to my program, prior to creating the first child process:
Indeed, my program now terminates immediately when all its children exit. Now I wonder: Why do I have to set this explicitly? What is the "default" signal handler for SIGCHLD?
And finally, I would like to ask you whether my approach to handling the child processes is reasonable, or whether it has other pitfalls that I just don't see yet.
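For comparison, one alternative I could imagine is to poll with a non-blocking waitpid instead of signal 0, so that exited children get reaped as a side effect. The following is only a rough Python sketch under that assumption; monitor, spawn, condition and poll_interval are hypothetical names, and it presumes the default SIGCHLD disposition and SIGTERM as the signal used for killing.

```python
import os
import signal
import time

def monitor(children, condition, poll_interval=0.05):
    """Supervise the PIDs in `children` until none is left.

    Returns "all-exited" if every child ended on its own, or "killed" if
    condition() became true and the remaining children were terminated.
    """
    children = set(children)
    while children:
        if condition():
            # Unreaped children still own their PIDs, so kill cannot hit a stranger.
            for pid in children:
                os.kill(pid, signal.SIGTERM)
            for pid in children:
                os.waitpid(pid, 0)          # reap so no defunct entries remain
            return "killed"
        for pid in list(children):
            # WNOHANG: returns (0, 0) at once if the child is still running.
            done, _status = os.waitpid(pid, os.WNOHANG)
            if done == pid:
                children.discard(pid)
        if children:
            time.sleep(poll_interval)
    return "all-exited"

def spawn(seconds):
    """Hypothetical stand-in for fork/exec: the child just sleeps, then exits."""
    pid = os.fork()
    if pid == 0:
        try:
            signal.signal(signal.SIGTERM, signal.SIG_DFL)
            time.sleep(seconds)
        finally:
            os._exit(0)
    return pid

if __name__ == "__main__":
    print(monitor([spawn(0.1), spawn(0.2)], lambda: False))
    print(monitor([spawn(30)], lambda: True))
```

I have not checked how well such a loop fits a real program that also has to do other work, so please treat it as a starting point for discussion only.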
Ronald Fischer <email@example.com>