Re^2: what causes a segmentation violation

I suppose I should have been more specific when I asked 'what causes a segmentation violation', to avoid getting the obvious description of what one is (of which I am, sadly, well aware).

I took the suggestion of looking at what the core dump could provide. A gdb stack trace told me that it was dying during signal handling. (Unfortunately I didn't write down the exact routine name, but...) So, without attempting to understand and diagnose Perl itself, I looked at where my code dealt with signals. I narrowed it down to 'death of a child' in my forking TCP server code.

My server is based on the standard skeleton from 'the cookbook'. What happens in my code is that, on occasion, the main listener may choose to shut down itself and all of its children (for example, when it catches a CTRL-C). The mainline would go through and kill off all forked children, and then die itself.

What was happening was (or at least _my impression_ was) that the children would be signalled to die, but would not be dead yet. Then the mainline would die, but the Perl interpreter would still be around. Then the interpreter for the mainline would receive the 'death of a child' signal for one or more of the children, but could no longer handle it, because the mainline was (just about...!) dead. So it would segfault and dump core.

My solution/workaround for this situation was to signal the children to die, then have the mainline actually wait around (perhaps forever) for the children to die, and then (and only then) exit/die itself. I.e.:

use POSIX ":sys_wait_h";   # provides WNOHANG

# Poll until waitpid() returns -1, i.e. no children are left to reap
sub reaper { 1 until (-1 == waitpid(-1, WNOHANG)); }

$SIG{CHLD} = \&reaper;
kill 9, $childPid;  # signal the child to die
reaper();           # IMPORTANT! ...wait for the child to go away
                    # (else we might get perl seg faulting on exit)
exit;               # and die ourselves
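For a server with several children, the same "signal, wait, then exit" idea might be sketched as below. This is only an illustration under some assumptions, not the code from my server: the %children hash and the shutdown_server routine are hypothetical names, the sleeping children are stand-ins for real connection handlers, it sends TERM rather than 9, and it uses a blocking waitpid instead of polling with WNOHANG. It also puts $SIG{CHLD} back to DEFAULT during shutdown so that no Perl-level handler has to run while the parent is on its way out.

```perl
use strict;
use warnings;

my %children;   # hypothetical registry: pid => 1 for every live child

# Fork a few stand-in children that just sleep until they are signalled.
for (1 .. 3) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {          # child process
        sleep 60;
        exit 0;
    }
    $children{$pid} = 1;      # parent remembers the child
}

sub shutdown_server {
    # No asynchronous reaping from here on; we reap synchronously below.
    local $SIG{CHLD} = 'DEFAULT';

    kill 'TERM', keys %children;     # ask every child to exit

    # Blocking waitpid: sleeps until some child exits and
    # returns -1 once there are no children left.
    while ((my $pid = waitpid(-1, 0)) > 0) {
        delete $children{$pid};
    }
    return scalar keys %children;    # 0 when every child was reaped
}

my $left = shutdown_server();
print "children left: $left\n";
```

Only after shutdown_server() has returned would the mainline call exit. The blocking wait avoids spinning the CPU, at the cost of hanging forever if a child ignores the signal, which is the same "perhaps forever" trade-off as above.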

So in the end, it turned out I had been seeing this in a number of my apps that use the same approach to shutting down a forking server. The failures/warnings were intermittent because they depended entirely on the random timing of the parent and the children dying/exiting.

...Sometimes the signal catcher would get invoked... but sometimes the interpreter seemed to have been shut down far enough that the catcher was no longer there when the signal arrived, so it would core dump on shutdown.



	PerlMonks