Complex and reliable signal handling.

by vsespb (Chaplain)
on May 31, 2013 at 15:57 UTC (#1036260, perlquestion)
vsespb has asked for the wisdom of the Perl Monks concerning the following question:

I have a fork()ing application with one parent process and multiple child processes. It works fine, but I am having trouble implementing graceful termination after SIGINT/SIGHUP/SIGTERM.

Requirements are:

1. I use File::Temp for temporary files (I need named temporary files). Child processes should handle SIGINT/SIGHUP, because otherwise destructors are not called and the temporary files are left on disk.

2. If a child dies with an exception, the parent should handle SIGCHLD, terminate all the other children, and then terminate itself.

3. If the parent dies (because of an exception, or perhaps because of SIGTERM), it should terminate all children.


After I added signal handling to the child code, I started getting errors like these (sometimes often, sometimes rarely):

    panic: fold_constants JMPENV_PUSH returned 2 at /usr/share/perl5/File/ line 1015.
    panic: fold_constants JMPENV_PUSH returned 2 at /usr/share/perl5/File/ line 1015.
    panic: fold_constants JMPENV_PUSH returned 2 at /usr/share/perl5/File/ line 1015.
    panic: fold_constants JMPENV_PUSH returned 2 at /usr/local/share/perl/5.10.1/Net/HTTP/ line 596.
    panic: fold_constants JMPENV_PUSH returned 2 at /usr/local/share/perl/5.10.1/Net/HTTP/ line 596.
    panic: fold_constants JMPENV_PUSH returned 2 at /usr/local/share/perl/5.10.1/ line 132.

If I modify this code, it works a bit better, but I still see errors sometimes. Occasionally there is also a segfault.

Children code:

    my $first_time = 1;
    my @signals = qw/INT TERM USR2 HUP/;
    for my $sig (@signals) {
        $SIG{$sig} = sub {
            if ($first_time) {
                $first_time = 0;
                exit(1); # we need exit, it will call all destructors which will destroy tempfiles
            }
        };
    }
    dump_error() unless (defined eval { do_work(); 1; });
    $SIG{$_} = 'IGNORE' for (@signals); # THIS CAN BE REMOVED - RESULT IS SAME
    kill(POSIX::SIGUSR1, $parent_pid);  # THIS CAN BE REMOVED - RESULT IS SAME
    exit(1);

Parent code:

    my $first_time = 1;
    for my $sig (qw/INT TERM CHLD USR1 HUP/) {
        $SIG{$sig} = sub {
            local ($!,$^E,$@);
            if ($first_time) {
                $first_time = 0;
                kill (POSIX::SIGUSR2, keys %{$self->{children}});
                print STDERR "EXIT on SIG$sig\n";
                exit(1);
            }
        };
    }
    do_parent_work();

My Perl version is 5.10 on Linux. I understand that it's hard to test particular signal handling code on all Perl versions, but I am looking for a scheme that is reliable by design and unlikely to produce bugs.

Or maybe you can find a bug in my code above?

P.S. I've read about sigaction(), but I still don't see how or why it would help.

Replies are listed 'Best First'.
Re: Complex and reliable signal handling.
by shmem (Canon) on May 31, 2013 at 18:40 UTC
    perldiag gives a hint:
    panic: fold_constants JMPENV_PUSH returned %d
    (P) While attempting folding constants an exception other than an "eval" failure was caught.

    I guess that your entire program is a bit more complex than the bits you posted.
    Perhaps a $SIG{__DIE__} handler would reveal more:

    use Carp(); $SIG{__DIE__} = \&Carp::confess;
      "I guess that your entire program is a bit more complex than the bits you posted."
      Yes, treat it as pseudocode. I can't create proof-of-concept code.
      "Perhaps a $SIG{__DIE__} handler would reveal more:"
      Yes, you're right. fold_constants mostly happens when a child terminates with an exception. However, all 'die's are inside eval (you can see the eval in the pseudocode). After the eval catches the exception, the program executes dump_error() and exits (this way I see all errors). Also, I did not see the panic until I installed signal handlers in the "children" code above.
      I tried $SIG{__DIE__} = \&Carp::confess; it prints nothing. Perhaps because it prints errors only outside eval?

      I don't have my own DIE handlers (LWP, which I use, does, but it should localize them).

      Also, I remember now that most (but not all) of the code lines mentioned in the panic messages contained eval (or 'require', which calls eval, or even eval{require}).
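
On the question of whether a __DIE__ hook only fires outside eval: perlvar documents that the hook is also called for exceptions raised inside eval, with $^S true in that case. A minimal sketch of how one might check this; the helper name die_hook_calls is made up for illustration and is not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: records every call to the __DIE__ hook while $code runs.
sub die_hook_calls {
    my ($code) = @_;
    my @calls;
    local $SIG{__DIE__} = sub {
        # $^S is true when the exception was raised inside an eval.
        push @calls, { in_eval => $^S, message => $_[0] };
    };
    eval { $code->(); 1 };    # the exception is caught here, as in the child code
    return @calls;
}

my @calls = die_hook_calls(sub { die "boom\n" });
printf "hook ran %d time(s), in_eval=%s\n",
    scalar @calls, ($calls[0]{in_eval} ? 'yes' : 'no');
```

Note that Carp::confess does not print anything itself: it dies with a backtrace, so inside an eval the trace ends up in $@ rather than on STDERR. Carp::cluck, which warns with a backtrace, is the variant that prints.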
Re: Complex and reliable signal handling.
by Anonymous Monk on May 31, 2013 at 20:23 UTC

    When I see words like JMPENV_PUSH, my instincts tell me that what’s really happening here is that your various threads are conflicting with one another in relation to the surrounding environment. The Perl packages in question won’t be prepared to consider whether, for example, two or more instances of themselves are engaged in a race condition, both attempting to do the same thing at the same time.

    You may need to add some kind of mutual-exclusion mechanism around some of your calls, e.g. the ones that create a new temporary file, to be sure that multiple threads do not attempt these things at precisely the same instant. Both the unpredictability of these failures and their severity (segfault ...) seem to support this hypothesis. In other contexts, we would say that these calls “are not thread-safe,” and it is most certainly the case that they were never designed to be.

      ... the above is my post. PM logged me out.

      Also note that your termination code would need to consider, in a similar fashion, how to terminate gracefully: if a thread is in the middle of creating a temporary file, it shouldn’t be shot dead at that exact moment.

      Edit: Unfortunately, my original post appears to be wrong (see below), and I can’t strike it out because I don’t own it.

        Read again. The OP didn't mention threads.

        Yep, I use fork().

        I did “read it carefully,” thank you all very much. I intended to use the term “thread-safe” loosely in this particular case, and in fact I used it inappropriately.

        The race that I hypothesize does not concern resources shared by threads in a single process context, but rather resources shared by all of the processes in the login-shell environment. A statement as simple as $ENV{'FOOBAR'} = $ENV{'FOOBAR'} + 1; would exhibit this sort of race-condition conflict.

        It could also be that you should not exit() within the signal handler, but should instead set a flag that causes the main loop of the child process to end as soon as possible. When this is done, then no matter what the process was doing at the unpredictable instant at which the termination signal arrived, you know that it will end at a predictable point and in a predictable state ... real soon now, but not at this very instant.

        Looking now more closely than I did before at the relevant perlguts, I see that JMPENV_PUSH has to do with longjmp() state-saving. Maybe there’s a hole there somewhere, and if so, your task is to avoid it, not to fix it. By delaying the actual termination to “real soon now” instead of “right now,” you’d avoid such a hole.

        This is specifically what I would do in the child:

        my $interrupted = 0;
        my @signals = qw/INT TERM USR2 HUP/;
        for my $sig (@signals) {
            # The handler only records that a signal arrived.
            $SIG{$sig} = sub { $interrupted = 1; };
        }

        # ... then, throughout the code, test for it ...

        # main loop:
        while (not $interrupted) {
            # ... do one unit of work ...
            last if $interrupted;
            # ... or ...
            exit(1) if $interrupted;
        }

        The only immediate response to the signal is to set a flag indicating that a signal has been received. The child’s processing loop tests this flag frequently at strategic places and breaks out of the loop gracefully. The arrival of a signal should also knock the process out of various kinds of voluntary sleep, so the signal will always be responded to. But it no longer matters precisely what the process was doing at the precise instant that the telephone rings.
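
To tie the flag approach back to the File::Temp requirement, here is a minimal sketch. It assumes the temporary file is held as a File::Temp object, which is unlinked when the object is destroyed; run_worker and the callback are hypothetical names used only for illustration, and the self-sent USR2 merely simulates the parent's signal:

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper: calls $work up to $max_steps times, stopping early once
# a signal has set the flag. Returns the temp file name and the steps completed.
sub run_worker {
    my ($work, $max_steps) = @_;
    my $interrupted = 0;
    # The handlers only set a flag; nothing else happens at signal time.
    $SIG{$_} = sub { $interrupted = 1 } for qw/INT TERM USR2 HUP/;

    my $tmp = File::Temp->new;    # object form: unlinked when $tmp is destroyed
    my $steps_done = 0;
    for my $step (1 .. $max_steps) {
        last if $interrupted;     # leave the loop at a known point
        $work->($tmp, $step);
        $steps_done++;
    }
    return ($tmp->filename, $steps_done);
}    # $tmp goes out of scope here, so its destructor removes the file

my ($name, $done) = run_worker(sub {
    my ($fh, $step) = @_;
    print {$fh} "step $step\n";
    kill 'USR2', $$ if $step == 2;    # simulate the parent's signal
}, 5);
printf "stopped after %d step(s); temp file %s\n",
    $done, (-e $name ? 'still exists' : 'was removed');
```

Because the loop ends normally, cleanup runs through ordinary scope exit rather than from inside a signal handler.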

        The parent process should also explicitly wait for its children to terminate before finally exiting itself, and before tearing down any data structures that the children might depend on. The second most common place where applications sporadically fail is when they are ending, because the parent “sometimes” jerks the rug out from under the children.
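
The parent side of that advice might look roughly like the sketch below. The helper names spawn_child and stop_children are invented for this example, and SIGTERM stands in for whichever signal the parent actually sends:

```perl
use strict;
use warnings;

$| = 1;    # avoid duplicated buffered output in forked children

# Hypothetical helper: fork a child that repeats $work until it gets SIGTERM.
sub spawn_child {
    my ($work) = @_;
    my $stop = 0;
    # Installed before fork() so the child is never without a handler.
    local $SIG{TERM} = sub { $stop = 1 };
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;      # parent: 'local' restores its previous handler
    $work->() until $stop;    # child: the flag is checked between units of work
    exit 0;                   # normal exit, so destructors run
}

# Hypothetical helper: signal every child, then block until each one is reaped.
sub stop_children {
    my @pids = @_;
    kill 'TERM', @pids;
    my %result;
    for my $pid (@pids) {
        next unless waitpid($pid, 0) == $pid;
        $result{$pid} = { exit => $? >> 8, signal => $? & 127 };
    }
    return \%result;
}

# Each unit of work here is just a 50 ms sleep.
my @pids = map { spawn_child(sub { select undef, undef, undef, 0.05 }) } 1 .. 3;
my $result = stop_children(@pids);
printf "reaped %d of %d children\n", scalar(keys %$result), scalar @pids;
```

Only after stop_children returns would the parent tear down shared state and exit.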
