
Re^2: Setting signal handlers considered unsafe?

by gnosek (Sexton)
on Nov 06, 2008 at 08:53 UTC ( [id://721916]=note )


in reply to Re: Setting signal handlers considered unsafe?
in thread Setting signal handlers considered unsafe?

Apparently not using local is enough to prevent the crashes. sigprocmask also solves the issue and I have been confused about the root cause of my problems.
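
For readers unfamiliar with the sigprocmask approach mentioned above, here is a minimal sketch of one way it might be used: block SIGALRM while a piece of code runs, then put the previous signal mask back. The helper name with_blocked_alrm is hypothetical and is not taken from the code in this thread.

```perl
use strict;
use warnings;
use POSIX qw( sigprocmask SIG_BLOCK SIG_SETMASK SIGALRM );

# Hypothetical helper: run $code with SIGALRM blocked, then restore the
# signal mask that was in effect before the call.
sub with_blocked_alrm {
    my ($code) = @_;

    my $block = POSIX::SigSet->new(SIGALRM);
    my $old   = POSIX::SigSet->new;
    sigprocmask(SIG_BLOCK, $block, $old)
        or die "sigprocmask(SIG_BLOCK) failed: $!\n";

    # Trap exceptions so the mask is restored even if $code dies.
    my @result;
    my $ok  = eval { @result = $code->(); 1 };
    my $err = $@;

    # An ALRM that arrived while blocked becomes deliverable again here.
    sigprocmask(SIG_SETMASK, $old)
        or die "sigprocmask(SIG_SETMASK) failed: $!\n";

    die $err unless $ok;
    return @result;
}
```

Wrapping the handler save/restore in such a call would keep ALRM from arriving in the middle of it; the signal is only postponed, so whatever handler is installed once the mask is restored still has to cope with it.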

It looks like the signal handler (essentially calling die) leaked out of the eval { } block. Or doesn't it?

This code:

#!/usr/bin/perl
use warnings FATAL => qw( all );
use strict;
use Test::More tests => 1;
use POSIX qw( _exit );

sub wrap_sigs {
    my $signals = shift;
    my $coderef = shift;
    my $died;
    my %old_sighandlers;
    eval {
        for my $sig (keys %$signals) {
            $old_sighandlers{$sig} = ($SIG{$sig} || 'DEFAULT');
            $SIG{$sig} = sub { die ('SIG' . $sig . "\n") };
        }
        $died = 1 unless eval { $coderef->(); 1 };
        for my $sig (keys %old_sighandlers) {
            $SIG{$sig} = $old_sighandlers{$sig};
        }
    };
    return $died;
}

my $pid = fork;
if ($pid) {
    $SIG{ALRM} = sub { };
    my $timeout = time + 5;
    diag('parent entering wrap_sigs loop');
    while (time < $timeout) {
        wrap_sigs( { ALRM => 1 }, sub { } );
    }
    diag('parent survived, killing child');
    kill TERM => $pid;
}
else {
    sleep 1;
    diag('child starting signal storm');
    1 while kill ALRM => getppid;
    _exit(0);
}

ok(1, 'survived signal storm');

produces results like:

1..1
# parent entering wrap_sigs loop
# child starting signal storm
SIGALRM
SIGALRM
END failed--call queue aborted.

Sorry for shifting the goalposts but have you got any ideas?

Replies are listed 'Best First'.
Re^3: Setting signal handlers considered unsafe?
by gone2015 (Deacon) on Nov 06, 2008 at 14:56 UTC

    I tweaked your code a little to get a view of what was happening:

    my $rcvd = 0 ;   # added counter for signals seen in eval's handler

    sub wrap_sigs {
        ...
        eval {
            for my $sig (keys %$signals) {
                $old_sighandlers{$sig} = ($SIG{$sig} || 'DEFAULT');
                $SIG{$sig} = sub { $rcvd++ ; die ("SIG $sig $rcvd\n") };
                ...

    and the result was:
    1..1
    # parent entering wrap_sigs loop
    # child starting signal storm
    SIG ALRM 5
    # Looks like your test died before it could output anything.
    
    from which I conclude that a number of ALRM signals were trapped in the eval, but eventually Perl left eval state, with the die handler still set.

    Running the thing a number of times I got quite a wide range of numbers of ALRM signals swallowed while in the eval.

      eventually Perl left eval state, with the die handler still set.

      Isn't that a rather serious bug in perl (famous last words)? Looking at the code there is no place where a handler capable of dying is possibly called outside eval.

      Could a signal get caught while perl was inside the eval block, a Perl-level handler stored somewhere and its execution resumed after the current opcode "exit eval block" finished? But at the ending brace of the eval block the handler should have already been reset. So (blissfully ignorant of perlguts) I'd guess that signals may be delivered to Perl more than one opcode after delivery to perl (yay, I used them both in a single sentence). If the two opcodes interact with the signal delivery process, Bad Things (tm) happen.

      Newsflash! Adding a sleep $anything_above_1us (and use Time::HiRes qw( sleep ) of course) between resetting the handlers and the closing brace makes the test pass quite repeatably, at least for me. Sleeping for 1e-6 seconds changes nothing, whereas sleeping for 1.00001e-6 seconds makes the test pass. This probably has something to do with populating a struct timeval with 1us resolution somewhere.

      The difference between these two calls shows that Time::HiRes::sleep rounds its argument to microsecond precision; any non-zero result prevents the bug from appearing.

      $ strace -e nanosleep perl -MTime::HiRes=sleep -le 'sleep 1.e-6'
      nanosleep({0, 0}, NULL) = 0
      Process 12719 detached
      $ strace -e nanosleep perl -MTime::HiRes=sleep -le 'sleep 1.01e-6'
      nanosleep({0, 1000}, NULL) = 0
      Process 12721 detached

        Isn't that a rather serious bug in perl (famous last words)? Looking at the code there is no place where a handler capable of dying is possibly called outside eval.

        Could a signal get caught while perl was inside the eval block, a Perl-level handler stored somewhere and its execution resumed after the current opcode "exit eval block" finished? But at the ending brace of the eval block the handler should have already been reset.

        The problem is that "at the ending brace of the eval block" the handler is not reset.

        For that to be the case, it would be necessary to localise the $SIG{ALRM}. But as has been discovered, the serious bug is that the localisation temporarily sets the DEFAULT -- which is horrible.
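
To illustrate what localising would mean here, a minimal sketch follows. It assumes a Perl in which local on a %SIG element behaves as documented; it does not address the window with DEFAULT discussed in this thread, which may depend on the Perl version. The signal USR1 and the variable names are arbitrary choices for illustration.

```perl
use strict;
use warnings;

my $outer = sub { };          # stand-in for the caller's handler
$SIG{USR1} = $outer;

eval {
    # local records the current value and puts it back when the
    # enclosing block is left, whether normally or via die.
    local $SIG{USR1} = sub { die "SIGUSR1\n" };
    kill USR1 => $$;          # send ourselves the signal
    sleep 1;                  # gives the handler time to run if it has not yet
};
my $err = $@;

# Compare code references to see which handler is installed now.
my $restored = ref $SIG{USR1} && $SIG{USR1} == $outer;
print "eval ended with: $err";
print "outer handler restored: ", ($restored ? "yes" : "no"), "\n";
```

The point of the sketch is only that restoration is tied to leaving the block, so it does not rely on reaching an explicit reset statement.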

        Anyway, reverting to the empirical approach: to show that the handler set in the eval block remains set after the handler dies and terminates the eval, I modified your test as shown below. The results I got are:

        +++ parent entering wrap_sigs loop
        $SIG{ALRM}=outr_alrm enter wrap_alrm
        $SIG{ALRM}=outr_alrm exit  wrap_alrm (0) $outr=0; $rcvd=0; $dies=0; $wrap=0
        $SIG{ALRM}=outr_alrm after sleep         $outr=0; $rcvd=0; $dies=0; $wrap=0
        $SIG{ALRM}=outr_alrm enter wrap_alrm
        $SIG{ALRM}=outr_alrm exit  wrap_alrm (0) $outr=0; $rcvd=0; $dies=0; $wrap=0
        $SIG{ALRM}=outr_alrm after sleep         $outr=0; $rcvd=0; $dies=0; $wrap=0
        $SIG{ALRM}=outr_alrm enter wrap_alrm
        $SIG{ALRM}=outr_alrm exit  wrap_alrm (0) $outr=0; $rcvd=0; $dies=0; $wrap=0
        +++ child starting signal storm
        $SIG{ALRM}=outr_alrm after sleep         $outr=1; $rcvd=0; $dies=0; $wrap=0
        $SIG{ALRM}=outr_alrm enter wrap_alrm
        $SIG{ALRM}=wrap_alrm exit  wrap_alrm (1) $outr=1; $rcvd=1; $dies=1; $wrap=0
        $SIG{ALRM}=wrap_alrm after sleep         $outr=1; $rcvd=2; $dies=1; $wrap=1
        $SIG{ALRM}=wrap_alrm enter wrap_alrm
        $SIG{ALRM}=wrap_alrm exit  wrap_alrm (1) $outr=1; $rcvd=3; $dies=2; $wrap=1
        $SIG{ALRM}=wrap_alrm after sleep         $outr=1; $rcvd=4; $dies=2; $wrap=2
        $SIG{ALRM}=wrap_alrm enter wrap_alrm
        $SIG{ALRM}=wrap_alrm exit  wrap_alrm (1) $outr=1; $rcvd=5; $dies=3; $wrap=2
        $SIG{ALRM}=wrap_alrm after sleep         $outr=1; $rcvd=6; $dies=3; $wrap=3
        $SIG{ALRM}=wrap_alrm enter wrap_alrm
        $SIG{ALRM}=wrap_alrm exit  wrap_alrm (1) $outr=1; $rcvd=7; $dies=4; $wrap=3
        $SIG{ALRM}=wrap_alrm after sleep         $outr=1; $rcvd=8; $dies=4; $wrap=4
        
        this shows:
        • to start with the ALRM is set to the "outr_alrm" subroutine, which simply counts up the $outr counter.
        • before the child process starts sending ALRMs the eval in the "wrap_alrm" subroutine sets its own ALRM signal handler, sleeps for a bit, and then restores the "outr_alrm" handler. The output above shows that before and after the call to "wrap_alrm", $SIG{ALRM} is set to "outr_alrm".
        • when the child process starts sending ALRMs it first catches the parent sleeping outside the "wrap_alrm" subroutine -- so we see the $outr signal counter step by one.
        • then the parent process loops round and enters "wrap_alrm", and in the eval sets its own handler. This promptly collects a signal, which counts up $rcvd and, since it is in the eval, counts up $dies and dies. This terminates the eval, which skips the restoration of the "outr_alrm" handler.
        • so, on exit from "wrap_alrm" when its handler has died, we can see that its handler is still in place.
        • so, the sleep that follows is interrupted by a further ALRM, which while outside the eval counts up $wrap (and does not die).
        • and the process continues, with the "wrap_alrm" handler in place...
        I hope this is reasonably clear !
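
The sequence described in the list above can be reproduced without a signal storm. The following is a minimal sketch, not the modified test referred to earlier in this post; it sends a single USR1 to itself, and the names $outer and $wrap are hypothetical.

```perl
use strict;
use warnings;

my $outer = sub { };                     # stand-in for "outr_alrm"
my $wrap  = sub { die "SIGUSR1\n" };     # stand-in for the dying handler

$SIG{USR1} = $outer;

eval {
    $SIG{USR1} = $wrap;      # install the dying handler
    kill USR1 => $$;         # handler runs and dies, terminating the eval
    sleep 1;                 # only reached if the signal is not handled yet
    $SIG{USR1} = $outer;     # skipped when the eval is terminated by die
};

# Compare code references to see which handler is installed now.
my $leaked = ref $SIG{USR1} && $SIG{USR1} == $wrap;
print "dying handler still installed after eval: ",
      ($leaked ? "yes" : "no"), "\n";

$SIG{USR1} = $outer;         # clean up so a later USR1 cannot kill the script
```

If the dying handler is reported as still installed, the next signal arriving outside any eval would terminate the program, which matches the failure seen in the original test.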

Re^3: Setting signal handlers considered unsafe?
by ig (Vicar) on Nov 11, 2008 at 19:11 UTC
    have you got any ideas?

    The problem is that the signal handler is not being reset when the eval is terminated by the die in the signal handler, so the solution is to reset the signal handler in the signal handler.

    I modified your script to do this and it survived the signal storm 10 times out of 10 on my system.

    I also added a few print statements to see how often and when various bits executed. I was surprised how few times a signal was caught: quite reliably between 10 and 15 times, despite how quickly the two loops execute. I guess this has to do with how often the process scheduler switches running processes on my single-CPU system.
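
The modified script is not reproduced in this post. As a rough sketch of the idea only, and not necessarily what was actually run, a variant of wrap_sigs might restore the saved handlers from inside the temporary handler before it dies. The name wrap_sigs_reset and the $restore helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical variant of wrap_sigs: the temporary handler puts the saved
# handlers back itself before dying, so it cannot outlive the eval.
sub wrap_sigs_reset {
    my ($signals, $coderef) = @_;
    my %old;
    my $died;

    # Put back every handler saved so far; safe to call more than once.
    my $restore = sub { $SIG{$_} = $old{$_} for keys %old };

    eval {
        for my $sig (keys %$signals) {
            $old{$sig} = $SIG{$sig} || 'DEFAULT';
            # Restore first, then die, so the dying handler never
            # stays installed once the eval has been terminated.
            $SIG{$sig} = sub { $restore->(); die "SIG$sig\n" };
        }
        $died = 1 unless eval { $coderef->(); 1 };
        $restore->();        # normal path: nothing died in this outer eval
        1;
    } or $died = 1;          # a handler died outside the inner eval

    return $died;
}
```

A call such as wrap_sigs_reset( { ALRM => 1 }, sub { ... } ) would then take the place of wrap_sigs in the parent loop of the original test.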

      The problem is that the signal handler is not being reset when the eval is terminated by the die in the signal handler, so the solution is to reset the signal handler in the signal handler.

      Like most brilliant ideas, it seems obvious afterwards. Thanks a lot!
