PerlMonks  

Re: Setting signal handlers considered unsafe?

by ig (Vicar)
on Nov 05, 2008 at 23:08 UTC


in reply to Setting signal handlers considered unsafe?

You might try the following on your system and let us know what happens...

#!/usr/bin/perl
use warnings FATAL => qw( all );
use strict;
$|=1;

sub mainhandler { print "SIGALRM: main handler\n"; };
$SIG{ALRM} = \&mainhandler;
sub f {
    $SIG{ALRM} = sub { print "SIGALRM: sub handler\n" }; # line 9
    $SIG{ALRM} = \&mainhandler;
}

if (fork) {
    # parent
    f while 1;
} else {
    # child
    sleep 7;
    print "child starting\n";
    1 while kill ALRM => getppid;
}

Update: I have now run this script for well over an hour without any faults. There may still be race conditions as the signal handlers are changed, but the windows of vulnerability are much smaller than with the original script. I have not seen the Unable to create sub named "" error at any time.

Replies are listed 'Best First'.
Re^2: Setting signal handlers considered unsafe?
by gnosek (Sexton) on Nov 06, 2008 at 08:53 UTC

    Apparently not using local is enough to prevent the crashes. sigprocmask also solves the issue and I have been confused about the root cause of my problems.
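The post does not show how sigprocmask was used, so the following is only a minimal sketch of one plausible arrangement: block SIGALRM while the handlers are being swapped, then restore the previous mask. The helper name with_signal_blocked and the counters are hypothetical, introduced here for illustration; the POSIX calls (POSIX::SigSet, sigprocmask, SIG_BLOCK, SIG_SETMASK) are the standard ones from the POSIX module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:signal_h);

# Hypothetical helper: run $code with signal number $signo blocked,
# then restore whatever signal mask was in effect before.
sub with_signal_blocked {
    my ($signo, $code) = @_;
    my $block = POSIX::SigSet->new($signo);
    my $old   = POSIX::SigSet->new;
    sigprocmask(SIG_BLOCK, $block, $old)
        or die "sigprocmask(SIG_BLOCK) failed: $!";
    my @result = eval { $code->() };
    my $err = $@;
    sigprocmask(SIG_SETMASK, $old)
        or die "sigprocmask(SIG_SETMASK) failed: $!";
    die $err if $err;
    return @result;
}

my $calls = 0;                       # how often the main handler ran
my $seen_while_blocked = -1;         # value of $calls inside the blocked region
my $main_handler = sub { $calls++ };
$SIG{ALRM} = $main_handler;

with_signal_blocked(SIGALRM, sub {
    kill ALRM => $$;                            # stays pending while blocked
    $SIG{ALRM} = sub { die "sub handler\n" };   # swap handlers...
    $SIG{ALRM} = $main_handler;                 # ...and swap back
    $seen_while_blocked = $calls;
});

# Execute a few more ops so Perl can dispatch the now-unblocked signal.
my $spin = 0;
$spin++ for 1 .. 100;

print "handler calls while blocked: $seen_while_blocked\n";
print "handler calls after unblock: $calls\n";
```

The signal sent inside the blocked region is held by the kernel and only reaches the handler that is installed once the mask is restored, so the temporary die-ing handler never sees it.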

    It looks like the signal handler (essentially calling die) leaked out of the eval { } block. Or doesn't it?

    This code:

    produces results like:
    1..1
    # parent entering wrap_sigs loop
    # child starting signal storm
    SIGALRM
    SIGALRM
    END failed--call queue aborted.

    Sorry for shifting the goalposts but have you got any ideas?

      I tweaked your code a little to get a view of what was happening:

      my $rcvd = 0 ;    # added counter for signals seen in eval's handler

      sub wrap_sigs {
        ...
        eval {
          for my $sig (keys %$signals) {
            $old_sighandlers{$sig} = ($SIG{$sig} || 'DEFAULT');
            $SIG{$sig} = sub { $rcvd++ ; die ("SIG $sig $rcvd\n") };
            ...

      and the result was:
      1..1
      # parent entering wrap_sigs loop
      # child starting signal storm
      SIG ALRM 5
      # Looks like your test died before it could output anything.
      
      from which I conclude that a number of ALRM signals were trapped in the eval, but eventually Perl left eval state, with the die handler still set.

      Running the thing a number of times I got quite a wide range of numbers of ALRM signals swallowed while in the eval.

        eventually Perl left eval state, with the die handler still set.

        Isn't that a rather serious bug in perl (famous last words)? Looking at the code there is no place where a handler capable of dying is possibly called outside eval.

        Could a signal get caught while perl was inside the eval block, a Perl-level handler stored somewhere and its execution resumed after the current opcode "exit eval block" finished? But at the ending brace of the eval block the handler should have already been reset. So (blissfully ignorant of perlguts) I'd guess that signals may be delivered to Perl more than one opcode after delivery to perl (yay, I used them both in a single sentence). If the two opcodes interact with the signal delivery process, Bad Things (tm) happen.

        Newsflash! Adding a sleep $anything_above_1us (and use Time::HiRes qw( sleep ) of course) between resetting the handlers and the closing brace makes the test pass quite repeatably, at least for me. Sleeping for 1e-6 seconds does not do anything, sleeping for 1.00001e-6 passes the test. Probably has something to do with populating a struct timeval with 1us resolution somewhere.

        The difference between the two calls below shows that Time::HiRes::sleep rounds its argument to microsecond precision; any non-zero sleep prevents the bug from appearing.

        $ strace -e nanosleep perl -MTime::HiRes=sleep -le 'sleep 1.e-6'
        nanosleep({0, 0}, NULL) = 0
        Process 12719 detached
        $ strace -e nanosleep perl -MTime::HiRes=sleep -le 'sleep 1.01e-6'
        nanosleep({0, 1000}, NULL) = 0
        Process 12721 detached

      have you got any ideas?

      The problem is that the signal handler is not being reset when the eval is terminated by the die in the signal handler, so the solution is to reset the signal handler in the signal handler.

      I modified your script to do this and it survived the signal storm 10 times out of 10 on my system.

      I also added a few print statements to see how often and when various bits executed. I was surprised how few times a signal was caught: quite reliably between 10 and 15 times, despite how quickly the two loops execute. I guess this has to do with how often the process scheduler switches running processes on my single-CPU system.
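The modified script itself is not shown in the thread, so the following is only a minimal sketch of the idea of resetting the signal handler inside the signal handler. The helper name with_dying_handler and the counter $main_calls are hypothetical and are not taken from the original wrap_sigs code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run $code with a temporary die-ing handler for $sig,
# making sure the previous handler is back in place on every exit path.
sub with_dying_handler {
    my ($sig, $code) = @_;
    my $old = $SIG{$sig} || 'DEFAULT';
    my $ok = eval {
        $SIG{$sig} = sub {
            # Restore the old handler first: once die() unwinds the eval,
            # the reset at the end of the block is skipped.
            $SIG{$sig} = $old;
            die "SIG$sig\n";
        };
        $code->();
        $SIG{$sig} = $old;    # normal exit path
        1;
    };
    my $err = $ok ? '' : $@;
    return ($ok ? 1 : 0, $err);
}

my $main_calls = 0;
$SIG{ALRM} = sub { $main_calls++ };

my ($ok, $err) = with_dying_handler(ALRM => sub {
    kill ALRM => $$;          # signal ourselves while inside the eval
    my $spin = 0;
    $spin++ for 1 .. 100;     # give Perl a chance to dispatch the signal
});

# A second signal, now outside the eval, must reach the main handler
# instead of the die-ing one.
kill ALRM => $$;
my $spin = 0;
$spin++ for 1 .. 100;

print "eval completed: $ok, error: $err";
print "main handler calls: $main_calls\n";
```

Because the temporary handler puts the old handler back before it dies, a signal that arrives after the eval has been unwound is handled by the outer handler rather than killing the program.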

        The problem is that the signal handler is not being reset when the eval is terminated by the die in the signal handler, so the solution is to reset the signal handler in the signal handler.

        Like most brilliant ideas, it seems obvious afterwards. Thanks a lot!
