Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

linux perl - interrupted system calls not restarted?

by flipper (Beadle)
on Oct 22, 2009 at 12:44 UTC ( #802731=perlquestion: print w/replies, xml ) Need Help??

flipper has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I've spent some time tracking down a strange problem in a daemon script - ultimately I've no-one to blame but myself, as I wasn't checking my return codes and/or $!, but I've found some surprising behaviour... Much reduced code:

#!/usr/bin/perl -w use strict; use IO::Socket::INET; my $server =IO::Socket::INET->new(LocalPort => 4848, Listen=> 1,ReuseA +ddr => 1) or die "listen: $!"; my $client; USER: while($client= $server->accept()){ if (my $pid = fork()){ #parent close $client; $SIG{CHLD} = sub {}; }else{ print $client "hello, world!\n"; select(undef,undef,undef,2); exit; } } warn "fell out of loop - $!";

The sleep in the child process causes SIGCHLD to be delivered to the parent when it is in blocking in accept(), and sure enough I get fell out of loop - Interrupted system call at /tmp/ line 17.

I understand from chapter 16 of the Camel Book that this could happen with old, horrible Unices, but I'm surprised it happens with This is perl, v5.10.0 built for i486-linux-gnu-thread-multi on Debian 5.0.3

Can anyone shed some light on this?? An strace looks like it's going to work, but doesn't...

bind(3, {sa_family=AF_INET, sin_port=htons(4848), sin_addr=inet_addr(" +")}, 16) = 0 listen(3, 1) = 0 accept(3, {sa_family=AF_INET, sin_port=htons(23694), sin_addr=inet_add +r("")}, [16]) = 4 ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbffa26c8) = -1 EINVAL (Inval +id argument) _llseek(4, 0, 0xbffa2710, SEEK_CUR) = -1 ESPIPE (Illegal seek) ioctl(4, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbffa26c8) = -1 EINVAL (Inval +id argument) _llseek(4, 0, 0xbffa2710, SEEK_CUR) = -1 ESPIPE (Illegal seek) fcntl64(4, F_SETFD, FD_CLOEXEC) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIG +CHLD, child_tidptr=0xb7d9f908) = 5206 close(4) = 0 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 rt_sigaction(SIGCHLD, {0x809a270, [], 0}, {SIG_DFL}, 8) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 accept(3, 0xbffa28c8, [4096]) = ? ERESTARTSYS (To be restart +ed) --- SIGCHLD (Child exited) @ 0 (0) --- sigreturn() = ? (mask now []) rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0 rt_sigaction(SIGCHLD, NULL, {0x809a270, [], 0}, 8) = 0 rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0 open("/usr/share/locale/locale.alias", O_RDONLY) = 4 fstat64(4, {st_mode=S_IFREG|0644, st_size=2586, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, + 0) = 0xb7f7a000 read(4, "# Locale name alias data base.\n# "..., 4096) = 2586 read(4, ""..., 4096) = 0 close(4) = 0 munmap(0xb7f7a000, 4096) = 0 open("/usr/share/locale/en_GB.UTF-8/LC_MESSAGES/", O_RDONLY) = +-1 ENOENT (No such file or directory) open("/usr/share/locale/en_GB.utf8/LC_MESSAGES/", O_RDONLY) = - +1 ENOENT (No such file or directory) open("/usr/share/locale/en_GB/LC_MESSAGES/", O_RDONLY) = 4 fstat64(4, {st_mode=S_IFREG|0644, st_size=1474, ...}) = 0 mmap2(NULL, 1474, PROT_READ, MAP_PRIVATE, 4, 0) = 0xb7f7a000 close(4) = 0 open("/usr/share/locale/en.UTF-8/LC_MESSAGES/", O_RDONLY) = -1 +ENOENT (No such file or directory) open("/usr/share/locale/en.utf8/LC_MESSAGES/", O_RDONLY) = -1 E +NOENT (No such file or directory) open("/usr/share/locale/en/LC_MESSAGES/", O_RDONLY) = -1 ENOENT + (No such file or directory) write(2, "fell out of loop - Interrupted sy"..., 65) = 65 close(3) = 0 exit_group(0) = ?


Replies are listed 'Best First'.
Re: linux perl - interrupted system calls not restarted?
by jakobi (Pilgrim) on Oct 22, 2009 at 13:29 UTC
    consider $SIG{CHLD} = 'IGNORE', which indeed avoids both ending the loop and zombies. Search for reaper and IGNORE.

    As you said you need IO::Socket below:

    Possibly IO::Socket gets irritated by some %SIG handlers (couldn't find anything on the quick in the source or docs though). 2 possible workarounds I currently see:

    • perlipc: search for sigaction and SA_RESTART then read the section on EINTR directly below
    • IGNORE as handler and place the 'rc' into a file in the child, then in the mainloop regularly waitpid over the children and check for possible rc files from children died with rc!=0. Missing 'rc' and missing child is indicating a more severe error.

    Seems a known issue: SOLVED: Re: TCP Client-Server: Server exits though it shouldn't loops forever, and retries accept in case of EINTR. But the loop in the reaper looks like it can stop early; and other syscalls may mess up the detection of EINTR, which seems quite far away from the interrupted syscall.

    Maybe just retry accept() upto n times in a row if $client is false.

    Note that's there's a small race of the parent running w/o SIG handler, and later children possibly running with it (if it's inheritable by fork; CHECKED: doesn't seem to be inherited)

    Please also post your updated code when done, thanx, Peter

      Indeed - perldoc perlipc suggests that the info in my camel book is out of date, guess I should update!

      Thanks for your help!
      I need to use SIG{CHLD} in this case - the parent accepts connections, if there is no child running, it starts a child to service the new connection. If there is an existing child, it tells the new client it can't connect as it is in use from ip:port. The child can exit nonzero - when this happens, the parent needs to exit immediately (not at the next accept().

      So the parent needs to detect new connections, and a child exiting. The issue I'm concerned with is more general - If I'm using signals anywhere (eg sleep()), do I need to wrap every system call in a loop to retry it??

        I need to use SIG{CHLD} in this case

        Then you need to check for EINTR

        use Errno qw( EINTR ); for (;;) { my $client = $server->accept(); if (!$client) { next if $! == EINTR; die("Can't accept: $!\n"); } ... }

        If I'm using signals anywhere (eg sleep()), do I need to wrap every system call in a loop to retry it??

        Well, those that are interruptable, yes. That includes sysread. But sysread should already be in a loop since it's not guaranteed to return as many bytes as you requested.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://802731]
Approved by Corion
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (2)
As of 2020-02-23 04:10 GMT
Find Nodes?
    Voting Booth?
    What numbers are you going to focus on primarily in 2020?

    Results (102 votes). Check out past polls.