PerlMonks  

Unix-Domain TCP Server Crashing

by wokka (Acolyte)
on Aug 16, 2010 at 19:14 UTC (#855342=perlquestion)
wokka has asked for the wisdom of the Perl Monks concerning the following question:

I was trying out the Unix-Domain sockets example in the perlipc documentation, and I'm running into some behavior I don't quite understand.

When I run the server and fire off one client request at a time, it behaves just fine, but if I do something like:
~/tmp$ ./sockclient & ./sockclient & ./sockclient
in an attempt to fire off multiple requests at once from a bash prompt (where '~/tmp$' is the prompt), the server dies with no errors. Once or twice it segfaulted, but most of the time one or two repetitions of the above line cause the server to simply exit.

This behavior is consistent and reproducible on my Ubuntu desktop, a Slackware VM, and a FreeBSD server (all running bash).

Given the following server code (taken directly from the documentation):

#!/usr/bin/perl -Tw
use warnings;
use strict;
use Socket;
use Carp;

BEGIN { $ENV{PATH} = '/usr/ucb:/bin' }
sub spawn;  # forward declaration
sub logmsg { print "$0 $$: @_ at ", scalar localtime, "\n" }

my $NAME = 'catsock';
my $uaddr = sockaddr_un($NAME);
my $proto = getprotobyname('tcp');

socket(Server,PF_UNIX,SOCK_STREAM,0) || die "socket: $!";
unlink($NAME);
bind  (Server, $uaddr)               || die "bind: $!";
listen(Server,SOMAXCONN)             || die "listen: $!";

logmsg "server started on $NAME";

my $waitedpid;

use POSIX ":sys_wait_h";
sub REAPER {
    my $child;
    while (($waitedpid = waitpid(-1,WNOHANG)) > 0) {
        logmsg "reaped $waitedpid" . ($? ? " with exit $?" : '');
    }
    $SIG{CHLD} = \&REAPER;  # loathe SysV
}

$SIG{CHLD} = \&REAPER;

for ( $waitedpid = 0;
      accept(Client,Server) || $waitedpid;
      $waitedpid = 0, close Client)
{
    next if $waitedpid;
    logmsg "connection on $NAME";
    spawn sub {
        print "Hello there, it's now ", scalar localtime, "\n";
        exec '/usr/games/fortune' or die "can't exec fortune: $!";
    };
}

sub spawn {
    my $coderef = shift;

    unless (@_ == 0 && $coderef && ref($coderef) eq 'CODE') {
        confess "usage: spawn CODEREF";
    }

    my $pid;
    if (!defined($pid = fork)) {
        logmsg "cannot fork: $!";
        return;
    } elsif ($pid) {
        logmsg "begat $pid";
        return; # I'm the parent
    }
    # else I'm the child -- go spawn

    open(STDIN,  "<&Client") || die "can't dup client to stdin";
    open(STDOUT, ">&Client") || die "can't dup client to stdout";
    ## open(STDERR, ">&STDOUT") || die "can't dup stdout to stderr";
    exit &$coderef();
}

And the following client code (also taken directly from the documentation):
#!/usr/bin/perl
use 5.10.0;
use strict;
use warnings;
use Socket;

my ($rendezvous, $line);

$rendezvous = shift || 'catsock';
socket(SOCK, PF_UNIX, SOCK_STREAM, 0)     || die "socket: $!";
connect(SOCK, sockaddr_un($rendezvous))   || die "connect: $!";
while (defined($line = <SOCK>)) {
    print $line;
}
exit;
The client terminal looks like this:
~/tmp$ ./sockclient & ./sockclient &./sockclient
1 27089
2 27090
Hello there, it's now Mon Aug 16 15:09:34 2010
Hello there, it's now Mon Aug 16 15:09:34 2010
Hello there, it's now Mon Aug 16 15:09:34 2010
You will be married within a year, and divorced within two.
~/tmp$ You will remember something that you should not have forgotten.
Just to have it is enough.
And the server terminal looks like this:
~/tmp$ ./sockserv
./sockserv 22710: server started on catsock at Mon Aug 16 15:02:33 2010
./sockserv 22710: connection on catsock at Mon Aug 16 15:09:34 2010
./sockserv 22710: begat 27092 at Mon Aug 16 15:09:34 2010
./sockserv 22710: connection on catsock at Mon Aug 16 15:09:34 2010
./sockserv 22710: begat 27093 at Mon Aug 16 15:09:34 2010
./sockserv 22710: connection on catsock at Mon Aug 16 15:09:34 2010
./sockserv 22710: begat 27094 at Mon Aug 16 15:09:34 2010
./sockserv 22710: reaped 27092 at Mon Aug 16 15:09:34 2010
./sockserv 22710: reaped 27094 at Mon Aug 16 15:09:34 2010
~/tmp$
Any ideas what's going on here? The only hint I have is that the third reaping isn't happening. Since it's consistent across the environments I have, it's got to be something with the code or the way I'm calling the client, but I don't know the specifics and would appreciate any edification you can offer.

Re: Unix-Domain TCP Server Crashing
by cdarke (Prior) on Aug 17, 2010 at 07:44 UTC
    ~/tmp$ ./sockclient & ./sockclient & ./sockclient

    First, I'm going to make the assumption that ~/tmp$ is the prompt.
    Of the rest, what worries me is the use of the ampersand, Donald Duck, &. Bash (I assume you are using Bash) takes that to mean "run the preceding command in the background"; the right-most command, however, is not run in the background. When you run jobs in the background it is difficult to predict the order of execution, and different results can ensue.

    I wonder, did you mean this:
    ./sockclient && ./sockclient && ./sockclient
    ?
      Your prompt and bash assumptions are correct, though I don't understand how Donald Duck fits in. I was remiss in not being more specific; I will add that info to the node so that it's clear. Thanks for pointing that out.

      Given that, no, I did not mean '&&'. I was specifically trying to execute multiple requests at the same time. The server is written around a fork, and I was testing how this behaved.

      These results are obviously different, as you say, but they are consistent and reproducible across several platforms (granted, all running bash), and my question is why and how this comes to pass.
Re: Unix-Domain TCP Server Crashing
by rowdog (Curate) on Aug 17, 2010 at 16:32 UTC

    I believe the accept is failing under heavy load.

    I was puzzled by the weird shell parsing and wondered if that was part of the problem so I whipped up a lame threaded client that can reliably "crash" the server on my Debian system.

    #!/usr/bin/perl
    use 5.10.0;
    use strict;
    use warnings;
    use Socket;
    use threads;

    my $rendezvous = shift || 'catsock';
    my $max_clients = 10;

    $_->join for map { threads->create( \&run_client ) } 1 .. $max_clients;

    sub run_client {
        my $sock;
        unless ( socket($sock, PF_UNIX, SOCK_STREAM, 0) ) {
            warn "socket: $!";
            return;
        }
        unless ( connect($sock, sockaddr_un($rendezvous)) ) {
            warn "connect: $!";
            return;
        };
        while ( defined(my $line = <$sock>) ) {
            print $line;
        }
    }

    The next thing I noticed was that, under strace, the server seemed to exit normally. Immediately after the for loop I added

    print "\n\nWTF?\n\n";

    and got

    $ perl -T 855342.pl
    855342.pl 22345: server started on catsock at Tue Aug 17 11:00:49 2010
    855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
    855342.pl 22345: begat 22348 at Tue Aug 17 11:00:51 2010
    855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
    855342.pl 22345: begat 22350 at Tue Aug 17 11:00:51 2010
    855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
    855342.pl 22345: begat 22352 at Tue Aug 17 11:00:51 2010
    855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
    855342.pl 22345: begat 22355 at Tue Aug 17 11:00:51 2010
    855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
    855342.pl 22345: begat 22356 at Tue Aug 17 11:00:51 2010


    WTF?

    That means that the for condition accept(Client,Server) || $waitedpid; is evaluating to false. Just before the exit, the strace shows

    accept(3, 0x7fff3015acc0, [4096]) = ? ERESTARTSYS (To be restarted)
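One plausible reading of why the for condition turns false, offered as a sketch and not as a confirmed diagnosis: when SIGCHLD interrupts accept, the condition falls back on $waitedpid, but REAPER's loop only stops once waitpid returns something that is not greater than zero. With WNOHANG that final value is 0 while other children are still running and -1 once there are none, so with several concurrent clients $waitedpid can be left false. The helper names drain_children and start_sleeper below are hypothetical and exist only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ":sys_wait_h";

# Hypothetical helper: the same loop shape as REAPER in the server,
# returning the value that $waitedpid is left holding afterwards.
sub drain_children {
    my $waitedpid;
    while (($waitedpid = waitpid(-1, WNOHANG)) > 0) {
        # a child was reaped; keep going
    }
    return $waitedpid;
}

# Hypothetical helper: start a child that stays alive until it is killed.
sub start_sleeper {
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        sleep 60;
        POSIX::_exit(0);
    }
    return $pid;
}

my $sleeper = start_sleeper();
my $while_child_runs = drain_children();   # a child exists but has not exited

kill 'TERM', $sleeper;
waitpid($sleeper, 0);                      # blocking wait: the child is now gone
my $no_children_left = drain_children();   # nothing left to wait for

printf "child still running: %d (%s)\n",
    $while_child_runs, $while_child_runs ? 'true' : 'false';
printf "no children left:    %d (%s)\n",
    $no_children_left, $no_children_left ? 'true' : 'false';
```

If this reading is right, it would also fit the single-client case working: after the only child is reaped, the loop ends on -1, which is true, so the for loop carries on.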

    Therefore, I believe the accept is failing under heavy load and my advice is to always check the return value from accept.
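Checking the return value of accept could look roughly like the following. This is a minimal sketch under assumptions, not the perlipc code: accept_retrying and run_demo are hypothetical names, the handler uses local ($!, $?) so that waitpid cannot clobber the error from accept, and the demo serves a fixed number of forked clients over a socket in a temporary directory so that it terminates on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket;
use Errno qw(EINTR);
use POSIX ":sys_wait_h";
use File::Temp qw(tempdir);

# Hypothetical helper: accept one connection, retrying when a signal
# such as SIGCHLD interrupts the call (EINTR); die on any other error.
sub accept_retrying {
    my ($listener) = @_;
    while (1) {
        my $client;
        return $client if accept($client, $listener);
        next if $! == EINTR;
        die "accept: $!";
    }
}

# Hypothetical demo: fork $n_clients clients and serve each of them once.
sub run_demo {
    my ($n_clients) = @_;
    my $path = tempdir(CLEANUP => 1) . "/catsock";

    socket(my $listener, PF_UNIX, SOCK_STREAM, 0) or die "socket: $!";
    bind($listener, sockaddr_un($path))           or die "bind: $!";
    listen($listener, SOMAXCONN)                  or die "listen: $!";

    alarm 20;                      # safety net so the demo cannot hang forever
    local $SIG{CHLD} = sub {
        local ($!, $?);            # keep waitpid from overwriting accept's error
        1 while waitpid(-1, WNOHANG) > 0;
    };

    for (1 .. $n_clients) {
        my $pid = fork;
        die "fork: $!" unless defined $pid;
        next if $pid;
        # Child: connect, read the greeting, and leave without END blocks.
        socket(my $sock, PF_UNIX, SOCK_STREAM, 0) or POSIX::_exit(1);
        connect($sock, sockaddr_un($path))        or POSIX::_exit(1);
        my $line = <$sock>;
        POSIX::_exit(0);
    }

    my $served = 0;
    while ($served < $n_clients) {
        my $client = accept_retrying($listener);
        print {$client} "hello, client $served\n";
        close $client;
        $served++;
    }

    $SIG{CHLD} = 'DEFAULT';        # stop the handler, then collect stragglers
    1 while waitpid(-1, 0) > 0;
    alarm 0;
    return $served;
}

print "served ", run_demo(5), " clients\n";
```

Dying on errors other than EINTR is a design choice of this sketch; it avoids spinning in a tight loop if accept keeps failing for some other reason.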

      Thank you for your informed response. This will help me in future debugging as well. By changing the for loop to:
      while(1) {
          accept(Client,Server) || next;
          logmsg "connection on $NAME";
          spawn sub {
              print "Hello there, it's now ", scalar localtime, "\n";
              exec '/usr/games/fortune' or die "can't exec fortune: $!";
          };
          close Client;
      }
      The problem no longer appears. Now all it needs are hup and err handlers and it's on its way to being a proper daemon. Thank you so much. Now to figure out how to get bi-directional passing with storable...

Node Type: perlquestion [id://855342]
Approved by lostjimmy
Front-paged by tye