http://www.perlmonks.org?node_id=855540


in reply to Unix-Domain TCP Server Crashing

I believe the accept is failing under heavy load.

I was puzzled by the weird shell parsing and wondered whether it was part of the problem, so I whipped up a lame threaded client that can reliably "crash" the server on my Debian system.

#!/usr/bin/perl
use 5.10.0;
use strict;
use warnings;
use Socket;
use threads;

my $rendezvous  = shift || 'catsock';
my $max_clients = 10;

$_->join for map { threads->create( \&run_client ) } 1 .. $max_clients;

sub run_client {
    my $sock;
    unless ( socket($sock, PF_UNIX, SOCK_STREAM, 0) ) {
        warn "socket: $!";
        return;
    }
    unless ( connect($sock, sockaddr_un($rendezvous)) ) {
        warn "connect: $!";
        return;
    }
    while ( defined(my $line = <$sock>) ) {
        print $line;
    }
}

The next thing I noticed was that, when run under strace, the server seemed to exit normally rather than crash. Immediately after the for loop I added

print "\n\nWTF?\n\n";

and got

$ perl -T 855342.pl
855342.pl 22345: server started on catsock at Tue Aug 17 11:00:49 2010
855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
855342.pl 22345: begat 22348 at Tue Aug 17 11:00:51 2010
855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
855342.pl 22345: begat 22350 at Tue Aug 17 11:00:51 2010
855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
855342.pl 22345: begat 22352 at Tue Aug 17 11:00:51 2010
855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
855342.pl 22345: begat 22355 at Tue Aug 17 11:00:51 2010
855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
855342.pl 22345: begat 22356 at Tue Aug 17 11:00:51 2010

WTF?

That means that the for loop's condition, accept(Client,Server) || $waitedpid, is evaluating to false. Just before the exit, the strace shows

accept(3, 0x7fff3015acc0, [4096]) = ? ERESTARTSYS (To be restarted)

Therefore, I believe the accept is failing under heavy load and my advice is to always check the return value from accept.
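As a rough illustration of checking accept's return value, here is a minimal sketch. The helper name accept_retrying is hypothetical and not part of the original server. It assumes the failure seen in the strace is a signal interrupting accept (ERESTARTSYS is how strace reports an interrupted call), which Perl would surface as $! set to EINTR; which signal is responsible, for instance SIGCHLD from an exiting child, is a guess, since the server source is not shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Errno qw(EINTR);
use Socket;

# Hypothetical helper (not from the original server): accept one
# connection on a listening socket, retrying if a signal interrupts
# the call. Returns the client handle, or nothing on a real error.
sub accept_retrying {
    my ($server) = @_;
    while (1) {
        # accept returns the packed peer address on success, false on failure
        my $paddr = accept(my $client, $server);
        return $client if $paddr;
        next if $! == EINTR;    # assumed cause: interrupted by a signal, so retry
        warn "accept: $!";      # anything else is reported to the caller
        return;
    }
}
```

A server loop could then call accept_retrying($server) on each pass and only log or bail out when it returns nothing, so that an interrupted accept no longer ends the loop silently.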

Re^2: Unix-Domain TCP Server Crashing
by wokka (Acolyte) on Aug 17, 2010 at 19:22 UTC
    Thank you for your informed response. This will help me in future debugging as well. By changing the for loop to:
    while(1) {
        accept(Client,Server) || next;
        logmsg "connection on $NAME";
        spawn sub {
            print "Hello there, it's now ", scalar localtime, "\n";
            exec '/usr/games/fortune'
                or die "can't exec fortune: $!";
        };
        close Client;
    }
    The problem no longer appears. Now all it needs are hup and err handlers and it's on its way to being a proper daemon. Thank you so much. Now to figure out how to get bi-directional passing with storable...