PerlMonks  

Re: Unix-Domain TCP Server Crashing

by rowdog (Curate)
on Aug 17, 2010 at 16:32 UTC


in reply to Unix-Domain TCP Server Crashing

I believe the accept is failing under heavy load.

I was puzzled by the odd shell parsing and wondered whether it was part of the problem, so I whipped up a lame threaded client that reliably "crashes" the server on my Debian system.

#!/usr/bin/perl
use 5.10.0;
use strict;
use warnings;
use Socket;
use threads;

my $rendezvous  = shift || 'catsock';
my $max_clients = 10;

# Start $max_clients threads that all connect at once, then wait for them.
$_->join for map { threads->create( \&run_client ) } 1 .. $max_clients;

sub run_client {
    my $sock;
    unless ( socket($sock, PF_UNIX, SOCK_STREAM, 0) ) {
        warn "socket: $!";
        return;
    }
    unless ( connect($sock, sockaddr_un($rendezvous)) ) {
        warn "connect: $!";
        return;
    }
    # Echo whatever the server sends until it closes the connection.
    while ( defined(my $line = <$sock>) ) {
        print $line;
    }
}

The next thing I noticed was that, according to strace, the server exited normally. Immediately after the for loop I added

print "\n\nWTF?\n\n";

and got

$ perl -T 855342.pl
855342.pl 22345: server started on catsock at Tue Aug 17 11:00:49 2010
855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
855342.pl 22345: begat 22348 at Tue Aug 17 11:00:51 2010
855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
855342.pl 22345: begat 22350 at Tue Aug 17 11:00:51 2010
855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
855342.pl 22345: begat 22352 at Tue Aug 17 11:00:51 2010
855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
855342.pl 22345: begat 22355 at Tue Aug 17 11:00:51 2010
855342.pl 22345: connection on catsock at Tue Aug 17 11:00:51 2010
855342.pl 22345: begat 22356 at Tue Aug 17 11:00:51 2010

WTF?

That means the for loop's condition, accept(Client,Server) || $waitedpid, is evaluating to false, so the loop simply ends. Just before the exit, strace shows

accept(3, 0x7fff3015acc0, [4096]) = ? ERESTARTSYS (To be restarted)

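Therefore, I believe accept is being interrupted by a signal under heavy load. ERESTARTSYS is the kernel's internal code for an interrupted call; when the call is not restarted, Perl's accept just returns false with $! set to EINTR. My advice is to always check the return value from accept.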
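To make that advice concrete, here is a minimal sketch, assuming Perl 5.8 or later on Linux with Perl's default deferred signals. The helper name accept_retry and the temporary socket path are hypothetical, not part of the original server. It retries accept only when $! is EINTR (an interrupted call) and dies on any other error:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Errno qw(EINTR);
use File::Temp qw(tempdir);
use Socket qw(PF_UNIX SOCK_STREAM SOMAXCONN sockaddr_un);

# Hypothetical helper: accept one connection on $listener. Retry only when
# accept() was interrupted by a signal (EINTR); any other failure is fatal
# instead of being mistaken for "no more clients".
sub accept_retry {
    my ($listener) = @_;
    while (1) {
        my $client;
        return $client if accept($client, $listener);
        next if $! == EINTR;    # interrupted (e.g. by a signal): try again
        die "accept: $!";
    }
}

# Demo setup (assumed, not from the original server): a listener on a socket
# in a temporary directory, plus one client that connects and writes before
# accept() is called; the kernel holds it in the listen backlog meanwhile.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/catsock";

socket(my $server, PF_UNIX, SOCK_STREAM, 0) or die "socket: $!";
bind($server, sockaddr_un($path))           or die "bind: $!";
listen($server, SOMAXCONN)                  or die "listen: $!";

socket(my $peer, PF_UNIX, SOCK_STREAM, 0)   or die "socket: $!";
connect($peer, sockaddr_un($path))          or die "connect: $!";
syswrite($peer, "hello\n")                  or die "syswrite: $!";

# Server side: accept the queued connection and read the client's line.
my $conn = accept_retry($server);
my $got  = <$conn>;
print "server read: $got";

close $conn;
close $peer;
```

Unlike a bare accept(...) || next, this still retries after an interrupted call, but it dies on a persistent failure (running out of file descriptors, for example) instead of spinning on it.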


Re^2: Unix-Domain TCP Server Crashing
by wokka (Acolyte) on Aug 17, 2010 at 19:22 UTC
    Thank you for your informed response. This will help me in future debugging as well. I changed the for loop to:
    while (1) {
        accept(Client, Server) || next;
        logmsg "connection on $NAME";
        spawn sub {
            print "Hello there, it's now ", scalar localtime, "\n";
            exec '/usr/games/fortune' or die "can't exec fortune: $!";
        };
        close Client;
    }
    The problem no longer appears. Now all it needs are HUP and error handlers, and it's on its way to being a proper daemon. Thank you so much. Now to figure out how to get bidirectional passing working with Storable...
