http://www.perlmonks.org?node_id=999744

flexvault has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I am trying to write a server that uses 'IO::Select' to support multiple clients. The following code works 98% of the time, but every once in a while the server doesn't get any error telling it that a client has crashed. For normal client termination, the client sends a 'close' command and everything is fine. The problem only occurs when a session terminates abnormally.

The problem is that when the server thinks it has at least one client, but in fact they are all closed, the loop takes up 100% of the CPU doing nothing. How do I get 99.999% detection?

Note: The code is an adaption of code provided by BrowserUk.

Server:
#!/usr/local/bin/pyrperl -w
### Server
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# flush after every write
$| = 1;
$SIG{PIPE} = 'IGNORE';

our ( $socket, $select, @ready_clients, %All_Clients );

my $localhost = '127.0.0.1';
my $port = 12345;

# creating object interface of IO::Socket::INET modules which internally does
# socket creation, binding and listening at the specified port address.
$socket = new IO::Socket::INET (
    LocalHost => $localhost,
    LocalPort => $port,
    Proto => 'tcp',
    Listen => 5,
    Reuse => 1,
) or die "ERROR in Socket Creation : $!\n";
$socket->autoflush;
binmode $socket; $| = 1;

$select = new IO::Select() or die "IO::Select $!";
$select->add($socket);   # add the main socket to the set
$All_Clients{$socket} = 1;

print "SERVER Waiting for client connection on port $port\n";

my $rdata = "";
for ( 0..255) { $rdata .= chr($_); }
my $len = pack 'N',length($rdata);
my $sdata = $len . $rdata;

my $start = 0; my $finish = 0; my $tasks = 0;
while(1)
{
    my @ready = $select->can_read(.025);
    foreach my $client (@ready)
    {
        if ( $client == $socket )   #{ next; } # if ( ! exists $All_Clients{$client} )
        {
            # Create a new socket
            my $new = $client->accept();
            # binmode $new; $| = 1;
            $select->add($new);
            $All_Clients{$new} = 1;
            $start++; $tasks++;
            print " Client connected\n";
        }
        else
        {
            # Process socket
            my $lost = 0;
            my $ret = recv( $client, my $in, 4, 0 );
            if ( ! defined $ret ) { $lost++; }
            if ( ( $lost==0 )&&( length($in)==4 ) )
            {
                my $len = unpack('N',$in);   # print "**** received ****\n";
                $ret = recv( $client, my $data, $len, 0 );
                if ( ! defined $ret ) { $lost++; }
                if ( $lost==0 )
                {
                    if ( $rdata ne $data ) { die "3.\n$rdata\n$data \n"; }
                    $ret = send( $client, $sdata, length($sdata), 0 );
                    if ( ! defined $ret ) { $lost++; }
                }
            }
            if ( $lost > 0 )
            {
                # Maybe we have finished with the socket
                $select->remove($client);
                $client->close;
                $tasks--;
                my $total = $select->count();
                print " Client dis-connected, $tasks left $total\n";
            }
        }
    }
    my $total = $select->count();
    if ( ( $start )&&( $total == 1 ) )
    {
        if ( $finish == 0 ) { $finish = time; }
        else { if ( time - $finish > 3 ) { exit; } }
    }
}
exit;

Client:
#!/usr/local/bin/pyrperl -w
#Client
use strict;
use Time::HiRes qw[ time usleep ];
use IO::Socket::INET;

$\ = $/ = chr(13).chr(10);

my $rdata = "";
for ( 0..255) { $rdata .= chr($_); }
my $len = pack 'N',length($rdata);
my $sdata = $len . $rdata;
print length($rdata),"\n";

my $svr = IO::Socket::INET->new( "localhost:12345" )
    or die "Client: First client connect failed $^E";
binmode $svr; $| = 1;

print "Client connected";

my $last = time + 1;
my $exchanges = 0;
while( 1 )
{
    send( $svr, $sdata, length($sdata), 0 ) or die "$! / $^E";
    my $in;
    recv ( $svr, $in, 4, 0 );
    if ( defined $in )
    {
        my $len = unpack('N',$in);
        recv ( $svr, my $data, $len, 0 );
        if ( $rdata ne $data ) { die "3. $! \n"; }
        ++$exchanges;   # print "$len\t$exchanges\n";
        if( time > $last )
        {
            my $rate = sprintf( "%.f", $exchanges );
            print "$$ Rate: $rate exchanges/sec\n";
            $last = time + 1;
            $exchanges = 0;
        }
    }
}

Thanks for looking...Ed

"Well done is better than well said." - Benjamin Franklin

Re: IO::Select and correct way to detect client crashed?
by zentara (Archbishop) on Oct 18, 2012 at 16:02 UTC
    I can only relate what I have found to be the best way, which is to use the Glib event loop's IO add_watch to handle the conditions. What I have found from practical usage is that if you get an IN condition but there are no bytes to be read by sysread, then your connection is down. If you watch the event loop run, you will see many IN callbacks fired, but there is never any data... there is your clue that the connection is lost.

    Here is the Glib eventloop code, and it works well with a GUI.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Glib;
    use IO::Socket;

    $|++;

    my @clients; #used for root messaging to all

    # a cheap and easy way to prevent zombie children
    # when the forked child exits
    # avoids the waitpid stuff,otherwise, the defunct
    # forked children will wait until the main parent script ends.
    $SIG{CHLD} = 'IGNORE';

    my $num_of_client = -1;

    my $port = 2345;
    my $server = new IO::Socket::INET(
        Timeout => 7200,
        Proto => "tcp",
        LocalPort => $port,
        Reuse => 1,
        Listen => SOMAXCONN
    );

    print "\n",$server,' ',fileno($server),"\n";

    if( ! defined $server){
        print "\nERROR: Can't connect to port $port on host: $!\n" ;
        exit;
    } else{
        print "\nServer up and running on $port\n"
    }

    my $main_loop = Glib::MainLoop->new;

    #my $con_watcher = Gtk2::Helper->add_watch ( fileno( $server ),
    #    'in', \&callback, $server );
    #my $stdin_watcher = Gtk2::Helper->add_watch ( fileno( 'STDIN' ),
    #    'in', \&watch_stdin, 'STDIN' );

    my $con_watcher = Glib::IO->add_watch ( fileno( $server ),
        'in', \&callback, $server );
    my $stdin_watcher = Glib::IO->add_watch ( fileno( 'STDIN' ),
        'in', \&watch_stdin, 'STDIN' );

    $main_loop->run;

    sub watch_stdin {
        # this is line oriented,
        # enter as many lines as you want
        # and you must press Control-d when
        # finished to send
        # print "@_\n";
        my ($fd, $condition, $fh) = @_;
        my (@lines) = (<STDIN>);
        print @lines;
        foreach my $cli(@clients){
            if($cli->connected){
                print $cli 'MESSAGE-> ', @lines;
            }else{
                # remove dead client
                @clients = grep { $_ ne $cli } @clients;
            }
        }
        #always return TRUE to continue the callback
        return 1;
    }

    sub callback{
        my ( $fd, $condition, $fh ) = @_;
        print "callback start $fd, $condition, $fh\n";

        #this grabs the incoming connections and forks them off
        my $client;
        do {
            $client = $server->accept
        } until ( defined($client) );
        print "accepted a client, id = ", ++$num_of_client, "\n";

        # going into forked handler
        if ( !fork ) {
            close($server); #this only closes the copy in the child process

            # Gtk2::Helper->remove_watch( $con_watcher ); #remove server port watch in child
            # Gtk2::Helper->remove_watch( $stdin_watcher ); #remove STDIN watch in child
            Glib::Source->remove( $con_watcher ); #remove server port watch in child
            Glib::Source->remove( $stdin_watcher ); #remove STDIN watch in child

            # add a new watch in the forked client
            my $cli_watcher = Glib::IO->add_watch( fileno( $client ),
                ['in', 'hup','err'], \&cli_callback, $client);

            sub cli_callback{
                print "\ncli_callback @_\n";
                my ( $fd, $condition, $client ) = @_;

                # since 'in','hup', and 'err' are not mutually exclusive,
                # they can all come in together, so test for hup/err first
                if ( $condition >= 'hup' or $condition >= 'err' ) {
                    # End Of File, Hang UP, or ERRor. that means
                    # we're finished.
                    #print "\nhup or err received\n";
                    #close socket
                    $client->close;
                    $client = undef;
                    # normally return 0 here,
                    # except we need to exit the fork, down below
                    # return 0; #stop callback
                }

                # if the client still exists, get data and return 1 to keep callback alive
                if ($client) {
                    if ( $condition >= 'in' ){
                        # data available for reading
                        my $bytes = sysread($client,my $data,1024);
                        if ( defined $data ) {
                            # do something useful with the text.
                            print length $data, $data,"\n";
                            print $client "$data\n"; #echo back
                        }
                    }
                    # the file handle is still open, so return TRUE to
                    # stay installed and be called again.
                    # print "still connected\n";
                    # possibly have a "connection alive" indicator
                    #print "still alive\n";
                    return 1;
                }
                else {
                    # we're finished with this job. start another one,
                    # if there are any, and uninstall ourselves.
                    print "child exiting\n";
                    #return 0;
                    #exit instead
                    exit; #since this is forked, we exit
                }
            } #end of client callback
        } #end of forked code
        else {
            push @clients, $client; #save clients for root message
            # back to parent, close client that's been forked
            #print "\nin parent closed forked client $client\n";
            #close($client); # this only closes the copy in the parent process,
                             # assume the parent no longer need talk to the client
        }
        return 1; # keep the main port watching callback alive
    }
    __END__

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh

      zentara,

      Thanks, I'll look into it. What I was hoping for was a core Perl solution, but a solution is better than no solution...Ed

      "Well done is better than well said." - Benjamin Franklin

Re: IO::Select and correct way to detect client crashed?
by BrowserUk (Patriarch) on Oct 18, 2012 at 18:40 UTC
    Note: The code is an adaption of code provided by BrowserUk.

    I utterly refute that assertion.

    How you dare to claim that your non-functional, incorrectly designed, and horribly implemented single-threaded, event-driven, asynchronous server & client are an "adaption from" my functioning, multi-threaded, synchronous server and client is quite beyond my understanding.

    The only elements they have in common -- the timing mechanism and the length-prefixed protocol -- are in no way responsible for the failures of your code!

    The code I supplied to you: works. It is your clumsy and incorrect adaption of it that is broken; so please do not attribute that to me.

    (Yes. I am offended.)


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    RIP Neil Armstrong

      BrowserUk,

      I apologize! Your code does work and that is why I linked to your code to show the world and you, that my 'changes' are mine, but still giving you credit for the initial code that put me in the right direction.

      That said, the above code works 100% in a non-threaded environment for 14 days plus with 64 clients and 16 cores at 100%. But with Perl threads on AIX and Linux, I haven't been able to keep the threaded code running for more than a day. I suspect the underlying thread libraries may be the problem, or it's my lousy code.

      I'm looking for a solution with 'IO::Select' until I find the problem with my threaded code. Again I apologize.

      "Well done is better than well said." - Benjamin Franklin

        I'm looking for a solution with 'IO::Select' until I find the problem with my threaded code.

        FWIW: to the best of my ability to determine, the latest version of IO::Select is simply broken. Whether this is confined to my platform or is universal I have no way to know.

        I'd much rather look at fixing your threads problem than working with IO::Select.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        RIP Neil Armstrong

Re: IO::Select and correct way to detect client crashed?
by andal (Hermit) on Oct 19, 2012 at 07:19 UTC

    Well, I would recommend looking not into Perl but into the TCP protocol, since that is what you are using. The protocol is not hidden away from you; you have to manage the events produced by the protocol stack.

    From the perspective of the protocol, you make quite a few mistakes. First of all, when using "select" one should make the sockets non-blocking. For example, there is no guarantee that "accept" will return a socket after "select" has marked the listener as ready. By the time you call "accept", the kernel could have removed the pending connection, so you would end up blocked in "accept" waiting for the next connection. Of course, the probability of this is small, but you want a robust program, right?
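
    A minimal sketch of the non-blocking accept described above, assuming a listener created with IO::Socket::INET and an IO::Select set as in the original server. The helper name accept_pending is made up for illustration.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Hypothetical helper: accept every pending connection on a non-blocking
# listener and add each new socket to the select set.
# Returns the number of connections accepted.
sub accept_pending {
    my ( $listener, $select ) = @_;
    my $accepted = 0;
    while ( my $new = $listener->accept ) {
        $new->blocking(0);    # client sockets should not block either
        $select->add($new);
        $accepted++;
    }
    # accept returned undef: on a non-blocking listener this normally means
    # EAGAIN/EWOULDBLOCK, i.e. nothing (more) to accept, not a fatal error.
    return $accepted;
}
```

    The listener itself has to be switched with $listener->blocking(0) once after creation; otherwise the loop above would block on its last iteration.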

    Next thing. TCP is a "stream" protocol, so you cannot know how much data is available for reading when your socket is marked as read-ready. You may have only 2 bytes of the expected 4, and when you call "recv" you will get just those 2. Your code is not prepared for that; it simply throws those 2 bytes away.
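
    One way to cope with partial reads is to keep a buffer per client and only act once a whole message has arrived. A minimal sketch, assuming the 4-byte 'N' length prefix used in the original code; the helper name extract_frames is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: append newly read bytes to a per-client buffer and
# return every complete frame (4-byte big-endian length, then payload).
# Incomplete data stays in the buffer until more bytes arrive.
sub extract_frames {
    my ( $buf_ref, $chunk ) = @_;
    $$buf_ref .= $chunk;
    my @frames;
    while ( length($$buf_ref) >= 4 ) {
        my $len = unpack 'N', $$buf_ref;
        last if length($$buf_ref) < 4 + $len;     # payload not complete yet
        push @frames, substr( $$buf_ref, 4, $len );
        substr( $$buf_ref, 0, 4 + $len ) = '';    # drop the consumed frame
    }
    return @frames;
}
```

    The server would call this with whatever a single read returned, however short, and nothing gets thrown away between calls.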

    Writing to a TCP socket also requires care. There is no guarantee that the peer will read quickly. So you may fill up your system buffers (if you write a lot), and then your call to "send" will either block (if the socket is blocking) or write only part of the data, which you don't check for.
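
    A minimal sketch of a write loop that checks for short writes, using syswrite (which returns the number of bytes actually written, or undef on error). The name write_all and the timeout parameter are made up for illustration; a fully event-driven server would more likely queue the unsent remainder and wait for writability in its main select loop rather than inside a helper.

```perl
use strict;
use warnings;
use IO::Select;
use Errno qw(EAGAIN EWOULDBLOCK EINTR);

# Hypothetical helper: write all of $data to $sock, coping with short writes.
# Returns 1 on success, 0 if the peer is gone or the wait times out.
sub write_all {
    my ( $sock, $data, $timeout ) = @_;
    my $off = 0;
    my $sel = IO::Select->new($sock);
    while ( $off < length $data ) {
        my $n = syswrite( $sock, $data, length($data) - $off, $off );
        if ( defined $n ) { $off += $n; next; }
        if ( $! == EAGAIN || $! == EWOULDBLOCK || $! == EINTR ) {
            $sel->can_write($timeout) or return 0;    # wait for buffer space
            next;
        }
        return 0;    # EPIPE, ECONNRESET, ...
    }
    return 1;
}
```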

    And now concerning your question about detecting the presence of the peer. As someone here already mentioned, if the peer has closed the connection or crashed (an application crash), then your end of the socket becomes "read-ready", but when you try to read from it, you read 0 bytes. In this case you know that the peer is gone and you should close your socket.
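
    A minimal sketch of that check with core Perl only, assuming the socket was reported readable by IO::Select. The helper name read_status and its return strings are invented for this example. Note that, per perlfunc, Perl's recv returns undef only on error, so an orderly close shows up as a zero-length buffer and not as an undefined return value; sysread makes the case easier to spot because it returns 0.

```perl
use strict;
use warnings;
use Errno qw(EAGAIN EWOULDBLOCK EINTR);

# Hypothetical helper: read once from a socket that select reported as
# readable and classify the result as 'data', 'retry', 'closed' or 'error'.
sub read_status {
    my ( $sock, $buf_ref ) = @_;
    my $n = sysread( $sock, my $chunk, 4096 );
    if ( !defined $n ) {
        return 'retry' if $! == EAGAIN || $! == EWOULDBLOCK || $! == EINTR;
        return 'error';             # e.g. ECONNRESET
    }
    return 'closed' if $n == 0;     # readable but 0 bytes: the peer is gone
    $$buf_ref .= $chunk;
    return 'data';
}
```

    On 'closed' or 'error' the server would remove the socket from the select set and close it, so it can no longer keep the loop spinning.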

    There's a catch here. If the peer has already crashed when you write to it, you won't see the 0-byte input first. In this case your write may fail with an EPIPE error; by default this is actually delivered as a SIGPIPE signal that kills your whole application. So you may want to pass the "MSG_NOSIGNAL" flag to "send" to prevent that. But if your write is short, the socket will become "read-ready" after it, and you will read the 0-byte input, indicating that the peer cannot accept your data any more.
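
    A minimal sketch of catching the write-side failure. Instead of MSG_NOSIGNAL it ignores SIGPIPE, which is what the original server already does with $SIG{PIPE} = 'IGNORE'. The helper name try_send and its return strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: try to write $data and report whether the peer is
# known to be gone. SIGPIPE is ignored locally, so a dead peer produces an
# error return (typically EPIPE or ECONNRESET) instead of killing the process.
sub try_send {
    my ( $sock, $data ) = @_;
    local $SIG{PIPE} = 'IGNORE';
    my $n = syswrite( $sock, $data );
    return 'ok' if defined $n;      # may still be a short write
    return 'dead' if $!{EPIPE} || $!{ECONNRESET};
    return 'failed';                # anything else, including EAGAIN
}
```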

    Finally, don't use a small timeout for the "select" call. Why should you do anything if no sockets are ready? A real application would need support for timers, and the pending timers would define the timeout for the "select" call. If you need timers, use something like the GLib event loop instead of "select".

    Again, all of the above comes from knowing the TCP protocol. You should know how a connection is established at the protocol level, how data is passed, and how errors are communicated between peers and between the protocol stack and the application. I encourage you to read "Unix Network Programming" by W. Richard Stevens. The book provides a very complete description of networking.
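
    If you do stay with IO::Select, the idea that timers define the timeout could be sketched like this. The helper name select_timeout is hypothetical, and the deadlines are assumed to be absolute times in seconds.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: given the current time and a list of absolute timer
# deadlines, return the timeout to hand to can_read. With no timers it
# returns nothing (undef in scalar context), which makes can_read block
# until a socket is ready; otherwise it returns the time until the earliest
# deadline, never negative.
sub select_timeout {
    my ( $now, @deadlines ) = @_;
    return unless @deadlines;
    my $wait = min(@deadlines) - $now;
    return $wait > 0 ? $wait : 0;
}
```

    The main loop would then call $select->can_read( select_timeout( time, @deadlines ) ) instead of polling every 0.025 seconds.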

Re: IO::Select and correct way to detect client crashed?
by sundialsvc4 (Abbot) on Oct 18, 2012 at 19:50 UTC

    The only practical way that I have ever found to determine whether a client has crashed is to oblige the protocol between the two to periodically send a “heartbeat” message if there is nothing else to be sent during some mutually agreed-upon interval. The host should record the latest timestamp as each socket sends a message, or just a boolean flag, and use a periodic timer, say about half-again as long as the heartbeat, to check for absence of response within that time. Upon which the host might presume that the client is dead. You see, you really can’t be 100% sure, AFAIK, that all of the intermediates between here-and-there are still in good working order. Maybe the host knows that the socket has been closed or that communication has failed, or maybe it doesn’t. But if a bottle washes up on shore, even if it’s empty, you know the sender is still out there.
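
    The bookkeeping side of such a heartbeat might be sketched as follows. The function names and the hash layout are assumptions for illustration; the actual heartbeat message and its interval are left to the protocol.

```perl
use strict;
use warnings;

# Hypothetical bookkeeping: %$last_seen maps a client key to the time of
# the last message (heartbeat or data) received from that client.
sub note_activity {
    my ( $last_seen, $client, $now ) = @_;
    $last_seen->{$client} = $now;
    return;
}

# Return the keys of all clients that have been silent for more than
# $limit seconds, in sorted order.
sub stale_clients {
    my ( $last_seen, $now, $limit ) = @_;
    my @stale = grep { $now - $last_seen->{$_} > $limit } sort keys %$last_seen;
    return @stale;
}
```

    With a 10-second heartbeat and the "half-again" rule, the host would call stale_clients( \%last_seen, time, 15 ) from its periodic timer and close whatever comes back.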

      More of your patented, utter ribald tosh...

      (Don't be encouraged by the upvote; that was me by accident.)


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      RIP Neil Armstrong

        Ribald?!?
        --
        A math joke: r = | |csc(θ)|+|sec(θ)|-||csc(θ)|-|sec(θ)|| |