http://www.perlmonks.org?node_id=837267

vsespb has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I was working with blocking sockets in Perl and found a weird thing.

It looks like IO::Select works badly when you read from sockets with <$fh> and the other end has written _two_ or more lines to the socket. <$fh> returns one line, but the Perl process reads the second line too and keeps it in the PerlIO layer's buffer. As a result, IO::Select reports no sockets ready to read.

In practice this can produce hard-to-catch errors in an application: it can work _fine_ for a long time, until it receives two lines at once because handling the previous lines took too long.

I think this is a gap in the Perl documentation. There is not a word about it, nothing about a possible _read_ buffering problem, and there is lots of Perl code all over the internet that combines IO::Select with <$fh>.

What do you think?

Thanks.

Code (proof-of-concept):

#!/usr/bin/perl -w
use strict;
use IO::Select;
use IO::Pipe;

my $fromchild = new IO::Pipe;
my $tochild = new IO::Pipe;
my $pid;
my $parent_pid = $$;

if($pid = fork()) { # Parent
    $fromchild->reader();
    $fromchild->autoflush(1);
    $fromchild->blocking(1);
    binmode $fromchild;

    $tochild->writer();
    $tochild->autoflush(1);
    $tochild->blocking(1);
    binmode $tochild;

    my $read_set = new IO::Select(); # create handle set for reading
    $read_set->add($fromchild);
    while(1) {
        print "before select\n";
        my ($rh_set, undef, $ex_set) = IO::Select->select($read_set, undef, $read_set, 30);
        print "after select\n";
        for my $rh (@$rh_set) {
            my $s = <$rh>;
            print "command: $s";
        }
    }
}
elsif (defined ($pid)) { # Child
    $fromchild->writer();
    $fromchild->autoflush(1);
    $fromchild->blocking(1);
    binmode $fromchild;

    $tochild->reader();
    $tochild->autoflush(1);
    $tochild->blocking(1);
    binmode $tochild;

    print $fromchild "abc\n";
    #sleep(1); ### IF you uncomment this line it will work
    print $fromchild "def\n";
    sleep(86400);
    die;
}
__END__

=Output=:
$ ./poc1.pl
before select
after select
command: abc
before select
(process hangs here)

Replies are listed 'Best First'.
Re: poorly documented behaviour of readline() and IO::Select
by BrowserUk (Patriarch) on Apr 28, 2010 at 11:14 UTC

    This is always a problem when you try to layer line-oriented semantics atop stream-oriented protocols, especially when the buffering is done neither at the system level (where select gets its info) nor at the application level (where the application programmer has some control), i.e. in an intermediate layer. That leaves the application and the system with differing ideas about the current state of the communications channel.

    The only solution I've found is to set the socket into :raw mode, use sysread to get whatever is available, and do the line buffering semantics within the application itself. Ie. Cut out the middle man.
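A minimal sketch of the approach described above, not code taken from this thread: read with sysread whenever IO::Select reports the handle ready, and split lines out of a buffer the application owns. The helper names fill_buffer and take_lines are made up for illustration, and a plain pipe stands in for the socket.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: append whatever is available to the caller's buffer.
# Returns sysread's result: undef on error, 0 on EOF, else the byte count.
sub fill_buffer {
    my ($fh, $buf_ref) = @_;
    return sysread($fh, $$buf_ref, 64 * 1024, length($$buf_ref));
}

# Hypothetical helper: remove and return every complete line in the buffer.
sub take_lines {
    my ($buf_ref) = @_;
    my @lines;
    push @lines, $1 while $$buf_ref =~ s/\A([^\n]*\n)//;
    return @lines;
}

# Demo: both lines are already in the pipe before the reader looks at it.
pipe(my $reader, my $writer) or die "pipe: $!";
binmode $_ for $reader, $writer;
syswrite($writer, "abc\ndef\n");
close $writer;

my $select = IO::Select->new($reader);
my $buf    = '';
my @got;
while ($select->count) {
    my @ready = $select->can_read(1);
    last unless @ready;                      # timed out
    for my $fh (@ready) {
        my $rv = fill_buffer($fh, \$buf);
        die "sysread: $!" unless defined $rv;
        if ($rv == 0) {                      # EOF: stop watching this handle
            $select->remove($fh);
            next;
        }
        push @got, take_lines(\$buf);
    }
}
print "command: $_" for @got;
```

Because nothing sits in a hidden PerlIO buffer, whatever select sees and whatever the application has consumed stay in agreement; a partial line simply waits in $buf until its newline arrives.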

    For the longest time I've been under the assumption that this disconnect was unique to Windows. Seems I was wrong, despite having asked the question here on numerous occasions.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I just wrote a POC solution that seems to fix the issue: it uses sysread/syswrite and puts the length of the message at the beginning of it. (Of course, this solution is only suitable if you can patch both client and server, i.e. modify your protocol.)

      It works fine (Ubuntu Linux).

      #!/usr/bin/perl -w
      use strict;
      use IO::Select;
      use IO::Pipe;

      my $fromchild = new IO::Pipe;
      my $tochild = new IO::Pipe;
      my $pid;
      my $parent_pid = $$;

      if($pid = fork()) { # Parent
          $fromchild->reader();
          $fromchild->autoflush(1);
          $fromchild->blocking(1);
          binmode $fromchild;

          $tochild->writer();
          $tochild->autoflush(1);
          $tochild->blocking(1);
          binmode $tochild;

          my $read_set = new IO::Select(); # create handle set for reading
          $read_set->add($fromchild);
          while(1) {
              print "before select\n";
              my ($rh_set, undef, $ex_set) = IO::Select->select($read_set, undef, $read_set, 30);
              print "after select\n";
              for my $rh (@$rh_set) {
                  my $s = receive_line($rh);
                  print "command: $s";
              }
          }
      }
      elsif (defined ($pid)) { # Child
          $fromchild->writer();
          $fromchild->autoflush(1);
          $fromchild->blocking(1);
          binmode $fromchild;

          $tochild->reader();
          $tochild->autoflush(1);
          $tochild->blocking(1);
          binmode $tochild;

          send_line($fromchild, "abc\n");
          send_line($fromchild, "def\n");
          sleep(86400);
          die;
      }

      sub send_line {
          my ($socket, $line) = @_;
          # Header is 8 bytes: a 7-digit length followed by a space.
          my $msg = sprintf("%07d %s", length($line), $line);
          syswrite $socket, $msg;
      }

      sub receive_line {
          my ($socket) = @_;
          sysread $socket, my $len, 8;   # read the 8-byte header
          # No OFFSET argument here: an offset of 8 into an empty scalar
          # would pad the result with eight "\0" bytes in front of the line.
          sysread $socket, my $line, $len;
          return $line;
      }
      __END__

      Output:
      before select
      after select
      command: abc
      before select
      after select
      command: def
      before select

        sysread won't always give you the number of bytes you request when reading from something that isn't a plain file.
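One way to cope with short reads is to loop until the requested number of bytes has arrived. The sketch below is only an illustration under assumptions: read_exactly and receive_msg are hypothetical names, the 8-byte header (7 digits plus a space) follows the framing used in the post above, and the handle is assumed to be blocking.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: call sysread repeatedly until exactly $len bytes
# have been read. Returns the bytes, or undef on EOF or error before that.
sub read_exactly {
    my ($fh, $len) = @_;
    my $data = '';
    while (length($data) < $len) {
        my $rv = sysread($fh, $data, $len - length($data), length($data));
        next if !defined($rv) && $!{EINTR};   # interrupted by a signal: retry
        return undef unless $rv;              # undef = error, 0 = EOF
    }
    return $data;
}

# Hypothetical helper: read one message framed as "NNNNNNN payload".
sub receive_msg {
    my ($fh) = @_;
    my $header = read_exactly($fh, 8);
    return undef unless defined $header;
    my ($len) = $header =~ /\A(\d{7}) \z/ or return undef;   # malformed header
    return read_exactly($fh, $len + 0);
}

# Demo on a pipe: two framed messages, then EOF.
pipe(my $reader, my $writer) or die "pipe: $!";
for my $line ("abc\n", "def\n") {
    syswrite($writer, sprintf("%07d %s", length($line), $line));
}
close $writer;

my @got;
while (defined(my $msg = receive_msg($reader))) {
    push @got, $msg;
}
print "command: $_" for @got;
```

Note that this blocks until the whole message is in, so it suits a simple client; a server juggling many connections would rather accumulate bytes per client and return to select between reads.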

        See Re: A suicidal parent OR death of a forking server for a newline terminated solution, and here's a length-prefix adaptation:

        #!/usr/bin/perl
        use strict;
        use warnings;

        use IO::Socket::INET qw( );
        use IO::Select qw( );

        sub process_msg {
            my ($client, $msg) = @_;
            chomp $msg;
            print "$client->{host} said '$msg'\n";
        }

        sub process_msgs {
            my ($client) = @_;
            # Alias $buf and $want to the per-client fields.
            our $buf;  local *buf  = \($client->{buf});
            our $want; local *want = \($client->{want});

            for (;;) {
                if ($want) {
                    # A length header has been read; wait for the full body.
                    return if length($buf) < $want;
                    my $msg = substr($buf, 0, $want, '');
                    $want = 0;
                    process_msg($client, $msg);
                } else {
                    # Wait for the 8-byte length header.
                    return if length($buf) < 8;
                    $want = 0+substr($buf, 0, 8, '');
                }
            }
        }

        my $server = IO::Socket::INET->new( ... )
            or die("Couldn't create server socket: $!\n");

        my $select = IO::Select->new($server);
        my %clients;

        while (my @ready = $select->can_read) {
            for my $fh (@ready) {
                if ($fh == $server) {
                    my $client_sock = $server->accept;
                    my $host = $client_sock->peerhost;
                    print "[Accepted connection from $host]\n";
                    $select->add($client_sock);
                    $clients{fileno($client_sock)} = {
                        host => $host,
                        buf  => '',
                        want => 0,
                    };
                } else {
                    my $client = $clients{fileno($fh)};
                    our $buf;  local *buf  = \($client->{buf});
                    our $want; local *want = \($client->{want});

                    # Append whatever is available to the client's buffer.
                    my $rv = sysread($fh, $buf, 64*1024, length($buf));
                    if (!$rv) {
                        my $host = $client->{host};
                        # sysread returns 0 at EOF and undef on error.
                        if (defined($rv)) {
                            print "[Connection from $host terminated]\n";
                        } else {
                            print "[Error reading from host $host]\n";
                        }

                        process_msgs($client);
                        print "[Incomplete message received from $host]\n"
                            if $want || length($buf);

                        delete $clients{fileno($fh)};
                        $select->remove($fh);
                        next;
                    }

                    process_msgs($client);
                }
            }
        }

        Yes. I've always favoured length prefixes to delimiters in protocols I've written--mostly RS-232, RS-432 and some short range wireless hand-held devices (think barcode scanners in supermarkets but 15+ years ago).

        But that doesn't really fit with the *nix, everything-is-a-file way of working.


      Why set :raw mode? Isn't using sysread enough?
        Isn't using sysread enough?

        Dunno, maybe. But I reason that if there are no IO layers associated with the file descriptor, there is no chance that they are interfering in any way.

        I have to admit to being quite sceptical about the benefits of Perl's IO layers. Just seems to be another few layers of indirection between me and my data.



      Anyway, I would like to get this "fixed" somehow in the Perl distribution, i.e. to fix the Perl documentation.

      It should carry a note that "select/can_read can indeed return nothing when you actually can read".

      Do you think that would be possible? How would I do this? Use http://rt.perl.org/perlbug/ ?

        Currently, the documentation for select says

        WARNING: One should not attempt to mix buffered I/O (like read or <FH>) with select, except as permitted by POSIX, and even then only on POSIX systems. You have to use sysread instead.

        This should be present in the documentation for IO::Select as well.
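To see what that warning means for IO::Select, here is a small demonstration, written as a sketch on a plain pipe rather than a socket. Two lines are written at once; after a single buffered readline, can_read(0) reports nothing even though a second line is still waiting in the PerlIO buffer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe: $!";
syswrite($writer, "abc\ndef\n");    # two lines arrive together
# The writer stays open, so EOF cannot make the pipe look readable.

my $select = IO::Select->new($reader);

my $ready_before = () = $select->can_read(0);   # the kernel has data
my $first        = <$reader>;                   # buffered read pulls in both lines
my $ready_after  = () = $select->can_read(0);   # the pipe itself is now empty
my $second       = <$reader>;                   # served from the PerlIO buffer

printf "ready before readline: %d\n", $ready_before;
printf "ready after readline:  %d\n", $ready_after;
print  "first line:  $first";
print  "second line: $second";
```

The second readline returns immediately, so the data was there all along; select just has no way to know about it.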

        (I wonder what exceptions POSIX makes, and if they are still applicable to Perl.)

        Try typing 'perlbug' on your shell command line.

Re: poorly documented behaviour of readline() and IO::Select
by Illuminatus (Curate) on Apr 28, 2010 at 13:48 UTC
      Yes, this code looks like a better workaround.
Re: poorly documented behaviour of readline() and IO::Select
by choroba (Cardinal) on Apr 28, 2010 at 10:57 UTC
    I had a similar problem years ago. A server was communicating with several clients, but sometimes one line of the communication got lost somewhere. The sleep solution solved the problem, but even then I was not 100% sure that the error did not still occur from time to time.
      The sleep here is just for the example. The correct solution would be to read and write with sysread/syswrite and detect the newline yourself, or to develop a protocol that does not depend on a delimiter between messages.
      Also: if you use eof or eof() together with sysread, sysread does not work! (The line gets buffered, so it can only be read by readline().)
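The eof interaction can be reproduced with a short sketch like the following (a pipe stands in for the socket, and the writer is closed so that nothing blocks). eof has to attempt a buffered read to find out whether data remains, and that read moves the pending bytes into the PerlIO buffer, out of sysread's reach.

```perl
#!/usr/bin/perl
use strict;
use warnings;

pipe(my $reader, my $writer) or die "pipe: $!";
syswrite($writer, "abc\n");
close $writer;

my $at_eof = eof($reader);                     # buffered peek drains the pipe
my $rv     = sysread($reader, my $raw, 1024);  # reads the descriptor directly
my $line   = <$reader>;                        # data is still in the PerlIO buffer

printf "eof said: %s\n",          $at_eof ? "at EOF" : "data available";
printf "sysread returned: %d\n",  $rv;
print  "readline returned: $line";
```

Here eof reports that data is available, yet sysread returns 0 because the descriptor is already drained; only readline can still deliver the line.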