poorly documented behaviour of readline() and IO::Select

vsespb has asked for the wisdom of the Perl Monks concerning the following question:

Hi. I was working with blocking sockets in Perl and found something weird.

It looks like IO::Select works badly when you read from sockets with <$fh> and the other end has written _two_ or more lines to the socket. <$fh> returns one line, but the Perl process has also read the second line and keeps it in the PerlIO layer's buffer, so IO::Select reports no sockets ready to read.

In practice this can produce hard-to-catch errors in an application: it can work _fine_ for a long time, until it receives two lines at once because handling the previous lines took too long.

I think this is a gap in the Perl documentation. There is not a word about this, nothing about a possible _read_ buffering problem, and there is lots of Perl code all over the internet that uses IO::Select with <$fh>.

What do you think?

Thanks.

Code (proof-of-concept):

#!/usr/bin/perl -w 

use strict; 
use IO::Select; 
use IO::Pipe;

  my $fromchild = new IO::Pipe;
  my $tochild = new IO::Pipe;
  my $pid;
  my $parent_pid = $$;
  if($pid = fork()) { # Parent
   $fromchild->reader();
   $fromchild->autoflush(1);
   $fromchild->blocking(1);
   binmode $fromchild;
   $tochild->writer();
   $tochild->autoflush(1);
   $tochild->blocking(1);
   binmode $tochild;

   my $read_set = new IO::Select(); # create handle set for reading
   $read_set->add($fromchild);

   while(1) {
    print "before select\n";
    my ($rh_set, undef, $ex_set) = IO::Select->select($read_set, undef, $read_set, 30);
    print "after select\n";
    for my $rh (@$rh_set) {
        my $s =  <$rh>;
        print "command: $s";
    }
   }
  } elsif (defined ($pid)) { # Child
   $fromchild->writer();
   $fromchild->autoflush(1);
   $fromchild->blocking(1);
   binmode $fromchild;
   $tochild->reader();
   $tochild->autoflush(1);
   $tochild->blocking(1);
   binmode $tochild;
   print $fromchild "abc\n";
   #sleep(1);  ### IF you uncomment this line it will work
   print $fromchild "def\n";

   sleep(86400);
   die;
  }

__END__

=Output=:
$ ./poc1.pl
before select
after select
command: abc
before select

(process hangs here)


Replies are listed 'Best First'.
Re: poorly documented behaviour of readline() and IO::Select by BrowserUk (Patriarch) on Apr 28, 2010 at 11:14 UTC
This is always a problem when you try to layer line-oriented semantics atop stream-oriented protocols, especially when the buffering is done neither at the system level (where select gets its info) nor at the application level (where the application programmer has some control), but in an intermediate layer. That leaves the application and the system with differing ideas about the current state of the communications channel.

The only solution I've found is to set the socket into :raw mode, use sysread to get whatever is available, and do the line buffering within the application itself. I.e. cut out the middle man.

For the longest time I've been under the assumption that this disconnect was unique to Windows. Seems I was wrong, despite having asked the question here on numerous occasions.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy
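A minimal sketch of the approach described in this reply: use sysread to take whatever is available and split it into lines in the application, so that select and the application agree on what has been consumed. The helper name read_lines and the pipe-based demo are assumptions made for illustration; they are not code from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: append whatever is available on $fh to the
# caller's buffer and return an array ref of the complete lines now in it.
# Returns undef on EOF or on a read error.
sub read_lines {
    my ($fh, $buf_ref) = @_;
    my $rv = sysread($fh, $$buf_ref, 64 * 1024, length $$buf_ref);
    return undef unless $rv;    # undef = error, 0 = EOF
    my @lines;
    while ($$buf_ref =~ s/\A(.*?\n)//s) {
        push @lines, $1;        # a partial last line stays in the buffer
    }
    return \@lines;
}

# Demo: two lines written back to back into a pipe, as in the question.
pipe(my $r, my $w) or die "pipe: $!";
syswrite($w, "abc\ndef\n");

my $select = IO::Select->new($r);
my $buf    = '';
my @got;
while (@got < 2) {
    my @ready = $select->can_read(5) or last;
    my $lines = read_lines($ready[0], \$buf) or last;
    push @got, @$lines;
}
print "command: $_" for @got;
```

Because nothing is read through the PerlIO buffer, any data that select has not yet reported is still in the kernel, and any data already read sits in $buf where the application can see it.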
Re^2: poorly documented behaviour of readline() and IO::Select by vsespb (Chaplain) on Apr 28, 2010 at 13:22 UTC
I just wrote a POC solution that seems to fix the issue by using sysread/syswrite and putting the length of the message at its beginning. (Of course, this only works if you can patch both client and server, i.e. modify your protocol.) It works fine on Ubuntu Linux.

#!/usr/bin/perl -w

use strict;
use IO::Select;
use IO::Pipe;

  my $fromchild = new IO::Pipe;
  my $tochild = new IO::Pipe;
  my $pid;
  my $parent_pid = $$;
  if($pid = fork()) { # Parent
   $fromchild->reader();
   $fromchild->autoflush(1);
   $fromchild->blocking(1);
   binmode $fromchild;
   $tochild->writer();
   $tochild->autoflush(1);
   $tochild->blocking(1);
   binmode $tochild;

   my $read_set = new IO::Select(); # create handle set for reading
   $read_set->add($fromchild);

   while(1) {
    print "before select\n";
    my ($rh_set, undef, $ex_set) = IO::Select->select($read_set, undef, $read_set, 30);
    print "after select\n";
    for my $rh (@$rh_set) {
        my $s = receive_line($rh);
        print "command: $s";
    }
   }
  } elsif (defined ($pid)) { # Child
   $fromchild->writer();
   $fromchild->autoflush(1);
   $fromchild->blocking(1);
   binmode $fromchild;
   $tochild->reader();
   $tochild->autoflush(1);
   $tochild->blocking(1);
   binmode $tochild;
   send_line($fromchild, "abc\n");
   send_line($fromchild, "def\n");

   sleep(86400);
   die;
  }

sub send_line {
    my ($socket, $line) = @_;
    # 8-byte header: 7-digit zero-padded length plus one space
    my $msg = sprintf("%07d %s", length($line), $line);
    syswrite $socket, $msg;
}

sub receive_line {
    my ($socket) = @_;
    sysread $socket, my $len, 8;
    # No OFFSET argument here: an offset of 8 would pad $line with 8 NUL bytes
    sysread $socket, my $line, $len;
    return $line;
}

__END__

Output:
before select
after select
command: abc
before select
after select
command: def
before select
Re^3: poorly documented behaviour of readline() and IO::Select by ikegami (Patriarch) on Apr 28, 2010 at 14:44 UTC
`sysread` won't always give you the number of bytes you request when reading from something that isn't a plain file. See Re: A suicidal parent OR death of a forking server for a newline-terminated solution, and here's a length-prefix adaptation:

#!/usr/bin/perl

use strict;
use warnings;

use IO::Socket::INET qw( );
use IO::Select       qw( );

sub process_msg {
    my ($client, $msg) = @_;
    chomp $msg;
    print "$client->{host} said '$msg'\n";
}

sub process_msgs {
    my ($client) = @_;

    # Alias $buf and $want to this client's fields
    our $buf;  local *buf  = \($client->{buf});
    our $want; local *want = \($client->{want});

    for (;;) {
        if ($want) {
            return if length($buf) < $want;
            my $msg = substr($buf, 0, $want, '');
            $want = 0;
            process_msg($client, $msg);
        } else {
            return if length($buf) < 8;
            $want = 0+substr($buf, 0, 8, '');
        }
    }
}

my $server = IO::Socket::INET->new( ... )
    or die("Couldn't create server socket: $!\n");

my $select = IO::Select->new($server);
my %clients;

while (my @ready = $select->can_read) {
    for my $fh (@ready) {
        if ($fh == $server) {
            my $client_sock = $server->accept;
            my $host = $client_sock->peerhost;
            print "[Accepted connection from $host]\n";

            $select->add($client_sock);
            $clients{fileno($client_sock)} = {
                host => $host,
                buf  => '',
                want => 0,
            };
        } else {
            my $client = $clients{fileno($fh)};
            our $buf;  local *buf  = \($client->{buf});
            our $want; local *want = \($client->{want});

            my $rv = sysread($fh, $buf, 64*1024, length($buf));
            if (!$rv) {
                my $host = $client->{host};
                # sysread returns 0 at EOF and undef on error
                if (defined($rv)) {
                    print "[Connection from $host terminated]\n";
                } else {
                    print "[Error reading from host $host]\n";
                }

                process_msgs($client);
                print "[Incomplete message received from $host]\n"
                    if $want || length($buf);

                delete $clients{fileno($fh)};
                $select->remove($fh);
                next;
            }

            process_msgs($client);
        }
    }
}
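For the simpler case of a single blocking handle, the short-read caveat above can be handled with a loop that keeps calling sysread until the requested number of bytes has arrived. The following is only a sketch: read_exactly and receive_msg are hypothetical names, the 8-byte header (7 digits plus a space) is taken from the earlier proof of concept, and EINTR handling is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: block until exactly $len bytes have been read.
# Returns the bytes, or undef if EOF or an error occurs first.
sub read_exactly {
    my ($fh, $len) = @_;
    my $data = '';
    while (length($data) < $len) {
        # Ask only for what is still missing and append it to $data
        my $rv = sysread($fh, $data, $len - length($data), length($data));
        return undef unless $rv;    # undef = error, 0 = EOF
    }
    return $data;
}

# Hypothetical reader for the "%07d %s" framing used earlier in the thread.
sub receive_msg {
    my ($fh) = @_;
    my $header = read_exactly($fh, 8);
    return undef unless defined $header;
    my ($len) = $header =~ /\A(\d{7}) \z/
        or die "bad header: $header";
    return read_exactly($fh, $len);
}

# Demo: two framed messages written back to back into a pipe.
pipe(my $r, my $w) or die "pipe: $!";
for my $line ("abc\n", "def\n") {
    syswrite($w, sprintf("%07d %s", length($line), $line));
}
my @msgs = (receive_msg($r), receive_msg($r));
print "command: $_" for @msgs;
```

Note that read_exactly blocks until the whole message is in, so a server that multiplexes many clients with select is probably better served by the per-client buffer shown in the reply above.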
Re^4: poorly documented behaviour of readline() and IO::Select by ikegami (Patriarch) on Jun 09, 2010 at 19:19 UTC
Re^3: poorly documented behaviour of readline() and IO::Select by BrowserUk (Patriarch) on Apr 28, 2010 at 13:38 UTC
Yes. I've always favoured length prefixes over delimiters in protocols I've written--mostly RS-232, RS-432 and some short-range wireless hand-held devices (think barcode scanners in supermarkets, but 15+ years ago). But that doesn't really fit with the *nix, everything-is-a-file way of working.
Re^2: poorly documented behaviour of readline() and IO::Select by vsespb (Chaplain) on Apr 28, 2010 at 13:04 UTC
Why set :raw mode? Isn't using sysread enough?
Re^3: poorly documented behaviour of readline() and IO::Select by BrowserUk (Patriarch) on Apr 28, 2010 at 13:16 UTC
"use of sysread is not enough ?"

Dunno, maybe. But I reason that if there are no IO layers associated with the file descriptor, there is no chance that they are interfering in any way. I have to admit to being quite sceptical about the benefits of Perl's IO layers. They just seem to be another few layers of indirection between me and my data.
Re^2: poorly documented behaviour of readline() and IO::Select by vsespb (Chaplain) on Apr 28, 2010 at 13:32 UTC
Anyway, I would like to get this "fixed" somehow in the Perl distribution, i.e. fix the Perl documentation by adding a note that "select/can_read can indeed return nothing when you actually can read". Do you think that would be possible? How would I do this? Through http://rt.perl.org/perlbug/ ?
Re^3: poorly documented behaviour of readline() and IO::Select by ikegami (Patriarch) on Apr 28, 2010 at 23:33 UTC
Currently, the documentation for `select` says:

  WARNING: One should not attempt to mix buffered I/O (like `read` or `<FH>`) with `select`, except as permitted by POSIX, and even then only on POSIX systems. You have to use `sysread` instead.

This should be present in the documentation for IO::Select as well. (I wonder what exceptions POSIX makes, and if they are still applicable to Perl.)
Re^4: poorly documented behaviour of readline() and IO::Select by vsespb (Chaplain) on Apr 29, 2010 at 14:17 UTC
Re^5: poorly documented behaviour of readline() and IO::Select by ikegami (Patriarch) on Apr 29, 2010 at 18:22 UTC
Re^3: poorly documented behaviour of readline() and IO::Select by BrowserUk (Patriarch) on Apr 28, 2010 at 13:39 UTC
Try typing 'perlbug' on your shell command line.
Re: poorly documented behaviour of readline() and IO::Select by Illuminatus (Curate) on Apr 28, 2010 at 13:48 UTC
IO::Handle read error description
Re^2: poorly documented behaviour of readline() and IO::Select by vsespb (Chaplain) on Apr 28, 2010 at 13:59 UTC
Yes, this code looks like a better workaround.
Re: poorly documented behaviour of readline() and IO::Select by choroba (Cardinal) on Apr 28, 2010 at 10:57 UTC
I had a similar problem years ago. A server was communicating with several clients, but sometimes one line of the communication got lost somewhere. The `sleep` solution solved the problem, but even then I was not 100% sure that the error did not still occur, rarely, from time to time.
Re^2: poorly documented behaviour of readline() and IO::Select by vsespb (Chaplain) on Apr 28, 2010 at 11:04 UTC
The sleep here is just for illustration. The correct solution would be to read and write with sysread/syswrite and detect the newline yourself, or to develop a protocol that does not depend on a delimiter between messages.
Re^2: poorly documented behaviour of readline() and IO::Select by Anonymous Monk on Apr 28, 2010 at 20:59 UTC
Also, if you use eof or eof() together with sysread, sysread does not work! (The line gets buffered, so that only readline() can read it.)
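The eof interaction mentioned here can be observed with a small experiment. This is a sketch assuming a POSIX system and the default buffered PerlIO layers on the pipe handle; the variable names are made up for the demo. eof has to look ahead to answer, which pulls the pending data into PerlIO's buffer, so select no longer sees it and sysread would find nothing, while readline still returns the line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;

pipe(my $r, my $w) or die "pipe: $!";
syswrite($w, "abc\n");

my $select = IO::Select->new($r);

# Number of ready handles before anything has touched the read buffer
my $ready_before = () = $select->can_read(0);

# eof peeks at the stream, which fills PerlIO's read buffer
my $at_eof = eof($r);

# The kernel pipe is now empty, so select reports nothing to read
my $ready_after = () = $select->can_read(0);

# The buffered line is still reachable through readline
my $line = <$r>;

print "ready before eof: $ready_before\n";
print "ready after eof:  $ready_after\n";
print "readline returns: $line";
```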
