http://www.perlmonks.org?node_id=417886

Forsaken has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks, I beg for another moment of your valuable time.

still tinkering around with my irc scripts... ;-)

Situation is as follows. I have a number of socket connections open, managed by IO::Select so that I know which ones have data waiting to be read. The data itself comes from an IRC server and thus follows a predictable pattern: the traffic is composed of lines terminated by \n, so I know what to expect. Right now I am using the following to read the data one line at a time:

    $char = '';
    until ($char eq "\n") {
        sysread($filehandle, $char, 1);
        $line .= $char;
    }

after which $line gets processed and the whole thing starts all over again. Somehow this reading 1 char at a time seems really cumbersome, not to mention prone to problems under unforeseen circumstances. I would highly appreciate your insight on this matter.

Update:
Looks like I'm not the only one who's been faced with this question. Apparently a module, IO::Getline, was capable of reading data from an unbuffered socket one line at a time, although so far I have been unsuccessful in tracking it down. Another suggestion was to read in data more than 1 byte at a time, but as far as I know, when trying to read more than x bytes while only x are waiting, the read will block? Is there a way to determine not only IF there is data waiting, but also how much?
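For what it's worth, sysread on a socket that select (or IO::Select) has flagged readable does not block until the full requested count arrives; it returns whatever bytes are available. A minimal sketch, using a socketpair to stand in for the IRC connection (the handle names and the "hi\n" payload are just for illustration):

```perl
use strict;
use warnings;
use Socket;
use IO::Select;

# A connected socket pair stands in for the IRC connection.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
syswrite($writer, "hi\n");    # only 3 bytes are in flight

my $sel = IO::Select->new($reader);
my ($buf, $n) = ('', 0);
my @ready = $sel->can_read(1);
if (@ready) {
    # Ask for far more than is waiting: sysread does not block until
    # 1024 bytes arrive; it returns the 3 bytes that are available.
    $n = sysread($reader, $buf, 1024);
}
print "got $n bytes\n";
```

So "how much is waiting" answers itself: ask for a generous chunk and look at sysread's return value.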

Replies are listed 'Best First'.
Re: unbuffered read from socket
by Errto (Vicar) on Dec 29, 2004 at 01:19 UTC
    If all of the data coming from $filehandle is plain text, why not use Perl's normal buffered file ops?
    my $line = <$filehandle>;
    chomp $line;
    Update: If, OTOH, you can't use the buffered ops, then I do think reading one character at a time is your best bet. But you can do better than appending the character with .= each time:
    my $line;
    open my $linefh, '>', \$line or die "open: $!";
    my $char = '';
    until ($char eq "\n") {
        sysread($filehandle, $char, 1);
        print $linefh $char;
    }
    close $linefh;
    The above requires Perl 5.8 I think. If you don't have that you can use IO::Scalar.

    Update 2: I noticed that there's a Net::IRC module already out there. Might this work better for you than playing with sockets directly?

      I do think reading one character at a time is your best bet

      It's generally poor form and very slow to read one char at a time with sysread(); it's unbuffered, as is C's read(), so you're making a system call per character. Ouch. If you ask for 1024 bytes and only one is available, it will be returned in $buf, sysread returning the number of chars actually read. For example:

      my $buf;
      my $nread;
      open(my $filehandle, '<', 'f.tmp') or die "error open: $!";
      while ($nread = sysread($filehandle, $buf, 1024)) {
          print "nread=$nread buf='$buf'\n";
      }

      To illustrate my point, this program:

      use strict;
      my $buf;
      my $nread;
      open(my $filehandle, '<', 'f.tmp') or die "error open f.tmp: $!";
      binmode($filehandle);
      open(my $fhout, '>', 'g.tmp') or die "error open g.tmp: $!";
      binmode($fhout);
      while ($nread = sysread($filehandle, $buf, 1)) {
          print $fhout $buf;
      }
      close($filehandle);
      close($fhout);
      takes 27 seconds to copy file f.tmp to g.tmp, where f.tmp is perl-5.8.6.tar.gz (about 12 MB). If you simply change:
      sysread($filehandle, $buf, 1)
      above to:
      sysread($filehandle, $buf, 1024)
      the time reduces to 0.2 seconds.

        Yes, that is certainly true, but it does increase complexity because the developer then has to do his own buffering. IOW, he only wants one line, but he might well end up reading more than that (up to 1K), so the excess has to be stored somewhere.
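The bookkeeping is manageable, though: keep one carry-over buffer per connection and split complete lines off it after each sysread. A rough sketch (the names buffered_lines and %inbuf are made up for illustration, not from any module):

```perl
use strict;
use warnings;

# One carry-over buffer per connection, keyed by the handle.
my %inbuf;

# Append the chunk sysread just returned, then hand back any complete
# lines; a trailing partial line stays in the buffer for next time.
sub buffered_lines {
    my ($fh, $chunk) = @_;
    $inbuf{$fh} .= $chunk;
    my @lines;
    while ($inbuf{$fh} =~ s/^(.*?)\n//) {
        push @lines, $1;
    }
    return @lines;
}

# A big read may contain one and a half IRC lines:
my @got = buffered_lines('sock1', "PING :irc.example\r\nPRIV");
# @got now holds the complete PING line; "PRIV" waits in %inbuf.
push @got, buffered_lines('sock1', "MSG #chan :hi\n");
```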
      The problem is that combining buffered IO with select is a bad idea from what I understand, and I have seen some flaky behaviour from that combination myself.

      Net::IRC is discontinued and has been replaced by POE::Component::IRC which, although most definitely a terrific module, is not quite what I'm looking for.

      I'll give the option you listed, using print to a filehandle, a try, although right now CPU time seems very reasonable: even when spamming the bot with large amounts of text, CPU time never exceeds 1%. Dunno what'll happen when more processing is added, though, so any improvement suggested is worth a try.