Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

Reading from more than one socket at once

by ahunter (Monk)
on Jul 05, 2000 at 00:07 UTC ( #21054=perltutorial: print w/replies, xml ) Need Help??

It bothered me when I was learning this stuff that the perlipc documentation did not go into enough detail about how to create really fun servers, like the ones used for chat room and so on. Corion asked a question recently touching on this, so I thought it may be useful to go through the correct way to create these things. (Since then, two other people have asked roughly the same question...). Note that I don't have Activestate Perl, so I'm not sure if any of the concepts here differ on platforms other than UNIX and workalikes.

Blocking and buffering

Under UNIX there are various special types of file (sockets being a good example of one of these) whose data is not necessarily all immediately available. When you read to the end of the currently available data on one of these files, UNIX will wait for further data to arrive. This is known as 'blocking', and means that if you are reading from only one file, you don't have to keep checking to see if theres more data available.

This causes some problems - you don't know if a read from one of these files will cause a block or not, so if you want to read from another file while one is waiting for data, there is no obvious way of going about it. In addition, most of the Perl file handling routines are buffered, and Perl will read a few bytes ahead to improve performance. If Perl gets blocked while reading ahead, it will wait until it can get all the data it wants before continuing - however, while it is waiting, it is not returning the data it may have already read. This means that sometimes your program can appear to have blocked before it has received all the data that you know has already been sent!

The other use of select

Just to make sure everyone gets thoroughly confused, the perl select function has two uses. Its original use is to select the default filehandle for output. This isn't particularily exciting. The other use is the UNIX select(2) call. This call blocks your process until one of four different events occur - data becomes available on a filehandle for reading, a filehandle becomes available for writing, an exception occurs on a filehandle or a timeout occurs. The 'data to be read' and 'timeout' functions of select makes it perfect for writing servers which have to deal with more than one simultaneous connection, or other applications where blocking is a problem.

This form of select can be accessed either through the IO::Select package or through the select call itself. I'll focus on the select call as opposed to the module here - the techniques for both are very similar, though. select takes four arguments: filehandles to wait for data to read, filehandles to wait for availability to write, filehandles to wait for exceptions and a timeout. The first three arguments have a slightly weird format, owing to the heritage of the command, and are altered on return to indicate which file handles caused select to stop blocking. To mark a file handle as one you are interested in, you need to set the bit corresponding to that file handle's number, as returned by fileno, using vec, like so:

# FILE1 & FILE2 is a filehandle opened elsewhere. $read is a list of # filehandles we are interested in reading my $read = ''; # Initialise to an empty set # (NOTE: $read=0 is very wrong) vec($read, fileno(FILE1), 1) = 1; # Set the appropriate bit vec($read, fileno(FILE2), 1) = 1; # And for another file...
And now to wait for data to become available for reading on that filehandle, we use select to do the job:
my $nfound = select($read, undef, undef, undef);
The undefs here indicate we aren't interested in writing, exceptions or timeouts at the moment. When select returns, $read is changed to contain the list of filehandles with data waiting, and $nfound contains the number of filehandles in the list. The format is still the bitmap, so you need to use vec once again to test if a file is ready for reading:
# Does FILE1 have data waiting? if (vec($read, fileno(FILE1), 1)) { # ... do stuff ... }

Buffering again

Of course, the same old buffering problems I talked about before still apply, and perl may be over-enthusiastically reading ahead and blocking before you get back to the select, causing hair loss all round. The answer is to never, ever use the standard perl file IO function with sockets. That includes print, eof, the <> notation and just about any file function you can think about. Instead, use sysread and syswrite, which bypass Perl's buffering and record seperation routines and go straight down to the bare metal and just read the raw bytes from the appropriate input stream. You have to deal with newlines and so on yourself, but that's what regular expressions are for. Note that sysread will return undef for an error and 0 for end of file (so you can avoid calling eof) - use $! to get the error message or number (see perlvar).

A multiplexing package

This is a short package that demonstrates how to use select for reading from several file handles, and also for timing out the select function. Note that the select timer can be specified to microseconds (as a decimal), although its exact precision depends on your operating system. To take advantage of this, we use the time function from Time::HiRes, available from CPAN - note that is equivalent to the standard time function, except that it returns a decimal value, providing higher precision.

Anyway, here's the package:

# # Simple multiplexing package # # by Andrew Hunter. All rights given away. # package Multiplex; use strict; # De-rigeur use Carp; # Nicer error reporting use Time::HiRes qw/time/; # High precision time # These structures contain the file objects and timers that we are cur +rently # interested in: my @files = (); my @timers = (); # Function to add a file object to the list to listen to # A file object should be a blessed reference, providing the functions # receive(), called when data becomes available, and file(), which sho +uld # return a reference to a filehandle. sub listen ($) { my ($file) = @_; croak "File object must provide receive and file methods" if (!defined($file->can('receive')) || !defined($file->can('file'))); push @files, $file; } # Function to add a timer object to the list to wait for # A timer object should be a blessed reference, providing the function + timeout, # which is called when it expires. # # This function takes two arguments - the timer object and the length +of # time to wait until timing out. sub timeout ($$) { my ($timer, $howlong) = @_; croak "Timer object must provide timeout method" if (!defined($timer->can("timeout"))); push @timers, { what => $timer, when => time()+$howlong }; @timers = sort { $a->{when} <=> $b->{when} } @timers; # Yeah, the sort is probably inefficient with large numbers of tim +ers } # This removes a timeout from the list. This takes a reference to a bl +essed # timer object. It should be the same as the reference passed to timeo +ut. sub removetimout ($) { my ($timer) = @_; @timers = grep { $_->{what} ne "$timer" } @timers; } # Actually do the select business itself! # This should be repeatedly called to create a feeling of interactivit +y sub despatchevents () { my $now = time(); # Send out any timeouts that have expired while ($#timers >= 0 and $timers[0]->{when} < $now) { $timers[0]->{what}->timeout(); shift @timers; $now = time(); } # Set up the file handles to wait for my $rin = ''; vec($rin, fileno($_->file()), 1) = 1 foreach (@files); # Actually do the select my $rout; select($rout=$rin, undef, undef, $#timers>=0?$timers[0]->{when} - $now:undef); # Notify any files that have data waiting foreach (@files) { $_->receive() if (vec($rout, fileno($_->file()), 1)); } } # == return 1;

A TCP acceptor class

To demonstrate the Multiplex class, here is a TCP acceptor. Derive your own objects from it, and override the accepted() method to accept client sockets. Creating a similar object to deal with the client sockets themselves is left as an exercise to the reader (don't forget the importance of only using sysread :-)
# # TCP listener socket # # by Andrew Hunter. All rights given away. # package tcpAccept; use strict; use Carp; use Multiplex; use Socket; # Creates a new object. Call like this: # # tcpAccept->new(port => 5454), where port specifies the port you want + to # listen on sub new { my $proto = shift; my $class = ref($proto) || $proto; my %args = @_; my $self = \%args; bless($self, $class); local *SOCKET; # Filehandle for the socket we're going to +create # Some error checking croak "You must give a port for the socket" if (!defined($self->{port})); # Create a TCP socket socket(SOCKET, PF_INET, SOCK_STREAM, getprotobyname('tcp')) or croak "socket: $!"; # Set the 'REUSEADDR' option setsockopt(SOCKET, SOL_SOCKET, SO_REUSEADDR, pack("l", 1)) or croak "setsockopt: $!"; # Bind to the port specified bind(SOCKET, sockaddr_in($self->{port}, INADDR_ANY)) or croak "bind: $!"; # Listen to the socket listen(SOCKET, SOMAXCONN) or croak "listen: $!"; # Store the socket filehandle away for future reference $self->{_FILE} = *SOCKET; return $self; } # file() function, as defined by the Multiplex module sub file { my ($self) = @_; return $self->{_FILE}; } # receive() function, as defined by the Multiplex module sub receive { my ($self) = @_; my $client; { local(*CLIENT); # The client socket we will create # Accept the connection that is waiting accept(CLIENT, $self->{_FILE}) or die "accept: $!"; $client = *CLIENT; } # Report the accepted socket $self->accepted($client); } # Override this with your own function sub accepted { my ($self, $client) = @_; # Display a silly message and close the socket syswrite $client, "Implement me\n", length("Implement me\n"); close $client; } # == return 1;
For completeness, here is the perl file I used to test these two modules:
package sillyTimer; use strict; use Multiplex; # Example timer class sub new { my $proto = shift; my $class = ref($proto) || $proto; my %args = @_; my $self = \%args; bless($self, $class); } # Print 'bing' every 5 seconds sub timeout { print "Bing!\n"; Multiplex::timeout(sillyTimer->new(), 5); } package Main; use strict; use Multiplex; use tcpAccept; my $acceptor = tcpAccept->new(port => 20000); Multiplex::listen($acceptor); Multiplex::timeout(sillyTimer->new(), 1); Multiplex::timeout(sillyTimer->new(), 1.5); for (;;) { Multiplex::despatchevents() }

Replies are listed 'Best First'.
RE: Reading from more than one socket at once
by splinky (Hermit) on Jul 05, 2000 at 01:09 UTC
    Overall, good stuff. But I think you better doublecheck that "Perl will read a few bytes ahead to improve performance" claim. I don't recall coming across that anywhere else before. In particular, I've used print and <> with sockets before with no problems. The key is to not mix them with sysread and syswrite.

    Also, I think IO::Socket and IO::Select deserve more space. After becoming comfortable with them, I far prefer them over the raw Perl stuff, which looks more like C than Perl. There's a good example of both IO::Socket and IO::Select at the end of perldoc IO::Select. Of course, it should be noted that the major limitation of IO::Select is that it doesn't allow you to wait for reading, writing, and exceptions simultaneously. You have to pick one. But, for the example you show, that's not a limitation.

    Finally, one minor thing: "get all the data it wants before continue" should be "...continuing".

    But, overall, pretty good stuff. Have a Scooby snack on me.


      The 'read a few bytes ahead' stuff is true. I think it's actually done by stdio rather than perl, and you definately see odd effects because of it. It's not usually consistent, so it tends to show up as a heisenbug...

      You won't have a problem if the socket is closed after the client writes its data, because perl (or stdio ;-) will see the EOF and stop reading, but if there's a delay between sending data, a few bytes that were sent before you stopped might only appear at the server after you've restarted (or the socket closes). It's not usually a problem, but I once had some code that failed about 1 in 25 times due to occasionally stopping at the wrong point. stdio does this because reading many bytes can be much faster than reading one at a time, and then future reads can be from memory.

      As to IO::Socket and IO::Select, I agree - I might add a section when I get some time. I went off IO::Socket a while back after (yet more) blocking problems that never got diagnosed. Oddly, *that* only happened when a perl process was communicating with another perl process. Telnet was fine... That code was a mess, though, so it's probably that I messed up somewhere.

      I'll get the typo fixed, too (doh)

      Update: No, I was wrong - the problem is more often caused by write buffers. As these tend to be small, and are often not a factor (optimised out?) when you're communicating between processes on the same machine, people don't notice them, and think that autoflush is all you need to get things working properly. I've seen it fail on Solaris, usually when you get a script to talk to another script (rather than a telnet connection, but not always), and usually just after the connection is established (but not always...). I've never seen the problem on a Linux box, but I haven't played that much with networked ones. Still can't produce code that reliably demonstrates it, though, but the problem usually manifests as one or both scripts blocking indefinately when they should be sending/receiving data.


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perltutorial [id://21054]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (7)
As of 2017-01-22 18:45 GMT
Find Nodes?
    Voting Booth?
    Do you watch meteor showers?

    Results (189 votes). Check out past polls.