 
PerlMonks  

Using select and IO::Select

by duff (Parson)
on Jul 04, 2004 at 16:49 UTC ( [id://371720]=perltutorial )

Here's a tutorial I started on using select (primarily because I saw a query from tachyon on it and I figured it was a good thing to do). This is just a first cut, so be kind and send me your comments. :-)

Also, is there a way for perlmonks.org to automagically handle POD format?

=head1 NAME

perlselecttut - a tutorial on using select() and IO::Select

=head1 DESCRIPTION

This document attempts to explain how to use the 4-argument version of
C<select()> and the module C<IO::Select> so that they are easier to
understand. If anything is unclear, the author of this document would
appreciate hearing about it.

=head2 What are C<select()>/C<IO::Select> and why would you use them?

Without going into too much detail (none at all, in fact), C<select()>
and C<IO::Select> are useful when you want to "watch" several
filehandles at once to see whether they are ready for reading or
writing, or whether any of them has an exception condition. When I say
"filehandles" I really mean anything that can be read from or written
to as if it were a file: sockets, serial devices, pipes, FIFOs, etc.

An example application that would use C<select()> is a chat server.
Each client connected to the server occupies its own socket
(filehandle), so the server would use C<select()> to see what each
client says without having to wait on any particular client.

Caveat lector! This document is mostly written from a Unix-centric
point of view, but similar ideas apply on other operating systems (on
Windows, for instance, C<select()> only works on sockets).

=head2 All about C<select()>

Here's the canonical C<select()> example that you will find in the
documentation:

    ($nfound,$timeleft) = select($rout=$rin, $wout=$win, $eout=$ein, $timeout);

But what does it all mean? Why the assignments? Why those particular
variable names?

The first 3 arguments to C<select()> tell it which filehandles to watch
for reading, writing, and exceptions respectively (thus the leading r,
w, and e on each variable). Each of these arguments is a special
bitvector in which each "on" bit corresponds to the file descriptor
number of a filehandle being watched. The fourth argument to
C<select()> is how long (in seconds, possibly fractional) to watch the
filehandles before giving up.
The C<select()> call returns a count of how many filehandles have
triggered one of the reading/writing/exception conditions (C<$nfound>,
which is -1 on error, with the reason in C<$!>) and the amount of time
left of the timeout specified (C<$timeleft>), although most systems
don't return anything useful for C<$timeleft>.

=head3 What are file descriptors?

Each time you open a file, the operating system makes an entry in
something called a file descriptor table so that it can keep track of
the file (things like the current position and the mode it was opened
with). By default the entry at index 0 is STDIN, index 1 is STDOUT, and
index 2 is STDERR. The index into the file descriptor table is called
the file descriptor number, or C<fileno> for short. It's these indexes
that are used as offsets into the bitvector to tell C<select()> which
filehandles you wish to watch.

=head3 Using C<vec()> to build bitvectors

A bitvector is an odd thing for Perl because it's so low level. Having
to fiddle with bitvectors shows how close C<select()> is to its C
heritage. Luckily, Perl remembers its heritage through a function
called C<vec()>. To create a bitvector for use with C<select()> with
the proper bit turned on for STDIN you would write:

    vec($vector,fileno(STDIN),1) = 1;

C<fileno()> is a routine that returns the index into the file
descriptor table for a given filehandle (aka the fileno). Since we know
STDIN is at index 0, we could have written

    vec($vector,0,1) = 1;

But that would be I<extremely> unportable. If, for some reason, our
program ran on a system that used 0 for some other filehandle, it
wouldn't work as expected. This I<could> happen if we closed STDIN and
then opened another file: the operating system typically hands out the
lowest free file descriptor when you open a new file, and since we
closed STDIN, descriptor 0 would be free.
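The pieces described so far (building a read bitvector with C<vec()>
and C<fileno()>, calling C<select()>, and reading C<$nfound>) can be
seen together in the short sketch below. It assumes a Unix-like system,
where C<select()> works on pipes; the pipe and the variable names are
illustrative choices, not part of the tutorial's examples. It polls
once before any data is written and once after, then checks the bit in
the I<output> vector.

```perl
use strict;
use warnings;

# Sketch only: assumes a Unix-like system, where select() works on pipes.
pipe(my $reader, my $writer) or die "pipe: $!";

# Build the read bitvector: set the bit whose offset is $reader's
# file descriptor number.
my $rin = '';
vec($rin, fileno($reader), 1) = 1;

# Nothing has been written yet, so a zero timeout (a pure poll) should
# find no handle ready.
my $rout;
my ($nfound_before) = select($rout = $rin, undef, undef, 0);

# Write without Perl-level buffering so the data reaches the pipe now.
syswrite($writer, "hello\n") or die "syswrite: $!";

# Wait up to 1 second; $rin stays intact because select() gets a copy.
my ($nfound_after) = select($rout = $rin, undef, undef, 1);

# select() left on only the bits of handles that are actually ready.
my $reader_ready = vec($rout, fileno($reader), 1);

my $got = '';
sysread($reader, $got, 1024) if $reader_ready;
print "nfound before=$nfound_before after=$nfound_after, read: $got";
```

Note that C<$rin> still has its bit set after both calls, which is
exactly why the copy-then-select idiom is used.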

The call C<vec($vector,fileno(STDIN),1) = 1> means: treat C<$vector> as
a bitvector, take the single bit at the position given by STDIN's file
descriptor number, and set that bit to 1. So, to watch several
filehandles you would do this:

    vec($vector,fileno(FOO),1) = 1;
    vec($vector,fileno(BAR),1) = 1;
    vec($vector,fileno(BAZ),1) = 1;

and then use C<$vector> in the call to C<select()> to watch the FOO,
BAR, and BAZ filehandles. See C<perldoc -f vec> for more information on
C<vec()>.

=head3 Why are we doing assignments in the C<select()> call?

The reason you typically see assignments in the first three positions
is that C<select()> modifies its first 3 arguments to tell you
I<which> filehandles have data ready for reading or writing or
exceptions, and you usually want to watch the same filehandles over
and over again. The following

    select($rout=$rin, $wout=$win, $eout=$ein, $timeout);

functions the same as

    $rout = $rin;
    $wout = $win;
    $eout = $ein;
    select($rout, $wout, $eout, $timeout);

so the assignments allow C<$rin>, C<$win>, and C<$ein> to keep their
original values, and you can call C<select()> in a loop without having
to rebuild the bitvectors each time. Contrast these two functionally
equivalent snippets:

    # Example 1, the usual idiom
    vec($rin,fileno(FOO),1) = 1;
    vec($rin,fileno(BAR),1) = 1;
    vec($rin,fileno(BAZ),1) = 1;
    while (1) {
        ($found) = select($rout=$rin,undef,undef,$timeout);
        next unless $found > 0;    # 0 means timeout, -1 means error
        # Check $rout to see which handles are ready for reading
    }

    # Example 2, building the vectors each time
    while (1) {
        $rinout = '';
        vec($rinout,fileno(FOO),1) = 1;
        vec($rinout,fileno(BAR),1) = 1;
        vec($rinout,fileno(BAZ),1) = 1;
        ($found) = select($rinout,undef,undef,$timeout);
        next unless $found > 0;
        # Check $rinout to see which handles are ready for reading
    }

Oh, by the way, you'll notice that I used C<undef> for the second and
third arguments to C<select()>.
If you don't care to check any filehandles for reading, writing, or
exceptions, you can pass C<undef> in the respective position and
C<select()> won't bother paying attention to that condition. You can
also pass C<undef> for the timeout and C<select()> will wait forever
for a filehandle to trigger the appropriate condition.

=head3 Checking which filehandles are ready

After C<select()> returns, the three bitvectors will have changed to
reflect the actual filehandles that triggered the particular condition
you were waiting for. One way to check which filehandle is ready is to
use C<vec()> again to see if the particular bit is 1. For example:

    vec($rin,fileno(FH1),1) = 1;
    vec($rin,fileno(FH2),1) = 1;
    vec($rin,fileno(FH3),1) = 1;
    while (1) {
        ($found) = select($rout=$rin,undef,undef,$timeout);
        next unless $found > 0;
        if (vec($rout,fileno(FH1),1) == 1) {
            # There is data waiting to be read on FH1
        }
        if (vec($rout,fileno(FH2),1) == 1) {
            # There is data waiting to be read on FH2
        }
        # and so on ...
    }

Another method would be to use C<select()> again, with a timeout of 0
so that it only polls instead of waiting:

    vec($fh1,fileno(FH1),1) = 1;
    vec($fh2,fileno(FH2),1) = 1;
    vec($fh3,fileno(FH3),1) = 1;
    $rin = $fh1 | $fh2 | $fh3;
    while (1) {
        ($found) = select($rout=$rin,undef,undef,$timeout);
        next unless $found > 0;
        # Pass a copy: select() would otherwise clear the bit in $fh1
        if (select($tmp=$fh1,undef,undef,0) > 0) {
            # There is data waiting to be read on FH1
        }
        if (select($tmp=$fh2,undef,undef,0) > 0) {
            # There is data waiting to be read on FH2
        }
        # and so on ...
    }

By building individual bitvectors for each filehandle and then
combining them with a bitwise OR, we can check whether I<any> of the
filehandles is ready with the combined bitvector, or whether an
individual filehandle is ready with the individual bitvectors.

Note that several filehandles may be ready at once, so it would be
prudent to service as many of the ready filehandles as you can before
calling C<select()> again.

=head3 I know which filehandles are ready. Now what?


After you have set up C<select()> and determined which filehandles are
ready, you'll want to read from those that are ready for reading, write
to those that are ready for writing, and do whatever is necessary to
those that have an exception condition.

[ To be honest, I've never used the ability to check for exception
conditions on filehandles and I have little understanding of what it
may be for. The only reference I have handy at the moment, Stevens'
I<Advanced Programming in the Unix Environment>, says "... an exception
condition corresponds to (a) the arrival of out-of-band data on a
network connection, or (b) certain conditions occurring on a pseudo
terminal that has been placed into packet mode" ]

But you must be careful how you read or write data on the filehandle.
Buffered I/O like C<readline()> (aka the diamond operator, C<< <> >>)
or C<read()> won't work quite right: Perl may pull more data into its
own buffer than you asked for, and since C<select()> only knows about
data the operating system still holds, it can report a handle as not
ready while unread data sits in Perl's buffer (and C<readline()> can
block waiting for a newline that hasn't arrived yet). So you need to use
C<sysread()> and C<syswrite()> to read from and write to the ready
filehandles.

=head2 IO::Select

As you can tell by now, C<select()> isn't the friendliest of routines
to use. Luckily you have another option: C<IO::Select>. C<IO::Select>
is an object-oriented interface that sits on top of the basic
C<select()> routine so that you never have to see bitvectors and
strange assignments. You deal only with C<IO::Select> objects and the
filehandles themselves. Here's a simple example:

    use IO::Select;
    my $sel = IO::Select->new;
    $sel->add(\*FOO);
    $sel->add(\*BAR);
    $sel->add(\*BAZ);
    if (my @fh = $sel->can_read($timeout)) {
        # Each filehandle in @fh is ready to be read from
    }

The basic usage is simple: you create an IO::Select object (possibly
initializing it with filehandles), add new filehandles to the object
using the C<add()> method, and when you're ready to "watch" the
filehandles you call one of the C<can_read()>, C<can_write()>, or
C<has_exception()> methods on the object.
Each of these methods returns a list of the filehandles that are ready
(an empty list on timeout or error), so you can read from or write to
them directly.

=head1 AUTHOR

Jonathan Scott Duff duff@cpan.org

Re: Using select and IO::Select
by Zaxo (Archbishop) on Jul 04, 2004 at 20:15 UTC

    duff++, good job and much needed.

    One of the great things about select is its efficiency. Instead of spinning in a polling loop, select lets your process sleep until there is something to do. That does not speed up the process that calls select, but rather frees its time slice for all the other processes you and other users may run. It's the neighborly thing to do, and good for throughput.

    There are some more questions I think could be covered. I don't necessarily know the answers.

    • I/O methods: When can Perl's buffered I/O methods be used with select? The perldoc warns that sysread (and by extension, syswrite) is necessary, except as provided by POSIX. What does POSIX permit?
    • Signal handling: Do signals awake a sleeping select? Does a select timeout affect a pending alarm?

      (Added) This is readily checked with a few one-liners.

      $ perl -e'alarm 1;printf "Num: %d\tTime left: %f\n", select undef, undef, undef, 3.0'
      Alarm clock
      $

      shows that setting a timeout in select does not interfere with SIGALRM, and that the signal cuts a pending select short (here the default SIGALRM action kills the process, hence "Alarm clock").

      $ time perl -e'alarm 5;printf "Num: %d\tTime left: %f\n", select undef, undef, undef, 3.0'
      Num: 0 Time left: 0.000000
      $

      shows that having an alarm set does not interfere with select timing.

      $ perl -e'$SIG{ALRM}=sub {};alarm 1;printf "Num: %d\tTime left: %f\n", select undef, undef, undef, 3.0'
      Num: -1 Time left: 2.000000
      $

      shows that catching a signal will jolt select into returning with -1 in the number slot. On Linux the time left value would be useful in graceful recovery from such interruptions.

    • Return values: What's a good use for the number of ready channels? What systems return something useful for the time remaining? Linux does, are there others?

      (Added) A count of zero means select returned because the timeout expired; as we saw, -1 means it was interrupted by a caught signal. The count can be decremented as each ready channel is handled, giving a quick test for completion. The time-left value appears to be useful only on Linux. Running

      $ perl -e'printf "OS: %s\tNum: %d\tTime left: %f\n", $^O, select undef, undef, undef, 1.5'

      gives, on several systems:
      OS: linux Num: 0 Time left: 0.000000 (Zaxo)
      OS: freebsd Num: 0 Time left: 1.500000 (sporty)
      OS: solaris Num: 0 Time left: 1.500000 (sporty)

      Thanks to sporty for his assistance with that.

    • IO::Select: Does the very welcome sugar coating relax the restrictions on I/O methods?
    • IO::Select: Does IO::Select return IO::* objects or just whatever globlike things you add?
    • IO::Select: A more complicated example would be welcome. Something with error handling would be good.
    • Applications: What is a good way to get per-handle switching of I/O? That is needed for many sorts of servers, where responses to requests need to be directed to a particular channel.
    • Applications: Adding and removing handles on the fly. This is needed for servers which open new sockets in response to requests to a well-known port.
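    Several of these questions (error handling with IO::Select, removing handles on the fly, and coping with an interrupted select) can be illustrated together. What follows is a minimal sketch assuming a Unix-like system: two pipes are hypothetical stand-ins for client sockets, and the names alpha/beta, the 5-second timeout, and the 4096-byte read size are arbitrary. It reads only with sysread, retries on EINTR, and removes each handle from the IO::Select set when sysread reports EOF.

```perl
use strict;
use warnings;
use IO::Select;
use Errno qw(EINTR);

# Sketch only: assumes a Unix-like system. Two pipes stand in for client
# sockets; each writer sends one line and is then closed, so each reader
# sees its data followed by EOF.
my (@readers, %name_of);
for my $name (qw(alpha beta)) {
    pipe(my $r, my $w) or die "pipe: $!";
    syswrite($w, "$name\n") or die "syswrite: $!";
    close $w;
    push @readers, $r;
    $name_of{ fileno($r) } = $name;
}

my $sel = IO::Select->new(@readers);
my %received;

# Keep going while any handle is still being watched.
while ($sel->count) {
    $! = 0;                        # lets a timeout be told apart from an error
    my @ready = $sel->can_read(5);
    unless (@ready) {
        next if $! == EINTR;       # a caught signal interrupted select(): retry
        die $! ? "select: $!" : "timed out waiting for data\n";
    }
    for my $fh (@ready) {
        my $n = sysread($fh, my $buf, 4096);
        if (!defined $n) {
            next if $! == EINTR;   # still ready; it will be retried next pass
            die "sysread: $!";
        }
        if ($n == 0) {             # EOF: stop watching, then close
            $sel->remove($fh);
            close $fh;
            next;
        }
        $received{ $name_of{ fileno($fh) } } .= $buf;
    }
}

print "$_: $received{$_}" for sort keys %received;
```

    Note that remove() is called before close(), since IO::Select identifies handles by their file descriptor. In a real server the listening socket would typically also be in the set, with newly accepted client sockets passed to add() when it becomes readable; that part is not shown here.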

    Again, this is a very good job, and welcome information. You can run your pod through pod2html to get a good format for posting here.

    After Compline,
    Zaxo

      Signals are unsafe: they basically interrupt anything with no regard to whether it is a good idea or a good time. See Re: Dormus interruptus if you want to test it; they do interrupt selects before the timeout expires.
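      One way to cope with such interruptions, sketched here under the assumption of a Unix-like system with alarm(), is to treat a -1 return with $! set to EINTR as "go round again", and to recompute the remaining wait from the clock (via Time::HiRes) instead of trusting select's own time-left value, which the one-liners above show is only meaningful on some systems. The 2-second total wait and the 1-second alarm are arbitrary choices.

```perl
use strict;
use warnings;
use Errno qw(EINTR);
use Time::HiRes qw(time);

# Sketch only: assumes a Unix-like system with alarm(). Waits a fixed
# total time with select() even if a caught signal interrupts it, by
# recomputing the remaining time from the clock.
my $interrupts = 0;
$SIG{ALRM} = sub { $interrupts++ };   # a caught signal makes select() return -1
alarm 1;

my $total    = 2;                     # seconds to wait in all
my $start    = time;
my $deadline = $start + $total;
while ((my $left = $deadline - time) > 0) {
    my $n = select(undef, undef, undef, $left);
    if ($n == -1) {
        next if $! == EINTR;          # interrupted: wait out the rest
        die "select: $!";
    }
    last;                             # 0: the timeout expired normally
}
my $elapsed = time - $start;
printf "interrupted %d time(s), waited %.1f s\n", $interrupts, $elapsed;
```

      The same retry-on-EINTR check applies when real filehandles are being watched; there the bitvectors would also need to be re-copied from the originals before each retry.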

      cheers

      tachyon

      Thanks and thanks. If you do know any of the answers to your questions you could send them to me and make my life slightly easier :) I plan on updating the tutorial over the coming week by addressing the questions you raise (and any others I can think of).

Re: Using select and IO::Select
by tachyon (Chancellor) on Jul 05, 2004 at 00:20 UTC
