http://www.perlmonks.org?node_id=985317

w1r3d has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys,

I've been trying to figure out how select->can_read works, and whether it's even what's causing my frustration.

The situation:

I currently have a cron job that runs nightly on a bunch of machines and sends the results via syslog to a central log server. On the log server, there's a script (script1.pl) that reads the syslog messages file and looks for certain keywords, then copies the messages out to different log files. For this example, I'll call one of those log files sub.log.

There's yet another script (script2.pl) that tails sub.log and all the other logs written by script1.pl, again looking for more specific keywords. If it finds a match, then it writes the message out to another log file (reduced.log).

The problem I'm seeing is that while everything gets successfully copied by script1.pl to sub.log, not everything gets copied to reduced.log - at least not right away.

The way script1.pl works is by doing:

    open(TAIL, "tail -f /var/log/messages |") or die "etc";
    while (<TAIL>) {
        if (/type1/) { print SUBLOG1 $_; }
        if (/type2/) { print SUBLOG2 $_; }
        # etc
    }

script2.pl, however, works by doing something like:
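
    use IO::Select;

    my $sel = IO::Select->new();
    for my $obj_file (@obj_files) {    # @obj_files: the array of log filenames
        # "my" gives each pipe its own handle instead of reopening one shared glob
        open(my $handle, "tail -f $obj_file |") or die "Can't tail $obj_file: $!";
        $sel->add($handle);
    }
    while (my @ready = $sel->can_read) {
        foreach my $fh (@ready) {
            my $line = <$fh>;    # buffered readline: Perl may read ahead past this line
            if ($line =~ /keyword/) {
                print REDUCEDLOG $line;
            }
        }
    }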

What ends up happening is this: say 15 lines get written to sub.log (I can verify this by tailing it from the terminal) and 5 of them match the keyword I'm looking for. Only two of those lines actually get written to reduced.log (the number "two" is made up; I haven't figured out a pattern yet). If I then run a test script that writes 15 more lines to sub.log, reduced.log gets the remaining 3 lines from the previous run, plus *some* from the latest run. So reduced.log is never up to date with the data in sub.log, and if the script dies for whatever reason, I "lose" that data.

At first I thought the problem had to do with buffer flushing, so I added autoflush to pretty much all the filehandles, to no avail. It seems as if select is getting all the data and I'm just not processing it correctly or something :/

Thanks in advance!

Pedro

Re: Stumped with select->can_read (buffered)
by tye (Sage) on Aug 03, 2012 at 19:17 UTC
    while(@ready = $sel->can_read)
    #...
        $line = <$fh>;

    Don't mix buffered I/O with select. can_read() doesn't return the handle because, at the file-descriptor level, there is nothing left to read from it. But <$fh> has already pulled plenty of data into Perl's buffer, and that data is still waiting to be read.

    - tye        
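A minimal, self-contained sketch of the effect tye describes, assuming a Unix-like system. An in-process pipe stands in for the `tail -f` pipes (the writer stays open, so there is no EOF), and autoflush is turned on for the writer to show that flushing the writing side does not help: after a single `<$reader>` call, `can_read(0)` typically reports nothing ready even though two lines are still sitting in Perl's buffer.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

# Stand-in for a "tail -f" pipe: the writer stays open, so there is no EOF
pipe(my $reader, my $writer) or die "pipe: $!";
$writer->autoflush(1);    # flushing the writer does not change the outcome
print {$writer} "line 1\nline 2\nline 3\n";

my $sel = IO::Select->new($reader);

# Buffered readline: Perl typically pulls all three lines off the pipe
# in one read and hands back only the first.
my $first = <$reader>;

# Zero-timeout poll: the pipe itself is now empty, so nothing is "ready"...
my @ready = $sel->can_read(0);
print "after <\$reader>: ", scalar(@ready), " ready handle(s)\n";

# ...yet the next line comes straight out of Perl's buffer without blocking.
my $second = <$reader>;
print "still buffered: $second";
```

This is exactly the lag in the original question: select only sees the kernel side of the pipe, while the unread lines wait in Perl's buffer until more data arrives and wakes can_read up again.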

      That makes sense. Would I have to use read() in this case?

      I tried doing:

      while(@ready = $sel->can_read) {
          foreach my $fh(@ready) {
              my $line = "";
              my $buf = "";
              while(read($fh, $buf, 1024)) {
                  $line .= $buf;
              }
          }
      }

      Didn't work for me, maybe because LENGTH (=1024) is too big? Sorry, I'm not familiar with read(), as you can see :/

      EDIT: nvm. I'm reading the link you posted now. Will report back if I still can't figure it out. Thanks!

        It's best to use sysread, but you will have to figure out a way to detect line endings yourself: for example, concatenate the sysread data into a temp buffer, then pull complete lines off it with a regex or split. Read perldoc -f sysread. sysread does a single unbuffered read, so once can_read has reported the handle as ready, it returns whatever is available, up to LENGTH, without waiting for more. The code below asks for 1024-byte chunks, but will get less if that is all there is.
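        while (my @ready = $sel->can_read) {
            foreach my $fh (@ready) {
                my $line = "";
                my $buf = "";
                # while(read($fh, $buf, 1024))
                # One sysread per can_read: looping would block once a
                # "tail -f" pipe is drained, since it never reaches EOF.
                my $bytes_read = sysread($fh, $buf, 1024);
                if ($bytes_read) {
                    print "$bytes_read\n";
                    $line .= $buf;
                }
            }
        }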

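The sysread properties relied on above can be checked in isolation. This sketch assumes a Unix-like system and again uses an in-process pipe as a stand-in for `tail -f`. It shows that sysread returns fewer bytes than requested instead of waiting, that select's answer stays accurate because nothing is hidden in a Perl-side buffer, and one way to split complete lines off a temp buffer while keeping a partial line for the next read. The variable names are illustrative only.

```perl
use strict;
use warnings;
use IO::Handle;
use IO::Select;

pipe(my $reader, my $writer) or die "pipe: $!";
$writer->autoflush(1);
print {$writer} "abc\ndef\n";    # 8 bytes; writer stays open, like tail -f

my $sel = IO::Select->new($reader);
my $buf = '';

# Ask for up to 1024 bytes: sysread returns what the pipe holds right now
# (it would only block if the pipe were empty).
my $got = sysread($reader, $buf, 1024);
print "sysread returned $got bytes\n";

# The pipe is drained and no data is hidden in a Perl buffer,
# so "not ready" is now the truth.
my $ready_after_drain = () = $sel->can_read(0);

# A partial line arrives; select sees it immediately.
print {$writer} "ghi";
my $ready_after_write = () = $sel->can_read(0);

# Append (OFFSET = length $buf) rather than overwrite.
sysread($reader, $buf, 1024, length $buf);

# Pull complete lines off the front; the unterminated "ghi" stays in $buf.
my @lines;
push @lines, $1 while $buf =~ s/^(.*\n)//;
print "lines: ", scalar(@lines), ", leftover: '$buf'\n";
```

Keeping the leftover partial line per handle between reads is the part that has to survive across can_read calls; the next reply does that with a hash keyed by filehandle.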
Re: Stumped with select->can_read (example)
by tye (Sage) on Aug 04, 2012 at 17:02 UTC

    For example:

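    my $sel= ...;
    my $bytes= 16*1024;
    my %buf;
    my @ready;
    while( @ready= $sel->can_read() ) {
        foreach my $fh ( @ready ) {
            for my $buf ( $buf{$fh} ) {
                $buf = '' if ! defined $buf;
                # Append new data after any leftover partial line
                my $eof= ! sysread( $fh, $buf, $bytes, length($buf) );
                # Complete lines first; at EOF, also an unterminated final line
                while( $buf =~ s/^(.*\n)// || $eof && $buf =~ s/^(.+)$// ) {
                    my $line = $1;
                    if( $line =~ /keyword/ ) {
                        print REDUCEDLOG $line;
                    }
                }
                # A handle at EOF stays "readable"; drop it to avoid spinning
                $sel->remove($fh) if $eof;
            }
        }
    }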

    - tye        
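To try tye's loop without a log server, here is a self-contained sketch, assuming a Unix-like system. A pipe that is written and then closed replaces the `tail -f` pipes, matches are collected in `@matched` instead of being printed to REDUCEDLOG, and `$bytes` is deliberately tiny (4 instead of 16*1024) so that lines straddle sysread chunks. It also removes the handle from the select set at EOF; without that, can_read would keep reporting the finished handle and the loop would never exit.

```perl
use strict;
use warnings;
use IO::Select;

# Stand-in for the "tail -f" pipes: write everything, then close the writer
pipe(my $reader, my $writer) or die "pipe: $!";
print {$writer} "one keyword\ntwo\nthree keyword\nfour keyword";  # no final "\n"
close $writer or die "close: $!";    # reader sees EOF after the data

my $sel   = IO::Select->new($reader);
my $bytes = 4;    # tiny on purpose, so lines straddle sysread chunks
my %buf;
my @matched;

while ( my @ready = $sel->can_read() ) {
    foreach my $fh (@ready) {
        for my $buf ( $buf{$fh} ) {    # $buf aliases $buf{$fh}
            $buf = '' if ! defined $buf;
            # Append after any leftover partial line; 0 (EOF) or undef (error) ends input
            my $eof = ! sysread( $fh, $buf, $bytes, length($buf) );
            # Complete lines first; at EOF, also an unterminated final line
            while ( $buf =~ s/^(.*\n)// || $eof && $buf =~ s/^(.+)$// ) {
                my $line = $1;
                push @matched, $line if $line =~ /keyword/;
            }
            if ($eof) {
                $sel->remove($fh);    # otherwise can_read keeps reporting it
                close $fh;
            }
        }
    }
}

print for @matched;
```

With a real `tail -f` pipe the EOF branch only runs if tail itself exits, but the per-handle buffer is what keeps "three keyword" intact when it arrives split across several sysread calls.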

      Wow, awesome! I put that into my code, and it seems to work great! Thanks!

      I do have a question, though. I'm not sure what the line "for my $buf ( $buf{$fh} )" is doing. Is the hash %buf getting initialized as the $buf variable gets populated by the sysread calls? That's the only line in the code that I'm struggling to understand. :/

      Thanks again!

      Pedro

        It just makes $buf an alias for $buf{$fh} so I don't have to type the {$fh} part over and over.

        - tye
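A tiny illustration of the aliasing tye describes: inside `for my $buf ( $buf{$fh} )`, assigning to `$buf` writes straight into the hash element, so the leftover partial line persists across passes of the outer loop. The string key 'handle1' is a hypothetical stand-in; in tye's code the key is the filehandle itself, which stringifies to something like `GLOB(0x...)` and so is distinct for each open handle.

```perl
use strict;
use warnings;

my %buf;
my $key = 'handle1';    # hypothetical key; tye's code uses the filehandle

for my $buf ( $buf{$key} ) {      # $buf is an alias for $buf{$key}, not a copy
    $buf = '' if ! defined $buf;  # this stores '' in $buf{$key}
    $buf .= "partial ";
}

for my $buf ( $buf{$key} ) {      # a later pass sees the same element
    $buf .= "line\n";
}

print $buf{$key};                 # prints "partial line\n"
```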