PerlMonks  

selecting from a number of different input sources.

by jasoncollins (Novice)
on Jul 21, 2010 at 03:54 UTC (#850549=perlquestion)
jasoncollins has asked for the wisdom of the Perl Monks concerning the following question:

Hi, my script reads from two text input files, each of which gets updated every once in a while. I am supposed to look for a pattern in these input files and, when I find it, print it to a socket.

Problem is that if you try to read from a file which has no new data in it, then the io->getline() function will get blocked, and you won't be able to read from the other file even if it is getting populated with valid data.

I have been looking at different solutions, but in every solution I seem to hit a roadblock, so please help!

Solution 1: Use IO::Select, and IO::Handle: In this solution I can select a file descriptor which is ready to be read from using:

    # $select is an IO::Select object holding both file handles
    my @canBeRead = $select->can_read();
    foreach my $elem (@canBeRead) {
        my $line = $elem->getline();
    }

From what I have read, it is not a good idea to mix IO::Select and the buffered getline() from IO::Handle: select then gets confused and tells you there is data to be read when in reality there is none (because IO::Handle has its own buffers), causing all sorts of issues. Is that correct? Has anyone used getline and IO::Select together successfully? Note, I don't want to use sysread because I am interested in getting an entire line and not bytes.

Solution 2: I looked into a multi-threaded solution. This was so that I won't have to rely on select, and each thread could maintain its own file handles and won't get blocked. But as it turns out, in Perl threads filehandles are shared across threads by default. So now:

- I create a file handle for each input file.
- I create two threads to process the data in isolation.
- when initiating the function call to the threaded function, I pass the file descriptors as well, so that the same function can work with both file descriptors:

    $th1 = threads->create('foo', $firstFD);
    $th2 = threads->create('foo', $secondFD);

    sub foo {
        my $fh = shift;
        # use $fh to read from the file
    }

The problem is that file descriptors are shared by default among all threads! So passing file descriptors to foo does no good, as foo already has them as globals (and I read that passing file descriptors amongst threads is a bad idea). So I don't know how I can use different file descriptors within foo to read data from two separate files without either of them getting blocked.

Thank you!

Re: selecting from a number of different input sources.
by Khen1950fx (Canon) on Jul 21, 2010 at 05:49 UTC
    IO::BufferedSelect might work for you. It was designed to work on lines rather than characters or bytes. This worked for me.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use IO::BufferedSelect;

    # Open these on your input sources before entering the loop.
    my $fh1;
    my $fh2;

    my $bs = IO::BufferedSelect->new($fh1, $fh2);

    while (1) {
        my @canBeRead = $bs->read_line();
        foreach (@canBeRead) {
            # each element is a [handle, line] pair
            my ($fh, $line) = @$_;
            my $fh_name = ($fh == $fh1 ? "fh1" : "fh2");
            print "$fh_name: $line\n";
        }
    }
      ... This worked for me.

      Really?  It doesn't for me — at least not with the file handles being opened to regular text files.

      Buffering is only one of the problems here. The deeper issue is that with respect to select, regular disk files are defined to always be readable (even though the operation might block for a few milliseconds until the read head of the disk is positioned). In other words, if the file pointer is at the end of the file, select would still indicate that the file is readable. And because ->getline() reads until the next newline or eof, you could get partial lines (not terminated by a newline).

      So, while the following sort of works, it doesn't fulfill the OP's requirement to read entire lines.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use IO::Handle;
      use IO::Select;

      my ($fname1, $fname2) = qw(1.txt 2.txt);
      open my $fh1, '<', $fname1 or die $!;
      open my $fh2, '<', $fname2 or die $!;

      my $slct = IO::Select->new();
      $slct->add($fh1, $fh2);

      while (1) {
          my @canBeRead = $slct->can_read();
          foreach my $fh (@canBeRead) {
              my $line = $fh->getline();
              if (defined $line) {
                  my $fname = $fh == $fh1 ? $fname1 : $fname2;
                  print "$fname: $line\n"
              } else {
                  sleep 1;  # be nice
              }
          }
      }

      It's effectively the same as

      use IO::Handle;

      my ($fname1, $fname2) = qw(1.txt 2.txt);
      open my $fh1, '<', $fname1 or die $!;
      open my $fh2, '<', $fname2 or die $!;

      while (1) {
          foreach my $fh ($fh1, $fh2) {
              my $line = $fh->getline();
              if (defined $line) {
                  my $fname = $fh == $fh1 ? $fname1 : $fname2;
                  print "$fname: $line\n"
              } else {
                  sleep 1;  # be nice
              }
          }
      }
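
      For what it's worth, here is a minimal sketch of one way to get only complete lines out of files that are still growing, given that select is no help with regular files. It polls with sysread (which returns 0 at end-of-file on a regular file rather than blocking) and keeps any unterminated tail in a per-file buffer until its newline shows up. The helper name read_complete_lines and the 8192-byte chunk size are my own choices, not anything from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: return every complete (newline-terminated) line that
# has appeared on $fh since the last call. An unterminated tail is kept in
# the buffer referenced by $buf_ref until the rest of the line arrives.
sub read_complete_lines {
    my ($fh, $buf_ref) = @_;
    $$buf_ref = '' unless defined $$buf_ref;

    # sysread bypasses Perl's own buffering, so the only buffer is ours.
    while (1) {
        my $n = sysread($fh, my $chunk, 8192);
        die "sysread failed: $!" unless defined $n;
        last if $n == 0;    # nothing more to read right now
        $$buf_ref .= $chunk;
    }

    # Peel complete lines off the front of the buffer.
    my @lines;
    while ($$buf_ref =~ s/\A([^\n]*\n)//) {
        push @lines, $1;
    }
    return @lines;
}

# Usage sketch (file names are placeholders):
#   my %src;
#   for my $name (qw(1.txt 2.txt)) {
#       open my $fh, '<', $name or die "$name: $!";
#       $src{$name} = { fh => $fh, buf => '' };
#   }
#   while (1) {
#       for my $name (sort keys %src) {
#           print "$name: $_"
#               for read_complete_lines($src{$name}{fh}, \$src{$name}{buf});
#       }
#       sleep 1;    # be nice
#   }
```

      Neither file can starve the other, because each call returns as soon as the file has nothing more to offer. The cost is the polling interval: new lines are noticed up to a second late.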
Re: selecting from a number of different input sources.
by perreal (Monk) on Jul 21, 2010 at 11:56 UTC
    This is not what you are asking, but can't you use sockets instead of files? Also, if you are on Linux you can check out http://search.cpan.org/dist/Linux-Inotify2/.

      This is how I am opening the file to read. I am not sure what I need to do to read the text files as a socket instead, or how that would help.

      my $io = new IO::Handle;
      open (my $fh, '<', $FILE) or die $!;
      if ($io->fdopen(fileno($fh), "r")) {
          ...

        It's not immediately clear from the docs, but just saying use IO::Handle entitles you to call IO::Handle methods on normal file handles like your already opened $fh.  In other words, your snippet could be simplified as follows:

        use IO::Handle;
        open (my $fh, '<', $FILE) or die $!;
        # then simply call method from IO::Handle
        $fh->getline();

        This doesn't answer the question how to make a socket out of a regular file, but I thought it's worth mentioning nevertheless...

Re: selecting from a number of different input sources.
by ikegami (Pope) on Jul 21, 2010 at 14:55 UTC

    Problem is that if you try to read from a file which has no new data in it, then the io->getline() function will get blocked

    That's not true. It will return end-of-file. Or are you reading from something other than a plain file?

    See File::Tail

      The file that I am reading from is constantly being written to by some other source, so it's not a static file. At times getline will block when there is no new data, and at times it won't when there is new data.

        At times getline will block when there is no new data

        I'll repeat: No, it won't.

        $ perl -le'$|=1; for (;;) { print "poke"; sleep 1; }' > file &
          perl -MIO::Handle -e'sleep 3; print while $_ = STDIN->getline; print "exited instead of blocking\n"' < file ; fg
        [1] 29483
        poke
        poke
        poke
        poke
        exited instead of blocking
        perl -le'$|=1; for (;;) { print "poke"; sleep 1; }' > file
        ^C
        $ cat file ; rm file
        poke
        poke
        poke
        poke
        poke
        poke
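
        To keep following a growing file with getline rather than exiting at end-of-file, the usual trick (the one the perlfaq "tail -f" entry describes) is to seek to the current position after hitting EOF, which clears the handle's end-of-file condition. A minimal sketch follows; drain_available_lines is a made-up name, and note that getline may still hand back a final line with no newline if the writer is mid-line.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: read whatever lines are available right now, then
# reset the end-of-file condition so a later call can see appended data.
sub drain_available_lines {
    my ($fh) = @_;
    my @lines;
    while (defined(my $line = $fh->getline)) {
        push @lines, $line;
    }
    # A zero-byte seek relative to the current position leaves the file
    # pointer where it is but clears EOF on the handle.
    seek($fh, 0, 1);
    return @lines;
}
```

        Called in a loop over both handles with a short sleep in between, this returns immediately when there is nothing new, so neither file holds up the other.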
Re: selecting from a number of different input sources.
by jasoncollins (Novice) on Jul 21, 2010 at 17:03 UTC

    Thanks for the feedback, but does anyone know a working solution to this problem?

      File::Tail won't work with select, so that only leaves one of the options you provided.
Re: selecting from a number of different input sources.
by ikegami (Pope) on Jul 21, 2010 at 17:53 UTC

    when initiating the function call to the threaded function, I pass the file descriptors as well,

    You shouldn't have to work with file descriptors. Pass the file name to the thread.
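
    A minimal sketch of what passing the file name could look like, assuming a perl built with thread support. Each worker opens its own handle from the name it is given, so no handle or descriptor crosses a thread boundary. The names count_matches and count_matches_in_threads are hypothetical, and the worker just counts matching lines once rather than following the file, to keep the example short.

```perl
use strict;
use warnings;
use threads;

# Hypothetical worker: opens its own handle from the file name and returns
# the number of lines matching the pattern (passed as a plain string).
sub count_matches {
    my ($name, $pattern) = @_;
    open my $fh, '<', $name or die "Cannot open $name: $!";
    my $re    = qr/$pattern/;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ $re;
    }
    close $fh;
    return $count;
}

# Start one thread per file name, then collect each thread's result.
sub count_matches_in_threads {
    my ($pattern, @names) = @_;
    my %thread_for;
    for my $name (@names) {
        # Created in scalar context, so join returns the worker's scalar.
        my $thr = threads->create(\&count_matches, $name, $pattern);
        $thread_for{$name} = $thr;
    }
    my %count_for;
    for my $name (@names) {
        my $count = $thread_for{$name}->join;
        $count_for{$name} = $count;
    }
    return %count_for;
}
```

    For example, my %n = count_matches_in_threads('ERROR', '1.txt', '2.txt'); would give a hash from file name to match count (the pattern and file names here are placeholders).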
