Trying to do multiple while loops on the same input file

by elef (Friar)
on Jun 01, 2011 at 13:31 UTC

elef has asked for the wisdom of the Perl Monks concerning the following question:

Perl can't seem to do two while loops on the same read-only filehandle, as I painfully found out after spending upwards of three hours troubleshooting a long and fairly complex script.
The problem can be condensed to this:
    open (INPUTFILE, "<:encoding(UTF-8)", "in.txt") or die "Can't open file: $!";
    for (my $i = 1; $i < 5; $i++) {
        print "\n-----------------------LOOP $i-----------------------\n";
        while ($line = <INPUTFILE>) {
            print "line $.: $line\n";
        }
    }
    close INPUTFILE;

This prints the contents of the file in loop 1 as expected, but loops 2, 3 and 4 come up empty. I have to close and reopen the filehandle in each loop to make it work:
    for (my $i = 1; $i < 5; $i++) {
        open (INPUTFILE, "<:encoding(UTF-8)", "in.txt") or die "Can't open file: $!";
        print "\n-----------------------LOOP $i-----------------------\n";
        while ($line = <INPUTFILE>) {
            print "line $.: $line\n";
        }
        close INPUTFILE;
    }

As a point of interest, if I just reopen the fh in each loop without closing it (i.e. move the close out of the loop) then the program still works but the line numbers are not reset after each loop.
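
(This matches what perlvar documents for $.: the line counter is reset by an explicit close, but not when an open filehandle is reopened without an intervening close. A minimal sketch of the difference, assuming in.txt has a few lines:)

    open (INPUTFILE, "<", "in.txt") or die "Can't open file: $!";
    1 while <INPUTFILE>;    # read to EOF; $. now holds the last line number

    # Reopening while still open: the handle is implicitly closed first,
    # but the $. counter is NOT reset.
    open (INPUTFILE, "<", "in.txt") or die "Can't open file: $!";
    $line = <INPUTFILE>;
    print "after plain reopen: \$. = $.\n";    # continues from the old count

    # An explicit close does reset it.
    close INPUTFILE;
    open (INPUTFILE, "<", "in.txt") or die "Can't open file: $!";
    $line = <INPUTFILE>;
    print "after close + reopen: \$. = $.\n";  # back to 1
    close INPUTFILE;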

Is this behaviour intentional? Should I have known this to begin with? Does this behaviour serve any purpose or have any benefit? It certainly seems broken to me.
Even this simple code doesn't work as I would expect it to:
print "First loop:\n\n"; while (<DATA>) { print; } print "\n\nSecond loop:\n\n"; while (<DATA>) { print; } __DATA__ first line second line third line


I'm on Win7 with perl 5.10.0, by the way. (I'm using an old version because ppm is broken in new versions.) I also tested this on 5.10.1 on Ubuntu and got the same result.

Re: Trying to do multiple while loops on the same input file
by zek152 (Pilgrim) on Jun 01, 2011 at 13:51 UTC

    Is this behavior intentional? Yes. In Perl, files work much the same way they do in most languages: you open a file and then read it from start to end. Once you are at the end, you can't read any more unless you perform a seek.

    Should I have known this to begin with? Hard to say. Let's just say that it is good that you know it now.

    Does this behavior serve any purpose or have any benefit? Yes. It encourages "one-pass" processing of files. In other words, it is more efficient to read a file only once (start to end). Structuring your code so that you don't need to make multiple passes through a file will make your program consume fewer resources and will most likely make it more efficient.
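
    (As a hedged sketch of that restructuring, reusing the filename and loop bounds from the question: read the file once into memory, then make as many passes as you like over the copy.)

        # Read the file once, then loop over the in-memory copy as often as needed.
        open (INPUTFILE, "<:encoding(UTF-8)", "in.txt") or die "Can't open file: $!";
        my @lines = <INPUTFILE>;
        close INPUTFILE;

        for (my $i = 1; $i < 5; $i++) {
            print "\n-----------------------LOOP $i-----------------------\n";
            print for @lines;
        }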

      It encourages "one-pass" processing of files. In other words, it is more efficient to read a file only once (start to end). Structuring your code so that you don't need to make multiple passes through a file will make your program consume fewer resources and will most likely make it more efficient.

      Thanks. That makes some sense, but, honestly, I'd rather have perl treat coders as adults who can decide for themselves how many times they want to read their files. In this case, the file is about 50kB long. Even supposing that it's not left in memory between two reads, it can be read from my SSD again in a matter of milliseconds. Not a lot of time compared to the hours I spent looking to find out what's wrong, I think you'll agree.
      On a more general note, I'd expect the TMTOWTDI ethos to extend to allowing such "dumb" multiple reads on a filehandle, maybe throwing a warning if strictures are on. As it is, perl just fails to execute the second while loop without throwing any warning whatsoever, and whichever way you look at it, that's not very coder-friendly.

        It is much less a Perl way of doing things and more of an OS way of doing things. Perl gives you the seek function to do exactly what you want. Operating systems are made to handle 1 kB files and 3 GB files alike. For your particular problem everything can be stored in RAM (and possibly in a cache). For other problems the whole file cannot be stored in RAM.

        I am sorry that you spent hours looking for the issue. Trust me when I say that learning how file reading works will help you no matter what language you are using. I personally do not know of a language that provides the functionality that you desire in the basic read(file). I can tell you that C, C++, C#, Java, Perl and Python all read files in a start-to-finish manner.

        Update: fixed small typo.

        Well, I think I have to agree with Perl on this one. Its behavior is consistent. Once you reach the end of the file, "$line = <INPUTFILE>" is false: false at the end of the first loop, and still false at the beginning of any additional loops. It doesn't fail anything; it just doesn't change anything magically between loops either.

        It sounds like you would like it to be false at the end of one loop, and then reset back to the beginning of the file for the next loop? That does not seem consistent to me. You'll note that you can do anything you like with INPUTFILE inside your while loop, and what would you expect the while() to do then?

        There is no explicit or implied relationship, internal to Perl, between your loop and what your loop does with the filehandle. Nor should there be, IMO. That's entirely for your code to establish.
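
        (A small sketch of that point, assuming any readable in.txt: once the first pass drains the handle, readline just keeps returning undef, so a later while loop runs zero times rather than failing.)

            open (INPUTFILE, "<", "in.txt") or die "Can't open file: $!";
            1 while <INPUTFILE>;        # first pass: read to EOF

            my $line = <INPUTFILE>;     # still at EOF
            print defined $line ? "got a line\n"
                                : "readline returned undef, so while() sees false\n";
            close INPUTFILE;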

        --Dave

Re: Trying to do multiple while loops on the same input file
by LanX (Saint) on Jun 01, 2011 at 13:47 UTC
    I think you are looking for the seek command.
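
    For instance, a minimal sketch of the loop from the question with a rewind added (SEEK_SET from Fcntl is the absolute-position mode; note that seek does not reset $., so this zeroes it by hand):

        use Fcntl qw(:seek);    # exports SEEK_SET

        open (INPUTFILE, "<:encoding(UTF-8)", "in.txt") or die "Can't open file: $!";
        for (my $i = 1; $i < 5; $i++) {
            print "\n-----------------------LOOP $i-----------------------\n";
            while ($line = <INPUTFILE>) {
                print "line $.: $line\n";
            }
            seek INPUTFILE, 0, SEEK_SET;    # rewind to the start of the file
            $. = 0;                         # seek does not reset the line counter
        }
        close INPUTFILE;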

    Cheers Rolf

Re: Trying to do multiple while loops on the same input file
by bart (Canon) on Jun 01, 2011 at 15:11 UTC
    Even this simple code doesn't work as I would expect it to
    This means that your expectations are wrong.

    A file handle is pretty much an iterator. That implies that it has an internal state. Every read from it, from anywhere in your code, does the same thing: it gets the next line from the file, according to the current internal state, and adjusts that state. Yes, that means you can read one line from the file in one place in your code, and the next line in another place. The state is in the filehandle, not in the code.
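
    (A hedged illustration of that, with a made-up helper sub: the read position travels with the handle, so reads in different places continue from the same state.)

        open my $fh, '<', 'in.txt' or die "Can't open file: $!";

        my $first = <$fh>;    # reads line 1 here
        next_line($fh);       # reads line 2 somewhere else entirely
        my $third = <$fh>;    # and this picks up at line 3

        # Hypothetical helper: the handle carries its position with it.
        sub next_line {
            my ($handle) = @_;
            my $line = <$handle>;
            print "read elsewhere: $line";
        }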

    So, first you read everything that is in the file, making the filehandle's internal state point to the end of the file, and then you expect to read even more from the file, and it should, magically, start again from the top??

    If perl did that, it would likely be a bit too much DWIM magic, with a lot of bugs as a result in the cases where you don't want this behavior.

    What you need is either to manually reset the filehandle's internal state to start again from the top, as several others have pointed out, using seek; or you must make a clone of the filehandle (so the clone gets its own internal state, a copy from the original) before you read anything from it. For example, you can duplicate the filehandle like this:

    open SECONDINSTANCE, "<&INPUTFILE";
    but it must also be possible to copy a filehandle into a new one with F_DUPFD. (It's not exactly crystal clear to me how, and I cannot test right now. I'll revisit this node later with an update, unless someone beats me to it first.)
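
    (For reference, the three-argument form of open spells the same dup with lexical handles; a sketch, untested here for the same reason as above, and perldoc -f open is the authority on how the duplicate behaves:)

        open my $in, '<:encoding(UTF-8)', 'in.txt' or die "Can't open file: $!";
        open my $clone, '<&', $in or die "Can't dup filehandle: $!";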

    But, you can start by checking out the docs.

Re: Trying to do multiple while loops on the same input file
by runrig (Abbot) on Jun 01, 2011 at 15:14 UTC
    Why would having two loops cause the file handle position to be reset? I'd be upset if it did, because sometimes it makes sense to read a file in more than one loop and do something like:
        while (<$fh>) {
            # ...read/process file header
            last if /End-Of-Header/;
        }
        while (<$fh>) {
            # ...read/process file body
        }
Re: Trying to do multiple while loops on the same input file
by thezip (Vicar) on Jun 01, 2011 at 16:35 UTC

    This doesn't answer your original problem of reading a file with multiple while loops, but it does present another WTDI that, to me, seems much more natural.

    Since your file is really small, read it in its entirety into an array of lines, à la:
        use strict;
        use warnings;
        use autodie;

        my $filename = 'filename.txt';

        open my $ifh, '<', $filename;
        my @lines = <$ifh>;
        close $ifh;

    Now you can process the content of your file to your heart's content by simply maintaining the relevant indices for the lines you want to deal with, and iterating via "for" loops. You can do this as many times as you want without having to worry about iterating via <>...
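
    (A short sketch of that, continuing from the @lines above; the slice bounds are made up for illustration and assume the file has at least two lines:)

        # First pass: the whole file, by index, so the line number is just $i + 1.
        for my $i (0 .. $#lines) {
            print "line ", $i + 1, ": $lines[$i]";
        }

        # Second pass: only a slice of interest, as many times as you like.
        for my $line (@lines[0 .. 1]) {
            print "again: $line";
        }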


    Updated: Added blurb re: for-loop iteration


    What can be asserted without proof can be dismissed without proof. - Christopher Hitchens
      Good suggestion. In this particular case, I just moved the open and close commands inside the loop, but in other situations, reading the file into an array would make more sense.
