PerlMonks  

reading several lines in a gulp

by John M. Dlugosz (Monsignor)
on Apr 27, 2011 at 11:34 UTC (id://901546)

John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

$line = <$file>;
will read one line, and in list context
@lines = <$file>;
will read all of them, even if the file is terabytes in size. Is there a simple and efficient way to read "several" lines, passing a limit to the number it reads at a time?

Re: reading several lines in a gulp
by moritz (Cardinal) on Apr 27, 2011 at 11:37 UTC
    my @lines; push @lines, scalar <$file> for 1..10;

    No need for a primitive for something that can be easily achieved with existing primitives, and isn't used all that often.

    Perl buffers the data it reads from the file, so it shouldn't be much less efficient than slurping a small file in list context.

      If the file's line count is not a multiple of 10, does that push extra undefs into @lines?
        Yes, I didn't think of that. Maybe this would be better:
        sub gulp {
            my ($file, $count) = @_;
            my @lines;
            for (1 .. $count) {
                last if eof $file;    # check first, so an already exhausted handle adds no undef
                push @lines, scalar <$file>;
            }
            return @lines;
        }

      Why is scalar used in that? What's the effect? I'm staring cross-eyed at this very useful solution; can someone decompress it for me?

      Thanks in advance to all you experts for your great solutions...

      --Ray

        Why is scalar used in that? What's the effect?

        push imposes list context (you can push more than one element in one statement), and <$file> would read the whole file at once in list context (as pointed out in the OP).

        In other words, without scalar, the whole file would be read in the first iteration, which would kind of defeat the purpose of the exercise...
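        A small sketch of the effect described above, assuming an in-memory filehandle opened on a string so that no real file is needed (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $data = "one\ntwo\nthree\n";

# List context: push gives <$fh> list context, so readline
# returns every remaining line in one go.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my @all;
push @all, <$fh>;
close $fh;

# Scalar context: scalar() forces readline to return only the next line.
open $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my @one;
push @one, scalar <$fh>;
close $fh;

printf "without scalar: %d lines\n", scalar @all;
printf "with scalar:    %d line\n",  scalar @one;
```

        Without scalar, the single push consumes all three lines; with it, only "one\n" is read.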

Re: reading several lines in a gulp
by Anonymous Monk on Apr 27, 2011 at 11:46 UTC

      moritz's first solution has the same problem: it needs to handle end-of-file conditions:

      @lines = map { eof($file) ? () : scalar <$file> } 1 .. 10;
        In that case, the defined-or operator (available since Perl 5.10) imposes scalar context:
        @lines = map { <$file> // () } 1 .. 10;
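        A quick check of both map idioms, assuming an in-memory filehandle that holds only three lines while ten are requested. Note that the block form of map takes no comma between the block and the list:

```perl
use strict;
use warnings;
use feature 'say';

my $data = "a\nb\nc\n";    # only 3 lines, but up to 10 are requested

# eof-guarded version: yield the empty list once the handle is exhausted.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my @guarded = map { eof($fh) ? () : scalar <$fh> } 1 .. 10;
close $fh;

# Defined-or version: // evaluates <$fh> in scalar context, and the
# undef returned at end of file is replaced by the empty list.
open $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my @defined_or = map { <$fh> // () } 1 .. 10;
close $fh;

say scalar @guarded;
say scalar @defined_or;
```

        Both arrays end up with exactly the three available lines and no trailing undefs.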
Re: reading several lines in a gulp
by anonymized user 468275 (Curate) on Apr 28, 2011 at 16:37 UTC
    If it's Unix, for maximum performance I'd seek to successive 4096-byte buffered I/O page boundaries in an outer loop, transfer the complete lines in each block to an array, process those in an inner loop, and carry the last incomplete line (one without a trailing \n) over to the next block.
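    A minimal sketch of that block-at-a-time approach, under two assumptions: sequential read calls of a fixed size stand in for explicit seeks to 4096-byte boundaries (each read naturally starts where the previous one ended), and process_in_blocks with its callback argument is a hypothetical helper, not an existing API. It is not a tuned implementation:

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh in $block_size chunks, pass each batch of
# complete lines to $callback, and carry a trailing partial line over to
# the next chunk.
sub process_in_blocks {
    my ($fh, $block_size, $callback) = @_;
    my $carry = '';
    while (1) {
        my $n = read($fh, my $buf, $block_size);
        die "read failed: $!" unless defined $n;
        last if $n == 0;                      # end of file
        $buf = $carry . $buf;
        my @lines = split /(?<=\n)/, $buf;    # split after each "\n", keeping it
        # If the chunk did not end in "\n", its last piece is incomplete.
        $carry = ($buf =~ /\n\z/) ? '' : pop @lines;
        $callback->(@lines) if @lines;        # inner iteration over full lines
    }
    $callback->($carry) if length $carry;     # final line without a "\n"
}

# Usage: an in-memory handle whose last line has no trailing newline.
open my $in, '<', \"one\ntwo\nthree" or die "Cannot open: $!";
process_in_blocks($in, 4096, sub { print "batch of ", scalar(@_), " line(s)\n" });
```

    The carry-over is what keeps a line that straddles two blocks intact; the batch size is governed by the block size rather than by a line count.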

    One world, one people

        Yes, but if you read a whole file into an array, Perl won't reorganise your code to minimise memory usage, even though such an optimisation would be reasonable at that point; nor does it provide hooks for inserting your own code into each iteration of such an optimisation.


Re: reading several lines in a gulp
by JavaFan (Canon) on May 01, 2011 at 14:27 UTC
    my $several = ...;
    my @lines;
    push @lines, $_ while @lines < $several && defined($_ = <$file>);
      Interesting: by comparing against the growing size of @lines instead of counting $count down, it is naturally robust against odd (negative) values of $count and works correctly for an initial value of zero.

      If evaluating @lines in scalar context is cheap and effectively "free" (the array already knows its size), it might even be faster than maintaining a separate $count.
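      To see the robustness claim in action, here is JavaFan's loop wrapped in a hypothetical helper, take_lines, and called with a zero, a negative, and two positive limits on one in-memory handle. The local $_ is an addition of this sketch so that the loop does not clobber the caller's $_:

```perl
use strict;
use warnings;

# Hypothetical wrapper around JavaFan's loop.
sub take_lines {
    my ($fh, $several) = @_;
    local $_;    # don't clobber the caller's $_
    my @lines;
    # The limit is compared with the current size of @lines, so a limit of
    # 0 or less makes the condition false before any line is read.
    push @lines, $_ while @lines < $several && defined($_ = <$fh>);
    return @lines;
}

my $text = "l1\nl2\nl3\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
for my $limit (0, -5, 2, 10) {
    my @got = take_lines($fh, $limit);
    printf "limit %3d -> %d line(s)\n", $limit, scalar @got;
}
```

      With three lines of input, the limits 0 and -5 read nothing, 2 returns the first two lines, and 10 returns only the one line that is left.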

Re: reading several lines in a gulp
by LanX (Saint) on Apr 29, 2011 at 17:31 UTC
    $. and eof should help:

    my $count = 3;
    my @gulp;
    while (<DATA>) {
        push @gulp, $_;
        # keep collecting until $count lines are gathered or the input ends
        next if $. % $count and ! eof;
        print @gulp;
        print "-----\n";
        @gulp = ();
    }
    __DATA__
    a
    b
    c
    d
    e
    f
    g
    h

    I'm sure there are more elegant solutions...

    Cheers Rolf

      > I'm sure there are more elegant solutions...

      indeed, iterators are easier to maintain!

      sub readlines {
          my ($fh, $count) = @_;
          my @gulp;
          push @gulp, scalar <$fh> while $count-- and ! eof $fh;
          return @gulp;
      }

      while ( my @lines = readlines(\*DATA, 3) ) {
          print @lines, "----\n";
      }

      __DATA__
      a
      b
      c
      d
      e

      prints

      a
      b
      c
      ----
      d
      e
      ----

      alternative iterator:

      sub readlines {
          my ($fh, $count) = @_;
          my @gulp;
          while (<$fh>) {
              push @gulp, $_;
              last unless --$count;
          }
          return @gulp;
      }

      Cheers Rolf

      UPDATE: Handling of edge cases such as a missing $count parameter could be added to the iterators:

          $count = 1 unless $count;

      Maybe $count <= 0 should be handled differently:

    • 0 => no iteration
    • -1 => slurp whole file
    • -2 .. => warning
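      One possible reading of that policy as code, sketched under assumptions: readlines_checked is a hypothetical name, a missing count defaults to 1 (tested with defined, so that an explicit 0 still means "no iteration"), and since the list does not say what should happen after the warning for -2 and below, this sketch simply returns an empty list:

```perl
use strict;
use warnings;

# Hypothetical iterator implementing the proposed count policy:
#   missing count => 1 line
#   0             => no iteration (empty list)
#   -1            => slurp the rest of the file
#   -2 and below  => warn, then return nothing (an assumption)
sub readlines_checked {
    my ($fh, $count) = @_;
    $count = 1 unless defined $count;
    return () if $count == 0;
    if ($count == -1) {
        my @all = <$fh>;    # list context: every remaining line
        return @all;
    }
    if ($count < -1) {
        warn "readlines_checked: invalid count $count\n";
        return ();
    }
    my @gulp;
    push @gulp, scalar <$fh> while $count-- and ! eof $fh;
    return @gulp;
}
```

      Slurping into @all before returning keeps the -1 case a full slurp even if the function is called in scalar context.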
        I like it! Thanks.
