PerlMonks  

Reading files n lines a time

by naturalsciences (Beadle)
on Dec 06, 2012 at 13:05 UTC ( #1007560=perlquestion )
naturalsciences has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I am working on multi-GB text files. So far Perl has been good for processing them line by line, using

while($line=<>){do_stuff_with_it_and_print_to_new_file}

Now I need to read files four lines at a time, because I need to do some comparisons within those quartets. You can imagine that reading files n lines at a time would be helpful in several applications, e.g. a flat file with name\naddress\nbilling\nname\naddress\nbilling etc. for triplets.

I was able to fulfill my desires for duplets easily with

 while($line=<>){$nextline=<>;do_stuff_to_those_poor_two_lines}, but an application of while($line=<>){$nextline=<>;$thirdline=<>} won't work anymore.

So how could I go about reading and manipulating my text files any number of lines at a time?

edit: I myself am thinking of some kind of while-loop-based thingy, possibly with seek, to keep those loops moving along in the filehandle, etc.

Re: Reading files n lines a time
by choroba (Abbot) on Dec 06, 2012 at 13:16 UTC
    wont work anymore
    What happens? What error message do you get?

    I experimented with reading multiple lines through map. This works for me, but it is a bit ugly. Maybe someone else knows a cleaner way?

    while(( my @lines = map $_ = <>, 1 .. 4 )[0]) { print @lines; print "\n"; }
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Hmm, I looked through my test script and yes, it seems that the while(<>){$nextline=<>;$thirdline=<>} can actually be extended indefinitely. It is ugly but gets the job done.

      My scripts seem to have crashed because of the undefined variables you get when the number of lines in your text file is not divisible by the number of lines read per iteration. At the end of the file, the remaining $nextlines are then undefined.

      I need some control element that would terminate the script (or just the loop, if I want to continue with the script) nicely. I am thinking of something akin to "if any variable is undefined, exit the loop", placed at the beginning of the loop. That should stop it from panicking in an unfavorable EOF situation.

        More on the undefined variables problem. Would adding something like this to a loop be correct?
        while ($line0=<>) {$line1=<>;$line2=<>;$line3=<>; last if not defined $line0; last if not defined $line1; last if not defined $line2; last if not defined $line3; do_some_stuff; }

        Could I use some OR statements to get these last if-s onto a single line? Or can or and and statements be used only between two values? I was thinking of something like

         last if not defined $line0or$line1or$line2or$line3

        or

         last if not defined $line0||$line1||$line2||$line3
        Using
        while(( my @lines = map $_ = <>, 1 .. 4 )[-1]) {
        should stop if the last line is not read as well. (You might need to add defined.)
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
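On combining the last-if checks into one line: as far as I can tell, neither one-liner above does what is intended. defined is a named unary operator and binds tighter than ||, so "not defined $line0 || $line1 || ..." only checks $line0 for definedness and the rest for truth, while the "or" form parses as "(not defined $line0) or $line1 or ...". A minimal sketch of one way to test all the variables at once; the helper name all_defined is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true only if every argument is defined.
# grep in scalar context counts the undefined arguments.
sub all_defined {
    my $undefined = grep { !defined $_ } @_;
    return !$undefined;
}
```

Inside the loop this would read: last unless all_defined($line0, $line1, $line2, $line3);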

      Perhaps marginally nicer, the kite operator:

      while(( my @lines = map ~~<>, 1 .. 4 )[0]) { print @lines; print "\n"; }
      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

        Considering the OP deals with multi-GB files, I'd prefer the slightly uglier assignment to $_, because the kite's tail is not optimized away, so all the strings would actually be shoved through the bitwise negation twice.

        I was surprised by the result, BTW, when I looked at the optree. I'm almost completely clueless about what Perl can and cannot optimize, but even a fairly trivial peephole optimizer, as in early C compilers, could catch this.
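Both map variants hinge on forcing <> into scalar context, so that each iteration reads one line instead of slurping the rest of the file. A sketch that spells the context out with scalar readline; the sub name read_group and the explicit filehandle argument are assumptions for illustration, not something from the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: return $n slots read from $fh.
# Slots past end-of-file come back as undef.
sub read_group {
    my ($fh, $n) = @_;
    return map { scalar readline($fh) } 1 .. $n;
}
```

A caller would still need to check something like defined $group[-1] to detect a short final group.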

      Don't you need to check for defined rather than true for the first element of your array in the while loop?

      --DrWhy

      "If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."

        Probably yes. I indicated that deeper in a reply.
        لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Reading files n lines a time
by ww (Bishop) on Dec 06, 2012 at 13:28 UTC
    Depending on the way your files-which-need-to-be-read-four-lines-at-a-time (henceforward, "fwn") are formatted/organized and the consistency thereof, you might consider setting your $/ record separator...

    for ex, if data looks like:

    name
    date
    bill

    name1
    date1
    bill1
    ...

    then setting local $/="\n\n" will tell Perl that you want to read a paragraph from the fwn, where a paragraph is defined as something ending in two consecutive newlines. Better yet, the special case $/="" defines a paragraph somewhat more broadly (any run of consecutive blank lines ends it) and may be suitable for your data.

    You'll find many examples here, if you merely SuperSearch for $/
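A minimal sketch of paragraph mode, assuming the records really are separated by one or more blank lines; the sub name read_paragraphs is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: return one array ref of lines per blank-line-separated record.
sub read_paragraphs {
    my ($fh) = @_;
    local $/ = "";                 # paragraph mode, restored when the sub returns
    my @records;
    while (my $para = <$fh>) {
        chomp $para;               # in paragraph mode this strips all trailing newlines
        push @records, [ split /\n/, $para ];
    }
    return @records;
}
```

For multi-GB input one would process each record inside the loop rather than collecting them all; the array is only there to keep the sketch short.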

      Thanx. I already know about the record separator. Unfortunately my fwn-s don't contain anything other than the newline that could be used to delimit blocks. At least not to my senses.
        Perhaps you can post a real (or bowdlerized sample) snippet of your actual data. It's amazing what a bit of exposure to regular expressions can help one spot, and here you'll have many such well-educated eyes looking for proxy-para-markers.

        I'm actually surprised -- no, very surprised -- that this request hasn't been posted higher in the thread.

Re: Reading files n lines a time
by MidLifeXis (Prior) on Dec 06, 2012 at 13:40 UTC
    sub readNLines {
        my $lines_to_read = shift or die '....';
        my @lines_read;
        while(<>) {
            push( @lines_read, $_ );
            last if @lines_read == $lines_to_read;
        }
        if ( @lines_read != $lines_to_read ) {
            # error condition
        }
        return @lines_read;
    }
    while (@lines = readNLines(4)) { do_stuff() }

    I might even consider the question "What am I reading from the file?" - if it is a record, then I might instead change readNLines to readRecord (as well as the surrounding usage of the data). Then the concept is abstract, and if the format of a record changes, you modify the reading piece. In my opinion,

    while ( my $record = readRecord( ... ) ) { ... }
    tells me much more in the context that I am concerned with than
    while ( $line1=<> && $line2=<> && $line3=<>... ) { ... }

    Just my $0.02.

    --MidLifeXis
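A sketch of what such a readRecord could look like, assuming the three-line name/address/billing layout from the original question. The sub name, the field list, and the explicit filehandle argument are illustrative assumptions:

```perl
use strict;
use warnings;

# Hypothetical record layout: one field per line, in this order.
my @FIELDS = qw(name address billing);

# Return a hash ref for the next record, undef at a clean end-of-file,
# and die if the file ends in the middle of a record.
sub read_record {
    my ($fh) = @_;
    my %record;
    for my $field (@FIELDS) {
        my $line = <$fh>;
        if (!defined $line) {
            return if !%record;    # EOF exactly on a record boundary
            die "incomplete record at end of input\n";
        }
        chomp $line;
        $record{$field} = $line;
    }
    return \%record;
}
```

If the record format changes, only @FIELDS and this sub need to change; a loop such as while ( my $record = read_record($fh) ) { ... } stays the same.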

Re: Reading files n lines a time
by LanX (Canon) on Dec 06, 2012 at 13:50 UTC
      Ahh that seems nice and general. A bit more refined method than using fourteen $nextlines would be :D
Re: Reading files n lines a time
by Anonymous Monk on Dec 06, 2012 at 14:19 UTC
    Right now you consider 4 lines ... soon it may be 6. Plan ahead. Load the lines into an array by pushing them onto the end of it, noting when you've reached end-of-file (e.g. if there are fewer than 4 lines or you've run out). Each time you loop back, shift the first line off the array and push a new line onto the other end. The lines can then be reached directly by indexes 0..3.
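The push/shift idea above can be sketched as follows. Note that, read literally, it gives a sliding window (each window overlaps the previous one by n-1 lines) rather than disjoint groups of n. The sub name and the callback interface are assumptions for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback with every window of $size consecutive lines.
sub sliding_windows {
    my ($fh, $size, $callback) = @_;
    my @window;
    while (my $line = <$fh>) {
        push @window, $line;                 # new line goes on the end
        shift @window if @window > $size;    # oldest line falls off the front
        $callback->(@window) if @window == $size;
    }
    return;
}
```

Changing from 4 lines to 6 is then only a matter of passing a different $size. For non-overlapping groups one would empty @window after each callback instead of shifting.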
Re: Reading files n lines a time
by blue_cowdawg (Monsignor) on Dec 06, 2012 at 14:50 UTC

    Consider this solution:

    #!/usr/bin/perl -w
    ########################################################################
    use strict;
    use Tie::File;
    tie my @fin,"Tie::File","multilines.txt" or die $!;
    for(my $ix=0;$ix <= $#fin;$ix+=4){
        my @lines=();
        foreach (my $iy=0;($iy < 4) && (($ix+$iy)<=$#fin);$iy++){
            push @lines, $fin[$ix+$iy];
        }
        printf "Lines starting with line %d:\n%s\n\n",$ix+1,join("\n",@lines);
    }
    untie @fin;
    Tie::File lets you play with files as if they were arrays and is part of the standard Perl distribution.


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re: Reading files n lines a time
by eyepopslikeamosquito (Canon) on Dec 07, 2012 at 03:02 UTC

    Here's a simple way to do it using modulo chunksize on the line number $.

    use strict;
    use warnings;
    my $fname = shift or die "usage: $0 fname\n";
    open(my $fh, '<', $fname) or die "error: open '$fname': $!";
    my $chunksize = 4;
    my $chunk = "";
    while ( my $line = <$fh> ) {
        $chunk .= $line;
        next if $. % $chunksize;
        print "---chunk---\n$chunk";
        $chunk = "";
    }
    close $fh;
    if (length $chunk) { print "---chunk (leftover)---\n$chunk"; }

Node Type: perlquestion [id://1007560]
Front-paged by Arunbear