Reading multiple lines?

by rdw (Curate)
on Nov 28, 2000 at 07:26 UTC

rdw has asked for the wisdom of the Perl Monks concerning the following question:

Ok, this couldn't get much simpler - all I want to do is read multiple lines (or records) from a file.

I could do this :-

my @slurp = <IN>;
while (my @chunk = splice(@slurp, 0, 10)) {
    # do stuff
}

Or I could go completely mad and do this :-

while (my @chunk = map { scalar <IN> || () } 1..10) {
    # do stuff
}

I don't want to read the whole file into memory, so the first solution is no good for me, but the second solution seems a little silly.

Anybody got any better ideas?

Have fun,

rdw

Replies are listed 'Best First'.
Re: Reading multiple lines?
by merlyn (Sage) on Nov 28, 2000 at 08:02 UTC
    Well, I always go back to my "grow your own control structure", the naked block:
    my @buffer;
    {
        push @buffer, scalar <IN>;
        redo unless eof(IN) or @buffer >= 10;
        ## process @buffer
        @buffer = ();
        redo unless eof(IN);
    }
    Says it clearly. No having to think of multiple screwiness.

    -- Randal L. Schwartz, Perl hacker

Re: Reading multiple lines?
by Fastolfe (Vicar) on Nov 28, 2000 at 07:50 UTC
    What you're asking for is kind of weird, so yah, there's not going to be an elegant way to do it (that I can think of). I might approach it like this:
    for ($i = 0, @chunk = (); !eof(IN) && $i < 10; $i++) {
        $chunk[$i] = <IN>;
    }
    That should probably answer your question. In the event you're curious about any other alternatives to your approach, you could perhaps read data from the file in blocks (of, say, 1k or 4k or whatever), and handle that block before moving on:
    # Untested, but you get the idea
    while (read(IN, $buf, 1024, length($buf))) {
        while ($buf =~ s/^([^\r\n]*)[\r\n]+//) {
            handle_line($1);
        }
    }
    Realistically, though, this isn't buying you any performance, if that's what you're worried about. Let Perl do the line handling.
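    One wrinkle in that block-read sketch, for anyone who reuses it: a final line with no trailing newline never matches the substitution, so it stays stuck in $buf after the read loop ends. A variant that flushes it (equally untested; handle_line is the same assumed helper):

        my $buf = '';
        while (read(IN, $buf, 1024, length($buf))) {
            while ($buf =~ s/^([^\r\n]*)[\r\n]+//) {
                handle_line($1);
            }
        }
        handle_line($buf) if length $buf;  # flush an unterminated last line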
Re: Reading multiple lines?
by chromatic (Archbishop) on Nov 28, 2000 at 09:52 UTC
    We need more special variables here, especially $.:
    local $.;  # make sure you're in a restrictive block
    while ($. < 10) {
        push @lines, scalar <IN>;
        last if eof(IN);
    }
    I tested this on a file of 11 lines and one of 9. It worked both ways. If you don't care about "uninitialized value" warnings, you can leave out the local.
      Unless I'm missing something this code would only get the first block of 10 lines then would stop. Maybe this would be better:
      local $.;
      while (<IN>) {
          push @lines, $_;
          next if ($. % 10 and !eof(IN));  # keep collecting until 10 lines or EOF
          # process @lines
          @lines = ();                     # clear for the next chunk
      }
Re (tilly) 1: Reading multiple lines?
by tilly (Archbishop) on Nov 28, 2000 at 08:06 UTC
    You won't win any performance, but still the following (untested code) should work:
    while (my @chunk = get_chunk(\*IN, 10)) {
        # etc
    }

    sub get_chunk {
        my $fh    = shift;
        my $count = shift;
        my @result;
        push @result, scalar <$fh> foreach 1..$count;
        return @result;
    }
    UPDATE
    Erk. It shouldn't work. :-(

    Lots of attempts to read from a possibly closed filehandle. Try this with STDIN and see some interesting behaviour. :-(

    Try the following (tested) code:

    my $sub = chunker(\*IN, 10);
    while (my @chunk = $sub->()) {
        # Do something amusing
    }

    sub chunker {
        my $fh    = shift;
        my $count = shift;
        return sub {
            my @ret;
            while (@ret < $count) {
                my $line = <$fh>;
                if (defined($line)) {
                    push @ret, $line;
                }
                else {
                    $count = 0;
                }
            }
            return @ret;
        };
    }
(crazyinsomniac) Re: Reading multiple lines?
by crazyinsomniac (Prior) on Nov 28, 2000 at 10:23 UTC
    What about messing with $/ ?
    ------------from perlvar---------
    input_record_separator HANDLE EXPR
    $INPUT_RECORD_SEPARATOR
    $RS
    $/

    The input record separator, newline by default. This influences Perl's idea of what a "line" is. Works like awk's RS variable, including treating empty lines as a terminator if set to the null string. (An empty line cannot contain any spaces or tabs.) You may set it to a multi-character string to match a multi-character terminator, or to undef to read through the end of file. Setting it to "\n\n" means something slightly different than setting to "", if the file contains consecutive empty lines. Setting to "" will treat two or more consecutive empty lines as a single empty line. Setting to "\n\n" will blindly assume that the next input character belongs to the next paragraph, even if it's a newline. (Mnemonic: / delimits line boundaries when quoting poetry.)

        undef $/;   # enable "slurp" mode
        $_ = <FH>;  # whole file now here
        s/\n[ \t]+/ /g;

    Remember: the value of $/ is a string, not a regex. awk has to be better for something. :-)

    Setting $/ to a reference to an integer, scalar containing an integer, or scalar that's convertible to an integer will attempt to read records instead of lines, with the maximum record size being the referenced integer. So this:

        $/ = \32768;  # or \"32768", or \$var_containing_32768
        open(FILE, $myfile);
        $_ = <FILE>;

    will read a record of no more than 32768 bytes from FILE. If you're not reading from a record-oriented file (or your OS doesn't have record-oriented files), then you'll likely get a full chunk of data with every read. If a record is larger than the record size you've set, you'll get the record back in pieces.

    On VMS, record reads are done with the equivalent of sysread, so it's best not to mix record and non-record reads on the same file. (This is unlikely to be a problem, because any file you'd want to read in record mode is probably unusable in line mode.) Non-VMS systems do normal I/O, so it's safe to mix record and non-record reads of a file.
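    For instance, if the blocks rdw wants happened to be separated by blank lines, paragraph mode would hand back one whole record per read. A minimal sketch (the file name and the blank-line-separated format are assumptions, not rdw's actual data):

        open my $fh, '<', 'data.txt' or die "can't open data.txt: $!";
        {
            local $/ = "";                        # paragraph mode
            while (my $record = <$fh>) {
                my @lines = split /\n/, $record;  # the lines of this record
                # ... process @lines ...
            }
        }
        close $fh;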

    "cRaZy is co01, but sometimes cRaZy is cRaZy".
                                                          - crazyinsomniac

Re: Reading multiple lines?
by matthew (Acolyte) on Nov 29, 2000 at 01:33 UTC
    It's not terribly pretty, but it's cheap:
    open FILE, "../data/foo.txt";
    @lines = ($var1, $var2, $var3, $var4, $var5,
              $var6, $var7, $var8, $var9, $var10) = <FILE>;
    close FILE;
    print join "\n", @lines;
    -Matthew
      Got an even better one:
      #!/usr/bin/perl
      my $file = "../data/foo.txt";
      my @lines = getFileSlice($file, 10, 1);
      print join "", @lines;
      print "\n";

      sub getFileSlice {
          my $file   = shift;
          my $length = shift;
          my $offset = shift;
          my @lines;
          open FILE, $file or die "can't open $file: $!";
          for (my $i = 0; $i < $offset; $i++) { <FILE> }  # skip $offset lines
          for (my $i = 0; $i < $length; $i++) {
              my $line = <FILE>;
              push @lines, $line;
          }
          close FILE;
          return @lines;
      }
      -Matthew
      This is even simpler:
      open FILE, "../data/foo.txt";
      @lines = (<FILE>)[0..9];
      close FILE;
      print "@lines";

      Well that's the kind of thing which would be nice to write, except I'd probably do it like this :-

      @lines[0..9] = <FILE>

      ...but it still reads the whole file in (it just throws most of it away). Firstly, that'll use too much memory in this case, and secondly you can't then read the next N lines in.

      Have fun,

      rdw

Re: Reading multiple lines?
by lolindrath (Scribe) on Nov 29, 2000 at 05:19 UTC
    Hmm, it kind of seems like you're trying to reinvent a database system. They read in only one record from a disk at a time. Any qualms against using a database?

    --=Lolindrath=--

      No qualms at all, I'm trying to get some existing data into a database, and I've got a lot of it. The file structure is a bit odd, and I need to read in N lines of header, then M lines of secondary data, before looping through line by line for a while and then going back to the header structure.

      I don't want to read it all into memory because the file is about 160Mb with about 8 million lines. The header is always a fixed number of lines, the secondary data is optional but a fixed number of lines and the bulk of the data is usually somewhere between 100 and 10,000 lines.

      I was just surprised that this wasn't as easy / neat to do as I expected. I'm quite pleased with my original map one liner, but nobody has really commented on whether it was really all that bad.

      Have fun,

      rdw

        I was just surprised that this wasn't as easy / neat to do as I expected. I'm quite pleased with my original map one liner, but nobody has really commented on whether it was really all that bad.
        OK, I'll comment on that. In my mind map is a way to go from X to f(X) for a bunch of X's. If f(X) doesn't depend on X, it makes my brain go tilt a bit, but I can probably get used to it. Hence, I'll almost certainly try a different solution before I accept the void-arg map alternative.
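        To make that concrete, compare (the @nums example is invented):

            my @squares = map { $_ ** 2 } @nums;            # f(X) depends on X
            my @chunk   = map { scalar <IN> || () } 1..10;  # X is just a counter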

        Hmm. What you really probably have is a state machine. I could see a big Switch statement based on state (reading header A, reading header B, in the body) with eof(IN) at the top, and if eof is detected while in header A or B, then carp out.

        See, I get worried about when the unusual happens. Maybe it's just my 30 years of programming, but any time I see someone write a "read 10 lines here" loop, I think "what if there aren't 10 lines?". That's what makes me good at QA. :)
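        For what it's worth, a minimal sketch of that state-machine shape (untested; the HDR/SEC markers, the line counts, the file name, and the peek/process_record helpers are all invented for illustration, since rdw hasn't shown the real format):

            #!/usr/bin/perl
            use strict;
            use warnings;
            use Carp;

            # Assumed record shape: a fixed-size header, an optional
            # fixed-size secondary block, then body lines up to the
            # next header.  The "HDR" and "SEC" prefixes are made up.
            my $HEADER_LINES    = 3;
            my $SECONDARY_LINES = 2;

            open my $in, '<', 'data.txt' or die "can't open data.txt: $!";

            until (eof($in)) {
                # state: reading the header -- EOF here is an error
                my @header;
                for (1 .. $HEADER_LINES) {
                    croak "EOF inside header" if eof($in);
                    push @header, scalar <$in>;
                }

                # state: reading the optional secondary block
                my @secondary;
                if (peek($in) =~ /^SEC/) {
                    for (1 .. $SECONDARY_LINES) {
                        croak "EOF inside secondary block" if eof($in);
                        push @secondary, scalar <$in>;
                    }
                }

                # state: reading body lines until the next header or EOF
                my @body;
                while (!eof($in) and peek($in) !~ /^HDR/) {
                    push @body, scalar <$in>;
                }

                process_record(\@header, \@secondary, \@body);
            }

            # One-line lookahead via tell/seek (fine on a plain file).
            sub peek {
                my $fh   = shift;
                my $pos  = tell $fh;
                my $line = <$fh>;
                seek $fh, $pos, 0;
                return defined $line ? $line : '';
            }

            sub process_record {
                my ($header, $secondary, $body) = @_;
                # e.g. load the record into the database here
            }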

        -- Randal L. Schwartz, Perl hacker

        Would it be faster to just use the Linux/Unix command line?

            tail -n +$start $fileName | head -n $length

        where $start = 100 and $length = 10000 - 100. That way the file is never read into Perl, so very little memory is used. I have used this to fetch blocks of lines from a file with more than 100 million lines; the average time to get a result was about a minute.
