Reading multiple lines?

by rdw (Curate)
on Nov 28, 2000 at 07:26 UTC

rdw has asked for the wisdom of the Perl Monks concerning the following question:

Ok, this couldn't get much simpler - all I want to do is read multiple lines (or records) from a file.

I could do this :-

my @slurp = <IN>;
while (my @chunk = splice(@slurp, 0, 10)) {
    # do stuff
}

Or I could go completely mad and do this :-

while (my @chunk = map { scalar <IN> || () } 1..10) {
    # do stuff
}

I don't want to read the whole file into memory, so the first solution is no good for me, but the second solution seems a little silly.

Anybody got any better ideas?

Have fun,

rdw

Replies are listed 'Best First'.
Re: Reading multiple lines?
by merlyn (Sage) on Nov 28, 2000 at 08:02 UTC
    Well, I always go back to my "grow your own control structure", the naked block:
    my @buffer;
    {
        push @buffer, scalar <IN>;
        redo unless eof(IN) or @buffer >= 10;
        ## process @buffer
        @buffer = ();
        redo unless eof(IN);
    }
    Says it clearly. No having to think of multiple screwiness.

    -- Randal L. Schwartz, Perl hacker

Re: Reading multiple lines?
by Fastolfe (Vicar) on Nov 28, 2000 at 07:50 UTC
    What you're asking for is kind of weird, so yah, there's not going to be an elegant way to do it (that I can think of). I might approach it like this:
    for ($i = 0, @chunk = (); !eof(IN) && $i < 10; $i++) {
        $chunk[$i] = <IN>;
    }
    That should probably answer your question. In the event you're curious about any other alternatives to your approach, you could perhaps read data from the file in blocks (of, say, 1k or 4k or whatever), and handle that block before moving on:
    # Untested, but you get the idea
    while (read(IN, $buf, 1024, length($buf))) {
        while ($buf =~ s/^([^\r\n]*)[\r\n]+//) {
            handle_line($1);
        }
    }
    Realistically, though, this isn't buying you any performance, if that's what you're worried about. Let Perl do the line handling.
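    One wrinkle in that block-read sketch, for anyone who reuses it: a final line with no trailing newline never matches the substitution, so it stays stuck in $buf after the read loop ends. A variant that flushes it (equally untested; handle_line is the same assumed helper):

        my $buf = '';
        while (read(IN, $buf, 1024, length($buf))) {
            while ($buf =~ s/^([^\r\n]*)[\r\n]+//) {
                handle_line($1);
            }
        }
        handle_line($buf) if length $buf;  # flush an unterminated last line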
Re: Reading multiple lines?
by chromatic (Archbishop) on Nov 28, 2000 at 09:52 UTC
    We need more special variables here, especially $.:
    local $.;  # make sure you're in a restrictive block
    while ($. < 10) {
        push @lines, scalar <IN>;
        last if eof(IN);
    }
    I tested this on a file of 11 lines and one of 9. It worked both ways. If you don't care about "uninitialized value" warnings, you can leave out the local.
      Unless I'm missing something this code would only get the first block of 10 lines then would stop. Maybe this would be better:
      local $.;
      while (<IN>) {
          push @lines, $_;
          next if ($. % 10 and !eof(IN));  # keep collecting until 10 lines or EOF
          # process @lines
          @lines = ();                     # clear for the next chunk
      }
Re (tilly) 1: Reading multiple lines?
by tilly (Archbishop) on Nov 28, 2000 at 08:06 UTC
    You won't win any performance, but still the following (untested code) should work:
    while (my @chunk = get_chunk(\*IN, 10)) {
        # etc
    }

    sub get_chunk {
        my $fh    = shift;
        my $count = shift;
        my @result;
        push @result, scalar <$fh> foreach 1..$count;
        return @result;
    }
    UPDATE
    Erk. It shouldn't work. :-(

    Lots of attempts to read from a possibly closed filehandle. Try this with STDIN and see some interesting behaviour. :-(

    Try the following (tested) code:

    my $sub = chunker(\*IN, 10);
    while (my @chunk = $sub->()) {
        # Do something amusing
    }

    sub chunker {
        my $fh    = shift;
        my $count = shift;
        return sub {
            my @ret;
            while (@ret < $count) {
                my $line = <$fh>;
                if (defined($line)) {
                    push @ret, $line;
                }
                else {
                    $count = 0;
                }
            }
            return @ret;
        };
    }
(crazyinsomniac) Re: Reading multiple lines?
by crazyinsomniac (Prior) on Nov 28, 2000 at 10:23 UTC
    What about messing with $/ ?
    ------------from perlvar---------
    input_record_separator HANDLE EXPR
    $INPUT_RECORD_SEPARATOR
    $RS
    $/

    The input record separator, newline by default. This influences Perl's idea of what a "line" is. Works like awk's RS variable, including treating empty lines as a terminator if set to the null string. (An empty line cannot contain any spaces or tabs.) You may set it to a multi-character string to match a multi-character terminator, or to undef to read through the end of file. Setting it to "\n\n" means something slightly different than setting to "", if the file contains consecutive empty lines. Setting to "" will treat two or more consecutive empty lines as a single empty line. Setting to "\n\n" will blindly assume that the next input character belongs to the next paragraph, even if it's a newline. (Mnemonic: / delimits line boundaries when quoting poetry.)

        undef $/;   # enable "slurp" mode
        $_ = <FH>;  # whole file now here
        s/\n[ \t]+/ /g;

    Remember: the value of $/ is a string, not a regex. awk has to be better for something. :-)

    Setting $/ to a reference to an integer, scalar containing an integer, or scalar that's convertible to an integer will attempt to read records instead of lines, with the maximum record size being the referenced integer. So this:

        $/ = \32768;  # or \"32768", or \$var_containing_32768
        open(FILE, $myfile);
        $_ = <FILE>;

    will read a record of no more than 32768 bytes from FILE. If you're not reading from a record-oriented file (or your OS doesn't have record-oriented files), then you'll likely get a full chunk of data with every read. If a record is larger than the record size you've set, you'll get the record back in pieces.

    On VMS, record reads are done with the equivalent of sysread, so it's best not to mix record and non-record reads on the same file. (This is unlikely to be a problem, because any file you'd want to read in record mode is probably unusable in line mode.) Non-VMS systems do normal I/O, so it's safe to mix record and non-record reads of a file.
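    For instance, if the blocks rdw wants happened to be separated by blank lines, paragraph mode would hand back one whole record per read. A minimal sketch (the file name and the blank-line-separated format are assumptions, not rdw's actual data):

        open my $fh, '<', 'data.txt' or die "can't open data.txt: $!";
        {
            local $/ = "";                        # paragraph mode
            while (my $record = <$fh>) {
                my @lines = split /\n/, $record;  # the lines of this record
                # ... process @lines ...
            }
        }
        close $fh;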

    "cRaZy is co01, but sometimes cRaZy is cRaZy".
                                                          - crazyinsomniac

Re: Reading multiple lines?
by matthew (Acolyte) on Nov 29, 2000 at 01:33 UTC
    It's not terribly pretty, but it's cheap:
    open FILE, "../data/foo.txt";
    @lines = ($var1, $var2, $var3, $var4, $var5,
              $var6, $var7, $var8, $var9, $var10) = <FILE>;
    close FILE;
    print join "\n", @lines;
    -Matthew
      Got an even better one:
      #!/usr/bin/perl
      my $file = "../data/foo.txt";
      my @lines = getFileSlice($file, 10, 1);
      print join "", @lines;
      print "\n";

      sub getFileSlice {
          my $file   = shift;
          my $length = shift;
          my $offset = shift;
          my @lines;
          open FILE, $file or die "can't open $file: $!";
          for (my $i = 0; $i < $offset; $i++) { <FILE> }  # skip $offset lines
          for (my $i = 0; $i < $length; $i++) {
              my $line = <FILE>;
              push @lines, $line;
          }
          close FILE;
          return @lines;
      }
      -Matthew
      This is even simpler:
      open FILE, "../data/foo.txt";
      @lines = (<FILE>)[0..9];
      close FILE;
      print "@lines";

      Well that's the kind of thing which would be nice to write, except I'd probably do it like this :-

      @lines[0..9] = <FILE>

      ...but it still reads the whole file in (it just throws most of it away). Firstly, that'll use too much memory in this case, and secondly you can't then read the next N lines in.

      Have fun,

      rdw

Re: Reading multiple lines?
by lolindrath (Scribe) on Nov 29, 2000 at 05:19 UTC
    Hmm, it kind of seems like you're trying to reinvent a database system. They read in only one record from a disk at a time. Any qualms against using a database?

    --=Lolindrath=--

      No qualms at all, I'm trying to get some existing data into a database, and I've got a lot of it. The file structure is a bit odd, and I need to read in N lines of header, then M lines of secondary data, before looping through line by line for a while and then going back to the header structure.

      I don't want to read it all into memory because the file is about 160Mb with about 8 million lines. The header is always a fixed number of lines, the secondary data is optional but a fixed number of lines and the bulk of the data is usually somewhere between 100 and 10,000 lines.

      I was just surprised that this wasn't as easy / neat to do as I expected. I'm quite pleased with my original map one liner, but nobody has really commented on whether it was really all that bad.

      Have fun,

      rdw

        I was just surprised that this wasn't as easy / neat to do as I expected. I'm quite pleased with my original map one liner, but nobody has really commented on whether it was really all that bad.
        OK, I'll comment on that. In my mind map is a way to go from X to f(X) for a bunch of X's. If f(X) doesn't depend on X, it makes my brain go tilt a bit, but I can probably get used to it. Hence, I'll almost certainly try a different solution before I accept the void-arg map alternative.
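        To make that concrete, compare (the @nums example is invented):

            my @squares = map { $_ ** 2 } @nums;            # f(X) depends on X
            my @chunk   = map { scalar <IN> || () } 1..10;  # X is just a counter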

        Hmm. What you really probably have is a state machine. I could see a big Switch statement based on state (reading header A, reading header B, in the body) with eof(IN) at the top, and if eof is detected while in header A or B, then carp out.

        See, I get worried about when the unusual happens. Maybe it's just my 30 years of programming, but any time I see someone write a "read 10 lines here" loop, I think "what if there aren't 10 lines?". That's what makes me good at QA. :)
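        For what it's worth, a minimal sketch of that state-machine shape (untested; the HDR/SEC markers, the line counts, the file name, and the peek/process_record helpers are all invented for illustration, since rdw hasn't shown the real format):

            #!/usr/bin/perl
            use strict;
            use warnings;
            use Carp;

            # Assumed record shape: a fixed-size header, an optional
            # fixed-size secondary block, then body lines up to the
            # next header.  The "HDR" and "SEC" prefixes are made up.
            my $HEADER_LINES    = 3;
            my $SECONDARY_LINES = 2;

            open my $in, '<', 'data.txt' or die "can't open data.txt: $!";

            until (eof($in)) {
                # state: reading the header -- EOF here is an error
                my @header;
                for (1 .. $HEADER_LINES) {
                    croak "EOF inside header" if eof($in);
                    push @header, scalar <$in>;
                }

                # state: reading the optional secondary block
                my @secondary;
                if (peek($in) =~ /^SEC/) {
                    for (1 .. $SECONDARY_LINES) {
                        croak "EOF inside secondary block" if eof($in);
                        push @secondary, scalar <$in>;
                    }
                }

                # state: reading body lines until the next header or EOF
                my @body;
                while (!eof($in) and peek($in) !~ /^HDR/) {
                    push @body, scalar <$in>;
                }

                process_record(\@header, \@secondary, \@body);
            }

            # One-line lookahead via tell/seek (fine on a plain file).
            sub peek {
                my $fh   = shift;
                my $pos  = tell $fh;
                my $line = <$fh>;
                seek $fh, $pos, 0;
                return defined $line ? $line : '';
            }

            sub process_record {
                my ($header, $secondary, $body) = @_;
                # e.g. load the record into the database here
            }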

        -- Randal L. Schwartz, Perl hacker

        Would it be faster to just use the Linux/Unix command line?

            tail -n +$start $fileName | head -n $length

        where $start = 100 and $length = 10000 - 100. That way the file is never read into Perl, so very little memory is used. I have used this to fetch blocks of lines from a file with more than 100 million lines; the average time to get a result was about a minute.
