Melly has asked for the wisdom of the Perl Monks concerning the following question:
Hi Monkees,
What's the best way to read a particular line from a file? e.g. I want to perform a regex against line 10 of a file, but I don't want/need to read in the whole file.
my $line2 = (<FILE>)[9];
works, but I'm not sure how efficient this is. I could run a loop, but that would look messy, and if I want, say, line 5000, it would be inefficient (IMHO).
Any advice?
Tom Melly, tom@tomandlu.co.uk
Re: Best way to read line x from a file
by Corion (Patriarch) on Mar 29, 2004 at 15:34 UTC
I think that Tie::File is the best compromise between speed and simplicity for such tasks.
The fastest way would be a loop like the following, assuming that the line indicated by $line_no will start in the first half of the file:
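# Skip the first $line_no - 1 lines ($line_no is 1-based and at least 1) ...
<FILE> while (--$line_no);
# ... so that the next read returns the wanted line.
my $line2 = <FILE>;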
as then, Perl and the OS will do some buffering for you, and you don't read the whole file for nothing.
If your file size is smaller than one sector, the OS (and the HD) will read it into memory anyway, and it might be faster to slurp it into memory and use a crafted regular expression against it:
use File::Slurp qw(slurp);
my $f = slurp $filename;
my $skip = $line_no - 1;    # number of lines before the one we want
my ($line2) = $f =~ m!\A(?:[^\n]*\n){$skip}([^\n]*)!;
So in the end, you will have to benchmark a lot.
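One way such a benchmark could be set up is sketched below with the core Benchmark module. The helper names by_slurp and by_loop are made up for this illustration, and the script expects a file name and a 1-based line number on the command line; treat it as a starting point rather than a definitive measurement.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical helper: read every line into a list, keep only line $n.
sub by_slurp {
    my ($path, $n) = @_;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    return (<$fh>)[$n - 1];
}

# Hypothetical helper: read line by line and stop at line $n.
sub by_loop {
    my ($path, $n) = @_;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    while (my $line = <$fh>) {
        return $line if $. == $n;      # $. is the current line number
    }
    return;                            # file has fewer than $n lines
}

# Usage: perl bench.pl FILE LINE_NUMBER
if (@ARGV == 2) {
    my ($path, $n) = @ARGV;
    # A negative count asks Benchmark to run each candidate
    # for at least that many CPU seconds.
    cmpthese(-2, {
        slurp => sub { by_slurp($path, $n) },
        loop  => sub { by_loop($path, $n) },
    });
}
```

Results will depend heavily on file size, on where the wanted line sits, and on whether the file is already in the OS cache, so it is worth running this against files that resemble the real data.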
|
use File::Slurp;
my $f = read_file $filename;
my $line2 = $1
if ($f =~ m!\A(?:.*\n){@{[$line_no-1]}}(.*)\n!m);
... but I wouldn't recommend it.
And for the sake of completeness, here is the solution spelled out with Tie::File, which lots of people have already mentioned.
use Tie::File;
tie my @file, 'Tie::File', $filename
or die "Couldn't tie '$filename': $!";
my $line2 = $file[9];
Re: Best way to read line x from a file
by arden (Curate) on Mar 29, 2004 at 15:31 UTC
$. = 0;
do { $LINE = <FILE> } until $. == $DESIRED_LINE_NUMBER || eof;
Now, if you're going to potentially bounce around within the file (say, look at line 5000, then line 20, then line 42, etc), there are other strategies, but since I don't think that's what you're looking for, we won't go there right yet. . .
- - arden.
arden is more of an orangutan than a monkee
Re: Best way to read line x from a file
by davido (Cardinal) on Mar 29, 2004 at 16:09 UTC
my $line2 = (<FILE>)[9];
Your method evaluates <FILE> in list context, resulting in a file slurp. Then you index into only one line, and let the rest of the slurp fall into the bit-bucket.
I agree with Corion that Tie::File is a great solution.
But I couldn't leave well enough alone, and had to come up with yet another way to do it. This solution still reads through the file up until it gets to the desired line. There's no way around that unless your lines are fixed-length:
my $linenum = 10;
while ( my $line = <FILE> ) {
    next unless $. == $linenum;
    # Process the one line here...
    last; # No need to continue.
}
I hadn't seen anyone using $. yet. See perlvar.
Update: Added last; to the loop. Thanks for the reminder.
Re: Best way to read line x from a file
by ctilmes (Vicar) on Mar 29, 2004 at 16:05 UTC
You might also consider using Mmap. You can treat the file as a variable, and only the portions of it that you actually access will get read from disk, and then in a very efficient manner.
Re: Best way to read line x from a file
by ambrus (Abbot) on Mar 29, 2004 at 18:32 UTC
As others have said, this is wasteful because it reads the whole file
while it should read only the first 10 lines.
If you want a solution that has no visible loop (or map etc.), you could try
using the module Tie::File. This module
is in the standard Perl distribution. (Note that Tie::File numbers the lines with zero-offset.)
Otherwise, for me
$l = <$F> for 1..10;
seems the best solution, but there might be a more elegant one.
Re: Best way to read line x from a file
by gmpassos (Priest) on Mar 29, 2004 at 16:54 UTC
Well, you really need to read line by line to be sure that you are at line X, unless your lines have a fixed length.
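For the fixed-length case, the position of line X can be computed and reached with a single seek. The sketch below assumes every line, including its newline, is exactly $reclen bytes long; the helper name fixed_length_line is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch line $n (1-based) from a file in which every
# line, newline included, is exactly $reclen bytes long (an assumption).
sub fixed_length_line {
    my ($path, $n, $reclen) = @_;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    binmode $fh;                            # count bytes, not characters
    # Whence 0 means the offset is measured from the start of the file.
    seek($fh, ($n - 1) * $reclen, 0)
        or die "Can't seek in '$path': $!";
    my $got = read($fh, my $line, $reclen);
    close $fh;
    # A short or empty read means line $n does not exist in full.
    return unless $got && $got == $reclen;
    return $line;
}
```

If even one line has a different length, every later offset is wrong, so this only suits files whose layout is guaranteed.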
Another thing you can do, to avoid always reading the whole file, is to save something like an index of the byte positions of some lines in an extra file. So, for a big file you can have some indexed lines, and when you want to go to line X, you choose the nearest indexed line and start searching for line X from there. Note that the search for the nearest line in the index needs to be very fast and cheap, or you won't gain much from the optimization.
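A minimal sketch of that idea, using only core Perl, might look like the following. It keeps the index in memory rather than in an extra file, indexes every $step-th line, and uses made-up helper names (build_index, indexed_line); all of these are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: record the byte offset of every $step-th line.
# $index->[$k] is the offset at which line $k * $step + 1 starts.
sub build_index {
    my ($path, $step) = @_;
    open my $fh, '<', $path or die "Can't open '$path': $!";
    my @index = (0);                        # line 1 starts at offset 0
    while (my $line = <$fh>) {
        # After reading line $., tell() is where the next line begins.
        push @index, tell($fh) if $. % $step == 0 && !eof($fh);
    }
    close $fh;
    return \@index;
}

# Hypothetical helper: fetch line $n (1-based) by seeking to the nearest
# indexed line at or before it, then reading forward at most $step lines.
sub indexed_line {
    my ($path, $index, $step, $n) = @_;
    return if $n < 1;
    my $slot = int(($n - 1) / $step);
    return if $slot > $#$index;             # beyond the indexed range
    open my $fh, '<', $path or die "Can't open '$path': $!";
    seek($fh, $index->[$slot], 0) or die "Can't seek in '$path': $!";
    my $line;
    for (1 .. $n - $slot * $step) {
        $line = <$fh>;
        last unless defined $line;          # file ended early
    }
    close $fh;
    return $line;
}
```

Because the indexed lines are evenly spaced, finding the nearest one is a single division instead of a search. The index must be rebuilt whenever the file changes, and writing it to an extra file (for instance one offset per line) would let later runs skip the initial full pass.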
Graciliano M. P.
"Creativity is the expression of the liberty".
Re: Best way to read line x from a file
by flyingmoose (Priest) on Mar 29, 2004 at 19:13 UTC
Hi Monkees,
Hey hey, we're the Monkees, people say we monkey around, but we're too busy coding, to put the Camel down...
Somebody else, next verse...
|
We're just trying to be friendly, we only want to code all day. And if you don't use strict, we're gonna have something to say.
And the real lyrics. :)
There is no emoticon for what I'm feeling now.