This subroutine takes a file name and a line count (N) and returns an open filehandle with its read pointer set N lines before the end of the file. It differs from other implementations (like the one in PPT) in that it does not read the entire file -- it reads backwards from the end. It's a quickie, so there's no buffering as in GNU tail(1), but that can easily be remedied. Sample usage:
my $g = lastn("/usr/dict/words", 400);
print while (<$g>);
It was coded to grab the last few bits of a gigantic logfile we use here and may not be suitable for your needs. Enjoy.
sub lastn {
    my ($file, $lines) = @_;
    my $fh;
    $lines++;
    if (! open($fh, $file)) {
        print "Can't open $file: $!\n";
        return;
    }
    binmode($fh);
    sysseek($fh, 0, 2);                       # Seek to end
    my $nlcount = 0;
    while ($nlcount < $lines) {
        last unless sysseek($fh, -1, 1);      # Step back one byte (fails at BOF)
        sysread($fh, $_, 1, 0) || die;        # Read it; pointer moves forward again
        $nlcount++ if $_ eq "\n";
        last if $nlcount == $lines;
        last unless sysseek($fh, -1, 1);      # Step back over the byte just read
    }
    seek($fh, sysseek($fh, 0, 1), 0) || warn; # Sync the stdio pointer to the sysseek position
    $fh;
}
EEK! sysread() is expensive!
by chip (Curate) on Dec 19, 2001 at 12:50 UTC
I'm horrified to see so many calls to sysread().
The whole point of sysread() is to bypass buffering
and go to the OS. And system calls are slow --
slow enough to notice when you're making hundreds and hundreds
of them.
In other words: DON'T DO THAT.
-- Chip Salzenberg, Free-Floating Agent of Chaos
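To put a number on chip's point, here is a rough sketch (an editor's illustration, not code from the thread; the file name and byte count are made up) that pulls the tail of a file with a single sysread() instead of one sysread() per byte, then does all the newline hunting in memory:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: grab the last chunk of a file with ONE sysread instead of
# one system call per byte. $file and $want are example values.
my $file = "/var/log/syslog";
my $want = 4096;                              # bytes to pull from the end

open(my $fh, '<', $file) or die "Can't open $file: $!";
binmode($fh);

my $size = -s $fh;
my $take = $size < $want ? $size : $want;
sysseek($fh, -$take, 2) or die "seek: $!";    # whence 2 == SEEK_END
sysread($fh, my $buf, $take) == $take or die "read: $!";

# Split on newlines in memory -- no further system calls needed.
my @lines = split /^/m, $buf;
print scalar(@lines), " lines in the last $take bytes\n";
```

Two system calls total, versus hundreds or thousands in the per-byte loop.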
I'm just wondering if your reply relates to my answer also. I changed the sysread and sysseek calls to plain read and seek (and removed that last seek), benchmarked my old and new versions, and the sys* version took a huge performance hit (~3 cps with the sys* calls, ~1100 cps without).
Update: I also checked clintp's answer with and without the sys* calls: ~5 cps with and ~4 without.
This was all reading the last 400 lines from a 1000-line file, 10 bytes/line.
Re: Last N lines from file (tail)
by SpongeBob (Novice) on Dec 19, 2001 at 01:35 UTC
File::ReadBackwards has similar, though not quite the same, functionality. Rather than seeking and reading one character at a time, it is generally more efficient to read larger chunks, which is what File::ReadBackwards does. Perhaps like so... Also, autovivifying filehandles only work on 5.6+, so I changed that as well to be backward compatible:
sub lastn {
    my ($file, $lines, $bufsiz) = @_;
    $bufsiz ||= 1024;
    # Changed FH to STDOUT to avoid warning
    my $fh = \do { local *STDOUT };
    $lines++;
    if (! open($fh, $file)) {
        print "Can't open $file: $!\n";
        return;
    }
    binmode($fh);
    my $pos = sysseek($fh, 0, 2);             # Seek to end; returns file size
    my $nlcount = 0;
    while ($nlcount <= $lines) {
        $bufsiz = $pos if $bufsiz > $pos;     # Don't seek past the beginning
        $pos = sysseek($fh, -$bufsiz, 1);
        die "Bad seek: $!" unless defined $pos;
        my $bytes = sysread($fh, $_, $bufsiz, 0);
        die "Bad read: $!" unless defined $bytes;
        $nlcount += tr/\n//;                  # Count newlines in this chunk
        $pos = sysseek($fh, -$bufsiz, 1);     # Back over the chunk just read
        die "Bad seek: $!" unless defined $pos;
        last if $pos == 0;
    }
    seek($fh, sysseek($fh, 0, 1), 0) || warn; # Sync the stdio pointer
    <$fh> for $lines .. $nlcount;             # Skip any extra lines read
    $fh;
}
Update: It does work when requesting more lines than the file contains, though I fixed it to work with various buffer sizes. I don't think a 20000% speed improvement (for tailing 400 10-byte lines) is 'too much' optimization :-) It does miscount if the last line has no trailing newline, though do you actually count that as a line or not? Besides, yours 'miscounts' in that situation also.
Except that this doesn't work. Try reading the last N lines from a file with N-5 lines, or a file with < $bufsiz bytes. :)
I had a version that used buffers and was a virtual clone of the algorithm in tail.c, except that I got lost and frustrated in the boundary conditions and really didn't care anymore. Laziness and impatience.
If you want to take a stab at doing this right, be my guest. I just don't want to do the requisite testing, because the test conditions are yucky:
- File of L lines, reading:
  - L lines
  - L+l lines
  - L-l lines
  - 0 lines
- Where bufsiz is:
  - < the size of the file
  - > the size of the file
  - some even multiple of the size of the file
  - some even multiple of the size of the file, less some portion of bufsiz
  - == the size of the file
Basically, all of the combinations of these. I got all but the last two coded with nice buffering action.
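The matrix above can be driven mechanically. A sketch of such a harness (an editor's illustration; the naive lastn here is a stand-in returning an arrayref, so swap in any version from this thread and adjust the read accordingly):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Stand-in lastn for the sketch: slurps the file, keeps the last $n lines.
sub lastn {
    my ($file, $n) = @_;
    open(my $fh, '<', $file) or return;
    my @all = <$fh>;
    splice(@all, 0, @all - $n) if @all > $n;  # keep only the last $n lines
    return \@all;
}

sub check {
    my ($L, $n) = @_;
    my ($tmp, $file) = tempfile(UNLINK => 1);
    print $tmp "line $_\n" for 1 .. $L;       # build a file of L lines
    close $tmp;
    my $got  = lastn($file, $n);
    my $want = $n < $L ? $n : $L;             # can't return more lines than exist
    printf "L=%-3d n=%-3d got=%-3d %s\n",
        $L, $n, scalar(@$got), @$got == $want ? "ok" : "MISMATCH";
}

# File of L lines, reading L, L+1, L-1, and 0 lines:
check(10, $_) for 10, 11, 9, 0;
```

Varying the bufsiz axis would mean parameterizing the buffered lastn and looping over sizes smaller than, equal to, and larger than the file.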
After consideration, I figured I'd let the OS worry about buffering and JFDI. As a matter of fact, if you use getc() instead of sysread() (and seek instead of sysseek, etc.), the stdio package takes care of most of this buffering nonsense anyway.
sub lastn {
    my ($file, $lines) = @_;
    my $fh;
    $lines++;
    if (! open($fh, $file)) {
        print "Can't open $file: $!";
        return;
    }
    binmode($fh);
    seek($fh, 0, 2);                          # Seek to end
    my $nlcount = 0;
    while ($nlcount < $lines) {
        last unless seek($fh, -1, 1);         # Step back one byte
        $_ = getc($fh);                       # stdio does the buffering for us
        die unless defined $_;
        $nlcount++ if $_ eq "\n";
        last if $nlcount == $lines;
        last unless seek($fh, -1, 1);
    }
    $fh;
}
There is such a thing as too much optimizing. :)
Update: with example.
One character at a time is still slow, as my benchmarks below showed. This solution still benchmarked at about 4 cps, while my buffered solution gave about 1100 cps.
Update: BTW, have you looked at File::Tail? It also searches from the end of the file, and if you don't want 'tail -f' behavior (i.e., a blocking read), you can do:
my $fh = File::Tail->new(name => $filename, tail => $lines);
$fh->nowait(1);
print $line while $line = $fh->read;
The performance is not horrible on large files, though a bit worse than my function, probably due to the overhead of having all sorts of bells and whistles that are not being used.
Re: Last N lines from file (tail)
by belg4mit (Prior) on Dec 19, 2001 at 00:39 UTC
Re: Last N lines from file (tail)
by Juerd (Abbot) on Dec 19, 2001 at 00:00 UTC