PerlMonks  

Last N lines from file (tail)

by clintp (Curate)
on Dec 18, 2001 at 22:49 UTC ( [id://132906]=CUFP )

This subroutine takes a file name and a line count (N) and returns an open filehandle whose read pointer is set N lines before the end of the file. Unlike other implementations (such as the one in PPT, the Perl Power Tools), it does not read the entire file; it reads backwards from the end. It's a quickie, so there's no buffering as in GNU's tail(1), but that can easily be remedied. Sample usage:
my $g=lastn("/usr/dict/words", 400); print while(<$g>);
It was coded to grab the last few bits of a gigantic logfile we use here and may not be suitable for your needs. Enjoy.
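sub lastn {
    my ($file, $lines) = @_;
    my $fh;
    $lines++;    # N+1 newlines: the file's trailing "\n" ends the last line
    if (! open($fh, $file)) {
        print "Can't open $file: $!<P/>";
        return;
    }
    binmode($fh);
    sysseek($fh, 0, 2);    # Seek to end
    my $nlcount = 0;
    my $c;                 # lexical, so the caller's $_ is not clobbered
    while ($nlcount < $lines) {
        last unless sysseek($fh, -1, 1);     # step back one byte
        sysread($fh, $c, 1, 0) || die;       # read it (moves forward again)
        $nlcount++ if ($c eq "\n");
        last if $nlcount == $lines;
        last unless (sysseek($fh, -1, 1));   # undo the read's advance
    }
    # Sync the buffered handle to where the unbuffered scan stopped.
    seek($fh, sysseek($fh, 0, 1), 0) || warn;
    $fh;
}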
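For contrast, here is a minimal sketch of the read-the-whole-file approach this node avoids. It is a generic sliding-window version, not the actual PPT code, and the name lastn_window is made up for the example. It must read every line of the file, so its cost grows with the file size rather than with N.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: reads the whole file once and keeps only the
# most recent $n lines in a sliding window. Cost is O(file size).
sub lastn_window {
    my ($path, $n) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    my @window;
    while (my $line = <$fh>) {
        push @window, $line;
        shift @window if @window > $n;   # drop the oldest line
    }
    close $fh;
    return @window;
}

# Usage: a throwaway 1000-line file, then its last 3 lines.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line $_\n" for 1 .. 1000;
close $tmp;
print lastn_window($path, 3);   # prints "line 998" through "line 1000"
```

On a gigantic logfile this reads every byte just to throw most of them away, which is exactly what seeking from the end avoids.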

EEK! sysread() is expensive!
by chip (Curate) on Dec 19, 2001 at 12:50 UTC
    I'm horrified to see so many calls to sysread(). The whole point of sysread() is to bypass buffering and go to the OS. And system calls are slow -- slow enough to notice when you're making hundreds and hundreds of them.

    In other words: DON'T DO THAT.

        -- Chip Salzenberg, Free-Floating Agent of Chaos

      I'm just wondering whether your reply applies to my answer too. I changed the sysread and sysseek calls to plain read and seek (and removed that last seek), benchmarked my old and new versions, and took a huge performance hit (~1100 cycles/sec with sys* calls, ~3 cps without).
      Update: And I checked clintp's answer with & w/o sys* calls, and it was ~5 cps with and ~4 w/o. This was all reading the last 400 lines from a 1000 line file, 10 bytes/line.
        Well, sure, because a plain read() would have used stdio, which probably translated to the equivalent of sysread(4096) at least. If you change N one-byte reads into N 4096-byte reads, of course you'll have a slowdown ... and a wasteful one, since every time you read a block you use only one byte.

            -- Chip Salzenberg, Free-Floating Agent of Chaos
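Chip's point in numbers: the sketch below reads backwards in blocks, with one sysseek() and one sysread() per block, and then uses every byte of each block by scanning it in memory with rindex(). Passing a block size of 1 degenerates into the byte-at-a-time loop, so a call counter shows the difference. The sub tail_offset and its counter argument are hypothetical, made up for this illustration; it returns a byte offset rather than a filehandle.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: returns the byte offset where the last $n lines
# of $fh start. Reads backwards in $bufsiz-byte blocks (one sysseek and
# one sysread per block) and scans each block in memory with rindex().
# If $calls is given (a scalar ref), it is bumped once per sysread.
sub tail_offset {
    my ($fh, $n, $bufsiz, $calls) = @_;
    my $size = sysseek($fh, 0, 2);
    die "sysseek failed: $!" unless defined $size;
    $size += 0;                          # "0 but true" -> 0
    return $size if $n <= 0 || $size == 0;

    my $found = 0;                       # newlines seen so far
    my $end   = $size;                   # still to scan: bytes [0, $end)
    while ($end > 0) {
        my $len   = $end < $bufsiz ? $end : $bufsiz;
        my $start = $end - $len;
        defined sysseek($fh, $start, 0) or die "sysseek failed: $!";
        my $got = sysread($fh, my $buf, $len);
        $$calls++ if $calls;
        die "short read: $!" unless defined $got && $got == $len;

        my $i = $len;                    # next search is before index $i
        # A "\n" as the file's final byte ends the last line; skip it.
        $i-- if $end == $size && substr($buf, -1) eq "\n";
        while ($i > 0 && ($i = rindex($buf, "\n", $i - 1)) >= 0) {
            return $start + $i + 1 if ++$found == $n;
        }
        $end = $start;
    }
    return 0;                            # fewer than $n lines: whole file
}

# Same shape as the benchmark above: 1000 lines of 10 bytes, last 400.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "123456789\n" for 1 .. 1000;
close $tmp;
open my $fh, '<', $path or die "Can't open $path: $!";
binmode $fh;
for my $bufsiz (1, 4096) {
    my $calls = 0;
    my $off = tail_offset($fh, 400, $bufsiz, \$calls);
    print "bufsiz=$bufsiz offset=$off sysread calls=$calls\n";
}
```

Both runs find offset 6000, but the 1-byte version needs 4001 sysread() calls (plus a seek for each) while the 4096-byte version needs one. To get lastn()'s behaviour from the offset, seek($fh, $offset, 0) and read the handle normally.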

Re: Last N lines from file (tail)
by SpongeBob (Novice) on Dec 19, 2001 at 01:35 UTC
    File::ReadBackwards has similar, but not quite the same, functionality. Rather than seeking and reading one character at a time, though, it would generally be more efficient to read larger chunks, as File::ReadBackwards does. Perhaps like so (autovivifying filehandles only work in 5.6+, so I also changed that to be backward compatible):
    sub lastn {
        my ($file, $lines, $bufsiz) = @_;
        $bufsiz ||= 1024;
        # Changed FH to STDOUT to avoid warning
        my $fh = \do { local *STDOUT };
        $lines++;
        if (! open($fh, $file)) {
            print "Can't open $file: $!<P/>";
            return;
        }
        binmode($fh);
        my $pos = sysseek($fh, 0, 2);    # Seek to end
        my $nlcount = 0;
        while ($nlcount <= $lines) {
            $bufsiz = $pos if $bufsiz > $pos;
            $pos = sysseek($fh, -$bufsiz, 1);
            die "Bad seek: $!" unless defined $pos;
            my $bytes = sysread($fh, $_, $bufsiz, 0);
            die "Bad read: $!" unless defined $bytes;
            $nlcount += tr/\n//;
            $pos = sysseek($fh, -$bufsiz, 1);
            die "Bad seek: $!" unless defined $pos;
            last if $pos == 0;
        }
        seek($fh, sysseek($fh, 0, 1), 0) || warn;
        <$fh> for $lines..$nlcount;
        $fh;
    }
    Update: It does work when requesting more lines than the file contains, though I did fix it to work with various buffer sizes. I don't think a 20000% speed improvement (for tailing 400 10-byte lines) is 'too much' optimization :-) It does miscount if the last line does not end with a line feed, though do you actually count that as a line or not? Besides, yours 'miscounts' in that situation too.
      Except that this doesn't work. Try reading the last N lines from a file with N-5 lines, or a file with < $bufsiz bytes. :)

      I had a version that used buffers and was a virtual clone of the algorithm in tail.c, except that I got lost and frustrated in the boundary conditions and really didn't care anymore. Laziness and impatience.

      If you want to take a stab at doing this right, be my guest. I just don't want to do the requisite testing, because the test conditions are yucky:

      • File of L lines reading:
        • L lines
        • L+l lines
        • L-l lines
        • 0 lines
      • Where bufsiz:
        • < size of the file
        • > size of the file
        • Some even multiple of the size of the file
        • Some even multiple of the size of the file less some portion of bufsiz
        • == size of the file
      Basically all of the combinations of these. I got all but the last two coded with nice buffering action.
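The combinations above are tedious by hand but easy to sweep mechanically. A minimal sketch, assuming the simplest possible oracle (read the whole file, keep the last N lines) and a hypothetical chunked implementation, lastn_chunked, that prepends blocks from the end until it holds more than N newlines. Rather than picking buffer sizes, it tries every size from 1 to the file size + 2, which covers smaller than, equal to, larger than, and exact divisors of the file size, for files of 0, 1 and 21 lines, with and without a final newline. Like Perl's own <$fh>, both subs count a final line without a newline as a line.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Oracle: read everything, keep the last $n lines.
sub lastn_reference {
    my ($path, $n) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    binmode $fh;
    my @all = <$fh>;
    close $fh;
    return () if $n <= 0;
    return @all[($n < @all ? @all - $n : 0) .. $#all];
}

# Hypothetical chunked version under test: prepend $bufsiz-byte blocks
# from the end until the buffer holds more than $n newlines (so the
# possibly partial first line is never among the last $n) or the
# start of the file is reached.
sub lastn_chunked {
    my ($path, $n, $bufsiz) = @_;
    return () if $n <= 0;
    open my $fh, '<', $path or die "Can't open $path: $!";
    binmode $fh;
    my $pos = sysseek($fh, 0, 2);
    die "sysseek failed: $!" unless defined $pos;
    $pos += 0;                               # "0 but true" -> 0
    my $tail = '';
    while ($pos > 0 && ($tail =~ tr/\n//) <= $n) {
        my $len = $pos < $bufsiz ? $pos : $bufsiz;
        $pos -= $len;
        defined sysseek($fh, $pos, 0) or die "sysseek failed: $!";
        my $got = sysread($fh, my $buf, $len);
        die "short read: $!" unless defined $got && $got == $len;
        $tail = $buf . $tail;
    }
    close $fh;
    my @lines = $tail =~ /([^\n]*\n|[^\n]+\z)/g;
    my $k = $n < @lines ? $n : @lines;
    return $k ? @lines[-$k .. -1] : ();
}

my ($checked, @failures) = (0);
for my $L (0, 1, 21) {
    for my $final_nl (1, 0) {
        # Lines of 0..6 "x"s; optionally a last line with no "\n".
        my $content = join '', map { ('x' x ($_ % 7)) . "\n" } 1 .. $L;
        $content .= 'end' unless $final_nl;
        my ($tmp, $path) = tempfile(UNLINK => 1);
        binmode $tmp;
        print {$tmp} $content;
        close $tmp;
        for my $n (grep { $_ >= 0 } 0, 1, $L - 1, $L, $L + 1, $L + 2) {
            my $want = join "\0", lastn_reference($path, $n);
            for my $bufsiz (1 .. length($content) + 2) {
                my $got = join "\0", lastn_chunked($path, $n, $bufsiz);
                $checked++;
                push @failures, "L=$L final_nl=$final_nl n=$n bufsiz=$bufsiz"
                    if $got ne $want;
            }
        }
    }
}
print "$checked cases, ", scalar(@failures), " failures\n";
print "  $_\n" for @failures;
```

Any implementation with the same ($path, $n, $bufsiz) interface can be dropped in place of lastn_chunked; a failing case prints the exact combination that broke it.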

      After consideration, I figured I'd let the OS worry about buffering and JFDI. As a matter of fact, if you use getc() instead of sysread() (and seek instead of sysseek, etc.) the STDIO package would take care of most of this buffering nonsense anyway.

      sub lastn {
          my ($file, $lines) = @_;
          my $fh;
          $lines++;
          if (! open($fh, $file)) {
              print "Can't open $file: $!";
              return;
          }
          binmode($fh);
          seek($fh, 0, 2);    # Seek to end
          my $nlcount = 0;
          while ($nlcount < $lines) {
              last unless seek($fh, -1, 1);
              $_ = getc($fh);
              die unless defined $_;
              $nlcount++ if ($_ eq "\n");
              last if $nlcount == $lines;
              last unless (seek($fh, -1, 1));
          }
          $fh;
      }
      There is such a thing as too much optimizing. :)

      Update: with example.

        One character at a time is still slow, as my benchmarks below showed. This solution still benchmarked at about 4 cps, and my buffered solution gave about 1100 cps.

        Update: BTW, have you looked at File::Tail? It searches from the end of the file also, and if you don't want 'tail -f' behavior (i.e. a blocking read), then you can do:

        use File::Tail;
        my $fh = File::Tail->new(name => $filename, tail => $lines);
        $fh->nowait(1);
        my $line;
        print $line while $line = $fh->read;
        The performance is not horrible on large files, though a bit worse than my function, probably due to the overhead of all the bells and whistles that are not being used.
Re: Last N lines from file (tail)
by belg4mit (Prior) on Dec 19, 2001 at 00:39 UTC
Re: Last N lines from file (tail)
by Juerd (Abbot) on Dec 19, 2001 at 00:00 UTC
    I like this one, it's a lot faster than the average read-the-whole-file-and-then-see-what-the-last-lines-are-script.
    And its usage is a lot cleaner than File::Tail's. Maybe you could wrap it in some nice module and put that on CPAN?

    2;0 juerd@ouranos:~$ perl -e'undef christmas' Segmentation fault 2;139 juerd@ouranos:~$
