PerlMonks  

efficient way to read a file in reverse

by cmcl (Novice)
on Jan 06, 2021 at 13:29 UTC (#11126426)

cmcl has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I need to work out a way to read log files from latest entry to earliest (so in reverse) quickly and with minimum overhead (so reading line by line rather than loading the whole file into memory). When I've benchmarked this, reading a file line by line is vastly quicker than reading the log file into memory (some of these log files are very large). I also have to use what is installed on these machines (of which there are thousands of them I need to run it on), so that rules out using File::ReadBackwards. So I opted to use a system command 'tac' to read the file and output it to a filehandle:
    my @loglist = nsort(@oldmsglogs);
    push(@loglist, "$basemsglog");
    my @sortedlogs = (reverse @loglist);
    for $msglogfile (@sortedlogs) {
        if ($done eq 0) {
            $logfile = join('/', $baselogdir, $msglogfile);
            open(RDMESG, "-|", "tac", $logfile);
            while ( <RDMESG> ) {
                my $line = $_;
                if ($line =~ m/(^[A-Z][a-z][a-z]\s+\d{1,2}\s+\d{1,2}\:\d{1,2}\:\d{1,2})(.*)/) {
                    my $logtime = $1;
                    my $msg = $2;
                    my $logutime = str2time("$logtime");
                    if ($logutime ge $unixstarttime) {
                        if ($msg =~ m/kernel\:\s+CPU\d{1,3}\:\s+Package\s+temperature\s+above\s+threshold\,\s+cpu\s+clock\s+throttled/) {
                            $ctcnt++;
                        }
                    } else {
                        $done = 1;
                        last;
                    }
                }
            }
            close RDMESG;
        } else {
            if ($ctcnt > $threshold) {
                $ret = sprintf("$warnmsg %d", $ctcnt);
                return $ret;
            } else {
                return $ret;
            }
        }
    }
    if ($ctcnt > $threshold) {
        $ret = sprintf("$warnmsg %d", $ctcnt);
        return $ret;
    }
However, I then hit a broken pipe error ('tac: write error: broken pipe'), because the 'last' makes me close the filehandle while 'tac' is still writing to it. I do this so that I stop reading the file once the entries fall outside the date range I'm interested in. If I take out 'last', the error goes away, but the whole file gets read to the end (or actually the start of the file, since it is read in reverse), which slows things down so much that it is not viable. Is there a better way to do this, given the constraints I've mentioned? Perhaps a different way to invoke tac, or an elegant way to terminate tac when I want to close the filehandle? Or some other way of doing this entirely? Any help appreciated! Cam

Replies are listed 'Best First'.
Re: efficient way to read a file in reverse
by LanX (Cardinal) on Jan 06, 2021 at 13:50 UTC
    > I also have to use what is installed on these machines (of which there are thousands of them I need to run it on), so that rules out using File::ReadBackwards

    It's pure Perl, you could even copy the rather short code into a module of yours.

    If you wanna implement it yourself I'd try reading sliding windows² of 4^n kb chunks from the end with seek

    • reverse those windows
    • read line by line from the string (you can also open from a string)
    • reverse the single lines again
    • If one window is depleted, read the next one and append it to the last window.

    I'd put all the logic into a sub which returns those lines or maybe I'd try to create a new IO class.
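
A minimal sketch of such a sub, under simplifying assumptions: it reads fixed-size chunks from the end with seek and splits each chunk into lines directly, instead of reversing the windows as in the steps above. The names make_reverse_reader and $chunk_size are made up for illustration and do not come from any module.

```perl
use strict;
use warnings;

# Returns an iterator (code ref) that yields the file's lines last-to-first,
# and undef once the start of the file has been reached.
# make_reverse_reader and $chunk_size are illustrative names only.
sub make_reverse_reader {
    my ($path, $chunk_size) = @_;
    $chunk_size ||= 4096;

    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $pos     = -s $fh;   # offset where the still-unread part of the file ends
    my $partial = '';       # piece of a line whose beginning is not loaded yet
    my @lines;              # complete lines from the current chunk, oldest first

    return sub {
        while (!@lines && $pos > 0) {
            my $want = $pos < $chunk_size ? $pos : $chunk_size;
            $pos -= $want;
            seek $fh, $pos, 0 or die "seek failed on $path: $!";
            my $got = read $fh, my $chunk, $want;
            die "short read on $path" unless defined $got && $got == $want;

            # Glue the held-back piece onto the end of the new chunk.
            @lines = split /^/, $chunk . $partial;

            # Unless this chunk starts at offset 0, its first piece may be
            # only the tail of a line, so hold it back for the next round.
            $partial = $pos > 0 ? shift @lines : '';
        }
        return pop @lines;
    };
}
```

It would be called as `my $next = make_reverse_reader($logfile); while (defined(my $line = $next->())) { ... }`, and leaving the loop early with `last` just stops reading, with no child process left writing into a closed pipe.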

    I remember a similar question not too long ago, you should try super search °

    update

    °) like Reading the contents of a file from the bottom up

    ²) sliding windows explained

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      you could even copy the rather short code into a module of yours.

      Or better still, use App::FatPacker to create your stand-alone, distributable script. It will do all the hard/dull work so you don't have to.


      🦛

      Thanks for the suggestions Rolf. I think adapting the code from the module sounds like the best bet, though taking a look at the module, I will need to do my homework on what a lot of it means (I only dabble with scripting really), so a good learning exercise!

      Cheers,
      Cam
        Hmm it's offering a "tie filehandle" interface, that's not common and rather deep for a beginner (though a good idea)

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery
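
For anyone who would rather avoid the tie interface: File::ReadBackwards also has a plain object interface (new, readline, close). A small sketch of using it follows; newest_lines_until and the $is_too_old callback are hypothetical names made up here, and the module is loaded with require only so the sub can live in a script that still compiles where the module is missing.

```perl
use strict;
use warnings;

# Collect the newest lines of a file, newest first, stopping at the first
# line the callback reports as too old.
# newest_lines_until and $is_too_old are hypothetical, not part of the module.
sub newest_lines_until {
    my ($path, $is_too_old) = @_;

    # Loaded at run time; dies here if the module is not installed.
    require File::ReadBackwards;

    my $bw = File::ReadBackwards->new($path)
        or die "Cannot read $path backwards: $!";

    my @lines;
    while (defined(my $line = $bw->readline)) {
        last if $is_too_old->($line);
        push @lines, $line;
    }
    $bw->close;
    return @lines;
}
```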

Re: efficient way to read a file in reverse
by haukex (Bishop) on Jan 06, 2021 at 13:50 UTC
    I also have to use what is installed on these machines ... that rules out using File::ReadBackwards.

    Not really - it's a pure-Perl module and not that many lines of code. Yes, even you can use CPAN and in the very worst case, you could copy the module's code into your script.

    Edit: LanX beat me to it ;-)

Re: efficient way to read a file in reverse
by tybalt89 (Prior) on Jan 07, 2021 at 15:13 UTC

    XY problem?

    Looking at your code it appears you are reading kernel logs for temperature problems, but only after a certain unix time.
    It may be possible to use Search::Dict to find the first line in the file on or after a specified time and then read forward to the end.
    No backward reading would be required.

    It would look something like this:

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11126426
    use warnings;
    use Search::Dict;
    use Date::Parse qw( str2time );
    use Time::HiRes qw( time );

    my $day = 60 * 60 * 24;

    # commented code used to create 209M file;
    #my $str = join '', map { localtime( time + $_ * $day) .
    #  " kernel: log entry\n" } -5e6 .. 5;
    #print $str;
    #use Path::Tiny; path('d.searchdict')->spew($str);
    #print "string length = @{[length $str]}\n";

    my $want = time - 1.1 * $day;
    my $start = time;
    open my $fh, '<', 'd.searchdict' or die;
    look $fh, $want, {
        comp => sub { $_[0] <=> $_[1] },
        xfrm => sub { str2time substr shift, 0, 24 },
    };
    printf "look took %.3f seconds\n", time - $start;

    while( <$fh> ) # now read to end of file
    {
        print;
    }
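
To make the roles of comp and xfrm easier to see, here is a smaller sketch of the same Search::Dict technique. It assumes a made-up log format in which every line starts with an integer epoch time and the file is sorted by it (not the syslog format from the question); seek_to_time is a name invented for this example.

```perl
use strict;
use warnings;
use Search::Dict;

# Position $fh at the first line whose leading epoch timestamp is >= $since.
# Assumes each line starts with an integer epoch and the file is sorted by it.
# seek_to_time is an illustrative name, not part of Search::Dict.
sub seek_to_time {
    my ($fh, $since) = @_;
    return look($fh, $since, {
        # Compare the transformed line with the key numerically.
        comp => sub { $_[0] <=> $_[1] },
        # Reduce each line to its timestamp before it is compared.
        xfrm => sub {
            my ($line) = @_;
            return unless defined $line;    # be defensive if handed undef
            return $line =~ /^(\d+)/ ? $1 : 0;
        },
    });
}
```

After the call, an ordinary `while (<$fh>)` loop reads forward from the first entry at or after $since, so nothing has to be read backwards at all.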

    On the other hand, maybe a real read backwards would work better, so here's a simple package I threw together:

    #!/usr/bin/perl

    use strict; # https://perlmonks.org/?node_id=11126426
    use warnings;

    my $str = join '', map '.' x $_ . "this is line $_\n", 1 .. 60;
    print $str;
    print "string length = ", length $str, "\n";

    my $backwards = Tybalt89BackwardsHeReads->new( \$str ) or die;
    while( defined( $_ = $backwards->line ) )
    {
        print;
    }

    package Tybalt89BackwardsHeReads; #######################################

    sub line
    {
        my ($self) = @_;
        if( @{ $self->{lines} } == 0 and $self->{where} )
        {
            my $window = 1024; # window size, adjust to suit
            my $pos = $self->{where} - $window;
            $pos < 0 and $pos = 0;
            seek $self->{fh}, $pos, 0;
            read $self->{fh}, my $data, $self->{where} - $pos;
            $pos and $data =~ s/^.*\n// ? ($pos += $+[0]) : die "increase window size";
            $self->{lines} = [ split /^/, $data ];
            $self->{where} = $pos;
        }
        return pop @{ $self->{lines} };
    }

    sub new
    {
        my ($self, $filename) = @_;
        open my $fh, '<', $filename or die "$! on $filename";
        bless { fh => $fh,
            where => ref $filename ? length $$filename : -s $filename,
            lines => [] }, ref $self || $self;
    }

    1; # so if split off, package ends with true

    Use your real filename in the ->new() call instead of the string reference I was using for testing. And you can remove the string generation code also.

    You may want to change the package name :)
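
One possible way to plug a backwards reader into the scan from the question, as a sketch only: the loop counts matching lines and stops at the first entry older than the cutoff. count_recent_matches and its arguments are hypothetical names. $next_line is assumed to return lines newest-first (for the package above that could be `sub { $backwards->line }`), and $to_epoch is assumed to turn a line into an epoch time (for example by wrapping str2time from Date::Parse, as in the question).

```perl
use strict;
use warnings;

# Count lines matching $pattern among the newest entries, stopping at the
# first entry older than $cutoff.
# Hypothetical interface: $next_line returns lines newest-first and undef at
# the end; $to_epoch maps a line to an epoch time, or undef if it has none.
sub count_recent_matches {
    my ($next_line, $to_epoch, $cutoff, $pattern) = @_;
    my $count = 0;
    while (defined(my $line = $next_line->())) {
        my $when = $to_epoch->($line);
        next unless defined $when;      # skip lines without a timestamp
        last if $when < $cutoff;        # everything further back is older
        $count++ if $line =~ $pattern;
    }
    return $count;
}
```

Note that the times are compared numerically with `<` here, whereas the question's code uses the string operator `ge`.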

    P.S. I enjoyed writing the read backwards code, thanks for the inspiration!

      Hi Tybalt89,

      Glad you enjoyed it! I omitted the first bit of the code where I do the timestamps for brevity's sake, but yes, the purpose is to look for the last 24 hours of logs for any thermal throttling messages. Thanks for the code, I think that will be quicker for me to adapt than the original File::ReadBackwards (I'll dig into 'tie filehandle' when I get a chance sometime), and there's some things new to me in your example as well which look interesting.

      Cheers,
      Cam

        Since I haven't played with tied file handles before, here's the previous package with the ability to use it either as an object, or as a tied file handle.

        #!/usr/bin/perl

        use strict; # https://perlmonks.org/?node_id=11126426
        use warnings;

        tie *BACKWARDS, 'Tybalt89BackwardsHeReads', $0 or die 'tie failed';
        while( <BACKWARDS> )
        {
            print;
        }

        package Tybalt89BackwardsHeReads; #######################################

        BEGIN { *TIEHANDLE = \&new; *READLINE = \&line; }

        sub line
        {
            my ($self) = @_;
            while( @{ $self->{lines} } == 0 and $self->{where} )
            {
                my $window = 1024; # window size, adjust to suit
                my $pos = $self->{where} - $window;
                $pos < 0 and $pos = 0;
                seek $self->{fh}, $pos, 0;
                read $self->{fh}, my $data, $self->{where} - $pos;
                $pos and $data =~ s/^\N*\n(?=.)//s ? ($pos += $+[0])
                    : die "increase window size";
                $self->{lines} = [ split /^/, $data ];
                $self->{where} = $pos;
            }
            return pop @{ $self->{lines} };
        }

        sub new
        {
            my ($self, $filename) = @_;
            open my $fh, '<', $filename or die "$! on $filename";
            bless {
                where => ref $filename ? length $$filename : -s $filename,
                fh => $fh, lines => [] }, ref $self || $self;
        }

        1; # so if split off, package ends with true

        Replace the $0 with the name of your file. I was just using it for debugging.

        Please use this latest version of "sub line"; it solves a problem with fetching the line at the beginning of the file when the window is too small.
