PerlMonks  

grab 'n' lines from a file above and below a /match/

by barathbr (Scribe)
on Sep 16, 2004 at 22:27 UTC ( [id://391576]=perlquestion )

barathbr has asked for the wisdom of the Perl Monks concerning the following question:

Fellow Monks, I have a rather stupid question ...

I am hunting for a specific set of values in my log file and want to print out 'n' lines from above and below the match.
I have the following tiny piece, which I know is horribly wrong: it just spits out the matching lines alone. I have been working on pieces of a huge module and I am just too tired to think straight. Any quick pointers on how I can proceed would be greatly appreciated ...
open (LOG, "GWSvc.log") || die "Unable to get a handle to the file: $!\n";
while (<LOG>) {
    if (/c9391b56-b174-441b-921c-7d63/) {
        $. = ($. - 5);
        for (my $i=0;$i<=10;$i++) {
            print;
            $.++;
        }
    }
}

Please note that the log runs to a few hundred megabytes, and my sorry system can't really load all of it into memory.
Thanks

Replies are listed 'Best First'.
Re: grab 'n' lines from a file above and below a /match/
by Aristotle (Chancellor) on Sep 16, 2004 at 22:39 UTC

    Well, you'll have to store the lines. And it's going to be tricky to correctly handle matches where the context overlaps, i.e. where one match follows fewer than 2n lines after the previous one.
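To make the overlap problem concrete, here is a minimal sketch that works on line numbers only. The helper name `context_ranges` and its arguments are hypothetical and not taken from any reply in this thread: it is given the context size, the number of the last line, and a sorted list of matching line numbers, and it merges windows that overlap or touch so that no line would be printed twice.

```perl
use strict;
use warnings;

# Hypothetical helper: given a context size $n, the last line number $last,
# and a sorted list of matching line numbers, return the [start, end] ranges
# to print, merging ranges that overlap or touch.
sub context_ranges {
    my ($n, $last, @matches) = @_;
    my @ranges;
    for my $m (@matches) {
        # Clamp the window to the first and last line of the file.
        my $start = $m - $n < 1     ? 1     : $m - $n;
        my $end   = $m + $n > $last ? $last : $m + $n;
        if (@ranges and $start <= $ranges[-1][1] + 1) {
            # Overlaps (or directly follows) the previous range: extend it.
            $ranges[-1][1] = $end if $end > $ranges[-1][1];
        }
        else {
            push @ranges, [$start, $end];
        }
    }
    return @ranges;
}

# Matches on lines 3 and 5 with 2 lines of context share lines 3..5,
# so they collapse into one range; the match on line 20 stands alone.
print join(', ', map { "$_->[0]-$_->[1]" } context_ranges(2, 30, 3, 5, 20)), "\n";
```

A streaming solution cannot know the match line numbers in advance, which is why the replies in this thread track a countdown of lines still to print instead.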

    The easiest thing to do is use the toolbox: egrep -C n c9391b56-b174-441b-921c-7d63 GWSvc.log

    Update: the following should work and handle all edge cases:

    my @backlog;
    my $to_print = 0;
    my $context_size = 10;
    while(<>) {
        $to_print = 1 + $context_size if /c9391b56-b174-441b-921c-7d63/;
        push @backlog, $_;
        if( $to_print ) {
            print shift @backlog;
            --$to_print;
        }
        elsif( @backlog > $context_size / 2 ) {
            shift @backlog;
        }
    }
    print shift @backlog while @backlog and $to_print--;

    Makeshifts last the longest.

      TIMTOWDI or here is one way to use $. ;-)

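      my $context = 4;
      my @buffer = ('') x $context;
      my $print_to = 0;
      my $match = qr/42/;
      while(<DATA>) {
          if ( m/$match/ ) {
              print @buffer;
              $print_to = $. + $context;
              @buffer = ('') x $context;
          }
          if ( $. <= $print_to ) {
              # inside a context window: print, but do not buffer,
              # so a nearby second match cannot print the line again
              print;
          }
          else {
              push @buffer,$_;
              shift @buffer;
          }
      }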

      cheers

      tachyon

      I have the gateway running at full logging level; that's the only time I am going to be seeing these params (this particular one is actually a SIP subscribe request ID, additional parts of which I have removed). I will probably run into such entries about once in 100 lines, so I am not really worried about the overlap part of your answer.

      As for egrep: it's Windows, so it's not available to me. Please help.
      On a lighter note, I now realize that I was trying to play with $. when I was actually operating on $_. What I don't understand is where $. would be used: are there any typical situations where it might come in handy?
      You might find this to be a newbie question, but I am still learning Perl and am just curious.
      thanks again

        $. is just the number of the last line read from the last accessed filehandle. It's useful any time you want to know the line number. It does nothing beyond that; in particular, writing to it has no effect at all, other than that its value changes.
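A minimal sketch of both points, assuming an in-memory filehandle as a stand-in for the log. The helper names `numbered_matches` and `bump_counter_then_read` are made up for illustration: the first uses $. to label matching lines, the second shows that assigning to $. changes the counter but not which line is read next.

```perl
use strict;
use warnings;

# Hypothetical helper: return "NUMBER: text" for each line of $fh matching $rx.
sub numbered_matches {
    my ($fh, $rx) = @_;
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        # $. holds the number of the line just read from $fh
        push @hits, sprintf("%d: %s", $., $line) if $line =~ $rx;
    }
    return @hits;
}

# Hypothetical helper: read one line, overwrite $., then read another line.
# Returns the second line and the value of $. afterwards.
sub bump_counter_then_read {
    my ($fh) = @_;
    my $first = <$fh>;     # $. is now 1
    $. = 10;               # changes the counter only, not the file position
    my $second = <$fh>;    # still the second line of the file
    return ($second, $.);
}

my $log = "alpha\nbeta\ngamma\nbeta again\n";   # stand-in for a real log

open my $in, '<', \$log or die "Cannot open in-memory file: $!";
print "$_\n" for numbered_matches($in, qr/beta/);
close $in;

open $in, '<', \$log or die "Cannot open in-memory file: $!";
my ($line, $counter) = bump_counter_then_read($in);
chomp $line;
print "read '$line' with \$. = $counter\n";
close $in;
```

This is why the loop in the original question printed the same matching line repeatedly: changing $. never touched $_ or the read position.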

        Makeshifts last the longest.

      Hey this works too !!
      thanks a lot
Re: grab 'n' lines from a file above and below a /match/
by Zaxo (Archbishop) on Sep 16, 2004 at 22:47 UTC

    You're trying to do that by manipulating $., but that won't read more lines. Since you want to keep lines from before the match, you'll need to buffer them. Here's one way:

    my $n = 10; # , say
    my @lines;
    {
        local $_;
        while (<LOG>) {
            push @lines, $_;
            if (/c9391b56-b174-441b-921c-7d63/) {
                # push @lines, (<LOG>)[0..$n-1] and last;
                # better,
                while (<LOG>) {
                    push @lines, $_;
                    last if @lines > 2*$n;
                }
                last;
            }
            else {
                shift @lines while @lines > $n;
            }
        }
    }

    Update: improved the code to not read the rest of the file after a match is found.

    After Compline,
    Zaxo

      I don't see any output using it. I just tried it against a smaller text file and that doesn't give any output either.

        Does it help to say print @lines; at the end?

        After Compline,
        Zaxo

Re: grab 'n' lines from a file above and below a /match/
by borisz (Canon) on Sep 16, 2004 at 22:47 UTC
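    Untested. On Unix, grep can do exactly this.
    my $MAX = 5;
    my @lines;
    local $_;
    OUT: while ( defined ( $_ = <LOG> ) ){
        push @lines, $_;
        # keep the current line plus at most $MAX lines before it
        shift @lines if @lines > $MAX + 1;
        if ( /bla bla/ ) {
            # @lines already ends with the matching line
            print @lines;
            # start afresh so already-printed lines are not repeated later
            @lines = ();
            for my $k ( 1 .. $MAX ) {
                last OUT unless defined ( $_ = <LOG> );
                print;
            }
        }
    }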
    Boris
      Thanks a ton Boris :) works as advertised !!!
Re: grab 'n' lines from a file above and below a /match/
by been42 (Curate) on Sep 17, 2004 at 03:50 UTC
    I know that I'm showing up a little bit late to the party, but wouldn't Tie::File work for this?

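    use strict;
    use Tie::File;
    # some variables get set up here since we're using strict (wink)
    tie @lines, 'Tie::File', 'GWSvc.log', memory=>$some_small_number;
    for ($i=0; $i<=$#lines; $i++) {
        # test the current record, not $_ (which is never set in this loop)
        if ($lines[$i] =~ /c9391b56-b174-441b-921c-7d63/) {
            # clamp the window so it stays inside the file
            $first = $i-5 < 0 ? 0 : $i-5;
            $last  = $i+5 > $#lines ? $#lines : $i+5;
            for ($j=$first; $j <= $last; $j++) {
                # Tie::File strips the line ending, so put it back
                print "$lines[$j]\n";
            }
        }
    }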

    I'm sure there are a million ways to make it look cleaner, but I'm also very sleepy right now. This seems like it would solve the problem, though. I'm really a big fan of Tie::File after having been 'corrected' on my non-usage of it not too long ago. Now I find uses for it everywhere.

Re: grab 'n' lines from a file above and below a /match/
by mrpeabody (Friar) on Sep 17, 2004 at 04:35 UTC
    Obligatory Tie::File solution. I haven't done any benchmarks, but I would guess it's as fast as the other Perl solutions while being less memory-intensive.

    As others have said, /bin/grep is the way to go here.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Tie::File;
    use Fcntl 'O_RDONLY';

    my $DEBUG = 0;
    my $text = qr/c9391b56-b174-441b-921c-7d63/;
    my $file = 'GWSvc.log';
    my $context = 3;

    sub dprint { print @_ if $DEBUG };

    my @lines;
    tie @lines, 'Tie::File', $file, mode => O_RDONLY
        or die "tie failed: $!";

    for (my $i = 0; $i <= $#lines; $i++) {
        dprint "SCAN: line $i\n";
        if ($lines[$i] =~ /$text/) {
            dprint "MATCH at line $i\n";
            my $start = $i - $context;
            if ($start < 0) { $start = 0; };
            my $end = $i + $context;
            # do not run past the last line of the file
            if ($end > $#lines) { $end = $#lines; };
            for my $j ($start .. $end) {
                dprint "$j: ";
                print "$lines[$j]\n";
            };
            print "\n";
            $i += $context;
        };
    };

      It's actually slower and more memory intensive than any of the other solutions. Tie::File internally keeps a list of byte offsets for all the lines, and it needs a lot of additional overhead that is supposed to optimize writes which you never make any use of.

      Your code also doesn't get the edge cases right: if there's a match within less than $context lines of the previous, it will be missed.

      You gave me an idea with regards to memory consumption, though:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Fcntl qw( :seek );

      my $rx = qr/c9391b56-b174-441b-921c-7d63/;
      my $to_print = 0;
      my $context = 10;
      my @offs = ( 0 ) x ( 1 + $context );

      while(<>) {
          my $context_start = shift @offs;
          my $here = tell ARGV;
          push @offs, $here;
          if( /$rx/ ) {
              if( not $to_print ) {
                  my $length = $here - $context_start;
                  seek ARGV, $context_start, SEEK_SET;
                  read ARGV, $_, $length;
              }
              $to_print = 1 + $context;
          }
          --$to_print, print if $to_print;
      }

      This only needs to keep $context + 1 offsets in memory.
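The underlying idea, remembering byte offsets and re-reading the text on demand, can be sketched on its own. This is a simplified illustration and not the code above: the helper name `before_context` is hypothetical, it handles only the first match, and it uses an in-memory filehandle as a stand-in for the log.

```perl
use strict;
use warnings;
use Fcntl qw( :seek );

# Hypothetical helper: read lines from $fh, remembering only the byte offset
# at which each of the last $keep lines started. On the first line matching
# $rx, seek back and return everything from the oldest remembered offset
# through the end of the matching line.
sub before_context {
    my ($fh, $rx, $keep) = @_;
    my @starts;                       # byte offsets of recent line starts
    my $line_start = tell $fh;
    while (my $line = <$fh>) {
        my $line_end = tell $fh;
        if ($line =~ $rx) {
            my $from = @starts ? $starts[0] : $line_start;
            seek $fh, $from, SEEK_SET or die "seek failed: $!";
            read $fh, my $chunk, $line_end - $from;
            return $chunk;
        }
        push @starts, $line_start;
        shift @starts if @starts > $keep;
        $line_start = $line_end;
    }
    return;                           # no match
}

my $log = "one\ntwo\nthree\nfour\nfive\n";
open my $fh, '<', \$log or die "Cannot open in-memory file: $!";
print before_context($fh, qr/four/, 2);   # two lines of context, then the match
close $fh;
```

The trade-off is extra I/O on each match in exchange for holding a handful of integers instead of whole lines, which matters most when the lines are long.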

      Update: fixed bugs. It was ( 0 ) x $context which gave one too few lines of before-context and $here - $context_start + length which of course ate too much input — but that wasn't obvious with my test data. Oopsie.

      Makeshifts last the longest.

        "It's actually slower and more memory intensive than any of the other solutions. Tie::File internally keeps a list of byte offsets for all the lines, and it needs a lot of additional overhead that is supposed to optimize writes which you never make any use of."
        Oops. Guessed wrong, then.

        "Your code also doesn't get the edge cases right: if there's a match within less than $context lines of the previous, it will be missed."
        That was intentional, and it depends on your definition of "missed". That hit will be printed with the context of the previous hit. Changing the behavior would just require removing the line:
        $i += $context;
Re: grab 'n' lines from a file above and below a /match/
by TedPride (Priest) on Sep 17, 2004 at 07:48 UTC
    The solution is to store the last 5 lines visited in an array, and also have a variable tracking how many more lines beyond the current one have to be output:
    $x = 5; # Number of lines to print above and below.
    open (LOG, "GWSvc.log") || die "Unable to get a handle to the file: $!\n";
    $after = 0;
    while (<LOG>) {
        if ($after) {
            print $_;
            $after--;
        }
        else {
            push (@lines, $_);
            if ($#lines > $x) { shift(@lines); }
        }
        if (/c9391b56-b174-441b-921c-7d63/) {
            print $line while ($line = shift(@lines));
            $after = $x;
        }
    }
      No offense, but s/The solution/A solution/. There is more than one way..
Re: grab 'n' lines from a file above and below a /match/
by cosimo (Hermit) on Sep 17, 2004 at 06:52 UTC
Re: grab 'n' lines from a file above and below a /match/
by DrHyde (Prior) on Sep 17, 2004 at 15:49 UTC
    "I have a rather stupid question"

    There are no stupid questions, only stupid ways of asking them, and you didn't do that so that's OK.

    The solution you are looking for is, I suspect, to read a line at a time, populating an array of 2n+1 entries (n above the line, plus the line, plus n below the line). Once the array is full, shift the first entry out of it, and push a new entry onto the end. When the *middle* entry in the array matches your desired string, dump the whole array to the screen. Something like this ...

    open(FILE, 'file') || die("Yaroo!\n");
    my $N = 3; # 3 lines above and below
    my $target = 'stuff what you want';
    my @window = ('') x ($N + $N + 1);
    while(<FILE>) {
        shift @window;
        push @window, $_;
        print @window, "\n\n" if($window[$N] =~ /$target/);
    }
    foreach(1 .. $N) {
        shift @window;
        print @window, "\n\n" if($window[$N] =~ /$target/);
    }
    update: I should have explained, the foreach loop copes with the case where the target text appears within the last N lines of the file.
Re: grab 'n' lines from a file above and below a /match/
by Anonymous Monk on Sep 17, 2004 at 20:19 UTC
    # In fat crayon mode for your pleasure
    # spoofing your big log file handle as a little array for brevity
    my @logs = ();
    for(1..20){push(@logs, $_)}
    my $lookBack = 2;   # the number of lines behind the match you want
    my $lookAhead = 2;  # the number of lines ahead of the match you want
    my $matchReg = '9'; # what you're matching on (can be inlined)
    my $matched = 0;    # flag when we have a match
    my @buffer = ();    # your lookBack buffer

    # Iterate through the logs looking for match
    foreach my $line (@logs) {
        # Feed the buffer
        push(@buffer, $line);
        # Trim the buffer to the current line plus $lookBack lines before it
        shift(@buffer) if $#buffer > $lookBack;
        # Do stuff if this line matches
        if ($line =~ /$matchReg/) {
            # Print the buffer
            for(@buffer){print "lookback: $_\n"}
            # Wave a flag we have a match
            $matched = $lookAhead;
            # Nothing else to see here, move along
            next;
        }
        # Still working the lookAhead from a prior match?
        if($matched){
            # one less line to print
            --$matched;
            print "LookAhead: $line\n";
        }
    }

Node Type: perlquestion [id://391576]
Approved by Paladin
Front-paged by broquaint