grab 'n' lines from a file above and below a /match/

barathbr has asked for the wisdom of the Perl Monks concerning the following question:

Re: grab 'n' lines from a file above and below a /match/ by Aristotle (Chancellor) on Sep 16, 2004 at 22:39 UTC
Well, you'll have to store the lines. And it's going to be tricky to correctly handle matches where the context overlaps, i.e. where one match follows fewer than 2n lines after the previous one. The easiest thing to do is use the toolbox:

```
egrep -C n c9391b56-b174-441b-921c-7d63 GWSvc.log
```

Update: the following should work and handle all edge cases:

```perl
my @backlog;
my $to_print = 0;
my $context_size = 10;   # total context: half before, half after the match

while(<>) {
    $to_print = 1 + $context_size if /c9391b56-b174-441b-921c-7d63/;
    push @backlog, $_;
    if( $to_print ) {
        # output lags behind the input by the size of the backlog
        print shift @backlog;
        --$to_print;
    }
    elsif( @backlog > $context_size / 2 ) {
        shift @backlog;
    }
}

# flush whatever is still owed once the input runs out
print shift @backlog while @backlog and $to_print--;
```

Makeshifts last the longest.
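When the input fits in memory, the overlap problem described above can also be handled by collecting index ranges first and merging any that overlap or touch before printing. The following is only a sketch under that assumption; `context_ranges` is a hypothetical helper name and the sample lines are made up for illustration, neither comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: returns merged [start, end] index ranges (0-based)
# covering $n lines of context around every line that matches $re.
sub context_ranges {
    my ($lines, $re, $n) = @_;
    my @ranges;
    for my $i (0 .. $#$lines) {
        next unless $lines->[$i] =~ $re;
        # Clamp the window to the bounds of the array.
        my $start = $i - $n < 0        ? 0        : $i - $n;
        my $end   = $i + $n > $#$lines ? $#$lines : $i + $n;
        if (@ranges and $start <= $ranges[-1][1] + 1) {
            # Overlaps or touches the previous range: extend it.
            $ranges[-1][1] = $end if $end > $ranges[-1][1];
        }
        else {
            push @ranges, [$start, $end];
        }
    }
    return @ranges;
}

# Made-up sample data: matches on "line 9" and "line 11" are close
# enough that their contexts merge into a single block.
my @lines = map { "line $_\n" } 1 .. 20;
for my $r (context_ranges(\@lines, qr/line (?:9|11)$/, 2)) {
    print @lines[ $r->[0] .. $r->[1] ], "--\n";
}
```

Unlike the streaming versions in this thread, this trades memory for simplicity: every line is held in the array, so it is better suited to small files than to a large log.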
Re^2: grab 'n' lines from a file above and below a /match/ by tachyon (Chancellor) on Sep 17, 2004 at 03:55 UTC
TIMTOWTDI, or here is one way to use `$.` ;-)

```perl
my $context  = 4;
my @buffer   = ('') x $context;
my $print_to = 0;
my $match    = qr/42/;

# reads from the script's __DATA__ section
while(<DATA>) {
    if ( m/$match/ ) {
        print @buffer;
        $print_to = $. + $context;
        @buffer = ('') x $context;
    }
    if ( $. <= $print_to ) {
        print;
    }
    else {
        # only buffer lines that have not been printed yet,
        # so a nearby second match does not repeat them
        push @buffer, $_;
        shift @buffer;
    }
}
```

cheers

tachyon
Re^2: grab 'n' lines from a file above and below a /match/ by barathbr (Scribe) on Sep 16, 2004 at 22:50 UTC
I have the gateway running at full logging level; that's the only time I am going to see these params (this particular one is actually a SIP subscribe request ID, parts of which I have removed). I will probably run into such entries once in 100 lines, so I am not really worried about the overlap part of your answer. As for egrep: it's Windows, so it's not available to me. Please help.
Re^3: grab 'n' lines from a file above and below a /match/ by Aristotle (Chancellor) on Sep 16, 2004 at 23:04 UTC
"As for egrep: it's Windows, so it's not available to me"

Sure is. A quick Google search reveals GNU utilities for Win32 and GNU Grep for Win32.

Makeshifts last the longest.
Re^4: grab 'n' lines from a file above and below a /match/ by flogic (Acolyte) on Sep 17, 2004 at 18:14 UTC
Re^2: grab 'n' lines from a file above and below a /match/ by barathbr (Scribe) on Sep 16, 2004 at 23:29 UTC
Hey, on a lighter note: I now realize that I was operating on `$_` while trying to play with `$.`. What I don't understand is where I would use `$.`. Are there any typical situations where `$.` might come in handy? You might find this to be a newbie question, but I am still learning Perl and am just curious. Thanks again.
Re^3: grab 'n' lines from a file above and below a /match/ by Aristotle (Chancellor) on Sep 16, 2004 at 23:40 UTC
`$.` is just the number of the last line read from the last accessed filehandle. It's useful any time you want to know the line number. It does nothing beyond that; in particular, writing to it has no effect at all, other than that its value changes.

Makeshifts last the longest.
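As a small illustration of the point above, here is a sketch that uses `$.` to label matching lines with their line numbers. The helper name `numbered_matches` and the sample text are assumptions made for this example; an in-memory filehandle stands in for the real log file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "N: text" for each line matching $re,
# where N is the value of $. right after that line was read.
sub numbered_matches {
    my ($fh, $re) = @_;
    my @hits;
    while (my $line = <$fh>) {
        push @hits, "$.: $line" if $line =~ $re;
    }
    return @hits;
}

# Made-up sample data in place of GWSvc.log.
my $log = "alpha\nbeta\ngamma\nbeta again\n";
open my $fh, '<', \$log or die "open failed: $!";
print numbered_matches($fh, qr/beta/);   # prints "2: beta" and "4: beta again"
close $fh;
```

Note that `$.` is only read here, never assigned; the counting is done by the read operator itself.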
Re^2: grab 'n' lines from a file above and below a /match/ by barathbr (Scribe) on Sep 16, 2004 at 23:15 UTC
Hey, this works too! Thanks a lot.
Re: grab 'n' lines from a file above and below a /match/ by Zaxo (Archbishop) on Sep 16, 2004 at 22:47 UTC
You're trying to do that by manipulating `$.`, but that won't read more lines. Since you want to keep lines from before the match, you'll need to buffer them. Here's one way:

```perl
my $n = 10;   # say
my @lines;
{
    local $_;
    while (<LOG>) {
        push @lines, $_;
        if (/c9391b56-b174-441b-921c-7d63/) {
            # push @lines, (<LOG>)[0..$n-1] and last;
            # better,
            while (<LOG>) {
                push @lines, $_;
                last if @lines > 2*$n;
            }
            last;
        }
        else {
            shift @lines while @lines > $n;
        }
    }
}
```

Update: improved the code to not read the rest of the file after a match is found.

After Compline,
Zaxo
Re^2: grab 'n' lines from a file above and below a /match/ by barathbr (Scribe) on Sep 16, 2004 at 23:07 UTC
I don't see any output using it. I just tried it against a smaller text file and that doesn't give any output either.
Re^3: grab 'n' lines from a file above and below a /match/ by Zaxo (Archbishop) on Sep 16, 2004 at 23:22 UTC
Does it help to say `print @lines;` at the end?

After Compline,
Zaxo
Re^4: grab 'n' lines from a file above and below a /match/ by barathbr (Scribe) on Sep 16, 2004 at 23:44 UTC
Re: grab 'n' lines from a file above and below a /match/ by borisz (Canon) on Sep 16, 2004 at 22:47 UTC
Untested. On Unix, grep can do exactly this.

```perl
my $MAX = 5;
my @lines;
local $_;
OUT:
while ( defined ( $_ = <LOG> ) ){
    if ( /bla bla/ ) {
        print @lines, $_;   # buffered context, then the matching line
        @lines = ();        # already printed, don't repeat them later
        for my $i ( 1 .. $MAX ) {
            last OUT unless defined ( $_ = <LOG> );
            print;
        }
    }
    else {
        push @lines, $_;
        shift @lines if @lines > $MAX;
    }
}
```

Boris
Re^2: grab 'n' lines from a file above and below a /match/ by barathbr (Scribe) on Sep 16, 2004 at 23:11 UTC
Thanks a ton Boris :) works as advertised !!!
Re: grab 'n' lines from a file above and below a /match/ by been42 (Curate) on Sep 17, 2004 at 03:50 UTC
I know that I'm showing up a little bit late to the party, but wouldn't Tie::File work for this?

```perl
use strict;
use Tie::File;

# some variables get set up here since we're using strict (wink)

tie @lines, 'Tie::File', 'GWSvc.log', memory => $some_small_number;

for ($i = 0; $i <= $#lines; $i++) {
    # test the current record, not $_
    if ($lines[$i] =~ /c9391b56-b174-441b-921c-7d63/) {
        for ($j = $i - 5; $j <= $i + 5; $j++) {
            next if $j < 0 or $j > $#lines;   # stay inside the file
            print "$lines[$j]\n";             # Tie::File strips the line endings
        }
    }
}
```

I'm sure there are a million ways to make it look cleaner, but I'm also very sleepy right now. This seems like it would solve the problem, though. I'm really a big fan of Tie::File after having been 'corrected' on my non-usage of it not too long ago. Now I find uses for it everywhere.
Re: grab 'n' lines from a file above and below a /match/ by mrpeabody (Friar) on Sep 17, 2004 at 04:35 UTC
Obligatory Tie::File solution. I haven't done any benchmarks, but I would guess it's as fast as the other Perl solutions while being less memory-intensive. As others have said, /bin/grep is the way to go here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::File;
use Fcntl 'O_RDONLY';

my $DEBUG   = 0;
my $text    = qr/c9391b56-b174-441b-921c-7d63/;
my $file    = 'GWSvc.log';
my $context = 3;

sub dprint { print @_ if $DEBUG };

my @lines;
tie @lines, 'Tie::File', $file, mode => O_RDONLY
    or die "tie failed: $!";

for (my $i = 0; $i <= $#lines; $i++) {
    dprint "SCAN: line $i\n";
    if ($lines[$i] =~ /$text/) {
        dprint "MATCH at line $i\n";
        my $start = $i - $context;
        if ($start < 0) { $start = 0; };
        my $end = $i + $context;
        if ($end > $#lines) { $end = $#lines; };   # don't run past the last line
        for my $j ($start .. $end) {
            dprint "$j: ";
            print "$lines[$j]\n";
        };
        print "\n";
        $i += $context;
    };
};
```
Re^2: grab 'n' lines from a file above and below a /match/ by Aristotle (Chancellor) on Sep 17, 2004 at 06:23 UTC
It's actually slower and more memory intensive than any of the other solutions. Tie::File internally keeps a list of byte offsets for all the lines, and it needs a lot of additional overhead that is supposed to optimize writes, which you never make any use of. Your code also doesn't get the edge cases right: if there's a match within fewer than `$context` lines of the previous one, it will be missed. You gave me an idea with regard to memory consumption, though:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw( :seek );

my $rx = qr/c9391b56-b174-441b-921c-7d63/;
my $to_print = 0;
my $context = 10;
my @offs = ( 0 ) x ( 1 + $context );

while(<>) {
    my $context_start = shift @offs;
    my $here = tell ARGV;
    push @offs, $here;
    if( /$rx/ ) {
        if( not $to_print ) {
            # jump back and re-read the before-context plus the match
            my $length = $here - $context_start;
            seek ARGV, $context_start, SEEK_SET;
            read ARGV, $_, $length;
        }
        $to_print = 1 + $context;
    }
    --$to_print, print if $to_print;
}
```

This only needs to keep `$context + 1` offsets in memory.

Update: fixed bugs. It was `( 0 ) x $context`, which gave one too few lines of before-context, and `$here - $context_start + length`, which of course ate too much input, but that wasn't obvious with my test data. Oopsie.

Makeshifts last the longest.
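The idea of remembering byte offsets instead of line contents can be shown in isolation. This is a simplified sketch, not the code from the post above: `line_offsets` and `read_span` are hypothetical helper names, the sample text is made up, and an in-memory filehandle is assumed in place of the log file.

```perl
use strict;
use warnings;
use Fcntl qw( :seek );

# Hypothetical helper: returns the byte offset at which each line starts.
sub line_offsets {
    my ($fh) = @_;
    my @offsets;
    my $pos = tell($fh);
    while (defined(my $line = <$fh>)) {
        push @offsets, $pos;   # where the line just read began
        $pos = tell($fh);      # where the next line will begin
    }
    return @offsets;
}

# Hypothetical helper: re-reads the bytes from $start up to (not including) $end.
sub read_span {
    my ($fh, $start, $end) = @_;
    seek($fh, $start, SEEK_SET) or die "seek failed: $!";
    read($fh, my $buf, $end - $start);
    return $buf;
}

# Made-up sample data.
my $log = "one\ntwo\nthree\nfour\n";
open my $fh, '<', \$log or die "open failed: $!";
my @offs = line_offsets($fh);

# The second and third lines lie between $offs[1] and $offs[3].
print read_span($fh, $offs[1], $offs[3]);   # prints "two" and "three"
close $fh;
```

This sketch records an offset for every line, which is the cost attributed to Tie::File above; the script in the post keeps only a short sliding list of offsets.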
Re^3: grab 'n' lines from a file above and below a /match/ by mrpeabody (Friar) on Sep 20, 2004 at 03:07 UTC
"It's actually slower and more memory intensive than any of the other solutions. Tie::File internally keeps a list of byte offsets for all the lines, and it needs lot of additional overhead that is supposed to optimize writes which you never make any use of."

Oops. Guessed wrong, then.

"Your code also doesn't get the edge cases right: if there's a match within less than $context lines of the previous, it will be missed."

That was intentional, and it depends on your definition of "missed". That hit will be printed with the context of the previous hit. Changing the behavior would just require removing the line `$i += $context;`.
Re: grab 'n' lines from a file above and below a /match/ by TedPride (Priest) on Sep 17, 2004 at 07:48 UTC
The solution is to store the last 5 lines visited in an array, and also have a variable tracking how many more lines beyond the current one have to be output:

```perl
$x = 5;   # Number of lines to print above and below.

open (LOG, "GWSvc.log") || die "Unable to get a handle to the file: $!\n";

$after = 0;
while (<LOG>) {
    if ($after) {
        print $_;
        $after--;
    }
    else {
        push (@lines, $_);
        if ($#lines > $x) { shift(@lines); }
    }
    if (/c9391b56-b174-441b-921c-7d63/) {
        print $line while ($line = shift(@lines));
        $after = $x;
    }
}
```
Re^2: grab 'n' lines from a file above and below a /match/ by melora (Scribe) on Sep 17, 2004 at 15:10 UTC
No offense, but `s/The solution/A solution/`. There is more than one way...
Re: grab 'n' lines from a file above and below a /match/ by cosimo (Hermit) on Sep 17, 2004 at 06:52 UTC
Sorry for this shameless self-reference... :-) You might be interested in something I wrote to extract pieces of huge database dump files: A little script that combines `head' and `tail' utilities
Re: grab 'n' lines from a file above and below a /match/ by DrHyde (Prior) on Sep 17, 2004 at 15:49 UTC
"I have a rather stupid question"

There are no stupid questions, only stupid ways of asking them, and you didn't do that, so that's OK.

The solution you are looking for is, I suspect, to read a line at a time, populating an array of 2n+1 entries (n above the line, plus the line, plus n below the line). Once the array is full, shift the first entry out of it, and push a new entry onto the end. When the middle entry in the array matches your desired string, dump the whole array to the screen. Something like this:

```perl
open(FILE, 'file') || die("Yaroo!\n");

my $N = 3;   # 3 lines above and below
my $target = 'stuff what you want';
my @window = ('') x ($N + $N + 1);

while(<FILE>) {
    shift @window;
    push @window, $_;
    print @window, "\n\n" if($window[$N] =~ /$target/);
}

foreach(1 .. $N) {
    shift @window;
    print @window, "\n\n" if($window[$N] =~ /$target/);
}
```

Update: I should have explained that the `foreach` loop copes with the case where the target text appears within the last N lines of the file.
Re: grab 'n' lines from a file above and below a /match/ by Anonymous Monk on Sep 17, 2004 at 20:19 UTC
```perl
# In fat crayon mode for your pleasure

# spoofing your big log file handle as a little array for brevity
my @logs = ();
for(1..20){push(@logs, $_)}

my $lookBack  = 2;     # the number of lines behind the match you want
my $lookAhead = 2;     # the number of lines ahead of the match you want
my $matchReg  = '9';   # what you're matching on (can be inlined)
my $matched   = 0;     # flag when we have a match
my @buffer    = ();    # your lookBack buffer

# Iterate through the logs looking for match
foreach my $line (@logs) {

    # Feed the buffer
    push(@buffer, $line);

    # Trim the buffer: keep $lookBack earlier lines plus the current one
    shift(@buffer) if $#buffer > $lookBack;

    # Do stuff if this line matches
    if ($line =~ /$matchReg/) {

        # Print the buffer
        for(@buffer){print "lookback: $_\n"}

        # Wave a flag we have a match
        $matched = $lookAhead;

        # Nothing else to see here, move along
        next;
    }

    # Still working the lookAhead from a prior match?
    if($matched){
        # one less match
        --$matched;
        print "LookAhead: $line\n";
    }
}
```