Re: get n lines before or after a pattern
by davido (Cardinal) on Jul 25, 2012 at 16:33 UTC
|
When you hear yourself saying "I need to know what comes n lines before XYZ", you should be thinking "I need to stash the n previous lines while I iterate through the file." When you hear yourself saying "I need to know what comes after XYZ until PDQ is found", you should be thinking about how to track state (i.e., how to remember that the trigger has been found). You can keep track of state with a variable, or by flowing into a different branch of code. This snippet accomplishes your goal by always stashing the two most recent lines (clearing the stash once XYZ is found), and by flowing into a different branch when XYZ has been found, where it stays until PDQ shows up.
As I mentioned above, this is one of several common ways of dealing with state.
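use strict;
use warnings;

my $find       = 'jack';
my $trigger_re = qr{^name\s+$find\b};             # first line we want
my $finally_re = qr(^lastname\s+\p{Alpha}+\b);    # closing line we want

my @stash;    # the two most recent non-trigger lines

while( my $line = <DATA> ) {
    chomp $line;
    if( $line =~ $trigger_re ) {
        # Trigger found: print the stashed lines, then the trigger itself
        print "$_\n" for @stash;
        @stash = ();
        print $line, "\n";
        # Different branch: skip ahead until the closing "lastname" line
        while ( my $next = <DATA> ) {
            if( $next =~ $finally_re ) {
                print $next;    # $next still has its newline
                last;
            }
        }
    }
    else {
        # Not a trigger: stash it, keeping only the last two lines
        push @stash, $line;
        while( @stash > 2 ) {
            shift @stash;
        }
    }
}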
__DATA__
start
id 10
address Richmond
name jack
xxxxx
aaaaa
lastname black
yyyy
zzzzz
id 11
address Central
name rick
cccccc
dddddd
lastname hanna
eeeee
yyyyy
id 12
address denver
name jack
sssss
tttttt
lastname strong
rrrrr
mmmmm
id 13
address Virginia
name mick
aaaaaaa
ooooooo
lastname jagger
gggggg
hhhhhh
id 14
address Maine
name rick
sssss
sssss
lastname stewart
ssssss
ffffff
end
The output is...
id 10
address Richmond
name jack
lastname black
id 12
address denver
name jack
lastname strong
If the stash holds fewer than two lines when "name jack" appears, the code quietly prints however many it has (at most two). If the "lastname" line never shows up, it quietly reads through to the end of the file. That may not be what you want; you might prefer to carp about a malformed record as soon as the next "name" line shows up. That's easy to implement, so I'll leave it to you if you find it useful. Similarly, it's a simple check to verify that @stash holds two lines before printing, and it would be easy to carp about a malformed record there as well.
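Both checks can be sketched roughly as follows. This is a minimal sketch under a few assumptions: extract_records is a hypothetical helper that takes an array of already-chomped lines (instead of reading <DATA>) and returns what the loop above would print, and a line matching /^name\b/ is assumed to mean that the next record has started. Unlike the original loop, it keeps stashing lines while it looks for "lastname", so a record that begins early still gets its two preceding lines.

```perl
use strict;
use warnings;
use Carp qw(carp);

# Hypothetical helper: the stash-and-branch loop from above, with warnings
# for malformed records. Takes an array ref of chomped lines and returns
# the lines that would have been printed.
sub extract_records {
    my ( $lines, $trigger_re, $finally_re ) = @_;
    my ( @stash, @out );
    my $i = 0;
    while ( $i < @$lines ) {
        my $line = $lines->[ $i++ ];
        if ( $line =~ $trigger_re ) {
            # Check 1: fewer than two lines stashed ahead of the trigger
            carp "Malformed record: only ", scalar(@stash),
                " line(s) before '$line'" if @stash < 2;
            push @out, @stash, $line;
            @stash = ();
            my $found_last = 0;
            while ( $i < @$lines ) {
                my $next = $lines->[ $i++ ];
                if ( $next =~ $finally_re ) {
                    push @out, $next;
                    $found_last = 1;
                    @stash = ();
                    last;
                }
                # Check 2: another "name" line before any "lastname" line
                if ( $next =~ /^name\b/ ) {
                    $i--;    # let the outer loop look at this line again
                    last;
                }
                # Keep stashing in case the next record starts early
                push @stash, $next;
                shift @stash while @stash > 2;
            }
            carp "Malformed record: no lastname after '$line'"
                unless $found_last;
        }
        else {
            push @stash, $line;
            shift @stash while @stash > 2;
        }
    }
    return @out;
}

# Example: the first "name jack" record is missing its "lastname" line.
my @demo = ( 'id 10', 'address Richmond', 'name jack', 'xxxxx',
             'id 12', 'address denver', 'name jack', 'lastname strong' );
print "$_\n"
    for extract_records( \@demo, qr{^name\s+jack\b}, qr{^lastname\s+\p{Alpha}+\b} );
```

With the demo data it prints both "name jack" records (the second one with its id and address lines) and carps once, about the first record.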
I build the regexes outside the loop just to keep the loop code as simple (and general) as possible. It also means the regex that interpolates $find is built only once, rather than being re-interpolated each time through the loop (perl would skip the actual recompilation anyway when the interpolated value hasn't changed, so the speed gain is modest).
Re: get n lines before a pattern
by VinsWorldcom (Prior) on Jul 25, 2012 at 14:43 UTC
|
Note that your output shows not only the 2 lines before the pattern, but also 1 line AFTER the pattern.
You don't need Perl for something that simple:
grep -B2 -A1 jack test.txt
UPDATE: Since the OP updated the original question, this approach is no longer valid. See my reply (Re^3: get n lines before a pattern) below.
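If you do want this in Perl, a rough equivalent of grep -B2 -A1 is sketched below. grep_context is a hypothetical helper; the sketch assumes the file fits in memory and, unlike GNU grep, it does not print "--" between non-adjacent groups. Overlapping windows are merged, so each line appears at most once.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines that grep -B$before -A$after
# would show for $re, in file order, each line at most once.
sub grep_context {
    my ( $re, $before, $after, @lines ) = @_;
    my %keep;    # indices of lines to print
    for my $i ( 0 .. $#lines ) {
        next unless $lines[$i] =~ $re;
        my $lo = $i - $before < 0      ? 0       : $i - $before;
        my $hi = $i + $after > $#lines ? $#lines : $i + $after;
        $keep{$_} = 1 for $lo .. $hi;
    }
    return @lines[ sort { $a <=> $b } keys %keep ];
}

my @sample = ( 'start', 'id 10', 'address Richmond', 'name jack', 'xxxxx', 'aaaaa' );
print "$_\n" for grep_context( qr/jack/, 2, 1, @sample );
```

To run it on a real file, pass it the lines from something like chomp(my @file = <$fh>) instead of @sample.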
|
I have updated the details of my file. Please see the change. Sorry for the earlier error.
|
Yes, the changes to the file in the OP certainly require an updated approach. What have you tried?
I would loop through the file, saving each key, and either push the record onto a data structure if the name matches or reset and continue.
Pseudo code for the loop and structure I'd use:
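use strict;
use warnings;

my $searchPattern = 'jack';    # first name to look for
open my $infh, '<', 'test.txt' or die "Cannot open 'test.txt': $!";

my @matches;
my $FOUND = 0;
my %info;
while (<$infh>) {
    chomp $_;
    if (($_ =~ /^id/) and ($FOUND)) {
        push @matches, {%info};    # store a copy, not a reference to %info
        $FOUND = 0;
    }
    %info = () if $_ =~ /^id/;     # a new record starts here
    if ($_ =~ /^id/)      { (undef, $info{id})      = split / /, $_ }
    if ($_ =~ /^address/) { (undef, $info{address}) = split / /, $_ }
    if ($_ =~ /^name/)    { (undef, $info{fname})   = split / /, $_ }
    # ... parse any other fields the same way
    if (defined $info{fname} and $searchPattern eq $info{fname}) {
        $FOUND = 1;
    }
}
# The last record is not followed by an "id" line, so check it here
push @matches, {%info} if $FOUND;
close $infh;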
UPDATE: Added 'chomp' and updated 'split' commands as per kennethk suggestions to me.
Re: get n lines before or after a pattern
by Kenosis (Priest) on Jul 25, 2012 at 17:09 UTC
|
use Modern::Perl;
my $searchFor = 'jack';
local $/ = 'id ';
while (<DATA>) {
    next if !/\nname\s+\b$searchFor\b/;
    say 'id ', join "\n", ( split "\n" )[ 0, 1, 2, 5 ];
}
__DATA__
start
id 10
address Richmond
name jack
xxxxx
aaaaa
lastname black
yyyy
zzzzz
id 11
address Central
name rick
cccccc
dddddd
lastname hanna
eeeee
yyyyy
id 12
address denver
name jack
sssss
tttttt
lastname strong
rrrrr
mmmmm
id 13
address Virginia
name mick
aaaaaaa
ooooooo
lastname jagger
gggggg
hhhhhh
id 14
address Maine
name rick
sssss
sssss
lastname stewart
ssssss
ffffff
end
Output:
id 10
address Richmond
name jack
lastname black
id 12
address denver
name jack
lastname strong
Hope this helps!
|
Reading "records" rather than lines is a nice approach. One minor point: your local is not really local, because you have not confined it to a particular scope, so it applies from the point where it appears until the end of the script.
Rather than the split and array slice, another approach could be to open a file handle against a reference to the record so that you can read it line by line in an inner scope and just print the lines you want. This has the advantage that the record layout can change and it will still work.
I hope this is of interest.
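JohnGG's code is not reproduced in this thread, so what follows is only a sketch of the two ideas he describes, not his implementation: local $/ confined to a block, and an in-memory filehandle opened on a reference to each record. wanted_lines and the list of fields it keeps are hypothetical, and records are assumed to be separated by 'id ', as in Kenosis's code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the id/address/name/lastname lines of every
# record whose "name" line matches $name.
sub wanted_lines {
    my ( $text, $name ) = @_;
    my @out;
    open my $in, '<', \$text or die "Cannot open in-memory handle: $!";
    {
        local $/ = 'id ';    # record separator, confined to this block
        while ( my $record = <$in> ) {
            chomp $record;   # removes the trailing 'id ' separator
            next unless $record =~ /^name\s+\Q$name\E\b/m;

            local $/ = "\n"; # back to line mode for this record only
            open my $rec_fh, '<', \$record or die "Cannot open record: $!";
            my $id = <$rec_fh>;    # the separator swallowed "id ", so the
            chomp $id;             # first line holds just the id value
            push @out, "id $id";
            while ( my $line = <$rec_fh> ) {
                chomp $line;
                push @out, $line if $line =~ /^(?:address|name|lastname)\b/;
            }
            close $rec_fh;
        }
    }
    close $in;
    return @out;
}

my $sample = "start\nid 10\naddress Richmond\nname jack\nxxxxx\n"
           . "lastname black\nid 11\naddress Central\nname rick\n"
           . "lastname hanna\nend\n";
print "$_\n" for wanted_lines( $sample, 'jack' );
```

Because fields are picked by name rather than by position, extra lines inside a record do not shift what gets printed, which is the layout-independence JohnGG mentions.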
|
This is of interest, and excellent, too, JohnGG!
I was aware that I hadn't confined the local $/; to a block, as I wasn't thinking too much about the code snippet. However, as a best practice, I'll remember to do so with future local (dynamically scoped) variables. It was good to point this out.
I like your refined/seasoned coding: scoping, reading in a multi-line record, opening a file handle on the record-containing scalar, and then grepping through the lines to display the OP's desired output.
Indeed, this is of interest, very well thought out, and very much appreciated.
Thank you.
Re: get n lines before or after a pattern
by kennethk (Abbot) on Jul 25, 2012 at 15:16 UTC
|
What have you tried? What didn't work? See How do I post a question effectively?.
There are two ways I can think of doing this. Probably the simpler from your perspective would be to iterate over lines in a while loop, and set up some state variables to stash values. Then, when you hit a lastname line, you can test the value of $name (or $hash{name}) to see if it is jack, outputting all relevant information if it is.
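A minimal sketch of that line-by-line approach, assuming each record starts with an id line and each field line is a key, a space, and a value, as in the sample data. report_matches and the list of fields it reports are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remember the fields of the current record and,
# when its "lastname" line arrives, report the record if the name matches.
sub report_matches {
    my ( $find, @lines ) = @_;
    my ( %rec, @out );
    for my $line (@lines) {
        my ( $key, $value ) = split ' ', $line, 2;
        next unless defined $key;             # skip blank lines
        %rec = () if $key eq 'id';            # a new record resets the state
        $rec{$key} = $value if defined $value;
        if ( $key eq 'lastname' and ( $rec{name} // '' ) eq $find ) {
            push @out, map { "$_ $rec{$_}" }
                       grep { defined $rec{$_} } qw(id address name lastname);
        }
    }
    return @out;
}

my @sample = ( 'start', 'id 10', 'address Richmond', 'name jack', 'xxxxx',
               'lastname black', 'id 11', 'address Central', 'name rick',
               'lastname hanna', 'end' );
print "$_\n" for report_matches( 'jack', @sample );
```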
The more complex approach would be to use regular expressions with the m and g modifiers. This is how I'd do it, but it tends to be a little more fragile, less obvious in code review, and more challenging for the neophyte.
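For comparison, here is a rough sketch of the m//g approach, assuming the whole file has been read into one string and that the id, address, and name lines are adjacent, as in the sample data. find_records is a hypothetical helper. It also illustrates the fragility mentioned above: if a matching record had no lastname line, the lazy skip would run on into the next record's lastname.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the whole text with one /m + /g regex.
#   /m  lets ^ and $ match at every line start and end
#   /g  resumes each match where the previous one stopped
#   /x  allows the whitespace and comments below
sub find_records {
    my ( $text, $find ) = @_;
    my @out;
    while ( $text =~ m{
            ^(id \s+ \S+) \n                 # id line
            ^(address \s+ .*) \n             # address line
            ^(name \s+ \Q$find\E \b .*) \n   # the name we are after
            (?: ^ .* \n )*?                  # skip as few lines as possible
            ^(lastname \s+ .*) $             # first lastname line after it
        }mgx )
    {
        push @out, $1, $2, $3, $4;
    }
    return @out;
}

my $text = join '', map { "$_\n" } 'start', 'id 10', 'address Richmond',
    'name jack', 'xxxxx', 'aaaaa', 'lastname black', 'yyyy',
    'id 11', 'address Central', 'name rick', 'lastname hanna', 'end';
print "$_\n" for find_records( $text, 'jack' );
```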
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
Re: get n lines before or after a pattern
by zentara (Archbishop) on Jul 25, 2012 at 16:28 UTC
|
Untested, but a useful approach.
#!/usr/bin/perl
use strict;
use warnings;
my @buffer;    # a queue data structure
while ( <DATA> ) {
    if ( /I sent/ ) {
        print @buffer;           # 3 lines before
        print;                   # the matching line
        print scalar(<DATA>);    # 1 line following
        last;                    # all done
    }
    push @buffer, $_;
    shift @buffer if @buffer > 3;
}
__DATA__
this is
the output
from
the command
I sent
to the
command interpreter
Re: get n lines before or after a pattern
by xiaoyafeng (Deacon) on Jul 25, 2012 at 17:48 UTC
|
Try natatime from List::MoreUtils; maybe it makes your code more elegant? ;)
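use strict;
use warnings;
use List::MoreUtils qw/natatime/;
chomp( my @contents = <DATA> );
pop @contents;      # drop the trailing "end" line
shift @contents;    # drop the leading "start" line
my $it = natatime 8, @contents;    # every record is 8 lines long
while (my @vals = $it->())
{
    print join("\n", @vals[0,1,2,5]), "\n" if $vals[2] =~ /jack/;
}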
__DATA__
start
id 10
address Richmond
name jack
xxxxx
aaaaa
lastname black
yyyy
zzzzz
id 11
address Central
name rick
cccccc
dddddd
lastname hanna
eeeee
yyyyy
id 12
address denver
name jack
sssss
tttttt
lastname strong
rrrrr
mmmmm
id 13
address Virginia
name mick
aaaaaaa
ooooooo
lastname jagger
gggggg
hhhhhh
id 14
address Maine
name rick
sssss
sssss
lastname stewart
ssssss
ffffff
end
Another advantage of this approach over the others is that you don't lose the rest of each chunk: you can print any elements of @vals by changing the slice.
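If you'd rather not depend on List::MoreUtils, the same fixed-size chunking can be sketched with the core splice builtin. It makes the same assumption as the code above, namely that every record is exactly 8 lines long; chunks_matching is a hypothetical helper, and the slice [0, 1, 2, 5] picks the id, address, name, and lastname lines.

```perl
use strict;
use warnings;

# Hypothetical helper: cut @lines into $size-line chunks (like natatime)
# and return the chosen lines of each chunk whose third line matches $re.
sub chunks_matching {
    my ( $size, $re, @lines ) = @_;    # @lines is a copy, safe to splice
    my @out;
    while ( my @chunk = splice @lines, 0, $size ) {
        next unless defined $chunk[2] and $chunk[2] =~ $re;
        push @out, grep { defined } @chunk[ 0, 1, 2, 5 ];
    }
    return @out;
}

my @records = (
    'id 10', 'address Richmond', 'name jack', 'xxxxx', 'aaaaa',
    'lastname black', 'yyyy', 'zzzzz',
    'id 11', 'address Central', 'name rick', 'cccccc', 'dddddd',
    'lastname hanna', 'eeeee', 'yyyyy',
);
print "$_\n" for chunks_matching( 8, qr/\bjack\b/, @records );
```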
I am trying to improve my English skills; if you see a mistake, please feel free to reply or /msg me a correction.
|
Nice use of List::MoreUtils qw/natatime/! However, consider using /\bjack\b/, as your current regex also matches "jackson", "jackie", "jacklyn", etc.
|
Nice (and ++), but the regex can go astray:
C:\>perl -E "my $word = \"jackhammer\"; if ($word =~ /\bjack\b/) { say $word } else { say \"No word-boundary-delimited 'jack' found in $word\" }"
No word-boundary-delimited 'jack' found in jackhammer
|
Perhaps I'm missing something, but I wouldn't want to find "jackhammer" if I were searching for "jack" as the first name, as listed in the OP's data set. However, the non-word-boundary regex is perfect for finding all first names containing the substring "jack", as $vals[2] =~ /jack/ would.
Re: get n lines before or after a pattern
by Anonymous Monk on Jul 25, 2012 at 15:46 UTC
|
Search for a Perl implementation of grep, the Unix command. There is at least one such implementation posted around here (I don't have the search links handy); another was posted long ago in the comp.lang.perl.misc newsgroup. Yet another is App::Ack; refer to its &print_line_with_context and &get_context subs.
Re: get n lines before or after a pattern
by Athanasius (Archbishop) on Jul 26, 2012 at 03:32 UTC
|
#! perl
use strict;
use warnings;
use Tie::File;

my $file = 'test.txt';
tie my @lines, 'Tie::File', $file or die "Cannot tie file '$file': $!";

for my $i (0 .. $#lines)
{
    if ($lines[$i] =~ m{ \b jack \b }x)
    {
        # Print the matching line and up to two lines before it
        for ($i - 2 .. $i)
        {
            print $lines[$_], "\n" unless $_ < 0;
        }
        # Then print the first "lastname" line at or after the match
        for my $j ($i .. $#lines)
        {
            if ($lines[$j] =~ m{ \b lastname \b }x)
            {
                print $lines[$j], "\n";
                last;
            }
        }
    }
}
untie @lines;
What is nice about this approach is that, by treating the data file as an ordinary array, it is possible to meet more complicated requirements without the programming overhead of manually maintaining line buffers. So, this approach has the advantage of being scalable. Some notes on Tie::File:
- It’s a core module: Tie::File
- Written by Dominus
- From the docs: “The file is not loaded into memory, so this will work even for gigantic files.”
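One possible refinement, sketched here as an assumption rather than as part of the approach above: Tie::File opens the file read-write by default (O_RDWR | O_CREAT), so a report-only script could pass the mode option with O_RDONLY from Fcntl, which only needs read access and will not create the file if it is missing. lines_before_match is a hypothetical helper that collects each match plus up to $before preceding lines.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDONLY);
use Tie::File;

# Hypothetical helper: for every line matching $re, return it together
# with up to $before lines that precede it.
sub lines_before_match {
    my ( $file, $re, $before ) = @_;
    # mode => O_RDONLY: open read-only instead of the default O_RDWR|O_CREAT
    tie my @lines, 'Tie::File', $file, mode => O_RDONLY
        or die "Cannot tie file '$file': $!";
    my @out;
    for my $i ( 0 .. $#lines ) {
        next unless $lines[$i] =~ $re;
        my $start = $i - $before < 0 ? 0 : $i - $before;
        push @out, @lines[ $start .. $i ];
    }
    untie @lines;
    return @out;
}
```

For example, lines_before_match('test.txt', qr/\bjack\b/, 2) would return each "name jack" line preceded by its id and address lines.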
HTH,
Athanasius <°(((>< contra mundum
Re: get n lines before or after a pattern
by cheekuperl (Monk) on Jul 26, 2012 at 06:33 UTC
Re: get n lines before or after a pattern
by brx (Pilgrim) on Jul 26, 2012 at 17:09 UTC
|
Similar to zentara's approach in Re: get n lines before or after a pattern.
The idea is to keep it short, to be independent of the content of the other lines, and to deal with file boundaries (i.e., finding 'jack' in the first or last lines is OK).
Note: the program could print the same line several times if 'jack' is found in consecutive lines; does the OP want that?
#!perl
use strict;
use warnings;
my @buffer = ("") x 6;
my $line;
while (@buffer) {
    push @buffer, $line if defined($line = scalar(<DATA>));
    shift @buffer;
    print @buffer[0,1,2], $buffer[5] // '' if ($buffer[2] // '') =~ /\bjack\b/;
    # index 2 is the matching line, 0 and 1 precede it, 5 is three lines after
}
__DATA__
extra jack
extra
extra
start
id 10
address Richmond
name jack
xxxxx
aaaaa
lastname black
yyyy
zzzzz
id 11
address Central
name rick
cccccc
dddddd
lastname hanna
eeeee
yyyyy
id 12
address denver
name jack
sssss
tttttt
lastname strong
rrrrr
mmmmm
id 13
address Virginia
name mick
aaaaaaa
ooooooo
lastname jagger
gggggg
hhhhhh
id 14
address Maine
name rick
sssss
sssss
lastname stewart
ssssss
ffffff
end
extra
extra jack
English is not my mother tongue.
My mother's tongues are "made in France".