PerlMonks  

get n lines before or after a pattern

by darklord_999 (Acolyte)
on Jul 25, 2012 at 14:39 UTC (#983672=perlquestion)
darklord_999 has asked for the wisdom of the Perl Monks concerning the following question:

I have a file test.txt with the following information

start
id 10
address Richmond
name jack
xxxxx
aaaaa
lastname black
yyyy
zzzzz
id 11
address Central
name rick
cccccc
dddddd
lastname hanna
eeeee
yyyyy
id 12
address denver
name jack
sssss
tttttt
lastname strong
rrrrr
mmmmm
id 13
address Virginia
name mick
aaaaaaa
ooooooo
lastname jagger
gggggg
hhhhhh
id 14
address Maine
name rick
sssss
sssss
lastname stewart
ssssss
ffffff
end

Now I want to search for a particular first name, say jack, and it should return the following information:

id 10
address Richmond
name jack
lastname black
id 12
address denver
name jack
lastname strong

How can I get this information? That is, how do I get the two lines before the line that matches the pattern, plus some information a particular number of lines after the pattern?

Re: get n lines before a pattern
by VinsWorldcom (Priest) on Jul 25, 2012 at 14:43 UTC

    Note your output is not only showing 2 lines before the pattern, but also 1 line AFTER the pattern.

    You don't need Perl for something that simple:

    grep -B2 -A1 jack test.txt

    UPDATE: Since the OP updated the original question, this approach is no longer valid. See my reply (Re^3: get n lines before a pattern) below.

      I have updated the details of my file. Please see the change. Sorry for the previous error.

        Yes, the changes to the file in the OP certainly require an updated approach. What have you tried?

        I would loop through the file saving each key and either pushing to a data structure if the name matches or resetting and continuing.

        Pseudo code for the loop and structure I'd use:

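        my @matches;
        my $FOUND = 0;
        my %info  = ();

        while (<INFILE>) {
            chomp $_;

            # An "id" line starts a new record: save the previous one if it matched.
            if (($_ =~ /^id/) and ($FOUND)) {
                push @matches, {%info};    # push a copy, not a reference to %info
                $FOUND = 0;
                %info  = ();
            }

            if ($_ =~ /^id/)      { (undef, $info{id})      = split / /, $_ }
            if ($_ =~ /^address/) { (undef, $info{address}) = split / /, $_ }
            if ($_ =~ /^name/)    { (undef, $info{fname})   = split / /, $_ }
            # ... (further keys are elided in the original post)

            if (defined $info{fname} and $searchPattern eq $info{fname}) {
                $FOUND = 1;
            }
        }

        # The last record is not followed by another "id" line, so save it here.
        push @matches, {%info} if $FOUND;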

        UPDATE: Added 'chomp' and updated the 'split' commands, per kennethk's suggestions to me.

Re: get n lines before or after a pattern
by kennethk (Monsignor) on Jul 25, 2012 at 15:16 UTC
    What have you tried? What didn't work? See How do I post a question effectively?.

    There are two ways I can think of doing this. Probably the simpler from your perspective would be to iterate over lines in a while loop, and set up some state variables to stash values. Then, when you hit a lastname line, you can test the value of $name (or $hash{name}) to see if it is jack, outputting all relevant information if it is.
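The state-variable idea can be sketched roughly as follows. This is only an illustration under assumptions: records are laid out as in the OP's sample (each starting with an "id" line), the sub name records_for_name is made up for this example, and it returns the lines instead of printing them so it is easy to check.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the id, address, name
# and lastname lines of every record whose "name" value equals $wanted.
sub records_for_name {
    my ($wanted, @lines) = @_;
    my (%rec, @out);

    for my $line (@lines) {
        chomp $line;
        my ($key, $value) = split ' ', $line, 2;
        next unless defined $key;          # skip blank lines

        %rec = () if $key eq 'id';         # an "id" line starts a new record
        $rec{$key} = $value;               # stash the value as state

        # The "lastname" line is the point at which we decide what to output.
        if ($key eq 'lastname' && defined $rec{name} && $rec{name} eq $wanted) {
            push @out, map  { "$_ $rec{$_}" }
                       grep { defined $rec{$_} } qw(id address name lastname);
        }
    }
    return @out;
}

# Usage with a shortened copy of the OP's data.
my @sample = split /^/, <<'END';
start
id 10
address Richmond
name jack
xxxxx
aaaaa
lastname black
yyyy
zzzzz
id 11
address Central
name rick
cccccc
dddddd
lastname hanna
eeeee
yyyyy
end
END

print "$_\n" for records_for_name('jack', @sample);
```

Keying the hash on the first word of each line means filler lines such as xxxxx simply land in the hash and are ignored.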

    The more complex approach would be using regular expressions with the m and g modifiers. This is how I'd do it, but it tends to be a little more fragile, less obvious in code review, and more challenging for the neophyte.
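For comparison, a rough sketch of the regex route, assuming the whole file has been slurped into one string and that every record has exactly the id / address / name order of the OP's sample, with no trailing whitespace or CRLF line endings. The sub name match_records is invented for this example, and this is not necessarily the pattern kennethk had in mind.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): $text holds the whole file.
sub match_records {
    my ($wanted, $text) = @_;
    my @out;

    while (
        $text =~ m{
            ^(id\ .+)\n                 # the "id" line
            (address\ .+)\n             # the "address" line
            (name\ \Q$wanted\E)\n       # the "name" line for the wanted first name
            (?:(?!id\ ).*\n)*?          # filler lines, never crossing into the next record
            (lastname\ .+)$             # the "lastname" line
        }xmg
      )
    {
        push @out, $1, $2, $3, $4;
    }
    return @out;
}

my $text = join "\n",
    'start',
    'id 10', 'address Richmond', 'name jack', 'xxxxx', 'aaaaa', 'lastname black', 'yyyy', 'zzzzz',
    'id 11', 'address Central',  'name rick', 'cccccc', 'dddddd', 'lastname hanna', 'eeeee', 'yyyyy',
    'end', '';

print "$_\n" for match_records('jack', $text);
```

The m modifier lets ^ and $ match at line boundaries inside the string, and g in a while loop walks from one matching record to the next. The fragility is visible here: any change in line order or stray whitespace makes the pattern silently skip a record.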


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: get n lines before or after a pattern
by Anonymous Monk on Jul 25, 2012 at 15:46 UTC
    Search for a Perl implementation of grep, the Unix command. There is at least one such implementation posted around here (I don't have the (search) links handy); another was posted long ago in the comp.lang.perl.misc newsgroup. Yet another is App::Ack; refer to the &print_line_with_context and &get_context subs.
Re: get n lines before or after a pattern
by zentara (Archbishop) on Jul 25, 2012 at 16:28 UTC
    Untested, but a useful approach.
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @buffer;    # a queue data structure

    while ( <DATA> ) {
        if ( /I sent/ ) {
            print @buffer;           # 3 lines before
            print;                   # the matching line
            print scalar(<DATA>);    # 1 line following
            last;                    # all done
        }
        push @buffer, $_;
        shift @buffer if @buffer > 3;
    }

    __DATA__
    this
    is
    the
    output
    from
    the
    command
    I sent
    to
    the
    command
    interperter

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
Re: get n lines before or after a pattern
by davido (Archbishop) on Jul 25, 2012 at 16:33 UTC

    When you hear yourself saying "I need to know what comes n lines before XYZ", you should be thinking "I need to stash n previous lines while I iterate through the file." When you hear yourself saying, "I need to know what comes after XYZ until PDQ is found.", you should be thinking of how to identify state (ie, how to keep track of having found the trigger). You can keep track of state with a variable, or you can do it by flowing into a different branch of code. This snippet accomplishes your goal by stashing two lines at all times (clearing them only after XYZ is found), and by flowing into a different branch when XYZ has been found, until PDQ shows up.

    As I mentioned above, this is one of several common ways of dealing with state.

    use strict;
    use warnings;

    my $find       = 'jack';
    my $trigger_re = qr{^name\s+$find\b};
    my $finally_re = qr(^lastname\s+\p{Alpha}+\b);
    my @stash;

    while( my $line = <DATA> ) {
        chomp $line;
        if( $line =~ $trigger_re ) {
            print "$_\n" for @stash;
            @stash = ();
            print $line, "\n";
            while ( my $next = <DATA> ) {
                if( $next =~ $finally_re ) {
                    print $next;
                    last;
                }
            }
        }
        else {
            push @stash, $line;
            while( @stash > 2 ) {
                shift @stash;
            }
        }
    }

    __DATA__
    start
    id 10
    address Richmond
    name jack
    xxxxx
    aaaaa
    lastname black
    yyyy
    zzzzz
    id 11
    address Central
    name rick
    cccccc
    dddddd
    lastname hanna
    eeeee
    yyyyy
    id 12
    address denver
    name jack
    sssss
    tttttt
    lastname strong
    rrrrr
    mmmmm
    id 13
    address Virginia
    name mick
    aaaaaaa
    ooooooo
    lastname jagger
    gggggg
    hhhhhh
    id 14
    address Maine
    name rick
    sssss
    sssss
    lastname stewart
    ssssss
    ffffff
    end

    The output is...

    id 10
    address Richmond
    name jack
    lastname black
    id 12
    address denver
    name jack
    lastname strong

    If the stash hasn't received two lines ahead of "name jack", it will quietly just print however many it accumulated (max 2). If the "lastname" never shows up, it will quietly flow through the end of the file. This may not be what you want; it's possible that you'll want to just carp about a malformed record the moment the next "name" shows up. That's pretty easy to implement, so I'll leave it to you if you find it advantageous. Similarly, it's a simple check to verify that two lines are stored in @stash prior to printing, and it would be easy to carp a warning about a malformed record there as well.
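One way the malformed-record checks described above might look, as a sketch only. It swaps the inner read loop for a state flag ($open) so that the "name" line which reveals a missing "lastname" can still be examined as a possible trigger. The sub name context_for_name and the warning texts are invented here, and the sub returns the lines instead of printing them.

```perl
use strict;
use warnings;
use Carp;

# Hypothetical variant of the stash-and-state loop that carps about
# records with a short stash or a missing "lastname" line.
sub context_for_name {
    my ($find, $fh) = @_;
    my $trigger_re = qr{^name\s+\Q$find\E\b};
    my $finally_re = qr{^lastname\s+\p{Alpha}+\b};
    my (@stash, @out);
    my $open;    # trigger line still waiting for its "lastname"

    while ( my $line = <$fh> ) {
        chomp $line;

        if ( defined $open && $line =~ $finally_re ) {
            push @out, $line;
            undef $open;
            next;
        }
        if ( defined $open && $line =~ /^name\b/ ) {
            carp "Malformed record: no lastname after '$open'";
            undef $open;
        }
        if ( $line =~ $trigger_re ) {
            carp "Malformed record: fewer than two lines before '$line'"
                if @stash < 2;
            push @out, @stash, $line;
            @stash = ();
            $open  = $line;
            next;
        }
        push @stash, $line;
        shift @stash while @stash > 2;
    }
    carp "Malformed record: no lastname after '$open'" if defined $open;
    return @out;
}

# Usage on a small well-formed sample read from an in-memory filehandle.
my $sample = "start\nid 10\naddress Richmond\nname jack\nxxxxx\nlastname black\nend\n";
open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
print "$_\n" for context_for_name('jack', $in);
close $in;
```

Whether a short stash deserves a warning or a hard failure depends on how much you trust the input; carp keeps going, croak would stop at the first bad record.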

    I build the regexes outside of the loop just to keep the loop code as simple (and general) as possible. This has the added efficiency benefit of assuring that the regex that contains variable interpolation will only be compiled once rather than each time through the loop.


    Dave

Re: get n lines before or after a pattern
by Kenosis (Priest) on Jul 25, 2012 at 17:09 UTC

    Here's another option:

    use Modern::Perl;

    my $searchFor = 'jack';
    local $/ = 'id ';

    while (<DATA>) {
        next if !/\nname\s+\b$searchFor\b/;
        say 'id ', join "\n", ( split "\n" )[ 0, 1, 2, 5 ];
    }

    __DATA__
    start
    id 10
    address Richmond
    name jack
    xxxxx
    aaaaa
    lastname black
    yyyy
    zzzzz
    id 11
    address Central
    name rick
    cccccc
    dddddd
    lastname hanna
    eeeee
    yyyyy
    id 12
    address denver
    name jack
    sssss
    tttttt
    lastname strong
    rrrrr
    mmmmm
    id 13
    address Virginia
    name mick
    aaaaaaa
    ooooooo
    lastname jagger
    gggggg
    hhhhhh
    id 14
    address Maine
    name rick
    sssss
    sssss
    lastname stewart
    ssssss
    ffffff
    end

    Output:

    id 10
    address Richmond
    name jack
    lastname black
    id 12
    address denver
    name jack
    lastname strong

    Hope this helps!

      Reading "records" rather than lines is a nice approach. One minor point: your local is not really local, as you have not confined it to a particular scope, so it applies from the point it appears until the end of the script.

      Rather than the split and array slice, another approach could be to open a file handle against a reference to the record so that you can read it line by line in an inner scope and just print the lines you want. This has the advantage that the record layout can change and it will still work.
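A sketch of how that could look, under the assumption that the string "id " only ever appears at the start of a record (an address such as "Madrid street" would break it, as it would with any 'id ' record separator). The sub name wanted_lines and the list of wanted keys are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: $fh is any filehandle on data laid out like the OP's.
sub wanted_lines {
    my ($search_for, $fh) = @_;
    my @out;

    local $/ = "id ";                   # confined to this sub, restored on return

    while ( my $record = <$fh> ) {
        chomp $record;                  # strips the trailing "id " separator
        next unless $record =~ /^name\s+\Q$search_for\E$/m;

        $record = "id $record";         # restore the separator that preceded it

        # Open a filehandle on the record itself and read it line by line.
        open my $rec_fh, '<', \$record
            or die "Cannot open in-memory record: $!";
        local $/ = "\n";                # only for the rest of this iteration
        while ( my $line = <$rec_fh> ) {
            push @out, $line if $line =~ /^(?:id|address|name|lastname)\b/;
        }
        close $rec_fh;
    }
    return @out;
}

my $sample = "start\nid 10\naddress Richmond\nname jack\nxxxxx\naaaaa\nlastname black\nyyyy\nzzzzz\nend\n";
open my $in, '<', \$sample or die "Cannot open in-memory sample: $!";
print wanted_lines('jack', $in);
close $in;
```

Because the inner loop selects lines by their leading keyword and not by position, extra filler lines or a reordered record do not change the result.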

      I hope this is of interest.

      Cheers,

      JohnGG

        This is of interest, and excellent, too, JohnGG!

        I was aware that I didn't confine the local $/; to a block, not thinking too much about the code snippet. However, I'll remember--as a best practice--to do so with future local (dynamically scoped) variables. It was good to point this out.

        I like your refined/seasoned coding: scoping, reading in a multi-line record, opening a file handle on the record-containing scalar, and then grepping through the lines to display the OP's desired output.

        Indeed, this is of interest, very well thought out, and very much appreciated.

        Thank you.

Re: get n lines before or after a pattern
by xiaoyafeng (Chaplain) on Jul 25, 2012 at 17:48 UTC
    Try natatime from List::MoreUtils; maybe it makes your code more elegant? ;)
    use List::MoreUtils qw/natatime/;

    my @contents = <DATA>;
    pop @contents;
    shift @contents;
    my $it = natatime 8, @contents;
    while (my @vals = $it->()) {
        print "@vals[0,1,2] \n" if $vals[2] =~ /jack/;
    }

    __DATA__
    start
    id 10
    address Richmond
    name jack
    xxxxx
    aaaaa
    lastname black
    yyyy
    zzzzz
    id 11
    address Central
    name rick
    cccccc
    dddddd
    lastname hanna
    eeeee
    yyyyy
    id 12
    address denver
    name jack
    sssss
    tttttt
    lastname strong
    rrrrr
    mmmmm
    id 13
    address Virginia
    name mick
    aaaaaaa
    ooooooo
    lastname jagger
    gggggg
    hhhhhh
    id 14
    address Maine
    name rick
    sssss
    sssss
    lastname stewart
    ssssss
    ffffff
    end
    Another advantage of this approach over the others is that you don't lose the rest of each chunk: you can print any elements of @vals by changing the slice.




    I am trying to improve my English skills, if you see a mistake please feel free to reply or /msg me a correction

      Nice use of List::MoreUtils qw/natatime/! However, consider using /\bjack\b/, as your current regex also matches "jackson", "jackie", "jacklyn", etc.

      Nice (and + +), but the regex can go astray:
      C:>perl -E "my $word="jackhammer"; if ($word =~ /\bjack\b/) { say $word; } else { say \"No word-boundry-delimited 'jack's' found in $word\"; }"
      No word-boundry-delimited 'jack's' found in jackhammer

        Perhaps I'm missing something, but I wouldn't want to find "jackhammer" if I were searching for "jack" as the first name--as listed in the OP's data set. However, the non-word-boundary regex is perfect for finding all first names containing the sub-string "jack", as $vals[2] =~ /jack/ would.
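To make the difference between the two patterns concrete, here is a small comparison. The sub name jack_matches is invented for illustration; it returns a pair of flags, first for /jack/ and then for /\bjack\b/.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (plain, bounded) match flags for a string.
sub jack_matches {
    my ($text) = @_;
    return (
        ( $text =~ /jack/     ? 1 : 0 ),    # substring match
        ( $text =~ /\bjack\b/ ? 1 : 0 ),    # whole-word match only
    );
}

for my $text ( 'name jack', 'name jackson', 'jackhammer' ) {
    my ($plain, $bounded) = jack_matches($text);
    printf "%-14s /jack/: %d  /\\bjack\\b/: %d\n", $text, $plain, $bounded;
}
```

Which one is right depends on the question being asked: the bounded form finds the first name "jack" only, the plain form finds every name containing it.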

Re: get n lines before or after a pattern
by Athanasius (Monsignor) on Jul 26, 2012 at 03:32 UTC

    Here is another approach, using Tie::File:

    #! perl
    use strict;
    use warnings;
    use Tie::File;

    my $file = 'test.txt';

    tie my @lines, 'Tie::File', $file
        or die "Cannot tie file '$file': $!";

    for my $i (0 .. $#lines)
    {
        if ($lines[$i] =~ m{ \b jack \b }x)
        {
            for ($i - 2 .. $i)
            {
                print $lines[$_], "\n" unless $_ < 0;
            }

            for (my $found = 0; !$found && $i <= $#lines; ++$i)
            {
                if ($lines[$i] =~ m{ \b lastname \b }x)
                {
                    print $lines[$i], "\n";
                    $found = 1;
                }
            }
        }
    }

    untie @lines;

    What is nice about this approach is that, by treating the data file as an ordinary array, it is possible to meet more complicated requirements without the programming overhead of manually maintaining line buffers. So, this approach has the advantage of being scalable. Some notes on Tie::File:

    • It’s a core module: Tie::File
    • Written by Dominus
    • From the docs: “The file is not loaded into memory, so this will work even for gigantic files.”

    HTH,

    Athanasius <°(((><contra mundum

Re: get n lines before or after a pattern
by cheekuperl (Monk) on Jul 26, 2012 at 06:33 UTC
Re: get n lines before or after a pattern
by brx (Pilgrim) on Jul 26, 2012 at 17:09 UTC

    Similar to zentara's approach in Re: get n lines before or after a pattern.
    The idea is to keep it short, to be independent of the other lines' content, and to deal with file boundaries (i.e. finding 'jack' in the first or last lines is OK).

    note: the program could print the same line several times if 'jack' is found in consecutive lines - does OP want that?

    #!perl
    use strict;
    use warnings;

    my @buffer=("")x6;
    my $line;
    while (@buffer) {
        push @buffer,$line if defined($line=scalar(<DATA>));
        shift @buffer;
        # the line being tested sits at index 2 of the buffer
        print @buffer[0,1,2],$buffer[5]//'' if ($buffer[2]//'')=~/\bjack\b/;
    }

    __DATA__
    extra jack
    extra
    extra
    start
    id 10
    address Richmond
    name jack
    xxxxx
    aaaaa
    lastname black
    yyyy
    zzzzz
    id 11
    address Central
    name rick
    cccccc
    dddddd
    lastname hanna
    eeeee
    yyyyy
    id 12
    address denver
    name jack
    sssss
    tttttt
    lastname strong
    rrrrr
    mmmmm
    id 13
    address Virginia
    name mick
    aaaaaaa
    ooooooo
    lastname jagger
    gggggg
    hhhhhh
    id 14
    address Maine
    name rick
    sssss
    sssss
    lastname stewart
    ssssss
    ffffff
    end
    extra
    extra jack
    English is not my mother tongue.
    Les tongues de ma mère sont "made in France".
