PerlMonks  

More than one pattern match using grep on a file

by vinoth.ree (Parson)
on Apr 04, 2014 at 13:22 UTC (#1081129=perlquestion)
vinoth.ree has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I need some suggestions on how to grep for more than one pattern in a file's contents. I have a set of patterns in an array; if any of the patterns matches, I need to return that line.

I tried the following code, and it works fine.

use strict;
use warnings;
use Data::Dumper;

my @PatternList = qw(index: start:);
open(FH, "<", "test.txt") or die;
my $line;
my @Matches;
while ($line = <FH>) {
    my $pattern;
    foreach $pattern (@PatternList) {
        if ($line =~ /$pattern/) {
            print "$line";
        }
    }
}

Is there a more efficient way to do this? I have many more files to match the patterns against.

I tried the same with grep, as follows, but I get back only the patterns, not the matched lines. Am I missing something here with grep?

my @Matches = grep { /$_/, $line } @arr;   # gets the pattern, not the matched line

All is well

Re: More than one pattern match using grep on a file
by choroba (Abbot) on Apr 04, 2014 at 13:31 UTC
    Instead of looping over the patterns, create one large pattern by
    my $large_pattern = join '|', @PatternList;

    To grep an array for matching members, just do

    my @matches = grep /$large_pattern/, @array;
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
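A minimal sketch of the advice above, applied to lines rather than to the pattern array. In the question's attempt, grep iterated over the patterns, so `$_` was each pattern and the block's final value was `$line`, which is true for any non-empty line; that is why the patterns themselves came back. The helper name `matching_lines` and the sample lines are assumptions made for illustration, not part of the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the lines that match
# at least one of the given patterns.
sub matching_lines {
    my ($patterns, $lines) = @_;
    return () unless @$patterns;    # an empty alternation would match every line

    # Join the patterns into one alternation; (?:...) keeps each
    # alternative self-contained.
    my $large_pattern = join '|', map { "(?:$_)" } @$patterns;

    # grep iterates over the lines here, so $_ is a line, not a pattern.
    return grep { /$large_pattern/ } @$lines;
}

my @PatternList = qw(index: start:);
my @lines = ("index: 1\n", "nothing here\n", "start: now\n");
print matching_lines(\@PatternList, \@lines);
```

Each line is tested once against the combined pattern, so a line that matches several patterns is still returned only once.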
      It is probably also a good idea to add the /o modifier to the regexp as a speed optimization (the regexp is then compiled only once, which is better for large @arrays or @pattern_lists).

      Greetings,
      Janek

        IMHO, this is actually bad advice. /o can cause some confusing bugs and is generally a case of premature optimization. If I run the benchmark code:
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Benchmark 'cmpthese';

        local $" = '|';
        my $target = join '', map chr(97 + rand 26), 1 .. 100000;
        my @patterns = map {join '', map chr(97 + rand 26), 1 .. 5 } 1 .. 100;
        my @res = map qr/$_/, @patterns;
        my $whole_pat = "@patterns";
        my $whole_re = qr/@patterns/;

        cmpthese(-5, {
            'inline'      => sub {$target =~ /@patterns/},
            'inline-o'    => sub {$target =~ /@patterns/o},
            'grep_str'    => sub {return 1 if grep $target =~ $_, @patterns},
            'grep_RE'     => sub {return 1 if grep $target =~ $_, @res},
            'whole_pat'   => sub {$target =~ /$whole_pat/},
            'whole_pat-o' => sub {$target =~ /$whole_pat/o},
            'whole_re'    => sub {$target =~ $whole_re},
        });
        two sample outputs I get (unstable, given rand) are
                      Rate grep_str grep_RE inline inline-o whole_pat-o whole_pat whole_re
        grep_str    96.6/s       --     -2%   -67%     -67%        -67%      -67%     -67%
        grep_RE     99.1/s       3%      --   -67%     -67%        -67%      -67%     -67%
        inline       296/s     207%    199%     --      -0%         -0%       -0%      -0%
        inline-o     296/s     207%    199%     0%       --         -0%       -0%      -0%
        whole_pat-o  297/s     207%    199%     0%       0%          --       -0%      -0%
        whole_pat    297/s     207%    200%     0%       0%          0%        --       0%
        whole_re     297/s     207%    200%     0%       0%          0%        0%       --

                      Rate grep_str grep_RE inline inline-o whole_re whole_pat whole_pat-o
        grep_str    97.5/s       --     -2%   -94%     -94%     -94%      -94%        -94%
        grep_RE     99.8/s       2%      --   -94%     -94%     -94%      -94%        -94%
        inline      1686/s    1628%   1589%     --      -0%      -1%       -1%         -1%
        inline-o    1688/s    1630%   1591%     0%       --      -1%       -1%         -1%
        whole_re    1707/s    1650%   1610%     1%       1%       --       -0%         -0%
        whole_pat   1707/s    1650%   1610%     1%       1%       0%        --         -0%
        whole_pat-o 1707/s    1650%   1610%     1%       1%       0%        0%          --
        The list lengths were chosen so that the likelihood of actually getting a hit is reasonable (~80%). If we increase the pattern lengths to 10 characters so that failure is almost guaranteed, I get the following:
                      Rate grep_str grep_RE inline inline-o whole_pat-o whole_pat whole_re
        grep_str     169/s       --     -4%   -46%     -46%        -46%      -46%     -46%
        grep_RE      177/s       5%      --   -43%     -43%        -44%      -44%     -44%
        inline       312/s      85%     77%     --      -0%         -0%       -0%      -0%
        inline-o     312/s      85%     77%     0%       --         -0%       -0%      -0%
        whole_pat-o  313/s      85%     77%     0%       0%          --       -0%      -0%
        whole_pat    313/s      86%     77%     0%       0%          0%        --       0%
        whole_re     313/s      86%     77%     0%       0%          0%        0%       --

        You get negligible impact from the optimization, and you break your ability to update your array of patterns (potential for bugs). You also potentially confuse less-sophisticated people who might look at your code. Combining the patterns is a clear improvement over using grep, but if benchmarking shows this step is your bottleneck, then you are probably better off either optimizing your pattern or rethinking your filtering.


        #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
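The "break your ability to update your array of patterns" point can be seen in a small sketch. The two helper names below are hypothetical and exist only for this illustration; the behavior shown is that of the /o modifier, which interpolates and compiles the pattern the first time the match is executed and then ignores later changes to the variable.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only.
# With /o, the interpolated pattern is compiled on the first call and
# later values of $pat are silently ignored.
sub match_with_o {
    my ($text, $pat) = @_;
    return $text =~ /$pat/o ? 1 : 0;
}

# Without /o, the current value of $pat is used on every call.
sub match_without_o {
    my ($text, $pat) = @_;
    return $text =~ /$pat/ ? 1 : 0;
}

# Try both patterns against the same line, 'index:' first.
my @with_o    = map { match_with_o('start: now', $_) }    qw(index: start:);
my @without_o = map { match_without_o('start: now', $_) } qw(index: start:);

print "with /o:    @with_o\n";
print "without /o: @without_o\n";
```

In this sketch the /o version stays locked to `index:`, the first pattern it saw, so it never reports the `start:` match that the plain version finds.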

      For very simple patterns, you can squeeze out some extra speed by using Regexp::Trie. If the patterns are more complicated, Regexp::Assemble or Regexp::Optimizer are worth looking into.

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      My blog: Imperial Deltronics
Re: More than one pattern match using grep on a file
by karlgoethebier (Curate) on Apr 04, 2014 at 16:25 UTC
    my $large_pattern = join '|', @PatternList;

    Simple and just cool! But why not $large_pattern = qr($large_pattern);? (Originally mistyped as qx($large_pattern).)

    Update: Fixed annoying typo.

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      You probably mean qr instead of qx. It is a possible next step.
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        "You probably mean qr..."

        D'oh! Yes. Holy s**t! I fixed it.

        Regards, Karl

        «The Crux of the Biscuit is the Apostrophe»
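The qr step discussed in this subthread could look roughly like the following sketch, which compiles the combined pattern once and reuses it across several inputs, as the original question asks for. The helper name `grep_handles` and the in-memory file contents are assumptions made so that the example is self-contained; real code would open actual files instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): compile the alternation once
# with qr// and reuse the compiled regex for every file handle.
sub grep_handles {
    my ($patterns, @handles) = @_;
    return () unless @$patterns;    # nothing to search for

    my $large_pattern = join '|', @$patterns;
    my $re = qr/$large_pattern/;    # compiled once, here

    my @matches;
    for my $fh (@handles) {
        while (my $line = <$fh>) {
            push @matches, $line if $line =~ $re;
        }
    }
    return @matches;
}

# In-memory file handles stand in for real files in this sketch.
my ($text_a, $text_b) = ("index: 1\nfoo\n", "bar\nstart: now\n");
open my $fh_a, '<', \$text_a or die "Cannot open in-memory file: $!";
open my $fh_b, '<', \$text_b or die "Cannot open in-memory file: $!";

print grep_handles([qw(index: start:)], $fh_a, $fh_b);
```

Unlike /o, the compiled regex lives in an ordinary variable, so building a new one from an updated pattern list is just another qr// call.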

Node Type: perlquestion [id://1081129]
Approved by mtmcc