Hi Monks,

I need some suggestions on how to grep for more than one pattern in a file's contents. I have a set of patterns in an array; if any of the patterns matches, I need to return that line.

I tried the following code, and it works fine.

    use strict;
    use warnings;
    use Data::Dumper;

    my @PatternList = qw(index: start:);
    open(FH, "<", "test.txt") or die;
    my $line;
    my @Matches;
    while ($line = <FH>) {
        my $pattern;
        foreach $pattern (@PatternList) {
            if ($line =~ /$pattern/) {
                print "$line";
            }
        }
    }

Is there a more efficient way to do this? I have many more files to match against these patterns.

I tried the same with grep, as follows, but I am getting only the patterns back, not the matched lines. Am I missing something here with grep?

    my @Matches = grep { /$_/, $line } @arr;   # gets the pattern, not the matched line



Re: More than one pattern match using grep on a file
by choroba (Cardinal) on Apr 04, 2014 at 13:31 UTC
    Instead of looping over the patterns, create one large pattern by
    my $large_pattern = join '|', @PatternList;

    To grep an array for matching members, just do

    my @matches = grep /$large_pattern/, @array;
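
    As a minimal sketch of the two steps above applied to the question's patterns: the sample lines are hypothetical stand-ins for the contents of test.txt, and quotemeta is an added assumption that the patterns are meant as literal text. Note that grep returns elements of the list it is given, so the list has to hold the lines, not the patterns.

```perl
use strict;
use warnings;

# Patterns from the original question.
my @PatternList = qw(index: start:);

# Assumption: the patterns are literal strings, so quotemeta escapes any
# regex metacharacters before they are joined into one alternation.
my $large_pattern = join '|', map { quotemeta } @PatternList;

# Hypothetical sample data standing in for the lines of test.txt.
my @lines = (
    "index: 1\n",
    "nothing to see here\n",
    "start: now\n",
    "start: index: both\n",
);

# grep tests each line (aliased to $_) against the combined pattern and
# returns the lines that match, each of them at most once.
my @matches = grep { /$large_pattern/ } @lines;

print @matches;
```

    Unlike the nested loop in the question, a line containing both patterns is returned only once here.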
      For very simple patterns, you can squeeze out some extra speed by using Regexp::Trie. If the patterns are more complicated, Regexp::Assemble or Regexp::Optimizer are worth looking into.


      In addition, it's probably a good idea to add the /o modifier to the regexp as a speed optimization (the regexp is then compiled just once, which is better for large @arrays or @pattern_lists).


        IMHO, this is actually bad advice. /o can cause some confusing bugs and is generally a case of premature optimization. If I run the benchmark code:
        #!/usr/bin/perl
        use strict;
        use warnings;
        use Benchmark 'cmpthese';

        local $" = '|';
        my $target = join '', map chr(97 + rand 26), 1 .. 100000;
        my @patterns = map {join '', map chr(97 + rand 26), 1 .. 5 } 1 .. 100;
        my @res = map qr/$_/, @patterns;
        my $whole_pat = "@patterns";
        my $whole_re = qr/@patterns/;

        cmpthese(-5, {
            'inline'      => sub {$target =~ /@patterns/},
            'inline-o'    => sub {$target =~ /@patterns/o},
            'grep_str'    => sub {return 1 if grep $target =~ $_, @patterns},
            'grep_RE'     => sub {return 1 if grep $target =~ $_, @res},
            'whole_pat'   => sub {$target =~ /$whole_pat/},
            'whole_pat-o' => sub {$target =~ /$whole_pat/o},
            'whole_re'    => sub {$target =~ $whole_re},
        });
        Two sample outputs I get (unstable, given rand) are:
                       Rate grep_str grep_RE inline inline-o whole_pat-o whole_pat whole_re
        grep_str     96.6/s       --     -2%   -67%     -67%        -67%      -67%     -67%
        grep_RE      99.1/s       3%      --   -67%     -67%        -67%      -67%     -67%
        inline        296/s     207%    199%     --      -0%         -0%       -0%      -0%
        inline-o      296/s     207%    199%     0%       --         -0%       -0%      -0%
        whole_pat-o   297/s     207%    199%     0%       0%          --       -0%      -0%
        whole_pat     297/s     207%    200%     0%       0%          0%        --       0%
        whole_re      297/s     207%    200%     0%       0%          0%        0%       --

                       Rate grep_str grep_RE inline inline-o whole_re whole_pat whole_pat-o
        grep_str     97.5/s       --     -2%   -94%     -94%     -94%      -94%        -94%
        grep_RE      99.8/s       2%      --   -94%     -94%     -94%      -94%        -94%
        inline       1686/s    1628%   1589%     --      -0%      -1%       -1%         -1%
        inline-o     1688/s    1630%   1591%     0%       --      -1%       -1%         -1%
        whole_re     1707/s    1650%   1610%     1%       1%       --       -0%         -0%
        whole_pat    1707/s    1650%   1610%     1%       1%       0%        --         -0%
        whole_pat-o  1707/s    1650%   1610%     1%       1%       0%        0%          --
        The list lengths were chosen so that the likelihood of actually getting a hit is reasonable (~80%). If we increase the pattern lengths to 10 characters, so that failure is almost guaranteed, I get the following:
                       Rate grep_str grep_RE inline inline-o whole_pat-o whole_pat whole_re
        grep_str      169/s       --     -4%   -46%     -46%        -46%      -46%     -46%
        grep_RE       177/s       5%      --   -43%     -43%        -44%      -44%     -44%
        inline        312/s      85%     77%     --      -0%         -0%       -0%      -0%
        inline-o      312/s      85%     77%     0%       --         -0%       -0%      -0%
        whole_pat-o   313/s      85%     77%     0%       0%          --       -0%      -0%
        whole_pat     313/s      86%     77%     0%       0%          0%        --       0%
        whole_re      313/s      86%     77%     0%       0%          0%        0%       --

        You get negligible impact from the /o optimization, and you lose the ability to update your array of patterns (a potential source of bugs). You also risk confusing less-sophisticated people who might look at your code. Combining the patterns is a clear improvement over using grep, but if benchmarking shows this step is your bottleneck, then you are probably better off either optimizing your pattern or rethinking your filtering.
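
        To illustrate the kind of bug /o can cause, here is a small sketch (the helper names match_once and match_fresh are hypothetical, not from the benchmark). With /o, the pattern is compiled the first time the match is executed, and later changes to the interpolated variable are ignored.

```perl
use strict;
use warnings;

# With /o, the interpolated pattern is compiled on first use only.
sub match_once  { my ($text, $pat) = @_; return $text =~ /$pat/o ? 1 : 0 }

# Without /o, the current value of $pat is used on every call.
sub match_fresh { my ($text, $pat) = @_; return $text =~ /$pat/ ? 1 : 0 }

my @results_o = (
    match_once('index: 1', 'index:'),
    match_once('start: 2', 'start:'),   # still tested against 'index:'
);
my @results_fresh = (
    match_fresh('index: 1', 'index:'),
    match_fresh('start: 2', 'start:'),
);

print "with /o:    @results_o\n";       # prints "with /o:    1 0"
print "without /o: @results_fresh\n";   # prints "without /o: 1 1"
```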

Re: More than one pattern match using grep on a file
by karlgoethebier (Abbot) on Apr 04, 2014 at 16:25 UTC
    my $large_pattern = join '|', @PatternList;

    Simple and just cool! But why not $large_pattern = qr($large_pattern);? (The post originally had qx($large_pattern) here by mistake.)

    Update: Fixed annoying typo.

    Best regards, Karl

      You probably mean qr instead of qx. It is a possible next step.
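
      A rough sketch of that next step, assuming literal patterns (hence quotemeta) and using in-memory filehandles with made-up contents in place of real files: the pattern is compiled once with qr and the same compiled object is reused for every file.

```perl
use strict;
use warnings;

my @PatternList = qw(index: start:);

# Join the (assumed literal) patterns and compile the alternation once.
my $joined        = join '|', map { quotemeta } @PatternList;
my $large_pattern = qr/$joined/;

# Hypothetical file contents; real code would open the files by name.
my %files = (
    'a.txt' => "index: 1\nfoo\n",
    'b.txt' => "bar\nstart: 2\n",
);

my @matches;
for my $name (sort keys %files) {
    # Opening a reference to a scalar gives an in-memory filehandle.
    open my $fh, '<', \$files{$name} or die "Cannot open $name: $!";
    while (my $line = <$fh>) {
        push @matches, "$name: $line" if $line =~ $large_pattern;
    }
    close $fh;
}

print @matches;
```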
        "You probably mean qr..."

        D'oh! Yes. Holy s**t! I fixed it.

        Regards, Karl

