
Re^3: More than one pattern match using grep on a file

by kennethk (Monsignor)
on Apr 04, 2014 at 16:17 UTC (#1081163)


in reply to Re^2: More than one pattern match using grep on a file
in thread More than one pattern match using grep on a file

IMHO, this is actually bad advice. /o can cause some confusing bugs and is generally a case of premature optimization. If I run the benchmark code:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark 'cmpthese';

    local $" = '|';
    my $target = join '', map chr(97 + rand 26), 1 .. 100000;
    my @patterns = map {join '', map chr(97 + rand 26), 1 .. 5 } 1 .. 100;
    my @res = map qr/$_/, @patterns;
    my $whole_pat = "@patterns";
    my $whole_re = qr/@patterns/;

    cmpthese(-5, {
        'inline'      => sub {$target =~ /@patterns/},
        'inline-o'    => sub {$target =~ /@patterns/o},
        'grep_str'    => sub {return 1 if grep $target =~ $_, @patterns},
        'grep_RE'     => sub {return 1 if grep $target =~ $_, @res},
        'whole_pat'   => sub {$target =~ /$whole_pat/},
        'whole_pat-o' => sub {$target =~ /$whole_pat/o},
        'whole_re'    => sub {$target =~ $whole_re},
    });

Two sample outputs I get (unstable, given rand) are:

                  Rate grep_str grep_RE inline inline-o whole_pat-o whole_pat whole_re
    grep_str    96.6/s       --     -2%   -67%     -67%        -67%      -67%     -67%
    grep_RE     99.1/s       3%      --   -67%     -67%        -67%      -67%     -67%
    inline       296/s     207%    199%     --      -0%         -0%       -0%      -0%
    inline-o     296/s     207%    199%     0%       --         -0%       -0%      -0%
    whole_pat-o  297/s     207%    199%     0%       0%          --       -0%      -0%
    whole_pat    297/s     207%    200%     0%       0%          0%        --       0%
    whole_re     297/s     207%    200%     0%       0%          0%        0%       --

                  Rate grep_str grep_RE inline inline-o whole_re whole_pat whole_pat-o
    grep_str    97.5/s       --     -2%   -94%     -94%     -94%      -94%        -94%
    grep_RE     99.8/s       2%      --   -94%     -94%     -94%      -94%        -94%
    inline      1686/s    1628%   1589%     --      -0%      -1%       -1%         -1%
    inline-o    1688/s    1630%   1591%     0%       --      -1%       -1%         -1%
    whole_re    1707/s    1650%   1610%     1%       1%       --       -0%         -0%
    whole_pat   1707/s    1650%   1610%     1%       1%       0%        --         -0%
    whole_pat-o 1707/s    1650%   1610%     1%       1%       0%        0%          --

The list lengths were chosen so that the likelihood of actually getting a hit is reasonable (~80%). If we increase the pattern lengths to 10 characters so that failure is almost guaranteed, I get the following:

                  Rate grep_str grep_RE inline inline-o whole_pat-o whole_pat whole_re
    grep_str     169/s       --     -4%   -46%     -46%        -46%      -46%     -46%
    grep_RE      177/s       5%      --   -43%     -43%        -44%      -44%     -44%
    inline       312/s      85%     77%     --      -0%         -0%       -0%      -0%
    inline-o     312/s      85%     77%     0%       --         -0%       -0%      -0%
    whole_pat-o  313/s      85%     77%     0%       0%          --       -0%      -0%
    whole_pat    313/s      86%     77%     0%       0%          0%        --       0%
    whole_re     313/s      86%     77%     0%       0%          0%        0%       --

You get negligible impact from the optimization, and you break your ability to update your array of patterns (potential for bugs). You also potentially confuse less-sophisticated people who might look at your code. There is a clear improvement over using grep, but if benchmarking shows this step is your bottleneck, then you are probably better off either optimizing your pattern or rethinking your filtering.
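
To make the "potential for bugs" concrete, here is a minimal sketch of how /o freezes whatever pattern was interpolated first. The sub names (match_plain, match_once, build_re) and the sample strings are made up for illustration and are not part of the benchmark above. Compiling the alternation once with qr// is shown as one possible way to keep the compile-once behaviour explicit, on the assumption that you rebuild the regex yourself whenever the list changes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers for illustration; none of these names come from the benchmark.

# Re-interpolates @patterns on every call, so it always reflects the current list.
sub match_plain {
    my ($str, @patterns) = @_;
    local $" = '|';
    return $str =~ /@patterns/ ? 1 : 0;
}

# Same match with /o: the pattern is interpolated the first time this
# match op runs and is never rebuilt afterwards.
sub match_once {
    my ($str, @patterns) = @_;
    local $" = '|';
    return $str =~ /@patterns/o ? 1 : 0;
}

# Explicit alternative: compile once with qr//, rebuild when the list changes.
sub build_re {
    my @patterns = @_;
    local $" = '|';
    return qr/@patterns/;
}

my @results = (
    match_once('foo', 'foo', 'bar'),    # pattern frozen as foo|bar
    match_once('baz', 'baz'),           # still tests foo|bar, so no match
    match_plain('baz', 'baz'),          # re-interpolated, so it matches
);

my $re = build_re('foo', 'bar');
push @results, 'baz' =~ $re ? 1 : 0;    # old list: no match
$re = build_re('baz');                  # list changed, so rebuild on purpose
push @results, 'baz' =~ $re ? 1 : 0;    # new list: match

print "@results\n";
```

The second match_once call is the confusing case: the caller passed a new list, but the match silently keeps using the old one. With qr//, the point at which the pattern gets compiled is visible in the code.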


#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

