Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Re^2: Unefficient Regexp 'Matching this or that'

by pelagic (Curate)
on May 19, 2010 at 08:47 UTC ( #840641=note: print w/ replies, xml ) Need Help??


in reply to Re: Unefficient Regexp 'Matching this or that'
in thread Unefficient Regexp 'Matching this or that'

I actually tried it in reverse order also by calling the tests:

01_two_string 02_one_string
And the number of matches is the same, I just stripped this away to make the code smaller.

pelagic


Comment on Re^2: Unefficient Regexp 'Matching this or that'
Download Code
Re^3: Unefficient Regexp 'Matching this or that'
by moritz (Cardinal) on May 19, 2010 at 08:58 UTC
    Another idea: The regex engine has a heavy optimization for looking for literal substrings, using the same algorithm that index uses.

    In the case of one literal that can be used, but I don't think it's used in alternations.

    You can test this hypothesis by comparing regexes of two and three alternatives - if I'm right, then both should be roughly equal in speed, and much slower than a single, literal string.

      Thanks for your hints!
      I did some more testing now:
      one_string      /00901808/
      two_string      /00901808|87654321/
      four_string     /00901808|87654321|12345678|29586741/
      2_grep_string   programmed loop over list of 2 regexps
      4_grep_string   programmed loop over list of 4 regexps
      
      > perl bench_regexp 100000lines.92MB.file Benchmark: timing 1 iterations of 2_grep_string, 4_grep_string, four_s +tring, one_string, two_string... Matched records: 1 2_grep_string: 3 wallclock secs ( 2.91 usr + 0.44 sys = 3.35 CPU) @ + 0.30/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 4_grep_string: 4 wallclock secs ( 3.56 usr + 0.42 sys = 3.98 CPU) @ + 0.25/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 four_string: 100 wallclock secs (98.83 usr + 0.56 sys = 99.39 CPU) @ + 0.01/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 one_string: 2 wallclock secs ( 1.62 usr + 0.40 sys = 2.02 CPU) @ 0 +.50/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 two_string: 75 wallclock secs (73.87 usr + 0.53 sys = 74.40 CPU) @ 0 +.01/s (n=1) (warning: too few iterations for a reliable count)

      pelagic

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://840641]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (8)
As of 2015-07-05 20:20 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (68 votes), past polls