Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Re^2: Unefficient Regexp 'Matching this or that'

by pelagic (Curate)
on May 19, 2010 at 08:47 UTC ( #840641=note: print w/ replies, xml ) Need Help??


in reply to Re: Unefficient Regexp 'Matching this or that'
in thread Unefficient Regexp 'Matching this or that'

I actually tried it in reverse order also by calling the tests:

01_two_string 02_one_string
And the number of matches is the same, I just stripped this away to make the code smaller.

pelagic


Comment on Re^2: Unefficient Regexp 'Matching this or that'
Download Code
Re^3: Unefficient Regexp 'Matching this or that'
by moritz (Cardinal) on May 19, 2010 at 08:58 UTC
    Another idea: The regex engine has a heavy optimization for looking for literal substrings, using the same algorithm that index uses.

    In the case of one literal that can be used, but I don't think it's used in alternations.

    You can test this hypothesis by comparing regexes of two and three alternatives - if I'm right, then both should be roughly equal in speed, and much slower than a single, literal string.

      Thanks for your hints!
      I did some more testing now:
      one_string      /00901808/
      two_string      /00901808|87654321/
      four_string     /00901808|87654321|12345678|29586741/
      2_grep_string   programmed loop over list of 2 regexps
      4_grep_string   programmed loop over list of 4 regexps
      
      > perl bench_regexp 100000lines.92MB.file Benchmark: timing 1 iterations of 2_grep_string, 4_grep_string, four_s +tring, one_string, two_string... Matched records: 1 2_grep_string: 3 wallclock secs ( 2.91 usr + 0.44 sys = 3.35 CPU) @ + 0.30/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 4_grep_string: 4 wallclock secs ( 3.56 usr + 0.42 sys = 3.98 CPU) @ + 0.25/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 four_string: 100 wallclock secs (98.83 usr + 0.56 sys = 99.39 CPU) @ + 0.01/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 one_string: 2 wallclock secs ( 1.62 usr + 0.40 sys = 2.02 CPU) @ 0 +.50/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 two_string: 75 wallclock secs (73.87 usr + 0.53 sys = 74.40 CPU) @ 0 +.01/s (n=1) (warning: too few iterations for a reliable count)

      pelagic

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://840641]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others surveying the Monastery: (12)
As of 2014-08-22 12:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (156 votes), past polls