http://www.perlmonks.org?node_id=840641


in reply to Re: Unefficient Regexp 'Matching this or that'
in thread Unefficient Regexp 'Matching this or that'

I actually tried it in reverse order also by calling the tests:
01_two_string 02_one_string
And the number of matches is the same, I just stripped this away to make the code smaller.

pelagic

Replies are listed 'Best First'.
Re^3: Unefficient Regexp 'Matching this or that'
by moritz (Cardinal) on May 19, 2010 at 08:58 UTC
    Another idea: The regex engine has a heavy optimization for looking for literal substrings, using the same algorithm that index uses.

    In the case of one literal that can be used, but I don't think it's used in alternations.

    You can test this hypothesis by comparing regexes of two and three alternatives - if I'm right, then both should be roughly equal in speed, and much slower than a single, literal string.

      Thanks for your hints!
      I did some more testing now:
      one_string      /00901808/
      two_string      /00901808|87654321/
      four_string     /00901808|87654321|12345678|29586741/
      2_grep_string   programmed loop over list of 2 regexps
      4_grep_string   programmed loop over list of 4 regexps
      
      > perl bench_regexp 100000lines.92MB.file Benchmark: timing 1 iterations of 2_grep_string, 4_grep_string, four_s +tring, one_string, two_string... Matched records: 1 2_grep_string: 3 wallclock secs ( 2.91 usr + 0.44 sys = 3.35 CPU) @ + 0.30/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 4_grep_string: 4 wallclock secs ( 3.56 usr + 0.42 sys = 3.98 CPU) @ + 0.25/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 four_string: 100 wallclock secs (98.83 usr + 0.56 sys = 99.39 CPU) @ + 0.01/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 one_string: 2 wallclock secs ( 1.62 usr + 0.40 sys = 2.02 CPU) @ 0 +.50/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 two_string: 75 wallclock secs (73.87 usr + 0.53 sys = 74.40 CPU) @ 0 +.01/s (n=1) (warning: too few iterations for a reliable count)

      pelagic