
Re^3: Unefficient Regexp 'Matching this or that'

by moritz (Cardinal)
on May 19, 2010 at 08:58 UTC ( #840645=note )


in reply to Re^2: Unefficient Regexp 'Matching this or that'
in thread Unefficient Regexp 'Matching this or that'

Another idea: the regex engine is heavily optimized for finding literal substrings, and for that it uses the same algorithm as index.

With a single literal that optimization can be applied, but I don't think it is used for alternations.

You can test this hypothesis by comparing regexes with two and three alternatives: if I'm right, both should be roughly equal in speed, and both much slower than a single literal string.
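A minimal sketch of such a comparison, using the core Benchmark module. The synthetic @lines data, the sub names and the index baseline are assumptions made for illustration, not the code used in the thread. Note also that Perl 5.10 and later document a trie optimization for alternations of literal strings, so timings on a newer perl may not show the gap predicted here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic stand-in for the real input file (assumption): many lines
# that cannot match, plus one line containing the first search string.
my @lines = map { "record $_ " . ('x' x 200) } 1 .. 20_000;
push @lines, "record with 00901808 inside";

# One literal, then two and three alternatives, all precompiled.
my %regex = (
    one_literal => qr/00901808/,
    two_alts    => qr/00901808|87654321/,
    three_alts  => qr/00901808|87654321|12345678/,
);

# Count the lines matched by a precompiled regex.
sub count_matches {
    my ($re) = @_;
    return scalar grep { $_ =~ $re } @lines;
}

# Baseline without any regex: plain substring search with index().
sub count_index {
    my ($needle) = @_;
    return scalar grep { index($_, $needle) >= 0 } @lines;
}

# Build the table of test cases, one closure per regex plus the baseline.
my %tests = ( index_literal => sub { count_index('00901808') } );
for my $name (keys %regex) {
    $tests{$name} = sub { count_matches($regex{$name}) };
}

# Run each case for at least one CPU second and print a comparison table.
cmpthese(-1, \%tests);
```

If the hypothesis holds on a given perl, two_alts and three_alts should land close to each other and well below one_literal and index_literal in the rate column.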


Re^4: Unefficient Regexp 'Matching this or that'
by pelagic (Curate) on May 19, 2010 at 09:12 UTC
    Thanks for your hints!
    I did some more testing now:
    one_string      /00901808/
    two_string      /00901808|87654321/
    four_string     /00901808|87654321|12345678|29586741/
    2_grep_string   programmed loop over list of 2 regexps
    4_grep_string   programmed loop over list of 4 regexps
    
    > perl bench_regexp 100000lines.92MB.file
    Benchmark: timing 1 iterations of 2_grep_string, 4_grep_string, four_string, one_string, two_string...
    Matched records: 1
    2_grep_string:   3 wallclock secs ( 2.91 usr +  0.44 sys =  3.35 CPU) @ 0.30/s (n=1)
                (warning: too few iterations for a reliable count)
    Matched records: 1
    4_grep_string:   4 wallclock secs ( 3.56 usr +  0.42 sys =  3.98 CPU) @ 0.25/s (n=1)
                (warning: too few iterations for a reliable count)
    Matched records: 1
    four_string:   100 wallclock secs (98.83 usr +  0.56 sys = 99.39 CPU) @ 0.01/s (n=1)
                (warning: too few iterations for a reliable count)
    Matched records: 1
    one_string:      2 wallclock secs ( 1.62 usr +  0.40 sys =  2.02 CPU) @ 0.50/s (n=1)
                (warning: too few iterations for a reliable count)
    Matched records: 1
    two_string:     75 wallclock secs (73.87 usr +  0.53 sys = 74.40 CPU) @ 0.01/s (n=1)
                (warning: too few iterations for a reliable count)

    pelagic
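The post describes the two "grep" variants only as a programmed loop over a list of regexps and does not show bench_regexp itself. One way such a loop might look is sketched below; the sub names and the in-memory sample data are hypothetical, and only the four search strings come from the post.

```perl
use strict;
use warnings;

# One precompiled single-literal regex per search string
# (the strings are the ones listed in the post).
my @patterns = map { qr/\Q$_\E/ } qw(00901808 87654321 12345678 29586741);

# Return 1 as soon as any of the single-literal regexes matches the line.
sub line_matches {
    my ($line) = @_;
    for my $re (@patterns) {
        return 1 if $line =~ $re;
    }
    return 0;
}

# Count the matching records read from a filehandle.
sub count_matching {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if line_matches($line);
    }
    return $count;
}

# Usage with a small in-memory "file" instead of the 92 MB input (assumption).
my $data = "aaa\nxx00901808yy\nbbb\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print "Matched records: ", count_matching($fh), "\n";
close $fh;
```

Each regex here is a single literal, so every match attempt can use the literal-substring optimization discussed above, which would be consistent with the grep variants staying close to one_string in the timings.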
