Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re^3: Unefficient Regexp 'Matching this or that'

by moritz (Cardinal)
on May 19, 2010 at 08:58 UTC ( #840645=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Unefficient Regexp 'Matching this or that'
in thread Unefficient Regexp 'Matching this or that'

Another idea: The regex engine has a heavy optimization for looking for literal substrings, using the same algorithm that index uses.

In the case of one literal that can be used, but I don't think it's used in alternations.

You can test this hypothesis by comparing regexes of two and three alternatives - if I'm right, then both should be roughly equal in speed, and much slower than a single, literal string.


Comment on Re^3: Unefficient Regexp 'Matching this or that'
Replies are listed 'Best First'.
Re^4: Unefficient Regexp 'Matching this or that'
by pelagic (Curate) on May 19, 2010 at 09:12 UTC
    Thanks for your hints!
    I did some more testing now:
    one_string      /00901808/
    two_string      /00901808|87654321/
    four_string     /00901808|87654321|12345678|29586741/
    2_grep_string   programmed loop over list of 2 regexps
    4_grep_string   programmed loop over list of 4 regexps
    
    > perl bench_regexp 100000lines.92MB.file Benchmark: timing 1 iterations of 2_grep_string, 4_grep_string, four_s +tring, one_string, two_string... Matched records: 1 2_grep_string: 3 wallclock secs ( 2.91 usr + 0.44 sys = 3.35 CPU) @ + 0.30/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 4_grep_string: 4 wallclock secs ( 3.56 usr + 0.42 sys = 3.98 CPU) @ + 0.25/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 four_string: 100 wallclock secs (98.83 usr + 0.56 sys = 99.39 CPU) @ + 0.01/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 one_string: 2 wallclock secs ( 1.62 usr + 0.40 sys = 2.02 CPU) @ 0 +.50/s (n=1) (warning: too few iterations for a reliable count) Matched records: 1 two_string: 75 wallclock secs (73.87 usr + 0.53 sys = 74.40 CPU) @ 0 +.01/s (n=1) (warning: too few iterations for a reliable count)

    pelagic

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://840645]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (11)
As of 2015-07-30 15:24 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (272 votes), past polls