Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Re: alternation in regexes: to use or to avoid?

by dave_the_m (Parson)
on Dec 10, 2012 at 17:08 UTC ( #1008140=note: print w/ replies, xml ) Need Help??


in reply to alternation in regexes: to use or to avoid?

Your problem is that you are including the \s*\w+ and captures within the alternation. Move them outside and you'll find the alternations are suddenly much faster than the loops. This is because alternations containing just fixed strings can be much better optimised (using tries). With the following changes:

my $q1s = join('|', @matchwords); my $q1 = qr/$q1s\s*\w+/; my $q3s = join('|', @matchwords); my $q3 = qr/($q3s)\\s*\\w+/;
and setting the benchmark count 10x larger, I get on 5.17.6:
alternation, grouping: 1 wallclock secs ( 0.58 usr + 0.00 sys = 0.5 +8 CPU) @ 1724137.93/s (n=1000000) alternation, no grouping: 5 wallclock secs ( 4.85 usr + 0.00 sys = +4.85 CPU) @ 206185.57/s (n=1000000) loop, grouping: 24 wallclock secs (23.33 usr + 0.00 sys = 23.33 CPU) +@ 42863.27/s (n=1000000) loop, no grouping: 23 wallclock secs (22.57 usr + 0.00 sys = 22.57 CP +U) @ 44306.60/s (n=1000000)

Dave.


Comment on Re: alternation in regexes: to use or to avoid?
Select or Download Code
Re^2: alternation in regexes: to use or to avoid?
by balker (Novice) on Dec 10, 2012 at 18:42 UTC

    But that doesn't leave room for the various match-words to have different "\w+"'s - e.g. /\bFOO:\s*bar(\d+)/ or /\bBAZ:\s*(\w+)/

    I guess I'm wondering why the trie-optimization isn't used for fixed string prefixes as well (as I'd assumed before I wrote the code).

      Actually I stand corrected: the fixed string prefixes are collected together into a trie where there are (possibly differing) wildcard suffixes; the killer is the individual captures, which disables the trie optimisation.

      So, alternation is the fastest, as long as you put any captures outside the alt.

      Dave.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1008140]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (7)
As of 2014-12-19 02:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (70 votes), past polls