PerlMonks  

Re: alternation in regexes: to use or to avoid?

by RichardK (Priest)
on Dec 10, 2012 at 15:31 UTC (#1008122)


in reply to alternation in regexes: to use or to avoid?

It's not clear to me what you are trying to achieve with your regex.

The simple alternation that looks like this

aol\s*\w+|aachen\s*\w+|aaliyah\s*\w+|.....

runs quickly; it's only the one with lots of capture groups that is slow, i.e.

(aol)\s*\w+|(aachen)\s*\w+|(aaliyah)\s*\w+|....

So maybe there's just a better way to get the result you want, if you'd care to explain what that is?
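
For anyone who wants to reproduce the comparison, here is a minimal benchmark sketch of the two forms. The word list and the test string are made up for illustration (the original data is not shown in this thread), and with so few words the gap may be small; a much longer word list is probably needed to see the slowdown described above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical word list and input; the thread's real data is not shown.
my @words = qw(aol aachen aaliyah aardvark abacus);
my $text  = 'nothing in this string matches any of the words ' x 20;

# Plain alternation: aol\s*\w+|aachen\s*\w+|...
my $plain = join '|', map { quotemeta($_) . '\s*\w+' } @words;

# Same branches, but each word wrapped in its own capture group.
my $captured = join '|', map { '(' . quotemeta($_) . ')\s*\w+' } @words;

my ($plain_re, $captured_re) = (qr/$plain/, qr/$captured/);

# Neither pattern matches $text, so $hits should stay at 0.
my $hits = 0;

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    plain    => sub { $hits++ if $text =~ $plain_re },
    captured => sub { $hits++ if $text =~ $captured_re },
});
```

Running the script with `use re 'debug';` added shows how perl compiles each pattern, which may help explain any difference you measure.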


Re^2: alternation in regexes: to use or to avoid?
by balker (Novice) on Dec 10, 2012 at 15:51 UTC

    (I work with dk.)

    Another question could be: why is the one with the capture groups so slow, since none of the words match the string?

    And in general, why is alternation with capture so much slower than looping with capture and plain alternation combined?

    The reason for the code is to replace some 60 similarly structured regexes, in a library used by a couple of legacy applications, with a single regex generated automatically from configuration files. The goals are (potential) performance gains, different behaviour across applications, and definite maintainability gains. The strings replaced all have the structure \bFOO:\s*bar(\d+) or \bBAZ:\s*(\w+) etc.
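
    A rough sketch of what such a generated regex could look like, assuming a configuration that maps each key to the pattern for its value. The `%config` hash and the `extract` helper are hypothetical names, not the actual library code. It uses the branch-reset group `(?|...)` from Perl 5.10 so that every branch shares the same two capture numbers; whether that avoids the slowdown discussed here is not something this sketch establishes.

```perl
use strict;
use warnings;
use 5.010;    # branch reset (?|...) needs Perl 5.10 or later

# Hypothetical configuration: key => pattern for the text after "KEY:\s*".
# Each value pattern is assumed to contain exactly one capture group.
my %config = (
    FOO => 'bar(\d+)',
    BAZ => '(\w+)',
);

# One branch per key. Inside (?|...) every branch numbers its groups
# from 1, so $1 is always the key and $2 the captured value.
my $branches = join '|',
    map { '(' . quotemeta($_) . '):\s*' . $config{$_} } sort keys %config;
my $combined = qr/\b(?|$branches)/;

# Return (key, value) for the first match in $line, or an empty list.
sub extract {
    my ($line) = @_;
    return $line =~ $combined ? ($1, $2) : ();
}

my @found = extract('status FOO: bar42 done');
print "@found\n";    # prints "FOO 42"
```

    Because the key is captured in `$1`, the caller can tell which branch matched without one capture group per branch spread across the whole pattern.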

    Suggestions like "Well, don't do that" are likely to go unheard :-)

      OK then, if you want to use a non-optimal solution for operational reasons, go right ahead :-)

Re^2: alternation in regexes: to use or to avoid?
by dk (Chaplain) on Dec 10, 2012 at 16:07 UTC
    To add to balker's response: it's not about what we're trying to achieve; we know other ways to get where we want to go. It's about a principle I've long nourished (see Anastasius's quote above), which now doesn't hold water. What I'd love to see is an explanation, from someone who knows, of why the regex algorithm exhibits behavior that is CONTRARY to Perl lore.
