Re^2: Regex code block executes twice per match using look-arounds

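Just a minor nitpick: Perl's (ir)regular expressions are usually put in the NFA category, which essentially means that some pathological (and not-so-pathological) patterns can run for a very long time. Nested quantifiers such as '(blah.*)+' are the classic example, because a backtracking engine may try a huge number of ways to split the input before it gives up. This is the trade-off for getting more useful features.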
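To make the backtracking visible, here is a minimal sketch that counts how often the inner group of a nested quantifier completes while a match is doomed to fail. The helper name count_steps, the counter $steps, and the test strings are made up for illustration, and the exact counts depend on your perl version and its optimizations, so treat the numbers as indicative only.

```perl
use strict;
use warnings;

our $steps = 0;    # package variable, so the embedded code block sees it reliably

# Hypothetical helper: run a nested-quantifier pattern against a string
# that cannot match and report how often the inner group completed.
sub count_steps {
    my ($n) = @_;
    my $text = ('a' x $n) . 'b';    # the trailing "b" forces overall failure
    $steps = 0;
    # (?{ ... }) runs each time the engine gets past the inner a+
    my $matched = $text =~ /^(?:a+(?{ $steps++ }))+$/ ? 1 : 0;
    return ($matched, $steps);
}

for my $n (4, 8, 12, 16) {
    my ($matched, $count) = count_steps($n);
    printf "n=%2d matched=%d inner group completed %d times\n",
        $n, $matched, $count;
}
```

The match never succeeds, yet the counter keeps climbing as the input gets longer, which is the cost the NFA approach pays on this kind of pattern.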

The Owl book (Mastering Regular Expressions) talks about three categories of engine: DFA, NFA, and POSIX NFA. In an alternation, Perl takes the first alternative that leads to a match, without checking whether a later alternative would give a longer match, which makes it non-POSIX.
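A small sketch of that alternation behaviour, using a made-up test string. A POSIX-style leftmost-longest engine would be expected to capture the longer text in both cases; Perl's result depends on the order of the alternatives.

```perl
use strict;
use warnings;

my $text = 'foobar';

# Perl takes the first alternative that lets the overall match succeed.
my ($first) = $text =~ /(foo|foobar)/;

# Listing the longer alternative first changes what is captured.
my ($longer) = $text =~ /(foobar|foo)/;

print "foo|foobar captured: $first\n";     # foo
print "foobar|foo captured: $longer\n";    # foobar
```

So if you need the longest match in Perl, you have to order the alternatives yourself (or anchor the pattern so that only the longer one can succeed).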

There was an interesting discussion on p5p a few months ago, motivated by the article "regex matching can be simple and fast but...". One conclusion was that a hybrid DFA/NFA scheme, like the one used by gawk (which uses the GNU regex library), could be nice when you want to guarantee a decent running time and are not using features that force the NFA engine to kick in. You'll get the best of all worlds with 5.10, which allows pluggable regex libraries :)

cheers --stephan

Note: the compiled form of a regex is usually quite different in the two schemes. Furthermore, the mathematical equivalence between NFA and DFA, which can be useful for simple models of regexes (few operators), is not used.


