This regex seems to have splattered non-greedy everywhere
by fizbin (Chaplain) on Aug 10, 2005 at 17:30 UTC
fizbin has asked for the wisdom of the Perl Monks concerning the following question:
So I was trying to debug a problem I was having with a regular expression in that other language I use professionally and thought "well, let me see if Perl's regular expression engine behaves the way I think it should on this". So I copied it over and, while Perl doesn't have the majorly unpleasant O(2**n) time behavior I was seeing with java.util.regex, it does misbehave in a bizarre fashion. Or at least, I think it does. I've narrowed the test case down as far as I can while still seeing the strange behavior.
The pared-down code attempts to split incoming lines on XX, but only when XX isn't inside a single-quoted string:
Now here's where it gets weird. The trailing fields are coming out one character at a time. Here's the output:
It's as though perl had decided that my last + sign in the regular expression should be non-greedy despite the fact that it's not followed by a ?.
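For reference, this is the difference one would normally expect between a greedy `+` and a non-greedy `+?` under `/g`. It is only a minimal sketch on a made-up input string (`$text` is an assumption for illustration), not the failing test case itself:

```perl
use strict;
use warnings;

# Made-up input, purely for illustration.
my $text = "abc";

# Greedy: (.+) grabs as much as it can, so a single /g iteration
# consumes the whole string.
my @greedy = $text =~ /(.+)/g;

# Non-greedy: (.+?) is satisfied with one character, so /g has to
# iterate once per character.
my @lazy = $text =~ /(.+?)/g;

print scalar(@greedy), " greedy match(es): @greedy\n";   # 1 greedy match(es): abc
print scalar(@lazy),   " lazy match(es): @lazy\n";       # 3 lazy match(es): a b c
```

The one-character-at-a-time output described above looks like the second case, even though the pattern is written like the first.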
What's going on here? It gets even more bizarre:
If I replace the first half with a pattern that really should be equivalent, this behavior goes away:
To summarize: I have a regular expression match of the form m%(foo)XX|(.+)%g, where foo is a slightly complicated expression with no captures. When I run it, I repeatedly get single-character results in $2. When I replace foo with a different complicated expression that should be equivalent, I suddenly get multiple characters in $2.
I've verified this behavior with Cygwin's perl 5.8.6 and ActiveState's 5.6.1 (build 635).
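To make the shape of that match concrete, here is a minimal sketch of the m%(foo)XX|(.+)%g form. The `$foo` sub-pattern and the sample `$line` are hypothetical stand-ins chosen for illustration; they are not the expressions from the original test case and are not claimed to trigger the behavior described:

```perl
use strict;
use warnings;

# Hypothetical stand-in for "foo": any run of characters that are not
# a quote or an X, whole single-quoted strings, or a lone X that is
# not followed by another X. It contains no capturing groups.
my $foo = qr/(?:[^'X]|'[^']*'|X(?!X))*/;

# Made-up input: the XX inside the quotes should not split the line.
my $line = "aXXb'cXXd'eXXfgh";

my @fields;
while ($line =~ m%($foo)XX|(.+)%g) {
    # $1 is set when a field is followed by XX; $2 picks up whatever
    # is left once no further XX separator can be matched.
    push @fields, defined $1 ? $1 : $2;
}

print "[$_]\n" for @fields;
# Under ordinary greedy semantics this prints:
# [a]
# [b'cXXd'e]
# [fgh]
```

With a greedy `.+` in the second alternative, the trailing field would be expected to arrive in `$2` in one piece, as `fgh` does here, rather than one character per iteration.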