Re: Common Regex Gotchas -- "(:?" by shenme (Priest)
on Sep 29, 2005 at 18:28 UTC
Summary: "(:?" is a quiet guy, but not as well-mannered and quick as that "(?:" fellow.
When extending the regex syntax to include features like zero-width negative look-ahead, the authors tried very hard to use syntax that could not be confused with any existing 'real' regex construct. So they started all the new syntax with '(?', a sequence that previously had no valid meaning: a '?' quantifier with nothing before it to quantify. It turns out that this makes typos a bit too easy, and far too quiet.
I came across the following in a CPAN module:
^(:?(:?\(\d\d\d\))?\s*\d\d)?\d[-.\s]?\d\d\d\d$
What the RE does matters less than two facts: 1) it doesn't work as intended, and 2) it doesn't fail (loudly).
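A minimal sketch, assuming a plain Perl 5 script, showing that both spellings compile without complaint and that the difference only appears at match time. The "intended" pattern is a guess at the writer's intent (every "(:?" swapped for "(?:"), and the sample phone-number strings are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern as found, with the "(:?" typo in both groups...
my $typo     = qr/^(:?(:?\(\d\d\d\))?\s*\d\d)?\d[-.\s]?\d\d\d\d$/;
# ...and as (presumably) intended, using "(?:" clustering groups.
my $intended = qr/^(?:(?:\(\d\d\d\))?\s*\d\d)?\d[-.\s]?\d\d\d\d$/;

# Both qr// lines compile; nothing complains until you look at the results.
for my $input ('(555) 123-4567', ':(555) 123-4567') {
    my $t = $input =~ $typo     ? 'match' : 'no match';
    my $i = $input =~ $intended ? 'match' : 'no match';
    print "'$input'  typo: $t  intended: $i\n";
}

# List-context matching returns the captured substrings.
my @caps = '(555) 123-4567' =~ $typo;
print "typo version captured ", scalar(@caps), " substrings: '@caps'\n";
```

With the typo, the leading ':' in the second string is silently accepted and two unwanted substrings ("(555) 12" and "(555)") are captured; the corrected pattern rejects the colon and captures nothing.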
The writer intended to use "(?:", the clustering (non-capturing) group. It groups a subexpression without capturing the text it matched. For instance, you might want to say that a complex inner match is optional, e.g.
... ( contains \s+ (?:this|that)? \s+ item ) ...
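A small sketch of the difference, adapted from the example above: the "this"/"that" alternation is wrapped in a cluster together with its trailing whitespace, and the sample sentences are invented. Matching in list context returns one element per capture group, so counting the elements shows how many groups capture.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Cluster: (?: ... ) lets "?" make the whole choice optional without
# creating a capture. /x means whitespace inside the pattern is ignored.
my $cluster = qr/ ( contains \s+ (?: (?:this|that) \s+ )? item ) /x;

# Same pattern, but the optional part is a plain capturing group.
my $capture = qr/ ( contains \s+ (   (?:this|that) \s+ )? item ) /x;

for my $text ('box contains this item', 'box contains item') {
    my @c1 = $text =~ $cluster;
    my @c2 = $text =~ $capture;
    print "'$text': cluster gives ", scalar(@c1),
          " capture(s), capturing group gives ", scalar(@c2), "\n";
}
```

Both inputs yield one capture with the cluster and two with the capturing group; for "box contains item" the extra element is undef because that group did not take part in the match.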
But tyops happen. What is the result if you reverse the ':' and '?' characters? Nothing drastic, usually.
In "(:? pattern )", the '?' keeps its original meaning as a quantifier, so the ':' becomes an optionally matched literal character. The parentheses also revert to their original meaning: a capturing group.
So usually the only result is that the regex is a bit slower and captures more substrings (which also renumbers any capture groups that follow it). It will also accept a stray ':' in the input at that spot. If you weren't monitoring how many captures come back from a successful match, you might never notice the typo.
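A minimal sketch with a hypothetical pattern (an optional "ab" prefix followed by digits) showing both side effects: the extra capture, which pushes the digits from $1 to $2, and the stray ':' that the typo version accepts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Intended: optional "ab" prefix (not captured), then the digits in $1.
my $intended = qr/^(?:ab)?(\d+)$/;
# Typo: "(:?" is a capturing group holding an optional ':' followed by "ab".
my $typo     = qr/^(:?ab)?(\d+)$/;

for my $input ('ab42', ':ab42') {
    my @i = $input =~ $intended;    # list context: the captured substrings
    my @t = $input =~ $typo;
    print "$input  intended: ", scalar(@i), " capture(s)  typo: ",
          scalar(@t), " capture(s)\n";
}

# With the typo, $1 now holds the prefix and the digits have moved to $2.
if ('ab42' =~ $typo) {
    print "typo: \$1 = '$1', \$2 = '$2'\n";
}
```

Code that still reads the digits from $1 would now get "ab", which is one of the few ways this typo can surface on its own.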
Note that this typo could occur with any "(?X" construct, where X is a single character. You might notice it right away if your "(#? comment )" caused a syntax error. And you should notice it when your input-matching tests fail on "fore(=?fend)". Otherwise these typos fail silently.
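A sketch of the look-ahead case, assuming the intended construct was "(?=fend)". The zero-width look-ahead matches only "fore" and merely requires "fend" to follow; the typo version consumes and captures "fend", and also accepts an '=' in between. The test strings are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $lookahead = qr/fore(?=fend)/;   # "fore", only if "fend" follows (zero-width)
my $typo      = qr/fore(=?fend)/;   # "fore", optional '=', then "fend", captured

for my $text ('forefend', 'fore=fend') {
    for my $pair ([lookahead => $lookahead], [typo => $typo]) {
        my ($name, $re) = @$pair;
        if ($text =~ $re) {
            # @- and @+ hold the start and end offsets of the whole match.
            my $matched = substr($text, $-[0], $+[0] - $-[0]);
            print "'$text' $name: matched '$matched'\n";
        } else {
            print "'$text' $name: no match\n";
        }
    }
}
```

On "forefend" the look-ahead matches just "fore" while the typo matches all of "forefend"; on "fore=fend" only the typo matches. Neither pattern produces a warning.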