Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
Summary: "(:?" is a quiet guy, but not as well-mannered and quick as that "(?:" fellow.

When extending the regex syntax to include features like zero-width negative look-ahead the authors tried very hard to use syntax that avoided duplicating any 'real' regex code. So they started all the new syntax with '(?'. It turns out that this makes typos a bit too easy, and far too quiet.

I came across the following in a CPAN module:

^(:?(:?\(\d\d\d\))?\s*\d\d)?\d[-.\s]?\d\d\d\d$
It isn't important what the RE does as much as 1) it doesn't work as intended, and 2) it doesn't (loudly) fail

The writer intended to use "(?:", the clustering grouping. This is used when you need to avoid capturing the matched subexpression. For instance you might want to say that a complex inner match is optional, e.g.

... ( contains \s+ (?:this|that)? \s+ item ) ...

But tyops happen. What is the result if you reverse the ':' and '?' characters? Nothing drastic, usually.

In "(:? pattern )" the original meaning of '?' is used - the ':' character becomes an optionally matched character. The parentheses also revert to their original meaning of capturing groups.

So usually the only result is that the regex is a bit slower and captures more substrings. It might also allow a stray ':' input character. If you weren't monitoring how many captures come back from a successful match you might never notice the typo.

But note that this typo could occur with any single character "(?X" syntax. You might notice it right away if your "(#? comment )" caused syntax errors. And you should notice it when your input matching tests fail on "fore(=?fend)". But otherwise these typos will silently fail.

Now this is a minor gotcha. Except that it is found in 15 nodes here, with another node mentioning it in an aside, and another node discovering the typo in a book. I wonder if it is in your code?

perlre - Extended Patterns


In reply to Re: Common Regex Gotchas -- "(:?" by shenme
in thread Common Regex Gotchas by chromatic

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others taking refuge in the Monastery: (8)
    As of 2014-12-27 07:51 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      Is guessing a good strategy for surviving in the IT business?





      Results (176 votes), past polls