PerlMonks  

Capture group in Regex.

by tty1x (Novice)
on May 04, 2013 at 03:16 UTC ( [id://1031978]=perlquestion )

tty1x has asked for the wisdom of the Perl Monks concerning the following question:

$_ = "yabba dabba doo";
if (/y(.)(.)\2\1/) {  # matches 'abba'
    print "It matched after the y!\n";
}
I saw the above code in a Perl book and I am trying to understand it.
From what I understand, \1 matches any character except \n after the character y, and then \2 matches any character except \n after the y and the character that \1 has matched.
That would make 'yab' enough for the regex to be true. But my logic is flawed.
Any guidance is appreciated :)

Replies are listed 'Best First'.
Re: Capture group in Regex.
by Athanasius (Archbishop) on May 04, 2013 at 03:34 UTC

    Within a regex, parentheses make a capture. Captures are numbered 1, 2, ..., in the same order as they occur within the regex, and, within the regex, \1, \2, etc. correspond to the capture groups captured first, second, etc. (Outside the regex, the captures are named $1, $2, etc.)
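A minimal sketch of the numbering described above, using the string from the question. The variable names $text, $first and $second are made up for illustration.

```perl
use strict;
use warnings;

# The sample string from the question.
my $text = "yabba dabba doo";

my ($first, $second);
if ($text =~ /y(.)(.)\2\1/) {
    # Inside the pattern, \2 and \1 refer back to groups 2 and 1.
    # After a successful match, the same captured text is in $2 and $1.
    ($first, $second) = ($1, $2);
    print "group 1 captured '$first', group 2 captured '$second'\n";
}
```

Here group 1 captures the "a" and group 2 captures the "b" of "yabba", so \2\1 then has to match "ba".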

    The regex /y(.)(.)\2\1/ matches any 5-character sequence having the following form: a literal “y”, followed by any two non-newline characters, followed by the same character as was matched in the second capture group, followed by the same character as was matched in the first capture group.

    So, “abba” does not match, because it doesn’t contain a literal “y”. But “yabba” matches, as do “yukku”, “ywzzw”, “y1@@1”, etc.
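The examples above can be checked with a short script. This is only a sketch: matches_pattern is a hypothetical helper name, and "yabab" is an extra non-matching string added for contrast.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when the string contains a "y" followed
# by two characters and then those same two characters in reverse order.
sub matches_pattern {
    my ($string) = @_;
    return $string =~ /y(.)(.)\2\1/ ? 1 : 0;
}

# qw() does not interpolate, so the @ signs are taken literally.
for my $candidate (qw(abba yabba yukku ywzzw y1@@1 yabab)) {
    my $verdict = matches_pattern($candidate) ? "matches" : "does not match";
    printf "%-6s %s\n", $candidate, $verdict;
}
```

"abba" fails because there is no "y", and "yabab" fails because the fourth character would have to repeat the "b" captured by the second group.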

    See the section “Capture groups” in perlre#Regular-Expressions.

    Hope that helps,

    Athanasius <°(((>< contra mundum
    Iustus alius egestas vitae, eros Piratica,

      Thanks for the clarification :)
Re: Capture group in Regex.
by ww (Archbishop) on May 04, 2013 at 11:32 UTC
    "From what I understand, \1 matches any character except \n after ...."

    Athanasius's response is a good one but, IMO, invites an alternate way to address a key element of your misconception. You can think of \1 and $1 as containers used to (temporarily) retain whatever the wildcards (the dots inside the parentheses) matched.

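    But \1 and $1 do NOT match "any character" themselves: \1 matches only a repeat of whatever text the first group already captured, and $1 simply holds that text once the match has succeeded.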

    Explaining "temporarily" is beyond the scope of this thread; just take care to assign the value in $1 (or any other of its ilk) to an ordinary, named variable (without $1's idiosyncrasies) as promptly as possible.
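A small sketch of why the prompt copy matters, showing just one of those idiosyncrasies: the next successful match overwrites $1. The names $saved and $now are made up for illustration.

```perl
use strict;
use warnings;

my $text = "yabba dabba doo";

# First match: group 1 captures the 'a' that follows the 'y'.
$text =~ /y(.)(.)\2\1/ or die "first match failed";
my $saved = $1;    # copy into an ordinary variable straight away

# A later successful match replaces the contents of $1 ...
$text =~ /(o+)/ or die "second match failed";
my $now = $1;

# ... but the named copy still holds the earlier capture.
print "saved copy: $saved, \$1 now: $now\n";
```

After the second match $1 holds "oo", while $saved still holds "a".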


    If you didn't program your executable by toggling in binary, it wasn't really programming!
