PerlMonks  

Regex from Learning Perl

by Anonymous Monk
on Feb 17, 2012 at 20:09 UTC ( [id://954611]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm reading through Learning Perl and I'm stuck looking at one example over and over. Can someone tell me how this regex works?

    $_ = "yabba dabba doo";
    if (/y(.)(.)\2\1/) {  # matches 'abba'
        print "It matched a after the y";
    }

It looks like the first (.)\2 would match ab, but the next (.)\1 would only match bb? I know I'm reading it incorrectly; I'm just begging for a plain explanation of what's going on there. Thanks!

Replies are listed 'Best First'.
Re: Regex from Learning Perl
by CountZero (Bishop) on Feb 17, 2012 at 20:17 UTC
    It matches:
    • a literal "y"
    • any single character other than a newline (captured as group 1, available as \1); here it matches an 'a'
    • any single character other than a newline (captured as group 2, available as \2); here it matches a 'b'
    • whatever group 2 captured (\2), i.e. a 'b'
    • whatever group 1 captured (\1), i.e. an 'a'
    The match already succeeds on "yabba", so the rest of the string is never examined.
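
The steps above can be checked with a short script. This is only a minimal sketch: `describe_match` is a hypothetical helper name that does not appear in the thread, and the script simply reports the overall match together with the contents of the two capture groups.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns the full match and
# both captured characters, or an empty list if the pattern fails.
sub describe_match {
    my ($text) = @_;
    if ($text =~ /y(.)(.)\2\1/) {
        # $& is the whole matched substring; $1 and $2 hold what the
        # first and second pair of parentheses captured.
        return ($&, $1, $2);
    }
    return;
}

my ($whole, $first, $second) = describe_match("yabba dabba doo");
print "whole match: $whole\n";    # yabba
print "\\1 holds: $first\n";      # a
print "\\2 holds: $second\n";     # b
```

Groups are numbered by their opening parenthesis, so \2 and \1 refer back to the second and first (.) no matter which group they happen to sit next to in the pattern.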

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics
      Awesome. Thanks!
Re: Regex from Learning Perl
by toolic (Bishop) on Feb 17, 2012 at 20:21 UTC
    Basic debugging checklist, Tip #9: Demystify regular expressions by installing and using the CPAN module YAPE::Regex::Explain
    The regular expression:

    (?-imsx:y(.)(.)\2\1)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      y                        'y'
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        .                        any character except \n
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      (                        group and capture to \2:
    ----------------------------------------------------------------------
        .                        any character except \n
    ----------------------------------------------------------------------
      )                        end of \2
    ----------------------------------------------------------------------
      \2                       what was matched by capture \2
    ----------------------------------------------------------------------
      \1                       what was matched by capture \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
Re: Regex from Learning Perl
by Anonymous Monk on Feb 17, 2012 at 20:21 UTC

    First (.) matches "a" and stores it in $1

    Second (.) matches "b" and stores it in $2

    \2 matches the value of $2, so it matches "b"

    \1 matches the value of $1, so it matches "a"

    use re 'debug';

    $ perl -Mre=debug -le " q/yabba dabba doo/ =~ /y(.)(.)\2\1/ "
    Compiling REx "y(.)(.)\2\1"
    Final program:
       1: EXACT <y> (3)
       3: OPEN1 (5)
       5: REG_ANY (6)
       6: CLOSE1 (8)
       8: OPEN2 (10)
      10: REG_ANY (11)
      11: CLOSE2 (13)
      13: REF2 (15)
      15: REF1 (17)
      17: END (0)
    anchored "y" at 0 (checking anchored) minlen 3
    Guessing start of match in sv for REx "y(.)(.)\2\1" against "yabba dabba doo"
    Found anchored substr "y" at offset 0...
    Guessed: match at offset 0
    Matching REx "y(.)(.)\2\1" against "yabba dabba doo"
       0 <> <yabba dabb>         |  1:EXACT <y>(3)
       1 <y> <abba dabba>        |  3:OPEN1(5)
       1 <y> <abba dabba>        |  5:REG_ANY(6)
       2 <ya> <bba dabba >       |  6:CLOSE1(8)
       2 <ya> <bba dabba >       |  8:OPEN2(10)
       2 <ya> <bba dabba >       | 10:REG_ANY(11)
       3 <yab> <ba dabba d>      | 11:CLOSE2(13)
       3 <yab> <ba dabba d>      | 13:REF2(15)
       4 <yabb> <a dabba do>     | 15:REF1(17)
       5 <yabba> < dabba doo>    | 17:END(0)
    Match successful!
    Freeing REx: "y(.)(.)\2\1"

    YAPE::Regex::Explain

    use YAPE::Regex::Explain;
    print YAPE::Regex::Explain->new( qr/y(.)(.)\2\1/ )->explain;
    __END__
    The regular expression:

    (?-imsx:y(.)(.)\2\1)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      y                        'y'
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        .                        any character except \n
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      (                        group and capture to \2:
    ----------------------------------------------------------------------
        .                        any character except \n
    ----------------------------------------------------------------------
      )                        end of \2
    ----------------------------------------------------------------------
      \2                       what was matched by capture \2
    ----------------------------------------------------------------------
      \1                       what was matched by capture \1
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
      $ perl -Mre=debug -le " @F = q/yabba yobbo yebbe yubbu/ =~ /y(.)(.)\2\1/g "
      Compiling REx "y(.)(.)\2\1"
      Final program:
         1: EXACT <y> (3)
         3: OPEN1 (5)
         5: REG_ANY (6)
         6: CLOSE1 (8)
         8: OPEN2 (10)
        10: REG_ANY (11)
        11: CLOSE2 (13)
        13: REF2 (15)
        15: REF1 (17)
        17: END (0)
      anchored "y" at 0 (checking anchored) minlen 3
      Guessing start of match in sv for REx "y(.)(.)\2\1" against "yabba yobbo yebbe yubbu"
      Found anchored substr "y" at offset 0...
      Guessed: match at offset 0
      Matching REx "y(.)(.)\2\1" against "yabba yobbo yebbe yubbu"
         0 <> <yabba yobb>         |  1:EXACT <y>(3)
         1 <y> <abba yobbo>        |  3:OPEN1(5)
         1 <y> <abba yobbo>        |  5:REG_ANY(6)
         2 <ya> <bba yobbo >       |  6:CLOSE1(8)
         2 <ya> <bba yobbo >       |  8:OPEN2(10)
         2 <ya> <bba yobbo >       | 10:REG_ANY(11)
         3 <yab> <ba yobbo y>      | 11:CLOSE2(13)
         3 <yab> <ba yobbo y>      | 13:REF2(15)
         4 <yabb> <a yobbo ye>     | 15:REF1(17)
         5 <yabba> < yobbo yeb>    | 17:END(0)
      Match successful!
      Guessing start of match in sv for REx "y(.)(.)\2\1" against " yobbo yebbe yubbu"
      Found anchored substr "y" at offset 1...
      Starting position does not contradict /^/m...
      Guessed: match at offset 1
      Matching REx "y(.)(.)\2\1" against "yobbo yebbe yubbu"
         6 <abba > <yobbo yebb>    |  1:EXACT <y>(3)
         7 <bba y> <obbo yebbe>    |  3:OPEN1(5)
         7 <bba y> <obbo yebbe>    |  5:REG_ANY(6)
         8 <ba yo> <bbo yebbe >    |  6:CLOSE1(8)
         8 <ba yo> <bbo yebbe >    |  8:OPEN2(10)
         8 <ba yo> <bbo yebbe >    | 10:REG_ANY(11)
         9 <a yob> <bo yebbe y>    | 11:CLOSE2(13)
         9 <a yob> <bo yebbe y>    | 13:REF2(15)
        10 < yobb> <o yebbe yu>    | 15:REF1(17)
        11 <yobbo> < yebbe yub>    | 17:END(0)
      Match successful!
      Guessing start of match in sv for REx "y(.)(.)\2\1" against " yebbe yubbu"
      Found anchored substr "y" at offset 1...
      Starting position does not contradict /^/m...
      Guessed: match at offset 1
      Matching REx "y(.)(.)\2\1" against "yebbe yubbu"
        12 <obbo > <yebbe yubb>    |  1:EXACT <y>(3)
        13 <bbo y> <ebbe yubbu>    |  3:OPEN1(5)
        13 <bbo y> <ebbe yubbu>    |  5:REG_ANY(6)
        14 <bo ye> <bbe yubbu>     |  6:CLOSE1(8)
        14 <bo ye> <bbe yubbu>     |  8:OPEN2(10)
        14 <bo ye> <bbe yubbu>     | 10:REG_ANY(11)
        15 <o yeb> <be yubbu>      | 11:CLOSE2(13)
        15 <o yeb> <be yubbu>      | 13:REF2(15)
        16 < yebb> <e yubbu>       | 15:REF1(17)
        17 < yebbe> < yubbu>       | 17:END(0)
      Match successful!
      Guessing start of match in sv for REx "y(.)(.)\2\1" against " yubbu"
      Found anchored substr "y" at offset 1...
      Starting position does not contradict /^/m...
      Guessed: match at offset 1
      Matching REx "y(.)(.)\2\1" against "yubbu"
        18 < yebbe > <yubbu>       |  1:EXACT <y>(3)
        19 < yebbe y> <ubbu>       |  3:OPEN1(5)
        19 < yebbe y> <ubbu>       |  5:REG_ANY(6)
        20 < yebbe yu> <bbu>       |  6:CLOSE1(8)
        20 < yebbe yu> <bbu>       |  8:OPEN2(10)
        20 < yebbe yu> <bbu>       | 10:REG_ANY(11)
        21 < yebbe yub> <bu>       | 11:CLOSE2(13)
        21 < yebbe yub> <bu>       | 13:REF2(15)
        22 < yebbe yubb> <u>       | 15:REF1(17)
        23 < yebbe yubbu> <>       | 17:END(0)
      Match successful!
      Freeing REx: "y(.)(.)\2\1"
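
The /g run above assigns its result to @F, but the trace never shows what ends up in that array. The following is a rough sketch of how one might inspect it; `mirror_captures` is a hypothetical helper name, not something from the thread. In list context, a /g match on a pattern with capture groups returns the captures of every match as one flat list, so the helper regroups them into pairs.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one [$1, $2] pair
# for every place in $text where /y(.)(.)\2\1/ matches.
sub mirror_captures {
    my ($text) = @_;

    # List context plus /g yields $1, $2 of match 1, then $1, $2 of
    # match 2, and so on, all in a single flat list.
    my @flat = $text =~ /y(.)(.)\2\1/g;

    # Regroup the flat list two items at a time.
    my @pairs;
    while (my ($outer, $inner) = splice(@flat, 0, 2)) {
        push @pairs, [$outer, $inner];
    }
    return @pairs;
}

for my $pair (mirror_captures("yabba yobbo yebbe yubbu")) {
    my ($outer, $inner) = @$pair;
    print "\\1=$outer \\2=$inner\n";
}
# Prints:
# \1=a \2=b
# \1=o \2=b
# \1=e \2=b
# \1=u \2=b
```

Each of the four "Match successful!" lines in the trace corresponds to one pair here.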

Node Type: perlquestion [id://954611]
Approved by toolic