
Re^5: Regex's, parentheses, and the mysterious ( ??{ } ) operator

by Clovis_Sangrail (Beadle)
on Jul 12, 2013 at 16:31 UTC (#1044017)


in reply to Re^4: Regex's, parentheses, and the mysterious ( ??{ } ) operator
in thread Regex's, parentheses, and the mysterious ( ??{ } ) operator

If I delete the "use strict;" the program compiles and runs, but it still does not match the 2nd regex. I also tried making a duplicate of $paren called $par2 and using it instead of $paren to set $2. That doesn't work either; $2 is still blank.


Re^6: Regex's, parentheses, and the mysterious ( ??{ } ) operator
by AnomalousMonk (Abbot) on Jul 12, 2013 at 18:42 UTC
    ... $2 is still blank.

    You have to properly account for all capturing groups in the overall regex, properly counting capture groups in any interpolated regex. If you have a capturing group within the interpolated recursive regex (not, IMHO, necessary), then you want to access the 3rd capture group variable. However, a capture group in the recursive regex screws up the more general
        @p = $s =~ m{ $r4 }xmsg;
    extraction regex. See examples below.

    >perl -wMstrict -le
    "my $s = 'x(y) (a(b)) ()() q (a(b)c()(d(e(f)g))h) q';
     ;;
     our $r3 = qr{ \( (?: [^()]+ | (??{ our $r3 }) )* \) }xms;
     ;;
     my @p = $s =~ m{ $r3 }xmsg;
     print qq{'$_'} for @p;
     print '--------';
     ;;
     $s =~ m{ ($r3) [^()]* ($r3) }xms;
     print qq{1 '$1' 2 '$2'};
     print '--------';
     ;;
     our $r4 = qr{ \( ( [^()]* | (??{ our $r4 }) )* \) }xms;
     ;;
     @p = $s =~ m{ $r4 }xmsg;
     print qq{'$_'} for @p;
     print '--------';
     ;;
     $s =~ m{ ($r4) [^()]* ($r4) }xms;
     print qq{1 '$1' 3 '$3'};
     print '--------';
    "
    '(y)'
    '(a(b))'
    '()'
    '()'
    '(a(b)c()(d(e(f)g))h)'
    --------
    1 '(y)' 2 '(a(b))'
    --------
    ''
    ''
    ''
    ''
    ''
    --------
    1 '(y)' 3 '(a(b))'
    --------
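
    If the capturing group inside the recursive regex has to stay, one possible workaround for the extraction case (a sketch added here, not part of the original post) is to wrap the pattern in an outer group and collect only $1 in a scalar-context loop. The array name @p follows the example above; everything else is an assumption for illustration.

```perl
use strict;
use warnings;

my $s = 'x(y) (a(b)) ()() q (a(b)c()(d(e(f)g))h) q';

# Same shape as $r4 above: the inner group is a capturing one.
our $r4 = qr{ \( ( [^()]* | (??{ our $r4 }) )* \) }xms;

# Wrap the whole pattern in its own group and read only $1 on each
# iteration, so whatever the inner group (now $2) captured is ignored.
my @p;
while ($s =~ m{ ($r4) }xmsg) {
    push @p, $1;
}

print qq{'$_'\n} for @p;
```

    In list context, m//g returns the contents of the capture groups whenever the pattern has any, which is why the $r4 version printed empty strings; the loop sidesteps that by picking the group explicitly.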

    Have you tried the very neat "(?PARNO)" operator available with Perl 5.10+ and discussed in the example referred to here?
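
    For readers who have not seen it, here is a minimal sketch of what a "(?PARNO)" version might look like (added for illustration, not taken from the linked example). The variable names $balanced and @groups are made up; (?1) means "recurse into capture group 1".

```perl
use strict;
use warnings;

my $s = 'x(y) (a(b)) ()() q (a(b)c()(d(e(f)g))h) q';

# (?1) recurses into capture group 1, i.e. the whole balanced group,
# so no package variable or (??{ }) block is needed.
my $balanced = qr{ ( \( (?: [^()]+ | (?1) )* \) ) }xms;

my @groups;
while ($s =~ m{$balanced}g) {
    push @groups, $1;
}

print qq{'$_'\n} for @groups;
```

    Note that (?1) is an absolute group number, so this sketch assumes $balanced is used on its own. If it is interpolated into a larger regex after other capture groups, the number would point at the wrong group; perlre describes relative forms such as (?-1) and named recursion with (?&name) for that situation.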

      "You have to properly account for all capturing groups in the overall regex..."

      I did not yet work through your example program or explore the references you gave me, but your sentence above jolted my brain into some semblance of activity. Yes, it makes a lot of sense that every time the recursive regex satisfies another capture group, it uses yet another capture variable. So I printed out a bunch of them:

      $ cat p7.pl
      #!/opt/perl5.16/bin/perl
      use strict;
      use warnings;

      our $paren = qr/          # Need declared variable with use strict.
              \(
              (
                  [^()]+        # Not parens
              |
                  (??{our $paren})  # Another balanced group (not interpolated yet)
              )*
              \)
              /x;               # 'x' means ignore whitespace, comments.

      my $stuff = "On the outside now then (we go( in( and in (&stop)(awhile) ( further ))) but still (here) ) and now ((for a while)) we are out again.";

      $stuff =~ /($paren)[^()]*($paren)/;

      print "-original-\n";
      print "$stuff\n";

      print "1---------\n";
      print 'X' . $1 . 'X' . "\n";
      print "2---------\n";
      print 'X' . $2 . 'X' . "\n";
      print "3---------\n";
      print 'X' . $3 . 'X' . "\n";
      print "4---------\n";
      print 'X' . $4 . 'X' . "\n";
      print "5---------\n";
      print 'X' . $5 . 'X' . "\n";
      print "6---------\n";
      print 'X' . $6 . 'X' . "\n";
      print "----------\n";
      $ ./p7.pl
      -original-
      On the outside now then (we go( in( and in (&stop)(awhile) ( further ))) but still (here) ) and now ((for a while)) we are out again.
      1---------
      X(we go( in( and in (&stop)(awhile) ( further ))) but still (here) )X
      2---------
      X X
      3---------
      X((for a while))X
      4---------
      X(for a while)X
      5---------
      Use of uninitialized value $5 in concatenation (.) or string at ./p7.pl line 31.
      XX
      6---------
      Use of uninitialized value $6 in concatenation (.) or string at ./p7.pl line 33.
      XX
      ----------
      $

      So the recursive regex evaluation set $1 through $4! Thanks, this makes more sense now.

        ... every time the recursive regex satisfies another capture group, it uses yet another capture variable.

        Old-style, numbered capture groups are counted according to the literal order in which their opening parentheses appear in the final, compiled regex; they are not created at run-time and do not depend on the evaluation, recursive or otherwise, of any regex sub-expression at run-time. (Of course, the actual capturing happens at run-time!) In the example below, the final regex $ry (after full interpolation) is printed and the capture groups marked and counted (at least, that's what I tried to do!). Sorry for any wrap-around, which may make this difficult to read. Look for examples of capture group counting in perlre and perlretut.

        >perl -wMstrict -le
        "my $s = 'x(y) (a(b)) ()() q (a(b)c()(d(e(f)g))h) q';
         ;;
         our $rx = qr{ \( ([^()]* | (??{ our $rx }))* \) }xms;
         ;;
         my $ry = qr{ ($rx) [^()]* ($rx) }xms;
         ;;
         $s =~ $ry;
         print qq{1st '$1' 3rd '$3'};
         ;;
         print qq{\nfinal regex:};
         print $ry;
        "
        1st '(y)' 3rd '(a(b))'

        final regex:
        (?^msx: ((?^msx: \( ([^()]* | (??{ our $rx }))* \) )) [^()]* ((?^msx: \( ([^()]* | (??{ our $rx }))* \) )) )

        counted capture groups:
        (?^msx: ((?^msx: \( ([^()]* | (??{ our $rx }))* \) )) [^()]* ((?^msx: \( ([^()]* | (??{ our $rx }))* \) )) )
                |           |                        |      |        |           |                        |      |
                1st begin   2nd begin                2-end  1st end  3rd begin   4th begin                4-end  3rd end
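
        One way to avoid counting altogether (a sketch added for illustration, assuming Perl 5.10+ named captures; the group names first and second are made up) is to name the outer groups and read them from %+. The inner groups in $rx still get numbers, but nothing depends on them.

```perl
use strict;
use warnings;

my $s = 'x(y) (a(b)) ()() q (a(b)c()(d(e(f)g))h) q';

# Capturing group inside the recursive pattern, as in $rx above.
our $rx = qr{ \( ( [^()]* | (??{ our $rx }) )* \) }xms;

# Named groups are looked up by name via %+, so the numbered
# groups that $rx contributes no longer need to be counted.
my ($first, $second);
if ($s =~ m{ (?<first> $rx ) [^()]* (?<second> $rx ) }xms) {
    ($first, $second) = ($+{first}, $+{second});
    print qq{first '$first' second '$second'\n};
}
```

        The named groups are still numbered like any others ($1 and $3 here), so the counting rule described above is unchanged; the names just spare you from tracking it.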
