
Re^4: Regex's, parentheses, and the mysterious ( ??{ } ) operator

by Clovis_Sangrail (Beadle)
on Jul 12, 2013 at 16:21 UTC (#1044015=note)


in reply to Re^3: Regex's, parentheses, and the mysterious ( ??{ } ) operator
in thread Regex's, parentheses, and the mysterious ( ??{ } ) operator

Ahh! I have deciphered the meaning of your koan. I get a compilation error when I take the "our" out of the middle of the recursive regex:

    $ vi p6.pl
    "p6.pl" 36 lines, 925 characters
    #!/opt/perl5.16/bin/perl
    use strict;
    use warnings;

    our $paren = qr/          # Need declared variable with use strict.
        \(
        (
            [^()]+            # Not parens
          |
            (??{ $paren })    # Another balanced group (not interpolated yet)
        )*
        \)
        /x;                   # 'x' means ignore whitespace, comments.

    my $stuff = "On the outside now then (we go( in( and in (&stop)(awhile) ( furthe r ))) but still (here) ) and now (for a while) we are out again.";

    $stuff =~ /($paren)[^()]*($paren)/;

    print "----------\n";
    print "$stuff\n";
    print "----------\n";
    "p6.pl" 27 lines, 650 characters
    $ ./p6.pl
    Variable "$paren" is not imported at (re_eval 1) line 2.
    Global symbol "$paren" requires explicit package name at (re_eval 1) line 2.
    Compilation failed in regexp at ./p6.pl line 14.
    $
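For anyone hitting the same error: as far as I can tell, the `our $paren = ...` declaration only takes effect from the next statement onward, so the code inside `(??{ })` is compiled under strict without seeing a declared `$paren`. Repeating `our` inside the block gives it a declaration of its own. A minimal sketch of that fix follows; the sample text and the name `$text` are made up for illustration, and I used a non-capturing `(?: )` group here, which differs from the p6.pl pattern.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Redeclaring the package variable inside the deferred block keeps
# "use strict" satisfied when that block is compiled.
our $paren = qr/
    \(
    (?:
        [^()]+              # run of non-parenthesis characters
      |
        (??{ our $paren })  # a nested balanced group, resolved at match time
    )*
    \)
/x;

my $text = 'before (a (b) c) after';   # hypothetical sample text
my ($group) = $text =~ /($paren)/;     # list context returns $1
print "$group\n";
```

With this input the match should be the outermost balanced group, `(a (b) c)`.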


Re^5: Regex's, parentheses, and the mysterious ( ??{ } ) operator
by Clovis_Sangrail (Beadle) on Jul 12, 2013 at 16:31 UTC

    If I delete the "use strict;" line, the program compiles and runs, but the second $paren still does not set $2. I also tried making a duplicate of $paren called $par2 and using it instead of the second $paren to set $2. That doesn't work either: $2 is still blank.

      ... $2 is still blank.

      You have to properly account for all capturing groups in the overall regex, properly counting capture groups in any interpolated regex. If you have a capturing group within the interpolated recursive regex (not, IMHO, necessary), then you want to access the 3rd capture group variable. However, a capture group in the recursive regex screws up the more general
          @p = $s =~ m{ $r4 }xmsg;
      extraction regex. See examples below.

          >perl -wMstrict -le
          "my $s = 'x(y) (a(b)) ()() q (a(b)c()(d(e(f)g))h) q';
           ;;
           our $r3 = qr{ \( (?: [^()]+ | (??{ our $r3 }) )* \) }xms;
           ;;
           my @p = $s =~ m{ $r3 }xmsg;
           print qq{'$_'} for @p;
           print '--------';
           ;;
           $s =~ m{ ($r3) [^()]* ($r3) }xms;
           print qq{1 '$1' 2 '$2'};
           print '--------';
           ;;
           our $r4 = qr{ \( ( [^()]* | (??{ our $r4 }) )* \) }xms;
           ;;
           @p = $s =~ m{ $r4 }xmsg;
           print qq{'$_'} for @p;
           print '--------';
           ;;
           $s =~ m{ ($r4) [^()]* ($r4) }xms;
           print qq{1 '$1' 3 '$3'};
           print '--------';
          "
          '(y)'
          '(a(b))'
          '()'
          '()'
          '(a(b)c()(d(e(f)g))h)'
          --------
          1 '(y)' 2 '(a(b))'
          --------
          ''
          ''
          ''
          ''
          ''
          --------
          1 '(y)' 3 '(a(b))'
          --------
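The run of empty strings from the `$r4` extraction comes from how `m//g` behaves in list context: with no capture groups it returns each whole match, but once the pattern contains a capture group it returns the captured text instead. A small sketch of just that behavior, using a made-up string and patterns unrelated to the balanced-parenthesis regexes:

```perl
use strict;
use warnings;

my $text = 'a1 b2 c3';   # hypothetical sample text

# No capture groups: each list element is a whole match.
my @whole = $text =~ /[a-z]\d/g;

# One capture group: each list element is that group's text instead.
my @digits = $text =~ /[a-z](\d)/g;

print "@whole\n";
print "@digits\n";
```

This prints `a1 b2 c3` and then `1 2 3`. In the `$r4` case each element is the last thing the inner group captured, which in the output above happens to be empty, rather than the whole balanced span.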

      Have you tried the very neat "(?PARNO)" operator available with Perl 5.10+ and discussed in the example referred to here?
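For reference, here is a sketch of what the recursion could look like with the relative form `(?-1)`, modeled on the balanced-parentheses example in perlre rather than on anything posted in this thread. The name `$balanced` is my own, and the possessive quantifiers (`++`, `*+`) are an optional choice to cut down on backtracking. It needs Perl 5.10 or later.

```perl
use strict;
use warnings;

my $balanced = qr{
    (                   # the group that (?-1) recurses into
        \(
        (?:
            [^()]++     # non-parenthesis characters, no backtracking
          |
            (?-1)       # recurse into the enclosing capture group
        )*+
        \)
    )
}x;

my $s = 'x(y) (a(b)) ()() q (a(b)c()(d(e(f)g))h) q';

# One capture group per match, so list-context m//g returns each group.
my @groups = $s =~ m{$balanced}g;
print "'$_'\n" for @groups;

# Two interpolated copies: the relative (?-1) follows its own copy.
my ($first, $second) = $s =~ m{ $balanced [^()]* $balanced }x;
print "1 '$first' 2 '$second'\n";
```

Because the recursion is relative, each interpolated copy contributes exactly one capture group, so the two balanced spans land in $1 and $2. No `our` variable or `(??{ })` block is needed.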

        "You have to properly account for all capturing groups in the overall regex..."

        I did not yet work through your example program or explore the references you gave me, but your sentence above jolted my brain into some semblance of activity. Yes, it makes a lot of sense that every time the recursive regex satisfies another capture group, it uses yet another capture variable. So I printed out a bunch of them:

            $ cat p7.pl
            #!/opt/perl5.16/bin/perl
            use strict;
            use warnings;

            our $paren = qr/          # Need declared variable with use strict.
                \(
                (
                    [^()]+            # Not parens
                  |
                    (??{our $paren})  # Another balanced group (not interpolated yet)
                )*
                \)
                /x;                   # 'x' means ignore whitespace, comments.

            my $stuff = "On the outside now then (we go( in( and in (&stop)(awhile) ( further ))) but still (here) ) and now ((for a while)) we are out again.";

            $stuff =~ /($paren)[^()]*($paren)/;

            print "-original-\n";
            print "$stuff\n";
            print "1---------\n";
            print 'X' . $1 . 'X' . "\n";
            print "2---------\n";
            print 'X' . $2 . 'X' . "\n";
            print "3---------\n";
            print 'X' . $3 . 'X' . "\n";
            print "4---------\n";
            print 'X' . $4 . 'X' . "\n";
            print "5---------\n";
            print 'X' . $5 . 'X' . "\n";
            print "6---------\n";
            print 'X' . $6 . 'X' . "\n";
            print "----------\n";
            $ ./p7.pl
            -original-
            On the outside now then (we go( in( and in (&stop)(awhile) ( further ))) but still (here) ) and now ((for a while)) we are out again.
            1---------
            X(we go( in( and in (&stop)(awhile) ( further ))) but still (here) )X
            2---------
            X X
            3---------
            X((for a while))X
            4---------
            X(for a while)X
            5---------
            Use of uninitialized value $5 in concatenation (.) or string at ./p7.pl line 31.
            XX
            6---------
            Use of uninitialized value $6 in concatenation (.) or string at ./p7.pl line 33.
            XX
            ----------
            $

        So the recursive regex evaluation set $1 through $4! Thanks, this makes more sense now.
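One way to double-check the numbering is to ask Perl how many groups the last successful match had, via `$#+`. The sketch below uses a shortened, made-up `$stuff` and otherwise the same pattern as p7.pl. As far as I can tell, the count comes from the parentheses written in the overall pattern (the outer pair plus the one inside each interpolated `$paren`), and does not grow with nesting depth.

```perl
use strict;
use warnings;

our $paren = qr/
    \(
    (                       # capture group inside the recursive pattern
        [^()]+
      |
        (??{ our $paren })
    )*
    \)
/x;

# Hypothetical, shortened sample text.
my $stuff = 'and now ((for a while)) and (again) we are out';

# List context without the g modifier returns all capture groups in order.
my @captures = $stuff =~ /($paren)[^()]*($paren)/;

# $#+ is the number of groups in the last successful match.
my $group_count = $#+;

printf "groups in pattern: %d\n", $group_count;
printf "%d: %s\n", $_ + 1, $captures[$_] for 0 .. $#captures;
```

Here groups 1 and 3 hold the two balanced spans, while 2 and 4 hold whatever the inner group captured last, which matches the $1 through $4 pattern in the p7.pl output.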
