http://www.perlmonks.org?node_id=1043806

Clovis_Sangrail has asked for the wisdom of the Perl Monks concerning the following question:

The excellent Simon Cozens tutorial at

http://www.perl.com/pub/2003/06/06/regexps.html

shows how to use regexes to isolate parenthesised text within a string, with handling of nested and multiple parentheses. Cozens constructs a recursive regex and uses the (??{ $regex_variable }) construct to defer interpolation of the variable until execution. It works for me:

$ cat p2.pl
#!/opt/perl5.16/bin/perl
use strict;
use warnings;

our $paren = qr/            # Need declared variable with use strict.
    \(
    (
        [^()]+              # Not parens
      | (??{ our $paren })  # Another balanced group (not interpolated yet)
    )*
    \)
/x;                         # 'x' means ignore whitespace, comments.

my $stuff = "On the outside now then (we go( in( and in(&stop)(awhile) ( further ))) but still (here) ) and now (for a while) we are out again.";

$stuff =~ /($paren)/;

print "----------\n";
print "$stuff\n";
print "----------\n";
print $1 . "\n";
print "----------\n";
$ ./p2.pl
----------
On the outside now then (we go( in( and in (&stop)(awhile) ( further ))) but still (here) ) and now (for a while) we are out again.
----------
(we go( in( and in (&stop)(awhile) ( further ))) but still (here) )
----------
$

But if I try to use the regex a second time, it does not work as I think it ought to. In the above code, when I replace the matching regex with /($paren)[^()]*($paren)/, I expect $2 to be "(for a while)", but that is not what happens:

$ cat p3.pl
#!/opt/perl5.16/bin/perl
use strict;
use warnings;

our $paren = qr/            # Need declared variable with use strict.
    \(
    (
        [^()]+              # Not parens
      | (??{ our $paren })  # Another balanced group (not interpolated yet)
    )*
    \)
/x;                         # 'x' means ignore whitespace, comments.

my $stuff = "On the outside now then (we go( in( and in (&stop)(awhile) ( further ))) but still (here) ) and now (for a while) we are out again.";

$stuff =~ /($paren)[^()]*($paren)/;

print "----------\n";
print "$stuff\n";
print "----------\n";
print $1 . "\n";
print "----------\n";
print $2 . "\n";
print "----------\n";
$ ./p3.pl
----------
On the outside now then (we go( in( and in (&stop)(awhile) ( further ))) but still (here) ) and now (for a while) we are out again.
----------
(we go( in( and in (&stop)(awhile) ( further ))) but still (here) )
----------

----------
$

$2 is empty. Does anyone know what is going on here? If I take out the [^()]+ portion of the regex, then the program hangs when I run it.
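
A possible explanation, offered as a sketch rather than a definitive answer: capture groups are numbered by their opening parenthesis across the whole combined pattern, and the inner ( ... ) inside $paren is itself a capturing group. In /($paren)[^()]*($paren)/ that would make $2 the inner group of the first $paren and $3 the second balanced string. The script below prints every group for the pattern from the post and for a hypothetical variant, $paren_nc (a name that is not in the original post), whose inner group is made non-capturing with (?: ... ).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern from the post: the inner ( ... ) is a capturing group.
our $paren;
$paren = qr/
    \(
    (
        [^()]+              # not parens
      | (??{ $paren })      # another balanced group, evaluated when reached
    )*
    \)
/x;

# Hypothetical variant for comparison (not in the original post):
# the inner group is non-capturing, so it takes no group number.
our $paren_nc;
$paren_nc = qr/
    \(
    (?:
        [^()]+
      | (??{ $paren_nc })
    )*
    \)
/x;

my $stuff = "On the outside now then (we go( in( and in (&stop)(awhile) ( further ))) but still (here) ) and now (for a while) we are out again.";

# In list context a successful match returns ($1, $2, ...).
my @with_capture = $stuff =~ /($paren)[^()]*($paren)/;
my @without      = $stuff =~ /($paren_nc)[^()]*($paren_nc)/;

print "capturing inner group:\n";
printf "  \$%d = [%s]\n", $_ + 1, $with_capture[$_] // 'undef'
    for 0 .. $#with_capture;

print "non-capturing inner group:\n";
printf "  \$%d = [%s]\n", $_ + 1, $without[$_] // 'undef'
    for 0 .. $#without;
```

If this reading is right, $2 in p3.pl is not really empty: it holds whatever the inner group matched on its last repetition (apparently just the space before the closing parenthesis), and "(for a while)" ends up in $3. With the (?: ... ) variant it lands in $2 as expected.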

Replies are listed 'Best First'.
Re: Regex's, parentheses, and the mysterious ( ??{ } ) operator
by Clovis_Sangrail (Beadle) on Jul 11, 2013 at 21:13 UTC

    Fortunately it looks like this regex still sets $`, $&, and $' properly, so as a practical matter I can use successive $' to search for several parenthesized strings.

    $ cat p5.pl
    #!/opt/perl5.16/bin/perl
    use strict;
    use warnings;

    our $paren = qr/            # Need declared variable with use strict.
        \(
        (
            [^()]+              # Not parens
          | (??{ our $paren })  # Another balanced group (not interpolated yet)
        )*
        \)
    /x;                         # 'x' means ignore whitespace, comments.

    my $stuff = "On the outside now then (we go( in( and in (&stop)(awhile) ( further ))) but still (here) ) and now (for a while) we are out again.";

    $stuff =~ /($paren)/;

    print "Orig----------\n";
    print "|" . $stuff . "|\n";
    print '$1------------' . "\n";
    print "|" . $1 . "|\n";
    print 'Match---------' . "\n";
    print "|" . $& . "|\n";
    print 'B4------------' . "\n";
    print "|" . $` . "|\n";
    print 'After---------' . "\n";
    print "|" . $' . "|\n";
    print "----------\n";
    $ ./p5.pl
    Orig----------
    |On the outside now then (we go( in( and in (&stop)(awhile) ( further ))) but still (here) ) and now (for a while) we are out again.|
    $1------------
    |(we go( in( and in (&stop)(awhile) ( further ))) but still (here) )|
    Match---------
    |(we go( in( and in (&stop)(awhile) ( further ))) but still (here) )|
    B4------------
    |On the outside now then |
    After---------
    | and now (for a while) we are out again.|
    ----------
    $
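
As an alternative to feeding $' back in by hand, a match in scalar context with the /g modifier resumes where the previous match ended, so a while loop can collect each top-level parenthesised string in turn. This is a minimal sketch, not the method from the tutorial. The non-capturing group and the possessive [^()]++ are additions that are not in the original post; the latter is only a guess at reducing the backtracking that may be behind the reported hang.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $paren;
$paren = qr/
    \(
    (?:                     # non-capturing: assumption, not in the post
        [^()]++             # not parens; possessive, never gives text back
      | (??{ $paren })      # another balanced group
    )*
    \)
/x;

my $stuff = "On the outside now then (we go( in( and in (&stop)(awhile) ( further ))) but still (here) ) and now (for a while) we are out again.";

# Each pass of the loop continues from the end of the previous match.
my @groups;
while ( $stuff =~ /($paren)/g ) {
    push @groups, $1;
}

print "$_\n" for @groups;
```

Nested groups are swallowed by the outer match, so only the outermost balanced strings are collected.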
Re: Regex's, parentheses, and the mysterious ( ??{ } ) operator
by Laurent_R (Canon) on Jul 11, 2013 at 21:55 UTC

    I did not really understand what your problem is, and it is a bit late for me now, so I have to give up trying to get the point. There is just one thing I noticed, though, which may explain the unexpected behavior: the $stuff variable is not the same in the two programs:

    my $stuff = "On the outside now then (we go( in( and in (&stop)(awhile) ( further ))) but still (here) ) and now (for a while) we are out again.";
    my $stuff = "On the outside now then (we go( in( and in(&stop)(awhile) ( further ))) but still (here) ) and now (for a while) we are out again.";

    Or, to show it more clearly (I hate the way the Perlmonks site displays code snippets; why should it cut them at 70 or 71 characters, making any significant code almost unreadable, at a time when most people have no problem displaying 250 or 300 characters on their screen?), I'll just quote where the difference is:

    in (&stop)(awhile)
    in(&stop)(awhile)

    A space missing. Small difference. No idea if this is the source of your problem, but I thought it might be useful to let you know.

      Or to show it more clearly (I hate the way the Perlmonks site displays code snippets, all configurable and stuff, why didn't I configure my user configuration options?)

      There, fixed it for you :) Help for User Settings, User Settings

        Or, to put it in a somewhat more user-friendly manner, click on the "user settings" link in the comment to which I am replying, and you will be taken to a long list of configuration options, none of which have anything to do with line-wrap or code-wrap. However, at the top of that page are links for ten or so *more* pages of settings that you can mess with. Fortunately the very first of these, "Display Settings", has the needed stuff. Within the several tables of configuration options are selections to turn off code wrap altogether, or to increase the length before wrap happens. I turned off wrapping, and it works for me.

Re: Regex's, parentheses, and the mysterious ( ??{ } ) operator
by Anonymous Monk on Jul 12, 2013 at 14:23 UTC
    I think you have one too many "our $paren"; it should be (??{ $paren })

      Laurent R: "the $stuff variable is not the same in the two programs:"

      I must've done that after I copied one program to the other, but it does not matter: in both programs that area is inside the first pair of parentheses, and it gets included in the match.

      Anonymous Monk: "I think you have one too many $parens..."

      I do not know how to say what the problem is any differently. I *want* to use $paren a second time and have it find the *second* separate parenthesized text string within the $stuff variable, and set the $2 regex memory variable equal to it. The regex stored in the variable $paren works: it successfully finds the first open '(' and its balancing close ')'. Because I use it as:

      /($paren)/

      It sets the matching text to the $1 variable. How come I cannot use it again, via something like:

      /($paren)[^()]+($paren)/

      The sample $stuff variable does have a second, separate parenthesized text string in it, "(for a while)". How come the same $paren regex does not find that second parenthesized string and set $2 to it?

        Please try reading that again, because our is our
Re: Regex's, parentheses, and the mysterious ( ??{ } ) operator
by Anonymous Monk on Jul 12, 2013 at 14:25 UTC