
how to go from numbered captures to named?

by perl-diddler (Hermit)
on Jul 13, 2012 at 02:38 UTC
perl-diddler has asked for the wisdom of the Perl Monks concerning the following question:

My starting point was this perl example from the perlre manpage (maybe not a good starting point, and you're welcome to point out a better example if there is one!), but:
The following pattern matches a function foo() which may contain balanced parentheses as the argument.

    $re = qr{ (                    # paren group 1 (full function)
                foo
                (                  # paren group 2 (parens)
                  \(
                    (              # paren group 3 (contents of parens)
                    (?:
                     (?> [^()]+ )  # Non-parens without backtracking
                    |
                     (?2)          # Recurse to start of paren group 2
                    )*
                    )
                  \)
                )
              )
            }x;
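To see what each of the three groups captures, the pattern can be run against a nested call. This is only a small sketch: the input string is an illustrative example, and the layout of the pattern is tidied but otherwise unchanged.

```perl
use strict;
use warnings;

# The perlre pattern for foo(...) with balanced parentheses.
my $re = qr{
    (                       # paren group 1 (full function)
        foo
        (                   # paren group 2 (parens)
            \(
            (               # paren group 3 (contents of parens)
                (?:
                    (?> [^()]+ )    # non-parens without backtracking
                |
                    (?2)            # recurse to start of paren group 2
                )*
            )
            \)
        )
    )
}x;

# Illustrative input with two nested calls inside the argument list.
my $input = 'foo(bar(baz)+baz(bop))';

if ( $input =~ $re ) {
    print "group 1: $1\n";    # the whole call
    print "group 2: $2\n";    # the parenthesised part
    print "group 3: $3\n";    # the contents of the outer parens
}
```

Note that (?2) counts groups from the start of the pattern, so adding a capture group in front of it changes which group it recurses into.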
Now I wanted to go to a pattern that matched a '{' opening brace followed or preceded by a balanced number of parentheses.

I also discovered that the example doesn't handle backslashes or comment characters and I want mine to do so.

Well, first I got rid of '{' as the quote char, cuz it's what I want to match. And since I wasn't using variables, single quote seemed logical. So I ended up with something like:

    #!/bin/perl -w
    use strict;

    #(?:[^{}#]|(?:\\).)*
    #(?: (?:\{2}+) (?:[^{}#]* | (?:\\.)) )*

    my $re = qr'^ (
        {
        (
          {
            (
              (?:
                (?> [^{}]+ )
              |
                (?2)
              )*
            )
          }
        )*$
      )
    'x;

    while (<>) {
        printf "%s\n", m{$re} ? "match" : "nomatch";
        /^q/ && exit;
    }
The comments at the top were broken attempts to allow chars at the beginning... then I noticed that inside single quotes, backslash only quotes backslash or a single quote. Yikes! So I got rid of the backslashes in the pattern that were quoting literals.
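One point that may be worth double-checking here: if I read perlop correctly, a single-quote delimiter on a pattern only switches off variable interpolation, and backslash sequences are still handed to the regex engine. A minimal sketch to test that, with made-up test strings and variable names:

```perl
use strict;
use warnings;

# With ' as the delimiter, $ and @ are not interpolated, but backslash
# escapes are still seen by the regex engine.
my $digit = qr'\d';         # \d should still mean "a digit"
my $brace = qr'\A\{\z';     # \{ should still mean a literal opening brace

print "digit: ", ( 'a1' =~ $digit ? "match" : "nomatch" ), "\n";
print "brace: ", ( '{'  =~ $brace ? "match" : "nomatch" ), "\n";
```

If both lines report a match, escaping the literal braces as \{ and \} inside qr'...' should still be possible.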

The above seems to work with a single opening brace followed by some number of matching braces with optional content. It ignores backslashes and comments as significant.

It also doesn't handle an even number of braces BEFORE the matching opening brace -- which is where I was looking next. To do so, I was going to emulate the recursive (?2) matching the 2nd capture expression... but realized as soon as I introduced more captures before my literal '{', that number 2 would change. Aeeii.. Named captures... what a cool idea... they won't change... seemed straightforward. My re became:

    my $re = qr'^ (
        {
        (?<R>
          {
            (
              (?:
                (?> [^{}]+ )
              |
                (\q<R>)
              )*
            )
          }
        )*$
      )
    'x;
But it doesn't work. It worked for a simple match + single nesting, like:
'{' and '{{}', but '{{{}{}}' failed.
It works with the numbered version. So what went wrong? I had hoped, with that working to put a version (?<L>...) (left side v. Right side), before the literal brace, but if it doesn't work on one side...not too hopeful about 2 sides.

Any ideas on how to simplify the backslash processing and comment processing would be appreciated -- you can see in the 2nd comment.. I thought to use a possessive capture if two backslashes were next to each other -- followed by either NOT one of the forbidden chars, OR a backslash and 'any char' -- but that was a dismal failure and I thought that should have worked!.... (1st comment was a pitiful 1st!)...
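One way the backslash and comment handling could be folded into the recursive pattern is to give each its own alternative. This is only a sketch under assumptions that the post does not spell out: a backslash escapes exactly one following character, a '#' starts a comment that runs to the end of the line, and the whole input is a single balanced brace group. The sub name balanced_braces is hypothetical.

```perl
use strict;
use warnings;

# Assumed rules: \x escapes any single character x, and '#' starts a
# comment that runs to the end of the line. Neither comes from the post.
my $balanced = qr/
    \A
    (                           # group 1: one balanced brace group
        \{
        (?:
            (?> [^{}\\#]+ )     # run of ordinary characters
        |
            \\ .                # backslash plus the character it escapes
        |
            \# [^\n]*+          # comment, possessive so it cannot give text back
        |
            (?1)                # nested group, recursing into group 1
        )*
        \}
    )
    \s* \z
/xs;

# Hypothetical helper returning true if the text is one balanced group.
sub balanced_braces {
    my ($text) = @_;
    return $text =~ $balanced ? 1 : 0;
}

for my $text ( '{ a \} b }', '{ a \} b', "{ # }\n}", '{ # }' ) {
    ( my $shown = $text ) =~ s/\n/\\n/g;    # make newlines visible
    printf "%-12s %s\n", $shown, balanced_braces($text) ? 'match' : 'nomatch';
}
```

Excluding the backslash and '#' from the ordinary-character class matters here: it leaves only one alternative able to consume each of them, so an escaped or commented brace cannot be re-read as a real one on backtracking.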

And here I thought it was all cool, the example in the man page of useful code... and then I tried to use it... *smack*: reality hit.

So why'd my name conversion not work? and the 2nd comment re for handling BS and # seemed reasonable, no?? *sigh*...

Re: how to go from numbered captures to named?
by davido (Archbishop) on Jul 13, 2012 at 08:01 UTC

    (\q<R>): Is that a typo in your post, or are you using \q<R> when you mean \g<R>? ...or do I just not know about the \q<NAME> backreferencing metacharacter? (I'm never confident that I know all of the possible metacharacters for regular expressions. ;)

    Isn't it a problem that you're backreferencing a capture group while still inside of it? I don't know for sure, but it seems funky. When I try a minimal example:


    Perl fails to compile the regex.

    I also get a failure to compile when I try your last example (after fixing the \q<name> vs \g<name> issue). I really think that trying to access a backreference while in the capture group that created it is asking for trouble.


      But you can do it from numbered capture groups? In fact, they talk about referring to an earlier capture group to invoke recursion.

      As for the \q<name> -- that was a typo... It's supposed to be \k<name> (or \k'name'); for the sake of python/pcre, (?P=name) is also accepted.

      ARG... I see the prob:

          It is an error to refer to a name not defined by a "(?<NAME>)" earlier in the pattern.

      The (?2) refers to the 2nd capture group from within the capture group -- for the name that's not OK, the name has to already be defined (hit the end paren)... Dang!
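For what it's worth, perlre documents a separate construct for recursing by name: (?&NAME) recurses into a named group the way (?2) recurses into a numbered one, whereas \k<NAME> and \g{NAME} are backreferences that match the text the group has already captured. A minimal sketch of the brace pattern from the question using it, assuming Perl 5.10 or later and assuming this is the construct that was wanted; the sub name check_braces is hypothetical.

```perl
use strict;
use warnings;

# One opening brace followed by zero or more balanced brace groups.
# Assumes Perl 5.10+, where named groups and (?&NAME) are available.
my $re = qr/
    ^
    \{                      # the literal opening brace
    (?<R>                   # named group R, one balanced brace group
        \{
        (?:
            (?> [^{}]+ )    # non-brace characters, no backtracking
        |
            (?&R)           # recurse into group R by name
        )*
        \}
    )*
    $
/x;

# Hypothetical helper mirroring the match or nomatch output of the question.
sub check_braces {
    my ($text) = @_;
    return $text =~ $re ? 'match' : 'nomatch';
}

print check_braces($_), "  $_\n" for '{', '{{}', '{{{}{}}', '{{';
```

Because the recursion goes by name, capture groups added in front of the literal brace should not change what (?&R) refers to.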
