PerlMonks  

My starting point was this Perl example from the perlre manpage (maybe not a good starting point, and you're welcome to point out a better example if there is one!):
The following pattern matches a function foo() which may contain balanced parentheses as the argument.

    $re = qr{ (                      # paren group 1 (full function)
                foo
                (                    # paren group 2 (parens)
                  \(
                    (                # paren group 3 (contents of parens)
                      (?:
                        (?> [^()]+ ) # Non-parens without backtracking
                      |
                        (?2)         # Recurse to start of paren group 2
                      )*
                    )
                  \)
                )
              )
            }x;
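For anyone who wants to poke at the manpage pattern, here is a minimal runnable sketch. Only the pattern comes from perlre; the sub name parse_call and the sample string are made up for illustration.

```perl
use strict;
use warnings;

# Pattern as quoted from perlre above.
my $re = qr{ (                      # paren group 1 (full function)
               foo
               (                    # paren group 2 (parens)
                 \(
                   (                # paren group 3 (contents of parens)
                     (?:
                       (?> [^()]+ ) # non-parens without backtracking
                     |
                       (?2)         # recurse to start of paren group 2
                     )*
                   )
                 \)
               )
             )
           }x;

# Hypothetical helper: returns the three captures, or an empty list.
sub parse_call {
    my ($text) = @_;
    return $text =~ $re ? ($1, $2, $3) : ();
}

# Sample input invented for this example.
my @groups = parse_call('x = foo(bar(baz)+baz(bop)) + 1');
print "group $_: $groups[$_ - 1]\n" for 1 .. @groups;
```

Group 1 should hold the whole call, group 2 the parenthesised argument list, and group 3 the contents between the outer parens.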
Now I wanted to go to a pattern that matched a '{' opening brace followed or preceded by a balanced set of braces.

I also discovered that the example doesn't handle backslashes or comment characters, and I want mine to do so.

Well, first get rid of '{' as the quote character, because it's what I want to match. And since I wasn't using variables, a single quote seemed logical. So I ended up with something like:

    #!/bin/perl -w
    use strict;
    #(?:[^{}#]|(?:\\).)*
    #(?: (?:\{2}+) (?:[^{}#]* | (?:\\.)) )*
    my $re = qr'^ (
        {
          (
            {
              (
                (?:
                  (?> [^{}]+ )
                |
                  (?2)
                )*
              )
            }
          )*$
        )
      'x;
    while (<>) {
      printf "%s\n", m{$re} ? "match" : "nomatch";
      /^q/ && exit;
    }
The comments at the top were broken attempts to allow chars at the beginning... then I noticed that inside single quotes, backslash only quotes backslash or a single quote. Yikes! So I got rid of the backslashes in the pattern that were quoting literals.
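A quick check of how the single-quote delimiter behaves, offered as a sketch rather than a verdict. As I read perlop, using ' as the delimiter turns off variable interpolation, and backslashed sequences are left for the regex engine to deal with, so an escaped brace inside qr'...' should still arrive at the engine as an escaped brace. The sub name braced_number is made up for this example.

```perl
use strict;
use warnings;

# With ' as the delimiter nothing is interpolated; the backslashes
# below are handed to the regex engine, which reads \{ and \} as
# literal braces and \d as a digit.
my $re = qr'\{ (\d+) \}'x;

# Hypothetical helper: returns the digits between braces, or undef.
sub braced_number {
    my ($s) = @_;
    return $s =~ $re ? $1 : undef;
}

my $n = braced_number('{42}');
print defined $n ? "captured $n\n" : "nomatch\n";
```

If that holds on your perl, the backslashes that quote literal braces can stay in the pattern, which also sidesteps the complaints newer perls make about unescaped '{'.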

The above seems to work with a single opening brace followed by some number of matching brace pairs with optional content. It does not treat backslashes or comments as significant.

It also doesn't handle an even number of braces BEFORE the matching opening brace, which is where I was looking next. To do so, I was going to emulate the recursive (?2) that matches the 2nd capture group... but realized that as soon as I introduced more captures before my literal '{', that number 2 would change. Aeeii. Named captures... what a cool idea... they won't change... seemed straightforward enough... so my re became:
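To make the renumbering worry concrete, here is a small sketch. Both patterns are simplified stand-ins for the one above (literal braces escaped, anchors moved outside the group), and the leading (\w*) capture and the sub name accepts are invented purely to shift the numbering.

```perl
use strict;
use warnings;

# Numbered version: group 2 is the balanced pair, so (?2) recurses into it.
my $numbered = qr/
    ^
    (                               # group 1: the whole match
      \{                            # the literal opening brace
      (                             # group 2: one balanced pair
        \{
        (?: (?> [^{}]+ ) | (?2) )*
        \}
      )*
    )
    $
/x;

# Same pattern with one extra capture in front (hypothetical).
# The balanced pair is now group 3, but (?2) still says "group 2".
my $shifted = qr/
    ^
    (\w*)                           # new group 1
    (                               # now group 2: the whole match
      \{
      (                             # now group 3: one balanced pair
        \{
        (?: (?> [^{}]+ ) | (?2) )*  # recurses into the wrong group
        \}
      )*
    )
    $
/x;

# Hypothetical helper: 1 if the string matches, else 0.
sub accepts {
    my ($re, $s) = @_;
    return $s =~ $re ? 1 : 0;
}

for my $s ('{{{}{}}', '{{{}') {
    printf "%-8s numbered=%d shifted=%d\n",
        $s, accepts($numbered, $s), accepts($shifted, $s);
}
```

By my reading, the numbered pattern accepts '{{{}{}}' and rejects the unbalanced '{{{}', while the shifted one gets both wrong, which is exactly the fragility that makes a name attractive.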

my $re = qr'^ ( { (?<R> { ( (?: (?> [^{}]+ ) | (\q<R>) )* ) } )*$ ) 'x;
But it doesn't work. It worked for a simple match plus single nesting, like '{' and '{{}', but '{{{}{}}' failed. It works with the numbered version. So what went wrong? I had hoped, with that working, to put a (?<L>...) version (left side vs. right side) before the literal brace, but if it doesn't work on one side... I'm not too hopeful about two sides.
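One possible reading of the failure, hedged since I can only go by the pattern as posted: perlre spells recursion into a named group as (?&NAME), whereas \k<NAME> and \g{NAME} are backreferences to the text a group already captured. A sketch of the named equivalent of (?2) follows; the literal braces are escaped, the capture around the contents is dropped, and the sub name accepts is made up for the example.

```perl
use strict;
use warnings;

# Assumption: the intent is "one opening brace, then any number of
# balanced pairs", as in the numbered version.
my $re = qr/
    ^
    \{                      # the literal opening brace to find
    (?<R>                   # named group R: one balanced pair
        \{
        (?:
            (?> [^{}]+ )    # non-brace text, no backtracking
          |
            (?&R)           # recurse into group R by name
        )*
        \}
    )*
    $
/x;

# Hypothetical helper: 1 if the string matches, else 0.
sub accepts {
    my ($s) = @_;
    return $s =~ $re ? 1 : 0;
}

for my $s ('{', '{{}', '{{{}{}}', '{}') {
    printf "%-8s %s\n", $s, accepts($s) ? 'match' : 'nomatch';
}
```

Because (?&R) refers to the group by name, adding a (?<L>...) group in front should not disturb it the way it disturbs (?2).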

Any ideas on how to simplify the backslash processing and comment processing would be appreciated. You can see in the 2nd comment that I thought to use a possessive match if two backslashes were next to each other, followed by either NOT one of the forbidden chars, OR a backslash and 'any char', but that was a dismal failure, and I thought that should have worked! (The 1st comment was a pitiful 1st attempt... eh!)
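For the backslash and comment handling, one way it might be sketched is to give each case its own alternative inside the balanced group instead of folding everything into one character class. The rules here are my assumptions, not something stated above: a backslash hides the single character after it, and # starts a comment that runs to end of line. The sub name balanced is hypothetical.

```perl
use strict;
use warnings;

my $re = qr/
    \A
    (?<R>                        # named group R: one balanced brace pair
        \{
        (?:
            (?> [^{}\\\#]+ )     # ordinary text: no braces, backslash or hash
          |
            \\ .                 # a backslash hides the character after it
          |
            \# [^\n]*+           # comment to end of line, possessive
          |
            (?&R)                # nested pair, recursing by name
        )*
        \}
    )
    \z
/xs;

# Hypothetical helper: 1 if the whole string is one balanced group.
sub balanced {
    my ($s) = @_;
    return $s =~ $re ? 1 : 0;
}

for my $s ('{a\}b}', '{a}b}', "{ # }\n}", '{ # }') {
    (my $shown = $s) =~ s{\n}{\\n}g;   # keep the report on one line
    printf "%-10s %s\n", $shown, balanced($s) ? 'match' : 'nomatch';
}
```

The possessive quantifier on the comment branch matters: without it the engine can backtrack, shorten the comment, and let a brace inside the comment close the group.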

And here I thought it was all cool, the example in the man page of useful code... and then I tried to use it... *smack*: reality hit.

So why did my name conversion not work? And the re in the 2nd comment for handling backslashes and # seemed reasonable, no?? *sigh*...


In reply to how to go from numbered captures to named? by perl-diddler
