PerlMonks  

Maximal match in a recursive regex

by diotalevi (Canon)
on Jun 26, 2003 at 16:48 UTC (#269311=perlquestion)
diotalevi has asked for the wisdom of the Perl Monks concerning the following question:

Given data like "a[b[c[d]]" the following regex matches the inner-most [d]. I'd like suggestions on how I can match the outermost [c[d]]. I feel like there's something simple I'm missing but ... well, I'm missing it.

    my $re;
    $re = qr/
        \[                # Opening bracket
        (                 # Capture the contents
            [^][]+        # Body of a link
          | (??{$re})     # Or recurse
        )
        \]                # Closing bracket
    /x;

    $k = "a[b[c[d]]";
    $k =~ s/$re/<$1>/g;
    print $k;

Replies are listed 'Best First'.
Re: Maximal match in a recursive regex ([^][]+)
by tye (Cardinal) on Jun 26, 2003 at 18:26 UTC

    Just a quick note that I consider using [^][]+ in a regex to be obfuscation. (: I realize that backwhacks are a bit ugly, but I don't condone relying on the little-used trick that ] is not special when it is the first character (including after the optional "^") of a character class.

    I'd prefer [^\[\]]+, even though the eye doesn't have the easiest time lining up the brackets (it is ugly while your construct is pretty but misleading, like an optical illusion). :)

                    - tye

      Huh. And I wasn't even attempting to mentally match the internal brackets with the external ones. I just wrote it correctly so I wouldn't need backwhacks, and that it coincidentally looks like two classes entirely escaped me. Thanks for alerting me to that mental blindspot.
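
To make the point about the two spellings concrete, here is a minimal sketch showing that both character classes match the same thing on the sample data from the question. The variable names ($terse, $escaped, @terse_runs, @escaped_runs) are illustrative only and do not come from the thread.

```perl
use strict;
use warnings;

# Two spellings of one character class: "any character except [ or ]".
my $terse   = qr/[^][]+/;     # ] is literal because it comes first, right after ^
my $escaped = qr/[^\[\]]+/;   # the same class with both brackets backslashed

my $data = "a[b[c[d]]";

# In list context, /g returns every captured run of non-bracket characters.
my @terse_runs   = $data =~ /($terse)/g;
my @escaped_runs = $data =~ /($escaped)/g;

print "@terse_runs\n";     # a b c d
print "@escaped_runs\n";   # a b c d
```

Which form to prefer is a readability call, as the exchange above suggests; the regex engine treats them identically.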

Re: Maximal match in a recursive regex
by diotalevi (Canon) on Jun 26, 2003 at 17:06 UTC

    Ah I see. The key is to change the capturing group to match one or more times instead of just once. It just becomes )+ from ).

    Added: I goofed. That is *part* of the key. The above change produces a maximal match but loses the contents of the non-innermost matches. Here's a version that *works*:

        $re = qr/
            \[                # Opening bracket
            ((?:              # Capture the contents
                [^][]+        # Body of a link
              | (??{$re})     # Or recurse
            )+)               # and allow repeats internally
            \]                # Closing bracket
        /x;
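
As a quick check of the version above, the following sketch runs it against the sample string from the question. It assumes a Perl in which a qr// object containing a literal (??{ }) block can be interpolated without `use re 'eval'`; the $out variable is an illustrative addition so the input string is left untouched.

```perl
use strict;
use warnings;

my $re;
$re = qr/
    \[                  # Opening bracket
    ((?:                # Capture everything between the brackets
        [^][]+          # Body of a link
      | (??{ $re })     # Or recurse into a nested bracketed group
    )+)                 # and allow repeats internally
    \]                  # Closing bracket
/x;

my $k = "a[b[c[d]]";

# Copy $k, then substitute in the copy.
(my $out = $k) =~ s/$re/<$1>/g;

print "$out\n";   # a[b<c[d]>
```

The first "[" has no partner in the sample data, so the leftmost balanced group starts at "[c", and $1 holds its contents without the enclosing brackets.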

    Noted: for a brief period there was also some pushing into @f. That shouldn't have been posted so I removed it.

      This is how I fixed it.

          my $re;
          $re = qr/
              \[                # Opening bracket
              (                 # Capture the contents
                  [^][]+        # Body of a link
                | (??{$re})     # Or recurse
              )+                # added per diotalevi's instructions
              \]                # Closing bracket
          /x;

          $k = "a[b[c[d]]";
          $k =~ s/($re)/<$1>/g;   # I added the ()'s :)
          print $k;

      Bah! It captures the outer square brackets too :(
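
A short sketch of why the brackets show up: wrapping the whole pattern as ($re) makes $1 the entire match, and the literal \[ and \] sit inside that match. The group written inside $re is pushed to $2 and, being quantified, keeps only its last repetition. The names $whole and $out are illustrative, not from the thread.

```perl
use strict;
use warnings;

my $re;
$re = qr/
    \[                  # Opening bracket (part of the overall match)
    (
        [^\[\]]+        # text with no brackets in it
      | (??{ $re })     # or a nested bracketed group
    )+
    \]                  # Closing bracket (part of the overall match)
/x;

my $k = "a[b[c[d]]";

# The extra parentheses wrap the entire pattern, brackets included.
my ($whole) = $k =~ /($re)/;
(my $out = $k) =~ s/($re)/<$1>/g;

print "$whole\n";   # [c[d]]
print "$out\n";     # a[b<[c[d]]>
```

Putting the capturing parentheses between the \[ and \] and around the repeated group, as in the (( ?: ... )+) version earlier in the thread, is what keeps the brackets out of $1.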
