I really, really like this approach, and think it should be implemented in Perl 6, or if possible even sooner. This is a much cleaner solution than the clumsy variable evaluation.
Here's my attempt at matching simple XML-like data:
%
^
\s*
( # <1>
<
\s*
([a-zA-Z:]+) # <2/>
(?:
\s*[a-zA-Z:]*
\s* = \s*
(?:'[^']*'|"[^"]*")
)*
\s*
(/\s*)? # <3/>
>
(?:[^<>]* | (?1))* # Update: added * to (?:)
(?(3)|
<\s*/\s*\2\s*>
)
) # </1>
\s*
$
%x
This is not at all XML-compliant, but it at least handles simple data.
<foo><bar></bar></foo> # Match
<foo><bar></foo></bar> # No match
<foo><bar/></foo> # No match (WRONG)
<foo><bar></foo> # No match
<foo bar=baz/> # No match
<foo bar="baz"> # No match
<foo bar="baz"/> # Match
< fooo / > # Match
<foo/>foo # No match
foo<foo/> # No match
<foo>foo</foo> # Match
<foo><bar/>foo</foo> # No match (WRONG)
<a><b><c></c></b></a> # No match (WRONG!!)
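The cases above can be re-run with a small harness. This is only a sketch, and it rests on one assumption that is not part of this thread: it uses the built-in (?1) recursion of a stock Perl 5.10 or later in place of the patched engine, so the results it prints need not agree with the list above, in particular for the lines marked WRONG. The names $xmlish and is_xmlish are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: Perl 5.10+ built-in recursion stands in for the patched engine.
my $xmlish = qr{
    ^
    \s*
    (                           # group 1: one complete element
        <
        \s*
        ([a-zA-Z:]+)            # group 2: tag name
        (?:
            \s*[a-zA-Z:]*
            \s* = \s*
            (?:'[^']*'|"[^"]*")
        )*
        \s*
        (/\s*)?                 # group 3: set only for a self-closing tag
        >
        (?:[^<>]* | (?1))*      # text, or nested elements via recursion
        (?(3)|                  # self-closing tags need no end tag
            <\s*/\s*\2\s*>
        )
    )
    \s*
    $
}x;

# Hypothetical helper: returns 1 if the whole string matches, 0 otherwise.
sub is_xmlish {
    my ($text) = @_;
    return $text =~ $xmlish ? 1 : 0;
}

for my $sample (
    '<foo><bar></bar></foo>',
    '<foo><bar></foo></bar>',
    '<foo><bar/></foo>',
    '<foo><bar></foo>',
    '<foo bar=baz/>',
    '<foo bar="baz">',
    '<foo bar="baz"/>',
    '< fooo / >',
    '<foo/>foo',
    'foo<foo/>',
    '<foo>foo</foo>',
    '<foo><bar/>foo</foo>',
    '<a><b><c></c></b></a>',
) {
    printf "%-26s %s\n", $sample, is_xmlish($sample) ? 'Match' : 'No match';
}
```

Comparing this output with the list above shows whether a given result comes from the pattern itself or from the engine it ran on.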
Could it be that backreferences (like \2 in this case) or conditionals (like (?(3)...)) don't work the way they should when your patch is used?
UPDATE: The new version of robin's patch does parse the deeper recursion correctly. See also: Re: Recursive Regex: Update.
U28geW91IGNhbiBhbGwgcm90MTMgY
W5kIHBhY2soKS4gQnV0IGRvIHlvdS
ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
geW91IHNlZSBpdD8gIC0tIEp1ZXJk