I really, really like this approach, and think it should be implemented in Perl 6, or if possible even sooner. This is a much cleaner solution than the clumsy variable evaluation.
Here's my attempt at matching simple XML-like data:
%
    ^
    \s*
    (                                   # <1>
        <
        \s*
        ([a-zA-Z:]+)                    # <2/>
        (?:
            \s*[a-zA-Z:]*
            \s* = \s*
            (?:'[^']*'|"[^"]*")
        )*
        \s*
        (/\s*)?                         # <3/>
        >
        (?:[^<>]* | (?1))*              # Update: added * to (?:)
        (?(3)|
            <\s*/\s*\2\s*>
        )
    )                                   # </1>
    \s*
    $
%x
This is not at all XML-compliant, but it at least handles simple data.
<foo><bar></bar></foo>   # Match
<foo><bar></foo></bar>   # No match
<foo><bar/></foo>        # No match (WRONG)
<foo><bar></foo>         # No match
<foo bar=baz/>           # No match
<foo bar="baz">          # No match
<foo bar="baz"/>         # Match
< fooo / >               # Match
<foo/>foo                # No match
foo<foo/>                # No match
<foo>foo</foo>           # Match
<foo><bar/>foo</foo>     # No match (WRONG)
<a><b><c></c></b></a>    # No match (WRONG!!)
Could it be that backreferences (like \2) or conditionals (like (?(3)...)) don't work the way they should inside a recursion when your patch is used?
UPDATE: The new version of robin's patch does parse the deeper recursion correctly. See also: Re: Recursive Regex: Update.
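For anyone who wants to re-run the cases listed above, here is a minimal harness sketch. It assumes a perl whose regex engine understands the (?1) recursion syntax natively (5.10 or later) rather than the patched build discussed in this thread, so the results may differ from the ones listed. The matches() helper and the @samples list are hypothetical names introduced only for illustration; the pattern itself is the one from the post, unchanged.

```perl
use strict;
use warnings;

# Pattern from the post; (?1) recurses into capture group 1.
my $re = qr%
    ^
    \s*
    (                                   # <1> one complete element
        <
        \s*
        ([a-zA-Z:]+)                    # <2/> tag name
        (?:
            \s*[a-zA-Z:]*
            \s* = \s*
            (?:'[^']*'|"[^"]*")
        )*
        \s*
        (/\s*)?                         # <3/> set when the tag is self-closing
        >
        (?:[^<>]* | (?1))*              # text or nested elements
        (?(3)|
            <\s*/\s*\2\s*>
        )
    )                                   # </1>
    \s*
    $
%x;

# Hypothetical helper: returns 1 if the whole string matches, else 0.
sub matches {
    my ($input) = @_;
    return $input =~ $re ? 1 : 0;
}

# Inputs taken from the list in the post.
my @samples = (
    '<foo><bar></bar></foo>',
    '<foo><bar></foo></bar>',
    '<foo><bar/></foo>',
    '<foo><bar></foo>',
    '<foo bar=baz/>',
    '<foo bar="baz">',
    '<foo bar="baz"/>',
    '< fooo / >',
    '<foo/>foo',
    'foo<foo/>',
    '<foo>foo</foo>',
    '<foo><bar/>foo</foo>',
    '<a><b><c></c></b></a>',
);

# Print each input next to the verdict of the running perl.
for my $sample (@samples) {
    printf "%-26s %s\n", $sample, matches($sample) ? 'Match' : 'No match';
}
```

The harness prints whatever the running perl decides, so the nested cases marked WRONG above are the ones worth comparing against.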
U28geW91IGNhbiBhbGwgcm90MTMgY
W5kIHBhY2soKS4gQnV0IGRvIHlvdS
ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
geW91IHNlZSBpdD8gIC0tIEp1ZXJk