PerlMonks  

Re: More Power to your Regex

by Juerd (Abbot)
on Apr 02, 2002 at 00:28 UTC (#155889=note)


in reply to More Power to your Regex

I really, really like this approach, and think it should be implemented in Perl 6, or if possible even sooner. This is a much cleaner solution than the clumsy variable evaluation.

Here's my attempt at matching simple XML-like data:

% ^ \s*
  (                                  # <1>
    < \s* ([a-zA-Z:]+)               # <2/>
    (?: \s*[a-zA-Z:]* \s* = \s* (?:'[^']*'|"[^"]*") )*
    \s* (/\s*)?                      # <3/>
    >
    (?:[^<>]* | (?1))*               # Update: added * to (?:)
    (?(3)| <\s*/\s*\2\s*> )
  )                                  # </1>
\s* $ %x
This is not at all XML-compliant, but it does handle simple data.
<foo><bar></bar></foo>   # Match
<foo><bar></foo></bar>   # No match
<foo><bar/></foo>        # No match (WRONG)
<foo><bar></foo>         # No match
<foo bar=baz/>           # No match
<foo bar="baz">          # No match
<foo bar="baz"/>         # Match
< fooo / >               # Match
<foo/>foo                # No match
foo<foo/>                # No match
<foo>foo</foo>           # Match
<foo><bar/>foo</foo>     # No match (WRONG)
<a><b><c></c></b></a>    # No match (WRONG!!)
Could it be that, with your patch, backreferences like \2 or conditionals like (?(3)...) don't work the way they should here?
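For comparison, here is a minimal harness for trying the first pattern on a stock perl 5.10 or later, where (?1) recursion is part of the core regex engine (conditionals such as (?(3)...) predate it). It is only a sketch: the pattern is the one above wrapped in qr%...%x, the name $tag_re is made up for illustration, and just some rows of the table are included. Stock perl is not the patched build discussed in this thread, so its answers for the rows marked (WRONG) need not agree with the table.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # (?1) recursion is built into the regex engine from perl 5.10 on

# The first pattern from the post, wrapped in qr%...%x so it can be reused.
# $tag_re is an illustrative name, not something from the original post.
my $tag_re = qr%
    ^ \s*
    (                                   # group 1: one element
        < \s* ([a-zA-Z:]+)              # group 2: tag name
        (?: \s*[a-zA-Z:]* \s* = \s* (?:'[^']*'|"[^"]*") )*   # quoted attributes
        \s* (/\s*)?                     # group 3: set only for <foo/>
        >
        (?:[^<>]* | (?1))*              # text or nested elements
        (?(3)| <\s*/\s*\2\s*> )         # closing tag unless self-closing
    )
    \s* $
%x;

# A few inputs taken from the table above.
my @inputs = (
    '<foo><bar></bar></foo>',
    '<foo><bar></foo></bar>',
    '<foo><bar/></foo>',
    '<foo bar=baz/>',
    '<foo bar="baz"/>',
    '<foo>foo</foo>',
);

# Print each input with its result, in the same layout as the table.
for my $input (@inputs) {
    printf "%-24s # %s\n", $input, ($input =~ $tag_re ? 'Match' : 'No match');
}
```

The output uses the same "input # result" layout as the table, which makes it easy to compare a stock perl against the patched results row by row.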

UPDATE: The new version of robin's patch does parse the deeper recursion correctly. See also: Re: Recursive Regex: Update.

U28geW91IGNhbiBhbGwgcm90MTMgY
W5kIHBhY2soKS4gQnV0IGRvIHlvdS
ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
geW91IHNlZSBpdD8gIC0tIEp1ZXJk


Re: Re: More Power to your Regex
by robin (Chaplain) on Apr 02, 2002 at 08:52 UTC
    Very impressive! ++ I'm going to rebuild PCRE in debugging mode, and see if I can work out what's going wrong here. There might well be a bug in my patch; I've certainly never tested it with conditionals.

      I've certainly never tested it with conditionals.

      I think the conditional is indeed the culprit, because it works smoothly when I rewrite the pattern without one:

      % ^ \s*
        (                                  # <1>
          # Single tags like <foo/>
          < \s* [a-zA-Z:]+
          (?: \s*[a-zA-Z:]* \s* = \s* (?:'[^']*'|"[^"]*") )*
          \s* /\s* >
        |
          # Tags in pairs like <foo>content</foo>
          < \s* ([a-zA-Z:]+)               # <2/>
          (?: \s*[a-zA-Z:]* \s* = \s* (?:'[^']*'|"[^"]*") )*
          \s* >
          (?:[^<>]* | (?1))*
          <\s*/\s*\2\s*>
        )                                  # </1>
      \s* $ %x
      <foo><bar></bar></foo>   # Match
      <foo><bar></foo></bar>   # No match
      <foo><bar/></foo>        # Match
      <foo><bar></foo>         # No match
      <foo bar=baz/>           # No match
      <foo bar="baz">          # No match
      <foo bar="baz"/>         # Match
      < fooo / >               # Match
      <foo/>foo                # No match
      foo<foo/>                # No match
      <foo>foo</foo>           # Match
      <foo><bar/>foo</foo>     # Match
      #<a><b><c></c></b></a>   # No match (WRONG!!)
      Now, there's still the three-level-deep problem...
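      Without the conditional, the pattern can also be tried on a stock perl 5.10 or later, this time including the three-level input that is commented out in the table. This is a sketch rather than robin's patched build, and $pair_re is an illustrative name. It relies on a documented property of Perl's (?PARNO) recursion: capture groups set inside a recursion are not visible after it returns, so once the inner (?1) for <c> finishes, \2 at the enclosing levels refers to b and then a again.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;    # needs (?1) recursion, core since perl 5.10

# The conditional-free pattern from the reply above, wrapped in qr%...%x.
# $pair_re is an illustrative name, not something from the original post.
my $pair_re = qr%
    ^ \s*
    (                                   # group 1: one element
        # single tags like <foo/>
        < \s* [a-zA-Z:]+
        (?: \s*[a-zA-Z:]* \s* = \s* (?:'[^']*'|"[^"]*") )*
        \s* /\s* >
    |
        # tags in pairs like <foo>content</foo>
        < \s* ([a-zA-Z:]+)              # group 2: tag name
        (?: \s*[a-zA-Z:]* \s* = \s* (?:'[^']*'|"[^"]*") )*
        \s* >
        (?:[^<>]* | (?1))*              # text or nested elements
        <\s*/\s*\2\s*>                  # matching closing tag
    )
    \s* $
%x;

# The three-level case alongside some rows of the table.
for my $input ('<foo><bar/></foo>', '<foo><bar/>foo</foo>',
               '<a><b><c></c></b></a>', '<foo/>foo', '<foo><bar></foo>') {
    printf "%-24s # %s\n", $input, ($input =~ $pair_re ? 'Match' : 'No match');
}
```

      If long runs of text make matching slow, the text branch could be written as [^<>]++ (a possessive quantifier, also available from 5.10) so the engine does not retry every way of splitting the same text; that is a suggestion, not part of the pattern above.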

