PerlMonks  

Re: More Power to your Regex

by Juerd (Abbot)
on Apr 02, 2002 at 00:28 UTC ( [id://155889]=note )


in reply to More Power to your Regex

I really, really like this approach, and think it should be implemented in Perl 6, or if possible even sooner. This is a much cleaner solution than the clumsy variable evaluation.

Here's my attempt at matching simple XML-like data:

%
^ \s*
( # <1>
    < \s* ([a-zA-Z:]+) # <2/>
    (?:
        \s*[a-zA-Z:]* \s* = \s* (?:'[^']*'|"[^"]*")
    )* \s* (/\s*)? # <3/>
    >
    (?:[^<>]* | (?1))* # Update: added * to (?:)
    (?(3)| <\s*/\s*\2\s*> )
) # </1>
\s* $
%x
This is not at all XML-compliant, but it at least handles simple data.
<foo><bar></bar></foo>     # Match
<foo><bar></foo></bar>     # No match
<foo><bar/></foo>          # No match (WRONG)
<foo><bar></foo>           # No match
<foo bar=baz/>             # No match
<foo bar="baz">            # No match
<foo bar="baz"/>           # Match
< fooo / >                 # Match
<foo/>foo                  # No match
foo<foo/>                  # No match
<foo>foo</foo>             # Match
<foo><bar/>foo</foo>       # No match (WRONG)
<a><b><c></c></b></a>      # No match (WRONG!!)
Could it be that backreferences (like \2 in this case) or conditionals (like (?(3)...)) don't work the way they should when your patch is used?
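One way to narrow that question down is to test each feature on its own alongside recursion. The sketch below assumes Perl 5.10 or later, whose regex engine supports `(?1)` natively; it does not exercise robin's PCRE patch, so it only shows how the two features can be isolated, not where the patch goes wrong. The patterns and the names `$backref_only`, `$cond_only` and `verdict` are made up for illustration.

```perl
use strict;
use warnings;

# Backreference plus recursion, no conditional: an element is
# <name> ... </name>, and the closing name is checked through \2.
my $backref_only = qr{\A(<(\w+)>(?:(?1))*</\2>)\z};

# Conditional plus recursion, no backreference: group 2 records a
# trailing slash, and (?(2)|...) demands a closing tag only without it.
# Tag names are deliberately not compared here.
my $cond_only = qr{\A(<\w+(/)?>(?(2)|(?:(?1))*</\w+>))\z};

# Hypothetical helper: report the result in the style used in this thread.
sub verdict {
    my ($re, $string) = @_;
    return $string =~ $re ? 'Match' : 'No match';
}

for my $case (
    [ 'backref', $backref_only, '<a><b></b></a>' ],
    [ 'backref', $backref_only, '<a><b></a></b>' ],
    [ 'backref', $backref_only, '<a><b><c></c></b></a>' ],
    [ 'cond',    $cond_only,    '<a/>' ],
    [ 'cond',    $cond_only,    '<a><b/></a>' ],
    [ 'cond',    $cond_only,    '<a>' ],
) {
    my ($label, $re, $string) = @$case;
    printf "%-8s %-24s # %s\n", $label, $string, verdict($re, $string);
}
```

If one of the two patterns misbehaves under a given engine while the other does not, that points at the feature to debug first.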

UPDATE: The new version of robin's patch does parse the deeper recursion correctly. See also: Re: Recursive Regex: Update.

U28geW91IGNhbiBhbGwgcm90MTMgY
W5kIHBhY2soKS4gQnV0IGRvIHlvdS
ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
geW91IHNlZSBpdD8gIC0tIEp1ZXJk

Replies are listed 'Best First'.
Re: Re: More Power to your Regex
by robin (Chaplain) on Apr 02, 2002 at 08:52 UTC
    Very impressive! ++ I'm going to rebuild PCRE in debugging mode, and see if I can work out what's going wrong here. There might well be a bug in my patch; I've certainly never tested it with conditionals.

      I've certainly never tested it with conditionals.

      I think it is indeed the conditional, because it works smoothly when I rewrite it without one:

      %
      ^ \s*
      ( # <1>
          # Single tags like <foo/>
          < \s* [a-zA-Z:]+
          (?:
              \s*[a-zA-Z:]* \s* = \s* (?:'[^']*'|"[^"]*")
          )* \s* /\s*
          >
      |
          # Tags in pairs like <foo>content</foo>
          < \s* ([a-zA-Z:]+) # <2/>
          (?:
              \s*[a-zA-Z:]* \s* = \s* (?:'[^']*'|"[^"]*")
          )* \s*
          >
          (?:[^<>]* | (?1))*
          <\s*/\s*\2\s*>
      ) # </1>
      \s* $
      %x
      <foo><bar></bar></foo>     # Match
      <foo><bar></foo></bar>     # No match
      <foo><bar/></foo>          # Match
      <foo><bar></foo>           # No match
      <foo bar=baz/>             # No match
      <foo bar="baz">            # No match
      <foo bar="baz"/>           # Match
      < fooo / >                 # Match
      <foo/>foo                  # No match
      foo<foo/>                  # No match
      <foo>foo</foo>             # Match
      <foo><bar/>foo</foo>       # Match
      #<a><b><c></c></b></a>     # No match (WRONG!!)
      Now, there's still the three-level-deep problem...
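For comparison, the conditional-free pattern can be run against the same inputs on a regex engine with built-in recursion. This is a minimal sketch assuming Perl 5.10 or later, where `(?1)` is part of the core engine; it is not robin's patch, so its results need not agree with the ones reported in this thread (in particular for the three-level case). The names `$xmlish` and `matches` are invented for the example.

```perl
use strict;
use warnings;

# The alternation-based pattern from this thread, changed only in layout.
my $xmlish = qr%
    ^ \s*
    (                                   # <1>
        # Single tags like <foo/>
        < \s* [a-zA-Z:]+
        (?: \s*[a-zA-Z:]* \s* = \s* (?:'[^']*'|"[^"]*") )*
        \s* /\s* >
    |
        # Tags in pairs like <foo>content</foo>
        < \s* ([a-zA-Z:]+)              # <2/>
        (?: \s*[a-zA-Z:]* \s* = \s* (?:'[^']*'|"[^"]*") )*
        \s* >
        (?:[^<>]* | (?1))*
        <\s*/\s*\2\s*>
    )                                   # </1>
    \s* $
%x;

# Hypothetical helper: 1 if the whole string is accepted, 0 otherwise.
sub matches {
    my ($string) = @_;
    return $string =~ $xmlish ? 1 : 0;
}

for my $input (
    '<foo><bar></bar></foo>', '<foo><bar></foo></bar>',
    '<foo><bar/></foo>',      '<foo><bar></foo>',
    '<foo bar=baz/>',         '<foo bar="baz">',
    '<foo bar="baz"/>',       '< fooo / >',
    '<foo/>foo',              'foo<foo/>',
    '<foo>foo</foo>',         '<foo><bar/>foo</foo>',
    '<a><b><c></c></b></a>',
) {
    printf "%-26s # %s\n", $input, matches($input) ? 'Match' : 'No match';
}
```

Each recursive call keeps its own value of `\2`, which is what lets the closing tag name be checked at every nesting level.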

      U28geW91IGNhbiBhbGwgcm90MTMgY
      W5kIHBhY2soKS4gQnV0IGRvIHlvdS
      ByZWNvZ25pc2UgQmFzZTY0IHdoZW4
      geW91IHNlZSBpdD8gIC0tIEp1ZXJk
      
