PerlMonks
I'm not very comfortable with recursive patterns: they're new in Perl 5.10, and I doubt that PHP/PCRE supports them. So I'd take a two-step approach, very much like the traditional lex/yacc approach, but simplified:
1. Tokenize

Using regular expressions, you can pull out the tokens: quoted strings, words, parens/braces/brackets, and other symbols. That way you won't accidentally mistake braces inside quoted strings for syntactically meaningful braces. Your regex engine needs to be able to continue matching where it left off last time: in Perl you use //g in scalar context; in Javascript you can call exec(string) on a regex with the g flag. PCRE in PHP likely supports something like it, but I don't actually know.

The regex can look something like this (off the top of my head, not thoroughly tested):

Note that I skip whitespace except newlines, which are meaningful in Javascript because they can terminate the current statement. Maybe (likely) you just don't care.

Here's some (Perl) code to test it with; load the Javascript into $_ first:

I only display newlines differently because a bare newline as a token doesn't print clearly.

2. Parsing: balancing braces

As you go through the tokens you extract one by one, keep track of the nesting level: increment it when you encounter a bare "{", and decrement it for a bare "}". As soon as it drops back to the level you started at for this function (usually 0, but it could be higher for nested functions), you have found its end.

Here's the same code again, extended to keep track of the nesting level. As I assume the Javascript is syntactically valid, I just keep a common $level for every type of bracket; it's simpler this way.
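Both steps can be sketched in Perl roughly like this. It's a minimal sketch under a few assumptions, not a finished parser: the sub names tokenize and first_function are hypothetical, only single- and double-quoted strings are protected (Javascript comments and regex literals are not recognised), and the input is assumed to be syntactically valid. The tokenizer anchors each match with \G and uses //gc in scalar context, so every match resumes where the previous one stopped and a failed match doesn't reset pos(). One shared $level counts all bracket types, as described above.

```perl
use strict;
use warnings;

# Step 1: split Javascript source into tokens. Each token is returned as
# [text, start_offset, end_offset]. Whitespace other than newlines is skipped.
# Assumption: only quoted strings are protected; comments and regex
# literals are NOT recognised by this sketch.
sub tokenize {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    while (1) {
        $src =~ /\G[^\S\n]+/gc;              # skip spaces and tabs; /c keeps pos on failure
        last if pos($src) >= length $src;
        my $from = pos($src);
        $src =~ / \G ( "(?:[^"\\\n]|\\.)*"   # double-quoted string
                     | '(?:[^'\\\n]|\\.)*'   # single-quoted string
                     | \w+                   # word: identifier, keyword, number
                     | \n                    # newline: may end a statement
                     | [^\w\s]               # any other single symbol, incl. brackets
                     ) /gcxs
            or die "cannot tokenize at offset $from\n";
        push @tokens, [$1, $from, pos($src)];
    }
    return @tokens;
}

# Step 2: find the first "function" token, then track ONE shared nesting
# level for all bracket types; the "}" that brings the level back to where
# it started closes the function. Assumes syntactically valid input.
sub first_function {
    my ($src) = @_;
    my ($start, $level);
    for my $t (tokenize($src)) {
        my ($tok, $from, $to) = @$t;
        if (!defined $start) {
            ($start, $level) = ($from, 0) if $tok eq 'function';
            next;
        }
        if    ($tok =~ /\A[({\[]\z/) { $level++ }
        elsif ($tok =~ /\A[)}\]]\z/) {
            $level--;
            return substr($src, $start, $to - $start)
                if $level == 0 && $tok eq '}';
        }
    }
    return;    # no complete function found
}

my $js = <<'JS';
var a = 1;
function f(x) {
    var s = "}";
    if (x) { return [s, x]; }
    return null;
}
var b = 2;
JS

my $body = first_function($js);
print "$body\n";
```

This prints the function from "function f(x) {" through its closing "}" and nothing after it. Because "}" inside the string is returned as a single quoted-string token, its brace never touches $level, which is the whole point of tokenizing before counting.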
This should suffice to get you started.
In reply to Re: Recursive Regular Expression Help
by bart