Ovid - Microsoft does the regex dance with two left feet
by Ovid (Cardinal) on Mar 25, 2001 at 03:03 UTC
Regular expressions (based on perlre) are now available for both VBScript and JScript. I can't comment on JScript because I haven't used it, but I have found serious issues with VBScript's regex engine. A regex as simple as "(\d+)(?:\.log)" has repeatedly failed for me because the non-capturing group keeps generating an "unknown quantifier" error at the group's opening question mark. Since we're using VBScript 5 at my company, I should be able to use this extended syntax, lookaheads included.
Plus, no one at my shop has been able to directly access the data captured by parentheses (i.e. with $1, $2, etc.). We've read through the documentation, but to no avail. I have to set the Regexp object to match globally and then loop through the "matches" property in the Regexp object to get to them. This has been consistent on all of our machines. Either this is poor documentation on Microsoft's part, or their regex engine is fundamentally broken.
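To illustrate what the pattern above is meant to do, and what direct access to a captured group looks like, here is a minimal sketch in Python, used purely as a stand-in for an engine that accepts perlre-style syntax. It says nothing about how VBScript itself behaves, and the file name is made up. Note that (?:...) is a non-capturing group; a true lookahead is written (?=...).

```python
import re

# Hypothetical log file name, used only for illustration.
filename = "access20010325.log"

# (?:...) groups without capturing, so only (\d+) gets a number.
noncapturing = re.search(r"(\d+)(?:\.log)", filename)

# (?=...) is a true lookahead: it asserts that ".log" follows
# but does not make it part of the match.
lookahead = re.search(r"(\d+)(?=\.log)", filename)

# Direct access to the first capture, the equivalent of Perl's $1.
digits = noncapturing.group(1)

# The overall match differs between the two constructs.
whole_noncapturing = noncapturing.group(0)  # includes ".log"
whole_lookahead = lookahead.group(0)        # digits only

print(digits, whole_noncapturing, whole_lookahead)
```

In both cases there is exactly one numbered group, which is what I would expect $1 to hold in Perl.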
The result of this is that I have avoided regular expressions in VBScript and gone back to old ways of extracting data. Not fun!
These are clearly bugs, but I'm not expecting MS to figure out what to do anytime soon. Consider the following quote from them where they try to explain what regexes are good for:
"For example, if you need to search an entire web site to remove some outdated material and replace some HTML formatting tags, you can use a regular expression to test each file to see if the material or the HTML formatting tags you are looking for exists in that file."

As I well know from my own painful forays into this area, regexes should not be used to parse HTML. It's one of the more common newbie errors with regexes (along with "how do I match an email address?"). Microsoft has - how astonishing! - taken a great idea, broken it, and then given bad advice on what to do with it.