in reply to Regexes are slow (or, why I advocate String::Index)

My first reaction is that committing to index() for anything that even remotely smells like natural language processing is probably premature optimization. NLP code is organic code: it keeps changing as you learn the problem, so flexible features like patterns or templates will benefit the developer.

Sure, index() is faster than s///. But only for the things that index() can solve.
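
To make that boundary concrete, here is a minimal sketch. The sample sentence and variable names are made up for illustration; the point is only that index() handles a fixed literal, while "any of several word forms" needs a pattern.

```perl
use strict;
use warnings;

# Hypothetical sample text, for illustration only.
my $text = "The cats sat on the mat; a cat slept nearby.";

# A fixed literal substring: index() is enough.
# It returns the 0-based offset, or -1 if the substring is absent.
my $pos = index($text, "mat");

# Any of several word forms, on word boundaries: this needs a pattern.
# In list context with /g, the match returns every captured form.
my @forms = $text =~ /\b(cats?|kittens?)\b/g;

print "literal found at offset $pos\n";
print "word forms matched: @forms\n";
```

You could fake the second case with a pile of index() calls plus hand-rolled boundary checks, but at that point you are reimplementing a small regex engine by hand.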

With much of natural language processing, you're probably going to try a LOT of alternative forms, grammars, and minor adjustments until you get it right. Regexen may be slower to run, but they're faster to develop with in all but the simplest of cases.

If you develop your code with regexen, and end up realizing that a few of your lines could "benefit" from a simple index() replacement, go ahead and replace them. I doubt that you'll replace even 1% of your whole NLP code in a typical project, but you'll spend a lot of time hunting for those lines and verifying that the replacements didn't break anything. And if you later realize you need to tweak the NLP again, you might have to undo your little optimizations.
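
One way to do that verification is to run the index() version side by side with the regex it replaces. This is only a rough sketch: the two subs, the literal "not", and the sample inputs are all hypothetical, and a real project would feed in its own corpus.

```perl
use strict;
use warnings;

# Original, pattern-based check (hypothetical example).
sub has_not_regex {
    my ($s) = @_;
    return $s =~ /not/ ? 1 : 0;
}

# Candidate index()-based replacement for the same literal.
sub has_not_index {
    my ($s) = @_;
    return index($s, "not") >= 0 ? 1 : 0;
}

# Made-up inputs, for illustration only.
my @samples = ("I do not know", "nothing here", "no match", "");

# Count the inputs on which the two versions disagree.
my $mismatches = 0;
for my $s (@samples) {
    $mismatches++ if has_not_regex($s) != has_not_index($s);
}

print "mismatches: $mismatches\n";
```

The two agree here only because the pattern is a bare literal. The moment you tighten it to something like /\bnot\b/, "nothing here" stops matching the regex but still matches the index() version, and the optimization has to come back out.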

[ e d @ h a l l e y . c c ]