Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re: Regexes are slow (or, why I advocate String::Index)

by halley (Prior)
on May 17, 2004 at 13:59 UTC ( [id://353966]=note: print w/replies, xml ) Need Help??


in reply to Regexes are slow (or, why I advocate String::Index)

My first reaction is, deciding to go with index() for anything that even remotely smells like natural language processing development is probably premature optimization. NLP code is organic code; organic programming features like patterns or templates will benefit the developer.

Sure, index() is faster than s///. But only for the things that index() can solve.

With much of natural language processing, you're probably going to try a LOT of alternative forms, and grammars, and minor adjustments until you get it right. Regexen may be slower to run, but they're faster to develop in any but the simplest of cases.

If you develop your code with regexen, and end up realizing that a few of your lines could "benefit" from a simple index() replacement, go ahead and replace it. I doubt that you'll replace 1% of your whole NLP code in a typical project, but you'll spend a lot of time hunting for it and verifying that the replacements didn't break anything. And if you later realize you need to tweak the NLP again, you might have to undo your little optimizations.

--
[ e d @ h a l l e y . c c ]

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://353966]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others surveying the Monastery: (4)
As of 2024-04-24 04:00 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found