PerlMonks
Re^4: removing stop words
by fishbot_v2 (Chaplain) on May 29, 2005 at 15:34 UTC ( [id://461528]=note )
Yes - Jarkko Hietaniemi's Regex::PreSuf does just that.
If we assume you aren't incurring the cost of building the regex each time (for example, you keep a stopwords file and a stopreg file and rebuild the latter whenever the former changes, or you simply stat the stopwords file and rebuild from the main program), then you get significant savings:
pre1 is my simple algorithm from upthread, reg is a straight alternation, and presuf is the regex built by presuf(). I used the English stoplist from Lingua::EN::StopWords (about 200 words) and a 4000-word text.
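For anyone who wants to try a comparison along these lines, here is a minimal sketch, not the benchmark I actually ran. It makes several assumptions: a short inline stoplist and a repeated sentence stand in for the Lingua::EN::StopWords list and the 4000-word text, the hash-lookup sub is a hypothetical baseline rather than the pre1 code from upthread, and the names strip_hash and strip_regex are made up for illustration. The presuf case is only added if Regex::PreSuf is installed. All regexes are compiled once, outside the timed code.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumption: tiny inline stoplist and synthetic text, standing in for
# the Lingua::EN::StopWords list and the 4000-word text from the post.
my @stopwords = qw(a an and are as at be by for from in is it of on or that the to with);
my $text = join ' ',
    ('the quick brown fox jumps over the lazy dog and runs to the barn') x 300;

# Hypothetical baseline (not "pre1"): split on whitespace, hash lookup.
my %stop = map { $_ => 1 } @stopwords;
sub strip_hash {
    my ($t) = @_;
    return join ' ', grep { !$stop{lc $_} } split ' ', $t;
}

# Straight alternation, longest words first, compiled once with qr//.
my $alt = join '|', map { quotemeta($_) }
          sort { length $b <=> length $a } @stopwords;
my $reg = qr/\b(?:$alt)\b\s*/i;

# Remove every match of a precompiled stopword regex, then trim the tail.
sub strip_regex {
    my ($t, $re) = @_;
    $t =~ s/$re//g;
    $t =~ s/\s+$//;
    return $t;
}

my %tests = (
    hash => sub { strip_hash($text) },
    reg  => sub { strip_regex($text, $reg) },
);

# Add the presuf() variant only when Regex::PreSuf is available.
if (eval { require Regex::PreSuf; 1 }) {
    my $ps     = Regex::PreSuf::presuf(@stopwords);
    my $presuf = qr/\b(?:$ps)\b\s*/i;
    $tests{presuf} = sub { strip_regex($text, $presuf) };
}

# Run each case for at least one CPU second and print a comparison table.
cmpthese(-1, \%tests);
```

Note that the \b-anchored regexes and the split-based baseline can disagree on tokens with punctuation (for example, "it's"), so check that the outputs match on your own text before trusting the timings.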