Re: What's the best way to do a pattern search like this? by CharlesClarkson (Curate)
on Jul 20, 2001 at 10:58 UTC
Some things to ponder:
How should the algorithm handle hyphenated words? Should pre-paid become pre and paid or remain pre-paid? Will any words wrap to the next line using a hyphen?
Are there any slang or shortcut words in the file? How should b4 be handled?
Is the file short or long? Should the algorithm read the entire file into memory or would it be better to process each line?
How might you handle dates such as 500 A.D. or c. 1500 bc?
And what about other abbreviations: Mr., Jr., Ave., etc., e.g.?
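To make a few of those questions concrete, here is a rough sketch of one possible set of answers, not a recommendation. It assumes hyphenated words stay whole, a word wrapped with a hyphen at the end of a line is rejoined without the hyphen, b4 counts as an ordinary word, the file is processed line by line, and a small hand-made abbreviation list decides which trailing periods to keep. The names %abbrev, tokenize and words_from are made up for this example, and the character classes are ASCII-only.

```perl
use strict;
use warnings;

# Hypothetical abbreviation list; extend it to suit your data.
my %abbrev = map { lc($_) => 1 } qw(Mr. Jr. Ave. etc. e.g. A.D. c.);

# Split one line into words.
# Assumption: internal hyphens are kept, so "pre-paid" stays one word.
sub tokenize {
    my ($line) = @_;
    my @words;
    for my $chunk (split ' ', $line) {
        $chunk =~ s/^[^A-Za-z0-9]+//;        # drop leading punctuation
        next unless length $chunk;
        (my $bare = $chunk) =~ s/[,;:!?)"']+$//;  # drop trailing punctuation except periods
        if ($abbrev{lc $bare}) {             # known abbreviation: keep its period
            push @words, $bare;
            next;
        }
        $bare =~ s/[^A-Za-z0-9]+$//;         # otherwise drop trailing periods too
        push @words, $bare if length $bare;
    }
    return @words;
}

# Read a filehandle one line at a time, rejoining words that were
# wrapped with a hyphen at the end of a line.
# Assumption: the wrap hyphen is dropped, so "re-" + "considered"
# becomes "reconsidered" (a wrapped "pre-paid" would become "prepaid").
sub words_from {
    my ($fh) = @_;
    my $carry = '';
    my @all;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+// if length $carry;  # glue the fragment straight onto the next word
        $line = $carry . $line;
        $carry = '';
        # A trailing hyphen means the last word continues on the next line.
        if ($line =~ s/(\S+)-\s*$//) {
            $carry = $1;
        }
        push @all, tokenize($line);
    }
    push @all, tokenize($carry) if length $carry;
    return @all;
}

# Usage: an in-memory filehandle stands in for the real file.
my $text = "Mr. Smith pre-paid b4 500 A.D., then re-\nconsidered.\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print join('|', words_from($fh)), "\n";
# prints: Mr.|Smith|pre-paid|b4|500|A.D.|then|reconsidered
```

Note that the sketch cannot tell an abbreviation's period from a sentence-ending one ("etc." at the end of a sentence keeps its period), and it cannot tell a wrapped "pre-paid" from a wrapped "prepaid". Those are exactly the decisions you have to make for your own data.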
Charles K. Clarkson