Some things to ponder:
How should the algorithm handle hyphenated words? Should pre-paid become pre and paid or remain pre-paid? Will any words wrap to the next line using a hyphen?
Are there any slang or shortcut words in the file? How should b4 be handled?
Is the file short or long? Should the algorithm read the entire file into memory or would it be better to process each line?
How might you handle dates such as 500 A.D. or c. 1500 bc?
And what about other abbreviations: Mr., Jr., Ave., etc., e.g.?
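To make a few of these choices concrete, here is a minimal sketch in Python (the same idea carries over to Perl). It is not a definitive answer to the questions above, just one set of assumptions: hyphenated words such as pre-paid stay whole, shortcuts such as b4 count as ordinary words, a hyphen at the very end of a line is treated as a line wrap, and input is processed one line at a time. The names `WORD`, `WRAP` and `count_words` are made up for this illustration. Dates and abbreviations with periods (A.D., Mr.) are deliberately not handled, so "A.D." would be split into "a" and "d".

```python
import re
from collections import Counter

# Assumed policy: a word is a run of letters/digits, optionally joined by
# internal hyphens or apostrophes, so "pre-paid" and "b4" each stay whole.
WORD = re.compile(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*")

# A word fragment followed by a hyphen at the very end of a line.
WRAP = re.compile(r"([A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*)-$")

def count_words(lines):
    """Count words from any iterable of lines (a list or an open file)."""
    counts = Counter()
    carry = ""  # first half of a word that was split across a line break
    for line in lines:
        text = carry + line.strip()
        carry = ""
        m = WRAP.search(text)
        if m:
            # Assumption: a trailing hyphen means the word wraps, so the
            # hyphen is dropped. A real "pre-" / "paid" split would
            # therefore be joined as "prepaid".
            carry = m.group(1)
            text = text[:m.start()]
        counts.update(w.lower() for w in WORD.findall(text))
    if carry:
        # The input ended on a dangling fragment; count it as a word.
        counts[carry.lower()] += 1
    return counts
```

Because `count_words` accepts any iterable of lines, an open file handle can be passed in directly and the file is read line by line instead of being loaded into memory at once. A short list of strings works just as well for small inputs.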
Charles K. Clarkson