PerlMonks
Re: Splitting strings into words when there are no separators
by Tanktalus (Canon) on Sep 14, 2005 at 14:32 UTC ( [id://491883]=note )
The most straightforward approach I can think of (and since I've never done this before, it's probably a very naive one) is to take the whole string and pass it to a spell checker, such as aspell or ispell. If the spell checker says "no", remove the last character and try again. Once you've found the longest possible word starting from the beginning, remove it from the string, and repeat the entire process on what's left until you've removed everything.

Pitfalls: the longest match always wins, so compound words such as "downstairs" or even "into" are preferred over their parts ("down stairs", "in to"), whether or not that's what was meant; the brute-force method is probably as slow as it gets; and figuring out where extra words go requires a knowledge of grammar, which a spell checker doesn't have.

Note that I've heard you can open a pipe to aspell so that you're not re-running it for each check. That would give you the speed of loading the dictionary into a hash, with C instead of perl doing the scanning, while reducing your development time and probably the runtime (since it may be able to skip loading parts of the dictionary file). And, with the right spell checker, it may offer suggestions, although figuring out whether a suggestion is sensible is beyond me.

Short version: my condolences on the job assignment. :-(
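For what it's worth, here is a rough sketch of that longest-match loop. To keep it self-contained it assumes a tiny in-memory hash in place of aspell or ispell; the word list and the names `is_word` and `split_greedy` are made up for illustration, and `is_word` is the one spot where a real spell-checker query would go.

```perl
use strict;
use warnings;

# Toy word list standing in for a real dictionary (an assumption for this
# sketch; the post suggests asking aspell or ispell instead).
my %dict = map { $_ => 1 }
    qw(a an down stairs downstairs go going in into to the we went);

# Hypothetical helper: replace the hash lookup with a spell-checker call.
sub is_word {
    my ($candidate) = @_;
    return exists $dict{ lc $candidate };
}

# Greedy longest-match: try the whole remaining string, drop the last
# character until the prefix is a known word, peel it off, and repeat.
sub split_greedy {
    my ($text) = @_;
    my @words;
    while ( length $text ) {
        my $len = length $text;
        $len-- while $len > 0 && !is_word( substr( $text, 0, $len ) );
        return if $len == 0;    # no known word starts here; give up
        push @words, substr( $text, 0, $len, '' );    # remove the word from the front
    }
    return @words;
}

print join( ' ', split_greedy('wewentdownstairs') ), "\n";
# prints "we went downstairs"
```

Note how "downstairs" comes out as one word even though "down" and "stairs" are both in the list, which is the compound-word pitfall in action. The sketch also never backtracks: if a greedy choice leaves a remainder that starts with no known word, it just returns an empty list.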