PerlMonks  

Re: Splitting strings into words when there are no separators

by Tanktalus (Canon)
on Sep 14, 2005 at 14:32 UTC ( [id://491883]=note )


in reply to Splitting strings into words when there are no separators

The most straightforward approach, coming from a person who has never done this before (which probably makes it a very naive approach), is to take the whole string and pass it to a spell checker, such as aspell or ispell. If the spell checker says "no", remove the last character and try again. Repeat until it says "yes": that's the longest possible word starting from the beginning. Remove that word from the front of the string, then repeat the entire process on what's left until you've removed everything.
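The loop above can be sketched roughly as follows. This is only an illustration in Python, not tested against a real spell checker: the `WORDS` set and the `is_word` helper are hypothetical stand-ins for whatever aspell or ispell would answer, and `greedy_split` is a made-up name.

```python
# Hypothetical stand-in for a spell checker: a tiny hard-coded word set.
WORDS = {"down", "stairs", "downstairs", "in", "to", "into",
         "the", "cat", "went"}


def is_word(candidate):
    """Return True if the stand-in 'spell checker' accepts the candidate."""
    return candidate in WORDS


def greedy_split(text):
    """Split text by repeatedly taking the longest accepted prefix."""
    words = []
    while text:
        # Start with the whole remaining string and drop the last
        # character until the spell checker says yes.
        end = len(text)
        while end > 0 and not is_word(text[:end]):
            end -= 1
        if end == 0:
            # No prefix is a known word, so the greedy method is stuck.
            raise ValueError(f"no known word at start of {text!r}")
        words.append(text[:end])
        text = text[end:]
    return words


print(greedy_split("thecatwentdownstairs"))
```

With this word set the call prints `['the', 'cat', 'went', 'downstairs']`. Note that the sketch simply gives up when no prefix matches; it never goes back to try a shorter earlier word.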

Pitfalls: because the longest match always wins, compound words such as "downstairs" or even "into" are preferred over their parts ("down stairs", "in to"), whether or not that's what was meant; the brute-force method is probably as slow as it gets; and figuring out where the word breaks really belong requires a knowledge of grammar, which a spell checker doesn't have.
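To see how much ambiguity the compound-word pitfall leaves, here is a small sketch that lists every way a string can be cut into known words. Again, the `WORDS` set is a hypothetical stand-in for a spell checker and `all_splits` is a made-up name; this is an illustration of the problem, not a fix for it.

```python
# Hypothetical stand-in for a spell checker: a tiny hard-coded word set.
WORDS = {"down", "stairs", "downstairs", "in", "to", "into", "go"}


def all_splits(text):
    """Return every way to cut text into words from WORDS."""
    if not text:
        return [[]]  # exactly one way to split nothing: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in WORDS:
            # Keep this word and split the remainder every possible way.
            for rest in all_splits(text[end:]):
                results.append([head] + rest)
    return results


for split in all_splits("gointodownstairs"):
    print(" ".join(split))
```

Every line it prints is made of valid words as far as the dictionary is concerned, so choosing among them is exactly the part that needs grammar. The recursion is exponential in the worst case, so treat it as a demonstration only.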

Note that I've heard you can open a pipe to aspell so that you're not re-running it for every check. That should give you the speed of loading the dictionary into a hash, with C instead of perl doing the scanning, while reducing your development time and probably the runtime (since it may be able to skip loading parts of the dictionary file). And, with the right spell checker, it may offer suggestions, although figuring out whether a suggestion is sensible is beyond me.
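A rough sketch of the pipe idea, assuming aspell is installed, on the PATH, has a default dictionary, and is started in its ispell-compatible pipe mode (`aspell -a`). In that mode a reply line starting with `*`, `+` or `-` means the word was accepted, `&` means rejected with suggestions, and `#` means rejected with none. The `Speller` class and `parse_reply` helper are hypothetical names, and the sketch has not been run against every aspell version.

```python
import subprocess


def parse_reply(line):
    """Interpret one reply line from ispell-style pipe mode.

    Returns (accepted, suggestions).
    """
    if line[:1] in ("*", "+", "-"):
        return True, []
    if line.startswith("&") and ": " in line:
        # Format: & original count offset: first, second, ...
        return False, line.split(": ", 1)[1].split(", ")
    return False, []  # '#' (or anything else): rejected, no suggestions


class Speller:
    """Keep one aspell process alive and feed it a word at a time."""

    def __init__(self):
        self.proc = subprocess.Popen(
            ["aspell", "-a"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,
        )
        self.proc.stdout.readline()  # discard the version banner

    def check(self, word):
        # A leading '^' tells the checker to treat the line as text,
        # not as a pipe-mode command.
        self.proc.stdin.write("^" + word + "\n")
        self.proc.stdin.flush()
        reply = self.proc.stdout.readline().rstrip("\n")
        # Each reply ends with a blank line; read up to and including it.
        line = reply
        while line.strip():
            line = self.proc.stdout.readline()
        return parse_reply(reply)

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()
```

Usage would look something like `speller = Speller()`, then `ok, suggestions = speller.check("downstairs")` for each candidate, and `speller.close()` at the end, so the dictionary is loaded once rather than once per check.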

Short version: my condolences on the job assignment. :-(
