PerlMonks  

Re: Split a sentence into words

by akho (Hermit)
on May 30, 2009 at 04:45 UTC (#767010)


in reply to Split a sentence into words

my @vocabulary = qw(abd abcd abc a bc);
my $sentence = 'abdaabc';
my $pattern = join '|', @vocabulary;
my @words = $sentence =~ /($pattern)/g;

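Note that @vocabulary has to be ordered so that "longer" words come earlier: if word x is a prefix of word y, then y must come before x in the list. Regex alternation tries the alternatives left to right and takes the first one that matches, not the longest one.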
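Rather than ordering the list by hand, the ordering rule can be enforced in code. The following is a minimal sketch, not part of the original post: it sorts by descending length (which guarantees that a word precedes all of its prefixes) and adds quotemeta as a precaution, on the assumption that vocabulary words might contain regex metacharacters.

```perl
use strict;
use warnings;

my @vocabulary = qw(a bc abc abd abcd);   # deliberately in the "wrong" order
my $sentence   = 'abdaabc';

# Longest words first, so a word always precedes any of its prefixes.
# quotemeta guards against regex metacharacters in the vocabulary.
my $pattern = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) or $a cmp $b } @vocabulary;

my @words = $sentence =~ /($pattern)/g;
print "@words\n";   # prints "abd a abc"
```

This only fixes the ordering requirement; it still picks the first alternative that matches at each position and never backs up across words.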

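Update: this does not actually work in general; it works only for some vocabularies. For example, with the vocabulary (abcd, abc, de) it will not split 'abcde' correctly: the regex grabs 'abcd' first and is left with an unmatchable 'e', whereas the correct split is 'abc' followed by 'de'. Things get complicated and computer-sciencey. See bart's and ikegami's replies below.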
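To make the failure concrete, and to show one possible way around it, here is a hedged sketch of a backtracking splitter. It is not taken from bart's or ikegami's replies; the function name split_from and its argument layout are made up for illustration. It tries each vocabulary word at the current offset, recurses on the remainder, and remembers offsets from which no split exists so they are not retried.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an array ref of words that concatenate to the
# part of $sentence starting at $pos, or undef if no such split exists.
# $dead records offsets already known to be dead ends.
sub split_from {
    my ($sentence, $pos, $vocabulary, $dead) = @_;
    return [] if $pos == length $sentence;      # consumed everything: success
    return if $dead->{$pos};                    # known dead end
    for my $word (@$vocabulary) {
        next unless length $word;               # skip empty words (no progress)
        next unless substr($sentence, $pos, length $word) eq $word;
        my $rest = split_from($sentence, $pos + length($word), $vocabulary, $dead);
        return [ $word, @$rest ] if $rest;      # first complete split wins
    }
    $dead->{$pos} = 1;
    return;
}

my @vocabulary = qw(abcd abc de);
my $sentence   = 'abcde';

# The plain alternation takes 'abcd' and then cannot match the leftover 'e'.
my $pattern = join '|', @vocabulary;
my @naive   = $sentence =~ /($pattern)/g;
print "naive: @naive\n";                        # prints "naive: abcd"

# Backtracking rejects 'abcd' (dead end at 'e') and falls back to 'abc'.
my $words = split_from($sentence, 0, \@vocabulary, {});
print "backtracking: @$words\n" if $words;      # prints "backtracking: abc de"
```

The sketch returns only the first complete split it finds, in vocabulary order; a sentence with several valid splits would need a different return convention.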


Re^2: Split a sentence into words
by bart (Canon) on May 30, 2009 at 12:27 UTC
    note that @vocabulary has to be sorted in such a way that "longer" words come earlier
    If you use a module like Regex::PreSuf, it will not only have the same effect (i.e. matching the longest word possible), but will likely also match faster, at least for longer lists and on perls older than 5.10.
