Re: Split a sentence into words

by akho (Hermit)
on May 30, 2009 at 04:45 UTC (#767010)

in reply to Split a sentence into words

    my @vocabulary = qw(abd abcd abc a bc);
    my $sentence = 'abdaabc';
    my $pattern = join '|', @vocabulary;
    my @words = $sentence =~ /($pattern)/g;

Note that @vocabulary has to be sorted so that "longer" words come earlier; more precisely, if word x is a prefix of word y, then y must come before x in the list.
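One simple way to satisfy that ordering rule is to sort the vocabulary by descending length, since a word is always longer than any of its proper prefixes. The following is a minimal sketch of that idea, not code from the original post: the helper name split_greedy is made up for illustration, and the quotemeta call is an added precaution in case the vocabulary contains regex metacharacters. The match is still greedy, so the ordering fixes only the prefix problem.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): greedy split using an
# alternation whose branches are ordered longest-first.
sub split_greedy {
    my ($sentence, @vocabulary) = @_;

    # Longest words first, so every word is tried before any of its prefixes.
    # quotemeta escapes regex metacharacters that may appear in a word.
    my $pattern = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) } @vocabulary;

    # In list context, /g returns every captured word in order.
    return $sentence =~ /($pattern)/g;
}

my @words = split_greedy('abdaabc', qw(a bc abc abd abcd));
print "@words\n";    # prints "abd a abc"
```

The input vocabulary here is deliberately given shortest-first to show that the sort, not the caller, establishes the order.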

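Update: This does not actually work in general; it works only for some vocabularies. For example, with the vocabulary (abcd, abc, de) it will not split 'abcde' correctly: the regex consumes 'abcd' first, and the remaining 'e' matches no word. Things get complicated and computer-sciencey. See bart's and ikegami's replies below.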
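The failure described in the update comes from the match never reconsidering a word once it has been consumed. A minimal sketch of one way around that is a recursive search that backtracks when the remainder of the sentence cannot be split. This is an illustration only and not necessarily the approach taken in the replies; the function name segment and its interface are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns an array ref holding the first complete
# segmentation found, or undef when the sentence cannot be split at all.
sub segment {
    my ($sentence, $vocab) = @_;

    # An empty sentence is trivially segmented into zero words.
    return [] if $sentence eq '';

    for my $word (@$vocab) {
        next if $word eq '';                        # avoid infinite recursion
        next unless index($sentence, $word) == 0;   # word must be a prefix

        # Try to segment what is left; fall through to the next word on failure.
        my $rest = segment(substr($sentence, length $word), $vocab);
        return [ $word, @$rest ] if $rest;
    }

    return;    # no word leads to a complete segmentation
}

my $words = segment('abcde', [qw(abcd abc de)]);
print $words ? "@$words\n" : "no segmentation\n";    # prints "abc de"
```

Without memoization this search can take exponential time on unlucky inputs; caching results per remaining suffix would be the usual remedy.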

Replies are listed 'Best First'.
Re^2: Split a sentence into words
by bart (Canon) on May 30, 2009 at 12:27 UTC
    note that @vocabulary has to be sorted in such a way that "longer" words come earlier
    If you use a module like Regexp::PreSuf, not only will it have the same effect, i.e. matching the longest word possible, but it will likely also match faster, at least for longer lists and on perls before 5.10.
