
Re: Split a sentence into words

by akho (Hermit)
on May 30, 2009 at 04:45 UTC


in reply to Split a sentence into words

my @vocabulary = qw(abd abcd abc a bc);
my $sentence = 'abdaabc';
my $pattern = join '|', @vocabulary;
my @words = $sentence =~ /($pattern)/g;

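Note that @vocabulary has to be ordered so that "longer" words come earlier: if word x is a prefix of word y, then y must come before x in the list. Perl's alternation takes the first alternative that matches at a given position, not the longest one, so a prefix listed first would shadow the longer word.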
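A minimal sketch of one way to get that ordering automatically, assuming a sort by descending length is acceptable (any word is longer than its proper prefixes, so it always ends up ahead of them). The quotemeta call is an addition not in the code above; it only matters if vocabulary words can contain regex metacharacters.

```perl
use strict;
use warnings;

my @vocabulary = qw(a abc abcd abd bc);   # deliberately in the "wrong" order
my $sentence   = 'abdaabc';

# Longest words first, so a word is always tried before any of its prefixes.
# quotemeta (an assumption, see above) escapes regex metacharacters.
my $pattern = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) } @vocabulary;

my @words = $sentence =~ /($pattern)/g;
print "@words\n";   # prints "abd a abc"
```

This only fixes the ordering requirement; it still takes the longest word at each position greedily and never reconsiders that choice.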

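Update: This does not actually work in general; it works only for some vocabularies. For example, with the vocabulary (abcd, abc, de) it will not split 'abcde' correctly: the match takes 'abcd' first and then cannot match the remaining 'e'. Things get complicated and computer-sciencey. See bart's and ikegami's replies below.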
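To illustrate the failing case, here is a hedged sketch of a backtracking splitter. It is not the approach from the replies referred to above, just one plain way to do it: try the longest vocabulary word at each offset, and fall back to a shorter one if the rest of the sentence cannot be split. The name split_words and the %dead cache are hypothetical, introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of vocabulary words that exactly
# covers $sentence, or an empty list if no such split exists.
sub split_words {
    my ($sentence, $vocab) = @_;
    my %is_word = map { $_ => 1 } @$vocab;
    my %dead;   # offsets from which no complete split is possible

    my $try;
    $try = sub {
        my ($pos) = @_;
        return [] if $pos == length $sentence;   # consumed everything
        return if $dead{$pos};
        # Try the longest candidate first, then progressively shorter ones.
        for my $len (reverse 1 .. length($sentence) - $pos) {
            my $word = substr $sentence, $pos, $len;
            next unless $is_word{$word};
            my $rest = $try->($pos + $len);
            return [ $word, @$rest ] if $rest;
        }
        $dead{$pos} = 1;                         # remember the dead end
        return;
    };

    my $result = $try->(0);
    undef $try;                                  # break the self-reference
    return $result ? @$result : ();
}

my @words = split_words('abcde', [qw(abcd abc de)]);
print "@words\n";   # prints "abc de"
```

Remembering dead-end offsets keeps the search from re-exploring the same suffix repeatedly. Note that an empty sentence and an unsplittable one both yield an empty list in this sketch.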

Replies are listed 'Best First'.
Re^2: Split a sentence into words
by bart (Canon) on May 30, 2009 at 12:27 UTC
    note that @vocabulary has to be sorted in such a way that "longer" words come earlier
    If you use a module like Regex::PreSuf, not only will it have the same effect, i.e. matching the longest word possible, but it will likely also match faster, at least for longer lists and on perl versions before 5.10.
