PerlMonks  

the best way to separate a string into words

by Anonymous Monk
on Jan 19, 2009 at 19:14 UTC (#737371)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, what would be the best (most readable) way to separate a string "word word word word" into an array of words? (Words are separated by space characters.)

Re: the best way to separate a string into words
by jeffa (Chancellor) on Jan 19, 2009 at 19:16 UTC

      I'd go with the \W+ ... but I guess it all depends on how you define a word.

      -derby

        I might too, but the OP defined words as "separated by space characters"

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
      It'd probably be best to split using the magical ' '.
      my $string = " a b c "; # note leading/trailing whitespace
      my @array1 = split /\s+/, $string; # returns '', 'a', 'b', 'c' (4 items)
      my @array2 = split ' ', $string;   # returns 'a', 'b', 'c' (3 items)
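
To make the difference between splitting on \W+ and splitting on ' ' concrete, here is a minimal sketch. The sample string is made up for illustration; it contains a contraction, a comma, a tab and a double space so that the two approaches visibly disagree.

```perl
use strict;
use warnings;

# Hypothetical sample input: a contraction, a comma, a tab and a double space.
my $string = "it's a  dog,\tnot a cat";

# split ' ' breaks on runs of whitespace and ignores leading whitespace.
my @by_space = split ' ', $string;

# split /\W+/ breaks on runs of non-word characters, so the apostrophe
# and the comma act as separators too.
my @by_nonword = split /\W+/, $string;

print join('|', @by_space),   "\n";   # it's|a|dog,|not|a|cat
print join('|', @by_nonword), "\n";   # it|s|a|dog|not|a|cat
```

With ' ', punctuation stays attached to the word ("dog,"); with \W+, the contraction is cut in two. Which result is right depends on how "word" is defined, as noted above.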
Re: the best way to separate a string into words
by borisz (Canon) on Jan 19, 2009 at 19:19 UTC
    split is the easiest way.
    Boris
Re: the best way to separate a string into words
by GrandFather (Sage) on Jan 19, 2009 at 19:46 UTC

    If you are doing natural language processing, you will find the Lingua modules very useful.


    Perl's payment curve coincides with its learning curve.
Re: the best way to separate a string into words
by setebos (Beadle) on Jan 19, 2009 at 20:02 UTC
Reaped: Re: the best way to separate a string into words
by NodeReaper (Curate) on Jan 19, 2009 at 20:02 UTC
Re: the best way to separate a string into words
by Tanktalus (Canon) on Jan 20, 2009 at 00:13 UTC

    Generally, I use Text::ParseWords. But that's because I like giving the user the option to pass in word "some phrase" word and get only three "words" out of it, which allows a form of escaping for spaces that makes sense. This isn't a whole lot more complicated than using split ' ', $string, but it offers a great deal of extra flexibility. Whether you do this depends on whether you want that flexibility.
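
A minimal sketch of the quoting behaviour described above. The post does not say which Text::ParseWords function is used; shellwords is assumed here as one reasonable choice, and the input is the example from the post.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Sample input taken from the post above.
my $input = 'word "some phrase" word';

# shellwords treats a quoted run as one word and strips the quotes.
my @quoted = shellwords($input);

# Plain split sees the quotes as ordinary characters.
my @plain = split ' ', $input;

print scalar(@quoted), ": ", join('|', @quoted), "\n";   # 3: word|some phrase|word
print scalar(@plain),  ": ", join('|', @plain),  "\n";   # 4: word|"some|phrase"|word
```

Text::ParseWords ships with Perl, so nothing extra needs to be installed to try this.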

Re: the best way to separate a string into words (contractions)
by tye (Cardinal) on Jan 20, 2009 at 02:11 UTC

    For one definition of "split" and "words", I'd use:

    my @words= $string =~ /(\w+(?:'\w+)*)/g;

    Which would give you words like qw( split and words I'd use ) not like qw( "split" and "words", ) nor like qw( words I d use ).

    Update: Or even, allow hyphenated-word capturing:

    my @words= $string =~ /(\w+(?:[-']\w+)*)/g;

    - tye        
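
The two patterns above can be compared side by side. This is only a sketch: the sample sentence is made up (it reuses the opening of the post and adds a hyphenated word), and the variable names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical sample text with quotes, a contraction and a hyphenated word.
my $string = q{For one definition of "split" and "words", I'd use a well-known regex.};

# First pattern: word characters, optionally followed by 'more groups.
my @words = $string =~ /(\w+(?:'\w+)*)/g;

# Second pattern: also lets a hyphen join parts of one word.
my @hyphenated = $string =~ /(\w+(?:[-']\w+)*)/g;

print join('|', @words),      "\n";
# For|one|definition|of|split|and|words|I'd|use|a|well|known|regex
print join('|', @hyphenated), "\n";
# For|one|definition|of|split|and|words|I'd|use|a|well-known|regex
```

Both patterns drop the surrounding quotes and punctuation and keep "I'd" whole; only the second keeps "well-known" as a single word.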
