PerlMonks  

the best way to separate a string into words

by Anonymous Monk
on Jan 19, 2009 at 19:14 UTC
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi, what would be the best (easiest to read) way to separate a string "word word word word" into an array of words? (Words are separated by space characters.)

Re: the best way to separate a string into words
by jeffa (Chancellor) on Jan 19, 2009 at 19:16 UTC

      I'd go with the \W+ ... but I guess it all depends on how you define a word.

      -derby

        I might too, but the OP defined words as "separated by space characters"

        jeffa

        L-LL-L--L-LL-L--L-LL-L--
        -R--R-RR-R--R-RR-R--R-RR
        B--B--B--B--B--B--B--B--
        H---H---H---H---H---H---
        (the triplet paradiddle with high-hat)
        
      It'd probably be best to split using the magical ' '.
      my $string = " a b c "; # note leading/trailing whitespace
      my @array1 = split /\s+/, $string; # returns '', 'a', 'b', 'c' (4 items)
      my @array2 = split ' ', $string;   # returns 'a', 'b', 'c' (3 items)
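      To see how the separators mentioned in this subthread differ on text that contains punctuation, here is a small sketch. The sample string and variable names are made up for illustration and are not from any poster.

```perl
use strict;
use warnings;

# Hypothetical sample input with punctuation and ragged whitespace.
my $string = "  Hello, world!  It's Perl. ";

# split ' ' skips leading whitespace and splits on runs of whitespace,
# so punctuation stays attached to the words.
my @by_space = split ' ', $string;

# split /\W+/ splits on runs of non-word characters, so punctuation is
# dropped and the apostrophe breaks "It's" into two fields.
my @by_nonword = split /\W+/, $string;

print scalar(@by_space),   " fields: ", join('|', @by_space),   "\n";
print scalar(@by_nonword), " fields: ", join('|', @by_nonword), "\n";
```

      Because the string starts with whitespace, the /\W+/ version also produces an empty leading field, just as /\s+/ does in the example above. Which result counts as correct depends on how you define a word.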
Re: the best way to separate a string into words
by borisz (Canon) on Jan 19, 2009 at 19:19 UTC
    split is the easiest way.
    Boris
Re: the best way to separate a string into words
by GrandFather (Cardinal) on Jan 19, 2009 at 19:46 UTC

    If you are doing natural language processing, you will find the Lingua modules very useful.


    Perl's payment curve coincides with its learning curve.
Re: the best way to separate a string into words
by setebos (Beadle) on Jan 19, 2009 at 20:02 UTC
Re: the best way to separate a string into words
by Tanktalus (Canon) on Jan 20, 2009 at 00:13 UTC

    Generally, I use Text::ParseWords, but that's because I like letting the user pass in word "some phrase" word and get only three "words" out of it, which allows a sensible form of escaping for spaces. This isn't much more complicated than using split ' ', $string, but it offers a lot of extra flexibility. Whether you do this depends on whether you want that flexibility.
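    A minimal sketch of the difference, assuming the shellwords function from Text::ParseWords is the entry point meant here (the post does not say which of the module's functions is used):

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# The example input from the post: a quoted phrase between two plain words.
my $string = 'word "some phrase" word';

# Plain split treats the quotes as ordinary characters.
my @plain = split ' ', $string;

# shellwords keeps the quoted phrase together and strips the quotes.
my @quoted = shellwords($string);

print scalar(@plain),  " fields from split\n";
print scalar(@quoted), " fields from shellwords\n";
print "[$_]\n" for @quoted;
```

    Here split returns four fields, with the quote characters still attached, while shellwords returns three.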

Re: the best way to separate a string into words (contractions)
by tye (Cardinal) on Jan 20, 2009 at 02:11 UTC

    For one definition of "split" and "words", I'd use:

    my @words= $string =~ /(\w+(?:'\w+)*)/g;

    Which would give you words like qw( split and words I'd use ) not like qw( "split" and "words", ) nor like qw( words I d use ).

    Update: Or even, allow hyphenated-word capturing:

    my @words= $string =~ /(\w+(?:[-']\w+)*)/g;
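    As a quick check of both patterns, here is a sketch run against a made-up sentence (adapted from the wording above, with a hyphenated word added for illustration):

```perl
use strict;
use warnings;

# Hypothetical test sentence containing quotes, a contraction, and a hyphen.
my $string = q{For one definition of "split" and "words", I'd use a well-known idiom.};

# Word characters, optionally followed by apostrophe-joined parts.
my @simple = $string =~ /(\w+(?:'\w+)*)/g;

# Same idea, but hyphens may also join parts of a word.
my @hyphenated = $string =~ /(\w+(?:[-']\w+)*)/g;

print join('|', @simple),     "\n";
print join('|', @hyphenated), "\n";
```

    The first pattern keeps I'd whole but splits well-known into two words; the second keeps both intact.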

    - tye        
