
Re^4: putting text into array word by word

by jms53 (Monk)
on Jan 09, 2012 at 18:55 UTC (#947059)

in reply to Re^3: putting text into array word by word
in thread putting text into array word by word

PERFECT, Thank you sooo much

I can now go back to calmly pseudo-geeking.

Re^5: putting text into array word by word
by Not_a_Number (Prior) on Jan 09, 2012 at 20:11 UTC

    OK, now that's solved, let's look at the definition of 'word' (yes, things are going to get hairy...). Take this sentence, for example:

    "No, he said."

    The 'words' that your current code would extract are:

    "No,
    he
    said."

    Is that really what you want? Or would you prefer:

    No    # or, better, 'no'
    he
    said


      During the foreach loop I remove punctuation and set the word to lowercase (so that No and no are the same word).

      while (<FILE>) {
          my @these_words = split(' ', $_);
          foreach my $this_word (@these_words) {
              $this_word =~ s/[[:punct:]]//g;
              $this_word = lc($this_word);
              push @all_words, $this_word;
          }
      }
        $this_word =~ s/[[:punct:]]//g;

        The only problem with that approach is that it removes internal punctuation (i.e. apostrophes) as well, so that I'll becomes ill, she'd becomes shed, etc. ('Why was Virginia Woolf so obsessed with sheds?' I hear someone ask.)

        I'd use this instead:

        $this_word =~ s/^[[:punct:]]+//; # Remove leading punct.
        $this_word =~ s/[[:punct:]]+$//; # Remove trailing punct.

        Update: added Virginia Woolf sentence.
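
For anyone who wants to try it, here is a minimal sketch of the trim-both-ends idea applied to a whole line. The helper name normalize_word and the sample sentence are made up for this illustration; they are not from the code earlier in the thread, and the extra check for empty strings is my own assumption about what you would want.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative only): strip punctuation
# from both ends of a token, keep internal punctuation, then lowercase.
sub normalize_word {
    my ($word) = @_;
    $word =~ s/^[[:punct:]]+//;    # Remove leading punct.
    $word =~ s/[[:punct:]]+$//;    # Remove trailing punct.
    return lc $word;
}

my $line = q{"No, I'll stay," she'd said.};    # made-up sample input
my @all_words;
for my $token ( split ' ', $line ) {
    my $word = normalize_word($token);
    # A token made only of punctuation (e.g. "--") ends up empty; skip it.
    push @all_words, $word if length $word;
}

print join( '|', @all_words ), "\n";    # prints: no|i'll|stay|she'd|said
```

The apostrophes in I'll and she'd survive because both substitutions are anchored (^ and $), so only punctuation at the edges of each token is removed.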
