http://www.perlmonks.org?node_id=947136


in reply to Re^6: putting text into array word by word
in thread putting text into array word by word

$this_word =~ s/[[:punct:]]//g;

The only problem with that approach is that it removes internal punctuation (i.e. apostrophes) as well, so that I'll becomes Ill, she'd becomes shed, etc. ('Why was Virginia Woolf so obsessed with sheds?' I hear someone ask.)

I'd use this instead:

$this_word =~ s/^[[:punct:]]+//;   # Remove leading punctuation.
$this_word =~ s/[[:punct:]]+$//;   # Remove trailing punctuation.
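To see the two substitutions at work on a whole sentence, here is a minimal sketch. The helper name trim_punct and the sample text are made up for illustration, and it assumes plain ASCII quotes and apostrophes.

```perl
use strict;
use warnings;

# Hypothetical helper: strip punctuation only at the edges of a word,
# leaving internal apostrophes (I'll, she'd) alone.
sub trim_punct {
    my ($word) = @_;
    $word =~ s/^[[:punct:]]+//;   # Remove leading punctuation.
    $word =~ s/[[:punct:]]+$//;   # Remove trailing punctuation.
    return $word;
}

# Sample text is an assumption, chosen to have quotes, a comma and a full stop.
my $text = q{"I'll be in the shed," she'd said.};

# Split on whitespace, trim each token, and drop tokens that were
# nothing but punctuation (they come back as empty strings).
my @words = grep { length } map { trim_punct($_) } split ' ', $text;

print join('|', @words), "\n";   # prints: I'll|be|in|the|shed|she'd|said
```

The grep { length } step matters for tokens such as a free-standing dash, which would otherwise end up in the array as an empty string.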

Update: added Virginia Woolf sentence.

Re^8: putting text into array word by word
by jms53 (Monk) on Jan 10, 2012 at 19:47 UTC

    Hadn't thought about that. Good catch, thanks!

      OK, let's extract another worm from the can we've opened. Say we want to count all the times 'John' appears in a text. Given a sentence:

      Was it John or John's brother?

      Should the count be one or two?
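
      Either answer is easy to compute, so the choice comes down to what you want to measure. A rough sketch, assuming straight ASCII apostrophes and a case-sensitive match (the variable names are illustrative only):

```perl
use strict;
use warnings;

my $sentence = "Was it John or John's brother?";

# Count every occurrence of the word, possessive or not.
# The "= () =" idiom forces list context so we get the number of matches.
my $with_possessive = () = $sentence =~ /\bJohn\b/g;

# Count only occurrences that are not followed by 's.
my $bare_only = () = $sentence =~ /\bJohn\b(?!'s)/g;

print "$with_possessive $bare_only\n";   # prints: 2 1
```

      The negative lookahead only excludes the 's form; a contraction such as John'd would still be counted by both patterns.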

        Actually, that is the next part of what I'm going to do.

        However, since I am only interested in words from the English dictionary, it doesn't matter much whether John's is counted as John or not. (export to CSV, open in spreadsheet).

        I would, however, be extremely interested in knowing how you would change this, as I will hit the same kind of issue with contractions (it's, wouldn't, etc.), which I had planned on tackling in a spreadsheet. Then again, I am trying to minimize user interaction.

        Thank you