PerlMonks  

Re^2: Word Frequency in Particular Sentences

by papidave (Monk)
on Mar 28, 2008 at 11:54 UTC (#676953)


in reply to Re: Word Frequency in Particular Sentences
in thread Word Frequency in Particular Sentences

swampyankee++ for noticing the problem with abbreviations. Short of the ability to parse and comprehend grammar, it's going to be very difficult to separate

"We sold the division to MegaTech, Ltd. in Asia last week, who flipped the sale to someone else."
from
"We sold the division to MegaTech Industries. In Asia last week, they flipped the sale to someone else."
other than by the fact that a new sentence is supposed to start with an upper-case letter. There may be examples where the word following an abbreviation is a proper noun, however, and in that case the capital letter tells you nothing, so it's going to be a very hard nut to crack.
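As a rough sketch of that upper-case heuristic, here is how it behaves on the two sentences above. The split_sentences name and the "Mr. Smith" sentence are made up for illustration; the lookahead is only there so the capital letter stays with its sentence.

```perl
use strict;
use warnings;

# Split on a period plus whitespace, but only when an upper-case
# letter follows. The lookahead (?=[A-Z]) leaves that letter in place.
sub split_sentences {
    my ($text) = @_;
    my @sentences = split /[.]\s+(?=[A-Z])/, $text;
    return @sentences;
}

my @examples = (
    # "Ltd. in": lower-case follows, so this stays one sentence.
    'We sold the division to MegaTech, Ltd. in Asia last week, who flipped the sale to someone else.',
    # "Industries. In": upper-case follows, so this becomes two.
    'We sold the division to MegaTech Industries. In Asia last week, they flipped the sale to someone else.',
    # Hypothetical hard case: a proper noun follows the abbreviation,
    # so one sentence is wrongly cut in two at "Mr. Smith".
    'We sold the division to Mr. Smith in Asia last week.',
);

for my $text (@examples) {
    my @sentences = split_sentences($text);
    printf "%d piece(s): %s\n", scalar @sentences, join ' | ', @sentences;
}
```

The third example is the hard nut: nothing short of a list of known abbreviations (or real parsing) will save you there.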

If, however, you only care about the "typical" case (because this is going to be a one-shot tool), you could:

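  1. Split the text on /[.]\s+(?=[A-Z])/ to get sentences. (The lookahead matters: a plain /[.]\s+[A-Z]/ would eat the first letter of every following sentence.)
  2. Grep the sentences for /[aA]sia/, or for /\bAsia\b/ if you don't want the word "asian" to count.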
  3. Split the sentences that pass on ' ' to get words.
  4. Use the words you get from that split as keys to a hash, and increment a count in each bin.
Q.E.D.
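
Put together, the steps above might look something like this. It is a minimal sketch, not tested against real text: the sub name and the sample text are made up, and stripping punctuation and folding case are additions of mine that the steps don't mention.

```perl
use strict;
use warnings;

# Count word frequencies, but only in sentences matching $pattern.
sub word_counts_in_matching_sentences {
    my ($text, $pattern) = @_;

    # Step 1: split into sentences; the lookahead keeps the capital letter.
    my @sentences = split /[.]\s+(?=[A-Z])/, $text;

    # Step 2: keep only the sentences that mention the target word.
    my @hits = grep { /$pattern/ } @sentences;

    my %count;
    for my $sentence (@hits) {
        # Step 3: split ' ' breaks on runs of whitespace.
        for my $word (split ' ', $sentence) {
            # Added nicety (assumption): trim punctuation from both ends.
            $word =~ s/^\W+|\W+$//g;
            next unless length $word;
            # Step 4: one hash bin per (lower-cased) word.
            $count{ lc($word) }++;
        }
    }
    return \%count;
}

my $text = 'We sold the division to MegaTech Industries. '
         . 'In Asia last week, they flipped the sale to someone else. '
         . 'Nobody told the division.';

my $count = word_counts_in_matching_sentences($text, qr/\bAsia\b/);
printf "%-10s %d\n", $_, $count->{$_} for sort keys %$count;
```

Only the middle sentence mentions Asia, so only its words get counted. Pass qr/[aA]sia/ instead if you do want "asian" to count.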

