
Re: compute the occurrence of words

by vinoth.ree (Monsignor)
on Feb 13, 2013 at 14:05 UTC

in reply to compute the occurrence of words

Also at the moment the code is returning numeric values which I need to exclude.

Then what do you expect from this code? It gives each word and its count.

Re^2: compute the occurrence of words
by BigGer (Novice) on Feb 13, 2013 at 14:17 UTC

    The line $data = <FH>; was an error and I have removed it. I am looking to count the occurrences of each word used in a document, excluding numbers. I hope that clarifies my question. G

      In that case you will also have to define "numbers" :) Integers? Floats? E-notation? Roman numerals? Only ASCII digits, or other Unicode numerals as well?
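      The Unicode question matters in Perl because \d matches any Unicode decimal digit, while [0-9] (or \d with the /a modifier, available since Perl 5.14) matches only ASCII digits. A minimal sketch, using an arbitrary sample token made of ARABIC-INDIC DIGIT THREE (U+0663) followed by an ASCII 3:

```perl
use strict;
use warnings;

# Sample token: U+0663 ARABIC-INDIC DIGIT THREE, then an ASCII "3"
my $token = "\x{0663}3";

# \d matches any Unicode decimal digit ...
my $unicode_digits = $token =~ /^\d+$/    ? 1 : 0;
# ... while [0-9], or \d under the /a modifier, matches only ASCII digits
my $ascii_digits   = $token =~ /^[0-9]+$/ ? 1 : 0;
my $ascii_via_a    = $token =~ /^\d+$/a   ? 1 : 0;

print "\\d: $unicode_digits, [0-9]: $ascii_digits, \\d/a: $ascii_via_a\n";
```

      This prints "\d: 1, [0-9]: 0, \d/a: 0". Note that \w also matches Unicode digits, so a filter written with [0-9] will count a token like this one as a word rather than a number.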

      Let me assume simple integers and floats represented in ASCII (no triad-sep, radix-sep = '.'), so valid numbers include 1234 and 0.23, but not DCVII, 2.34e12 or 1,234,567.00.
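      A quick way to check that definition against the examples just listed is to test each one with an anchored regex. This is only a sketch; the pattern is the same one used in the grep of the loop, written here with a non-capturing group:

```perl
use strict;
use warnings;

# "Number" as assumed: ASCII digits, optionally followed by '.' and more ASCII digits
my $number_re = qr/^[0-9]+(?:\.[0-9]+)?$/;

my %is_number;
for my $token ('1234', '0.23', 'DCVII', '2.34e12', '1,234,567.00') {
    $is_number{$token} = $token =~ $number_re ? 1 : 0;
    printf "%-14s %s\n", $token, $is_number{$token} ? 'number' : 'not a number';
}
```

      Only 1234 and 0.23 are reported as numbers; the Roman numeral, the e-notation and the triad-separated value are not.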

      my %count;
      while (<FH>) {
          $count{lc $_}++ for grep { !m{^[0-9]+(\.[0-9]+)?$} } m/\w+/g;
      }
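      To try the loop without a file at hand, here is a self-contained sketch. The sample text is made up, it reads from an in-memory filehandle ($fh, opened on a string) instead of the bareword FH, and the sort by count is only added for display:

```perl
use strict;
use warnings;

# Made-up sample text; in real use, open the document file instead
my $text = <<'END';
The cat sat on the mat.
The mat cost 12 dollars, or 0.23 of a cat.
END

# Open a read filehandle on the string (in-memory file)
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my %count;
while (my $line = <$fh>) {
    # Extract the \w+ runs, drop the numeric ones, count the rest case-insensitively
    $count{lc $_}++ for grep { !m{^[0-9]+(\.[0-9]+)?$} } $line =~ m/\w+/g;
}
close $fh;

# Most frequent first, ties in alphabetical order
for my $word (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    print "$word: $count{$word}\n";
}
```

      For this sample, 12, 0 and 23 are dropped (0.23 is split at the dot by \w+), ten distinct words remain, and the first line printed is "the: 3".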

      For a complete regular expression for integers and reals, I'd refer you to Regexp::Common (see $RE{num}).
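      If pulling in Regexp::Common (a CPAN module) is not an option, the core module Scalar::Util offers looks_like_number, which asks Perl itself whether a string would be accepted as a number, e-notation included. This is a swapped-in technique, not what Regexp::Common does; a minimal sketch on the same example tokens:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);   # core module, no CPAN install needed

my %numeric;
for my $token ('1234', '0.23', '2.34e12', 'DCVII', '1,234,567.00') {
    # true if Perl would treat the whole string as a number
    $numeric{$token} = looks_like_number($token) ? 1 : 0;
    printf "%-14s %s\n", $token, $numeric{$token} ? 'numeric' : 'not numeric';
}
```

      Be aware that it also accepts strings such as Inf and NaN, so those words would be treated as numbers. It also only classifies whole tokens: with m/\w+/g, a float like 2.34e12 is already split into 2 and 34e12 before any test is applied.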

      update: /me just realized that this is overly complex: . is not included in \w, so the only numbers \w+ can ever match are plain integers without a triad-sep, which reduces the loop line to

      $count{lc $_}++ for grep { !m{^[0-9]+$} } m/\w+/g;
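      The effect of \w lacking '.' is easy to see on a sample line (made up for illustration): the float is split into two integer tokens, both of which the simpler grep drops. The sketch also shows that an anchored match, m/^\w+$/g, yields nothing unless the whole line is a single word:

```perl
use strict;
use warnings;

my $line = "Pi is roughly 3.14, or 22/7.\n";

# \w does not include '.', so "3.14" comes out as "3" and "14"
my @tokens = $line =~ m/\w+/g;

# Every numeric token is now all digits, so no float pattern is needed
my @words = grep { !m{^[0-9]+$} } @tokens;

# Anchored version: ^ and $ force the whole line to be one \w+ run
my @anchored = $line =~ m/^\w+$/g;

print "tokens:   @tokens\n";
print "words:    @words\n";
print "anchored: ", scalar(@anchored), " matches\n";
```

      This prints "Pi is roughly 3 14 or 22 7" as tokens, "Pi is roughly or" as words, and 0 anchored matches.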

      Enjoy, Have FUN! H.Merijn

        Thanks H.Merijn, that's perfect. I will go and read up on hashes. G

      ... count ... but excluding numbers.

      This just confuses me. Can you provide a small input list of words and a corresponding output list showing the non-numeric 'count' you desire for the given input?
