I am looking to count the occurrences of each word used in a document but excluding numbers. Hope that clarifies my question. G
In which case you will also have to define "numbers" :) Integers? Floats? E-notation? Roman numerals? Only ASCII digits, or also other Unicode numerals?
Let me assume simple integers and floats represented in ASCII (no triad-sep, radix-sep = '`.`'), so valid numbers include `1234` and `0.23`, but not `DCVII`, `2.34e12` or `1,234,567.00`:
`my %count;
while (<FH>) {
    $count{lc $_}++ for grep { !m{^[0-9]+(\.[0-9]+)?$} } m/\w+/g;
}
`
For a complete regular expression to match integers and reals, I'd like to refer to Regexp::Common (see `$RE{num}`).
**update**: /me just realized that this is overly complex: `\w+` can **only** match integers without a triad-sep, as `.` is not included in `\w`, so `0.23` is already split into `0` and `23` before the grep sees it. That reduces the loop line to
`$count{lc $_}++ for grep { !m{^[0-9]+$} } m/\w+/g;
`
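For anyone who wants to try this without a file at hand, here is a minimal self-contained sketch of the simplified loop. The sub name `count_words` and the sample text are made up for illustration, and the in-memory filehandle just stands in for `FH`; it assumes the same definition of "number" as above (runs of ASCII digits only).

```perl
use strict;
use warnings;

# Count each word read from a filehandle, skipping tokens that consist
# only of ASCII digits. (count_words is a hypothetical helper name.)
sub count_words {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        # \w+ never spans '.', so "0.23" arrives as "0" and "23";
        # both are dropped by the grep.
        $count{lc $_}++ for grep { !/^[0-9]+$/ } $line =~ /\w+/g;
    }
    return \%count;
}

# Hypothetical sample input, read through an in-memory filehandle.
my $text = "The cat saw 3 cats.\nthe price was 0.23 in 1999\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $count = count_words($fh);
close $fh;

# Print the words alphabetically with their counts.
printf "%-6s %d\n", $_, $count->{$_} for sort keys %$count;
```

Note that mixed tokens such as `abc123` or `foo_bar` still count as words here, since `\w` includes digits and the underscore; tighten the grep if that is not what you want.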
Enjoy, Have FUN! H.Merijn
