Re: testing parts of a string against a word database

by TomDLux (Vicar) on Dec 01, 2011 at 00:33 UTC

in reply to testing parts of a string against a word database

I normally complain when people reach for features like regexes where simpler mechanisms are available. In this case, though, I think you are over-simplifying by using substr() when you could batch-process the whole line. I also see that you collect the punctuation at the top, although you don't do anything with it; maybe that's a bit of code you cleared away as not relevant to the problem.

What I would consider is merging the punctuation regex into the step that splits the line into words: use split to partition the line on non-word characters, that is, anything that is not a letter, not a digit, and not an underscore. If that's too generous, you can use a more specific character class.

my @words = split /\W+/, $sen;    # \W+ so a run such as ", " is one separator, not two
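A minimal sketch of why the `+` matters, assuming a sample sentence of my own: with a single `\W`, every extra punctuation or space character between two words yields an empty field. The helper name `split_words` is hypothetical, and the `grep { length }` filter is an addition that drops the leading empty field split returns when a line starts with punctuation.

```perl
use strict;
use warnings;

# Hypothetical helper: split a sentence into words on runs of non-word
# characters (anything that is not a letter, digit, or underscore).
# grep { length } removes the leading empty field that split returns
# when the sentence begins with a non-word character.
sub split_words {
    my ($sen) = @_;
    return grep { length } split /\W+/, $sen;
}

my $sen = "The cat, the dog... and the bird!";

# Single \W: ", " and "... " each produce empty fields between the words.
my @naive = split /\W/, $sen;
printf "naive: %d fields, %d empty\n",
    scalar(@naive), scalar(grep { !length } @naive);

# \W+: each run of separators counts once.
my @words = split_words($sen);
print "words: ", join('|', @words), "\n";
```

Note that split already discards trailing empty fields, which is why the final "!" does not add one in either version.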

Also, how many NOUNS are you dealing with? If it's only a few million, I would read them into a hash once and check each word against the hash. Reading the file dozens, hundreds, or thousands of times is ghastly slow, while a few megabytes for the hash is not excessively painful. Maybe you can save a copy of nouns.txt with one word per line, or save it as a YAML file or some other format that loads quickly as a Perl data structure.
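A minimal sketch of the load-once, look-up-many approach, assuming nouns.txt has one noun per line and that matching should ignore case (both assumptions, not part of the original question). The names `load_nouns` and `find_nouns` are hypothetical; an in-memory filehandle stands in for the real file so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical loader: read the word list once into a hash so each later
# check is a constant-time exists() instead of another pass over the file.
# Assumes one noun per line.
sub load_nouns {
    my ($fh) = @_;
    my %noun;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # strip newline and surrounding whitespace
        next unless length $line;   # skip blank lines
        $noun{ lc $line } = 1;      # assumption: case-insensitive matching
    }
    return \%noun;
}

# Hypothetical lookup: keep only the words that appear in the noun hash.
sub find_nouns {
    my ($noun, @words) = @_;
    return grep { exists $noun->{ lc $_ } } @words;
}

# Demo list; in real use this would be something like
#   open my $fh, '<', 'nouns.txt' or die "nouns.txt: $!";
my $list = "cat\ndog\nbird\n";
open my $fh, '<', \$list or die "Cannot open in-memory list: $!";
my $noun = load_nouns($fh);
close $fh;

my @words = grep { length } split /\W+/, "The cat, the dog... and the bird!";
print join(', ', find_nouns($noun, @words)), "\n";
```

If even the one-time load becomes slow, the core Storable module's `store` and `retrieve` functions are one way to cache the finished hash on disk as a Perl data structure, in the spirit of the YAML suggestion.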

As Occam said: Entia non sunt multiplicanda praeter necessitatem (entities should not be multiplied beyond necessity).

Re^2: testing parts of a string against a word database
by Rudolf (Pilgrim) on Dec 01, 2011 at 01:45 UTC

    I'm learning a lot from your help, much appreciated Tom!