by ayrnieu (Beadle)
in reply to Replacing Types

First, you need to become a computational linguist.

Articles on arXiv might help; or maybe you could go to college.

I am not a computational linguist, but I would start on this problem by supposing that I am only interested in nouns, including famous personages, as in your two examples. I could then guess that adjacent capitalized words might be proper names, collect them and check them against some reference list of known names, and check the other words against a dictionary. Or maybe all of your sentences are this simple, and you can more easily treat them as Prolog-style predicates. Or maybe I have a small but difficult corpus and can better spend my time making an interface nice enough that I can farm the markup out to bored humans, à la Amazon's Mechanical Turk.
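A rough sketch of that first guess, in Python only for brevity (the same few lines port directly to Perl). Everything named here is made up for illustration: `KNOWN_NAMES` and `DICTIONARY` are tiny hypothetical stand-ins for a real name list and a real word list, and `candidate_names` and `tag` are not from any module.

```python
import re

# Hypothetical stand-ins: a real run would load a proper name list
# and a real dictionary with part-of-speech information.
KNOWN_NAMES = {"Albert Einstein", "Marie Curie"}
DICTIONARY = {"physicist": "noun", "met": "verb", "the": "article"}

def candidate_names(sentence):
    """Return runs of two or more adjacent capitalized words."""
    return re.findall(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b", sentence)

def tag(sentence):
    """Tag known names as 'person', then look up the remaining words."""
    # Keep only the candidates that appear in the reference list.
    names = [n for n in candidate_names(sentence) if n in KNOWN_NAMES]

    # Blank out the recognized names so their parts are not looked up again.
    rest = sentence
    for name in names:
        rest = rest.replace(name, " ")

    tagged = [(name, "person") for name in names]
    for word in re.findall(r"[A-Za-z]+", rest):
        tagged.append((word, DICTIONARY.get(word.lower(), "unknown")))
    return tagged

if __name__ == "__main__":
    for item in tag("Albert Einstein met the physicist Marie Curie"):
        print(item)
```

Note that the capitalization guess also picks up things like a capitalized sentence-initial word followed by a name, which is why the candidates are checked against a list rather than trusted outright. Single-word names are missed entirely by this pattern.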

Or maybe you'll find a nice module under Lingua::EN.

