Status of English modules...
by darksym (Beadle) on Mar 30, 2002 at 07:30 UTC
darksym has asked for the
wisdom of the Perl Monks concerning the following question:
Hi, monks, yeanling & bantling:
After looking over CPAN and doing some WAITing, I'm wondering if I missed the big blinking sign that read "English stuff is here!", or something. Is there a unified module that offers a wide variety of English primitives and transforms for Natural Language Processing? For instance, is there something like Text::English, but more extensive? If not, I may be interested in adding my code somewhere appropriate in the tree as a starting point. So far there has been no word from the author of Text::English.
I'm doing a small contract that requires some auto-correlation and such...
Text::English::stem has been invaluable. Thanks Martin Porter, implementors, and others! I've also been thinking of taking advantage of some of the lists at http://wordlist.sourceforge.net/ to hammer out some facilities for future English nightmares.
On an unrelated note, did you know that, according to Google, only a few special places on the web contain the following word sequence: "Bring King Ling ring Bing Ding Sing spring swing"? (Wow, The Phonosemantics of Nasal-Stop Clusters and other music hits.) Can you think of the longest m/[a-z]+ing/ match (where length > 5) that will presumably trip up Porter's Stemmer? The common thread is that in an ugly duckling word the -ing is not a stemmable suffix: it is supposed to cling to the word, unlike the -ing in 'spelling'.
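One way to hunt for such words is sketched below. It is a rough heuristic and not Porter's algorithm: it takes every word longer than 5 letters that ends in -ing, strips the -ing, and keeps the word only if no plausible base form turns up in a lexicon. The sub name suspicious_ing_words, the %lexicon hash, and the tiny word lists are all made up for illustration; a real run would load one of the wordlists instead.

```perl
use strict;
use warnings;

# Return the -ing words (longer than 5 letters) whose -ing does not look like
# a verbal suffix, longest first. $lexicon is a hypothetical hash ref of known
# base words; nothing here comes from Text::English.
sub suspicious_ing_words {
    my ($lexicon, @candidates) = @_;
    my @hits;
    for my $word (map { lc } @candidates) {
        next unless length($word) > 5 && $word =~ /^([a-z]+)ing$/;
        my $stem = $1;
        # Undo consonant doubling: "runn" -> "run". This also turns "spell"
        # into "spel", which is why the plain stem is checked as well.
        (my $undoubled = $stem) =~ s/([a-z])\1$/$1/;
        # Treat the word as a genuine -ing form if any plausible base is known.
        next if grep { exists $lexicon->{$_} } $stem, $stem . 'e', $undoubled;
        push @hits, $word;
    }
    return sort { length($b) <=> length($a) || $a cmp $b } @hits;
}

# Toy data, for illustration only.
my %lexicon = map { $_ => 1 } qw(spell run make cling fling);
my @words   = qw(spelling running making duckling dumpling darling
                 cageling spring clinging);

print join(' ', suspicious_ing_words(\%lexicon, @words)), "\n";
# prints: cageling duckling dumpling darling spring
```

The lexicon decides everything here: a word like 'spring' is flagged only because 'spr' is not a known base, so a bigger wordlist gives fewer false alarms but may also hide real ducklings whose leftover stem happens to be a word.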
Please help me find wordlists that detail English word relationships or other cool language algorithms (I'm no linguist). Thanks my darlings... (And don't go flinging your dumplings at the poor cageling! =] )
P.S. See Martin's Official PorterStemmer page and http://snowball.sourceforge.net/ for more info on stemmers.