Recognizing parts of speech

justinNEE has asked for the wisdom of the Perl Monks concerning the following question:


Re: Recognizing parts of speech
by grep (Monsignor) on May 30, 2003 at 05:33 UTC

Lingua::LinkParser

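use strict;
use warnings;
use Lingua::LinkParser;

# Build a parser with its default options
my $parser = Lingua::LinkParser->new;

# Parse one sentence; each linkage is one possible analysis of it
my $sentence = $parser->create_sentence("I am tired and he is hungry.");
my @linkages = $sentence->linkages;

# Print the ASCII link diagram for every linkage
foreach my $linkage (@linkages) {
    print $parser->get_diagram($linkage);
}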

    +---------------------Xp--------------------+
    |      +-------CC-------+                   |
    +--Wd--+-SX-+--Pa-+     +Wdc+-Ss+--Pa--+    |
    |      |    |     |     |   |   |      |    |
LEFT-WALL I.p am.v tired.a and he is.v hungry.a .
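The word line at the bottom of that diagram already carries part-of-speech information: the link grammar appends suffixes such as .v (verb) and .a (adjective) to words it has classified, as in am.v and tired.a above. Below is a minimal plain-Perl sketch (no parser needed) that pulls those tags back out, assuming you have already taken that last line of the get_diagram output as a string. The helper name tag_words and the suffix table are hypothetical; only the v, a and n suffixes are translated, and anything else (such as the .p on I.p) is passed through unchanged rather than guessed at.

```perl
use strict;
use warnings;

# ASSUMPTION: suffix meanings. Only v/a/n are translated; any other
# suffix (e.g. the "p" in "I.p") is returned unchanged.
my %tag_name = (v => 'verb', a => 'adjective', n => 'noun');

# Hypothetical helper: split the word line of a link-grammar diagram
# into [word, tag] pairs. Tokens with no ".suffix" get an undef tag.
sub tag_words {
    my ($word_line) = @_;
    my @pairs;
    for my $token (split ' ', $word_line) {
        # The walls are parser markers, not words of the sentence
        next if $token eq 'LEFT-WALL' || $token eq 'RIGHT-WALL';
        # A tag is a dot plus lowercase letters at the end of the token;
        # a bare "." has nothing before the dot, so it stays untagged
        if ($token =~ /^(.+)\.([a-z]+)$/) {
            push @pairs, [ $1, $tag_name{$2} // $2 ];
        }
        else {
            push @pairs, [ $token, undef ];
        }
    }
    return @pairs;
}

# Usage: the word line from the diagram above
my @tagged = tag_words('LEFT-WALL I.p am.v tired.a and he is.v hungry.a .');
for my $pair (@tagged) {
    printf "%-8s %s\n", $pair->[0], $pair->[1] // '-';
}

# Just the adjectives
my @adjectives = map { $_->[0] }
                 grep { defined $_->[1] && $_->[1] eq 'adjective' } @tagged;
print "adjectives: @adjectives\n";    # prints "adjectives: tired hungry"
```

Parsing the text output keeps the sketch independent of the module's internals; the module's own linkage and word accessors, documented with L::LP, are likely the sturdier route for real use.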

Besides the docs for L::LP (Lingua::LinkParser), you can look at my module Acme::Yoda for examples.

grep
Mynd you, mønk bites Kan be pretti nasti...


Re: Re: Recognizing parts of speech

by CountZero (Bishop) on May 30, 2003 at 06:54 UTC

Is there no end to the useful modules on CPAN? Soon there will be no more wheels to re-invent!

CountZero

"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law


Re: Re: Re: Recognizing parts of speech

by artist (Parson) on May 30, 2003 at 10:49 UTC

That said, new devices and software will always encourage new modules.

artist


Re: Re: Re: Recognizing parts of speech

by Anonymous Monk on May 30, 2003 at 17:24 UTC

Unfortunately, natural language parsing still has a lot of reinventing to do. While this module will work with simple sentences, I can promise you that it will frequently break and give you misinformation on real-world texts. I work for a company that specializes in natural language processing, and I can tell you that there is NO technology out there that will give 100% accurate NLP results on real-world texts, and anything that gets close will not be free. That said, if your texts are fairly structured and use simple sentences, a link parser will probably do okay.


Re: Re: Re: Re: Recognizing parts of speech

by chaoticset (Chaplain) on May 30, 2003 at 19:48 UTC

Re: Re: Re: Recognizing parts of speech

by chaoticset (Chaplain) on May 30, 2003 at 15:45 UTC


Re: Recognizing parts of speech
by graff (Chancellor) on May 31, 2003 at 04:05 UTC

I'm not asking for the sake of figuring out what sort of algorithm will address the problem. My point is simply to demonstrate why the skepticism expressed by an Anonymous Monk elsewhere in this thread is well-deserved. Even if your scoring plans have principled answers for things like conjoined head nouns, negation, empty trace slots, and noun phrases referring to non-entities, building a parser that associates adjectives with noun phrases the way people do is a science that is still in its infancy.

(A handful of NLP researchers have been moving it into "adolescence"; you can look up papers by Eugene Charniak on automatic parsers, though I don't know whether his source code is available. You can also check the CORPORA listserv archives for information on open-source or otherwise free parsers.)

I have not tried Lingua::LinkParser, so I don't know what it would do on my examples, or whether its output would meet your needs on such examples. If you have the time, it's worth a try, I'm sure. But if it's important to get the scoring done reasonably well in accordance with your designs, have a fall-back plan that optimizes the use of human scorers.

Sentences that contain none of your listed adjectives can be scored automatically; those that contain one or more adjectives and only one pronoun (and not much else) should also be easy to automate. Those that have one or more adjectives and two or more pronouns or other noun phrases need to be reviewed manually, whether or not you choose to hypothesize a score with a perl script.
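That triage rule can be sketched in a few lines of Perl. This is a rough illustration under stated assumptions: the adjective and pronoun lists are hypothetical placeholders, words are found with a simple regex rather than a parser, and only pronouns are counted, since spotting other noun phrases would need a real parser. Sentences with adjectives but no pronoun at all are not covered by the rule, so the sketch sends them to manual review as the cautious default.

```perl
use strict;
use warnings;

# ASSUMPTION: placeholder word lists; substitute your own adjective list.
my %adjectives = map { $_ => 1 } qw(tired hungry happy sad angry);
my %pronouns   = map { $_ => 1 } qw(i me you he him she her it we us they them);

# Sort a sentence into a review bucket:
#   no listed adjectives              -> 'auto'   (score automatically)
#   adjectives and exactly 1 pronoun  -> 'easy'   (simple to automate)
#   adjectives and 0 or 2+ pronouns   -> 'manual' (needs a human)
# Only pronouns are counted; detecting other noun phrases needs a parser.
sub triage {
    my ($sentence) = @_;
    my @words = map { lc($_) } ($sentence =~ /([A-Za-z]+)/g);
    my $adj_count  = grep { $adjectives{$_} } @words;   # grep in scalar context = count
    my $pron_count = grep { $pronouns{$_} } @words;
    return 'auto' if $adj_count == 0;
    return 'easy' if $pron_count == 1;
    return 'manual';
}

for my $s ('It rained all day.', 'I am tired.', 'I am tired and he is hungry.') {
    printf "%-30s %s\n", $s, triage($s);
}
```

Even the 'easy' bucket deserves spot checks, for the reasons given above.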


Re: Re: Recognizing parts of speech

by aquarium (Curate) on May 31, 2003 at 12:34 UTC

It really depends on what you want to do with the scores, i.e. how accurate is good enough. For a rough but still fairly usable scoring system, you could use averages instead: count how many times "I" appears in the text versus how many times other pronouns appear, and multiply each part of this ratio by the average score of the adjectives counted. Well, actually, the ratio alone will suffice for some kind of result.
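A minimal sketch of that averaging approach, assuming a hypothetical table of adjective scores and a hypothetical list of "other" pronouns (substitute your own). It counts "I" against the other pronouns, averages the scores of whatever listed adjectives appear, and returns the raw ratio plus each side of the ratio multiplied by that average. The ratio is left undefined when there are no other pronouns, to avoid dividing by zero.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# ASSUMPTION: hypothetical adjective scores and "other" pronoun list.
my %adj_score      = (happy => 2, tired => -1, hungry => -1, sad => -2);
my %other_pronouns = map { $_ => 1 } qw(you he him she her it we us they them);

# Rough averaging score for a whole text:
#   ratio      = count of "I" / count of other pronouns
#   i_part     = count of "I" * average adjective score
#   other_part = count of other pronouns * average adjective score
sub rough_score {
    my ($text) = @_;
    my @words = map { lc($_) } ($text =~ /([A-Za-z]+)/g);
    my $i_count     = grep { $_ eq 'i' } @words;
    my $other_count = grep { $other_pronouns{$_} } @words;
    my @scores = map { $adj_score{$_} } grep { exists $adj_score{$_} } @words;
    my $avg = @scores ? sum(@scores) / @scores : 0;    # 0 if no scored adjectives
    return {
        ratio      => $other_count ? $i_count / $other_count : undef,
        i_part     => $i_count * $avg,
        other_part => $other_count * $avg,
    };
}

my $r = rough_score('I am happy. I think you are happy.');
printf "ratio=%s i_part=%s other_part=%s\n",
    $r->{ratio}, $r->{i_part}, $r->{other_part};    # prints "ratio=2 i_part=4 other_part=2"
```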



	PerlMonks