PerlMonks  

Recognizing parts of speech

by justinNEE (Monk)
on May 30, 2003 at 04:39 UTC ( [id://261762]=perlquestion )

justinNEE has asked for the wisdom of the Perl Monks concerning the following question:

The background:
We are trying to score text passages. We have a list of adjectives and the score associated with each one. So if "tired" is a level 1 adjective and "hungry" is a level 2 adjective, the sentence "I am tired and hungry" would receive a score of "3".
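
A minimal sketch of the basic scoring described above, assuming a hash of adjective scores. The `%ADJ_SCORE` table and the `score_passage` name are made up for illustration; the real adjective list is not shown in this post.

```perl
use strict;
use warnings;

# Hypothetical score table; the real list of adjectives and levels is not shown.
my %ADJ_SCORE = (tired => 1, hungry => 2);

# Sum the scores of every listed adjective that appears in the passage.
sub score_passage {
    my ($text) = @_;
    my $total = 0;
    for my $word (lc($text) =~ /[a-z']+/g) {
        $total += $ADJ_SCORE{$word} // 0;   # unlisted words score 0
    }
    return $total;
}

print score_passage("I am tired and hungry"), "\n";   # prints 3
```

This only covers the single-score case; it says nothing about who each adjective refers to.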

Complication:
There are actually 2 scores per text passage: a "self" score and an "other" score. All the points from adjectives referring to self go to the self score, while the points from adjectives referring to the other person go to the other score. So the sentence:
"I am tired and he is hungry"
is scored:
self=1 , other=2

The problem:
We would like to hear about any algorithms/work/modules/gossip related to how one would go about determining which adjectives belong with "self" or "other", keeping in mind that we already have a list of all the acceptable adjectives.

Thank you monks for your advice.

Replies are listed 'Best First'.
Re: Recognizing parts of speech
by grep (Monsignor) on May 30, 2003 at 05:33 UTC
    Lingua::LinkParser should be able to get you there.
    use Lingua::LinkParser;

    our $parser = new Lingua::LinkParser;
    my $sentence = $parser->create_sentence("I am tired and he is hungry.");
    my @linkages = $sentence->linkages;
    foreach my $linkage (@linkages) {
        print ($parser->get_diagram($linkage));
    }
    Will get you a parse diagram like:
        +---------------------Xp--------------------+
        |      +-------CC-------+                   |
        +--Wd--+-SX-+--Pa-+     +Wdc+-Ss+--Pa--+    |
        |      |    |     |     |   |   |      |    |
    LEFT-WALL I.p am.v tired.a and he is.v hungry.a .
    Now, the get_diagram method is meant for human reading, but L::LP does let you get at all of that programmatically. You can determine whether each word is a subject, verb, adjective, ... and what it relates to.
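
To illustrate the idea without depending on the module's API, here is a rough sketch that works on plain data. The `[label, left word, right word]` triples are transcribed by hand from the diagram above (only the subject and predicate-adjective links are kept); they are not the output of any real L::LP call, and the `adjective_subjects` helper is hypothetical. It assumes that `S*` links join a subject to its verb and `Pa` links join a verb to a predicate adjective, as the diagram suggests.

```perl
use strict;
use warnings;

# Hand-transcribed from the diagram: [label, left word, right word].
# Assumption: this is illustrative data, not what the module returns.
my @links = (
    [ 'SX', 'I.p',  'am.v'     ],
    [ 'Pa', 'am.v', 'tired.a'  ],
    [ 'Ss', 'he',   'is.v'     ],
    [ 'Pa', 'is.v', 'hungry.a' ],
);

# Map each predicate adjective to the subject of the verb it attaches to.
# Words are used as keys here; real code would key on word position,
# since the same verb can occur twice in one sentence.
sub adjective_subjects {
    my (@links) = @_;
    my %subject_of_verb;
    for my $l (@links) {
        my ($label, $left, $right) = @$l;
        $subject_of_verb{$right} = $left if $label =~ /^S/;   # subject -> verb
    }
    my %subject_of_adj;
    for my $l (@links) {
        my ($label, $left, $right) = @$l;
        next unless $label =~ /^Pa/;                          # verb -> adjective
        (my $adj  = $right) =~ s/\.\w+$//;                    # drop the ".a" tag
        (my $subj = $subject_of_verb{$left} // 'unknown') =~ s/\.\w+$//;
        $subject_of_adj{$adj} = $subj;
    }
    return %subject_of_adj;
}

my %who = adjective_subjects(@links);
print "$_ -> $who{$_}\n" for sort keys %who;
# prints:
# hungry -> he
# tired -> I
```

Once each adjective is tied to a subject, "I" would feed the self score and any other subject the other score.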

    Besides the docs for L::LP, you can look at my module Acme::Yoda for examples.

    grep
    Mynd you, mønk bites Kan be pretti nasti...

      Is there no end to the useful modules on CPAN? Soon there will be no more wheels to re-invent!

      CountZero

      "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

        They will combine the existing modules in different combinations to do more powerful work. Think: the number of combinations of any 2 modules out of 1000 modules is approximately 500,000.

        That said, new devices and software will always encourage new modules.

        artist

        Unfortunately, natural language parsing still has a lot of reinventing to do. While this module will work with simple sentences, I can promise you that it will break and give you misinformation frequently with real world texts. I work for a company that specializes in natural language processing and I can tell you that there is NO technology out there that will give 100% accurate NLP results on real world texts, and anything that gets close will not be free. That said, if your texts are fairly structured and use simple sentences, a link parser will probably do okay.


        Not so -- there is no limit to what mankind does not know. New wheels will be invented for reinventing.
        -----------------------
        You are what you think.
Re: Recognizing parts of speech
by graff (Chancellor) on May 31, 2003 at 04:05 UTC
    How do you score sentences like "He and I are hungry", "I'm not hungry, but she is", "No one is as tired as I am", etc? (Or do you have some way of making sure that all your sentences follow some limited set of simple syntactic frames?)

    I'm not asking for the sake of figuring out what sort of algorithm will address the problem. My point is simply to demonstrate why the skepticism cited by an Anonymous Monk elsewhere in this thread is well-deserved. Even if your plans for scoring have principled answers for things like conjoined head nouns, negation, empty trace slots, noun phrases referring to non-entities, etc, building a parser that can associate adjectives with noun phrases the same way people do is a science that is still in its infancy.

    (A handful of NLP researchers have been moving it into "adolescence" -- you can check some papers by Eugene Charniak about automatic parsers, but I don't know about availability of source code. You can also check the CORPORA listserv archives for information on open-source or otherwise free parsers.)

    I have not tried Lingua::LinkParser, so I don't know what it would do on my examples, or whether its output would meet your needs on such examples. If you have the time, it's worth a try, I'm sure. But if it's important to get the scoring done reasonably well in accordance with your designs, have a fall-back plan that optimizes the use of human scorers.

    Sentences that contain none of your listed adjectives can be scored automatically; those that contain one or more adjectives and only one pronoun (and not much else) should also be easy to automate. Those that have one or more adjectives and two or more pronouns or other noun phrases need to be reviewed manually, whether or not you choose to hypothesize a score with a perl script.
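
The triage described above could be sketched roughly as follows. The word lists and the `triage` function are hypothetical, and the sketch only counts pronouns: it cannot detect "other noun phrases", so a sentence like "John is hungry" lands in the manual pile simply because it has no pronoun.

```perl
use strict;
use warnings;

# Hypothetical word lists; the real adjective list is not shown in this thread.
my %ADJ_SCORE = (tired => 1, hungry => 2);
my %SELF  = map { $_ => 1 } qw(i me myself);
my %OTHER = map { $_ => 1 } qw(he him she her you they them);

# Returns a hashref: { status => 'auto' or 'manual', self => n, other => n }.
sub triage {
    my ($sentence) = @_;
    my @words = lc($sentence) =~ /[a-z]+/g;
    my @adjs  = grep { exists $ADJ_SCORE{$_} } @words;
    my @prons = grep { $SELF{$_} || $OTHER{$_} } @words;

    my %r = (status => 'auto', self => 0, other => 0);

    # No listed adjectives: nothing to score.
    return \%r unless @adjs;

    # Exactly one pronoun: credit every adjective to it.
    if (@prons == 1) {
        my $who = $SELF{ $prons[0] } ? 'self' : 'other';
        $r{$who} += $ADJ_SCORE{$_} for @adjs;
        return \%r;
    }

    # Anything else goes to a human scorer.
    $r{status} = 'manual';
    return \%r;
}

for my $s ("I am tired and hungry", "I am tired and he is hungry") {
    my $r = triage($s);
    print "$s: $r->{status} self=$r->{self} other=$r->{other}\n";
}
```

The first sentence is scored automatically (self=3); the second has two pronouns and is flagged for manual review.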

      It really depends on what you want to do with the scores, i.e. how accurate is good enough. For a rough but still fairly usable scoring system you could just use averages instead: count how many times "I" appears in the text versus how many times "other" pronouns appear, and multiply each part of this ratio by the average of the counted adjective scores. Actually, the ratio figure alone will suffice for some kind of result.
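
One possible reading of that averaging idea, as a sketch: count self and other pronouns, take the mean score of the listed adjectives found, and multiply each count by that mean. The word lists and the `rough_scores` name are assumptions for illustration only.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical word lists; substitute the real adjective list.
my %ADJ_SCORE = (tired => 1, hungry => 2);
my %SELF  = map { $_ => 1 } qw(i me myself);
my %OTHER = map { $_ => 1 } qw(he him she her you they them);

# Estimate self/other scores from pronoun counts and the mean adjective score.
sub rough_scores {
    my ($text) = @_;
    my @words  = lc($text) =~ /[a-z]+/g;
    my @scores = map { $ADJ_SCORE{$_} } grep { exists $ADJ_SCORE{$_} } @words;
    my $avg    = @scores ? sum(@scores) / @scores : 0;
    my $self   = grep { $SELF{$_} }  @words;   # grep in scalar context counts matches
    my $other  = grep { $OTHER{$_} } @words;
    return (self => $self * $avg, other => $other * $avg);
}

my %r = rough_scores("I am tired and he is hungry");
printf "self=%.1f other=%.1f\n", $r{self}, $r{other};   # prints self=1.5 other=1.5
```

Note how the estimate (1.5 and 1.5) differs from the intended scoring of this sentence (self=1, other=2); the approach trades per-sentence accuracy for simplicity.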
