
How to improve the accuracy of Lingua::EN::NamedEntity?

by Anonymous Monk
on Jun 04, 2006 at 22:51 UTC

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Recently I have been working with Lingua::EN::NamedEntity, a module on CPAN. "Named entities" is the NLP jargon for proper nouns which represent people, places, organisations, and so on. This module provides a very simple way of extracting these from a text. If we run the extract_entities routine on a piece of news coverage of recent UK political events, we should expect to see it return a list of hash references looking like this:
    {
        entity => 'Mr Howard',
        class  => 'person',
        scores => { ... },
    },
    {
        entity => 'Ministry of Defence',
        class  => 'organisation',
        ...
    },
    {
        entity => 'Oxfordshire',
        class  => 'place',
        ...
    },
But I find that the results are not very good. In particular, when the module reads a text containing "Monday", "Tuesday" and so on, it classifies these words as people. Can anyone help? Thank you very much.
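A minimal sketch for inspecting what the module returns (not part of the original question). It assumes the list has the shape shown above, uses a small hand-written sample in place of real extract_entities() output, and the helper name group_by_class is hypothetical. Grouping the entity names by the class they were given makes misclassified day names easy to spot:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample shaped like the list of hash references that
# extract_entities() returns (entity, class, scores). The scores are
# left out because their internal layout is not shown in the post.
my @entities = (
    { entity => 'Mr Howard',           class => 'person' },
    { entity => 'Ministry of Defence', class => 'organisation' },
    { entity => 'Oxfordshire',         class => 'place' },
    { entity => 'Monday',              class => 'person' },
);

# Hypothetical helper: build a hash of arrays mapping each class to the
# entity names that were assigned to it, in their original order.
sub group_by_class {
    my @list = @_;
    my %by_class;
    push @{ $by_class{ $_->{class} } }, $_->{entity} for @list;
    return %by_class;
}

my %by_class = group_by_class(@entities);

# Print one line per class, with the classes sorted alphabetically.
for my $class ( sort keys %by_class ) {
    print "$class: ", join( ', ', @{ $by_class{$class} } ), "\n";
}
```

With this sample, the "person" line lists both Mr Howard and Monday, which is exactly the kind of error described above.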

Updated to preserve formatting, improve title, by janitor tye

Re: How to improve the accuracy of Lingua::NamedEntity ? (was: How can i do it ??)
by McDarren (Abbot) on Jun 04, 2006 at 23:09 UTC
    I have a vague idea of what you are asking for, but instead of second-guessing, I'll offer some friendly advice:

    1. Please read How (Not) To Ask A Question
    2. Please wrap any code in <code> tags when posting.
    3. Please read How (Not) To Ask A Question
    4. Please post a sample of your current code that isn't giving the desired result.
    5. Please read How (Not) To Ask A Question
    6. Please ask a specific question, if you have one (I know what I mean. Why don't you?)
    7. Finally, please read How (Not) To Ask A Question

      Update: Oh, and:

    8. Please post questions in Seekers of Perl Wisdom, not Perl Monks Discussion
    9. Please try to use a meaningful title for your question.
    10. Please have a read of The Perl Monks Guide to the Monastery

    Update 2:

    Okay, having said all the above - you could simply do something like this:

    #!/usr/bin/perl -w
    use strict;
    use Lingua::EN::NamedEntity;

    # Slurp the whole input text at once
    $/ = undef;
    my $text = <DATA>;

    # A list of hash references (entity, class, scores)
    my @entities = extract_entities($text);

    # Day names that the module tends to misclassify as people
    my @unwanted_entities = qw(Monday Tuesday Wednesday Thursday Friday Saturday Sunday);

    for (@entities) {
        my $entity = ${$_}{entity};
        if ( grep { $_ eq $entity } @unwanted_entities ) {
            print "Skipping unwanted entity: $entity\n";
        }
        else {
            print "Valid entity: $entity\n";
        }
    }

    # The text to analyse goes after this marker
    __DATA__

    And the data for the above code was taken from a "recent BBC News story". The output is as follows:

    Valid entity: Mr Murakami
    Valid entity: Takafumi Horie
    Valid entity: Singapore
    Valid entity: Mr Horie
    Valid entity: Societe General Asset Management
    Valid entity: Asset Management
    Skipping unwanted entity: Friday
    Valid entity: Tokyo Stock Exchange
    Valid entity: Livedoor
    Valid entity: Yoshiaki Murakami
    Valid entity: Murakami
    Valid entity: Tokyo
    Valid entity: International Trade and Industry Ministry
    Valid entity: Akio Yoshino
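A variation on the same filter, sketched as an assumption rather than anything the module itself provides. The day names are stored lower-cased in a hash, so each check is a single lookup instead of a grep over the whole list, and the match is case-insensitive. The helper filter_entities and the sample input are hypothetical; in real use you would pass it the list returned by extract_entities():

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Day names to reject, stored lower-cased as hash keys so that each
# check is one lookup rather than a scan of the list.
my %unwanted = map { lc($_) => 1 }
    qw(Monday Tuesday Wednesday Thursday Friday Saturday Sunday);

# Hypothetical helper: return only the entity hash references whose
# name is not in %unwanted. Comparison is case-insensitive, so
# "FRIDAY" and "friday" are dropped as well as "Friday".
sub filter_entities {
    my @entities = @_;
    return grep { !$unwanted{ lc( $_->{entity} ) } } @entities;
}

# Hypothetical input in the same shape as extract_entities() output.
my @entities = (
    { entity => 'Tokyo Stock Exchange', class => 'organisation' },
    { entity => 'Friday',               class => 'person' },
    { entity => 'Akio Yoshino',         class => 'person' },
);

print "Valid entity: $_->{entity}\n" for filter_entities(@entities);
```

Because only whole entity strings are compared, a longer entity that merely contains a day name (for example, an organisation with "Friday" in its name) is kept.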

    Darren :)

Node Type: perlquestion [id://553519]
Front-paged by tye