PerlMonks  

How to improve the accuracy of Lingua::NamedEntity ?

by Anonymous Monk
on Jun 05, 2006 at 02:51 UTC
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Recently I have been working with Lingua::EN::NamedEntity, a module on CPAN. "Named entities" is the NLP jargon for proper nouns which represent people, places, organisations, and so on. This module provides a very simple way of extracting these from a text. If we run the extract_entities routine on a piece of news coverage of recent UK political events, we should expect to see it return a list of hash references looking like this:
    { entity => 'Mr Howard', class => 'person', scores => { ... }, },
    { entity => 'Ministry of Defence', class => 'organisation', ... },
    { entity => 'Oxfordshire', class => 'place', ... },
But I find that the results are not very good. In particular, when the text contains day names such as "Monday" or "Tuesday", the module classifies these words as persons. Can anyone help me? Thank you very much.
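
To make the structure above concrete, here is a minimal sketch of walking such a list and grouping entity names by their assigned class. The sample data is hand-written in the shape described (it is not real module output), and group_by_class is a hypothetical helper name, not part of the module.

```perl
use strict;
use warnings;

# Hand-written sample in the shape described above (an assumption);
# real results would come from extract_entities($text).
my @entities = (
    { entity => 'Mr Howard',           class => 'person' },
    { entity => 'Ministry of Defence', class => 'organisation' },
    { entity => 'Oxfordshire',         class => 'place' },
    { entity => 'Monday',              class => 'person' },  # the misclassification in question
);

# Hypothetical helper: map each class name to the entity names in it.
sub group_by_class {
    my @ents = @_;
    my %by_class;
    push @{ $by_class{ $_->{class} } }, $_->{entity} for @ents;
    return \%by_class;
}

my $by_class = group_by_class(@entities);
for my $class ( sort keys %$by_class ) {
    print "$class: ", join( ', ', @{ $by_class->{$class} } ), "\n";
}
```

Listing the results per class like this makes it easy to spot which words are landing in the wrong class.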

http://search.cpan.org/~ambs/Lingua-EN-NamedEntity-1.6/NamedEntity.pm

Updated to preserve formatting, improve title, by janitor tye

Re: How to improve the accuracy of Lingua::NamedEntity ? (was: How can i do it ??)
by McDarren (Abbot) on Jun 05, 2006 at 03:09 UTC
    I have a vague idea of what you are asking for, but instead of second-guessing, I'll offer some friendly advice:

    1. Please read How (Not) To Ask A Question
    2. Please wrap any code in <code> tags when posting.
    3. Please read How (Not) To Ask A Question
    4. Please post a sample of your current code that isn't giving the desired result.
    5. Please read How (Not) To Ask A Question
    6. Please ask a specific question, if you have one (I know what I mean. Why don't you?)
    7. Finally, please read How (Not) To Ask A Question

      Update: Oh, and:

    8. Please post questions in Seekers of Perl Wisdom, not Perl Monks Discussion
    9. Please try to use a meaningful title for your question.
    10. Please have a read of The Perl Monks Guide to the Monastery

    Update 2:

    Okay, having said all the above - you could simply do something like this:

    #!/usr/bin/perl -w
    use strict;
    use Lingua::EN::NamedEntity;

    # Slurp the whole __DATA__ section (the news text, not shown in this post).
    $/ = undef;
    my $text = <DATA>;

    my @entities = extract_entities($text);
    my @unwanted_entities =
        qw( Monday Tuesday Wednesday Thursday Friday Saturday Sunday );

    for (@entities) {
        my $entity = ${$_}{entity};
        # Skip any entity whose text exactly matches an unwanted word.
        if ( grep { $_ eq $entity } @unwanted_entities ) {
            print "Skipping unwanted entity: $entity\n";
        }
        else {
            print "Valid entity: $entity\n";
        }
    }

    And the data for the above code was taken from a "recent BBC News story". The output is as follows:

    Valid entity: Mr Murakami
    Valid entity: Takafumi Horie
    Valid entity: Singapore
    Valid entity: Mr Horie
    Valid entity: Societe General Asset Management
    Valid entity: Asset Management
    Skipping unwanted entity: Friday
    Valid entity: Tokyo Stock Exchange
    Valid entity: Livedoor
    Valid entity: Yoshiaki Murakami
    Valid entity: Murakami
    Valid entity: Tokyo
    Valid entity: International Trade and Industry Ministry
    Valid entity: Akio Yoshino
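
A variation on the same idea, as a rough sketch: keep the stop list in a hash so each check is a single lookup, and compare case-insensitively. The sample entities are hand-written stand-ins for extract_entities output, and remove_unwanted is a hypothetical helper name, not something the module provides.

```perl
use strict;
use warnings;

# Stop list as a hash with lower-cased keys, so lookups ignore case.
my %unwanted = map { ( lc($_) => 1 ) }
    qw( Monday Tuesday Wednesday Thursday Friday Saturday Sunday );

# Hypothetical helper: return the entities whose text is not on the
# stop list. Only whole-entity matches are dropped.
sub remove_unwanted {
    my ( $stop, @entities ) = @_;
    return grep { !$stop->{ lc( $_->{entity} ) } } @entities;
}

# Hand-written sample standing in for extract_entities($text) output.
my @entities = (
    { entity => 'Takafumi Horie', class => 'person' },
    { entity => 'Friday',         class => 'person' },
    { entity => 'Tokyo',          class => 'place' },
);

print "Valid entity: $_->{entity}\n"
    for remove_unwanted( \%unwanted, @entities );
```

Because only exact matches are removed, an entity that merely contains a day name is kept; whether that is the right behaviour depends on the text being processed.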

    Cheers,
    Darren :)
