How to improve the accuracy of Lingua::EN::NamedEntity?

by Anonymous Monk
Recently I have been working with Lingua::EN::NamedEntity, which is a module on CPAN. "Named entities" is the NLP jargon for proper nouns which represent people, places, organisations, and so on. This module provides a very simple way of extracting these from a text. If we run the extract_entities routine on a piece of news coverage of recent UK political events, we should expect to see it return a list of hash references looking like this:
{ entity => 'Mr Howard', class => 'person', scores => { ... }, },
{ entity => 'Ministry of Defence', class => 'organisation', ... },
{ entity => 'Oxfordshire', class => 'place', ... },
But I find that the results are not very good. In particular, when the module reads a text containing "Monday", "Tuesday" and so on, it classifies these words as persons. Can anyone help me? Thank you very much.

    I have a vague idea of what you are asking for, but instead of second-guessing, I'll offer some friendly advice:

    2. Please wrap any code in <code> tags when posting.
    4. Please post a sample of your current code that isn't giving the desired result.
    6. Please ask a specific question, if you have one (I know what I mean. Why don't you?)
    Okay, having said all the above - you could simply do something like this:

    #!/usr/bin/perl -w
    use strict;
    use Lingua::EN::NamedEntity;

    $/ = undef;
    my $text = <DATA>;
    my @entities = extract_entities($text);
    my @unwanted_entities = qw(
        Monday Tuesday Wednesday Thursday Friday Saturday Sunday
    );

    for (@entities) {
        my $entity = ${$_}{entity};
        if ( grep { $_ eq $entity } @unwanted_entities ) {
            print "Skipping unwanted entity: $entity\n";
        }
        else {
            print "Valid entity: $entity\n";
        }
    }

    And the data for the above code was taken from a "recent BBC News story". The output is as follows:

    Valid entity: Mr Murakami
    Valid entity: Takafumi Horie
    Valid entity: Singapore
    Valid entity: Mr Horie
    Valid entity: Societe General Asset Management
    Valid entity: Asset Management
    Skipping unwanted entity: Friday
    Valid entity: Tokyo Stock Exchange
    Valid entity: Livedoor
    Valid entity: Yoshiaki Murakami
    Valid entity: Murakami
    Valid entity: Tokyo
    Valid entity: International Trade and Industry Ministry
    Valid entity: Akio Yoshino

    Darren :)
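
The stop-list idea in the reply can also be packaged as a small reusable filter. The following is only a minimal sketch: filter_entities and %stop are hypothetical names, not part of Lingua::EN::NamedEntity, and the sample data is hand-written in the shape shown in the original question rather than produced by extract_entities. It uses a hash for the lookup and compares case-insensitively, so "friday" and "Friday" are treated alike.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stop list: weekday names that should never count as entities.
my %stop = map { lc($_) => 1 } qw(
    Monday Tuesday Wednesday Thursday Friday Saturday Sunday
);

# filter_entities is an illustrative helper, not part of the module.
# It takes hash references shaped like those in the question and returns
# only the ones whose 'entity' string is not in the stop list.
sub filter_entities {
    my @entities = @_;
    return grep { !$stop{ lc( $_->{entity} ) } } @entities;
}

# Hand-written sample standing in for the result of extract_entities($text).
my @sample = (
    { entity => 'Mr Horie', class => 'person' },
    { entity => 'Friday',   class => 'person' },
    { entity => 'Tokyo',    class => 'place'  },
);

print "$_->{entity} ($_->{class})\n" for filter_entities(@sample);
```

With real input you would presumably call filter_entities( extract_entities($text) ) instead of passing @sample. The stop list can be extended with month names or any other words the module tends to misclassify in your texts.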

