PerlMonks  

Re: extracting data from Lingua::Named::Entity

by Corion (Patriarch)
on Jul 05, 2018 at 09:17 UTC


in reply to extracting data from Lingua::Named::Entity

You get at the information in the entries by indexing into the array and then accessing the fields of each hash (see perldsc and/or tye's References Quick Reference):

print $entities[0]->{ class };
print $entities[0]->{ entity };
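
To go beyond the first element, a loop over the whole array might look like the sketch below. It assumes each element of @entities is a hash reference with entity, class, count and scores keys; the sample data is hand-written for illustration, and in real code the array would come from the module (extract_entities, if I remember the interface correctly).

```perl
use strict;
use warnings;

# Hand-written sample data (an assumption for illustration); in real code
# @entities would be filled by the named-entity module.
my @entities = (
    {
        entity => 'El Ni',
        class  => 'person',
        count  => 1,
        scores => { person => 4, place => 1, organisation => 3 },
    },
);

my @lines;
for my $e (@entities) {
    # Each element is a hash reference, so fields are reached with ->{...}.
    my $scores = $e->{scores};

    # The nested hash is dereferenced the same way; keys are sorted
    # so the output order is stable.
    my $detail = join ', ', map { "$_=$scores->{$_}" } sort keys %$scores;

    push @lines, "$e->{entity} [$e->{class}] ($detail)";
}

print "$_\n" for @lines;
```

Collecting the lines in an array before printing is just a convenience here; printing directly inside the loop works equally well.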

But looking at the output, it seems that the module doesn't really understand the "ñ" in El Niño, as it suggests "El Ni" (from "El Nino Southern Oscillation") as an entity:

$VAR1 = {
  'count' => 1,
  'scores' => {
    'person' => 4,
    'place' => 1,
    'organisation' => 3
  },
  'entity' => 'El Ni',
  'class' => 'person'
};

I think as a quick (rough) fix, you can use Text::Unidecode to downgrade all your text to ASCII, stripping accents and other diacritics. This would canonicalize different spellings, for example "El Nino" and "El Niño", to "El Nino".
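
A minimal sketch of that downgrade step follows. The to_ascii helper is a name made up for this example, and the fallback through Unicode::Normalize is my own addition so the snippet also runs where Text::Unidecode is not installed; the fallback only handles accented Latin letters.

```perl
use strict;
use warnings;
use utf8;                        # this source file contains a literal "ñ"
use Unicode::Normalize qw(NFD);  # core module, used only as a fallback

# Prefer Text::Unidecode when it is installed.
my $have_unidecode = eval { require Text::Unidecode; 1 };

# to_ascii is a hypothetical helper for this example, not part of any module.
sub to_ascii {
    my ($text) = @_;
    return Text::Unidecode::unidecode($text) if $have_unidecode;

    # Fallback: decompose characters, then drop the combining marks.
    # Unlike unidecode, this leaves non-Latin scripts untouched.
    (my $plain = NFD($text)) =~ s/\p{Mn}+//g;
    return $plain;
}

my @variants = ('El Niño', 'El Nino');
my @ascii    = map { to_ascii($_) } @variants;

print "$_\n" for @ascii;
```

Both spellings should come out identical, so the downgraded text can be passed to the entity extraction and compared consistently. Note that the input must already be a decoded character string for either approach to work.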

The alternative fix would be to teach Lingua::EN::NamedEntity about Unicode, and/or to supply properly decoded data to it. I haven't looked into what is necessary for that.
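
For the "properly decoded data" part, the sketch below shows what decoding changes on the Perl side. It is only a guess at why the entity gets cut off at "El Ni"; whether Lingua::EN::NamedEntity then handles the decoded string correctly is untested. The regular expression is a stand-in for illustration, not the module's actual pattern.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes, as read from a file or socket without an encoding
# layer: "ñ" arrives as the two bytes 0xC3 0xB1.
my $bytes = "El Ni\xC3\xB1o";

# Decode the bytes into a Perl character string.
my $chars = decode('UTF-8', $bytes);

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);

# Illustrative pattern only: on the undecoded bytes \w stops before the
# high bytes, while on the decoded string it matches "ñ" as a letter.
my ($word_bytes) = $bytes =~ /(Ni\w*)/;
my ($word_chars) = $chars =~ /(Ni\w*)/;

printf "match on bytes is %d long, match on characters is %d long\n",
    length($word_bytes), length($word_chars);
```

When reading from a file, opening it with an encoding layer such as '<:encoding(UTF-8)' gives decoded strings without the explicit decode call.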

Replies are listed 'Best First'.
Re^2: extracting data from Lingua::Named::Entity
by rahulruns (Scribe) on Jul 05, 2018 at 09:41 UTC

    Thanks a lot, that works. I will look into the Unicode approach.
