PerlMonks
Re: extracting data from Lingua::Named::Entity
by Corion (Patriarch) on Jul 05, 2018 at 09:17 UTC [id://1217938]
You get at the information in the entries by indexing into the array and then reading the fields of each hash (see perldsc and/or tye's References Quick Reference).
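As a rough illustration of that access pattern, here is a minimal sketch. The data is hard-coded and hypothetical, and the field names (entity, class, scores) are my assumption about the shape of the entries, so check them against a Data::Dumper dump of what the module actually returns for you:

```perl
use strict;
use warnings;

# Hypothetical sample data: a list of hash references, one per entity.
# The keys (entity, class, scores) and all values are assumptions for
# illustration only, not real module output.
my @entities = (
    {
        entity => 'El Nino Southern Oscillation',
        class  => 'organisation',
        scores => { person => 0.1, place => 0.3, organisation => 0.6 },
    },
    {
        entity => 'Pacific Ocean',
        class  => 'place',
        scores => { person => 0.05, place => 0.9, organisation => 0.05 },
    },
);

# One entry: index into the array, then look up a field of the hash.
my $first_name = $entities[0]{entity};
print "First entity: $first_name\n";

# All entries: loop over the array, dereferencing each hash.
for my $e (@entities) {
    my $class = $e->{class};
    # Nested hash: the score recorded for the winning class.
    my $score = $e->{scores}{$class};
    print "$e->{entity} is a $class (score $score)\n";
}

# Just the names, collected with map.
my @names = map { $_->{entity} } @entities;
```

`$entities[0]{entity}` is shorthand for `$entities[0]->{entity}`; the arrow is optional between adjacent subscripts.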
But looking at the output, it seems that the module doesn't really understand the "ñ" in "El Niño", as it suggests "El Ni" (from "El Nino Southern Oscillation") as an entity.
As a quick (rough) fix, I think you can use Text::Unidecode to downgrade all your text to ASCII, removing all adornments. This would canonicalize different spellings, for example "El Nino" and "El Niño", to "El Nino". The alternative fix would be to teach Lingua::EN::NamedEntity about Unicode, and/or to supply properly decoded data to it. I haven't looked into what is necessary for that.
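A minimal sketch of the quick fix, assuming the text arrives as raw UTF-8 bytes (the input string here is made up for the example). It decodes the bytes first so that "ñ" is a single character, then lets Text::Unidecode turn it into plain ASCII. Whether decoding alone would be enough for the module is untested:

```perl
use strict;
use warnings;
use Encode qw(decode);
use Text::Unidecode qw(unidecode);

# Hypothetical input: "El Niño ..." as raw UTF-8 bytes, the way it might
# come out of a file that was read without an encoding layer.
my $bytes = "El Ni\xC3\xB1o Southern Oscillation";

# Decode the bytes into Perl characters, so the two bytes \xC3\xB1
# become the single character "ñ".
my $text = decode('UTF-8', $bytes);

# Downgrade to ASCII; "ñ" becomes "n".
my $ascii = unidecode($text);

print "$ascii\n";
```

The resulting `$ascii` string is what you would then hand to the entity extraction instead of the original text. Note that this loses information: after the downgrade you can no longer tell "El Niño" from "El Nino" in the results.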