PerlMonks  

Re: Optimising Lingua::EN::NamedEntity for Very Strings

by EdwardG (Vicar)
on May 10, 2006 at 14:19 UTC (#548468)


in reply to Optimising Lingua::EN::NamedEntity for Very Strings

       Any suggestions so I can send the maintainer a patch?

Perhaps parameterisation, as in

use Lingua::EN::NamedEntity;
my @entities = extract_entities($some_text, $max_string_length);

or filtering the output (though that would not solve your problem):

my @entities = extract_entities($some_text, $max_entity_length);

A reasonable default for either option might be 92 characters, which would accommodate a variant spelling of the name of a hill in my country of origin:

Tetaumatawhakatangihangakoauaotamateaurehaeaturipukapihimaungahoronukupokaiwhenuaakitanarahu (link goes to image).
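
Until such a parameter exists, the output-filtering idea can be tried outside the module. What follows is only a rough sketch: filter_entities_by_length is a hypothetical helper, not part of Lingua::EN::NamedEntity, and it assumes each returned entity is a hashref with the matched text under an 'entity' key. Stand-in data is used so it runs without the module installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::EN::NamedEntity):
# keep only entities whose text is at most $max_len characters.
# Assumes each entity is a hashref with the matched string under 'entity'.
sub filter_entities_by_length {
    my ($max_len, @entities) = @_;
    return grep { length($_->{entity}) <= $max_len } @entities;
}

# Stand-in data in place of a real extract_entities($some_text) call.
my @entities = (
    { entity => 'Edward',  class => 'person' },
    { entity => 'x' x 200, class => 'place'  },   # implausibly long match
);

# 92 is the default suggested above.
my @short = filter_entities_by_length(92, @entities);
print scalar(@short), " entity kept: $short[0]{entity}\n";
```

As noted, this only trims the results after the fact; the extraction itself still does all the work on the long strings.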

 
