Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re: Idiom guessing script

by Albannach (Prior)
on Nov 21, 2005 at 04:07 UTC ( #510353=note: print w/ replies, xml ) Need Help??


in reply to Idiom guessing script

It does not sound like a simple problem, because you are not dealing with much data upon which to base your decision. It strikes me that it may be possible to choose just a few hundred words from each potential language, words that are both commonly used and relatively unique to that tongue. However even this may not work for something like book titles which are not necessarily common usage (in English at least). If you could get large word lists for different languages (perhaps take a sample from some major newspapers?) you could build your own such list of 'indicator words'. I would not keep the langages separate, but have each word in the list tagged as to what language(s) it suggests, then you could sort of take a poll of your title's words to get a guess as to the language used.

On the chance that you are actually talking about book titles, perhaps it would help you to know that the ISBN issued for every book published starts with a code called the Group Identifier. While this is not necessarily a reliable indicator of the language, it may be of some use, perhaps to verify a language-based determination, or to help you select what language(s) to test against.

--
I'd like to be able to assign to an luser


Comment on Re: Idiom guessing script

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://510353]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (4)
As of 2014-07-13 14:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (249 votes), past polls