Re: Idiom guessing script

by Albannach (Prior)
on Nov 21, 2005 at 04:07 UTC

in reply to Idiom guessing script

It does not sound like a simple problem, because you do not have much data on which to base your decision. It strikes me that it may be possible to choose just a few hundred words from each potential language, words that are both commonly used and more or less distinctive to that tongue. However, even this may not work for something like book titles, which do not necessarily follow common usage (in English at least). If you could get large word lists for different languages (perhaps take a sample from some major newspapers?) you could build your own list of 'indicator words'. I would not keep the languages separate, but have each word in the list tagged with the language(s) it suggests; then you could take a poll of your title's words to get a guess at the language used.
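To make the polling idea concrete, here is a rough sketch in Python (the same shape carries over to a Perl hash of arrays). The word list below is a tiny, hand-picked assumption for illustration only, and the names `INDICATORS` and `guess_language` are made up for this example; a usable list would have to be built from real corpora as described above.

```python
from collections import Counter

# Hypothetical indicator words, each tagged with the language(s) it suggests.
# This list is illustrative only; a real one would hold hundreds of entries.
INDICATORS = {
    "the": {"en"}, "and": {"en"}, "of": {"en"},
    "le": {"fr"}, "et": {"fr"}, "les": {"fr"},
    "der": {"de"}, "und": {"de"}, "das": {"de"},
    "el": {"es"}, "y": {"es"},
    "la": {"fr", "es", "it"},  # shared words vote for every language they suggest
}

def guess_language(title):
    """Poll the words of a title; return (language, votes) pairs, best first."""
    votes = Counter()
    for word in title.lower().split():
        word = word.strip(".,;:!?\"'()")  # drop surrounding punctuation
        for lang in INDICATORS.get(word, ()):
            votes[lang] += 1
    return votes.most_common()

print(guess_language("Der Herr und das Meer"))
```

An empty result means no indicator word matched, which is exactly the short-title weakness mentioned above, so the caller should treat it as "unknown" rather than pick a default.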

On the chance that you are actually talking about book titles, perhaps it would help you to know that the ISBN issued for every book published starts with a code called the Group Identifier. While this is not necessarily a reliable indicator of the language, it may be of some use, perhaps to verify a language-based determination, or to help you select what language(s) to test against.
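A minimal sketch of using the Group Identifier as a hint might look like the following. The table covers only the single-digit groups, which is a deliberate simplification: longer group codes (two to five digits) exist and are simply reported as unknown here. The names `GROUP_HINTS` and `isbn_group_hint` are invented for this example, and the result is a language area, not a guaranteed language.

```python
# Partial table: single-digit ISBN group identifiers only (an assumption made
# to keep the sketch short; multi-digit groups are not handled).
GROUP_HINTS = {
    "0": "English", "1": "English",
    "2": "French",
    "3": "German",
    "4": "Japanese",
    "5": "Russian",
    "7": "Chinese",
}

def isbn_group_hint(isbn):
    """Return a language-area hint for an ISBN, or None if unknown."""
    chars = "".join(ch for ch in isbn if ch.isdigit() or ch in "xX")
    if len(chars) == 13:
        # Only the 978 prefix maps onto the older 10-digit group scheme.
        if not chars.startswith("978"):
            return None
        chars = chars[3:]
    elif len(chars) != 10:
        return None
    return GROUP_HINTS.get(chars[0])

print(isbn_group_hint("3-16-148410-0"))
```

This does not validate the check digit; it only reads the leading group digit, so it is best used to narrow down which languages to test against, as suggested above.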

I'd like to be able to assign to an luser
