
Re: detecting the language of a word?

by pike (Monk)
on Dec 09, 2002 at 10:20 UTC ( #218490=note )

in reply to detecting the language of a word?

I think what you need is not a regular word list but a pronunciation dictionary - that is, one that lists the pronunciation for each word form. If you check each word against it, you are basically left with two cases:

• if the pronunciation follows general German pronunciation rules, then the word is either German, or at least the text-to-speech converter will pronounce it correctly, so you don't need to mark it.

• if the pronunciation violates German pronunciation rules, the word is probably foreign - and then you can check with a dictionary of the corresponding language (see below).

Pronunciation lexicons have the additional advantage that they list word forms, not words, which eliminates the need for stemming. Of course, this works only because German spelling and its mapping to pronunciation is fairly regular.

For the words you don't find in your pronunciation dictionary, you can look at the transition probabilities of the letters: the probability that letter "x" is followed by "y" is very language-specific. If you calculate these probabilities from a large list of words for each of the languages in question, they provide a good criterion: assign the word to the language under whose probabilities its letter sequence is most likely. This has the advantage that you will also be able to classify names - which normally don't appear in dictionaries.
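To make the letter-transition idea a bit more concrete, here is a minimal sketch in Python. Everything in it is my own assumption for illustration: the function names (train_bigram_model, log_prob, classify), the add-one smoothing, the "^" and "$" word-boundary markers, and above all the tiny word lists, which only stand in for the large per-language lists you would really need.

```python
import math
from collections import defaultdict

def train_bigram_model(words, alpha=1.0):
    """Count letter-to-letter transitions in a word list (hypothetical helper)."""
    counts = defaultdict(lambda: defaultdict(int))
    alphabet = set()
    for word in words:
        padded = "^" + word.lower() + "$"  # ^ and $ mark word start and end
        alphabet.update(padded)
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return {"counts": counts, "alphabet": alphabet, "alpha": alpha}

def log_prob(model, word):
    """Average log-probability per transition, with add-alpha smoothing."""
    counts, alpha = model["counts"], model["alpha"]
    vocab = len(model["alphabet"]) + 1  # one extra slot for unseen symbols
    padded = "^" + word.lower() + "$"
    pairs = list(zip(padded, padded[1:]))
    total = 0.0
    for a, b in pairs:
        row = counts.get(a, {})  # .get avoids creating rows for unseen letters
        total += math.log((row.get(b, 0) + alpha)
                          / (sum(row.values()) + alpha * vocab))
    return total / len(pairs)

def classify(models, word):
    """Return the language whose model makes the letter sequence most likely."""
    return max(models, key=lambda lang: log_prob(models[lang], word))

# Toy training data (assumption): far too small for real use.
GERMAN = ["schule", "zwischen", "wasser", "deutsch",
          "schreiben", "zeitung", "pflicht", "kuchen"]
ENGLISH = ["the", "through", "weather", "thought", "which",
           "whether", "quick", "youth", "write", "thing"]

models = {"de": train_bigram_model(GERMAN),
          "en": train_bigram_model(ENGLISH)}

for w in ["schnell", "thick"]:
    print(w, classify(models, w))
```

Averaging over the number of transitions keeps long and short words comparable. With real data you would probably also want a minimum margin between the best and second-best score, and send anything below that margin to manual review.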

This leaves you only with the words that can be both German and foreign, e.g. "email". But my guess is that there will be only a few of them and you can treat them manually (BTW, the pronunciation dictionary should give you two pronunciations of "email" - one that conforms to German pronunciation rules and one that violates them - so you should be warned).

You won't get around proofreading (at least samples) anyway. But I hope this will help you to minimize the amount of manual corrections.

