 
PerlMonks  

Re: detecting the language of a word?

by mooseboy (Pilgrim)
on Dec 06, 2002 at 20:27 UTC (#218144=note)


in reply to detecting the language of a word?

Hmm... this is decidedly non-trivial. One extra thing to bear in mind is that the German spoken in Austria (where I happen to live) differs considerably from the German spoken in Germany. In addition, there are lots of regional dialects within Austria, so pretty much any 'standard' German word list you might use is likely to miss whatever Austrian dialect words appear in the original German text. That being the case, you'll probably need a supplementary list of the dialect terms, at the very least.
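A minimal Python sketch of the supplementary-list idea, assuming a plain word-list lookup approach: the word is checked against a standard German list merged with an Austrian dialect supplement, plus an English list for comparison. The word lists, the `LEXICONS` mapping and the `candidate_languages` function are hypothetical and deliberately tiny; a real setup would load full lexicons from files. The input is NFC-normalized and case-folded because umlauts can arrive in either composed or decomposed form, which is an easy trap to fall into.

```python
import unicodedata

# Hypothetical, tiny sample lexicons (illustrative only; stored lowercase, NFC).
STANDARD_GERMAN = {"tomate", "kartoffel", "januar", "haus"}
AUSTRIAN_SUPPLEMENT = {"paradeiser", "erdäpfel", "jänner"}  # Austrian terms
ENGLISH = {"tomato", "potato", "january", "house"}

LEXICONS = {
    # German = standard list merged with the dialect supplement
    "de": STANDARD_GERMAN | AUSTRIAN_SUPPLEMENT,
    "en": ENGLISH,
}


def candidate_languages(word: str) -> set[str]:
    """Return every language whose word list contains `word`.

    The word is NFC-normalized (so a decomposed 'a' + combining diaeresis
    matches a precomposed 'ä') and case-folded before lookup. An empty set
    means the word is in none of the lists; more than one entry means it
    is ambiguous between lists.
    """
    key = unicodedata.normalize("NFC", word).casefold()
    return {lang for lang, words in LEXICONS.items() if key in words}


if __name__ == "__main__":
    # "Schmäh" is deliberately absent from every sample list.
    for w in ["Paradeiser", "Tomate", "tomato", "Schmäh"]:
        print(w, sorted(candidate_languages(w)))
```

Lookup alone cannot decide between languages when a word appears in several lists, and an empty result only means "unknown", so dialect words missing from the supplement will still slip through.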

If you can tell us which city it is and/or give us a URL, I might be able to offer some further pointers, but it's pretty much inevitable that whatever approach you adopt will be laden with traps for the unwary. Good luck anyway!

Cheers, mooseboy
