detecting the language of a word?

BrowserUk
in reply to detecting the language of a word?

Depending upon the number of foreign words you're looking at, it might be better to run through your files verifying the words against the dictionary for the predominant language, and flag any that do not show up.

You could write the names of the files to a "pages to check" file, and wrap the suspect words in something glaringly obvious (like the hated <blink> tags :). Then you (or your native-language editor) could look at the suspect words in context and make a decision based on that. Of course, that won't help you with words like your example that have meanings in several different languages.
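
To make the idea concrete, here is a rough sketch in Python (the same shape works with a Perl hash). Everything named here is my own invention for illustration: flag_unknown_words, check_pages, the word regex, and the assumption that the dictionary is an in-memory set of lowercase words and the pages are plain text rather than markup.

```python
import re

# One run of letters, optionally followed by an apostrophe and more letters
# ("don't"). [^\W\d_] matches Unicode letters only.
WORD_RE = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)?")


def flag_unknown_words(text, dictionary, open_tag="<blink>", close_tag="</blink>"):
    """Wrap every word missing from `dictionary` in a loud marker.

    Returns the marked-up text and the list of suspect words, in order.
    """
    suspects = []

    def mark(match):
        word = match.group(0)
        if word.lower() in dictionary:
            return word
        suspects.append(word)
        return f"{open_tag}{word}{close_tag}"

    return WORD_RE.sub(mark, text), suspects


def check_pages(pages, dictionary):
    """`pages` maps a file name to its text.

    Returns the marked-up pages and the names that belong in the
    "pages to check" file (those with at least one suspect word).
    """
    marked, to_check = {}, []
    for name, text in pages.items():
        new_text, suspects = flag_unknown_words(text, dictionary)
        marked[name] = new_text
        if suspects:
            to_check.append(name)
    return marked, to_check
```

If your pages are HTML, you would want to run this over the text nodes only, otherwise the tag names themselves get flagged.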

Probably the best way to deal with that is to also flag any words that show up in more than one language dictionary.
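
Something along these lines would do the cross-dictionary check. It is only a sketch in Python: the function names, the four labels, and the idea of holding each dictionary as a set of lowercase words keyed by language name are all assumptions of mine.

```python
def languages_for(word, dictionaries):
    """Return the sorted names of every dictionary that contains `word`."""
    w = word.lower()
    return sorted(lang for lang, words in dictionaries.items() if w in words)


def classify(word, dictionaries, primary):
    """Label a word relative to the predominant language `primary`.

    "ambiguous": found in more than one dictionary, needs a human look
    "ok":        found only in the primary dictionary
    "foreign":   found only in one other dictionary
    "unknown":   found nowhere
    """
    langs = languages_for(word, dictionaries)
    if len(langs) > 1:
        return "ambiguous"
    if langs == [primary]:
        return "ok"
    if langs:
        return "foreign"
    return "unknown"
```

Anything that comes back "ambiguous" or "unknown" goes to the human; "foreign" words could be tagged automatically with the one language that claims them.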

I think that if performance is at all an issue, you should probably avoid storing your dictionaries in a SQL database. However, using the DBI interface to one of the flat-file databases, you can achieve some pretty amazing performance, as was proved to me by grantm in this thread: Fast wordlist lookup for game.
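
For a feel of what an on-disk key lookup looks like without a SQL server, here is a small stand-in using Python's standard dbm.dumb module. To be clear, this is not DBI and not what grantm showed in that thread; it is a swapped-in illustration of the same idea (one key per word in a file-backed store), and build_wordlist and lookup_all are hypothetical helper names. I make no performance claims for it.

```python
import dbm.dumb


def build_wordlist(path, words):
    """Create (or overwrite) a file-backed store with one key per word."""
    with dbm.dumb.open(path, "n") as db:
        for word in words:
            db[word.lower().encode("utf-8")] = b"1"


def lookup_all(path, words):
    """Open the store once, read-only, and test each word for membership."""
    with dbm.dumb.open(path, "r") as db:
        return {word: word.lower().encode("utf-8") in db for word in words}
```

Build the store once per language, then keep it open for the whole run rather than reopening it per word.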

