http://www.perlmonks.org?node_id=1057331


in reply to Re: Perl & Unicode: state of the art?
in thread Perl & Unicode: state of the art?

Thai and Lao text ... these languages, sentences are generally delimited by whitespace, and individual words are not delimited at all in the text, but instead are delimited by syntactic rules.

So, it is fair to say that the first requirement for processing Unicode 'text' is to determine the language.

So then the question becomes: given a file of Unicode text, can the language be determined?
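
As a rough starting point, the script (though not the language) can be read straight off the characters using Perl's Unicode script properties. The sketch below is only an illustration: the sub names, the short list of scripts, and the assumption that the file is UTF-8 encoded are all mine. Script alone narrows the candidates for Thai and Lao, but it says little for text in Latin script, which many languages share.

```perl
use strict;
use warnings;

# Scripts to look for. This list is an illustrative assumption; extend as needed.
my %SCRIPT_RE = (
    Thai     => qr/\p{Script=Thai}/,
    Lao      => qr/\p{Script=Lao}/,
    Latin    => qr/\p{Script=Latin}/,
    Cyrillic => qr/\p{Script=Cyrillic}/,
);

# Return a hash ref mapping script name => number of characters in that script.
# $text must already be decoded into Perl characters (not raw bytes).
sub script_counts {
    my ($text) = @_;
    my %count;
    for my $script (keys %SCRIPT_RE) {
        my $re = $SCRIPT_RE{$script};
        my $n  = () = $text =~ /$re/g;    # count all matches
        $count{$script} = $n if $n;
    }
    return \%count;
}

# Return the most frequent listed script, or undef if none of them occur.
# Ties are broken alphabetically so the result is deterministic.
sub dominant_script {
    my ($text) = @_;
    my $count = script_counts($text);
    my ($best) = sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
    return $best;
}

# Slurp a file and report its dominant script.
# Assumption: the file is UTF-8; other encodings need a different layer.
sub dominant_script_of_file {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path or die "Cannot open $path: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    return dominant_script($text);
}
```

Calling dominant_script_of_file('some_file.txt') (a hypothetical file name) returns a script name such as 'Thai', or undef when none of the listed scripts appear. Going from script to language would still need something more, such as word lists or character n-gram statistics.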

