in reply to Re^2: Perl & Unicode: state of the art?
in thread Perl & Unicode: state of the art?
> can the language be determined?
You know the answer: only with statistical certainty, and that depends on the length of the text and on how closely related the candidate languages are.
Hand and finger (en) <=> Hand und Finger (de)
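
To make the point concrete, here is a minimal sketch of a stopword-count guesser. The word lists and the name `score_languages` are made up for illustration; a real detector would use much larger statistical models.

```perl
use strict;
use warnings;

# Tiny, hypothetical stopword lists, just enough for the example.
my %stopwords = (
    en => { map { $_ => 1 } qw(the and of to in is) },
    de => { map { $_ => 1 } qw(der die das und zu in ist) },
);

# For each language, count how many words of the text are known stopwords.
sub score_languages {
    my ($text) = @_;
    my @words = map { lc($_) } $text =~ /(\w+)/g;
    my %score;
    for my $lang (sort keys %stopwords) {
        # grep in scalar context returns the number of matches
        $score{$lang} = grep { $stopwords{$lang}{$_} } @words;
    }
    return \%score;
}

for my $text ('Hand and finger', 'Hand und Finger') {
    my $s = score_languages($text);
    print "$text: en=$s->{en} de=$s->{de}\n";
}
```

With three words, the whole decision hangs on a single token ("and" vs. "und"). Drop that word and both scores are zero, which is why the length of the text matters so much.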
Whether the same script implies the same word delimiters can only be answered by someone who knows all 6000 languages of the world.
But Arabic words are probably already a problem, perhaps less so if transcribed. Chinese even more so.
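
A small sketch of why the delimiter question matters, assuming whitespace is the only delimiter you rely on. The helper name `whitespace_tokens` is made up, and the Chinese sample is just a short sentence ("I like programming") written, as usual, without spaces.

```perl
use strict;
use warnings;
use utf8;                               # this source contains UTF-8 literals
binmode STDOUT, ':encoding(UTF-8)';

# Split on runs of whitespace, the usual delimiter for English or German.
sub whitespace_tokens {
    my ($text) = @_;
    return split ' ', $text;
}

for my $text ('Hand und Finger', '我喜欢编程') {
    my @tokens = whitespace_tokens($text);
    printf "%d token(s): %s\n", scalar(@tokens), join(' | ', @tokens);
}
```

The German sample yields three tokens, the Chinese one comes back as a single token, so a whitespace split says nothing about where its words begin and end.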
See also Word_divider and Word#Word_boundaries.
Cheers Rolf
(addicted to the Perl Programming Language)
Replies are listed 'Best First'.
Re^4: Perl & Unicode: state of the art?
by BrowserUk (Patriarch) on Oct 08, 2013 at 02:16 UTC
by Discipulus (Canon) on Oct 08, 2013 at 07:32 UTC
In Section
Seekers of Perl Wisdom