I'm only guessing here, but if you have a Chinese-enabled version of MS Windows, the DOS prompt window might use cp936 (Extended GB) for Chinese characters, or whatever code page applies to Big5 (Traditional) Chinese, rather than Unicode.
For that matter, if the DOS prompt window is "Unicode-enabled", you might need to use UTF-16LE rather than UTF-8. You'd have to see whether the so-called "Help" or alleged documentation for that OS gives any guidance on whether the DOS prompt window supports Chinese characters at all, and if so, which encoding it expects.
Assuming it is possible, and you can find out what character set to use, Encode and PerlIO are your friends -- you can create a perl-internal utf8 string like this:
my $utf8 = join( "", map { chr() } ( 36127, 25285, 36807, 37325 ));
and then either use Encode::encode() to convert it to something other than UTF-8 (if necessary), or simply use binmode STDOUT, ":encoding(cp936)"; (substituting another encoding name as needed), so that perl converts the string into the expected character set on output (see perlunicode and perluniintro).
(If it turns out that the DOS prompt window wants UTF-8 data, just do binmode STDOUT, ":utf8"; so that perl knows you want UTF-8 on output.)
(updated to fix missing close-paren in code snippet)
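To see what the Encode::encode() route above actually produces, here is a minimal sketch that encodes the same four-character string into a few candidate encodings and compares the byte counts. It assumes the Encode module that ships with perl (its cp936 support comes from Encode::CN); the list of encodings is only illustrative, since which one the console expects is exactly the open question.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same Perl character string as above: four CJK characters.
my $chars = join( "", map { chr($_) } ( 36127, 25285, 36807, 37325 ) );

# Encode::encode() turns characters into bytes in the named encoding.
my %bytes;
for my $enc ( "cp936", "UTF-16LE", "UTF-8" ) {
    $bytes{$enc} = encode( $enc, $chars );
}

# length() counts characters in $chars, but bytes in each encoded result.
print "characters: ", length($chars), "\n";
printf "%-8s %2d bytes\n", $_, length( $bytes{$_} ) for sort keys %bytes;
```

The encoded strings are already bytes, so they can be printed to a handle with no encoding layer; binmode STDOUT, ":encoding(cp936)"; followed by print $chars; should yield the same bytes as printing $bytes{"cp936"} without the explicit encode() call.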
Dear monks,
I saw the posts in "help needed in unicode displaying" and tried to write a program, but I came across the following problem. In the $target variable I stored some Chinese characters (for example, three of them), and I tried to get the first two using substr $target, 0, 2; but it does not give the expected answer. Can anyone tell me how to retrieve the first two Chinese characters?
my $target = join("", map {chr()} ( 0x5434, 0x9547, 0x5b87 ));
my $part = substr( $target, 0, 2 );
binmode STDOUT, ":utf8";   # emit characters as UTF-8 (avoids "Wide character in print")
print " length of target = ", length($target);
print "\n length of substr = ", length( $part );
print "\n $target\n $part\n";
The length function should return the character counts (3 for $target, 2 for $part). If you are using a utf8-aware display window, you should see the two strings in Chinese characters (or redirect to a file, and view that in a utf8-capable display tool, like a browser).
If you get something different, tell us what OS and perl version you have, and be specific about what you actually got.
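One common cause of substr() returning the "wrong" piece, assuming $target was filled from a file, socket, or source literal without being decoded, is that it holds UTF-8 bytes rather than characters, so substr() counts bytes. The sketch below simulates that case (an assumption about the asker's setup) by building the byte string with Encode::encode_utf8, then shows that decoding it first makes substr($target, 0, 2) return two characters.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Three characters, as in the reply above.
my $chars = join( "", map { chr($_) } ( 0x5434, 0x9547, 0x5b87 ) );

# Simulate undecoded input: the same text as raw UTF-8 bytes (3 per char).
my $raw = encode_utf8($chars);

# substr on the raw bytes takes 2 bytes: a fragment of the first character.
my $bad = substr( $raw, 0, 2 );

# Decode to a character string first; now substr counts characters.
my $target = decode_utf8($raw);
my $good   = substr( $target, 0, 2 );

binmode STDOUT, ":utf8";
print "raw length: ", length($raw), ", decoded length: ", length($target), "\n";
print "first two characters: $good\n";
```

If the text is read from a file, opening it with a "<:encoding(UTF-8)" layer (or whatever encoding the file really uses) has the same effect as calling decode_utf8() on what you read.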
I'm not sure if the Windows console understands UTF-8. If it does, and you have the proper font installed, you should be able to output UTF-8 to it, either directly or by putting a :utf8 layer on STDOUT with
binmode STDOUT, ":utf8"
or die "Couldn't set the :utf8 layer on STDOUT: $!";
But in any case, the Windows console will not understand and decode HTML entities. You will need to decode your HTML entities into characters first (the decode_entities() function of HTML::Entities does this) and then encode the result, with Encode or an output layer, before printing it.
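A minimal sketch of the decode-then-encode sequence, assuming the input only contains numeric character references such as &#21556; or &#x5434;. The helper decode_numeric_entities() is hypothetical and hand-rolled for illustration: it ignores named entities like &amp;, which is why HTML::Entities is the usual tool for real HTML. The output encoding cp936 is likewise just an assumed example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn decimal (&#NNN;) and hex (&#xHHH;) character
# references into Perl characters in a single pass. Named entities are
# left untouched.
sub decode_numeric_entities {
    my ($html) = @_;
    $html =~ s/&#(?:[xX]([0-9a-fA-F]+)|(\d+));/defined $1 ? chr(hex $1) : chr($2)/ge;
    return $html;
}

my $html  = "&#21556;&#x9547;&#x5b87;";
my $chars = decode_numeric_entities($html);    # three characters

# Encode for the console; cp936 is an assumed code page, adjust as needed.
my $bytes = encode( "cp936", $chars );
print $bytes, "\n";
```

Doing the substitution in one regex matters: running a decimal pass and then a hex pass would decode &#38;#x41; twice and turn it into "A".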
Update: See graff's reply - I think it's more likely that Windows wants UTF-16LE if the console is Unicode-enabled, and otherwise, you should output in the current code page of the console.
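If the console does turn out to want UTF-16LE, the PerlIO-layer route looks roughly like this. To keep the sketch checkable, it writes to an in-memory filehandle instead of STDOUT. The :raw under the encoding layer is there because, as far as I know, the default :crlf layer on Windows would otherwise translate the 0x0A byte inside the UTF-16 stream and corrupt it. Whether a particular console renders the result is still the assumption to verify.

```perl
use strict;
use warnings;

# Collect output in a scalar so the resulting bytes can be inspected.
my $buf = "";
open( my $out, ">:raw:encoding(UTF-16LE)", \$buf )
    or die "Couldn't open in-memory handle: $!";

# Print characters; the layer encodes each one as UTF-16LE code units.
print {$out} chr(0x5434), "\n";
close($out) or die "Couldn't close: $!";

# Show the encoded bytes in hex.
print join( " ", map { sprintf( "%02X", ord($_) ) } split //, $buf ), "\n";
```

For the console itself, the corresponding call would be binmode STDOUT, ":raw:encoding(UTF-16LE)"; adding :crlf after the encoding layer is the commonly suggested way to keep CRLF line endings without breaking the UTF-16 byte stream.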