PerlMonks  

help needed in unicode displaying

by uva (Sexton)
on Mar 10, 2006 at 07:07 UTC

uva has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks, how can I print the following text at a DOS prompt? When I run the statement below at the DOS prompt, I don't get the expected result. The print statement contains Chinese characters:
print "负担过重";

Re: help needed in unicode displaying
by graff (Chancellor) on Mar 10, 2006 at 07:49 UTC
    I'm only guessing here, but if you have a Chinese-enabled version of MS Windows, the DOS-prompt window may use cp936 (Extended GB) for Chinese characters, or whatever cp??? applies to Big5 (Traditional) Chinese, rather than Unicode.

    For that matter, if the dos-prompt window is "unicode-enabled", you might need to use UTF-16LE rather than utf8. You'd have to see whether the so-called "Help" or alleged documentation for that OS can give you any guidance on whether the dos-prompt window supports Chinese characters at all, and if so, what specific encoding is expected.
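    To make the encoding question concrete, here is a small sketch (assuming only the core Encode module) that prints the bytes the four characters from the question turn into under UTF-8, UTF-16LE and cp936. The helper name hex_bytes is made up for illustration. A console expecting one of these byte sequences will show garbage if it is handed another.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The four characters from the question, built from their code points
# (the same numbers as 36127, 25285, 36807, 37325, written in hex).
my $text = join '', map { chr($_) } (0x8D1F, 0x62C5, 0x8FC7, 0x91CD);

# Hypothetical helper: encode $string with the named encoding and
# return the resulting bytes as space-separated hex pairs.
sub hex_bytes {
    my ($encoding, $string) = @_;
    my $bytes = encode($encoding, $string);
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

for my $enc ('UTF-8', 'UTF-16LE', 'cp936') {
    printf "%-9s %s\n", $enc, hex_bytes($enc, $text);
}
```

    The UTF-8 row should read E8 B4 9F E6 8B 85 E8 BF 87 E9 87 8D (three bytes per character) and the UTF-16LE row 1F 8D C5 62 C7 8F CD 91 (two bytes per character, low byte first).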

    Assuming it is possible, and you can find out what character set to use, Encode and PerlIO are your friends -- you can create a perl-internal utf8 string like this:

    my $utf8 = join( "", map { chr() } ( 36127, 25285, 36807, 37325 ));
    and then either use Encode::encode() to convert it to some encoding other than utf8 (if necessary), or simply use binmode STDOUT, ":encoding(cp936)"; (substituting another encoding name as needed) so that perl converts the string into the expected character set on output (see perlunicode and perluniintro).
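    A sketch of the two routes just described, assuming the console really does use cp936 (on Windows, typing chcp at the prompt reports the active code page; 936 is Simplified Chinese GBK). It encodes the same string once with Encode::encode() and once through an :encoding() layer on an in-memory file, and checks that both give identical bytes. The variable $console_encoding is hypothetical; set it to whatever your console actually uses.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: the console's code page is 936; replace 'cp936' with
# the encoding your console actually uses.
my $console_encoding = 'cp936';

my $text = join '', map { chr($_) } (36127, 25285, 36807, 37325);

# Route 1: convert explicitly with Encode::encode(), giving a byte string.
my $via_encode = encode($console_encoding, $text);

# Route 2: let a PerlIO :encoding() layer convert on output.
# An in-memory file stands in for the console so the bytes can be compared.
open my $mem, ">:encoding($console_encoding)", \my $via_layer
    or die "Cannot open in-memory handle: $!";
print {$mem} $text;
close $mem;

print $via_encode eq $via_layer ? "same bytes\n" : "different bytes\n";

# For real output, put the layer on STDOUT and print characters.
binmode STDOUT, ":encoding($console_encoding)";
print $text, "\n";
```

    Using the layer keeps the rest of the program working with characters; calling encode() by hand is handy when only some output should be converted.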

    (If it turns out that the dos-prompt window wants utf8 data, just do binmode STDOUT, ":utf8"; so that perl knows you want to output utf8 data.)

    (updated to fix missing close-paren in code snippet)
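    What the :utf8 layer changes can be seen with a small sketch that prints the same string to an in-memory file with and without the layer, counting warnings. Without a layer, perl emits a "Wide character in print" warning and writes its internal UTF-8 bytes anyway; with :utf8 the bytes are the same, but perl knows they are intended. The sub name print_with_layer is hypothetical.

```perl
use strict;
use warnings;

my $text = join '', map { chr($_) } (36127, 25285, 36807, 37325);

# Print $text to an in-memory file, optionally with a PerlIO layer,
# and report how many warnings perl emitted and how many bytes landed.
sub print_with_layer {
    my ($layer) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    open my $fh, '>', \my $buf or die "Cannot open in-memory handle: $!";
    binmode $fh, $layer if defined $layer;
    print {$fh} $text;
    close $fh;
    return { warnings => scalar @warnings, bytes => length $buf };
}

for my $layer (undef, ':utf8') {
    my $r = print_with_layer($layer);
    printf "%-8s warnings=%d bytes=%d\n",
        defined $layer ? $layer : '(none)', $r->{warnings}, $r->{bytes};
}
```

    Expect one warning for the bare handle, none with :utf8, and 12 bytes (four 3-byte UTF-8 sequences) in both cases.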

      Dear monks, I read the replies in "help needed in unicode displaying" and tried writing a program, but ran into the following problem. In the $target variable I stored some Chinese characters (for example, three of them) and tried to get the first two using substr $target,0 2 ; but it does not give the right answer. Can anyone tell me how to retrieve the first two Chinese characters?
        How about if you show us just enough code to demonstrate the problem -- i.e. assign characters to a scalar, apply substr to the scalar, print the result in some way -- and show us what you actually get as a result. (In the little snippet you showed, there should be a comma between the 0 and the 2.)

        For example, this should do what you intended:

        binmode STDOUT, ":utf8";   # perl should send utf8 data to STDOUT
        my $target = join("", map {chr()} ( 0x5434, 0x9547, 0x5b87 ));
        my $part = substr( $target, 0, 2 );
        print " length of target = ", length($target);
        print "\n length of substr = ", length( $part );
        print "\n $target\n $part\n";
        The length function should return the character counts (3 for $target, 2 for $part). If you are using a utf8-aware display window, you should see the two strings in Chinese characters (or redirect to a file, and view that in a utf8-capable display tool, like a browser).

        If you get something different, tell us what OS and perl version you have, and be specific about what you actually got.
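        One frequent cause of substr returning "wrong" pieces is that the string holds bytes rather than characters, for example when a UTF-8 source file contains a Chinese literal but lacks use utf8. The sketch below simulates that case with Encode::encode() (an assumption about what happened in the original program, not something the post confirms) and shows that decoding first gives the expected two characters.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The three characters from the example above, as a character string
my $chars = join '', map { chr($_) } (0x5434, 0x9547, 0x5b87);

# Simulate what perl sees when a UTF-8 source file contains the literal
# but lacks "use utf8": one string element per byte.
my $bytes = encode('UTF-8', $chars);

printf "character string: length %d\n", length $chars;
printf "byte string:      length %d\n", length $bytes;

# substr counts elements, so on the byte string it returns 2 bytes,
# which is only part of the first character's 3-byte UTF-8 sequence.
my $wrong = substr($bytes, 0, 2);

# Decoding back to characters first makes substr count characters.
my $right = substr(decode('UTF-8', $bytes), 0, 2);

printf "substr on bytes:        %d elements\n", length $wrong;
printf "substr on decoded text: %d characters\n", length $right;
```

        In a real script the usual fix is use utf8 at the top of a UTF-8 source file (or decode() on data read from outside), so that $target holds characters before substr sees it.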

Re: help needed in unicode displaying
by Corion (Patriarch) on Mar 10, 2006 at 07:28 UTC

    I'm not sure if the Windows console understands UTF-8 unicode. If it does, and you have the proper font installed, you should be able to output UTF-8 to it, either directly, or by reopening STDOUT with

    close STDOUT; open STDOUT, ">:utf8", "" or die "Couldn't reopen STDOUT: $!";
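    An alternative that does not close STDOUT is to duplicate it into a new handle and push the encoding layer onto the copy only. A minimal sketch, assuming the console accepts UTF-8; PerlIO::get_layers is used just to show how the two handles differ.

```perl
use strict;
use warnings;
use IO::Handle;   # for autoflush()

# Duplicate STDOUT into a separate handle and give only the copy a
# UTF-8 encoding layer; STDOUT keeps the layers it already had.
open my $out, '>&', \*STDOUT or die "Cannot duplicate STDOUT: $!";
binmode $out, ':encoding(UTF-8)' or die "Cannot push layer: $!";
$out->autoflush(1);   # keep output ordered relative to STDOUT

my $text = join '', map { chr($_) } (36127, 25285, 36807, 37325);
print {$out} $text, "\n";

# Compare the two layer stacks
print STDOUT 'STDOUT: ', join(',', PerlIO::get_layers(*STDOUT)), "\n";
print STDOUT 'copy:   ', join(',', PerlIO::get_layers($out)), "\n";
```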

    But in any case, the Windows console will not understand and decode HTML entities. You will need to decode your HTML entities into characters first (HTML::Entities' decode_entities() does this), then use Encode to encode those characters into the console's encoding before outputting them.
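    If the string really arrived as numeric character references (&#36127; and so on), here is a minimal sketch of the decode-then-encode step using only core Perl. It handles decimal and hexadecimal references in a single pass and nothing else; the sub name decode_numeric_refs and the cp936 target encoding are assumptions.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Decode numeric character references (&#36127; or &#x8D1F;) into
# characters in one pass, so a decoded "&" is never decoded again.
# Named entities such as &amp; are left alone.
sub decode_numeric_refs {
    my ($html) = @_;
    $html =~ s/&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));/defined $1 ? chr(hex $1) : chr($2)/ge;
    return $html;
}

# The question's string as it might look with HTML entities in it
my $raw  = '&#36127;&#25285;&#36807;&#37325;';
my $text = decode_numeric_refs($raw);

# Assumption: the console uses cp936; encode the characters for it.
print encode('cp936', $text), "\n";
```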

    Update: See graff's reply - I think it's more likely that Windows wants UTF-16LE if the console is Unicode-enabled, and otherwise, you should output in the current codepage of the console.

Node Type: perlquestion [id://535623]
Approved by wfsp