
Re^3: How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?

by dstrom (Initiate)
on Jun 06, 2011 at 19:14 UTC (#908354)


in reply to Re^2: How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?
in thread How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?

I am very grateful for your help. However, when I run your program, I do not get the character '中', but rather the nonsensical "Σ╕". Any idea what is going wrong? Thanks. (I have the East Asian language pack installed, so it is not simply that.)


Re^4: How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?
by andal (Friar) on Jun 07, 2011 at 08:54 UTC

    What you see in the output depends on your terminal and a few other things, such as locale and font. Perl can only take a sequence of bytes and manipulate it. It is the job of some other program to draw the character on the screen that corresponds to that sequence of bytes. Assuming that D6 D0 is the byte sequence for the character 中 in the GB-2312 encoding, the program produces the byte sequence for the same character in the UTF-8 encoding. If your terminal expects text in GB-2312, it will not display the correct character. My terminal, for example, expects text in UTF-8, so everything is displayed correctly.

    I cannot know what your terminal expects, but if the following simple program produces the correct output, then your terminal expects GB-2312-encoded text.

    perl -e 'print pack("H*", "D6D0"), "\n"'
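
    The conversion described above can be sketched as follows. This is only an illustration: the helper name gb2312_hex_to_utf8_hex is made up for this example, and it relies on the Encode module accepting 'gb2312' as an encoding name, as the script under discussion does. It shows that the same character is D6 D0 in GB-2312 and E4 B8 AD in UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes GB-2312 bytes written as hex digits and
# returns the UTF-8 bytes for the same characters, also as hex digits.
sub gb2312_hex_to_utf8_hex {
    my ($hex) = @_;
    my $bytes = pack('H*', $hex);           # raw GB-2312 bytes
    my $chars = decode('gb2312', $bytes);   # Perl character string
    my $utf8  = encode('UTF-8', $chars);    # raw UTF-8 bytes
    return uc(unpack('H*', $utf8));
}

print gb2312_hex_to_utf8_hex('D6D0'), "\n";   # prints E4B8AD
```

    Because the output here is plain ASCII hex digits, it looks the same no matter which encoding the terminal expects.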

Re^4: How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?
by dstrom (Initiate) on Jun 07, 2011 at 09:05 UTC

    Sorry, my last post was unclear. I meant the program by grantm. Here is a clarified version:

    I am very grateful for your help. However, when I run the program below by grantm, I do not get the character '中', but rather the nonsensical "Σ╕". Any idea what is going wrong? Thanks. (I have the East Asian language pack installed, so it is not simply that.)

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Encode qw(decode);

    my $ascii_hex = 'D6D0';  # continue for as many bytes as required

    my $bytes = pack('H*', $ascii_hex);

    my $character_string = decode('gb2312', $bytes);

    binmode(STDOUT, ':utf8');

    print $character_string, "\n";

      I do not get the character .... Any idea of what is going wrong?

      Your thing responsible for drawing characters doesn't understand utf8.

      Neither does mine :) but od -tacx1 shows the correct bytes

      0000000   d   8   -  cr  nl
                Σ   ╕     \r  \n
              e4 b8 ad 0d 0a
      0000005
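
      If od is not available, the same check can be done from Perl itself. The sketch below is one possible way, not part of the original script; hex_dump is a name invented for this example. It dumps the bytes the script would emit, so the result does not depend on what the terminal can draw.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', unpack('(H2)*', $bytes);
}

# The bytes the script sends to STDOUT: the character encoded as UTF-8,
# followed by a newline. (The 0d in the od output above is added later,
# when Windows translates "\n" to CR LF on output.)
my $output = encode('UTF-8', decode('gb2312', pack('H*', 'D6D0')) . "\n");

print hex_dump($output), "\n";   # prints e4 b8 ad 0a
```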
      

      My example script simply printed out UTF-8 characters on STDOUT. In my case I ran the script in GNOME terminal under Linux and the correct character was displayed. Under Linux, your $LANG environment variable would typically need to be set to a UTF-8 locale for this to work (mine is set to "en_NZ.UTF-8"). If you were trying to run the script on Windows, it is very unlikely the command window will correctly display UTF-8.
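
      If the terminal expects something other than UTF-8, one option is to encode the output for whatever it does expect. The sketch below is an assumption-laden illustration: TERMINAL_ENCODING is an environment variable invented for this example (Perl does not read it on its own), and bytes_for_terminal is a made-up helper. Note that the terminal's encoding must actually contain the character; a code page such as cp437, which is what "Σ╕" looks like, has no Chinese characters at all.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical setting: the encoding your terminal expects.
# Falls back to UTF-8 when the (made-up) variable is not set.
my $terminal_encoding = $ENV{TERMINAL_ENCODING} || 'UTF-8';

# Hypothetical helper: the bytes a character string becomes
# when written out in the given encoding.
sub bytes_for_terminal {
    my ($chars, $encoding) = @_;
    return encode($encoding, $chars);
}

my $character_string = decode('gb2312', pack('H*', 'D6D0'));

# Let Perl do the same encoding automatically on every print.
binmode(STDOUT, ":encoding($terminal_encoding)");
print $character_string, "\n";
```

      For a terminal that expects GB-2312, bytes_for_terminal($character_string, 'gb2312') gives back the original D6 D0.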

      But printing to STDOUT is only one option for output. You might choose to write to a file or a database or even a network socket. If you write to a file, you should be able to open and view the file in a web browser - all web browsers understand UTF-8.
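
      Writing to a file instead of STDOUT could look something like this. It is a minimal sketch: the helper write_utf8_file and the file name zhong.txt are invented for the example. The :encoding(UTF-8) layer on the file handle does the job that binmode did for STDOUT in the script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: write a character string to a file as UTF-8.
sub write_utf8_file {
    my ($path, $text) = @_;
    open(my $fh, '>:encoding(UTF-8)', $path)
        or die "Cannot open $path for writing: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close $path: $!";
    return;
}

my $character_string = decode('gb2312', pack('H*', 'D6D0'));

# 'zhong.txt' is just an example name; it is created in the current directory.
write_utf8_file('zhong.txt', "$character_string\n");
```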
