PerlMonks  

Re: How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?

by kejohm (Hermit)
on Jun 05, 2011 at 22:15 UTC (#908217=note)


in reply to How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?

You could use the hex() function to convert the hex string into its numeric value, then use the chr() function to get the character represented by that value, e.g.:

chr( hex( 'D0D6' ) );
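A minimal sketch of what this expression evaluates to. chr() works with Unicode code points, so the result is the character U+D0D6; whether that is the intended character depends on what the hex digits were meant to encode.

```perl
use strict;
use warnings;

# hex() turns the string of hex digits into a number (0xD0D6 == 53462);
# chr() then returns the character with that Unicode code point.
my $char = chr( hex('D0D6') );

# Print the code point instead of the character itself, so the result
# does not depend on how the terminal renders it.
printf "U+%04X\n", ord $char;    # prints "U+D0D6"
```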

Update: Fixed missing quotes, thanks davido


Re^2: How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?
by davido (Archbishop) on Jun 06, 2011 at 00:58 UTC

    chr( hex( 0xD0D6 ) )

    ...should be written as chr( hex( '0xD0D6' ) ), shouldn't it?


    Dave

      Yes, thanks for that. I've fixed my post.
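The quotes matter because hex() always treats its argument as a string of hex digits. A small sketch of the difference davido points out:

```perl
use strict;
use warnings;

# Quoted: hex() parses the string "0xD0D6" (a leading 0x is allowed).
my $from_string = hex('0xD0D6');    # 53462, i.e. 0xD0D6

# Unquoted: 0xD0D6 is already the number 53462; hex() stringifies it
# to "53462" and parses THOSE digits as hex, giving 0x53462.
my $from_number = hex(0xD0D6);      # 341090, i.e. 0x53462

printf "hex('0xD0D6') = 0x%X\n", $from_string;    # prints "hex('0xD0D6') = 0xD0D6"
printf "hex(0xD0D6)   = 0x%X\n", $from_number;    # prints "hex(0xD0D6)   = 0x53462"
```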

Re^2: How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?
by grantm (Parson) on Jun 06, 2011 at 01:49 UTC

    I'm not sure that's what the original poster needs. chr(0xD6D0) means Unicode code point U+D6D0, which is the character '훐'. The poster, however, said that the bytes represented by the ASCII string 'D6D0' are the character '中' in a 'GB' encoding. I'm not very knowledgeable about Asian encodings, but I'll assume that the specific encoding is GB-2312.

    So the things we need to do are:

    1. convert the ASCII hex string into bytes
    2. decode the bytes from GB-2312 to Perl's internal character representation
    3. convert to a suitable output encoding

    Here's a complete script which does all of that:

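    #!/usr/bin/perl
    use strict;
    use warnings;
    use Encode qw(decode);

    my $ascii_hex = 'D6D0';    # continue for as many bytes as required

    # 1. convert the ASCII hex string into bytes ("\xD6\xD0")
    my $bytes = pack('H*', $ascii_hex);

    # 2. decode the GB-2312 bytes into Perl's internal character string
    my $character_string = decode('gb2312', $bytes);

    # 3. encode the output as UTF-8
    binmode(STDOUT, ':utf8');
    print $character_string, "\n";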
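If the input arrives with spaces between the bytes, as in the question's title, a small wrapper around the same pack/decode steps can strip them first. This is a sketch: the helper name gb_hex_to_string is hypothetical, and it still assumes the bytes are GB-2312. Passing FB_CROAK makes decode() die on byte sequences that are not valid GB-2312 instead of silently substituting a replacement character.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: turn a hex dump such as "D6 D0" (spaces allowed)
# into a Perl character string, assuming the bytes are GB-2312.
sub gb_hex_to_string {
    my ($hex) = @_;
    $hex =~ s/\s+//g;                                # drop spaces between bytes
    die "odd number of hex digits\n" if length($hex) % 2;
    my $bytes = pack 'H*', $hex;                     # "D6D0" -> "\xD6\xD0"
    return decode('gb2312', $bytes, FB_CROAK);       # die on invalid input
}

my $text = gb_hex_to_string('D6 D0');
printf "U+%04X\n", ord $text;                        # prints "U+4E2D" (the character 中)
```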

      Yes, you're right. I must admit that I do not deal with Unicode very often, and therefore am not very knowledgeable on the subject.

      I am very grateful for your help. However, when I run your program, I do not get the character '中', but rather the nonsensical "Σ╕". Any idea of what is going wrong? Thanks. (I have the East Asian language pack installed, so it is not simply that.)

        What you see in the output depends on your terminal and a few other things, such as the locale and the font. Perl can only take a sequence of bytes and manipulate it; it is the job of some other program to draw the character on screen that corresponds to that sequence of bytes. Assuming that D6 D0 is the byte sequence for the character 中 in GB-2312 encoding, the program produces the byte sequence for the same character in UTF-8 encoding. If your terminal expects text in GB-2312 (or in any encoding other than UTF-8), it won't display the correct character. My terminal, for example, expects text in UTF-8, so everything is displayed correctly.

        I cannot know what your terminal expects, but if the following simple program, which prints the raw GB-2312 bytes unchanged, produces the correct output, then your terminal expects text in GB-2312 encoding.

        perl -e 'print pack("H*", "D6D0"), "\n"'
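To check which case applies, it can help to look at the bytes rather than at the rendered glyphs. The sketch below (assuming, as above, that D6 D0 is GB-2312) shows the byte sequences of 中 in UTF-8 and in GB-2312. The "Σ╕" reported in this thread is consistent with the UTF-8 bytes E4 B8 AD being displayed by a console using code page 437, the default on many English-language Windows systems, where E4 is drawn as Σ and B8 as ╕. The final binmode assumes a terminal that expects GB-2312; on a Windows console switched to code page 936, ':encoding(cp936)' would be the corresponding layer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The character 中 (U+4E2D), decoded from the GB-2312 bytes D6 D0.
my $char = decode('gb2312', pack('H*', 'D6D0'));

# The same character encoded for two different kinds of terminal.
my $utf8 = encode('UTF-8',  $char);    # what grantm's script prints
my $gb   = encode('gb2312', $char);    # what a GB-2312 terminal expects

printf "UTF-8 : %s\n", uc unpack('H*', $utf8);    # prints "UTF-8 : E4B8AD"
printf "GB2312: %s\n", uc unpack('H*', $gb);      # prints "GB2312: D6D0"

# Assumption: the terminal expects GB-2312. Choose the output layer to
# match the terminal instead of hard-coding UTF-8.
binmode(STDOUT, ':encoding(gb2312)');
print $char, "\n";
```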

        Sorry, my last post was unclear. I meant the program by grantm. Here is a clarified version:

        I am very grateful for your help. However, when I run the program below by grantm, I do not get the character '中', but rather the nonsensical "Σ╕". Any idea of what is going wrong? Thanks. (I have the East Asian language pack installed, so it is not simply that.)

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Encode qw(decode);

        my $ascii_hex = 'D6D0';    # continue for as many bytes as required
        my $bytes = pack('H*', $ascii_hex);
        my $character_string = decode('gb2312', $bytes);
        binmode(STDOUT, ':utf8');
        print $character_string, "\n";
