How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?

by dstrom (Initiate)
on Jun 05, 2011 at 20:11 UTC (#908210)
dstrom has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I want to convert ASCII text that represents Chinese characters into the characters themselves, for example in a UTF encoding. The Chinese characters in my file are encoded as two hex bytes each in GB; e.g., 中 is (D6 D0). All I have is the ASCII text of the two hex bytes, "D6 D0", etc. How can this be converted to Chinese characters? I appreciate your help; thanks in advance.

Replies are listed 'Best First'.
Re: How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?
by kejohm (Hermit) on Jun 05, 2011 at 22:15 UTC

    You could use the hex() function to convert the hex strings into their corresponding values, then use the chr() function to get the character represented by that value, e.g.:

    chr( hex( 'D0D6' ) );

    Update: Fixed missing quotes, thanks davido

      I'm not sure that's what the original poster needs. chr(0xD6D0) means Unicode code point U+D6D0, which is the character '훐', whereas the poster said that the bytes represented by the ASCII string 'D6D0' are the character '中' in a 'GB' encoding. I'm not very knowledgeable about Asian encodings, but I'll assume that the specific encoding is GB-2312.

      So the things we need to do are:

      1. convert the ASCII hex string into bytes
      2. decode the bytes from GB-2312 to Perl's internal character representation
      3. convert to a suitable output encoding

      Here's a complete script which does all of that:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Encode qw(decode);

      my $ascii_hex = 'D6D0'; # continue for as many bytes as required
      my $bytes = pack('H*', $ascii_hex);
      my $character_string = decode('gb2312', $bytes);
      binmode(STDOUT, ':utf8');
      print $character_string, "\n";
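The original question shows the hex pairs separated by spaces ("D6 D0"), so a variant that strips whitespace before packing may be closer to the real input. The following is only a sketch under that assumption: `gb_hex_to_string` is a hypothetical helper name, and the sample input `D6 D0 CE C4` (the GB2312 bytes for 中文) is chosen for illustration, not taken from the poster's file.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn ASCII hex such as "D6 D0 CE C4" into a character string.
sub gb_hex_to_string {
    my ($ascii_hex) = @_;
    (my $hex = $ascii_hex) =~ s/\s+//g;      # remove the spaces between hex pairs
    my $bytes = pack('H*', $hex);            # ASCII hex digits -> raw bytes
    return decode('gb2312', $bytes);         # GB2312 bytes -> Perl character string
}

binmode(STDOUT, ':encoding(UTF-8)');         # emit the characters as UTF-8
print gb_hex_to_string('D6 D0 CE C4'), "\n";
```

Keeping the conversion in a small function makes it easy to apply to each line of a file; whether the file really uses GB2312 (rather than GBK or GB18030) is still an assumption.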

        Yes, you're right. I must admit that I do not deal with Unicode very often, and therefore am not very knowledgeable on the subject.

        I am very grateful for your help. However, when I run your program, I do not get the character '中', but rather the nonsensical "Σ╕". Any idea what is going wrong? Thanks. (I have the East Asian language pack installed, so it is not simply that.)
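One way to narrow this down is to look at the bytes the script emits instead of at how the terminal draws them. The sketch below assumes, without confirmation from the thread, that the console is interpreting output as code page 437; under that assumption the correct UTF-8 bytes for 中 would be displayed as "Σ╕¡", which resembles the reported output.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decode the GB2312 bytes D6 D0 into a character string, as in the script above.
my $char = decode('gb2312', pack('H*', 'D6D0'));

# Encode to UTF-8 and render the raw bytes as hex digits.
my $utf8_bytes = encode('UTF-8', $char);
my $utf8_hex   = uc unpack('H*', $utf8_bytes);

# Reinterpret the same bytes as CP437 (the assumed console code page).
my $misread = decode('cp437', $utf8_bytes);

binmode(STDOUT, ':encoding(UTF-8)');
printf "code point:    U+%04X\n", ord $char;   # U+4E2D
print  "UTF-8 bytes:   $utf8_hex\n";           # E4B8AD
print  "read as cp437: $misread\n";
```

If the code point and bytes come out as U+4E2D and E4 B8 AD, the conversion itself is working and the problem lies in the display. In that case, printing in the encoding the terminal actually uses, or writing the output to a file and opening it in a UTF-8 aware editor, should show the expected character.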

      chr( hex( 0xD0D6 ) )

      ...should be written as chr( hex( '0xD0D6' ) ), shouldn't it?
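The difference matters because hex() expects a string. A short sketch of what happens in each case (the variable names are illustrative only):

```perl
use strict;
use warnings;

# The bare literal 0xD0D6 is the number 53462; hex() stringifies it to "53462"
# and then reads those digits as hexadecimal.
my $from_number = hex(0xD0D6);

# The quoted form is parsed directly as the hex string "0xD0D6".
my $from_string = hex('0xD0D6');

printf "hex(0xD0D6)   = %d\n", $from_number;   # 341090
printf "hex('0xD0D6') = %d\n", $from_string;   # 53462
```

So the unquoted version silently converts twice and yields a different code point from the one intended.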


        Yes, thanks for that. I've fixed my post.
