
How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?

by dstrom (Initiate)
on Jun 05, 2011 at 20:11 UTC ( #908210=perlquestion )
dstrom has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I want to convert ASCII text that represents Chinese characters into the characters themselves, for example in a UTF encoding. The Chinese characters in my file are encoded as two hex bytes each in GB, e.g., 中 is (D6 D0). I only have the ASCII text of the two hex bytes, "D6 D0", etc. How can this be converted to Chinese characters? I appreciate your help; thanks in advance.

Re: How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?
by kejohm (Hermit) on Jun 05, 2011 at 22:15 UTC

    You could use the hex() function to convert the hex strings into their corresponding numeric values, then use the chr() function to get the character represented by each value, e.g.:

    chr( hex( 'D0D6' ) );

    Update: Fixed missing quotes, thanks davido

      chr( hex( 0xD0D6 ) )

      ...should be written as chr( hex( '0xD0D6' ) ), shouldn't it?


        Yes, thanks for that. I've fixed my post.
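The quoting matters because hex() interprets its argument as a string of hex digits. A minimal sketch of the difference (the variable names are made up for illustration):

```perl
use strict;
use warnings;

# hex() expects a string of hex digits, with or without a leading "0x".
my $from_string   = hex('D0D6');      # 53462, i.e. 0xD0D6
my $from_prefixed = hex('0xD0D6');    # also 53462

# Without quotes, Perl first evaluates the literal 0xD0D6 to the number 53462,
# and hex() then reads the digits "53462" as hexadecimal.
my $from_number   = hex(0xD0D6);      # 341090, i.e. 0x53462

printf "%d %d %d\n", $from_string, $from_prefixed, $from_number;
```

So the unquoted form runs without complaint but produces a different number from the one intended.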

      I'm not sure that's what the original poster needs. chr(0xD6D0) means Unicode code point U+D6D0, which is the character '훐', whereas the poster said that the bytes represented by the ASCII string 'D6D0' are the character '中' in a 'GB' encoding. I'm not very knowledgeable about Asian encodings, but I'll assume that the specific encoding is GB-2312.

      So the things we need to do are:

      1. convert the ASCII hex string into bytes
      2. decode the bytes from GB-2312 to Perl's internal character representation
      3. convert to a suitable output encoding

      Here's a complete script which does all of that:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Encode qw(decode);

      my $ascii_hex = 'D6D0';    # continue for as many bytes as required
      my $bytes = pack('H*', $ascii_hex);
      my $character_string = decode('gb2312', $bytes);

      binmode(STDOUT, ':utf8');
      print $character_string, "\n";
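The original question shows the hex pairs separated by spaces ("D6 D0"), so the input may need tidying before pack() sees it. A minimal sketch of one way to do that, assuming the encoding really is GB-2312 as guessed above; the helper name gb_hex_to_string and the sample input are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn text such as "D6 D0 B9 FA" into a character string.
sub gb_hex_to_string {
    my ($hex_text) = @_;

    # Drop the whitespace between the hex pairs.
    (my $hex = $hex_text) =~ s/\s+//g;

    # Refuse anything that is not a whole number of hex byte pairs.
    die "expected pairs of hex digits\n"
        unless $hex =~ /\A(?:[0-9A-Fa-f]{2})+\z/;

    my $bytes = pack('H*', $hex);        # ASCII hex -> raw bytes
    return decode('gb2312', $bytes);     # raw bytes -> Perl characters
}

my $text = gb_hex_to_string('D6 D0 B9 FA');    # two GB-2312 characters

binmode(STDOUT, ':encoding(UTF-8)');
print $text, "\n";
```

The validation step is a design choice rather than a requirement: pack('H*', ...) accepts odd-length or non-hex input without dying, so checking up front makes bad input easier to spot.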

        Yes, you're right. I must admit that I do not deal with Unicode very often, and therefore am not very knowledgeable on the subject.

        I am very grateful for your help. However, when I run your program, I do not get the character '中', but rather the nonsensical "Σ╕". Any idea what is going wrong? Thanks. (I have the East Asian language pack installed, so it is not simply that.)
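One possible explanation, offered as an assumption rather than a confirmed diagnosis: the script's UTF-8 output may be correct, with the console interpreting those bytes in a legacy code page. In code page 437, the UTF-8 bytes for '中' (E4 B8 AD) display as 'Σ╕¡', which matches the reported output apart from the last character. A minimal sketch that reproduces the effect (the variable names are made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Decode the GB-2312 bytes as in the script above, then re-encode as UTF-8.
my $character  = decode('gb2312', pack('H*', 'D6D0'));
my $utf8_bytes = encode('UTF-8', $character);
my $utf8_hex   = uc(unpack('H*', $utf8_bytes));

# Assumption: the console uses code page 437. Reinterpreting the UTF-8 bytes
# under that code page shows what such a console would display.
my $misread = decode('cp437', $utf8_bytes);

binmode(STDOUT, ':encoding(UTF-8)');
print "UTF-8 bytes: $utf8_hex\n";
print "Seen through cp437: $misread\n";
```

If that is the cause, redirecting the output to a file and opening it in a UTF-8-aware editor would be one way to check that the conversion itself is working.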
