PerlMonks  

How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?

by dstrom (Initiate)
on Jun 05, 2011 at 20:11 UTC (#908210=perlquestion)
dstrom has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I want to convert ASCII text that represents Chinese characters into the characters themselves, for example in UTF-8. The Chinese characters in my file are encoded as two hex bytes each in GB, e.g., 中 is (D6 D0). I only have the ASCII text of the two hex values, "D6 D0", etc. How can this be converted to Chinese characters? I appreciate your help; thanks in advance.

Re: How do I convert a sequence of hexes (D0 D6) to Chinese characters (中)?
by kejohm (Hermit) on Jun 05, 2011 at 22:15 UTC

    You could use the hex() function to convert the hex string into its numeric value, then use the chr() function to get the character represented by that value, e.g.:

    chr( hex( 'D0D6' ) );

    Update: Fixed missing quotes, thanks davido

      chr( hex( 0xD0D6 ) )

      ...should be written as chr( hex( '0xD0D6' ) ), shouldn't it?


      Dave

        Yes, thanks for that. I've fixed my post.
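To illustrate why the quotes matter, here is a minimal sketch (the variable names are made up for this example). hex() treats its argument as a string of hex digits, so an unquoted literal, which is already a number, gets stringified in decimal and then reinterpreted as hex:

```perl
use strict;
use warnings;

# hex() interprets its argument as a string of hexadecimal digits.
my $from_string   = hex('D0D6');    # 53462, i.e. 0xD0D6
my $from_prefixed = hex('0xD0D6');  # same value; a leading "0x" is accepted

# The bare literal 0xD0D6 is already the number 53462. hex() stringifies
# it to "53462" and reads those digits as hex, giving 0x53462.
my $from_literal  = hex(0xD0D6);

printf "%d %d %d\n", $from_string, $from_prefixed, $from_literal;
```

The first two values agree; the third does not, which is the bug the quotes fix.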

      I'm not sure that's what the original poster needs. chr(0xD6D0) gives Unicode code point U+D6D0, which is the character '훐', whereas the poster said that the bytes represented by the ASCII string 'D6D0' are the character '中' in a 'GB' encoding. I'm not very knowledgeable about Asian encodings, but I'll assume that the specific encoding is GB-2312.

      So the things we need to do are:

      1. convert the ASCII hex string into bytes
      2. decode the bytes from GB-2312 to Perl's internal character representation
      3. convert to a suitable output encoding

      Here's a complete script which does all of that:

      #!/usr/bin/perl

      use strict;
      use warnings;

      use Encode qw(decode);

      my $ascii_hex = 'D6D0';   # continue for as many bytes as required

      my $bytes = pack('H*', $ascii_hex);

      my $character_string = decode('gb2312', $bytes);

      binmode(STDOUT, ':utf8');
      print $character_string, "\n";
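The script hard-codes 'D6D0' with no separator, while the question describes text such as "D6 D0". A possible variation, assuming the input is whitespace-separated hex pairs and the encoding really is GB-2312, is sketched below. The helper name gb_hex_to_string is hypothetical and not something from the thread:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn ASCII hex such as "D6 D0 ..." into characters.
sub gb_hex_to_string {
    my ($ascii_hex) = @_;
    (my $digits = $ascii_hex) =~ s/\s+//g;      # "D6 D0" -> "D6D0"
    die "expected pairs of hex digits\n"
        unless $digits =~ /\A(?:[0-9A-Fa-f]{2})+\z/;
    my $bytes = pack('H*', $digits);            # hex digits -> raw bytes
    return decode('gb2312', $bytes);            # bytes -> Perl characters
}

my $string = gb_hex_to_string('D6 D0 CE C4');   # two GB-2312 characters

# Print code points rather than glyphs, so the check does not depend on
# how the terminal renders the output.
printf "U+%04X\n", ord($_) for split //, $string;
```

Printing the code points is a way to confirm the decoding step independently of the output encoding: the first one should be U+4E2D, which is 中.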

        Yes, you're right. I must admit that I do not deal with Unicode very often, and therefore am not very knowledgeable on the subject.

        I am very grateful for your help. However, when I run your program, I do not get the character '中', but rather the nonsensical "Σ╕". Any idea what is going wrong? Thanks. (I have the East Asian language pack installed, so it is not simply that.)
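One possible explanation, offered as a guess about the poster's setup rather than a confirmed diagnosis: the UTF-8 encoding of 中 is the bytes E4 B8 AD, and in the legacy DOS code page 437 the bytes E4 and B8 display as Σ and ╕. That is consistent with the script emitting correct UTF-8 to a console that is not interpreting it as UTF-8. A minimal sketch for checking the bytes themselves instead of the glyphs (the variable names are made up for this example):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $character_string = decode('gb2312', pack('H*', 'D6D0'));

# Encode explicitly and dump the bytes as hex, so the console's own
# code page cannot disguise the result.
my $utf8_bytes = encode('UTF-8', $character_string);
my $hex_dump   = join ' ', map { sprintf '%02X', $_ } unpack('C*', $utf8_bytes);

printf "code point: U+%04X\n", ord($character_string);
print  "UTF-8 bytes: $hex_dump\n";
```

If the dump shows E4 B8 AD, the conversion itself is working and only the display is off. Writing the result to a file opened with ':encoding(UTF-8)' and viewing it in a UTF-8-aware editor would sidestep the console entirely.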

Node Type: perlquestion [id://908210]
Approved by rev_1318
Front-paged by toolic