
Unicode Puzzle

by Skeeve (Vicar)
on Aug 12, 2010 at 21:16 UTC ( #854769=perlquestion )
Skeeve has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I'm a bit puzzled. I'm running a file through openssl to decode it. The portion of the output I'm interested in is said to be "in Unicode, padded with 00".

So when I look at it in a hexdump, it reads like this:

00 4d 00 61 00 67 00 6e - 00 65 00 74 00 00 00 00 [.M.a.g.n.e.t....]

There are several strings of bytes which I read from openssl's output. I unpack them into an array. Demo code could be something like this:

my $buf = "\x00M\x00a\x00g\x00n\x00e\x00t\x00\x00\x00\x00" x 4;
my @unpacked = unpack "a16" x 4, $buf;

So now I wonder how to convert the byte strings in @unpacked to proper Perl strings which I can print out without the \x00. Additionally, I'd like to remove the padding zeroes.

I tried to use decode_utf8 and decode("utf16", ...) on them, but the first did not seem to have any effect and the second fails with (for example) "UTF-16:Unrecognised BOM 4d".

Does anyone have a hint as to what I'm doing wrong?


Replies are listed 'Best First'.
Re: Unicode Puzzle
by ikegami (Pope) on Aug 12, 2010 at 21:27 UTC

    It appears to be UTF-16be or UCS-2be (no way to know from what you posted).

    UTF-16 has two possible byte orders, so telling decode just "UTF-16" is not enough unless there's a BOM to indicate byte order.

    The "padded with 00" bit probably refers to the two U+0000 at the end.
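
A minimal sketch of the decoding described above, not a definitive recipe. It assumes the data really is big-endian (as the hexdump suggests) and that the padding consists only of trailing U+0000 characters; it reuses the demo data from the question and the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Same demo data as in the question: four 16-byte fields.
my $buf = "\x00M\x00a\x00g\x00n\x00e\x00t\x00\x00\x00\x00" x 4;
my @unpacked = unpack "a16" x 4, $buf;

# Plain "UTF-16" expects a BOM; without one, decode() dies.
my $ok = eval { decode("UTF-16", $unpacked[0]); 1 };
print "UTF-16 without BOM failed: $@" unless $ok;

# Assumption: big-endian byte order, so name it explicitly.
my @strings = map {
    my $s = decode("UTF-16BE", $_);
    $s =~ s/\x00+\z//;    # strip the trailing U+0000 padding
    $s;
} @unpacked;

print "$_\n" for @strings;
```

Naming the byte order in the encoding ("UTF-16BE") is what removes the need for a BOM; the substitution afterwards only touches NUL characters at the end of each string.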

      Many thanks, ikegami! Both (utf16be and ucs2be) seem to work. The data I have does not yet help me decide which one is the "real" one.
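
A small sketch of why both encodings seem to work here, assuming the sample contains only characters in the Basic Multilingual Plane. For such text the two decodings give the same result; they only differ for code points above U+FFFF, which UTF-16 represents as a surrogate pair and UCS-2 cannot represent at all. The code point U+1F600 below is just an illustrative choice, not something from the original data.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $bytes = "\x00M\x00a\x00g\x00n\x00e\x00t";

# For BMP-only text the two decodings agree.
my $from_utf16 = decode("UTF-16BE", $bytes);
my $from_ucs2  = decode("UCS-2BE",  $bytes);
print "same result\n" if $from_utf16 eq $from_ucs2;

# A code point above U+FFFF takes four bytes (a surrogate pair) in UTF-16.
my $pair = encode("UTF-16BE", "\x{1F600}");
print unpack("H*", $pair), "\n";    # d83dde00
```

So unless the data ever contains 16-bit values in the surrogate range (D800 to DFFF), the sample alone cannot distinguish the two.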

        If you can get the iconv library and its associated binaries for your platform, they're good tools for Unicode inspection and for both test and real conversions.
Re: Unicode Puzzle
by moritz (Cardinal) on Aug 13, 2010 at 06:56 UTC
    ++ for including hexdump output in your question, and generally providing enough information to answer the question without huge guesswork.
