PerlMonks  

Unicode Puzzle

by Skeeve (Vicar)
on Aug 12, 2010 at 21:16 UTC (#854769, perlquestion)
Skeeve has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I'm a bit puzzled. I'm running a file through openssl to decode it. A portion of the output that I'm interested in is said to be "in Unicode, padded with 00".

So when I look at it in a hexdump, it reads like this:

00 4d 00 61 00 67 00 6e - 00 65 00 74 00 00 00 00 [.M.a.g.n.e.t....]

There are several strings of bytes which I read from openssl's output. I unpack them into an array. Demo code could be something like this:

my $buf = "\x00M\x00a\x00g\x00n\x00e\x00t\x00\x00\x00\x00" x 4;
my @unpacked = unpack "a16" x 4, $buf;

So now I wonder how to convert the byte strings in @unpacked to proper Perl strings which I can print out without the \x00. Additionally, I'd like to remove the padding zeroes.

I tried to use decode_utf8 and decode("utf16", ...) on them, but the first one had no visible effect and the second one fails with (for example) "UTF-16:Unrecognised BOM 4d".

Does anyone have a hint as to what I'm doing wrong?


s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

Re: Unicode Puzzle
by ikegami (Pope) on Aug 12, 2010 at 21:27 UTC

    It appears to be UTF-16be or UCS-2be (there is no way to tell which from what you posted).

    UTF-16 has two possible byte orders, so telling decode just "UTF-16" is not enough unless there's a BOM to indicate byte order.

    The "padded with 00" bit probably refers to the two U+0000 at the end.
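A minimal sketch of the suggestion above, reusing the demo data from the question. It assumes the fields really are big-endian UTF-16 and that the padding consists only of trailing U+0000 characters; the variable name @strings is introduced here for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Same demo data as in the question: four 16-byte fields.
my $buf = "\x00M\x00a\x00g\x00n\x00e\x00t\x00\x00\x00\x00" x 4;
my @unpacked = unpack "a16" x 4, $buf;

my @strings = map {
    # Name the byte order explicitly, so no BOM is required.
    my $s = decode('UTF-16BE', $_);
    # Assumption: the padding is trailing U+0000 only, so strip it at the end.
    $s =~ s/\x{0000}+\z//;
    $s;
} @unpacked;

print "$_\n" for @strings;   # prints "Magnet" four times
```

Stripping the padding after decoding, rather than before, avoids cutting a 16-bit code unit in half when a real character happens to contain a 00 byte.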

      Many thanks, ikegami! Both (utf16be and ucs2be) seem to work. The data I have so far does not help me decide which one is the "real" one.
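One way the two encodings can be told apart, sketched under assumptions: UTF-16 represents characters above U+FFFF as surrogate pairs, while UCS-2 has no surrogate pairs at all. Data containing only characters such as "Magnet" decodes identically under both, so only input with a surrogate pair would settle the question. The sample bytes below are made up for illustration and do not come from the openssl output; how Encode reports the invalid UCS-2 input is left for you to observe rather than stated here.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: U+1F600 as UTF-16BE, i.e. the surrogate pair D83D DE00.
my $octets = "\xD8\x3D\xDE\x00";

my $utf16 = decode('UTF-16BE', $octets);
printf "UTF-16BE: %d character(s), first is U+%04X\n",
    length($utf16), ord($utf16);

# UCS-2 has no surrogate pairs, so this input is not valid UCS-2.
# The eval guards against an exception; the outcome is only printed.
my $ucs2 = eval { decode('UCS-2BE', $octets) };
if (defined $ucs2) {
    printf "UCS-2BE:  %d character(s): %s\n", length($ucs2),
        join ' ', map { sprintf('U+%04X', ord $_) } split //, $ucs2;
}
else {
    print "UCS-2BE:  decode failed: $@";
}
```

If the real data never contains such pairs, either name gives the same result, and UTF-16BE is the more general choice.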


        If you can get the iconv library and its associated binaries for your platform, they are good tools for inspecting Unicode data and for both test and real conversions.
        the hardest line to type correctly is: stty erase ^H
Re: Unicode Puzzle
by moritz (Cardinal) on Aug 13, 2010 at 06:56 UTC
    ++ for including hexdump output in your question, and generally providing enough information to answer the question without huge guesswork.
    Perl 6 - links to (nearly) everything that is Perl 6.
