PerlMonks  

Unicode Puzzle

by Skeeve (Vicar)
on Aug 12, 2010 at 21:16 UTC (#854769=perlquestion)
Skeeve has asked for the wisdom of the Perl Monks concerning the following question:

Hi! I'm a bit puzzled. I'm running a file through openssl to decode it. The portion of the output I'm interested in is said to be "in Unicode, padded with 00".

So when I look at it in a hexdump, it reads like this:

00 4d 00 61 00 67 00 6e - 00 65 00 74 00 00 00 00 [.M.a.g.n.e.t....]

There are several strings of bytes which I read from openssl's output. I unpack them into an array. Demo code could be something like this:

my $buf = "\x00M\x00a\x00g\x00n\x00e\x00t\x00\x00\x00\x00" x 4;
my @unpacked = unpack "a16" x 4, $buf;

So now I wonder how to convert the byte strings in @unpacked to proper Perl strings which I can print out without the \x00. Additionally, I'd like to remove the padding zeroes.

I tried to use decode_utf8 and decode("utf16", ...) on them, but the first did not seem to have any effect and the second fails with (for example) "UTF-16:Unrecognised BOM 4d".

Does anyone have a hint as to what I'm doing wrong?


s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
+.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

Re: Unicode Puzzle
by ikegami (Pope) on Aug 12, 2010 at 21:27 UTC

    It appears to be UTF-16be or UCS-2be (no way to know from what you posted).

    UTF-16 has two possible byte orders, so telling decode just "UTF-16" is not enough unless there's a BOM to indicate byte order.
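The BOM point can be illustrated with a small sketch. It assumes the core Encode module and uses a made-up six-character sample rather than the real openssl output; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Sample bytes in the same layout as the hexdump, without the padding.
my $bytes = "\x00M\x00a\x00g\x00n\x00e\x00t";

# Plain "UTF-16" looks for a BOM first; this data has none, so decode dies.
my $no_bom = eval { decode('UTF-16', $bytes) };
my $error  = $@;
print "without BOM: $error" if $error;

# With a big-endian BOM (FE FF) prepended, the same call succeeds.
my $with_bom = decode('UTF-16', "\xFE\xFF" . $bytes);
print "with BOM: $with_bom\n";
```

Prepending a BOM is shown only to make the failure mode visible; naming the byte order in the encoding name is usually the simpler route.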

    The "padded with 00" bit probably refers to the two U+0000 at the end.
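Putting the reply together, one possible way to get clean strings is sketched below. It assumes the data really is big-endian UTF-16 and that the padding consists only of trailing U+0000 characters; the demo buffer is the one from the question.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Same demo data as in the question: four 16-byte fields.
my $buf = "\x00M\x00a\x00g\x00n\x00e\x00t\x00\x00\x00\x00" x 4;
my @unpacked = unpack "a16" x 4, $buf;

my @strings = map {
    my $str = decode('UTF-16BE', $_);   # explicit byte order, so no BOM is needed
    $str =~ s/\0+\z//;                  # drop the trailing U+0000 padding
    $str;
} @unpacked;

print "$_\n" for @strings;
```

The padding is stripped after decoding, so the substitution works on characters rather than on raw bytes and cannot cut a 16-bit unit in half.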

      Many thanks, ikegami! Both (utf16be and ucs2be) seem to work. The data I have so far does not help me decide which one is the "real" one.
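The two encodings only differ for characters outside the Basic Multilingual Plane, which UTF-16 stores as surrogate pairs and UCS-2 cannot represent. A rough way to check whether a given sample can tell them apart is to look for 16-bit units in the surrogate range. The helper name `has_surrogates` and the sample strings below are hypothetical, not part of the original thread.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the big-endian 16-bit units in $bytes that
# fall in the surrogate range 0xD800..0xDFFF (zero means "none found").
sub has_surrogates {
    my ($bytes) = @_;
    return scalar grep { $_ >= 0xD800 && $_ <= 0xDFFF } unpack 'n*', $bytes;
}

my $bmp_only  = "\x00M\x00a\x00g\x00n\x00e\x00t";
my $with_pair = "\xD8\x3D\xDE\x00";   # U+1F600 as a UTF-16BE surrogate pair

for my $sample ($bmp_only, $with_pair) {
    print has_surrogates($sample)
        ? "contains surrogates: only valid as UTF-16\n"
        : "no surrogates: UCS-2 and UTF-16 decode identically\n";
}
```

If no sample ever contains a surrogate, the data alone cannot settle the question, and only the format's documentation can.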


        If you can get the iconv library and its associated binaries for your platform, they're good tools for inspecting Unicode data and for both test and real conversions.
        the hardest line to type correctly is: stty erase ^H
Re: Unicode Puzzle
by moritz (Cardinal) on Aug 13, 2010 at 06:56 UTC
    ++ for including hexdump output in your question, and generally providing enough information to answer the question without huge guesswork.
    Perl 6 - links to (nearly) everything that is Perl 6.
