How do I convert a string to Unicode and back (v5.005_03)?

Contributed by footpad on Oct 15, 2000 at 07:49 UTC

Answer: How do I convert a string to Unicode and back (v5.005_03)?
contributed by tye

   $utf16 = pack "S*", unpack( "C*", $ascii ), 0;
   chop( $ascii = pack "C*", unpack "S*", $utf16 );

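Here "UTF16" refers to the 16-bit encoding of Unicode (UTF-16), which I think is what Microsoft calls UNICODE. It is a different byte encoding from UTF-8, which is what many non-Microsoft references have in mind when they say "Unicode"; both encode the same character set. Also note that "S" packs in the machine's native byte order, so the result is little-endian (as Windows expects) only on little-endian hardware.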

Note that the UTF16 string explicitly includes the trailing L'\0', while the ASCII string does not explicitly include a trailing '\0'. Perl implicitly tacks a single '\0' byte onto the end of every string buffer, which terminates an 8-bit string but is not enough to terminate a 16-bit-char string, whose terminator is two bytes wide.

     - tye
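The pack/unpack approach above can be sketched with an explicit byte order. This is only an illustration, not tye's code: the sub names are made up for the example, and it swaps "S" for "v", which always packs 16-bit little-endian units regardless of the machine. It avoids anything newer than v5.005_03 as far as I know.

```perl
use strict;

# Widen each byte of an ASCII string to a 16-bit little-endian code unit
# and append a 16-bit NUL terminator. "v" is always little-endian,
# whereas "S" follows the byte order of the machine running the script.
sub ascii_to_utf16le {
    my ($ascii) = @_;
    return pack "v*", unpack("C*", $ascii), 0;
}

# Reverse the conversion: narrow each 16-bit unit to one byte, then
# chop off the terminator that was added above.
sub utf16le_to_ascii {
    my ($utf16) = @_;
    my $ascii = pack "C*", unpack("v*", $utf16);
    chop $ascii;
    return $ascii;
}

my $wide = ascii_to_utf16le("Hi");
print join(" ", map { sprintf "%02x", $_ } unpack "C*", $wide), "\n";  # 48 00 69 00 00 00
print utf16le_to_ascii($wide), "\n";                                   # Hi
```

The hex dump makes the two-byte terminator visible: the last 00 00 pair is the L'\0' discussed above.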
Answer: How do I convert a string to Unicode and back (v5.005_03)?
contributed by Zombie negative64

I should warn you that while the code above may work for ASCII, it will not work in general for other character sets and/or encodings, because it copies byte values instead of mapping characters.
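To make the warning concrete, here is a small sketch comparing byte-widening with a charset-aware conversion. It assumes the core Encode module, which ships with Perl 5.8 and later and so is not available on v5.005_03; the sub names are hypothetical and chosen only for this example.

```perl
use strict;
use Encode qw(decode encode);

# Byte-widening: each byte value becomes one 16-bit little-endian unit.
sub widen_bytes {
    my ($bytes) = @_;
    return pack "v*", unpack("C*", $bytes);
}

# Charset-aware conversion: decode the bytes as Windows-1252, then
# encode the resulting characters as UTF-16LE (no byte-order mark).
sub cp1252_to_utf16le {
    my ($bytes) = @_;
    return encode("UTF-16LE", decode("cp1252", $bytes));
}

my $euro = "\x80";    # the euro sign in Windows-1252
print "widened:   ", unpack("H*", widen_bytes($euro)), "\n";        # 8000
print "converted: ", unpack("H*", cp1252_to_utf16le($euro)), "\n";  # ac20
```

The widened result is code unit 0x0080, a control character, while the euro sign is actually U+20AC. For plain ASCII input the two subs return the same bytes.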

You can use the Unicode::Map module like this:

   perl -MUnicode::Map -e'print Unicode::Map->new(shift)->to8(<>)' 
There is also to16() if you need UTF-16.

Note that not all mappings are round-trip, i.e. you won't necessarily get back what you put in if you try to "undo" the conversion.
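A lossy round trip is easy to demonstrate. The sketch below does not use Unicode::Map; it assumes the core Encode module (Perl 5.8 and later) as a stand-in, and round_trip is a hypothetical helper name. With Encode's default settings, a character the target charset cannot represent is replaced by a substitution character on the way out, so it cannot be recovered on the way back.

```perl
use strict;
use Encode qw(decode encode);

# Encode text into $charset and decode it straight back.
sub round_trip {
    my ($text, $charset) = @_;
    return decode($charset, encode($charset, $text));
}

my $text = "caf\x{e9} \x{20ac}5";    # e-acute, then a euro sign

for my $charset ("iso-8859-1", "UTF-8") {
    my $same = round_trip($text, $charset) eq $text;
    printf "%-10s round trip %s\n", $charset, $same ? "preserved" : "lost data";
}
```

ISO-8859-1 has no euro sign, so that trip loses data, while UTF-8 can represent every character in the string.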

For lots and lots of information on dealing with complex character set and encoding issues, see Ken Lunde's excellent book CJKV Information Processing from O'Reilly.

Or to skip the reading, you can just go to the examples and look at the perl directory.

Answer: How do I convert a string to Unicode and back (v5.005_03)?
contributed by mirod

If you have the iconv library on your system then the easiest way is to use Text::Iconv:

   use Text::Iconv;

   # or any other encoding supported by iconv; "iconv --list" gives the list
   my $enc = 'latin1';

   my $enc2utf = Text::Iconv->new( $enc, 'utf8' )
      or die "Can't create enc2utf converter";
   my $utf2enc = Text::Iconv->new( 'utf8', $enc )
      or die "Can't create utf2enc converter";

   # then you can convert strings like this
   # ($enc_string is assumed to already hold text encoded in $enc):
   my $utf8_string = $enc2utf->convert( $enc_string );
   $enc_string     = $utf2enc->convert( $utf8_string );

This works with every character encoding supported by iconv, be it 1-byte or multi-byte.

You can also have a look at Converting character encodings for a discussion on character conversion.
