http://www.perlmonks.org?node_id=36788

footpad has asked for the wisdom of the Perl Monks concerning the following question: (strings)

How do I convert a string to Unicode and back (v5.005_03)?

Originally posted as a Categorized Question.


Re: How do I convert a string to Unicode and back (v5.005_03)?
by tye (Sage) on Oct 15, 2000 at 09:59 UTC
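    # ASCII -> 16-bit: widen each byte to a native-endian short, then append a 16-bit NUL
    $utf16= pack "S*", unpack( "C*", $ascii ), 0;
    # 16-bit -> ASCII: narrow each short back to a byte, then chop the trailing NUL
    chop( $ascii= pack "C*", unpack "S*", $utf16 );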

    Here "UTF16" means the 16-bit flavor of Unicode that I think is what Microsoft calls UNICODE. It is not very much like what most non-Microsoft references mean by Unicode, which I think is more precisely called UTF-8.

    Note how the UTF16 string explicitly includes the trailing L'\0' while the ASCII string does not explicitly include the trailing '\0'. This is because Perl strings implicitly include a trailing '\0' tacked on the end but this is not enough to terminate a 16-bit-char string.
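    The "S" template packs shorts in the machine's native byte order, so the result depends on where the code runs. As a minimal sketch, assuming the target is the little-endian layout used on Windows, the same round trip can be written with the "v" template instead. The sub names here are made up for illustration and are not part of the original answer.

```perl
use strict;

# Hypothetical helper: widen each byte to a 16-bit little-endian unit
# and append a 16-bit NUL terminator.
sub ascii_to_utf16le {
    my ($ascii) = @_;
    return pack "v*", unpack("C*", $ascii), 0;
}

# Hypothetical helper: narrow each 16-bit unit back to one byte,
# then drop the trailing NUL that the encoder added.
sub utf16le_to_ascii {
    my ($utf16) = @_;
    my $ascii = pack "C*", unpack "v*", $utf16;
    chop $ascii;
    return $ascii;
}

my $wide = ascii_to_utf16le("Hi");

# Show the bytes in hex: 48 00 69 00 00 00
print join(" ", map { sprintf "%02x", $_ } unpack "C*", $wide), "\n";

# Round trip back to the original string
print utf16le_to_ascii($wide), "\n";
```

    Like the one-liners above, this is only safe for plain ASCII input.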

         - tye
Re: How do I convert a string to Unicode and back (v5.005_03)?
by negative64 (Initiate) on Aug 07, 2001 at 19:45 UTC
    I should warn you that while this code may work for ASCII, it most definitely will not work for other character sets and/or encodings.
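    To make that limitation concrete, here is a rough sketch of a narrowing routine that refuses anything outside 7-bit ASCII instead of silently mangling it. It assumes little-endian 16-bit input with an optional 16-bit NUL terminator, and the sub name is hypothetical.

```perl
use strict;

# Hypothetical helper: narrow 16-bit little-endian units to bytes, but
# return undef if any unit falls outside 7-bit ASCII.
sub utf16le_to_ascii_strict {
    my ($utf16) = @_;
    my @units = unpack "v*", $utf16;
    pop @units if @units && $units[-1] == 0;   # drop the 16-bit terminator
    foreach my $unit (@units) {
        return undef if $unit > 0x7F;          # would not survive the narrowing
    }
    return pack "C*", @units;
}

# U+20AC (EURO SIGN) plus terminator: cannot be represented in ASCII
my $euro   = pack "v*", 0x20AC, 0;
my $result = utf16le_to_ascii_strict($euro);
print defined($result) ? "ok\n" : "not plain ASCII\n";
```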

    You can use the Unicode::Map module like this:

    perl -MUnicode::Map -e'print Unicode::Map->new(shift)->to8(<>)'
    There is also to16() if you need UTF-16.

    Note that not all mappings are round-trip, i.e. you won't necessarily get back what you put in if you try to "undo" the conversion.

    For lots and lots of information on dealing with complex character set and encoding issues, see Ken Lunde's excellent book CJKV Information Processing from O'Reilly.

    Or to skip the reading, you can just go to the examples and look at the perl directory.

Re: How do I convert a string to Unicode and back (v5.005_03)?
by mirod (Canon) on Aug 07, 2001 at 21:05 UTC

    If you have the iconv library on your system then the easiest way is to use Text::Iconv:

    use Text::Iconv;

    my $enc= 'latin1';  # or any other encoding supported by iconv; iconv --list gives the list

    my $enc2utf= new Text::Iconv( $enc, 'utf8')
        or die "Can't create enc2utf converter";
    my $utf2enc= new Text::Iconv( 'utf8', $enc)
        or die "Can't create utf2enc converter";

    # then you can convert strings like this:
    my $utf8_string= $enc2utf->convert( $enc_string);
    my $enc_string= $utf2enc->convert( $utf8_string);

    This works with every character encoding supported by iconv, be it 1-byte or multi-byte.

    You can also have a look at Converting character encodings for a discussion on character conversion.


    Originally posted as a Categorized Answer.