PerlMonks  

How to encode and decode chinese string to iso-8859-1 encoding format

by thanos1983 (Vicar)
on Nov 10, 2017 at 21:09 UTC (#1203139=perlquestion)
thanos1983 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Fellow Monks,

I am trying to figure out why I cannot encode and decode Chinese characters in ISO-8859-1, although the same round trip works without any problems in UTF-8. I have read the Encode perldoc documentation, but I was not able to figure out how to do it.

Sample of the code I am experimenting with:

#!/usr/bin/perl
use utf8;
use Encode;
use strict;
use warnings;
use feature 'say';

binmode( STDOUT, ':utf8' );

my $str = '這是一個測試';

my $octets = encode("utf8", $str);
say decode("utf8", $octets);

my $secondaryOctets = encode("ISO-8859-1" , $str);
say decode("ISO-8859-1", $str);

__END__

$ perl stringEncodingDecoding.pl
這是一個測試
Wide character at /usr/local/lib/x86_64-linux-gnu/perl/5.24.1/Encode.pm line 228.

I also tried to encode the string to UTF-8 first and then convert it to ISO-8859-1, so that I could decode it afterwards, but I was not successful. Sample of code below:

my $encoded = Encode::from_to($octets, "utf8", "iso-8859-1");
say decode("iso-8859-1", $encoded);

__END__

6

Thanks in advance for the time and effort trying to assist me.

Seeking for Perl wisdom...on the process of learning...not there...yet!
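
A note on the second attempt: according to the Encode documentation, from_to converts its first argument in place and returns the length of the converted octets (or undef on failure), which would explain the 6 printed above. The following is only a minimal sketch, assuming the default CHECK behaviour, where characters that do not map to the target encoding are replaced by a substitution character. The variable names are illustrative and not taken from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

# Build the test string from escapes so this sketch does not depend on
# the encoding of the source file.
my $str    = "\x{9019}\x{662F}\x{4E00}\x{500B}\x{6E2C}\x{8A66}";
my $octets = encode('UTF-8', $str);    # UTF-8 bytes, three per character

# from_to() rewrites $octets in place; its return value is a length,
# not the converted data.
my $length = from_to($octets, 'UTF-8', 'iso-8859-1');

print "returned:  $length\n";
print "converted: ", decode('iso-8859-1', $octets), "\n";
```

The converted data ends up in $octets itself, so that is the variable to pass to decode. None of the six characters exists in ISO-8859-1, so the conversion is lossy whichever way it is called.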

Replies are listed 'Best First'.
Re: How to encode and decode chinese string to iso-8859-1 encoding format
by vr (Friar) on Nov 11, 2017 at 02:03 UTC
    why I can not encode and decode Chinese characters in iso-8859-1 encoding format but I can convert it just fine without any problems on utf8 format
    ...ahem, because UTF-8 covers 1,112,064 code points, while ISO-8859-1 covers only 256 and can barely handle the Western European repertoire? You can make Perl tell you when you are trying a dubious thing:
    use utf8;
    use Encode;
    encode( 'ISO-8859-1' , '這是一個測試', Encode::FB_CROAK );
    

    ...

    "\x{9019}" does not map to iso-8859-1 at ...
    

      Hello vr,

      Nice idea to use croak. It will be very useful for modifications.

      Thanks again for your time and effort reading and replying to my question.

      ~

      BR / Thanos

      Seeking for Perl wisdom...on the process of learning...not there...yet!
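
As a follow-up to the FB_CROAK suggestion in this sub-thread, here is a minimal sketch of how the CHECK argument of encode changes what happens to characters that ISO-8859-1 cannot represent. FB_DEFAULT, FB_PERLQQ, FB_HTMLCREF, FB_CROAK and LEAVE_SRC are constants documented in Encode; the helper name try_latin1 is hypothetical and exists only for this illustration. LEAVE_SRC is OR-ed in because encode may otherwise modify the source string when CHECK is set.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $str = "\x{9019}\x{662F}";    # first two characters of the test string

# Hypothetical helper: encode to ISO-8859-1 with the given CHECK value and
# return the octets, or the error message if encode() dies.
sub try_latin1 {
    my ($text, $check) = @_;
    my $octets = eval { encode('iso-8859-1', $text, $check | Encode::LEAVE_SRC) };
    return $octets if defined $octets;
    chomp(my $error = $@);
    return "died: $error";
}

my @modes = (
    [ FB_DEFAULT  => Encode::FB_DEFAULT  ],    # substitution characters
    [ FB_PERLQQ   => Encode::FB_PERLQQ   ],    # Perl-style \x{...} escapes
    [ FB_HTMLCREF => Encode::FB_HTMLCREF ],    # HTML numeric character references
    [ FB_CROAK    => Encode::FB_CROAK    ],    # dies on the first unmappable character
);

for my $mode (@modes) {
    my ($name, $check) = @$mode;
    printf "%-12s %s\n", $name, try_latin1($str, $check);
}
```

Which mode fits depends on whether silent data loss is acceptable. FB_CROAK wrapped in eval makes the failure explicit, while the escaping modes keep the information in a form that is not real ISO-8859-1 text.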
Re: How to encode and decode chinese string to iso-8859-1 encoding format
by 1nickt (Monsignor) on Nov 11, 2017 at 01:14 UTC

    Hi, there are a couple of things I see now looking on a full-size screen ...

    First, in the second set of statements you encode $str to $secondaryOctets but then print the output of decoding $str.

    The above does not fix the issue, though. When you tell Perl to use utf8; in the source code, it reads any non-ASCII characters in string literals as characters, rather than as a sequence of bytes. This works well for your first case, where you encode to and decode from UTF-8. But since ISO-8859-1 is a single-byte encoding that doesn't know about multi-byte characters, you get the wide character warning. You should not tell Perl that the source code is in UTF-8 *if* you plan to read it in as bytes.

    Similarly but separately, when you apply the ':utf8' IO layer to STDOUT, you are telling Perl that the output is going to be encoded in UTF-8. That's not the case when you've encoded to ISO-8859-1, so you shouldn't apply the layer.

    The following script attempts to demonstrate what I mean:

    use strict; use warnings; use feature 'say';
    use Encode;
    use Class::Unload;
    
    {
        say 'With UTF-8';
        use utf8;
        my $str = '這是一個測試';
        my $perl = encode("utf8", $str);
    
        binmode STDOUT, ':utf8';
        say decode("utf8", $perl);
    }
    
    {
        say 'With ISO-8859-1';
        Class::Unload->unload('utf8');
        my $str = '這是一個測試';
        my $perl = encode("ISO-8859-1" , $str);
    
        binmode STDOUT;
        say decode("ISO-8859-1", $perl);
    }
    
    __END__
    

    Outputs:

    $ perl 1203139.pl
    
    With UTF-8
    這是一個測試
    With ISO-8859-1
    這是一個測試
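
One possible reading of why the second block still prints the Chinese text: use utf8 is lexically scoped, so outside the first block the literal is a string of 18 UTF-8 bytes rather than 6 characters, and ISO-8859-1 maps every byte value to the code point with the same number. The encode/decode pair would then hand the same bytes through unchanged, and the terminal interprets them as UTF-8. This is a minimal sketch under that assumption; the bytes are produced with encode so the example does not depend on the source file's encoding, and the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "\x{9019}\x{662F}\x{4E00}\x{500B}\x{6E2C}\x{8A66}";    # six characters
my $bytes = encode('UTF-8', $chars);    # what the literal holds without "use utf8"

printf "characters: %d, bytes: %d\n", length($chars), length($bytes);

# Each byte is treated as one Latin-1 character and written back as one byte.
my $latin1_octets = encode('ISO-8859-1', $bytes);
my $round_trip    = decode('ISO-8859-1', $latin1_octets);

print "bytes unchanged by the round trip\n"  if $round_trip eq $bytes;
print "no Chinese characters were decoded\n" if $round_trip ne $chars;
```

If this reading is right, the ISO-8859-1 step in that block never sees a Chinese character at all, which fits the disclaimer about scripts that appear to work by accident.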
    

    Disclaimer: Working with encodings is very complicated, as you know, and I am not an expert in the field. As this example shows there can be multiple overlaying issues, and it's possible for a script to appear to be working right when it's just an accident. So while it is my best understanding, I don't guarantee that my explanation here is correct.

    Hope this helps!


    The way forward always starts with a minimal test.

      Hello 1nickt

      That makes a lot of sense. Thanks for the clarification. I am also not an expert on binary data; I am trying to learn a few things by experimentation.

      Thanks again for your time and effort in providing a small sample. It helped me understand a lot.

      BR / Thanos

      Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: How to encode and decode chinese string to iso-8859-1 encoding format
by 1nickt (Monsignor) on Nov 10, 2017 at 21:25 UTC

    You put utf8 binmode on stdout but the 2nd output is not in that encoding. Maybe? (Sorry for brevity, on phone)

    The way forward always starts with a minimal test.

      Hello 1nickt,

      Thank you for your time and effort reading and replying to my question. It is not clear to me exactly what you mean. Could you describe it a bit more?

      Again thanks for your time and effort. BR / Thanos

      Seeking for Perl wisdom...on the process of learning...not there...yet!
Re: How to encode and decode chinese string to iso-8859-1 encoding format
by Anonymous Monk on Nov 11, 2017 at 10:34 UTC
