PerlMonks  

Re: How to encode and decode chinese string to iso-8859-1 encoding format

by 1nickt (Canon)
on Nov 11, 2017 at 01:14 UTC ( [id://1203155]=note )


in reply to How to encode and decode chinese string to iso-8859-1 encoding format

Hi, there are a couple of things I see now looking on a full-size screen ...

First, in the second set of statements you encode $str to $secondaryOctets but then print the output of decoding $str.

The above does not fix the issue, though. When you tell Perl to use utf8; in the source code, it reads any non-ASCII characters in string literals as characters, rather than as a sequence of bytes. This works well for your first case, where you encode to and decode from UTF-8. But ISO-8859-1 is a single-byte encoding and doesn't know about multi-byte characters, so you get the wide character error. You should not tell Perl that the source code is in UTF-8 *if* you plan to read it in as bytes.
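A minimal sketch of the characters-versus-bytes difference, assuming the script file itself is saved as UTF-8. The variable names $chars and $bytes are only illustrative; the point is that the same literal has a different length depending on whether use utf8; is in effect in the enclosing block.

```perl
use strict; use warnings;

my ($chars, $bytes);
{
    use utf8;    # lexically scoped: applies only inside this block
    $chars = length '這是一個測試';    # literal read as 6 characters
}
{
    no utf8;     # literal read as raw UTF-8 octets, 3 per character here
    $bytes = length '這是一個測試';
}

print "with use utf8: $chars\n";    # 6
print "without:       $bytes\n";    # 18
```

Because the pragma is lexically scoped, a use utf8; inside one block has no effect on literals in a sibling block.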

Similarly but separately, when you apply the ':utf8' IO layer to STDOUT, you are telling Perl that the output is going to be encoded in UTF-8. That's not the case when you've encoded to ISO-8859-1, so you shouldn't apply the layer.
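To see what the layer does to already-encoded data, here is a small sketch that writes the same ISO-8859-1 octet through two in-memory filehandles, one raw and one with the ':utf8' layer. The character é (U+00E9) is my own choice for illustration because ISO-8859-1 can represent it; the handle and variable names are likewise just for this example.

```perl
use strict; use warnings;
use Encode qw(encode);

my $str    = "\x{E9}";                      # é as a character
my $latin1 = encode('ISO-8859-1', $str);    # a single octet, 0xE9

my ($raw, $layered) = ('', '');

# No layer: the octet goes out untouched.
open my $fh_raw, '>:raw', \$raw or die "open failed: $!";
print {$fh_raw} $latin1;
close $fh_raw;

# ':utf8' layer: Perl treats the octet as a character and encodes it again.
open my $fh_utf8, '>:utf8', \$layered or die "open failed: $!";
print {$fh_utf8} $latin1;
close $fh_utf8;

printf "raw:   %s\n", unpack 'H*', $raw;        # e9
printf ":utf8: %s\n", unpack 'H*', $layered;    # c3a9
```

The second result is no longer valid ISO-8859-1 for é, which is why the layer should be left off when the data has already been encoded to something other than UTF-8.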

The following script attempts to demonstrate what I mean:

use strict; use warnings; use feature 'say';
use Encode;

{
    say 'With UTF-8';
    use utf8;    # string literals in this block are read as characters
    my $str = '這是一個測試';
    my $perl = encode("utf8", $str);

    binmode STDOUT, ':utf8';    # encode output as UTF-8
    say decode("utf8", $perl);
}

{
    say 'With ISO-8859-1';
    no utf8;     # 'use utf8' is lexically scoped, so it is already off here; this makes it explicit
    my $str = '這是一個測試';    # read as a sequence of bytes
    my $perl = encode("ISO-8859-1" , $str);

    binmode STDOUT;    # remove the ':utf8' layer again
    say decode("ISO-8859-1", $perl);
}

__END__

Outputs:

$ perl 1203139.pl

With UTF-8
這是一個測試
With ISO-8859-1
這是一個測試

Disclaimer: Working with encodings is very complicated, as you know, and I am not an expert in the field. As this example shows, there can be multiple overlapping issues, and it's possible for a script to appear to be working right when it's just an accident. So while it is my best understanding, I don't guarantee that my explanation here is correct.
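As a hedged illustration of the "accident" part, the sketch below (my own variable names, using \x{...} escapes so it does not depend on how the file is saved) compares encoding real Chinese characters to ISO-8859-1 with pushing their UTF-8 octets through the same encoding. As far as I understand Encode's default behaviour, unmappable characters are replaced with a substitution character, while octets pass through unchanged.

```perl
use strict; use warnings;
use Encode qw(encode decode);

my $chars = "\x{9019}\x{662F}";    # 這是 as two characters

# Default mode: characters ISO-8859-1 cannot represent become '?'.
my $lossy = encode('ISO-8859-1', $chars);

# Strict mode: the same call dies instead of substituting.
my $copy = $chars;    # encode() may modify its source when a CHECK value is given
my $ok = eval { encode('ISO-8859-1', $copy, Encode::FB_CROAK()); 1 };

# Byte path: UTF-8 octets all fit in 0..255, so they survive the round trip.
my $octets = encode('UTF-8', $chars);    # 6 octets
my $round  = decode('ISO-8859-1', encode('ISO-8859-1', $octets));

print "lossy encode:  $lossy\n";                                   # ??
print "strict encode: ", ($ok ? 'succeeded' : 'died'), "\n";       # died
print "octets intact: ", ($round eq $octets ? 'yes' : 'no'), "\n"; # yes
```

So the round trip only looks right when the string is really a sequence of UTF-8 octets and the terminal happens to interpret the output as UTF-8.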

Hope this helps!


The way forward always starts with a minimal test.

Replies are listed 'Best First'.
Re^2: How to encode and decode chinese string to iso-8859-1 encoding format
by thanos1983 (Parson) on Nov 11, 2017 at 08:00 UTC

    Hello 1nickt

    That makes a lot of sense. Thanks for the clarification. I am also not an expert on binary data; I am trying to learn a few things by experimentation.

    Thanks again for your time and effort in providing a small sample; it helped me understand a lot.

    BR / Thanos

    Seeking for Perl wisdom...on the process of learning...not there...yet!
