Encoding horridness revisited: What's going on here? [SOLVED]

karlgoethebier has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

After reading Encoding horridness I played around a bit:

#!/usr/bin/env perl

#     $Id: weird.pl,v 1.3 2017/07/13 09:06:54 karl Exp karl $

use strict;
use warnings;
use feature qw(say);

my $file = q(weird.txt);

open( my $fh, '>', $file );
binmode $fh, ':encoding(UTF-8)';
say $fh qq(nase\ngöre);
close $fh;

say qx (file -I $file);
say qx(echo \$LANG);
say qx(cat $file);

open( $fh, '<', $file );
binmode $fh, ':encoding(UTF-8)';
say <$fh>;
close $fh;

__END__

This is leading to:

karls-mac-mini:monks karl$ ./weird.pl
    weird.txt: text/plain; charset=utf-8

de_DE.UTF-8

nase
gÃ¶re

nase
göre

And if I say use utf8; I get:

karls-mac-mini:monks karl$ ./weird.pl
    weird.txt: text/plain; charset=utf-8

de_DE.UTF-8

nase
göre

nase
g?re

What do I miss?

Thanks for any hint and best regards, Karl

Update: Two working solutions:

Update2: Sorry, wrong merits.

1nickt:

#!/usr/bin/env perl

#     $Id: weird_1nickt.pl,v 1.2 2017/07/13 17:10:29 karl Exp karl $

use strict;
use warnings;
use feature qw(say);
use utf8;

my $file = q(weird.txt);

open( my $fh, '>', $file );
binmode $fh, ':encoding(UTF-8)';
say $fh qq(nase\ngöre);
close $fh;

say qx (file -I $file);
say qx(echo \$LANG);
say qx(cat $file);

open( $fh, '<', $file );
binmode $fh, ':encoding(UTF-8)';
binmode STDOUT, ':encoding(UTF-8)';
say <$fh>;
close $fh;


__END__

karls-mac-mini:monks karl$ ./weird_1nickt.pl 
    weird.txt: text/plain; charset=utf-8

de_DE.UTF-8

nase
göre

nase
göre

choroba:

#!/usr/bin/env perl

#       $Id: weird_choroba.pl,v 1.2 2017/07/13 20:47:38 karl Exp karl $

use strict;
use warnings;
use feature qw(say);
use utf8;
use open IO => ':encoding(utf-8)', ':std';

my $file = q(weird.txt);

open( my $fh, '>', $file );
say $fh qq(nase\ngöre);
close $fh;

say qx (file -I $file);
say qx(echo \$LANG);
say qx(cat $file);

open( $fh, '<', $file );
say <$fh>;
close $fh;

__END__

karls-mac-mini:monks karl$ ./weird_choroba.pl 
    weird.txt: text/plain; charset=utf-8

de_DE.UTF-8

nase
göre

nase
göre

«The Crux of the Biscuit is the Apostrophe»

perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'


Replies are listed 'Best First'.
Re: Encoding horridness revisited: What's going on here? by 1nickt (Canon) on Jul 13, 2017 at 13:14 UTC
Hi Karl,

"What do I miss?"

As I understand it, you missed telling Perl to encode your STDOUT output as UTF-8, after you read it in the second time (from the file). You don't need to (should not) do so when printing the output of `cat`, since your terminal already handles the encoding correctly.

use strict;
use warnings;
use feature qw(say);
use utf8;    # <-- needed, since you have high characters in your source

my $file = q(weird.txt);

open( my $fh, '>', $file );
binmode $fh, ':encoding(UTF-8)';
say $fh qq(nase\ngöre);
close $fh;

say qx (file -I $file);
say qx(echo \$LANG);
say qx(cat $file);

open( $fh, '<', $file );
binmode $fh, ':encoding(UTF-8)';
binmode STDOUT, ':encoding(UTF-8)';    # <-- here
say <$fh>;
close $fh;

__END__

The way forward always starts with a minimal test.
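The distinction the reply above relies on (decoded characters versus encoded bytes) can be made visible without any terminal involved. The following is only a minimal sketch: it uses the core Encode module to do by hand what an `:encoding(UTF-8)` layer does on read and on write, and the variable names are made up for illustration. The bytes are written with hex escapes so that the encoding of the script file itself plays no role.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 bytes for "göre", as they sit in weird.txt.
# Written with escapes, so no "use utf8" is needed here.
my $bytes = "g\xC3\xB6re";

# Roughly what a read through an :encoding(UTF-8) layer hands back:
# a string of characters, not bytes.
my $chars = decode( 'UTF-8', $bytes );

# Roughly what a write through an :encoding(UTF-8) layer emits:
# the characters turned back into UTF-8 bytes.
my $out = encode( 'UTF-8', $chars );

printf "bytes read:    %d\n", length $bytes;    # 5
printf "characters:    %d\n", length $chars;    # 4
printf "bytes written: %d\n", length $out;      # 5
printf "second character: U+%04X\n", ord substr( $chars, 1, 1 );    # U+00F6
```

Printing `$chars` to a STDOUT that has no encoding layer sends the single byte 0xF6 rather than the two bytes 0xC3 0xB6, which would explain why a UTF-8 terminal showed `g?re` in the original post.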
Re^2: Encoding horridness revisited: What's going on here? by choroba (Cardinal) on Jul 13, 2017 at 14:08 UTC
Or just add

use utf8;
use open IO => ':encoding(UTF-8)', ':std';

and remove all binmode or `:encoding` in open calls.

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
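To check what the `open` pragma from the reply above actually sets up, one can ask Perl which layers ended up on a handle. This is a small sketch under a few assumptions: the temporary file exists only for the demonstration, and the exact layer names reported by `PerlIO::get_layers` can differ between Perl versions, so no exact output is promised.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use open IO => ':encoding(UTF-8)', ':std';

# A throwaway file, only needed so that there is something to open.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
close $tmp_fh;

# A plain three-argument open without an explicit layer in the mode:
# the pragma supplies the default layer.
open( my $fh, '<', $tmp_name ) or die "Cannot open $tmp_name: $!";

my @file_layers   = PerlIO::get_layers($fh);
my @stdout_layers = PerlIO::get_layers(STDOUT);    # affected via ':std'
close $fh;

print "file:   @file_layers\n";
print "STDOUT: @stdout_layers\n";
```

Both lists should contain an `encoding(...)` entry. Because the pragma is lexically scoped, handles opened outside the file or block that contains the `use open` line are not affected.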
Re^3: Encoding horridness revisited: What's going on here? by karlgoethebier (Abbot) on Jul 13, 2017 at 16:00 UTC
This works! I'll try the solution from 1nickt and update the OP. Very nice! Thank you very much, Karl
Re^2: Encoding horridness revisited: What's going on here? by karlgoethebier (Abbot) on Jul 13, 2017 at 17:28 UTC
This works as well! Please see my update above. Thank you very much, Karl
Re: Encoding horridness revisited: What's going on here? by choroba (Cardinal) on Jul 13, 2017 at 09:47 UTC
In what encoding did you save the source file? What encoding does your terminal use?
Re^2: Encoding horridness revisited: What's going on here? by karlgoethebier (Abbot) on Jul 13, 2017 at 13:59 UTC
emacs:

M-x describe-current-coding-system

Coding system for saving this buffer:
  u -- mule-utf-8-unix

Default coding system (for new files):
  u -- mule-utf-8 (alias: utf-8)

Coding system for keyboard input:
  u -- utf-8 (alias of mule-utf-8)

Coding system for terminal output:
  u -- utf-8 (alias of mule-utf-8)

Defaults for subprocess I/O:
  decoding: u -- mule-utf-8 (alias: utf-8)
  encoding: u -- mule-utf-8 (alias: utf-8)

bash:

karls-mac-mini:monks karl$ locale
LANG="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_CTYPE="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_ALL=
Re^3: Encoding horridness revisited: What's going on here? by Corion (Patriarch) on Jul 13, 2017 at 14:06 UTC
Can you trust your terminal emulator to properly handle the output?

To me, encoding issues are always a wild goose chase, so I like to eliminate as many things from the encoding dance as quickly as possible. Usually that means that instead of including umlauts (or whatever) in my source code, I use the character names instead:

# instead of
use utf8;
my $s = "göre";

# I prefer to use
use charnames;
my $s = "g\N{LATIN SMALL LETTER O WITH DIAERESIS}re";

This eliminates the issue that my text editor is lying to me.

When inspecting the output, I either pipe the output through `hexdump` or through `Data::Dumper` with `$Data::Dumper::Useqq = 1;` so the console only sees 7-bit ASCII. This eliminates my terminal emulator lying to me.

Of course, that does not help with reading data from files that I don't control, but every little step helps.
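The inspection technique described above could look roughly like this. It is a sketch, not Corion's actual code: the variable names are invented, `$Data::Dumper::Terse` is added only to keep the output short, and the exact escape notation Data::Dumper chooses may vary with its version.

```perl
use strict;
use warnings;
use charnames ':full';    # needed for \N{...} on older Perls
use Data::Dumper;
use Encode qw(encode);

$Data::Dumper::Useqq = 1;    # escape everything outside printable ASCII
$Data::Dumper::Terse = 1;    # print the bare value, without "$VAR1 ="

# The character string, written without any non-ASCII in the source.
my $chars = "g\N{LATIN SMALL LETTER O WITH DIAERESIS}re";

# The same text as UTF-8 encoded bytes.
my $bytes = encode( 'UTF-8', $chars );

my $dump_chars = Dumper($chars);
my $dump_bytes = Dumper($bytes);

# Both dumps are plain ASCII, so neither the editor nor the terminal
# can distort them, yet the two strings are visibly different.
print "characters: $dump_chars";
print "bytes:      $dump_bytes";
```

The character string shows one escaped code point for the umlaut, while the byte string shows two escaped bytes, which makes it obvious at a glance whether a given value has been decoded or not.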
Re^4: Encoding horridness revisited: What's going on here? by karlgoethebier (Abbot) on Jul 13, 2017 at 14:23 UTC
Re: Encoding horridness revisited: What's going on here? by gandolf989 (Scribe) on Jul 13, 2017 at 15:12 UTC
I had a similar issue. I was trying to send a plain text email from Linux. I tried various Perl modules, mail, sendmail, etc. I then realized that Outlook was formatting my email. If you use Outlook 2016, open one of your test emails, go to File, Properties and look at the header information. You should see something like this: Content-Type: text/html; charset="ISO-8859-1". If you find that the encoding is correct, then you need to check what Outlook is doing with your email. On your Outlook screen under File, Options, Mail you will see a button for Stationery and Fonts. Then select the Font button for composing and reading plain text messages and make sure that it is set to Courier New and whatever font size you like. Then try opening one of your plain text emails and see if your white space formatting is OK.
