You don't need to close the old STDOUT and create a new one with the CP437 encoding; you can just binmode the existing STDOUT:
C:\Users\peter.jones\Downloads\TempData\perl>perl -Ilib -e"print qq(\xe4)"
Σ
C:\Users\peter.jones\Downloads\TempData\perl>perl -Ilib -MDOS::Try -e"print qq(\xe4)"
ä
vs
C:\Users\peter.jones\Downloads\TempData\perl>perl -e"print qq(\xe4)"
Σ
C:\Users\peter.jones\Downloads\TempData\perl>perl -e"binmode STDOUT, ':encoding(Cp437)'; print qq(\xe4)"
ä
And the reason your example doesn't work in your test is the same reason that you wrote the module: you need the right encoding on the output of your test script as well as in the code inside the backticks (`...`).
#!perl
use strict;
use warnings;
my $result = `perl -Ilib -MDOS::Try -e"print qq(\xe4)"`;
print "first test: $result\n";
use lib 'lib';
require DOS::Try;
print "second test: $result\n";
__END__
first test: Σ
second test: ä
You can see more if you hex dump the bytes being output from the two variants of the oneliner:
C:\Users\peter.jones\Downloads\TempData\perl>perl -e"print qq(\xe4)" | perl -e "print unpack 'H*',$_ for <>"
e4
C:\Users\peter.jones\Downloads\TempData\perl>perl -e"binmode STDOUT, ':encoding(Cp437)'; print qq(\xe4)" | perl -e "print unpack 'H*',$_ for <>"
84
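The byte 84 is what you get when the character U+00E4 (a-umlaut) is encoded as CP437, so the expected bytes can also be computed without a console at all. A minimal sketch, assuming the core Encode module and its cp437 table (this is not part of the original one-liners):

```perl
use strict;
use warnings;
use Encode qw(encode);

# "\x{e4}\x{e0}" are the characters a-umlaut and a-grave; encode() turns
# them into the bytes that a CP437 console expects.
my $bytes = encode('cp437', "\x{e4}\x{e0}");

# Hex-dump the result the same way as the one-liners above.
print unpack('H*', $bytes), "\n";    # prints 8485
```

This gives the same bytes as the binmode one-liner, which makes it a convenient source of expected values for a test.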
> It incorrectly displays a Greek sigma.
Not for me on Win10, it's an "õ",
and this depends on the code page and font configured for CMD.
My properties tell me that I have CP 850 (OEM Multilingual Latin 1) preset, and if I change the font to raster font (or whatever the English name is) I see a capital sigma.
> It now displays the correct character
which is???
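The glyph you see depends on which code page interprets the byte 0xE4. A minimal sketch, assuming the core Encode module (not something used in this thread), that decodes that one byte under the three code pages mentioned here and prints the resulting code points, so the report itself does not depend on the console encoding:

```perl
use strict;
use warnings;
use Encode qw(decode);

# The same byte 0xE4 interpreted under three different code pages.
my %char_for;
for my $cp (qw(cp437 cp850 cp1252)) {
    $char_for{$cp} = decode($cp, "\xe4");
    printf "%-6s 0xE4 -> U+%04X\n", $cp, ord $char_for{$cp};
}
```

Under cp437 the byte is U+03A3 (capital sigma), under cp850 it is U+00F5 (o with tilde), and under cp1252 it is U+00E4 (a-umlaut).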
not for me on Win10, it's an "õ"
Same for me on my Windows 7 machine - cp850 for the cmd.exe console; cp1252 for text files.
I still have an old Text::Iconv script (that I hadn't used for years) that converts between the two.
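The same cp1252-to-cp850 conversion can also be done with the core Encode module. A minimal sketch using Encode's from_to (a swapped-in technique, not the Text::Iconv script above), assuming the input string holds cp1252 bytes:

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Bytes as stored in a cp1252 text file: 0xE4 is a-umlaut, 0xE9 is e-acute.
my $text = "\xe4\xe9\n";

# Convert in place to cp850, the console code page.
# from_to() returns the length of the converted string, or undef on error.
defined from_to($text, 'cp1252', 'cp850')
    or die "conversion failed\n";

print unpack('H*', $text), "\n";    # prints 84820a
```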
Cheers, Rob
Thanks to all for the help. Anonymous Monk's links suggest that I have an X-Y problem where the X has already been solved. This seems like the ideal solution, but so far I have not been able to make it work. I have a lot to learn about terminals etc.
I had rejected the binmode idea because I expected it to remove the :crlf layer. I forgot that I should test all possible solutions for that.
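binmode with a layer argument pushes that layer on top of the existing stack; only binmode without a layer (or with :raw) turns off CRLF translation. A minimal sketch for checking this on your own system with PerlIO::get_layers; on Windows the "before" list is typically unix crlf, elsewhere it is usually unix perlio:

```perl
use strict;
use warnings;

# Layers currently on STDOUT.
my @before = PerlIO::get_layers(*STDOUT);

# Push a CP437 encoding layer on top of whatever is already there.
binmode STDOUT, ':encoding(cp437)';

my @after = PerlIO::get_layers(*STDOUT);

# Report on STDERR so the diagnostic is not itself re-encoded.
print STDERR "before: @before\n";
print STDERR "after:  @after\n";
```

If the original layers (including crlf on Windows) still lead the "after" list, the encoding layer was added without removing them.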
The observation that my test has the very flaw that I was trying to fix is interesting. Despite the error, it still demonstrates my problem. The fact that Result1 (the first test) is a sigma proves that the shell script did not encode the \xe4. In fact, Result2 (the second test) masks the problem by encoding the character in the main program.
Sorry for the confusion to other Windows users. My solution is only intended for 'hobby' use on my own system. I do not expect to use it for anything except an occasional download from PerlMonks.
I probably will end up with the binmode solution. I am still trying to devise a suitable test. Most of my difficulty is unrelated to the problem (e.g. quotes and escapes).
Bill,
The fact that Result1 is a sigma proves that the shell script did not encode the \xe4
I disagree, because the backticks (``) involve some implicit translations that you aren't controlling.
To test it, you need to capture the raw output bytes of the -e under test. Then you can compare them to the expected values.
I show an example test where I print \xe4\xe0 twice: the first time, without a binmode in the oneliner; the second time, with a binmode in the oneliner. You can see that the bytes that are output are different. You can test that those bytes match your expectations.
C:\Users\peter.jones\Downloads\TempData\perl>chcp
Active code page: 437
C:\Users\peter.jones\Downloads\TempData\perl>perl pm.pl
__SOURCE__
#!perl
use 5.012; # strict, //
use warnings;
use IPC::Open2;
use Test::More;
undef $\;
print "\n__SOURCE__\n";
seek \*DATA, 0, 0;
print for <DATA>;
$\ = "\n";
{
    my $pid = open2(my $ofh, my $ifh, 'perl', '-e', q("print qq(\xe4\xe0)"));
    binmode $ofh, ':raw'; # read from the open2 output filehandle in raw mode, so you're looking at bytes, _not_ characters
    chomp(my $line = <$ofh>);
    waitpid $pid, 0;      # reap the child process once its output has been read
    print "without binmode, the high 8-bit characters pass through untranslated: ", unpack 'H*', $line;
    is $line, "\xE4\xE0", 'the bytes should be unedited';
    print "and printed out during test script: '$line'";
}
{
    my $pid = open2(my $ofh, my $ifh, 'perl', '-e', q("binmode STDOUT, ':encoding(Cp437)'; print qq(\xe4\xe0)"));
    binmode $ofh, ':raw'; # read from the open2 output filehandle in raw mode, so you're looking at bytes, _not_ characters
    chomp(my $line = <$ofh>);
    waitpid $pid, 0;      # reap the child process once its output has been read
    print "with binmode, xE4 gets translated to x84 for a-umlaut, and xE0 gets translated to x85 for a-grave: ", unpack 'H*', $line;
    # compare the captured bytes against the expected CP437 bytes, rather than just printing a hex dump
    is $line, "\x84\x85", 'the bytes should be CP437-encoded';
    print "and printed out during test script: '$line'";
}
done_testing();
__END__
__OUTPUT__
without binmode, the high 8-bit characters pass through untranslated: e4e0
ok 1 - the bytes should be unedited
and printed out during test script: 'Σα'
with binmode, xE4 gets translated to x84 for a-umlaut, and xE0 gets translated to x85 for a-grave: 8485
ok 2 - the bytes should be CP437-encoded
and printed out during test script: 'äà'
1..2
Thank you pryrt. Your second example is exactly what I asked for in my first post. I have already extended it to test newlines by appending \n to the input string and \r\n to the expected output string (and removing chomp). My project started as a minor annoyance and ended with a one-line solution. I never dreamed that in between, I would need to learn details of Windows (including a reference to an old DOS manual to get started), Unicode, and even Perl (I had never used a child process before).
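The newline extension relies on STDOUT keeping its :crlf layer underneath the encoding layer, so "\n" leaves the program as CR LF after the characters have been CP437-encoded. A minimal sketch of the expected bytes without a child process or a Windows console, assuming an in-memory filehandle with an explicit :crlf layer as a stand-in for Windows STDOUT (a hypothetical setup, not the child-process test above):

```perl
use strict;
use warnings;
use Test::More;

# Stand-in for Windows STDOUT: an in-memory file with an explicit :crlf
# layer underneath the CP437 encoding layer (on Windows, :crlf is the default).
my $buffer = '';
open my $out, '>:crlf:encoding(cp437)', \$buffer
    or die "cannot open in-memory handle: $!";
print {$out} "\xe4\xe0\n";
close $out or die "close failed: $!";

# No chomp: the line ending is part of what is being tested.
is unpack('H*', $buffer), '84850d0a', 'CP437 bytes followed by CR LF';
done_testing();
```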