
Re^3: Perl Windows vs Cygwin installs

by Eliya (Vicar)
on Mar 23, 2012 at 23:56 UTC (#961335)

in reply to Re^2: Perl Windows vs Cygwin installs
in thread Perl Windows vs Cygwin installs

It still matters with newer perls, too.

It's kind of a pity the patch you linked to doesn't really fix the issue it (apparently) set out to fix, i.e. the long-standing bug with encodings like UTF-16 in combination with the :crlf layer.

I just checked it with 5.15.8, and I still see the same "unexpected" behavior as always. That is, when naïvely pushing a UTF-16 layer to enable UTF-16 functionality (on Windows), corrupted files are produced on writing, and carriage returns are not removed on reading:

--- writing ---

#!/usr/local/perl/5.15.8/bin/perl -w

my $fname = "foo.utf16";
open my $out, ">:crlf:encoding(UTF-16LE)", $fname or die;
print $out "\x{feff}\x{1234}\n\x{5678}\n";

$ ./
$ hexdump foo.utf16
0000000 feff 1234 0a0d 7800 0d56 000a
000000c

Wrong! The correct encoding should be:

$ hexdump foo.utf16
0000000 feff 1234 000d 000a 5678 000d 000a
000000e
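One way to see where the stray bytes come from is to redo the two steps by hand with the core Encode module. This is only a sketch of the mechanism, assuming the lower :crlf layer works on the already-encoded byte stream; the helper hex_bytes is a made-up name for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes { join " ", map { sprintf "%02x", $_ } unpack "C*", shift }

my $text = "\x{feff}\x{1234}\n\x{5678}\n";

# Right order: turn LF into CRLF on characters, then encode.
(my $crlf_text = $text) =~ s/\n/\r\n/g;
my $right = encode("UTF-16LE", $crlf_text);

# Wrong order: encode first, then turn every 0x0a byte into 0x0d 0x0a.
# This splits the two-byte code unit 0a 00 and shifts everything after it.
my $wrong = encode("UTF-16LE", $text);
$wrong =~ s/\x0a/\x0d\x0a/g;

print "wrong: ", hex_bytes($wrong), "\n";
# wrong: ff fe 34 12 0d 0a 00 78 56 0d 0a 00
print "right: ", hex_bytes($right), "\n";
# right: ff fe 34 12 0d 00 0a 00 78 56 0d 00 0a 00
```

The "wrong" byte sequence is the same 12 bytes as in the first hexdump above (hexdump shows them as little-endian 16-bit words), and the "right" one matches the 14-byte dump.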

--- reading ---

#!/usr/local/perl/5.15.8/bin/perl -w

use Devel::Peek;

my $fname = "foo.utf16";

# create correct file, using the same old layer mantra
# (the extra :utf8 is only required with older perls)
open my $out, ">:raw:encoding(UTF-16LE):crlf:utf8", $fname or die;
print $out "\x{feff}\x{1234}\n\x{5678}\n";
close $out;

# read file back in
open my $in, "<:crlf:encoding(UTF-16LE)", $fname or die;
$/ = undef;
Dump <$in>;

$ ./
SV = PV(0x77dc60) at 0x953728
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK,UTF8)
  PV = 0x829130 "\357\273\277\341\210\264\r\n\345\231\270\r\n"\0 [UTF8
    "\x{feff}\x{1234}\r\n\x{5678}\r\n"]
  CUR = 13           ^           ^
  LEN = 14

Wrong!  \r should've been removed.

(Note that because I tested this on Unix, I had to push :crlf myself. With a native Windows perl, the layer would of course already have been in place — i.e., you'd just say ">:encoding(UTF-16LE)" or "<:encoding(UTF-16LE)" (as anyone unaware of the issue would likely have tried).)
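To check which layers are actually in place on a given perl, PerlIO::get_layers can list them. A minimal sketch, assuming a scratch file from File::Temp is acceptable; the output differs by platform, so none is shown here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file; the name is arbitrary and the file is removed at exit.
my (undef, $fname) = tempfile(UNLINK => 1);

# Open the way someone unaware of the issue would.
open my $fh, ">:encoding(UTF-16LE)", $fname or die "open: $!";
my @layers = PerlIO::get_layers($fh);
close $fh;

# Layers are listed bottom to top. A "crlf" entry before the encoding
# layer means the default :crlf sits underneath it.
print join(" ", @layers), "\n";
```

On a native Windows perl one would expect a crlf entry ahead of the encoding layer in this list; on Unix it should be absent unless pushed explicitly.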

Personally, I think allowing another :crlf to be pushed on the stack (as it is now after the patch) is not the right approach to fix the issue, because you still have to manually rearrange the layers to get correct results.  I fail to see the benefit of being allowed to have two :crlf layers now.

Re^4: Perl Windows vs Cygwin installs
by Anonymous Monk on Mar 24, 2012 at 00:09 UTC

    I don't feel like scrutinizing your post, but the wisdom from my link regarding 16le was to add :crlf last as in :raw:perlio:encoding(UTF-16le):crlf

      Yes, that's fine as a take-home message. It should be pointed out, though, that the sequence of layers works only in that exact order. I.e., simply adding the :crlf layer last is not sufficient; you also have to remove it left of the UTF-16 layer, which is what the :raw does. In other words, saying ">:encoding(UTF-16LE):crlf" is not enough, not even with the new patch! What adds to the unexpectedness is that some people might not even be aware that (on Windows, by default) there is another :crlf layer left of the UTF-16 layer when saying ">:encoding(UTF-16LE):crlf".
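For what it's worth, here is a small round-trip sketch of that layer order. It assumes a perl recent enough not to need the trailing :utf8, and the file name is just a placeholder.

```perl
use strict;
use warnings;

my $fname    = "layers-demo.utf16";   # hypothetical scratch file
my $original = "\x{feff}\x{1234}\n\x{5678}\n";

# :raw removes any default :crlf; :crlf is then pushed on top of the
# encoding layer, so it sees characters rather than UTF-16 bytes.
open my $out, ">:raw:encoding(UTF-16LE):crlf", $fname or die "open: $!";
print $out $original;
close $out or die "close: $!";

# Look at the bytes that ended up on disk.
open my $raw, "<:raw", $fname or die "open: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read back with the same layer order; \r\n should collapse to \n.
open my $in, "<:raw:encoding(UTF-16LE):crlf", $fname or die "open: $!";
my $text = do { local $/; <$in> };
close $in;
unlink $fname;

printf "%d bytes on disk, round trip %s\n",
    length $bytes, ($text eq $original ? "ok" : "FAILED");
```

The same strings should work unchanged on Windows, since the leading :raw takes the default :crlf out of the picture first.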

      Anyhow, I was mostly commenting on p5140-Selected Bug Fixes, in particular on this statement:

      When binmode FH, ":crlf" pushes the :crlf layer on top of the stack, it no longer enables crlf layers lower in the stack, to avoid unexpected results [perl #38456].

      which sounds as if it was meant to fix the issue...
