PerlMonks  

Re^2: Perl Windows vs Cygwin installs

by Anonymous Monk
on Mar 23, 2012 at 21:17 UTC (#961320)


in reply to Re: Perl Windows vs Cygwin installs
in thread Perl Windows vs Cygwin installs

In older perls it matters where the :crlf layer is added and how; see :raw:perlio:encoding(UTF-16le):crlf and p5140-Selected Bug Fixes.


Re^3: Perl Windows vs Cygwin installs
by Eliya (Vicar) on Mar 23, 2012 at 23:56 UTC

    It still matters with newer perls, too.

    It's kind of a pity the patch you linked to doesn't really fix the issue it (apparently) set out to fix, i.e., the long-standing bug with encodings like UTF-16 in combination with the :crlf layer.

    I just checked with 5.15.8, and I still see the same "unexpected" behavior as always. That is, when naïvely pushing a UTF-16 layer to enable UTF-16 support (on Windows), corrupted files are produced on writing, and carriage returns are not removed on reading:

    --- writing ---

        #!/usr/local/perl/5.15.8/bin/perl -w

        my $fname = "foo.utf16";

        open my $out, ">:crlf:encoding(UTF-16LE)", $fname or die;
        print $out "\x{feff}\x{1234}\n\x{5678}\n";

        $ ./test-out.pl
        $ hexdump foo.utf16
        0000000 feff 1234 0a0d 7800 0d56 000a
        000000c

    Wrong! The correct encoding should be:

        $ hexdump foo.utf16
        0000000 feff 1234 000d 000a 5678 000d 000a
        000000e
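    For comparison, here is a minimal sketch of the writing side using the layer order that, per this thread, gives correct output: :crlf sits above the UTF-16 layer, and :raw keeps a default :crlf from sitting below it. The scratch file name is hypothetical, and the expected bytes are simply the "correct" hexdump above written out byte by byte (hexdump prints little-endian 16-bit words, so "feff" there is the bytes ff fe).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical scratch file used only by this sketch.
my $fname = "sketch-out.utf16";

# Layers are listed bottom (file side) to top (program side):
#   :raw                 no :crlf below the encoder (matters on Windows)
#   :encoding(UTF-16LE)  turns characters into 2-byte code units
#   :crlf                turns "\n" into "\r\n" before the encoding step
#   :utf8                only needed on older perls, per the thread
open my $out, ">:raw:encoding(UTF-16LE):crlf:utf8", $fname
    or die "Cannot open $fname: $!";
print $out "\x{feff}\x{1234}\n\x{5678}\n";
close $out or die "Cannot close $fname: $!";

# Read the bytes back without any translation and show them as hex.
open my $in, "<:raw", $fname or die "Cannot open $fname: $!";
my $bytes = do { local $/; <$in> };
close $in;
unlink $fname;

my $hex = unpack "H*", $bytes;

# Byte sequence of the "correct" hexdump above.
my $expected = "fffe34120d000a0078560d000a00";
print $hex eq $expected ? "ok\n" : "unexpected bytes: $hex\n";
```

    With :crlf below the encoder instead (as in the test script above), the single 0a byte of each encoded newline gets a 0d inserted in front of it, which is exactly the corruption in the first hexdump.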

    --- reading ---

        #!/usr/local/perl/5.15.8/bin/perl -w

        use Devel::Peek;

        my $fname = "foo.utf16";

        # create correct file, using the same old layer mantra
        # (the extra :utf8 is only required with older perls)
        open my $out, ">:raw:encoding(UTF-16LE):crlf:utf8", $fname or die;
        print $out "\x{feff}\x{1234}\n\x{5678}\n";
        close $out;

        # read file back in
        open my $in, "<:crlf:encoding(UTF-16LE)", $fname or die;
        $/ = undef;
        Dump <$in>;

        $ ./test-in.pl
        SV = PV(0x77dc60) at 0x953728
          REFCNT = 1
          FLAGS = (TEMP,POK,pPOK,UTF8)
          PV = 0x829130 "\357\273\277\341\210\264\r\n\345\231\270\r\n"\0 [UTF8 "\x{feff}\x{1234}\r\n\x{5678}\r\n"]
                                                 ^               ^
          CUR = 13
          LEN = 14

    Wrong! The \r characters should have been removed.

    (Note that because I tested this on Unix, I had to push :crlf myself. With a native Windows perl, the layer would of course already be in place, i.e., you'd just say ">:encoding(UTF-16LE)" or "<:encoding(UTF-16LE)", as anyone unaware of the issue would likely have tried.)
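    The reading side can be sketched the same way. This assumes (as the thread recommends for writing) that the UTF-16 decoder must sit below :crlf, so that :crlf sees "\r\n" as adjacent characters rather than as the byte pairs 0d 00 0a 00, which it cannot match. The scratch file name is hypothetical; the file is created from the "correct" byte sequence shown earlier so the read path can be checked on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $fname = "sketch-in.utf16";   # hypothetical scratch file

# Write the correctly encoded bytes directly, bypassing any layers.
open my $raw, ">:raw", $fname or die "Cannot open $fname: $!";
print $raw pack("H*", "fffe34120d000a0078560d000a00");
close $raw or die "Cannot close $fname: $!";

# Decoder below, :crlf above: UTF-16LE is decoded first, then the
# :crlf layer collapses each "\r\n" into "\n".
open my $in, "<:raw:encoding(UTF-16LE):crlf:utf8", $fname
    or die "Cannot open $fname: $!";
my $text = do { local $/; <$in> };
close $in;
unlink $fname;

printf "%d characters, %s\n", length $text,
    ($text =~ /\r/ ? "CR still present" : "no CR left");
```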

    Personally, I don't think allowing a second :crlf to be pushed onto the stack (as is possible now, after the patch) is the right way to fix the issue, because you still have to rearrange the layers manually to get correct results. I fail to see the benefit of being allowed to have two :crlf layers.

      I don't feel like scrutinizing your post, but the wisdom from my link regarding 16le was to add :crlf last, as in :raw:perlio:encoding(UTF-16le):crlf.

        Yes, that's fine as a take-home message. However, it should be pointed out that this sequence of layers only works in that exact order. Simply adding the :crlf layer last is not sufficient; you also have to remove the one to the left of the UTF-16 layer, which is what :raw does. In other words, saying ">:encoding(UTF-16LE):crlf" is not enough, not even with the new patch! Adding to the unexpectedness, some people might not even be aware that (on Windows, by default) there is another :crlf layer to the left of the UTF-16 layer when saying ">:encoding(UTF-16LE):crlf".
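        One way to see which layers are actually on a handle is PerlIO::get_layers, which lists the stack from the bottom (file side) up. The following is a small sketch with a hypothetical scratch file name; any "crlf" entry to the left of "encoding(...)" in its output is a translating layer below the UTF-16 encoder. On Unix there is no default :crlf, so only the one pushed on top should appear; on a native Windows perl one would expect, per the post above, an extra crlf entry to the left of the encoding layer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $fname = "sketch-layers.utf16";   # hypothetical scratch file

# Open with the "not enough" layer string from the post above and
# inspect the resulting stack, bottom layer first.
open my $fh, ">:encoding(UTF-16LE):crlf", $fname
    or die "Cannot open $fname: $!";
my @layers = PerlIO::get_layers($fh);
close $fh;
unlink $fname;

print "@layers\n";
```

        The output is platform dependent, which is the point: the same open string can produce different stacks on Unix and Windows.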

        Anyhow, I was mostly commenting on p5140-Selected Bug Fixes, in particular on this statement:

        When binmode FH, ":crlf" pushes the :crlf layer on top of the stack, it no longer enables crlf layers lower in the stack, to avoid unexpected results [perl #38456].

        which sounds as if it was meant to fix the issue...
