Re^3: CR-LF on UTF-16LE files on Windows

by ikegami (Patriarch)
on Nov 07, 2018 at 19:18 UTC (id://1225379)


in reply to Re^2: CR-LF on UTF-16LE files on Windows
in thread CR-LF on UTF-16LE files on Windows

:crlf converts 0D 0A into 0A on read, and it converts 0A into 0D 0A on write. This was being done to the encoded strings when it should have been done to the decoded strings.

(My earlier post has been edited to integrate this.)
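To make the layer-order point concrete, here is a minimal sketch that writes the same string through both orderings and dumps the bytes. It assumes an in-memory filehandle behaves like a real file for this purpose, and `bytes_written_via` is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: write "123\n" through the given layer string into an
# in-memory buffer and return the resulting bytes as space-separated hex.
sub bytes_written_via {
    my ($layers) = @_;
    my $buf = '';
    open(my $fh, ">$layers", \$buf) or die "open failed: $!";
    print {$fh} "123\n";
    close($fh) or die "close failed: $!";    # flush all layers into $buf
    return join ' ', unpack '(H2)*', $buf;
}

# :crlf sits above :encoding, so LF -> CR LF is applied to decoded characters.
my $good = bytes_written_via(':raw:encoding(UTF-16LE):crlf');

# :crlf sits below :encoding, so 0A -> 0D 0A is applied to the encoded bytes.
my $bad = bytes_written_via(':raw:crlf:encoding(UTF-16LE)');

print "crlf above encoding: $good\n";
print "crlf below encoding: $bad\n";
```

With :crlf above the encoding, the line ending should come out as the two UTF-16LE code units 0d 00 0a 00. With :crlf below it, a lone 0d byte lands in front of 0a 00, which leaves the stream with an odd number of bytes and is no longer valid UTF-16LE.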

Re^4: CR-LF on UTF-16LE files on Windows
by james28909 (Deacon) on Nov 08, 2018 at 00:04 UTC
    Would binmode() work? I don't have any files like that at my disposal to test.

      binmode would not work.

      When binmode applies :raw, it disables any existing :crlf layer rather than removing it. And a subsequent :crlf re-enables the existing :crlf layer rather than adding a new one. That means that

      binmode($fh, ':raw:encoding(UTF-16LE):crlf')

      is no different than

      binmode($fh, ':encoding(UTF-16LE)')

      It's therefore impossible to apply :encoding(UTF-16LE) to STDIN, STDOUT and STDERR on Windows with binmode (if you also want :crlf). You'd need something like the following instead:

      open(my $fh, '<&=:raw:encoding(UTF-16le):crlf', fileno(STDIN));
      *STDIN = $fh;

      (Untested)

        binmode would not work

        Is that so? I get the same correct result (cases 3 and 4) regardless of whether the layer stack is built through open or binmode.

        use strict;
        use warnings;
        use feature 'say';
        use autodie;

        $, = ' ';

        {
            open my $f, '>:raw:encoding(UTF-16LE):crlf', 'test';
            say $f 123;
        }

        { # 1 "pure binary slurp"
            open my $f, '<:raw', 'test';
            undef local $/;
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }

        { # 2 OP's case
            open my $f, '<:encoding(UTF-16LE)', 'test';
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }

        { # 3 correct
            open my $f, '<:raw:encoding(UTF-16LE):crlf', 'test';
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }

        { # 4 correct
            open my $f, '<', 'test';
            binmode $f, ':raw:encoding(UTF-16LE):crlf';
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }

        { # 5 same as #2
            open my $f, '<', 'test';
            binmode $f, ':encoding(UTF-16LE)';
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }

        __END__
        unix crlf
        31 00 32 00 33 00 0d 00 0a 00
        unix crlf encoding(UTF-16LE) utf8
        31 32 33 0d 0a
        unix crlf encoding(UTF-16LE) utf8 crlf utf8
        31 32 33 0a
        unix crlf encoding(UTF-16LE) utf8 crlf utf8
        31 32 33 0a
        unix crlf encoding(UTF-16LE) utf8
        31 32 33 0d 0a

        But can the output of PerlIO::get_layers be believed at all? There are a few utf8 (pseudo-?) layers that I didn't ask for. Also, the bottommost crlf layer is not removed but rather disabled in cases 3 and 4 (and 1, too), and it is not re-enabled later.

        However, I can :pop (rather than "disable") existing layers, and here open and binmode behave differently: the latter doesn't allow popping all the way to the bottom of the stack. I don't know if these factoids are of any value, though.

        { # 6
            open my $f, '<:pop:pop:unix:encoding(UTF-16LE):crlf', 'test';
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }

        { # 7
            open my $f, '<', 'test';
            binmode $f, ':pop:pop:unix:encoding(UTF-16LE):crlf';
            say PerlIO::get_layers( $f );
            say unpack '(H2)*', <$f>;
        }

        __END__
        unix encoding(UTF-16LE) utf8 crlf utf8
        31 32 33 0a
        unix encoding(UTF-16LE) utf8 crlf utf8
        Use of uninitialized value in unpack at crlf.pl line 49.
        refcnt_dec: fd 0: 0 <= 0
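On the question of whether the PerlIO::get_layers output can be trusted: as far as I understand it, the extra "utf8" entries are not layers anyone pushed but a report that the preceding layer has its UTF8 flag set. Passing `details => 1` returns (name, arguments, flags) triplets instead, which makes this easier to see. The sketch below is only an illustration under assumptions: it uses an in-memory handle in place of the thread's 'test' file, so the bottom layer will show up as "scalar" rather than "unix", and the exact flag values vary by platform and Perl version.

```perl
use strict;
use warnings;

# UTF-16LE bytes for "123\r\n"; the in-memory handle stands in for a file.
my $bytes = pack 'v*', map { ord } split //, "123\r\n";

open(my $fh, '<:raw:encoding(UTF-16LE):crlf', \$bytes)
    or die "open failed: $!";

# Plain call: layer names, with "utf8" following any layer whose UTF8 flag is set.
my @names = PerlIO::get_layers($fh);

# details => 1: a flat list of (name, arguments, flags) triplets.
my @details       = PerlIO::get_layers($fh, details => 1);
my $detail_count  = @details;

print "names: @names\n";
my @work = @details;
while (my ($name, $args, $flags) = splice @work, 0, 3) {
    printf "%-10s args=%-12s flags=0x%08x\n", $name, $args // '-', $flags;
}

# Reading decodes first, then :crlf turns CR LF into LF.
my $line = <$fh>;
print join(' ', unpack '(H2)*', $line), "\n";
```

Comparing the two listings side by side shows which entries are real layers and which are only flag indicators, without having to guess from the plain output.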
