PerlMonks  

Perl Windows vs Cygwin installs

by gholley0 (Initiate)
on Mar 23, 2012 at 19:15 UTC (#961294=perlquestion)
gholley0 has asked for the wisdom of the Perl Monks concerning the following question:

I have a set of scripts to update files that have DOS-style line endings. I've written my scripts using a perl version built for Cygwin, and have been careful to use '\r\n' to terminate lines in my print statements. Someone else ran the same script using a perl version built for Windows, and he wound up with an extra carriage return.

How can I make my code work consistently across Perl versions?

The cygwin build is "perl, v5.10.0 built for cygwin-thread-multi-64int". With this version, the perl command

perl -e 'print "unix\n";print "dos\r\n";' | cat -e

produces the output

unix$
dos^M$

where carriage returns are displayed as ^M and line feeds as $. The same code executed with the Windows build ("perl, v5.6.1 built for MSWin32-x86-multi-thread") produces output

unix^M$
dos^M^M$

In case it's relevant, the more complete "perl -v" output is shown below, first for the newer Cygwin build and second for the older Windows build:
> perl -v

This is perl, v5.10.0 built for cygwin-thread-multi-64int
(with 6 registered patches, see perl -V for more detail)

Copyright 1987-2007, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

> Perl.exe -v

This is perl, v5.6.1 built for MSWin32-x86-multi-thread
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2001, Larry Wall

Binary build 629 provided by ActiveState Tool Corp. http://www.ActiveState.com
Built 12:27:04 Aug 20 2001

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'.  If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.

Re: Perl Windows vs Cygwin installs
by Eliya (Vicar) on Mar 23, 2012 at 19:56 UTC

    Don't mess with \r\n yourself.

    When running the code with the Cygwin perl, add the PerlIO layer ":crlf" to the respective file handles, and with the other perl, don't, because a native Windows perl already has the :crlf layer enabled.  (Actually, you should be able to simply add the layer in both cases, because due to the implementation details, the layer will only ever be once on the layer stack.)

    This presumes the idea is to generate files with the native Windows newline style.  If you just want your scripts to work within either a Unix/Cygwin or a Windows environment (without exchanging files between both "worlds"), simply use \n and be happy.
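
A minimal sketch of the advice above, assuming the goal is a file with CR LF line endings regardless of which perl runs the script. The helper names `write_dos_lines` and `slurp_raw` and the throwaway temp file are made up for illustration; only the `:crlf` and `:raw` layers come from Perl itself. The print statements use plain `\n`, and the layer supplies the carriage return.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write lines using plain "\n" and let the :crlf layer emit CR LF on disk.
sub write_dos_lines {
    my ($path, @lines) = @_;
    open my $out, '>:crlf', $path or die "Cannot open $path: $!";
    print {$out} "$_\n" for @lines;
    close $out or die "Cannot close $path: $!";
    return;
}

# Read the file back with no translation so the raw bytes can be inspected.
sub slurp_raw {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    local $/;
    my $bytes = <$in>;
    close $in;
    return $bytes;
}

# Hypothetical temp file, removed automatically at exit.
my (undef, $path) = tempfile(UNLINK => 1);
write_dos_lines($path, 'unix', 'dos');

my $bytes = slurp_raw($path);
(my $shown = $bytes) =~ s/\r/^M/g;   # show carriage returns the way cat -e does
print $shown;
```

Each line in the file should end in a single CR LF. By contrast, printing "\r\n" through a handle that already has `:crlf` active is what yields the doubled carriage return from the question.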

        It still matters with newer perls, too.

        It's kind of a pity the patch you linked to doesn't really fix the issue it (apparently) set out to fix, i.e. the long standing bug with encodings like UTF-16 in combination with the :crlf layer.

        I just checked it with 5.15.8, and I still see the same "unexpected" behavior, as it always has been. That is, when naïvely pushing a UTF-16 layer to enable UTF-16 functionality (on Windows), corrupted files are produced on writing, and carriage returns are not being removed upon reading:

        --- writing ---

        #!/usr/local/perl/5.15.8/bin/perl -w

        my $fname = "foo.utf16";
        open my $out, ">:crlf:encoding(UTF-16LE)", $fname or die;
        print $out "\x{feff}\x{1234}\n\x{5678}\n";

        $ ./test-out.pl
        $ hexdump foo.utf16
        0000000 feff 1234 0a0d 7800 0d56 000a
        000000c

        Wrong!  correct encoding should be:

        $ hexdump foo.utf16
        0000000 feff 1234 000d 000a 5678 000d 000a
        000000e

        --- reading ---

        #!/usr/local/perl/5.15.8/bin/perl -w

        use Devel::Peek;

        my $fname = "foo.utf16";

        # create correct file, using the same old layer mantra
        # (the extra :utf8 is only required with older perls)
        open my $out, ">:raw:encoding(UTF-16LE):crlf:utf8", $fname or die;
        print $out "\x{feff}\x{1234}\n\x{5678}\n";
        close $out;

        # read file back in
        open my $in, "<:crlf:encoding(UTF-16LE)", $fname or die;
        $/ = undef;
        Dump <$in>;

        $ ./test-in.pl
        SV = PV(0x77dc60) at 0x953728
          REFCNT = 1
          FLAGS = (TEMP,POK,pPOK,UTF8)
          PV = 0x829130 "\357\273\277\341\210\264\r\n\345\231\270\r\n"\0 [UTF8
          "\x{feff}\x{1234}\r\n\x{5678}\r\n"]
          CUR = 13         ^           ^
          LEN = 14

        Wrong!  \r should've been removed.

        (Note that because I tested this on Unix, I had to push :crlf myself. With a native Windows perl, the layer would of course already have been in place — i.e., you'd just say ">:encoding(UTF-16LE)" or "<:encoding(UTF-16LE)" (as anyone unaware of the issue would likely have tried).)

        Personally, I think allowing another :crlf to be pushed on the stack (as it is now after the patch) is not the right approach to fix the issue, because you still have to manually rearrange the layers to get correct results.  I fail to see the benefit of being allowed to have two :crlf layers now.
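
For comparison, here is a hedged sketch of the manually rearranged layer order that the reading test above uses to create its "correct" file, applied to both writing and reading. It assumes a reasonably recent perl (per the comment in that test, older perls need an extra `:utf8` at the end) and uses a throwaway temp file; the variable names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical temp file, removed automatically at exit.
my (undef, $path) = tempfile(UNLINK => 1);
my $text = "\x{feff}\x{1234}\n\x{5678}\n";

# :crlf sits above the encoding layer, so "\n" becomes "\r\n" before encoding.
open my $out, '>:raw:encoding(UTF-16LE):crlf', $path or die "open: $!";
print {$out} $text;
close $out or die "close: $!";

# Inspect the bytes on disk with no layers at all.
open my $raw_in, '<:raw', $path or die "open: $!";
my $raw = do { local $/; <$raw_in> };
close $raw_in;

# Read back with the same layer order; :crlf should strip the "\r" again.
open my $in, '<:raw:encoding(UTF-16LE):crlf', $path or die "open: $!";
my $round_trip = do { local $/; <$in> };
close $in;

print join(' ', map { sprintf '%02x', ord } split //, $raw), "\n";
print $round_trip eq $text ? "round trip ok\n" : "round trip differs\n";
```

If the layers behave as described, the byte dump should match the "correct encoding" hexdump shown earlier (allowing for hexdump's 16-bit word grouping), and the text read back should contain no carriage returns.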

Re: Perl Windows vs Cygwin installs
by Anonymous Monk on Mar 23, 2012 at 20:09 UTC
Re: Perl Windows vs Cygwin installs
by linuxkid (Sexton) on Mar 23, 2012 at 22:04 UTC
    Just use \n; perl deals with it automatically based on your OS. I don't worry about it. Only use \r\n if you're working with sockets.

    --linuxkid


    imrunningoutofideas.co.cc

      No.

      \n ends up as 0x0D 0x0A on output in a CR LF environment (DOS), 0x0D in a CR environment (Mac Classic), and 0x0A in an NL environment (pretty much everything else). You mean \015\012 (or \x0d\x0a if you're a hexed person) when dealing with sockets, but even that's not always true, because it's 0x0D 0x0A only in protocols that define a new line as a CR LF sequence. Even so, receiving NL-separated messages in such protocols is generally okay, as per “(Be strict in what you send), but forgiving in what you receive”.
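
A small sketch of that point, with no real socket involved: lines bound for the wire are terminated with explicit octets rather than `\n`, and incoming data is split on either CR LF or a bare LF. The helper names `wire_lines` and `split_wire_lines` and the sample commands are hypothetical; whether a given protocol actually requires CR LF has to come from its specification.

```perl
use strict;
use warnings;

# Explicit octets for the wire; "\n" is deliberately avoided here.
my $CRLF = "\015\012";

# Be strict in what you send: terminate every line with CR LF.
sub wire_lines {
    my (@lines) = @_;
    return join '', map { $_ . $CRLF } @lines;
}

# Be forgiving in what you receive: accept CR LF or a bare LF.
sub split_wire_lines {
    my ($data) = @_;
    return split /\015?\012/, $data;
}

my $message = wire_lines('HELO example.org', 'QUIT');
my @strict  = split_wire_lines($message);
my @lenient = split_wire_lines("HELO example.org\012QUIT\012");

printf "%d bytes, %d lines\n", length $message, scalar @strict;
```

When such a message is written to a real handle, the handle should not have a `:crlf` layer active (for example, call `binmode` on it first), otherwise the LF would be translated a second time.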
