 
PerlMonks  

As far as I can see, the only reason you need :crlf is because you've specifically added the UNIX line ending (\n) to your output.

:crlf is needed here to get the same platform-independent line-ending handling of plain text files that Perl has always supported. Without it, the line-ending handling is badly broken: only the LF half of the CRLF pair is removed, and the CR is left on the end of every line.

    D:\>cat Demo.pl
    #!perl
    use strict;
    use warnings;

    open my $input_fh, '<:raw:perlio:encoding(UTF-16LE)', 'Input.txt';

    while (my $line = <$input_fh>) {
        chomp $line;
        print "There's an unexpected/unwanted CR at the end of the line\n"
            if $line =~ m/\r$/;
    }

    D:\>file Input.txt
    Input.txt: Text file, Unicode little endian format

    D:\>cat Input.txt
    We the People of the United States, in Order to form a more perfect Union,
    establish Justice, insure domestic Tranquility, provide for the common
    defence, promote the general Welfare, and secure the Blessings of Liberty to
    ourselves and our Posterity, do ordain and establish this Constitution for
    the United States of America.

    D:\>perl Demo.pl Input.txt
    There's an unexpected/unwanted CR at the end of the line
    There's an unexpected/unwanted CR at the end of the line
    There's an unexpected/unwanted CR at the end of the line
    There's an unexpected/unwanted CR at the end of the line
    There's an unexpected/unwanted CR at the end of the line

    D:\>
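For comparison, here is a minimal sketch of the same check run with and without :crlf appended to the layer list. It writes its own two-line UTF-16LE sample with CRLF line endings to a temporary file rather than using Input.txt, and the sub name count_trailing_cr is made up for illustration.

```perl
#!perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical helper: count the lines that still end in CR after chomp.
sub count_trailing_cr {
    my ($path, $layers) = @_;
    open my $fh, "<$layers", $path or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        $count++ if $line =~ m/\r$/;
    }
    close $fh;
    return $count;
}

# Build a small UTF-16LE sample (no BOM) with CRLF line endings.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} encode('UTF-16LE', "first line\r\nsecond line\r\n");
close $tmp_fh or die "Cannot close $tmp_path: $!";

my $without = count_trailing_cr($tmp_path, ':raw:perlio:encoding(UTF-16LE)');
my $with    = count_trailing_cr($tmp_path, ':raw:perlio:encoding(UTF-16LE):crlf');

print "Without :crlf: $without line(s) end in CR\n";
print "With :crlf:    $with line(s) end in CR\n";
```

The order of the layers matters: :crlf comes after :encoding(...) so that the CRLF-to-LF translation is applied to decoded characters rather than to the raw UTF-16 bytes.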

And as Anonymous Monk has already pointed out, \n is the express mechanism in Perl intended to make line-ending handling platform-independent. It is defined not to mean the LF-only Unix line ending, but rather whatever character or character combination terminates lines of plain text files on the platform in use.
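A small sketch of where that translation happens in practice: in memory "\n" is a single character, and the :crlf I/O layer (on by default for text-mode handles on Windows) turns it into CR LF on output. The layer is requested explicitly here so the result should not depend on the platform; the temporary file and variable names are illustrative only.

```perl
#!perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Write one line through an explicit :crlf layer.
open my $out_fh, '>:crlf', $tmp_path or die "Cannot open $tmp_path: $!";
print {$out_fh} "Hello, world\n";
close $out_fh or die "Cannot close $tmp_path: $!";

# Read the bytes back untranslated to see what was actually stored.
open my $in_fh, '<:raw', $tmp_path or die "Cannot open $tmp_path: $!";
my $bytes = do { local $/; <$in_fh> };
close $in_fh;

my $hex = join ' ', map { sprintf('%02X', ord($_)) } split(//, $bytes);
print 'Length of "\n" in memory: ', length("\n"), "\n";
print "Bytes on disk: $hex\n";
```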

It would be better to use the platform-independent $/.

No it wouldn't. And even if it were better, how would someone new to Perl ever figure that out? I've been programming Perl for years and I've never once seen $/ used in place of the usual and ordinary \n. chomp()-ing and "...\n"-ing are the long-lived and ubiquitous standard idioms.

    #!perl
    print "Hello, world\n";

Except for ASCII files, binmode($file_handle) was required on MSWin32 systems. :raw performs the same function so, while perhaps appearing to add to the chicanery, it certainly reduces the amount of code.

But this is the whole point. The file named Input.txt is not a binary file; it's a plain text file. All the Unicode files I want to manipulate on Microsoft Windows using Perl, the text-processing scripting language, are plain text files. binmode() and :raw are lies. Chicanery.

In my humble opinion, this should work on a Unicode UTF-16 file with a byte order mark.

    #!perl
    use strict;
    use warnings;

    open my $input_fh, '<', 'Input.txt';
    open my $output_fh, '>', 'Output.txt';

    while (my $line = <$input_fh>) {
        chomp $line;
        print $output_fh "$line\n";
    }

It seems perfectly reasonable to me to expect the scripting language to determine the character encoding of the file all by its little lonesome (it only has to read the first two bytes of the file) and just do the right thing.
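As a rough sketch of what that detection could look like when done by hand: the two helpers below, sniff_utf16_bom and open_text, are hypothetical names and not part of Perl. They handle only the UTF-16 byte order marks and fall back to a plain open for everything else, and the sample file is generated on the spot.

```perl
#!perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical helper: return 'UTF-16LE' or 'UTF-16BE' if the file starts
# with the matching byte order mark, otherwise return nothing.
sub sniff_utf16_bom {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $head = '';
    my $got  = read($fh, $head, 2);
    close $fh;
    return unless defined $got && $got == 2;
    return 'UTF-16LE' if $head eq "\xFF\xFE";
    return 'UTF-16BE' if $head eq "\xFE\xFF";
    return;
}

# Hypothetical helper: open a text file for reading, choosing the layers
# from the BOM and falling back to the platform defaults without one.
sub open_text {
    my ($path) = @_;
    my $encoding = sniff_utf16_bom($path);
    my $layers   = defined $encoding
        ? ":raw:perlio:encoding($encoding):crlf"
        : '';
    open my $fh, "<$layers", $path or die "Cannot open $path: $!";
    if (defined $encoding) {
        # Discard the BOM character itself so it is not part of line one.
        read($fh, my $bom, 1);
    }
    return $fh;
}

# Build a UTF-16LE sample with a BOM and CRLF line endings.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xFF\xFE", encode('UTF-16LE', "first line\r\nsecond line\r\n");
close $tmp_fh or die "Cannot close $tmp_path: $!";

my $detected = sniff_utf16_bom($tmp_path);
my $input_fh = open_text($tmp_path);
my @lines;
while (my $line = <$input_fh>) {
    chomp $line;
    push @lines, $line;
}
close $input_fh;

print "Detected: ", (defined $detected ? $detected : 'no UTF-16 BOM'), "\n";
print "Read ", scalar(@lines), " line(s)\n";
```

With a wrapper like this, the read loop itself stays the ordinary chomp-and-print idiom; the layer selection is the only part that differs from the plain '<' open.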


In reply to Re^2: Chicanery Needed to Handle Unicode Text on Microsoft Windows by Jim
in thread Chicanery Needed to Handle Unicode Text on Microsoft Windows by Jim
