Re^2: Chicanery Needed to Handle Unicode Text on Microsoft Windowsby Jim (Curate)
|on Oct 30, 2010 at 18:36 UTC||Need Help??|
As far as I can see, the only reason you need :crlf is because you've specifically added the UNIX line ending (\n) to your output.
:crlf is needed here to get the same platform-independent line-ending handling of plain text files Perl has always supported. Without it, the line-ending handling is badly broken. Half of the line-ending character pair CRLF is missed.
And as Anonymous Monk has already pointed out, \n is the express mechanism in Perl intended to make line-ending handling platform-independent. It is defined not to mean the LF-only Unix line-ending, but rather to mean whatever the line-ending character or character combination terminates lines of plain text files on the platform in use.
It would be better to use the platform-independent $/.
No it wouldn't. And even if it were better, how would someone new to Perl ever figure that out. I've been programming Perl for years and I've never once seen $/ used in place of the usual and ordinary \n. chomp()-ing and "...\n"-ing are the long-lived and ubiquitous standard idioms.
Except for ASCII files, binmode($file_handle) was required on MSWin32 systems. :raw performs the same function so, while perhaps appearing to add to the chicanery, it certainly reduces the amount of code.
But this is the whole point. The file named Input.txt is not a binary file; it's a plain text file. All the Unicode files I want to manipulate on Microsoft Windows using Perl, the text-processing scripting language, are plain text files. binmode() and :raw are lies. Chicanery.
In my humble opinion, this should work on a Unicode UTF-16 file with a byte order mark.
It seems perfectly reasonable to me to expect the scripting language to determine the character encoding of the file all by its little lonesome — it only has to read the first two bytes of the file — and just to do the right thing.