Re: Chicanery Needed to Handle Unicode Text on Microsoft Windows
by kcott (Canon) on Oct 30, 2010 at 09:15 UTC
As far as I can see, the only reason you need :crlf is because you've specifically added the UNIX line ending (\n) to your output. It would be better to use the platform-independent $/. The :raw layer should preserve the line endings. So that reduces the chicanery somewhat.
Except for ASCII files, binmode($file_handle) was required on MSWin32 systems. :raw performs the same function so, while perhaps appearing to add to the chicanery, it certainly reduces the amount of code.
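To illustrate that equivalence, here is a minimal sketch. The scratch file name (layer_demo.tmp) and the sample data are my own assumptions, not anything from the original test code. It writes mixed line endings through :raw, reads them back using the older open-then-binmode idiom, and compares the two:

```perl
use strict;
use warnings;

my $path = 'layer_demo.tmp';    # hypothetical scratch file
my $data = "one\r\ntwo\n";      # mixed DOS and UNIX line endings

# Newer idiom: name the layer in the open mode.
open(my $out, '>:raw', $path) or die "Cannot write $path: $!";
print {$out} $data;
close($out) or die "Cannot close $path: $!";

# Older idiom: open first, then call binmode on the handle.
open(my $in, '<', $path) or die "Cannot read $path: $!";
binmode($in);
my $read_back = do { local $/; <$in> };    # slurp the whole file
close($in);

print $read_back eq $data ? "identical\n" : "different\n";
unlink $path;
```

Both handles skip CRLF translation, so the bytes read back should match the bytes written on either platform.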
I don't have sufficient knowledge of UTF-16 to address that aspect of your post. What I would suggest is that, after removing :crlf and changing \n to $/, you try your test code without :perlio. You may still need it but it wouldn't hurt to check.
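A rough sketch of the kind of loop I mean follows. It is not your test code: the file names, the sample data and the slurp_raw helper are all hypothetical, and it leaves out any encoding layer because I can't speak to the UTF-16 side. Note that $/ holds "\n" unless you change it, so chomp removes only that and any "\r" stays on the line.

```perl
use strict;
use warnings;

# Hypothetical scratch files for the demonstration.
my ($in_path, $out_path) = ('in_demo.tmp', 'out_demo.tmp');

# Create an input file with DOS line endings.
open(my $seed, '>:raw', $in_path) or die "Cannot write $in_path: $!";
print {$seed} "alpha\r\nbeta\r\n";
close($seed) or die "Cannot close $in_path: $!";

# Copy line by line using only :raw (no :crlf, no :perlio).
open(my $in,  '<:raw', $in_path)  or die "Cannot read $in_path: $!";
open(my $out, '>:raw', $out_path) or die "Cannot write $out_path: $!";
while (my $line = <$in>) {
    chomp $line;                # strips the trailing $/ only
    print {$out} $line, $/;     # puts the same separator back
}
close($in);
close($out) or die "Cannot close $out_path: $!";

# Hypothetical helper: read a whole file without any translation.
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot read $path: $!";
    local $/;
    my $content = <$fh>;
    close($fh);
    return $content;
}

my $original = slurp_raw($in_path);
my $copied   = slurp_raw($out_path);
print $copied eq $original ? "line endings preserved\n" : "line endings changed\n";
unlink $in_path, $out_path;
```

If the copy comes out byte-for-byte the same as the input, :raw alone is preserving the line endings.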
I agree there's a lot of Unicode-related documentation; however, everything I've made reference to is available here: PerlIO.
I ran a series of tests; the results follow.
Input files in UNIX and DOS formats:
Output after running on UNIX platform:
Output after running on DOS platform:
Changing the while loop to chomp input and add $/ (not \n) to output:
Adding :crlf to the MSWin32 input and output modes (now :raw:crlf), there's no change:
With :raw:perlio:crlf, there's no change:
And, for completeness, with :raw:perlio, there's no change: