PerlMonks  

As far as I can see, the only reason you need :crlf is because you've specifically added the UNIX line ending (\n) to your output. It would be better to use the platform-independent $/. The :raw layer should preserve the line endings. So that reduces the chicanery somewhat.

Historically, binmode($file_handle) was required on MSWin32 systems for anything other than plain ASCII files. :raw performs the same function so, while it may appear to add to the chicanery, it actually reduces the amount of code.
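As a rough illustration of that equivalence, here is a minimal sketch that writes the same bytes once with binmode() and once with :raw in the open mode, then compares the results. The helper names (write_with_binmode, write_with_raw, slurp_raw) and the temporary file names are made up for this example; they are not from the original thread.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: open normally, then switch the handle to binary.
sub write_with_binmode {
    my ($path, $data) = @_;
    open my $fh, '>', $path or die "Cannot open $path: $!";
    binmode $fh;
    print {$fh} $data;
    close $fh or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: request binary mode directly in the open mode.
sub write_with_raw {
    my ($path, $data) = @_;
    open my $fh, '>:raw', $path or die "Cannot open $path: $!";
    print {$fh} $data;
    close $fh or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: read a whole file back as untranslated bytes.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                       # slurp mode
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

my $dir  = tempdir(CLEANUP => 1);
my $data = "Line 1\r\nLine 2\r\n";  # explicit CR LF bytes

write_with_binmode("$dir/binmode.out", $data);
write_with_raw("$dir/raw.out", $data);

my $same = slurp_raw("$dir/binmode.out") eq slurp_raw("$dir/raw.out");
print $same ? "identical\n" : "different\n";
```

Both files should contain exactly the bytes that were printed, because neither handle applies newline translation.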

I don't have sufficient knowledge of UTF-16 to address that aspect of your post. What I would suggest is that, after removing :crlf and changing \n to $/, you try your test code without :perlio. You may still need it but it wouldn't hurt to check.
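One way to check what a given mode string really does on your platform is to list the layers on the opened handle with PerlIO::get_layers. The sketch below is only an aid for that kind of experiment: layers_for is a hypothetical helper, the temporary file is a stand-in for your real data file, and the output is deliberately not shown because it differs between platforms and Perl builds.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open $path with $mode and return the layer names.
sub layers_for {
    my ($mode, $path) = @_;
    open my $fh, $mode, $path or die "Cannot open $path with '$mode': $!";
    my @layers = PerlIO::get_layers($fh);
    close $fh;
    return @layers;
}

# An empty temporary file is enough; only the layers are of interest.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh;

# The mode strings discussed in this thread.
for my $mode ('<', '<:raw', '<:raw:crlf', '<:raw:perlio', '<:raw:perlio:crlf') {
    printf "%-20s => %s\n", $mode, join ' ', layers_for($mode, $tmp_name);
}
```

Running this on both platforms and comparing the lists shows whether :perlio changes anything for a particular mode.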

I agree there's a lot of Unicode-related documentation; however, everything I've made reference to is available here: PerlIO.

I ran a series of tests; the code and results follow.

Starting code:

#!perl
use 5.12.0;
use warnings;

my $in_file  = $^O eq 'MSWin32' ? 'utf16_LE_prob.dos_dat' : 'utf16_LE_prob.unix_dat';
my $out_file = $^O eq 'MSWin32' ? 'utf16_LE_prob.dos_out' : 'utf16_LE_prob.unix_out';
my $in_mode  = $^O eq 'MSWin32' ? '<:raw' : '<';
my $out_mode = $^O eq 'MSWin32' ? '>:raw' : '>';

open my $in_fh,  $in_mode,  $in_file  or die $!;
open my $out_fh, $out_mode, $out_file or die $!;

while (my $line = <$in_fh>) {
    print $out_fh $line;
}

close $out_fh;
close $in_fh;

Input files in UNIX and DOS formats:

$ cat -vet utf16_LE_prob.unix_dat utf16_LE_prob.dos_dat
Line 1$
Line 2$
$
Line 1^M$
Line 2^M$
^M$

Output after running on UNIX platform:

$ cat -vet utf16_LE_prob.unix_out
Line 1$
Line 2$
$

Output after running on DOS platform:

$ cat -vet utf16_LE_prob.dos_out
Line 1^M$
Line 2^M$
^M$

Changing the while loop to chomp input and add $/ (not \n) to output:

while (my $line = <$in_fh>) {
    chomp $line;
    print $out_fh $line, $/;
}

New output:

$ cat -vet utf16_LE_prob.unix_out utf16_LE_prob.dos_out
Line 1$
Line 2$
$
Line 1^M$
Line 2^M$
^M$

Adding :crlf to the MSWin32 input and output modes (now :raw:crlf) makes no change:

$ cat -vet utf16_LE_prob.unix_out utf16_LE_prob.dos_out
Line 1$
Line 2$
$
Line 1^M$
Line 2^M$
^M$

With :raw:perlio:crlf, there's no change:

$ cat -vet utf16_LE_prob.unix_out utf16_LE_prob.dos_out
Line 1$
Line 2$
$
Line 1^M$
Line 2^M$
^M$

And, for completeness, with :raw:perlio, there's no change:

$ cat -vet utf16_LE_prob.unix_out utf16_LE_prob.dos_out
Line 1$
Line 2$
$
Line 1^M$
Line 2^M$
^M$
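If cat -vet isn't available (on a plain MSWin32 box, for instance), a rough stand-in can be written in Perl. This is only a sketch: the visible function is a made-up name, and it marks just carriage returns, tabs, NUL bytes and line ends rather than everything cat -v handles.

```perl
use strict;
use warnings;

# Hypothetical helper: make line endings visible, loosely like cat -vet.
# Only CR, TAB, NUL and LF are handled; other bytes pass through unchanged.
sub visible {
    my ($bytes) = @_;
    $bytes =~ s/\r/^M/g;      # carriage return
    $bytes =~ s/\t/^I/g;      # tab
    $bytes =~ s/\0/^\@/g;     # NUL, common in UTF-16 data
    $bytes =~ s/\n/\$\n/g;    # mark each line end with a dollar sign
    return $bytes;
}

# Show every file named on the command line, read as untranslated bytes.
for my $path (@ARGV) {
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    local $/;                 # slurp mode
    my $bytes = <$fh>;
    close $fh;
    print visible($bytes) if defined $bytes;
}
```

Saved under a name of your choice, it takes the same file arguments as the cat commands above, for example utf16_LE_prob.dos_out.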

-- Ken


In reply to Re: Chicanery Needed to Handle Unicode Text on Microsoft Windows by kcott
in thread Chicanery Needed to Handle Unicode Text on Microsoft Windows by Jim
