http://www.perlmonks.org?node_id=649456

aplonis has asked for the wisdom of the Perl Monks concerning the following question:

I have some code to modify a *.INI file on WinXP which is in UTF16. And it seems to work. But when I open said modified copy in either Notepad or TextPad it displays with the null chars as whitespace.

Anybody know what is up with that?

Please tell me it is not one of those wretched registry entries which is needed. Or if it is, please give very specific details how to handle it...or an example.

I'm writing this mostly for other folks to use on WinXP. And even though I do have WinXP at work and on one of my laptops, I'm basically a NetBSD guy. Start in with the MS-speak and I'll be totally lost. Thanks for being compassionate to an outsider.

use PerlIO::encoding;

# Modify a copy of a file...
sub foo {
    my $thing_to_do_ref = shift;
    my @lines;
    if (open IN, "<", "$some_path") {
        binmode IN, ":encoding(UTF-16LE)";
        while ( my $a_line = <IN> ) {
            # Do stuff to $a_line...
            push @lines, $a_line;
        }
        close IN;
        if (open OUT, ">", "$some_other_path") {
            binmode OUT, ":encoding(UTF-16LE)";
            while (@lines) {
                my $line = shift @lines;
                print OUT "$line\n";
            }
            close OUT;
        } else {
            print "Oops! Could not write: $! \n";
        }
    } else {
        print "Oops! Could not read: $! \n";
    }
}

Re: UTF-16 on WinXP written by Perl shows whitespaces.
by BrowserUk (Patriarch) on Nov 07, 2007 at 12:18 UTC

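    To expand on Anonymous Monk's terseness: you need to make sure that the file is read and written as binary to avoid newline translations. Otherwise the default Windows :crlf layer, which sits beneath the encoding layer, inserts a lone 0x0D byte in front of each 0x0A byte and knocks the 16-bit code units out of alignment. "Pushing raw" translates to prefixing the encoding with ':raw'. The following works for me (it copies a UTF-16LE file without futzing with it), but without the ':raw's I see the same symptoms you describe.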

    #! perl -slw
    use strict;
    use PerlIO::encoding;

    open IN, '<:raw:encoding(UTF-16LE)', $ARGV[ 0 ];
    open OUT, '>:raw:encoding(UTF-16LE)', $ARGV[ 0 ] . '.modified';
    print OUT <IN>;
    close $_ for *IN, *OUT;

    You can pass the same layer string to binmode.
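For anyone who prefers the binmode route, here is a minimal sketch of the same copy with the layers applied after opening. The sub name copy_utf16le and the lexical filehandles are my own choices for illustration, not something from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a UTF-16LE file, applying the layers via binmode.
sub copy_utf16le {
    my ( $src, $dst ) = @_;

    open my $in,  '<', $src or die "Could not read $src: $!";
    open my $out, '>', $dst or die "Could not write $dst: $!";

    # Same layer string as in the open calls; :raw comes first so that no
    # newline translation happens underneath the encoding layer.
    binmode $in,  ':raw:encoding(UTF-16LE)';
    binmode $out, ':raw:encoding(UTF-16LE)';

    # Lines come back decoded, still carrying their original "\r\n" endings.
    while ( my $line = <$in> ) {
        print {$out} $line;
    }

    close $in;
    close $out or die "Could not close $dst: $!";
    return;
}

# Example call (path is hypothetical):
# copy_utf16le( 'app.ini', 'app.ini.modified' );
```

Note that the loop prints each line as read, without appending another "\n", because the lines were never chomped.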


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      open IN, '<:raw:encoding(UTF-16LE)', $ARGV[ 0 ];
      open OUT, '>:raw:encoding(UTF-16LE)', $ARGV[ 0 ] . '.modified';

      For text files, I'd suggest using :raw:encoding(UTF-16LE):crlf:utf8 instead, in which case Perl will do proper linefeed translations (on Windows), just like it would without the :encoding(UTF-16LE).  Otherwise, be prepared to handle trailing carriage returns (\r) yourself — chomp (with the default $/ setting), for example, will not remove them...
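A rough sketch of how that layer stack might be used for line-oriented editing follows. The sub names read_ini_lines and write_ini_lines are made up for this example, and I am assuming the file has no byte order mark.

```perl
use strict;
use warnings;

# Layer stack suggested above: decode UTF-16LE first, then let :crlf
# translate line endings on the already-decoded characters.
my $LAYERS = ':raw:encoding(UTF-16LE):crlf:utf8';

# Hypothetical helper: return the file's lines without line endings.
sub read_ini_lines {
    my ($path) = @_;
    open my $in, "<$LAYERS", $path or die "Could not read $path: $!";
    my @lines = <$in>;
    close $in;
    chomp @lines;    # "\r\n" has already been turned into "\n"
    return @lines;
}

# Hypothetical helper: write lines back, one per line.
sub write_ini_lines {
    my ( $path, @lines ) = @_;
    open my $out, ">$LAYERS", $path or die "Could not write $path: $!";
    print {$out} "$_\n" for @lines;    # "\n" goes out as CR LF in UTF-16LE
    close $out or die "Could not close $path: $!";
    return;
}
```

With this stack the code in between can treat lines as ordinary text ending in "\n" and leave the carriage returns to the :crlf layer.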

      Excellent! Most Excellent! I just tried it and it looks good (single spaced) to both Notepad and Textpad.

      I can't try it for real from work on account of the IT firewall here. It is for the *.INI file of an Inet app.

      But it looks good for file size and presentation to Notepad and TextPad both. So I have every confidence

      Next thing I'm doing is searching for a support link so that I may PayPal a donation to PerlMonks. Thanks again.

Re: UTF-16 on WinXP written by Perl shows whitespaces.
by moritz (Cardinal) on Nov 07, 2007 at 12:13 UTC
    I think it should work the way you're doing it.

    Could you please try to use your script to copy a file without any modifications, and then do a binary comparison of input and output file?
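In case it helps, here is one way the binary comparison could be done in Perl itself. This is only a sketch; the sub names slurp_raw and first_difference are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file as raw bytes, with no layers applied.
sub slurp_raw {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Could not read $path: $!";
    local $/;    # slurp mode
    my $bytes = <$fh>;
    close $fh;
    return $bytes;
}

# Returns the offset of the first differing byte, or -1 if the files match.
sub first_difference {
    my ( $path_a, $path_b ) = @_;
    my ( $x, $y ) = ( slurp_raw($path_a), slurp_raw($path_b) );
    return -1 if $x eq $y;

    my $limit = length($x) < length($y) ? length($x) : length($y);
    for my $i ( 0 .. $limit - 1 ) {
        return $i if substr( $x, $i, 1 ) ne substr( $y, $i, 1 );
    }
    return $limit;    # one file is a prefix of the other
}

# Example call (paths are hypothetical):
# my $offset = first_difference( 'app.ini', 'app.ini.modified' );
# print $offset < 0 ? "identical\n" : "first difference at byte $offset\n";
```

If the copy differs, the reported offset shows where to look in a hex dump, for example for a stray 0x0D byte.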

Re: UTF-16 on WinXP written by Perl shows whitespaces.
by Anonymous Monk on Nov 07, 2007 at 12:12 UTC