PerlMonks  

Re: Chicanery Needed to Handle Unicode Text on Microsoft Windows

by Anonymous Monk
on Oct 30, 2010 at 07:51 UTC (#868432)


in reply to Chicanery Needed to Handle Unicode Text on Microsoft Windows

He Googles for help

utf16le site:perlmonks.org
UTF-16 on WinXP written by Perl shows whitespaces.
crlf mess in unicode utf-16le

Can someone explain how this sequence of PerlIO layers works?

See PerlIO

Why must so many layers be used?

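Because of the default layers: on Windows the default stack applies :crlf to the raw bytes, underneath any :encoding layer you add, and that mangles UTF-16. See PerlIO.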

Can these layers be specified using the open pragma? If so, how? If not, why not?

This should work

use open qw' IO :raw:perlio:encoding(UTF-16LE):crlf ';
But apparently the open pragma is broken: it doesn't accept the same layer strings that binmode and open do.
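One way around the pragma is to hand the same layer string to open (or binmode) directly. A minimal sketch, assuming a little-endian UTF-16 input file with CRLF line endings; the helper name read_utf16le_lines is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: read a UTF-16LE text file with CRLF line endings
# and return its lines as decoded character strings ending in "\n".
sub read_utf16le_lines {
    my ($path) = @_;

    # Same layer string as in the "use open" line, passed to open() itself:
    #   :raw                 start from a clean stack (no byte-level :crlf)
    #   :perlio              buffering
    #   :encoding(UTF-16LE)  decode bytes into characters
    #   :crlf                translate CRLF to "\n" after decoding
    open my $fh, '<:raw:perlio:encoding(UTF-16LE):crlf', $path
        or die "Cannot open '$path': $!";

    my @lines = <$fh>;
    close $fh;

    # UTF-16LE does not consume a byte order mark, so strip one if present.
    $lines[0] =~ s/\A\x{FEFF}// if @lines;

    return @lines;
}
```

For a handle that is already open, binmode($fh, ':raw:perlio:encoding(UTF-16LE):crlf') should take the same string.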

And why has this ancient Perl bug still not been fixed in 5.12.2?

I'm not a perl5-porter so I'm not sure, but it doesn't look like a bug exactly, and nobody has come up with a better way or reported a bug (that I could find).

It seems there's no way to generate a UTF-16 file in little-endian byte order directly. To generate such a file, you have to specify the UTF-16LE CES (which is wrong) and add the byte order mark explicitly to make it UTF-16 instead of UTF-16LE.

Maybe try :encoding(UTF-16LE):via(File::BOM)
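For comparison, the explicit-BOM approach described in the question can be sketched with core Perl only. This is an illustration, not a tested replacement for File::BOM (a CPAN module, not exercised here); the helper name write_utf16le_with_bom is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: write lines as little-endian UTF-16 with a BOM
# and CRLF line endings, using only core Perl.
sub write_utf16le_with_bom {
    my ($path, @lines) = @_;

    # :crlf sits above :encoding so "\n" becomes CR LF as characters
    # before the text is encoded to UTF-16LE bytes.
    open my $fh, '>:raw:perlio:encoding(UTF-16LE):crlf', $path
        or die "Cannot open '$path' for writing: $!";

    # U+FEFF encoded as UTF-16LE yields the bytes FF FE, i.e. the BOM.
    print {$fh} "\x{FEFF}";
    print {$fh} "$_\n" for @lines;

    close $fh or die "Cannot close '$path': $!";
    return;
}
```

Calling write_utf16le_with_bom('out.txt', 'first line', 'second line') would produce the kind of file Notepad labels "Unicode"; the file name is only an example.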


Re^2: Chicanery Needed to Handle Unicode Text on Microsoft Windows
by brxnd (Initiate) on Sep 26, 2012 at 03:07 UTC

    This thread is refreshing to read!!! As a Windows user who is somewhat new to Perl, I spent the past few hours trying to figure out why one of the 193 XML files I was supplied would keep outputting as a bunch of Chinese (?) characters. Jim described exactly what I kept trying.

    I finished my script. Everything else works - it does all my replaces beautifully. I have maybe spent 8 hours total on my script and it will save me about 3 days of work.

    But, for now, I have to go to that specific XML file, open it in Notepad, and save it as 'ANSI' instead of 'Unicode' before my script will work right.

    I have tried adding the use ' $string' supplied in this thread, but I get this error:

    Unknown PerlIO layer 'raw:perlio:encoding(UTF-16LE):crlf:utf8'

    I really would like to create re-usable code out of my script, but I have yet to find the answer.

      I have tried adding the use ' $string' supplied in this thread, but I get this error:

      Which perl version do you have?

      open it in Notepad, and save it as 'ANSI' instead of 'Unicode' before my script will work right.

      You probably shouldn't do that :) Save it as UTF-8 instead:

      iconv -f UTF-16 -t UTF-8 < in > out

      piconv -f UTF-16LE -t UTF-8 < in > out
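If neither tool is at hand, the same conversion can be sketched in Perl with the core Encode module. This assumes the input starts with a byte order mark (as Notepad's "Unicode" files do); the function name utf16_file_to_utf8 is made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert a BOM-prefixed UTF-16 file to UTF-8.
# Line endings are passed through unchanged.
sub utf16_file_to_utf8 {
    my ($in_path, $out_path) = @_;

    # Slurp the raw bytes; no layers, so nothing is translated on the way in.
    open my $in, '<:raw', $in_path or die "Cannot open '$in_path': $!";
    my $bytes = do { local $/; <$in> };
    close $in;

    # 'UTF-16' (no LE/BE suffix) reads the BOM to pick the byte order,
    # removes it, and dies if no BOM is found.
    my $text = decode('UTF-16', $bytes);

    open my $out, '>:raw', $out_path or die "Cannot open '$out_path': $!";
    print {$out} encode('UTF-8', $text);
    close $out or die "Cannot close '$out_path': $!";
    return;
}
```

If the XML declaration inside the file names encoding="UTF-16", it would need updating as well, since this sketch only converts the bytes.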
