 
PerlMonks  

unicode in windows

by =sjs= (Initiate)
on Oct 19, 2006 at 04:24 UTC ( [id://579269]=perlquestion )

=sjs= has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to get started with Unicode on Windows and having no luck. For a start, I'd like to write a simple script that opens one file for input and another for output, and copies the input to the output. I can't even get that working.
My input file is a text file created with Windows WordPad and saved as Unicode. The file has Japanese characters. I think Unicode on Windows means UTF-16LE.
Any help would be appreciated. This is my first script using Unicode, and I don't usually use Windows XP.
Thanks
Steve S.

Replies are listed 'Best First'.
Re: unicode in windows
by chargrill (Parson) on Oct 19, 2006 at 04:35 UTC
    My input file is a text file ...

    This is my first script using unicode ...

    Sorry, I think I missed the part of your post where you showed us what you've tried, and what your input looks like.

    I haven't troubled myself to search for unicode here, though I imagine if one were to do that he or she might find some examples of scripts that work, or scripts that don't with solutions to issues posted in the respective threads.



    --chargrill
    s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)
      Input looks like:
      日曜日
      月曜日
      火曜日
      水曜日
      木曜日
      金曜日
      土曜日
      saved in WordPad as a Unicode text document.

      Mostly at the moment I'm thrashing around trying to get a handle on how to do this. I'm trying to use the Unicode::String classes for input/output, but everything I've tried produces files of different sizes with mojibake in them.

      I've been beating on this for several hours. At this point I'd be happy with just reading the file into a list and writing it out, and having the files be the same size and have the same contents. Of course I don't mean just opening in binary mode and copying bytes from here to there. I need to use regexes to manipulate the contents, but first things first.
        What have you tried? The following works for me:
        use strict;
        use warnings;

        # :raw disables CRLF translation; :encoding decodes UTF-16LE into characters.
        open(my $fh_in, '<:raw:encoding(utf16le)', 'src.txt')
            or die("Unable to open src.txt: $!\n");
        open(my $fh_out, '>:raw:encoding(utf16le)', 'dst.txt')
            or die("Unable to create dst.txt: $!\n");

        while (<$fh_in>) {
            # Report lines containing U+706B or U+6C34.
            if (/[\x{706B}\x{6C34}]/) {
                print("Found one at line $.!\n");
            }
            print $fh_out $_;
        }

        The :raw prevents the CRLF->LF conversion when reading and the LF->CRLF conversion when writing. That conversion only works with single-byte, ASCII-based (i.e. LF=0xA, CR=0xD) encodings.
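        To see why a byte-oriented CRLF translation does not fit UTF-16 data, it may help to look at the encoded bytes. The following is only a minimal sketch using the core Encode module; the sample string is made up for illustration and is not from the original post.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode a short string containing a Windows line ending as UTF-16LE.
# (The sample string is an assumption chosen for illustration.)
my $bytes = encode('UTF-16LE', "a\r\nb");

# Render every byte as two hex digits so the layout is visible.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;

print "$hex\n";
```

        This prints 61 00 0D 00 0A 00 62 00. Each character takes two bytes, so CR and LF are no longer adjacent in the byte stream, which is why the translation is switched off with :raw before the encoding layer is applied.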

        Use :raw:encoding(utf16le) if the file was saved using encoding "Unicode"
        Use :raw:encoding(utf16be) if the file was saved using encoding "Unicode big endian"
        Use :raw:utf8 if the file was saved using encoding "UTF-8"
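
        If it is unclear which of these encodings a file was saved with, one option is to look at the byte order mark (BOM) at the start of the file. The sketch below assumes the file begins with a BOM, which is not guaranteed for every editor. The helper name layer_for_bom is hypothetical, not part of any module, and the layer strings use the canonical Encode names UTF-16LE, UTF-16BE and UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: map the leading bytes of a file to a PerlIO layer
# string. Returns undef (in scalar context) when no known BOM is found.
sub layer_for_bom {
    my ($head) = @_;
    return ':raw:encoding(UTF-16LE)' if $head =~ /\A\xFF\xFE/;
    return ':raw:encoding(UTF-16BE)' if $head =~ /\A\xFE\xFF/;
    return ':raw:encoding(UTF-8)'    if $head =~ /\A\xEF\xBB\xBF/;
    return;
}

# Sample byte strings (made up for illustration), each standing in for
# the first few bytes read from a file opened with the :raw layer.
for my $sample ("\xFF\xFE\xE5\x65", "\xFE\xFF\x65\xE5", "\xEF\xBB\xBF\xE6", "abc") {
    my $layer = layer_for_bom($sample);
    print defined $layer ? "$layer\n" : "no BOM found\n";
}
```

        In a real script the leading bytes would come from reading the file in :raw mode before reopening it with the chosen layer. Two caveats: a UTF-32LE file also starts with FF FE and would be misreported here, and the BOM itself is still delivered as U+FEFF at the start of the first line when these layers are used.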

        Update: Added example regexp to code.


      > I haven't troubled myself to search for
      > unicode here, though I imagine if one were
      > to do that he or she might find some examples of
      > scripts that work, or scripts that don't with
      > solutions to issues posted in the respective
      > threads.
      I appreciate your delicate sense of sarcasm, but actually I spent several hours going through docs, looking for examples, etc.

      It turned out that my problem was that sometimes the editor was opening the file in UTF mode and other times as ASCII. God only knows how it made its decision. So nothing was making sense and I was chasing my tail.

      Thanks for the kind assistance sir chargrill, bargrill or whatever.

      Thanks everyone else too. I appreciate the help.

        -- for the personal attack. Unless you tell us what you have tried, then the assumption is that you have not tried anything. Just like in higher education, what you get out of this forum is related to what you put into it.

        Save the name calling for the playground.

        --MidLifeXis
