PerlMonks  

Re: unicode in windows

by chargrill (Parson)
on Oct 19, 2006 at 04:35 UTC


in reply to unicode in windows

My input file is a text file ...

This is my first script using unicode ...

Sorry, I think I missed the part of your post where you showed us what you've tried, and what your input looks like.

I haven't troubled myself to search for unicode here, though I imagine if one were to do that he or she might find some examples of scripts that work, or scripts that don't with solutions to issues posted in the respective threads.



--chargrill
s**lil*; $*=join'',sort split q**; s;.*;grr; &&s+(.(.)).+$2$1+; $; = qq-$_-;s,.*,ahc,;$,.=chop for split q,,,reverse;print for($,,$;,$*,$/)

Re^2: unicode in windows
by =sjs= (Initiate) on Oct 19, 2006 at 04:58 UTC
    Input looks like:
    日曜日
    月曜日
    火曜日
    水曜日
    木曜日
    金曜日
    土曜日
    saved in word pad as a unicode text document.

    Mostly at the moment I'm thrashing around trying to get a handle on how to do this. I'm trying to use the Unicode::String classes for input/output, but everything I've tried comes out with files of different sizes and mojibake in the file.

    I've been beating on this for several hours. At this point I'd be happy with just reading the file into a list, writing it out, and having the two files be the same size with the same contents. Of course I don't mean just opening in binary mode and copying bytes from here to there. I need to use regexes to manipulate the contents, but first things first.
      What have you tried? The following works for me:
      use strict;
      use warnings;

      open(my $fh_in, '<:raw:encoding(utf16le)', 'src.txt')
         or die("Unable to open src.txt: $!\n");
      open(my $fh_out, '>:raw:encoding(utf16le)', 'dst.txt')
         or die("Unable to create dst.txt: $!\n");

      while (<$fh_in>) {
         # U+706B is 火 and U+6C34 is 水
         if (/[\x{706B}\x{6C34}]/) {
            print("Found one at line $.!\n");
         }
         print $fh_out $_;
      }

      The :raw prevents the CRLF->LF conversion when reading and the LF->CRLF conversion when writing. That conversion only works with single-byte, ASCII-based (i.e. LF=0x0A, CR=0x0D) encodings, so it has to be turned off for UTF-16.
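
To see why a byte-level newline conversion cannot be applied to UTF-16 data, here is a minimal sketch using the core Encode module. The substitution is only a hypothetical stand-in for what a byte-oriented LF->CRLF translation would do to already-encoded text; it is not the actual PerlIO code.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "A", newline, "B" encoded as UTF-16LE: every character takes two bytes.
my $text  = "A\nB";
my $bytes = encode('UTF-16LE', $text);

# Hypothetical stand-in for a byte-level LF -> CRLF translation applied
# after encoding: it inserts a single 0x0D byte before every 0x0A byte.
(my $mangled = $bytes) =~ s/\x0A/\x0D\x0A/g;

# Render a byte string as space-separated hex pairs.
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

print hex_dump($bytes),   "\n";   # 41 00 0A 00 42 00
print hex_dump($mangled), "\n";   # 41 00 0D 0A 00 42 00
```

The mangled string has an odd number of bytes, so every 16-bit unit after the inserted 0x0D is misaligned and the rest of the text decodes as garbage.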

      Use :raw:encoding(utf16le) if the file was saved using encoding "Unicode"
      Use :raw:encoding(utf16be) if the file was saved using encoding "Unicode big endian"
      Use :raw:utf8 if the file was saved using encoding "UTF-8"

      Update: Added example regexp to code.
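
If it is unclear which of those three encodings a file was saved in, one option is to look at its first bytes. The sketch below assumes the editor wrote a byte order mark at the start of the file, which is not guaranteed. layer_for_bom and sniff_layer are hypothetical helper names, not part of any module, and UTF-32 is not considered.

```perl
use strict;
use warnings;

# Hypothetical helper: map the leading bytes of a file to one of the layer
# strings listed above. Returns undef when no known byte order mark is found.
sub layer_for_bom {
    my ($head) = @_;
    return ':raw:utf8'              if $head =~ /^\xEF\xBB\xBF/;   # UTF-8 BOM
    return ':raw:encoding(utf16le)' if $head =~ /^\xFF\xFE/;       # UTF-16LE BOM
    return ':raw:encoding(utf16be)' if $head =~ /^\xFE\xFF/;       # UTF-16BE BOM
    return;
}

# Hypothetical helper: read up to three bytes of the named file in binary
# mode and pass them to layer_for_bom.
sub sniff_layer {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die("Unable to open $path: $!\n");
    my $got = read($fh, my $head, 3);
    die("Unable to read $path: $!\n") unless defined $got;
    close($fh);
    return layer_for_bom($head);
}

# Possible usage:
#   my $layer = sniff_layer('src.txt');
#   die("No byte order mark found in src.txt\n") unless defined $layer;
#   open(my $fh_in, "<$layer", 'src.txt') or die("Unable to open src.txt: $!\n");
```

Note that with the utf16le and utf16be layers the byte order mark is not stripped; it shows up as U+FEFF at the start of the first line read.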

Re^2: unicode in windows
by =sjs= (Initiate) on Oct 19, 2006 at 09:01 UTC

    > I haven't troubled myself to search for
    > unicode here, though I imagine if one were
    > to do that he or she might find some examples of
    > scripts that work, or scripts that don't with
    > solutions to issues posted in the respective
    > threads.
    I appreciate your delicate sense of sarcasm, but actually I spent several hours going through docs, looking for examples, etc.

    It turned out that my problem was that the editor was sometimes opening the file in UTF mode and other times as ASCII. God only knows how it made its decision. As a result nothing was making sense and I was chasing my tail.

    Thanks for the kind assistance sir chargrill, bargrill or whatever.

    Thanks everyone else too. I appreciate the help.

      -- for the personal attack. Unless you tell us what you have tried, then the assumption is that you have not tried anything. Just like in higher education, what you get out of this forum is related to what you put into it.

      Save the name calling for the playground.

      --MidLifeXis
