 
PerlMonks  

How to read the unicode file

by shan_emails (Beadle)
on Sep 18, 2008 at 14:54 UTC

shan_emails has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

This is Shanmugam. I have a *.txt file that is in Unicode format. I want to open this file and do some pattern matching. How do I do it?

Note: I tried doing this via Word, i.e., opening the text file in Word and saving it in "ANSI" format, but it takes a long time and hangs the system. Please tell me the right way to do this.

Thanks in advance,
Shanmugam A.

Re: How to read the unicode file
by Corion (Patriarch) on Sep 18, 2008 at 14:58 UTC

    You have already been advised in the chatterbox to show the Perl code related to this problem. From the snippet you showed in the CB, I see that you're not checking for errors when opening the file. Maybe that is the problem. Or maybe your machine does not have enough RAM to read the whole file into memory. Or maybe your algorithm is just bad and takes a very long time as the number of lines grows. You don't show any code, so we can't help you further.
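    As an illustration of the two points above (check the result of open, and don't slurp a large file), here is a minimal sketch that reads the file one line at a time and counts the lines matching a pattern. The sub name count_matching_lines, the UTF-16LE encoding and the pattern in the usage line are assumptions for illustration, not the original poster's code.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lines of a file that match a pattern.
# Reads one line at a time, so memory use does not grow with file size.
# The encoding is an assumption; pass whatever the file really uses.
sub count_matching_lines {
    my ($filename, $encoding, $pattern) = @_;

    # Always check the result of open; $! holds the OS error message.
    open my $fh, "<:encoding($encoding)", $filename
        or die "Can't open '$filename' for reading: $!";

    my $count = 0;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # strip the line ending (CRLF or LF)
        $count++ if $line =~ $pattern;
    }
    close $fh or die "Can't close '$filename': $!";
    return $count;
}
```

    Usage, with an assumed file name and pattern: my $n = count_matching_lines('input.txt', 'UTF-16LE', qr/error/i); Stripping "\r" explicitly keeps the loop working whether or not a CRLF translation layer was applied underneath the encoding layer.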

Re: How to read the unicode file
by moritz (Cardinal) on Sep 18, 2008 at 15:13 UTC
    The problem is that "Unicode format" is not a character encoding; it is a very imprecise term.

    Chances are that the file is encoded in either UCS-2 or UTF-16LE. So to open the file, do something along these lines:

    open my $handle, '<:encoding(UTF-16LE)', $filename or die "Can't open file '$filename' for reading: $!";

    For more information on Unicode handling, please read Perl and character encodings, perluniintro, and the documentation for the Encode module.
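    If you are not sure which encoding the file uses, one common heuristic is to look for a byte order mark (BOM) at the start of the file. Notepad's "Unicode" save option, for example, writes UTF-16LE starting with the bytes FF FE. A BOM is optional, so this is a guess rather than a guarantee. The sub name guess_encoding is hypothetical, and files without a recognised BOM fall back to a caller-supplied default.

```perl
use strict;
use warnings;

# Hypothetical helper: guess a file's encoding from its byte order mark.
# Returns (encoding name, BOM length in bytes). Without a recognised BOM
# it returns the caller's default, which is only an assumption.
sub guess_encoding {
    my ($filename, $default) = @_;

    open my $fh, '<:raw', $filename
        or die "Can't open '$filename' for reading: $!";
    my $got = read($fh, my $head, 4);
    die "Can't read '$filename': $!" unless defined $got;
    close $fh;

    # Check UTF-32LE before UTF-16LE: its BOM also starts with FF FE.
    return ('UTF-32LE', 4) if $head =~ /\A\xFF\xFE\x00\x00/;
    return ('UTF-32BE', 4) if $head =~ /\A\x00\x00\xFE\xFF/;
    return ('UTF-8',    3) if $head =~ /\A\xEF\xBB\xBF/;
    return ('UTF-16LE', 2) if $head =~ /\A\xFF\xFE/;
    return ('UTF-16BE', 2) if $head =~ /\A\xFE\xFF/;
    return ($default, 0);
}
```

    You would then open the file with the guessed name, e.g. open my $fh, "<:encoding($enc)", $filename. Note that the explicit LE/BE decoders (and UTF-8) do not remove the BOM, so it shows up as the character U+FEFF at the start of the first line; s/\A\x{FEFF}// removes it.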

Re: How to read the unicode file
by Juerd (Abbot) on Sep 18, 2008 at 16:46 UTC

    Unicode text files are read exactly the same way that other text files are read: by specifying a text encoding.

    open my $fh, "< :encoding(UTF-8)", $filename or die "open: $!";
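    To see why the encoding layer matters for pattern matching, this small sketch writes the word "café" to a temporary file as UTF-8 and reads it back with and without the :encoding(UTF-8) layer. The temporary file and the sample word are assumptions for illustration. Without the layer Perl sees five bytes rather than four characters, so lengths, character classes like \w, and matches on non-ASCII letters all behave differently. Which encoding name to use (UTF-8 here, UTF-16LE in moritz's reply) depends on how the file was actually saved.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Write "café" (é given as \x{e9}) to a temporary file as UTF-8 bytes.
my ($out, $name) = tempfile();
binmode $out;
print {$out} encode('UTF-8', "caf\x{e9}\n");
close $out or die "close: $!";

# Read the first line back through the given I/O layer, without its newline.
sub first_line {
    my ($layer) = @_;
    open my $fh, "<$layer", $name or die "open '$name': $!";
    my $line = <$fh>;
    close $fh;
    chomp $line;
    return $line;
}

my $bytes = first_line(':raw');              # undecoded UTF-8 bytes
my $chars = first_line(':encoding(UTF-8)');  # decoded characters

print 'raw length: ', length($bytes), "\n";      # 5 (é is two bytes)
print 'decoded length: ', length($chars), "\n";  # 4
print 'decoded is all word characters: ', ($chars =~ /^\w+$/ ? 'yes' : 'no'), "\n";
unlink $name;
```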

    Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }

Node Type: perlquestion [id://712299]
Approved by jettero