
how to read from utf8 text file?

by MelaOS (Beadle)
on Jul 01, 2008 at 11:24 UTC (#694923=perlquestion)
MelaOS has asked for the wisdom of the Perl Monks concerning the following question:

Hi there, fellow monks,

I have a MySQL database on Unix. I dump the contents of a table to a text file from the command line (redirecting with >), then download the file to Windows via WinSCP. I want a Perl script that reads these text files and loads them into an MS Access database. The problem is that I can't seem to read the UTF-8 text file properly.

When I open the text file in Notepad I see gibberish, but when I open it in Word and set the encoding to UTF-8, the text looks fine. Does this mean my file is already in UTF-8? I've read that Perl stores strings internally as UTF-8, so I shouldn't have a problem reading the text file and inserting its contents into an MS Access database, right?

I'm totally stuck at the reading part: I'm unsure what is wrong and how to proceed from here. Any help would be appreciated. For reading I'm using a plain open, because opening with "<:utf8" seems to turn the text into gibberish. To double-check, I have Perl write the lines it has read to another text file.

Hope this is clear enough. Thanks.


Replies are listed 'Best First'.
Re: how to read from utf8 text file?
by moritz (Cardinal) on Jul 01, 2008 at 12:06 UTC
    If the file is in UTF-8 indeed, open your file like this:
    open my $handle, '<:encoding(UTF-8)', $name or die "Can't open '$name' for reading: $!";

    Then you also have to decide on an encoding when you enter the text into your database, or print it to your console.

    You have to know your DB's encoding for that, and I know too little of Microsoft's various programs to help you with that.

    I think MS uses UTF-16LE for many things, so it might be worth a try.

    See also perluniintro, Encode, perlunicode, Character Sets and Unicode.
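
    A minimal sketch of the two steps above: decode on input, then encode explicitly on output. The file names and the sample text are hypothetical, and UTF-16LE is only the guess mentioned above, not a confirmed requirement of MS Access. The ':raw' before the encoding layer is there so that Windows newline translation does not run on the already encoded bytes.

```perl
use strict;
use warnings;

# Hypothetical file names, for illustration only.
my $in_name  = 'dump.txt';
my $out_name = 'dump-utf16.txt';

# Create a small UTF-8 sample file so the sketch runs on its own.
open my $make, '>:encoding(UTF-8)', $in_name
    or die "Can't open '$in_name' for writing: $!";
print {$make} "caf\x{e9} \x{4e2d}\x{6587}\n";
close $make or die "Can't close '$in_name': $!";

# Decode on the way in: every line becomes a string of characters.
open my $in, '<:encoding(UTF-8)', $in_name
    or die "Can't open '$in_name' for reading: $!";
my @lines = <$in>;
close $in;

# Encode on the way out. Without an output layer Perl has to guess,
# which is a common source of gibberish and "Wide character" warnings.
open my $out, '>:raw:encoding(UTF-16LE)', $out_name
    or die "Can't open '$out_name' for writing: $!";
print {$out} "\x{FEFF}", @lines;    # leading byte order mark, then the text
close $out or die "Can't close '$out_name': $!";
```

    If the target turns out to expect UTF-8 instead, only the output layer changes, for example '>:encoding(UTF-8)'.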

      Hi. Please note that perluniintro and perlunicode can be tough and opaque for people who are new to Perl or new to character encodings. That's why I wrote perlunitut and perlunifaq a while ago. These may be better suited to someone asking a basic question. They're distributed with Perl nowadays.

      Juerd # { site => '', do_not_use => 'spamtrap', perl6_server => 'feather' }

      Hi, thanks for the prompt reply. I've tried that way of opening the file, but when I write the strings back out to a text file, just for testing, they all look like gibberish.
      Is there a good way to verify how Perl is reading my UTF-8 strings? I've tried inserting UTF-8 directly into MS Access, so the only thing I'm worried about here is how the text gets handled in Perl before it is passed over to Access.
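
      One way to check how Perl has read a string is to dump its code points instead of printing the text itself, since that takes the output encoding out of the picture. The sketch below is an illustration under assumptions: the sample bytes stand in for one line of the dump, and codepoints() is a hypothetical helper, not a built-in. It reads the same bytes once without decoding and once through the UTF-8 layer.

```perl
use strict;
use warnings;

# Sample bytes standing in for one line of the dump: "cafe" with an
# e-acute, encoded as UTF-8 (the e-acute is the two bytes C3 A9).
my $bytes = "caf\xC3\xA9\n";

# Hypothetical helper: render a string as a list of code points.
sub codepoints {
    my ($string) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $string;
}

# Read without decoding: every byte shows up as its own "character".
open my $raw, '<:raw', \$bytes
    or die "Can't open in-memory file: $!";
my $undecoded = <$raw>;
close $raw;

# Read through the UTF-8 layer: multi-byte sequences become one character.
open my $utf8, '<:encoding(UTF-8)', \$bytes
    or die "Can't open in-memory file: $!";
my $decoded = <$utf8>;
close $utf8;

chomp($undecoded, $decoded);

print 'raw:     ', codepoints($undecoded), ' (length ', length($undecoded), ")\n";
print 'decoded: ', codepoints($decoded),   ' (length ', length($decoded),   ")\n";
```

      If the decoded line shows one code point per visible character (here U+00E9 instead of U+00C3 U+00A9), the read side is working, and the gibberish is more likely coming from writing the test file without an output encoding layer.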
      Thanks, it worked for me.
