PerlMonks  

Re: dealing with encoding while converting data from MySQL to Postgres

by moritz (Cardinal)
on Dec 10, 2011 at 05:54 UTC


in reply to dealing with encoding while converting data from MySQL to Postgres

Besides the above, I know nothing more about the encoding.

Then you need to learn more about the encoding. I've seen enough crazy things (like putting UTF-8 into latin-1 tables) to advise you strongly not to blindly trust a DB, but look at the data that comes out of it, and find out which encoding(s) it uses. Yes, that takes time and effort, but if you don't invest it, you'll have ten times the effort fixing it later on.
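One way to start looking at the data is to fetch the raw bytes and check whether they decode as strict UTF-8. The following is only a minimal sketch: `classify_bytes` is a hypothetical helper name, and it assumes you fetched the values as undecoded bytes (no UTF-8 flags set on the handle). Note that double-encoded data is also valid UTF-8, so a 'utf8' result still needs a look by eye.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Classify a raw byte string fetched from the database.
# Returns 'ascii', 'utf8', or 'not-utf8' (Latin-1 or something else).
sub classify_bytes {
    my ($bytes) = @_;

    # Pure ASCII is identical in Latin-1 and UTF-8, so it tells us nothing.
    return 'ascii' unless $bytes =~ /[^\x00-\x7F]/;

    # Strict decoding dies on malformed input; LEAVE_SRC keeps $bytes intact.
    my $ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };
    return $ok ? 'utf8' : 'not-utf8';
}
```

Running this over a sample of each text column should show whether a table really holds what its declared character set claims.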

# $ iconv -f latin1 -t UTF-8 in.sql > in_utf8.sql
# $ mysql < in_utf8.sql

That's a pretty bad idea. As you've even shown us, the mysql dump contains meta information about the character encoding, which you don't change. Assuming that the meta information was accurate before your change, it is now certainly wrong. So, don't do that. Rather use the original database (or recreate it from the dump), and connect to that, and do all encoding conversion with Perl.

bam! error. The `sync.pl` script is a pretty simple

while (my ($col1, $col2... $coln) = $sth->fetchrow_array) {
    ## I believe this is where I should be converting the $col1..n
    ## values to utf8, but don't know how.
    insert into Pg
}

You haven't shown us the interesting part where you set up DBD::mysql and DBD::Pg to deal with UTF-8. If you haven't done so, please read the documentation of these two modules regarding the handling of encodings in general, and UTF-8 in particular.
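For orientation, a connection setup might look roughly like the sketch below. The attributes `mysql_enable_utf8` (DBD::mysql) and `pg_enable_utf8` (DBD::Pg) are real, but the sub name, the argument keys, and the DSNs you pass in are hypothetical, and whether these flags are the right choice depends on what is actually stored in your tables.

```perl
use strict;
use warnings;

# Connection attributes; the UTF-8 flags are the part that matters here.
my %MYSQL_ATTR = (RaiseError => 1, mysql_enable_utf8 => 1);
my %PG_ATTR    = (RaiseError => 1, pg_enable_utf8    => 1);

# Hypothetical helper: pass your own DSNs and credentials, e.g.
#   mysql_dsn => 'dbi:mysql:database=olddb', pg_dsn => 'dbi:Pg:dbname=newdb'
sub connect_handles {
    my (%arg) = @_;
    require DBI;    # loaded lazily so the sketch compiles without a database

    my $mysql = DBI->connect(
        $arg{mysql_dsn}, $arg{mysql_user}, $arg{mysql_pass}, {%MYSQL_ATTR});
    my $pg = DBI->connect(
        $arg{pg_dsn}, $arg{pg_user}, $arg{pg_pass}, {%PG_ATTR});

    return ($mysql, $pg);
}
```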

As for converting the encoding, Encode can do all that is left to do after you set up the two DBD modules correctly (which could be nothing at all).
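If some conversion does remain, it usually comes down to one decode and one encode. This is a sketch, not a drop-in fix: `to_utf8_bytes` is a hypothetical helper, and the source encoding you pass in is an assumption you have to verify against the real data first.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Convert a byte string from a known source encoding to UTF-8 bytes.
sub to_utf8_bytes {
    my ($bytes, $from) = @_;
    my $text = decode($from, $bytes);    # bytes -> Perl characters
    return encode('UTF-8', $text);       # characters -> UTF-8 bytes
}

# Example: Latin-1 "café" (E9) becomes UTF-8 "café" (C3 A9).
my $utf8 = to_utf8_bytes("caf\xE9", 'ISO-8859-1');
```

If both DBD modules are already set up to decode and encode for you, the values arrive as Perl character strings and this step is unnecessary.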

while (my ($col1, $col2... $coln) =

I'd simply use `while (my @columns = ...` here instead; it's less typing.
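The array form of the loop can be sketched as follows. To keep the example runnable without a database, `make_fetcher` is a hypothetical stand-in for `$sth->fetchrow_array`, and the `$insert` callback stands in for executing a prepared INSERT on the Pg handle.

```perl
use strict;
use warnings;

# Stand-in for $sth->fetchrow_array: returns the next row as a list,
# or an empty list once the rows are exhausted.
sub make_fetcher {
    my @rows = @_;
    return sub { my $row = shift @rows; return $row ? @$row : () };
}

# Copy every row from the fetcher to the inserter; returns the row count.
sub copy_rows {
    my ($fetch, $insert) = @_;
    my $count = 0;
    while (my @columns = $fetch->()) {
        $insert->(@columns);    # with DBI: $insert_sth->execute(@columns)
        $count++;
    }
    return $count;
}
```

The loop condition tests the number of columns returned, so a row that contains NULLs (undef values) still counts as a row.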

Encoding problems should be treated like any other problems in programming. You have to look carefully at the input data, decide what the result should be, and trace the data as it is processed in your program to the point where it deviates from your expectation. Once you found that point, fix it and continue.
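To trace a value through the program, it helps to print exactly what Perl holds at each stage instead of trusting what the terminal shows. A small sketch, with `describe` as a hypothetical helper name; the UTF8 flag it reports is an internal detail and only a hint about whether the string was decoded.

```perl
use strict;
use warnings;

# Describe a string: the state of Perl's internal UTF8 flag and the
# numeric value of every character (or byte) it contains.
sub describe {
    my ($label, $str) = @_;
    my $flag  = utf8::is_utf8($str) ? 'flag on' : 'flag off';
    my $units = join ' ', map { sprintf '%04X', ord($_) } split //, $str;
    return sprintf '%s: %s [%s]', $label, $flag, $units;
}

# Call it after the fetch, after any conversion, and right before the
# insert, then compare the output with what you expected at each point.
print describe('after fetch', "\xC3\xA9"), "\n";
```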

See also: Character Encodings in Perl.

Replies are listed 'Best First'.
Re^2: dealing with encoding while converting data from MySQL to Postgres
by punkish (Priest) on Dec 11, 2011 at 04:20 UTC
    Thanks moritz. Adding {mysql_enable_utf8 => 1} and {pg_enable_utf8 => 1} to the db handles for the two databases seems to have solved the problem.

    There were other incompatibilities between the two databases that I had to solve, but now I can move data from one to the other.



    when small people start casting long shadows, it is time to go to bed
