dealing with encoding while converting data from MySQL to Postgresby punkish (Priest)
|on Dec 10, 2011 at 04:50 UTC||Need Help??|
punkish has asked for the
wisdom of the Perl Monks concerning the following question:
I am at my wits end moving data from MySQL to Postgres (long story).
I inherit the MySQL data dump. I am pretty certain encoding of the original db is all messed up; it is supposed to be utf8, is actually set as latin1, and has data in it that would be appropriate for utf8. Its first few lines are like so
Besides the above, I know nothing more about the encoding. The new Pg database is utf8, and is in my control. Ideally, I would like to set up my routine so the Pg database is all correctly encoded in utf8. In other words, whatever the MySQL encoding, I want it to end up in Pg as utf8.
There are several complicated data types in the MySQL db, so I am unable to simply dump from MySQL and restore in Pg. I am now trying to simply query one db, and insert the values in the other db using Perl DBI. I keep on getting the error of the type 'invalid byte sequence for encoding "UTF8": 0x93 at...'. Here are my steps:
bam! error. The `sync.pl` script is a pretty simple
I have also tried the following variation -- changed step #2 above to sed 's/latin1/utf8/g' in.sql > in_utf8.sql and then loaded that in mysql in step #3 above before trying to insert it in Pg. Still, no joy. Similar utf8 encoding error.
when small people start casting long shadows, it is time to go to bed