http://www.perlmonks.org?node_id=942765

punkish has asked for the wisdom of the Perl Monks concerning the following question:

I am at my wits end moving data from MySQL to Postgres (long story).

I inherit the MySQL data dump. I am pretty certain encoding of the original db is all messed up; it is supposed to be utf8, is actually set as latin1, and has data in it that would be appropriate for utf8. Its first few lines are like so

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */; /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */ +; /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */; /*!40101 SET NAMES utf8 */; CREATE DATABASE /*!32312 IF NOT EXISTS*/ `dbname` /*!40100 DEFAULT + CHARACTER SET latin1 */; USE `dbname`; DROP TABLE IF EXISTS `foo`; /*!40101 SET @saved_cs_client = @@character_set_client */; /*!40101 SET character_set_client = utf8 */; CREATE TABLE `foo` ( ..

Besides the above, I know nothing more about the encoding. The new Pg database is utf8, and is in my control. Ideally, I would like to set up my routine so the Pg database is all correctly encoded in utf8. In other words, whatever the MySQL encoding, I want it to end up in Pg as utf8.

There are several complicated data types in the MySQL db, so I am unable to simply dump from MySQL and restore in Pg. I am now trying to simply query one db, and insert the values in the other db using Perl DBI. I keep on getting the error of the type 'invalid byte sequence for encoding "UTF8": 0x93 at...'. Here are my steps:

  1. dump data from MySQL (actually, I inherit the dump, so I have no control over how the dump is actually created)
  2. $ iconv -f latin1 -t UTF-8 in.sql > in_utf8.sql
  3. $ mysql < in_utf8.sql
  4. $ perl sync.pl

bam! error. The `sync.pl` script is a pretty simple

query table in mysql while (my ($col1, $col2... $coln) = $sth->fetch row_array) { ## I believe this is where I should be converting the $col1..n ## values to utf8, but don't know how. insert into Pg }

I have also tried the following variation -- changed step #2 above to sed 's/latin1/utf8/g' in.sql > in_utf8.sql and then loaded that in mysql in step #3 above before trying to insert it in Pg. Still, no joy. Similar utf8 encoding error.

Suggestions?



when small people start casting long shadows, it is time to go to bed