punkish has asked for the wisdom of the Perl Monks concerning the following question:
I inherit the MySQL data dump. I am pretty certain encoding of the original db is all messed up; it is supposed to be utf8, is actually set as latin1, and has data in it that would be appropriate for utf8. Its first few lines are like so
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */; /*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */ +; /*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */; /*!40101 SET NAMES utf8 */; CREATE DATABASE /*!32312 IF NOT EXISTS*/ `dbname` /*!40100 DEFAULT + CHARACTER SET latin1 */; USE `dbname`; DROP TABLE IF EXISTS `foo`; /*!40101 SET @saved_cs_client = @@character_set_client */; /*!40101 SET character_set_client = utf8 */; CREATE TABLE `foo` ( ..
Besides the above, I know nothing more about the encoding. The new Pg database is utf8, and is in my control. Ideally, I would like to set up my routine so the Pg database is all correctly encoded in utf8. In other words, whatever the MySQL encoding, I want it to end up in Pg as utf8.
There are several complicated data types in the MySQL db, so I am unable to simply dump from MySQL and restore in Pg. I am now trying to simply query one db, and insert the values in the other db using Perl DBI. I keep on getting the error of the type 'invalid byte sequence for encoding "UTF8": 0x93 at...'. Here are my steps:
- dump data from MySQL (actually, I inherit the dump, so I have no control over how the dump is actually created)
- $ iconv -f latin1 -t UTF-8 in.sql > in_utf8.sql
- $ mysql < in_utf8.sql
- $ perl sync.pl
bam! error. The `sync.pl` script is a pretty simple
query table in mysql while (my ($col1, $col2... $coln) = $sth->fetch row_array) { ## I believe this is where I should be converting the $col1..n ## values to utf8, but don't know how. insert into Pg }
I have also tried the following variation -- changed step #2 above to sed 's/latin1/utf8/g' in.sql > in_utf8.sql and then loaded that in mysql in step #3 above before trying to insert it in Pg. Still, no joy. Similar utf8 encoding error.
Suggestions?
when small people start casting long shadows, it is time to go to bed