PerlMonks  

UTF8 to Mysql transformed by mysql.so?

by jabowery (Beadle)
on Jun 21, 2012 at 15:13 UTC ( [id://977667]=perlquestion )

jabowery has asked for the wisdom of the Perl Monks concerning the following question:

DBD::mysql is transforming valid utf8 into gibberish(?) on the way through mysql.so:
746572e280a62ee280... becomes 746572c3a2c280c2a6...

UPDATE

mysql.so is doing a gratuitous utf8::encode($s) on strings for which utf8::is_utf8($s) is true (that is, the UTF8 flag is set). I was able to compensate by running this just before the call to mysql's execute:

    # Try to decode, in place, every value that has the UTF8 flag set.
    for (keys %$row) { utf8::decode($row->{$_}) if utf8::is_utf8($row->{$_}) }
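
For anyone trying to see what that loop does, here is a minimal sketch using only core utf8:: functions. It rests on an assumption the post does not confirm: that the values are UTF-8 octets which were upgraded, so the flag is on while each octet is still its own character. The row and its contents are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical row: the UTF-8 octets for "ter", U+2026 (ellipsis), "."
my $row = { content => "ter\xe2\x80\xa6." };

# ASSUMPTION: the value was upgraded, so the UTF8 flag is set even
# though each octet is still a separate character.
utf8::upgrade($row->{content});

my $before = length $row->{content};    # 7 characters: t e r E2 80 A6 .

# The workaround from the post: decode flagged values in place.
for (keys %$row) { utf8::decode($row->{$_}) if utf8::is_utf8($row->{$_}) }

my $after = length $row->{content};     # 5 characters: t e r U+2026 .

print "$before -> $after\n";            # 7 -> 5
```

If the value already held a real wide character such as U+2026, utf8::decode would return false and leave it untouched, so the loop should be harmless for strings that were decoded properly.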

END UPDATE

Neither "use open ':utf8';" nor "use open ':encoding(UTF-8)';" changed the bogus behavior.

A table, the dump for which starts:

    CREATE TABLE `host_MyApp_DUFs` (
      `DUF_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
      `member_id` bigint(20) unsigned NOT NULL COMMENT '',
      `install_id` tinyint(1) unsigned NOT NULL COMMENT '',
      `job_id` int(12) unsigned NOT NULL DEFAULT '0' COMMENT '',
      `content` longtext COLLATE utf8_unicode_ci COMMENT '',
      `token_id` bigint(20) unsigned DEFAULT NULL COMMENT 'links to `host_MyApp_tokens`',
      PRIMARY KEY (`member_id`,`install_id`,`job_id`,`DUF_id`),
      UNIQUE KEY `token_id` (`token_id`),
      KEY `DUF_id` (`DUF_id`)
    ) ENGINE=InnoDB AUTO_INCREMENT=406 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci COMMENT='';
    /*!40101 SET character_set_client = @saved_cs_client */;

is having its 'content' field populated with a valid UTF-8 string (as verified with Test::utf8's "is_sane_utf8" and "is_flagged_utf8"), passed as the bind input to the "execute" method.

The connection was opened with:

    my $dbix = DBIx::Lite->connect(
        "dbi:mysql:dbname=$ENV{DB_NAME}",
        $ENV{DB_USER},
        $ENV{DB_PASSWORD},
        { mysql_enable_utf8 => 1 },
    );

And strace verifies the "SET NAMES utf8" command is traversing the socket from the client to the server. Moreover the query:

show VARIABLES LIKE 'character_set%';

results in:

    DB<6> x $st->fetchrow
    0  'character_set_client'
    1  'utf8'
    DB<7> x $st->fetchrow
    0  'character_set_connection'
    1  'utf8'
    DB<8> x $st->fetchrow
    0  'character_set_database'
    1  'utf8'
    DB<9> x $st->fetchrow
    0  'character_set_filesystem'
    1  'utf8'
    DB<10> x $st->fetchrow
    0  'character_set_results'
    1  'utf8'
    DB<11> x $st->fetchrow
    0  'character_set_server'
    1  'utf8'
    DB<12> x $st->fetchrow
    0  'character_set_system'
    1  'utf8'
    DB<13> x $st->fetchrow
    0  'character_sets_dir'
    1  '/usr/share/mysql/charsets/'

However, strace of the data going from the client to the server shows an octet string that has everything intact (i.e., 'INSERT INTO ...', the regular ASCII data for content, etc.) except the multi-octet UTF-8 characters, which have been transformed. An example is:

746572e280a62ee280... becomes 746572c3a2c280c2a6...
Is it time to go to uuencode or carrier pigeon with the elder futhark or something?
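
For what it's worth, the byte pattern quoted above looks like what a second UTF-8 encoding pass produces: e2 80 a6 is the UTF-8 form of U+2026 (horizontal ellipsis), and encoding each of those three octets again as if it were a Latin-1 character gives c3 a2, c2 80, c2 a6. The sketch below reproduces only the first seven octets of the example (the rest is elided in the post), uses only core utf8:: functions, and does not involve DBD::mysql at all; hex_of is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase hex dump of a byte string.
sub hex_of { return unpack 'H*', $_[0] }

# "ter", U+2026 HORIZONTAL ELLIPSIS, "." as a character string.
my $chars = "ter\x{2026}.";

# One encoding pass: the octets that should go over the wire.
my $once = $chars;
utf8::encode($once);

# A second pass treats every octet as a character and encodes it again.
my $twice = $once;
utf8::encode($twice);

print hex_of($once),  "\n";    # 746572e280a62e
print hex_of($twice), "\n";    # 746572c3a2c280c2a62e
```

This only shows that the observed bytes are consistent with double encoding; it says nothing about where in the stack the second pass happens.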

Replies are listed 'Best First'.
Re: UTF8 to Mysql transformed by mysql.so?
by zentara (Archbishop) on Jun 21, 2012 at 16:12 UTC
      Thanks, but no. The above example demonstrates that conforming to the Perl -> MySQL UTF-8 pattern described in that document does not work in this case. Something else is going on.

        You don't show the code you're using to read the data and write it to the DB. You also don't show the MySQL versions (client library, server version, DBD::mysql) involved. There is a bug report for DBD::mysql that includes code to replicate the problem. Maybe that bug is present in your environment as well?
