http://www.perlmonks.org?node_id=847079

Wolfgang has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am lost and need new ideas, please!
I have a MySQL 5.x database.
I can read UTF-8 (umlauts like ä ö ü or Cyrillic like йцуке).
I can write umlauts etc. without problems.
I can look into the DB and check the UTF-8 codes there.
I cannot write Cyrillic: each character is replaced by 0x3f.
    my $names=$dbh->do("SET NAMES utf8");
    my $charset=$dbh->do("SET CHARACTER SET utf8");
    my $do_sql=$dbh->do($sql);

The problem is (probably) not in the DB, because I can write Cyrillic with phpMyAdmin (and read it back with my script). Pasting the same SQL statement into my script does not work, though. I am definitely passing a UTF-8 encoded string to 'do'. DBI::data_string_desc says: 'UTF8 on, non-ASCII, 99 characters 105 bytes'

What kind of silly mistake could I be making?

Wolfgang

Replies are listed 'Best First'.
Re: Can't write cyrillic to DBD::mysql
by Corion (Patriarch) on Jun 29, 2010 at 10:24 UTC

    Maybe your input data is not in UTF8 but in some other charset? Maybe you're using/sending Latin-1, which allows Umlaute?

      Definitely utf8. I checked the string with Devel::Peek, forwarded it in an email and put it back into a UTF-8 encoded web page.

        Checking with Devel::Peek will only tell you whether Perl thinks that the string is encoded as utf8. I would print the output and then pipe it through hexdump or od -x to really see what octets are output.

        But if the data really is utf8, then the culprit seems to be DBD::mysql and how it handles/accepts utf8. Are you using placeholders?
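The octet check suggested above can also be done from inside Perl, without piping through hexdump. This is only a minimal sketch: the helper name octets_hex is made up for illustration, and it assumes the input is a Perl character string (not already-encoded bytes).

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 octets of a character string
# as space-separated, lowercase hex pairs.
sub octets_hex {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);        # characters -> octets
    return join ' ', unpack('(H2)*', $bytes);   # one hex pair per octet
}

print octets_hex("\x{e4}"),  "\n";   # a-umlaut (U+00E4)           -> c3 a4
print octets_hex("\x{439}"), "\n";   # Cyrillic short i (U+0439)   -> d0 b9
```

If the octets for the Cyrillic characters look like the second line, the string leaving Perl is well-formed UTF-8 and the replacement by 0x3f happens further down, in the driver or the server.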

Re: Can't write cyrillic to DBD::mysql
by Xilman (Hermit) on Jun 29, 2010 at 10:28 UTC

    I'm not sure what is going wrong, and my experience with a DB holding non-ASCII characters is with Postgres and with Greek characters. Accordingly I can't give you answers but can recommend a few things which may help you with your debugging.

    What is the value of $sql when you execute the third command?

    Are you absolutely certain it contains UTF8 encoded Cyrillic characters?

    If you read a Cyrillic string from your DB and then write it back to another location, does it get corrupted?

    Is it only Cyrillic which is giving problems, or do other characters outside code page zero? (My guess is that they won't work either; try experimenting with Greek, Korean, or the like.)

    Good luck!

    Paul

      Hi Paul,

      The value of $sql seems fine (Devel::Peek output below).

      Test with Greek shows the same problem. String for Adresse2 is 'äö alpha beta'

      SV = PV(0xa494174) at 0x4df06fc
        REFCNT = 2
        FLAGS = (PADMY,POK,pPOK,UTF8)
        PV = 0x9ea2894 "UPDATE users SET Adresse2='\303\244\303\266 \316\261\316\262' WHERE CONVERT( `users`.`Username` USING utf8 ) ='test' LIMIT 1;"\0 [UTF8 "UPDATE users SET Adresse2='\x{e4}\x{f6} \x{3b1}\x{3b2}' WHERE CONVERT( `users`.`Username` USING utf8 ) ='test' LIMIT 1;"]
        CUR = 101
        LEN = 104

      But what does that tell us?

      Still confused
      Wolfgang
Re: Can't write cyrillic to DBD::mysql
by Krambambuli (Curate) on Jun 29, 2010 at 17:43 UTC
    After reading Corion's answer to your question, I checked DBD::mysql's documentation, and there is a paragraph that says:

           mysql_enable_utf8
               This attribute determines whether DBD::mysql should assume strings stored in the
               database are utf8.  This feature defaults to off.
    
    
    Maybe turning that one on would help.
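A rough sketch of what turning that attribute on could look like, combined with the placeholders Corion asked about. The helper name connect_utf8, the DSN and the credentials are placeholders invented for illustration; only the table and column names come from the thread. As far as I understand the DBD::mysql documentation, the attribute only affects data sent to the server when it is passed to connect(); setting it later requires an explicit SET NAMES utf8 as well.

```perl
use strict;
use warnings;

# Hypothetical helper: open a DBD::mysql connection with UTF-8 handling on.
# DBI is loaded on first call, so merely compiling this file needs no driver.
sub connect_utf8 {
    my ($dsn, $user, $password) = @_;
    require DBI;
    return DBI->connect($dsn, $user, $password, {
        RaiseError        => 1,   # die on errors instead of returning undef
        mysql_enable_utf8 => 1,   # pass at connect time so it applies to incoming data
    });
}

# Usage (placeholder DSN and credentials):
# my $dbh = connect_utf8('DBI:mysql:database=mydb;host=localhost', 'user', 'secret');
# $dbh->do('UPDATE users SET Adresse2 = ? WHERE Username = ? LIMIT 1',
#          undef, "\x{e4}\x{f6} \x{3b1}\x{3b2}", 'test');
```

Binding the values through placeholders keeps the non-ASCII data out of the SQL text itself, which removes one place where the statement string could be re-encoded.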
Re: Can't write cyrillic to DBD::mysql
by Wolfgang (Novice) on Jun 29, 2010 at 19:09 UTC
    Problem solved (or rather worked around) like this:

    According to the MySQL documentation it should be possible to use the connection parameters to override some server defaults. That did not work for me.
    Eventually I had to:
    - change the default charset of the database to utf8
    - change the default charset of the table to utf8
    - change the collation of the column to a utf8 collation
    Then it worked the way it was supposed to.
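The three steps above could be expressed as ALTER statements roughly like the ones this sketch prints. The helper name utf8_alter_statements, the database name mydb, the column type VARCHAR(255) and the collation utf8_general_ci are assumptions for illustration; the thread does not state them. Note that MODIFY restates the whole column definition, so any NOT NULL or DEFAULT clause on the real column would have to be repeated.

```perl
use strict;
use warnings;

# Hypothetical helper: build the three ALTER statements for the steps above.
# Names are interpolated as-is, so only pass trusted identifiers.
sub utf8_alter_statements {
    my ($db, $table, $column, $type) = @_;
    return (
        "ALTER DATABASE `$db` DEFAULT CHARACTER SET utf8",
        "ALTER TABLE `$table` DEFAULT CHARACTER SET utf8",
        "ALTER TABLE `$table` MODIFY `$column` $type CHARACTER SET utf8 COLLATE utf8_general_ci",
    );
}

# Print the statements for review before running them against a real database.
print "$_;\n" for utf8_alter_statements('mydb', 'users', 'Adresse2', 'VARCHAR(255)');
```

A possible explanation for why this helped, not confirmed in the thread: in MySQL, SET CHARACTER SET sets the connection character set to the database's default charset rather than to the one named. If that default was latin1, umlauts would survive the conversion while Cyrillic and Greek characters would become '?' (0x3f). Changing the database default to utf8 removes that conversion.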