Beefy Boxes and Bandwidth Generously Provided by pair Networks
Do you know where your variables are?

I'm having a lot of trouble getting UTF-8 output via Perl/DBI on OS X

by Cody Fendant (Friar)
on Sep 22, 2012 at 05:53 UTC ( #995048=perlquestion: print w/replies, xml ) Need Help??
Cody Fendant has asked for the wisdom of the Perl Monks concerning the following question:


  • Perl v5.10.1 (*) built for darwin-2level
  • DBI v 1.622
  • MySQL Server v 5.1.42
  • OS X Terminal App
  • GNU bash, version 4.0.35(2)-release (i386-apple-darwin10.8.0)
  • BBEdit 9.6.3

MySQL table:

CREATE TABLE `quotes` ( `text` varchar(255) NOT NULL ) ENGINE=MyISAM DEFAULT CHARSET=utf8

Insert one row with some smart quote characters in it

INSERT INTO `test`.`quotes` ( `text` ) VALUES ( '“” ‘’' );

Command-line MySQL to check the characters got inserted correctly (having connected with --default-character-set=utf8 flag):

mysql> select * from quotes where 1; +----------------+ | text | +----------------+ | “” ‘’ | +----------------+ mysql> select hex(text) from quotes where 1; +----------------------------------+ | hex(text) | +----------------------------------+ | E2809C E2809D 2020 E28098 E28099 | +----------------------------------+ ## just so we know exactly what bytes are in there, my spacing

Minimal perl script to get that row:

#!/usr/local/bin/perl use utf8; use strict; use DBI; my $dbh = DBI->connect( 'DBI:mysql:test:localhost', 'root', 'xxxxxxxx' ) or die("couldn't connect to database"); my $text_display = $dbh->prepare('select * from quotes where 1'); $text_display->execute(); my ($text) = $text_display->fetchrow_array(); print "Text: $text" . $/;

I have my OS X Terminal set to UTF-8 encoding. I have a UTF-8 aware version of bash (I can create files with those quotes in the name and successfully ls and display them.

But when I run the script above from the terminal, it outputs this:

Text: ?? ??

And when I run it via BBEdit's very handy Run command for Unix scripts, I get this:

Text: ìî ëí

Any assistance you can give me will be very gratefully received. Something is obviously going wrong between the database and the output hitting the screen, but everything in the chain is supposed to be UTF-8 aware.

What should I try?

Replies are listed 'Best First'.
Re: I'm having a lot of trouble getting UTF-8 output via Perl/DBI on OS X
by remiah (Hermit) on Sep 22, 2012 at 06:48 UTC


    Redirect it's output to file and see it with Browser. And set encoding of your browser to UTF-8. And What do you see?

    I see nothing wrong with your perl script, get UTF-8 encoded bytes from db and print encoded bytes...What is this?


    Sorry. maybe I was looking for wrong direction.
    If you do 'set Names', how does it look like?

    $dbh->do("SET NAMES 'utf8'");

      $dbh->do("SET NAMES 'utf8'");

      That works! Thank you!

      So, what's going on? It's actually MySQL auto-converting from UTF-8 when I never asked it to?

        Cody Fendant:

        I just looked at the DBD::mysql documentation, and there's a setting mysql_enable_utf8 that defaults to off. I've not used DBD::mysql, but based on the docs, it looks like you can just add the setting in the connect statement like this:

        $dbh= DBI->connect("DBI:mysql:test;mysql_enable_utf8=1", "root", "");

        ++ for the OP: clear, detailed and didn't seem to miss any pertinent information.


        When your only tool is a hammer, all problems look like your thumb.

        Hello, Cody Fendant,roboticus.

        PostgreSQL or MySQL seems to have encoding for server storage, and for client encoding. So, I guess 'select hex(field) from table' shows bytes for server storage and when you receive the value at the client, it is converted to encoding of client.

        "mysql_enable_utf8=1" and "SET NAMES 'utf8'" sets encoding for client. I saw both of them gives me good result with DBD::mysql version "4.008".

        In fact, I am not the man of MySQL... Please point me if I say wrong.

        P.S: With my DBD::mysql version, the result is utf-8 bytes, not Perl's character. utf8::is_utf8 will tell you whether it is decoded character or bytes.

      Good idea. Outputting to a file gives me quotes which look fine in Latin (ISO-8859-1) and question marks in UTF-8. The bytes are 93 94 20 20 91 92.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://995048]
Approved by kcott
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (8)
As of 2017-08-22 17:34 GMT
Find Nodes?
    Voting Booth?
    Who is your favorite scientist and why?

    Results (337 votes). Check out past polls.