Beefy Boxes and Bandwidth Generously Provided by pair Networks vroom
Problems? Is your data what you think it is?
 
PerlMonks  

DBI, unicode, ms sql server

by zod (Scribe)
on Feb 08, 2009 at 02:01 UTC ( #742191=perlquestion: print w/ replies, xml ) Need Help??
zod has asked for the wisdom of the Perl Monks concerning the following question:

O Monks,

When inserting unicode text (chinese in my case) into MySQL, I set up the connection like so:

my $dbh = DBI->connect( 'DBI:mysql:db_main', 'user', 'pass', { RaiseError => 1, AutoCommit => 1, mysql_enable_utf8 => 1, on_connect_do => [ "SET NAMES 'utf8'", "SET CHARACTER SET +'utf8'" ] } )
However, I haven't been able to replicate this on MS SQL Server 2008.

My connection looks like this:

my $dbh = DBI->connect('DBI:ODBC:Driver={SQL Server};Server=SQL2008;Da +tabase=db_main;UID=user;PWD=pass') or die "$DBI::errstr\n";
That allows me to connect and select from tables, etc., but when I try to insert Chinese text, it inserts garbage. So, how do you specify utf-8 for the ODBC driver? Any ideas?

Thanks

UPDATE: I resolved this. I'll spare you the gory details involved with my particular environment (chinese version of windows on one box, not another, collations, etc.), but the key turns out to be this from the SQL Server BOL:

"When referencing the Unicode character data types nchar, nvarchar, and ntext, 'expression' should be prefixed with the capital letter 'N'. If 'N' is not specified, SQL Server converts the string to the code page that corresponds to the default collation of the database or column. Any characters not found in this code page are lost."

I did not have the "N" on my insert statement in perl. So, this worked with the 'N':

my $sql = "insert test values (N'中文')";

Comment on DBI, unicode, ms sql server
Select or Download Code
Re: DBI, unicode, ms sql server
by Anonymous Monk on Feb 08, 2009 at 03:14 UTC
      Yeah, I read that already. The gist of it seems to be that unicode is enabled by default on Windows with the latest version, but that utf-8 will be converted to utf-16.

      i.e. : "perl scalars which are UTF-8 and are sent through the ODBC API will be converted to UTF-16 and passed to the ODBC wide APIs or signalled as SQL_WCHARs (e.g., in the case of bound columns)."

      I didn't see anything in the docs about how to actually get around that, however.

        How do you verify it inserts garbage? MS SQL Server 2008 doesn't support utf8, only utf16, so either DBD::ODBC does the conversion to/from incorrectly, or something else :D

        Don't inline Unicode values (in fact, don't inline ANY values) in your SQL. Always use placeholders, i.e. my $sth=$dbh->prepare("insert into XXX values (?,?)"; $sth->execute("\x{2345}\x{3333}","\x{4711}"); $sth->finish(); or $dbh->do("insert into XXX values (?,?)",undef,"\x{2345}\x{3333}","\x{4711}");. Inline values prevent SQL caching, inline values open security holes (SQL injection), inline values break Unicode.

        See also the second paragraph under "Problems" in DBD-ODBC-1.18/README.af

        Alexander Foken

Re: DBI, unicode, ms sql server
by graff (Chancellor) on Feb 08, 2009 at 04:01 UTC
    Have you ever used any alternative method of inserting data in to this database, such that you were able to confirm that the inserted data was not garbage?

    If so, this means that you also have a means for pulling data out of this database in such a way that it can be displayed correctly as Chinese text (because otherwise, you wouldn't know whether or not the data was garbage).

    And if all that is true, then it could be instructive to use your successful method of reading successful inserts, and get a hex dump of some successful output.

    Then use that same method to read the garbage, again with a hex dump. This is likely to make it clear how the data are being transformed differently in the two cases, and might also make it clear what sort of pre-conditioning in your perl script (before sending the data to the server) would fix the problem. (There are several ways that things can go wrong, and it can be hard to tell them apart without a hex dump of both good and bad data.)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://742191]
Approved by ww
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (7)
As of 2014-04-21 13:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (495 votes), past polls