Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine

DBI, unicode, ms sql server

by zod (Scribe)
on Feb 08, 2009 at 02:01 UTC ( #742191=perlquestion: print w/replies, xml ) Need Help??
zod has asked for the wisdom of the Perl Monks concerning the following question:

O Monks,

When inserting unicode text (chinese in my case) into MySQL, I set up the connection like so:

my $dbh = DBI->connect( 'DBI:mysql:db_main', 'user', 'pass', { RaiseError => 1, AutoCommit => 1, mysql_enable_utf8 => 1, on_connect_do => [ "SET NAMES 'utf8'", "SET CHARACTER SET +'utf8'" ] } )
However, I haven't been able to replicate this on MS SQL Server 2008.

My connection looks like this:

my $dbh = DBI->connect('DBI:ODBC:Driver={SQL Server};Server=SQL2008;Da +tabase=db_main;UID=user;PWD=pass') or die "$DBI::errstr\n";
That allows me to connect and select from tables, etc., but when I try to insert Chinese text, it inserts garbage. So, how do you specify utf-8 for the ODBC driver? Any ideas?


UPDATE: I resolved this. I'll spare you the gory details involved with my particular environment (chinese version of windows on one box, not another, collations, etc.), but the key turns out to be this from the SQL Server BOL:

"When referencing the Unicode character data types nchar, nvarchar, and ntext, 'expression' should be prefixed with the capital letter 'N'. If 'N' is not specified, SQL Server converts the string to the code page that corresponds to the default collation of the database or column. Any characters not found in this code page are lost."

I did not have the "N" on my insert statement in perl. So, this worked with the 'N':

my $sql = "insert test values (N'中文')";

Replies are listed 'Best First'.
Re: DBI, unicode, ms sql server
by graff (Chancellor) on Feb 08, 2009 at 04:01 UTC
    Have you ever used any alternative method of inserting data in to this database, such that you were able to confirm that the inserted data was not garbage?

    If so, this means that you also have a means for pulling data out of this database in such a way that it can be displayed correctly as Chinese text (because otherwise, you wouldn't know whether or not the data was garbage).

    And if all that is true, then it could be instructive to use your successful method of reading successful inserts, and get a hex dump of some successful output.

    Then use that same method to read the garbage, again with a hex dump. This is likely to make it clear how the data are being transformed differently in the two cases, and might also make it clear what sort of pre-conditioning in your perl script (before sending the data to the server) would fix the problem. (There are several ways that things can go wrong, and it can be hard to tell them apart without a hex dump of both good and bad data.)

Re: DBI, unicode, ms sql server
by Anonymous Monk on Feb 08, 2009 at 03:14 UTC
      Yeah, I read that already. The gist of it seems to be that unicode is enabled by default on Windows with the latest version, but that utf-8 will be converted to utf-16.

      i.e. : "perl scalars which are UTF-8 and are sent through the ODBC API will be converted to UTF-16 and passed to the ODBC wide APIs or signalled as SQL_WCHARs (e.g., in the case of bound columns)."

      I didn't see anything in the docs about how to actually get around that, however.

        Don't inline Unicode values (in fact, don't inline ANY values) in your SQL. Always use placeholders, i.e. my $sth=$dbh->prepare("insert into XXX values (?,?)"; $sth->execute("\x{2345}\x{3333}","\x{4711}"); $sth->finish(); or $dbh->do("insert into XXX values (?,?)",undef,"\x{2345}\x{3333}","\x{4711}");. Inline values prevent SQL caching, inline values open security holes (SQL injection), inline values break Unicode.

        See also the second paragraph under "Problems" in DBD-ODBC-1.18/

        Alexander Foken

        How do you verify it inserts garbage? MS SQL Server 2008 doesn't support utf8, only utf16, so either DBD::ODBC does the conversion to/from incorrectly, or something else :D

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://742191]
Approved by ww
shmem missed his monkday yesterday :P - 12 years and way to go...
[1nickt]: Happy Monkday shmem. I was at my stepson's birthday party and thought of you.
[virtualsue]: holli about half
[choroba]: "Tatsächlich, das ist die Knochenstruktur eines Bettlers!" ??
shmem bows. Thank you
[marto]: slowclap.gif
[marto]: animated gif in question...
[1nickt]: Interesting article on the Coming Software Apocaplyse that I found while following one of eyepopslikeamosqui to's posts. Posits that no one can guarantee the quality of, or even understand, the 100 million lines of code in a mo
[holli]: you know, monk days are fun and all. but too many of them are bad for your health
[1nickt]: ...modern car. Suggests using TLA+ or other software modeling language to autogenerate code.

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (12)
As of 2017-11-21 12:56 GMT
Find Nodes?
    Voting Booth?
    In order to be able to say "I know Perl", you must have:

    Results (301 votes). Check out past polls.