Beefy Boxes and Bandwidth Generously Provided by pair Networks vroom
Syntactic Confectionery Delight
 
PerlMonks  

Re^2: Perl DBI can't display Chinese text?

by Thai Heng (Sexton)
on Oct 15, 2013 at 01:25 UTC ( #1058221=note: print w/ replies, xml ) Need Help??


in reply to Re: Perl DBI can't display Chinese text?
in thread Perl DBI can't display Chinese text?

thanks! The article help me solve this problem as follows:

while(my @line = $sth->fetchrow_array()){ my ($name,$deptcode) = @line; $name = encode("MacChineseSimp", $name); print "one line is ------$name,\t$deptcode\n"; }


Comment on Re^2: Perl DBI can't display Chinese text?
Download Code
Re^3: Perl DBI can't display Chinese text?
by mje (Deacon) on Oct 16, 2013 at 08:17 UTC

    You shouldn't have to do that call to encode with DBD::ODBC. Your trace shows the column was retrieved as SQL_WCHARs and DBD::ODBC should have encoded the data correctly for you (believe me, I maintain DBD::ODBC). What versions of DBI and DBD::ODBC are you using?

    Could you provide a small simple example which creates a table like yours, inserts the chinese data and reads it back and I will take a look at it.

Re^3: Perl DBI can't display Chinese text?
by mje (Deacon) on Oct 18, 2013 at 16:24 UTC

    I've now had a chance to look at this more as I was a bit surprised you had to do this.

    The main issue is you are using verchar columns and not nvarchar columns and you did not tell us how the data got into the database in the first place. Let's assume the data was inserted into the database with DBD::ODBC in the first place and see what happens

    use strict; use warnings; use DBI qw(:utils :sql_types); use 5.016; use Data::Dumper; use Encode; my $h = DBI->connect() or die $DBI::errstr; $h->{RaiseError} = 1; eval { $h->do(q/drop table chinese/); }; $h->do(q/create table chinese (n varchar(30), d varchar(12))/); my $str = "\x{20020}\x{24202}\x{31185}\x{23460}"; my $encoded = encode('UTF-8', $str); foreach my $s (split(//, $encoded)) { print sprintf("%x,", ord($s)); } say ""; say data_string_desc($str); my $s = $h->prepare(q/insert into chinese values(?,?)/); $s->execute($str, "111"); $s = $h->prepare(q/select cast(n as varbinary) from chinese/); $s->execute; my $r = $s->fetchall_arrayref; foreach my $s (split(//, $r->[0][0])) { print sprintf("%x,", ord($s)); } say ""; say data_string_desc($r->[0][0]); my $d = decode('UTF-8', $r->[0][0]); say data_string_desc($d); print Dumper(\$d);

    which outputs:

    f0,a0,80,a0,f0,a4,88,82,f0,b1,86,85,f0,a3,91,a0, UTF8 on, non-ASCII, 4 characters 16 bytes f0,a0,80,a0,f0,a4,88,82,f0,b1,86,85,f0,a3,91,a0, UTF8 off, non-ASCII, 16 characters 16 bytes UTF8 on, non-ASCII, 4 characters 16 bytes $VAR1 = \"\x{20020}\x{24202}\x{31185}\x{23460}";

    When I look under the hood at what ODBC APIs DBD::ODBC is calling I see DBD::ODBC took the 4 chinese characters that are UTF-8 encoded in Perl (see first lines of output) and it converted them to UTF16 and bound them as SQL_WCHARs. Note, varchar columns are not really designed to store unicode data. If we convert what SQL Server thinks it has in the column to varbinary and read it back we miracoulously get the chinese string UTF-8 encoded back (the 3rd/4th lines of output). Unsurprisingly we can decode that UTF-8 back into a string in Perl (the last 2 lines of output). In other words, SQL Server does not know that is UTF-8 encoded data and length functions and collation will not work.

    If we rewrite the select code to a simple "select * from chinese" we seem to get back rubbish. What has happened here is that DBD::ODBC bound what it believed to be char data as SQL_WCHARs, SQL Server passed back UTF16 encoded representation of the 16 bytes and DBD::ODBC decoded it to UTF-8 and hence it is now double encoded UTF-8.

    If we had changed the code above to bind the input data as SQL_WCHARs SQL Server would have just put a load of question marks in the column as it cannot do what you want.

    The correct way to do this with SQL Server is to make the column nvarchar and then everything will just work (except collations when characters don't fit into UCS-2 until SQL Server 2012 - see Inserting unicode characters > 0xFFFF (surrogate pairs) into MS SQL Server with Perl DBD::ODBC for why).

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1058221]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others wandering the Monastery: (6)
As of 2014-04-21 11:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (493 votes), past polls