[SOLVED] same utf8 string is different in console and in browser (Sybase)

by alexander_lunev (Pilgrim)
on Aug 19, 2017 at 14:45 UTC ( [id://1197659]=perlquestion )

alexander_lunev has asked for the wisdom of the Perl Monks concerning the following question:

Hello, monks! I'm seeking your wisdom!

I'm getting strings from various SQL servers via DBI, and they are supposed to be UTF-8 Russian strings. Strings from PgSQL are fine, but strings from Sybase look OK in the console and turn into ?????? when printed to the browser via CGI.

The core of the program is this:

    my $dbh = DBI->connect($db{$dsn}{dsn}, $db{$dsn}{user}, $db{$dsn}{password}, $db{$dsn}{opts});
    if (!defined($dbh)) {
        print "Error creating dbh: " . $DBI::errstr . "\n";
        exit;
    }
    my $sth;
    $sth = $dbh->prepare($query);
    if (!$sth) {
        print "Error: " . $dbh->errstr . "\n";
        exit;
    }
    if (!$sth->execute) {
        print "Error: " . $sth->errstr . "\n";
        exit;
    }
    print "Content-Type: text/html; charset=utf-8\n\n";
    my $ref = $sth->fetchrow_arrayref;
    my $str = $$ref[0];
    print $db{$dsn}{driver}." ".$str ." > ".join(" ",map {sprintf("0x%X",$_)} unpack("C*",$str))."\n";

In the console, all strings and bytes are the same:

    # ./sql_test
    Content-Type: text/html; charset=utf-8

    Sybase школы#Касса > 0xD1 0x88 0xD0 0xBA 0xD0 0xBE 0xD0 0xBB 0xD1 0x8B 0x23 0xD0 0x9A 0xD0 0xB0 0xD1 0x81 0xD1 0x81 0xD0 0xB0
    # ./sql_test
    Content-Type: text/html; charset=utf-8

    Pg школы#Касса > 0xD1 0x88 0xD0 0xBA 0xD0 0xBE 0xD0 0xBB 0xD1 0x8B 0x23 0xD0 0x9A 0xD0 0xB0 0xD1 0x81 0xD1 0x81 0xD0 0xB0

But in the browser:

    Sybase ?????#????? > 0x3F 0x3F 0x3F 0x3F 0x3F 0x23 0x3F 0x3F 0x3F 0x3F 0x3F
    Pg школы#Касса > 0xD1 0x88 0xD0 0xBA 0xD0 0xBE 0xD0 0xBB 0xD1 0x8B 0x23 0xD0 0x9A 0xD0 0xB0 0xD1 0x81 0xD1 0x81 0xD0 0xB0

So it's not just a browser glitch with the encoding: the bytes of $str themselves are different!

WHY?

UPD: The solution is here: 1197669.

Replies are listed 'Best First'.
Re: same utf8 string is different in console and in browser (Sybase)
by choroba (Cardinal) on Aug 19, 2017 at 17:46 UTC
    This can happen when one of the strings is utf-8, while the other one is bytes.

    Compare:

    #! /usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };
    use utf8;
     
    my @strings = ('школы',
                   "\xd1\x88\xd0\xba\xd0\xbe\xd0\xbb\xd1\x8b");
     
    say for @strings;
     
    binmode STDOUT, ':encoding(UTF-8)';
    say for @strings;
    

    The first two lines are the same:

    школы
    школы
    

    but once the output knows it expects UTF-8 (the third and fourth line), the output is different:

    школы
    ΡΠΊΠΎΠ»Ρ
    

    You probably didn't tell Sybase your strings are UTF-8.
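
    One way to make the two strings above behave the same is to decode the raw bytes into characters before they reach the encoding layer. This is a minimal sketch using the core Encode module; the variable names are illustrative and not from the original post.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;                 # the literal below is read as a character string
    use Encode qw(decode);

    my $chars = 'школы';                                      # 5 characters
    my $bytes = "\xd1\x88\xd0\xba\xd0\xbe\xd0\xbb\xd1\x8b";   # 10 UTF-8 bytes

    # Turn the raw UTF-8 bytes into a character string.
    my $decoded = decode('UTF-8', $bytes);

    # With the layer in place, character strings are encoded exactly once.
    binmode STDOUT, ':encoding(UTF-8)';
    printf "%d %d %d\n", length $chars, length $bytes, length $decoded;
    print $decoded eq $chars ? "same\n" : "different\n";
    ```

    After decoding, both strings have 5 characters and compare equal, so the output layer encodes each of them once instead of encoding the byte string a second time.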

    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,

      But you were right! I changed the way I connect to MSSQL: when I connect through a server entry in freetds.conf, with the DSN "DBI:Sybase:server=server_name;database=database_name", the strings are output to the browser as they should be! Before that I had tried syb_enable_utf8 => 1 and putting charset=utf8 in the DSN, with no effect. Now I have added the server to freetds.conf together with client charset = UTF-8, and it works.

      Though I still don't get why the very bytes of $str turn out different just because the script is called via CGI...
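
      For readers who want to try the same fix, a freetds.conf entry of the kind described might look like the fragment below. Only the section name server_name and the client charset line come from the post; the host, port and tds version values are placeholders assumed for illustration.

      ```
      # freetds.conf (sketch; host, port and tds version are assumed values)
      [server_name]
          host = mssql.example.com
          port = 1433
          tds version = 7.3
          client charset = UTF-8
      ```

      The DSN then refers to the section by name, as in DBI:Sybase:server=server_name;database=database_name.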

        So there is something hidden there. Different environment settings for the command line and the web server?

        perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'
      I don't think that's the case, because if I add binmode STDOUT, ':encoding(UTF-8)', the output in the console is the same for Pg and Sybase:
      Sybase Ρ�ΠΊΠΎΠ»Ρ�#Π�Π°Ρ�Ρ�Π° > 0xD1 0x88 0xD0 0xBA 0xD0 0xBE 0xD0 0xBB 0xD1 0x8B 0x23 0xD0 0x9A 0xD0 0xB0 0xD1 0x81 0xD1 0x81 0xD0 0xB0
      Pg Ρ�ΠΊΠΎΠ»Ρ�#Π�Π°Ρ�Ρ�Π° > 0xD1 0x88 0xD0 0xBA 0xD0 0xBE 0xD0 0xBB 0xD1 0x8B 0x23 0xD0 0x9A 0xD0 0xB0 0xD1 0x81 0xD1 0x81 0xD0 0xB0

      But in the browser they differ again:

      Sybase ?????#????? > 0x3F 0x3F 0x3F 0x3F 0x3F 0x23 0x3F 0x3F 0x3F 0x3F 0x3F
      Pg ΡˆΠΊΠΎΠ»Ρ‹#ΠšΠ°ΡΡΠ° > 0xD1 0x88 0xD0 0xBA 0xD0 0xBE 0xD0 0xBB 0xD1 0x8B 0x23 0xD0 0x9A 0xD0 0xB0 0xD1 0x81 0xD1 0x81 0xD0 0xB0
Re: same utf8 string is different in console and in browser (Sybase)
by shmem (Chancellor) on Aug 19, 2017 at 17:56 UTC

    What web server are you using? Is it configured to deliver UTF-8? Try adding the line

    <meta charset="UTF-8" />

    to the <head> block of the generated html. Also, it should be Content-Type: text/html; charset=UTF-8 (UTF-8 uppercase).

    perl -le'print map{pack c,($-++?1:13)+ord}split//,ESEL'

      The server is nginx + fastcgi; it is configured for UTF-8, and the browser sees the page as UTF-8. All my other scripts work fine with UTF-8; the problem is only with strings that come from Sybase. The solution is here: 1197669.

Re: same utf8 string is different in console and in browser (Sybase)
by poj (Abbot) on Aug 19, 2017 at 18:10 UTC

    Try this test script in the browser and on the command line:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use CGI ':standard';

    my @lines = qx(locale);

    # html
    print header(),start_html('Locale Test');
    print pre(@lines);
    print end_html();
    poj

      In the browser it's:

      LANG=
      LC_CTYPE="C"
      LC_COLLATE="C"
      LC_TIME="C"
      LC_NUMERIC="C"
      LC_MONETARY="C"
      LC_MESSAGES="C"
      LC_ALL=

      In the console:

      LANG=ru_RU.UTF-8
      LC_CTYPE="ru_RU.UTF-8"
      LC_COLLATE="ru_RU.UTF-8"
      LC_TIME="ru_RU.UTF-8"
      LC_NUMERIC="ru_RU.UTF-8"
      LC_MONETARY="ru_RU.UTF-8"
      LC_MESSAGES="ru_RU.UTF-8"
      LC_ALL=

      The problem is solved here: 1197669, although I don't understand why the strings are different in the console and under CGI.
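
      The thread leaves the "why" open. One plausible reading, which is an assumption and not confirmed anywhere in the thread: under CGI the process runs with LC_CTYPE="C", and the client library converts the server's data into a charset without Cyrillic before Perl ever sees it. The browser output fits that pattern, since it shows exactly one 0x3F per original character (5, then #, then 5). The sketch below only simulates such a lossy conversion with the core Encode module, using ascii as a stand-in for whatever charset was actually used; it does not talk to a database.

      ```perl
      #!/usr/bin/perl
      use strict;
      use warnings;
      use utf8;                 # the literal below is read as a character string
      use Encode qw(encode);

      my $text = 'школы#Касса';

      # Correct path: characters encoded as UTF-8 bytes.
      my $utf8 = encode('UTF-8', $text);

      # Lossy path (stand-in): a charset with no Cyrillic. By default Encode
      # substitutes "?" for every character it cannot represent.
      my $lossy = encode('ascii', $text);

      for my $bytes ($utf8, $lossy) {
          print join(' ', map { sprintf '0x%X', $_ } unpack 'C*', $bytes), "\n";
      }
      ```

      The first line printed matches the byte dump from the console run, and the second matches the run of 0x3F bytes seen in the browser. That is consistent with the data being converted, one character at a time, somewhere between the server and $str rather than being mangled at output time.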
