http://www.perlmonks.org?node_id=1094072

nihiliath has asked for the wisdom of the Perl Monks concerning the following question:

Hello, Monks

I have this problem:

I have a script that fetches data from a PostgreSQL database.

The database server encoding is UTF-8, and the script is in cp1251 (windows-1251). After connecting to the database (using DBI), I set the client encoding to WIN1251 to tell the database that I want the output in this encoding. Then I execute the SQL SELECT query and fetch the data into a hash reference. When I try to dump the data to the console (or use it in any other way), I get this warning/error:

Malformed UTF-8 character (unexpected non-continuation byte 0x2e, immediately after start byte 0xf2) in subroutine entry at /usr/local/lib/perl5/5.16/mach/Data/Dumper.pm line 205.
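The two bytes named in this warning can be checked in isolation. The sketch below uses only the core Encode module and shows that 0xF2 0x2E is rejected by a strict UTF-8 decoder but decodes as CP1251 to U+0442 (Cyrillic small letter te) followed by a full stop. That fits the idea that WIN1251 bytes are being read as UTF-8 somewhere along the way; it is an illustration, not a confirmed diagnosis.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# The two bytes quoted in the warning: start byte 0xF2, then 0x2E.
my $bytes = "\xF2\x2E";

# Strict UTF-8 decoding: FB_CROAK makes decode() die on malformed input,
# LEAVE_SRC keeps decode() from consuming $bytes in place.
my $utf8_ok = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;

# The same bytes read as CP1251.
my $as_cp1251 = decode('cp1251', $bytes);

printf "strict UTF-8 decode succeeded: %d\n", $utf8_ok;
printf "CP1251 code points: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $as_cp1251;
```

Running it prints `strict UTF-8 decode succeeded: 0` followed by `CP1251 code points: U+0442 U+002E`.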

The reason we use windows-1251 as the code encoding is that it's legacy code and we have data in Cyrillic. The database is in UTF-8 because we hope to rewrite the code in UTF-8 in the future...
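Since the database itself is UTF-8, one alternative for legacy CP1251 code is to leave client_encoding at UTF8 and convert explicitly at the boundary. A minimal sketch with the core Encode module, assuming the driver hands back undecoded UTF-8 bytes (with DBD::Pg this depends on its pg_enable_utf8 setting); the sample value is made up and only stands in for a fetched column.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for a value fetched with client_encoding UTF8
# and returned as raw UTF-8 bytes (no Perl UTF-8 flag).
my $from_db = encode('UTF-8', "\x{0421}\x{043E}\x{0444}\x{0438}\x{044F}");

# 1. Decode the UTF-8 bytes into Perl characters.
my $text = decode('UTF-8', $from_db);

# 2. Re-encode as CP1251 for the legacy code and the bg_BG.CP1251 console.
my $for_console = encode('cp1251', $text);

printf "UTF-8 bytes: %d, characters: %d, CP1251 bytes: %d\n",
    length $from_db, length $text, length $for_console;
```

This prints `UTF-8 bytes: 10, characters: 5, CP1251 bytes: 5`: each Cyrillic letter takes two bytes in UTF-8 but one in CP1251, so the conversion has to happen exactly once, at the point where the data enters or leaves the script.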

Can someone tell me why this is happening and how to avoid it?!

I'm using:

FreeBSD 9.2-STABLE #2 r265059
Perl v5.16.3
PostgreSQL 9.1.13
DBI v1.631
DBD::Pg v3.0.0

My locale settings:

LANG=bg_BG.CP1251
LC_CTYPE="bg_BG.CP1251"
LC_COLLATE="bg_BG.CP1251"
LC_TIME="bg_BG.CP1251"
LC_NUMERIC=C
LC_MONETARY="bg_BG.CP1251"
LC_MESSAGES="bg_BG.CP1251"
LC_ALL=

My database locale:

     Name      | Owner | Encoding | Collate | Ctype |
---------------+-------+----------+---------+-------+
 database_name | yavor | UTF8     | C       | C     |

My script:

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use DBI;

#connection settings
my $s_db_name = 'database_name';
my $s_db_user = 'user';
my $s_db_auth = 'pass';

#hash ref for data fetching
my $href_row = {};

#connection with transaction
my $oref_dbh = DBI->connect('dbi:Pg:dbname=' . $s_db_name, $s_db_user,
    $s_db_auth, {AutoCommit => 0, RaiseError => 1});

#getting the default client encoding
my $oref_sth = $oref_dbh->prepare("SHOW client_encoding;");
$oref_sth->execute;
$href_row = $oref_sth->fetchrow_hashref();
print "\n Default Client Encoding: " . Dumper($href_row); #UTF8

#setting client encoding to be windows-1251
$oref_sth = $oref_dbh->prepare("SET client_encoding = 'WIN1251';");
$oref_sth->execute;

#getting the current client encoding
$oref_sth = $oref_dbh->prepare("SHOW client_encoding;");
$oref_sth->execute;
$href_row = $oref_sth->fetchrow_hashref();
print "\n Current Client Encoding: " . Dumper($href_row); #WIN1251

#preparing the SELECT query
$oref_sth = $oref_dbh->prepare("
    SELECT id, name_en, host, descr_bg, descr_en
    FROM application;");

#executing it
$oref_sth->execute;

#fetching the data
$href_row = $oref_sth->fetchall_hashref('id');

#rollback the current transaction
$oref_dbh->rollback();

#dumping the fetched data using Data::Dumper
print "\n Data: " . Dumper($href_row);

Result from execution:

Default Client Encoding: $VAR1 = {
          'client_encoding' => 'UTF8'
        };

Current Client Encoding: $VAR1 = {
          'client_encoding' => 'WIN1251'
        };
Malformed UTF-8 character (unexpected non-continuation byte 0xee, immediately after start byte 0xd1) in subroutine entry at /usr/local/lib/perl5/5.16/mach/Data/Dumper.pm line 205.
...
Malformed UTF-8 character (1 byte, need 4, after start byte 0xf2) in subroutine entry at /usr/local/lib/perl5/5.16/mach/Data/Dumper.pm line 205.
...

Data: $VAR1 = {
          '<proper output>' => {
                                 'descr_en' => '',
                                 'descr_bg' => '<proper output in the console in Cyrillic>',
                                 'id' => <proper output>,
                                 'name_en' => '<proper output>',
                                 'host' => '<proper output>'
                               },
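One way to narrow this down is to inspect the fetched values directly instead of through Data::Dumper. Assumption, not confirmed here and worth checking against the DBD::Pg 3.x documentation: the driver's pg_enable_utf8 attribute (default -1) decides at connect time whether to flag returned strings as UTF-8, based on client_encoding being UTF8 at that moment. The output above shows UTF8 at connect time, so WIN1251 bytes fetched later could end up flagged as UTF-8. The helper report_flags below is hypothetical; it uses the core utf8::is_utf8 and utf8::valid functions on a hash shaped like the fetchall_hashref('id') result. The sample row is made up and simulates a mislabelled value with Encode::_utf8_on, which is an internal Encode function used here only for the demonstration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: walks a hash of rows like the one returned by
# fetchall_hashref('id') and reports, per column, whether the value
# carries Perl's UTF-8 flag and whether its internal bytes are well-formed.
sub report_flags {
    my ($rows) = @_;
    my @report;
    for my $id (sort keys %$rows) {
        my $row = $rows->{$id};
        for my $col (sort keys %$row) {
            my $value = $row->{$col};
            next unless defined $value;
            push @report, sprintf '%s.%s flag=%d valid=%d',
                $id, $col,
                utf8::is_utf8($value) ? 1 : 0,
                utf8::valid($value)   ? 1 : 0;
        }
    }
    return @report;
}

# Made-up sample data: CP1251 bytes wrongly marked as UTF-8 text,
# simulating the suspected driver behaviour.
my $bad = encode('cp1251', "\x{0421}\x{043E}\x{0444}\x{0438}\x{044F}");
Encode::_utf8_on($bad);
my %rows = (1 => { id => 1, descr_bg => $bad, name_en => 'demo' });

my @report = report_flags(\%rows);
print "$_\n" for @report;
```

For the sample row this prints `1.descr_bg flag=1 valid=0`, `1.id flag=0 valid=1` and `1.name_en flag=0 valid=1`. If the real descr_bg values show the same `flag=1 valid=0` pattern, the pg_enable_utf8 section of the DBD::Pg documentation is the place to check whether setting it to 0 in the connect attributes fits this setup.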