Hello, Monks
I have this problem:
I have a script that fetches data from a PostgreSQL database.
The database server encoding is UTF-8, and the script source is in cp1251 (windows-1251). After connecting to the database (using DBI), I set the client encoding to WIN1251 to tell the database that I want the output in this encoding. Then I execute a SELECT query and fetch the data into a hash reference. When I try to dump the data to the console (or use it in any other way), I get this warning:
Malformed UTF-8 character (unexpected non-continuation byte 0x2e, immediately after start byte 0xf2) in subroutine entry at /usr/local/lib/perl5/5.16/mach/Data/Dumper.pm line 205.
We use windows-1251 for the code because it is legacy code and we have data in Cyrillic. The database is in UTF8 because we hope to rewrite the code in UTF8 in the future...
Can someone tell me why this is happening and how to avoid it?
I'm using:
FreeBSD 9.2-STABLE #2 r265059
Perl v5.16.3
PostgreSQL 9.1.13
DBI v1.631
DBD::Pg v3.0.0
My locale settings:
LANG=bg_BG.CP1251
LC_CTYPE="bg_BG.CP1251"
LC_COLLATE="bg_BG.CP1251"
LC_TIME="bg_BG.CP1251"
LC_NUMERIC=C
LC_MONETARY="bg_BG.CP1251"
LC_MESSAGES="bg_BG.CP1251"
LC_ALL=
My database locale:
Name | Owner | Encoding | Collate | Ctype |
----------------+---------+----------+--------------+--------------+
database_name | yavor | UTF8 | C | C |
My script:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use DBI;
#connection settings
my $s_db_name = 'database_name';
my $s_db_user = 'user';
my $s_db_auth = 'pass';
#hash ref for data fetching
my $href_row = {};
#connection with transaction
my $oref_dbh = DBI->connect('dbi:Pg:dbname=' . $s_db_name, $s_db_user,
    $s_db_auth, {AutoCommit => 0, RaiseError => 1});
#getting the default client encoding
my $oref_sth = $oref_dbh->prepare("SHOW client_encoding;");
$oref_sth->execute;
$href_row = $oref_sth->fetchrow_hashref();
print "\n Default Client Encoding: " . Dumper($href_row); #UTF8
#setting client encoding to be windows-1251
$oref_sth = $oref_dbh->prepare("SET client_encoding = 'WIN1251';");
$oref_sth->execute;
#getting the current client encoding
$oref_sth = $oref_dbh->prepare("SHOW client_encoding;");
$oref_sth->execute;
$href_row = $oref_sth->fetchrow_hashref();
print "\n Current Client Encoding: " . Dumper($href_row); #WIN1251
#preparing the SELECT query
$oref_sth = $oref_dbh->prepare("
SELECT
id,
name_en,
host,
descr_bg,
descr_en
FROM
application;");
#executing it
$oref_sth->execute;
#fetching the data
$href_row = $oref_sth->fetchall_hashref('id');
#rollback the current transaction
$oref_dbh->rollback();
#dumping the fetched data using Data::Dumper
print "\n Data: " . Dumper($href_row);
Result from execution:
Default Client Encoding: $VAR1 = {
'client_encoding' => 'UTF8'
};
Current Client Encoding: $VAR1 = {
'client_encoding' => 'WIN1251'
};
Malformed UTF-8 character (unexpected non-continuation byte 0xee, immediately after start byte 0xd1) in subroutine entry at /usr/local/lib/perl5/5.16/mach/Data/Dumper.pm line 205.
...
Malformed UTF-8 character (1 byte, need 4, after start byte 0xf2) in subroutine entry at /usr/local/lib/perl5/5.16/mach/Data/Dumper.pm line 205.
...
Data: $VAR1 = {
'<proper output>' => {
'descr_en' => '',
'descr_bg' => '<proper output in the console in Cyrillic>',
'id' => <proper output>,
'name_en' => '<proper output>',
'host' => '<proper output>'
},
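The start bytes named in the warnings (0xf2 and 0xd1) are the windows-1251 codes for the Cyrillic letters т and С, which suggests that the fetched strings really are cp1251 bytes but are being read as if they were UTF-8. Below is a minimal sketch that reproduces that mismatch without a database. The sample word and the helper names is_valid_utf8 and cp1251_to_chars are hypothetical, made up for illustration, and the idea that the driver marks the returned strings as UTF-8 is an assumption, not something the output above proves.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample value: a four-letter Cyrillic word, written with
# \x{...} escapes so this file stays pure ASCII whatever its encoding is.
my $text = "\x{442}\x{435}\x{441}\x{442}";

# Roughly what the server sends when client_encoding is WIN1251:
# one byte per Cyrillic letter.
my $cp1251_bytes = encode('cp1251', $text);    # bytes F2 E5 F1 F2

# Return 1 if the byte string is well-formed UTF-8, 0 otherwise.
# Works on a copy because decode() with a CHECK value may modify its input.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;
    return eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Turn cp1251 bytes into a Perl character string.
sub cp1251_to_chars {
    my ($bytes) = @_;
    return decode('cp1251', $bytes);
}

printf "cp1251 bytes: %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $cp1251_bytes;
printf "well-formed as UTF-8: %s\n",
    is_valid_utf8($cp1251_bytes) ? 'yes' : 'no';
printf "decoded as cp1251 matches the original: %s\n",
    cp1251_to_chars($cp1251_bytes) eq $text ? 'yes' : 'no';
```

If this is indeed the cause, the DBD::Pg documentation describes a pg_enable_utf8 handle attribute that controls whether returned strings get Perl's UTF-8 flag. Setting it to 0 after changing client_encoding may keep the cp1251 bytes unflagged, but I have not verified that on this setup.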