
Hello, Monks

I have this problem:

I have a script that fetches some data from a PostgreSQL database.

The database server encoding is UTF-8, and the script is written in cp1251 (windows-1251). After connecting to the database (using DBI) I set the database client encoding to WIN1251, to tell the database that I want the output in this encoding. Then I execute the SQL SELECT query and fetch the data into a hash reference. When I try to dump the data to the console (or use it in any other way) I get this warning:

Malformed UTF-8 character (unexpected non-continuation byte 0x2e, immediately after start byte 0xf2) in subroutine entry at /usr/local/lib/perl5/5.16/mach/Data/Dumper.pm line 205.
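A minimal sketch of what the two bytes named in this warning look like under each encoding. It uses only the core Encode module; the byte values are taken from the warning itself, and the variable names are hypothetical. It does not touch the database, so it only illustrates the byte-level mismatch, not where in the stack it is introduced.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The two bytes named in the warning: start byte 0xf2 followed by 0x2e.
my $bytes = "\xf2\x2e";

# Read as windows-1251, they are a valid two-character string:
# a Cyrillic small letter followed by a full stop.
my $text = decode('cp1251', $bytes);
print join(' ', map { sprintf 'U+%04X', ord } split //, $text), "\n";
# prints: U+0442 U+002E

# Read as strict UTF-8, the same bytes are rejected, because 0xf2 opens
# a four-byte sequence and 0x2e is not a continuation byte.
my $ok = eval { decode('UTF-8', $bytes, Encode::FB_CROAK); 1 };
print $ok ? "valid UTF-8\n" : "not valid UTF-8\n";
```

If that reading is right, the fetched strings do contain WIN1251 bytes as requested, but something on the Perl side appears to treat them as UTF-8.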

We use windows-1251 as the source encoding because this is legacy code and we have data in Cyrillic. The database is in UTF8 because we hope to rewrite the code in UTF8 in the future.

Can someone tell me why this is happening and how to avoid it?

I'm using:

FreeBSD 9.2-STABLE #2 r265059
Perl v5.16.3
Postgresql 9.1.13
DBI v1.631
DBD::Pg v3.0.0

My locale settings:

LANG=bg_BG.CP1251
LC_CTYPE="bg_BG.CP1251"
LC_COLLATE="bg_BG.CP1251"
LC_TIME="bg_BG.CP1251"
LC_NUMERIC=C
LC_MONETARY="bg_BG.CP1251"
LC_MESSAGES="bg_BG.CP1251"
LC_ALL=

My database locale:

     Name       |  Owner  | Encoding |   Collate    |    Ctype     |
----------------+---------+----------+--------------+--------------+
 database_name  | yavor   | UTF8     | C            | C            |

My script:

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use DBI;

#connection settings
my $s_db_name = 'database_name';
my $s_db_user = 'user';
my $s_db_auth = 'pass';

#hash ref for data fetching
my $href_row = {};

#connection with transaction
my $oref_dbh = DBI->connect('dbi:Pg:dbname=' . $s_db_name, $s_db_user, $s_db_auth, {AutoCommit => 0, RaiseError => 1});

#getting the default client encoding
my $oref_sth = $oref_dbh->prepare("SHOW client_encoding;");
$oref_sth->execute;
$href_row = $oref_sth->fetchrow_hashref();
print "\n Default Client Encoding: " . Dumper($href_row); #UTF8

#setting client encoding to be windows-1251
$oref_sth = $oref_dbh->prepare("SET client_encoding = 'WIN1251';");
$oref_sth->execute;

#getting the current client encoding
$oref_sth = $oref_dbh->prepare("SHOW client_encoding;");
$oref_sth->execute;
$href_row = $oref_sth->fetchrow_hashref();
print "\n Current Client Encoding: " . Dumper($href_row); #WIN1251

#preparing the SELECT query
$oref_sth = $oref_dbh->prepare("
    SELECT id, name_en, host, descr_bg, descr_en
    FROM application;");

#executing it
$oref_sth->execute;

#fetching the data
$href_row = $oref_sth->fetchall_hashref('id');

#rollback the current transaction
$oref_dbh->rollback();

#dumping the fetched data using Data::Dumper
print "\n Data: " . Dumper($href_row);

Result from execution:

 Default Client Encoding: $VAR1 = {
          'client_encoding' => 'UTF8'
        };

 Current Client Encoding: $VAR1 = {
          'client_encoding' => 'WIN1251'
        };
Malformed UTF-8 character (unexpected non-continuation byte 0xee, immediately after start byte 0xd1) in subroutine entry at /usr/local/lib/perl5/5.16/mach/Data/Dumper.pm line 205.
...
Malformed UTF-8 character (1 byte, need 4, after start byte 0xf2) in subroutine entry at /usr/local/lib/perl5/5.16/mach/Data/Dumper.pm line 205.
...

 Data: $VAR1 = {
          '<proper output>' => {
                                 'descr_en' => '',
                                 'descr_bg' => '<proper output in the console in Cyrillic>',
                                 'id' => <proper output>,
                                 'name_en' => '<proper output>',
                                 'host' => '<proper output>'
                               },

In reply to Malformed UTF-8 character error after fetching data from Postgresql by nihiliath
