
Re: Malformed UTF-8 character error after fetching data from Postgresql

by sundialsvc4 (Abbot)
on Jul 17, 2014 at 18:00 UTC (node #1094098)

in reply to Malformed UTF-8 character error after fetching data from Postgresql

Very respectfully, stefbv, I’m not so sure that you’re correct on this.

Section 23.3.3 of this PostgreSQL documentation page states that automatic character set conversion between client and server is provided, and that UTF8 <-> WIN1251 is a supported combination.
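As a rough illustration of what that conversion does to the bytes, here is a sketch that uses Perl's core Encode module to mimic a UTF8 -> WIN1251 conversion for the word "Привет". It only imitates the mapping; it is not PostgreSQL's own conversion code, and the sample word is an assumption.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "Привет" written with code point escapes so the source file stays ASCII.
my $text = "\x{41F}\x{440}\x{438}\x{432}\x{435}\x{442}";

# How a UTF8 server stores it, and how a WIN1251 client should receive it.
my $utf8_bytes   = encode('UTF-8', $text);    # 2 bytes per Cyrillic letter
my $cp1251_bytes = encode('cp1251', $text);   # 1 byte per Cyrillic letter

# Mimic the server-side conversion: UTF-8 bytes -> characters -> CP1251 bytes.
my $converted = encode('cp1251', decode('UTF-8', $utf8_bytes));

printf "UTF-8 : %s\n", unpack('H*', $utf8_bytes);
printf "CP1251: %s\n", unpack('H*', $converted);
```

The same six letters take 12 bytes in UTF-8 but only 6 in CP1251, so whichever side misjudges the encoding sees a very different byte stream.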

What I suspect is that Perl, within Data::Dumper, is treating the data as UTF-8 when it is actually CP1251. It should be easy to query the database directly to confirm that the characters were stored (translated) correctly and that the characters the client receives are in CP1251.
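One way to check what Perl thinks it is holding is sketched below. The sample bytes are an assumption (CP1251 for "Привет", as a WIN1251 client would receive them); the checks use only core Perl (`utf8::is_utf8`) and Encode.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Assumed sample: the CP1251 bytes a WIN1251 client would receive for "Привет".
my $raw = "\xCF\xF0\xE8\xE2\xE5\xF2";

# Raw fetched bytes should not carry Perl's internal UTF8 flag.
printf "UTF8 flag on raw bytes: %s\n", utf8::is_utf8($raw) ? 'on' : 'off';

# Treating these bytes as UTF-8 fails: 0xCF starts a 2-byte sequence,
# but 0xF0 is not a valid continuation byte. LEAVE_SRC keeps $raw intact.
my $as_utf8_ok = eval { decode('UTF-8', $raw, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
print "Strict UTF-8 decode: ", ($as_utf8_ok ? "ok\n" : "failed: $@");

# Decoding them as CP1251 yields the intended characters.
my $text = decode('cp1251', $raw);
printf "Decoded as CP1251: %d characters, UTF8 flag %s\n",
    length $text, utf8::is_utf8($text) ? 'on' : 'off';
```

If the strict UTF-8 decode fails but the CP1251 decode gives sensible text, the data really are CP1251 and something is mislabeling them as UTF-8.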

Re^2: Malformed UTF-8 character error after fetching data from Postgresql
by stefbv (Deacon) on Jul 17, 2014 at 18:19 UTC

    That is interesting. I also found a passage in the DBD::Pg docs suggesting that I was mistaken:

    pg_enable_utf8 (integer)

    DBD::Pg specific attribute. The behavior of DBD::Pg with regards to this flag has changed as of version 3.0.0. ...

    "Note that the value of client_encoding is only checked on connection time. If you change the client_encoding to/from 'UTF8' after connecting, you can set pg_enable_utf8 to -1 to force DBD::Pg to read in the new client_encoding and act accordingly."

      Thanks. That seems to be the problem. When I connect to my database the client encoding is UTF8, so by default DBD::Pg turns on Perl's internal UTF8 flag for fetched strings. This causes problems when I change the client encoding after connecting: Perl assumes that the fetched data is UTF-8, which it no longer is. When I set pg_enable_utf8 to 0 everything is fine. A better solution is to set pg_enable_utf8 to -1 after changing client_encoding, as the DBD::Pg documentation suggests.
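A minimal sketch of that fix, under stated assumptions: the server database is UTF8, the session switches to WIN1251 after connecting, and `PG_DSN`/`PG_USER`/`PG_PASS` are hypothetical environment variables (the database part only runs when `PG_DSN` is set). The helper `decode_win1251` is also hypothetical; it shows how you might decode the raw CP1251 bytes yourself once DBD::Pg stops flagging them as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn a column value that DBD::Pg returned as raw
# WIN1251 bytes into a Perl character string. SQL NULLs (undef) pass through.
sub decode_win1251 {
    my ($raw) = @_;
    return undef unless defined $raw;
    return decode('cp1251', $raw);
}

# Only touches a database when PG_DSN is set (an assumed variable name).
if (my $dsn = $ENV{PG_DSN}) {
    require DBI;
    my $dbh = DBI->connect($dsn, $ENV{PG_USER}, $ENV{PG_PASS},
                           { RaiseError => 1, AutoCommit => 1 });

    # Change the client encoding after connecting, as in the thread ...
    $dbh->do(q{SET client_encoding TO 'WIN1251'});
    # ... then make DBD::Pg re-read client_encoding, per its pg_enable_utf8 docs.
    $dbh->{pg_enable_utf8} = -1;

    # chr(1055) is U+041F when the server database is UTF8 (assumed here).
    my ($raw) = $dbh->selectrow_array(q{SELECT chr(1055)});
    printf "Fetched %d byte(s), decoded to U+%04X\n",
        length $raw, ord decode_win1251($raw);
    $dbh->disconnect;
}
```

Setting the attribute on the existing handle avoids reconnecting; the explicit decode keeps the byte/character boundary in one visible place.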

      My instinct here is that Data::Dumper is the one that is confused, not anything in the DBI stack.

        If you don't have a test/demo case to back it up, you'd be better off not trusting your "instinct". Data::Dumper is actually pretty good about avoiding and clearing up confusion (unless of course you don't know how to read its output).
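As a small example of reading Data::Dumper output for this kind of problem, the sketch below dumps the same two letters once as raw bytes and once as decoded characters. With `Useqq` on, characters above 0xFF show up as `\x{...}` escapes, while a byte string shows escapes of individual byte values. The sample bytes are an assumption, and `Useperl` is set only to force the pure-Perl dumper so the escaping is consistent.

```perl
use strict;
use warnings;
use Data::Dumper;
use Encode qw(decode);

$Data::Dumper::Useqq   = 1;  # escape non-printables so bytes vs characters show up
$Data::Dumper::Terse   = 1;  # dump just the value, without "$VAR1 ="
$Data::Dumper::Useperl = 1;  # force the pure-Perl dumper for consistent escaping

my $bytes = "\xCF\xF0";                 # assumed sample: "Пр" as CP1251 bytes
my $chars = decode('cp1251', $bytes);   # the same text as Perl characters

my $bytes_dump = Dumper($bytes);
my $chars_dump = Dumper($chars);
print "bytes: $bytes_dump";
print "chars: $chars_dump";
```

A `\x{41f}` in the dump means Perl holds the Cyrillic character; per-byte escapes mean it still holds undecoded bytes.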
