http://www.perlmonks.org?node_id=1072711


in reply to DBD::Pg encodes Perlstring to UTF-8 bytes instead of WIN1252 regardless client encoding

I am not an expert when it comes to DBD::Pg, but my understanding is that client encoding and the table encoding do not have to agree. You pass strings in the client encoding (UTF-8) to DBD::Pg, or if you have pg_enable_utf8 set to 1, you simply pass in text strings.

According to the docs, Postgres automatically recodes from table to client encoding and vice versa if you tell it to use a specific client encoding. So it should be pretty transparent.

Re^2: DBD::Pg encodes Perlstring to UTF-8 bytes instead of WIN1252 regardless client encoding
by Pickwick (Beadle) on Jan 30, 2014 at 18:57 UTC
    According to the docs, Postgres automatically recodes from table to client encoding and vice versa if you tell it to use a specific client encoding. So it should be pretty transparent.

    That's what I thought as well, but it simply doesn't behave that way on my systems. The client encoding is reported as WIN1252, but UTF-8 encoded bytes are sent. The only thing I have in between is DBIx::Log4perl, which logs the statements sent to the server and shows that UTF-8 bytes are sent.

        I did, but that's not the point of my question. I want to know why I need to do that at all to get it working, because DBD::Pg should properly encode the data itself according to the client encoding.

      DBIx::Log4perl sees the data before it is sent to the database by DBD::Pg so you cannot rely on what you see in its output as DBD::Pg can change the data.

      I took a casual glance at DBD::Pg code and all the UTF8 stuff seemed to be wrapped in pg_enable_utf8. Are you binding the data as parameters when it is inserted?

      The trouble here is that there are a number of variables. Your database uses 1252 encoding. What is your postgres client charset set to, what is the encoding of the data you pass to DBD::Pg when it fails, and do you have pg_enable_utf8 on?

        DBIx::Log4perl sees the data before it is sent to the database by DBD::Pg so you cannot rely on what you see in its output as DBD::Pg can change the data.

        In theory, yes, but in practice Log4perl was logging as expected a line before the SQL statement, and the DBIx::Log4perl output makes sense for the problem I described. If I encode the data to WIN1252 before passing it to DBI, the logged DBIx::Log4perl output changes as well, hence my strong guess that DBIx::Log4perl is logging what gets sent to the database.
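To make the difference in bytes concrete, here is a minimal sketch using only the core Encode module. It shows what the same text string looks like once encoded as UTF-8 versus WIN1252 (called cp1252 in Encode). The sample string and the hex_bytes helper are just illustrations, not anything from the application in question, and this says nothing about what DBD::Pg does internally.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $bytes;
}

# A Perl text string with one non-ASCII character (e with acute, U+00E9).
my $text = "caf\x{e9}";

my $utf8_bytes   = encode('UTF-8',  $text);   # two bytes for U+00E9
my $cp1252_bytes = encode('cp1252', $text);   # one byte for U+00E9

print "UTF-8:   ", hex_bytes($utf8_bytes),   "\n";   # 63 61 66 c3 a9
print "WIN1252: ", hex_bytes($cp1252_bytes), "\n";   # 63 61 66 e9
```

If the server is told the client speaks WIN1252 but receives the first byte sequence, it will interpret c3 and a9 as two separate WIN1252 characters.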

        Are you binding the data as parameters when it is inserted?

        No, the application simply creates a SQL string by concatenating different UTF-8 Perlstrings together, pushes that with a "do" and the whole string formed by SQL command and values gets encoded to UTF-8.

        What is your postgres client charset set to

        It's automatically detected as WIN1252 when the problem occurs, which makes perfect sense on Windows and on the Linux system I tested. Another Ubuntu server automatically detects it as UTF-8 instead, because it has a UTF-8 locale. If I manually change it to UTF-8, everything works as expected: the database server properly converts the UTF-8 bytes to WIN1252 characters in the target database.
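Changing the client encoding manually, as described above, could look roughly like the following sketch. The function name force_utf8_client_encoding is made up for illustration; the SQL statements SET client_encoding and SHOW client_encoding are standard PostgreSQL, and do and selectrow_array are ordinary DBI methods. It assumes $dbh is an already connected DBD::Pg handle.

```perl
use strict;
use warnings;

# Hypothetical helper: ask the server to treat this session's client
# encoding as UTF-8 and return what the server then reports.
sub force_utf8_client_encoding {
    my ($dbh) = @_;

    # Tell PostgreSQL that everything this session sends is UTF-8.
    $dbh->do("SET client_encoding TO 'UTF8'");

    # Read the setting back so the caller can verify it took effect.
    my ($encoding) = $dbh->selectrow_array('SHOW client_encoding');
    return $encoding;
}

# Usage, assuming a connected handle:
#   my $encoding = force_utf8_client_encoding($dbh);
#   warn "unexpected client encoding: $encoding" unless $encoding eq 'UTF8';
```

With the session set this way, the server converts between UTF-8 on the wire and WIN1252 in the database, which matches the behaviour reported above.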

        and what is the encoding of the data you pass to DBD::Pg when it fails

        Valid UTF-8 Perlstrings with UTF-8 flag turned on.
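For readers unsure what "UTF-8 flag turned on" means here, a small sketch with core Perl only: it contrasts a raw byte string holding UTF-8 encoded data with the decoded text string. The describe helper is hypothetical, and the flag is an internal detail that is normally only inspected for debugging.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report the internal UTF-8 flag and the length
# of a string as Perl sees it.
sub describe {
    my ($string) = @_;
    return sprintf 'flag=%d length=%d',
        (utf8::is_utf8($string) ? 1 : 0), length $string;
}

my $bytes = "caf\xc3\xa9";             # five raw bytes, UTF-8 encoded
my $text  = decode('UTF-8', $bytes);   # four characters, a text string

print "bytes: ", describe($bytes), "\n";   # flag=0 length=5
print "text:  ", describe($text),  "\n";   # flag=1 length=4
```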

        and do you have pg_enable_utf8 on?

        No, because from my understanding it is deprecated and only affects reading from the database, not writing to it.