Pickwick has asked for the wisdom of the Perl Monks concerning the following question:
Hi monks,
I have a legacy application which uses Postgres to store some data, and its database was created with the charset WIN1252 for legacy reasons. Internally the application should by now only work with UTF-8 Perl strings, but there may be places where it doesn't. My problem is that if I insert text with German umlauts into the database, I end up with UTF-8 bytes instead of the proper German umlaut.
I debugged the problem, and inside the application the strings all look as expected. The root cause seems to be that DBD::Pg encodes the data as UTF-8 instead of WIN1252 before transferring it to the database. This surprises me, because the client encoding is automatically and correctly detected as WIN1252, and the fact that DBD::Pg can encode to valid UTF-8 suggests that my strings are in fact valid Perl strings.
If I change my application to set the client encoding to UTF-8, or manually encode my strings to WIN1252, everything works as expected: in both cases I get valid German umlauts in the database. Both work, of course, because if I tell the connection it's UTF-8, the server can recode properly to WIN1252, and if I send WIN1252 myself, the server won't change anything but stores the bytes 1:1.
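To make the byte-level difference concrete, here is a minimal sketch using only the core Encode module, with no database involved. It does not reproduce what DBD::Pg does internally; it only shows the bytes that the two encodings produce for one umlaut, and how UTF-8 bytes look when they are read as WIN1252. The variable names and the `hex_dump` helper are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: render each character of a string as two-digit hex.
sub hex_dump {
    my ($string) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $string;
}

# A Perl character string holding one character, U+00E4 (a with umlaut).
my $char = "\x{e4}";

# UTF-8 encoding of that character: two bytes.
my $utf8_bytes = encode('UTF-8', $char);

# WIN1252 (cp1252) encoding of the same character: one byte.
my $cp1252_bytes = encode('cp1252', $char);

# Reading the UTF-8 bytes as if they were WIN1252 yields two characters
# instead of one, which matches the symptom of "UTF-8 bytes" in the data.
my $misread = decode('cp1252', $utf8_bytes);

print 'UTF-8 bytes:   ', hex_dump($utf8_bytes),   "\n";   # C3 A4
print 'WIN1252 bytes: ', hex_dump($cp1252_bytes), "\n";   # E4
print 'Misread chars: ', hex_dump($misread),      "\n";   # C3 A4 (U+00C3, U+00A4)
```

The second variant, `encode('cp1252', ...)`, corresponds to the manual workaround described above: the single byte E4 is what a WIN1252 database stores for the umlaut.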
From my understanding, if DBD::Pg automatically detects a client encoding of WIN1252, it should encode outgoing data as WIN1252, not UTF-8. But apparently I'm wrong, because the same behavior occurs on a Windows host. I just didn't notice it there before, because in that case the target database was created as UTF-8.
Is it expected behavior that DBD::Pg encodes UTF-8 Perl strings to UTF-8 bytes before sending them to the server, regardless of the (automatically detected) client encoding? Does this mean that I simply need to always set the client encoding to UTF-8 if I'm sure I have valid UTF-8 Perl strings internally?
Thanks for your wisdom!