http://www.perlmonks.org?node_id=589132

jdtoronto has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed monks,

SOAP::Lite has bitten me yet again. I have an application that queries a server, which returns data from a MySQL database. Everything is fine as long as there are no 'oddball' characters in the data. For example, I need do nothing more than add an é (0xE9 according to the Windows character map) and the whole thing stops. The table in MySQL has been defined with a character set of UTF-8.

I assume I am doing something inherently stupid; surely no reasonable module would reject standard character sets like that?

In the following example method, taken from my server, I have whittled the code down to the point where I just set the name.

    sub _10533 {    # test method
        my ( $class, $message ) = @_;
        use XML::Simple;
        my $ref = XMLin($message);
        $ref->{data}->{firstname} = 'joé';
        return XMLout( $ref, KeepRoot => 1 );
    }

If the name is 'joe' then the client gets the XML; if it is 'joé' then no data is returned.

So what am I doing wrong?

jdtoronto

Update: corrected typo, thanks marto.

Re: Problem with SOAP::Lite and accented characters.
by Joost (Canon) on Dec 11, 2006 at 20:02 UTC
      Good question, but exactly the same thing happened when the database table had a default character set of latin-1.

      jdtoronto

        Ok, but encoding mismatches still might be the source of the problem: if your (SOAP) XML prolog and/or HTTP headers end up declaring the wrong character encoding, it's likely you'll run into trouble somewhere. IIRC perl stores strings containing 'high-bit' characters internally as either UTF-8 or latin-1.

        You probably expect your XML to be in one or the other, so you need to be sure the data in it is correctly encoded. If everything else is working correctly, doing an Encode::encode() to UTF-8 or latin-1 over the whole resulting XML response and a binmode() to :bytes (or possibly just a binmode() to :utf8 if that's what you're using) should work.
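A minimal sketch of that suggestion, assuming the response is available as a Perl character string before it goes on the wire. The variable names and the sample XML are made up for illustration, and how the encoded octets get handed back to SOAP::Lite is not shown here.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical response string; "\x{e9}" is the character U+00E9 (e-acute).
my $xml = qq{<data><firstname>jo\x{e9}</firstname></data>};

# Convert the character string into UTF-8 octets for transmission.
my $octets = encode( 'UTF-8', $xml );

# The encoding declared in the prolog should match the octets actually sent.
my $response = qq{<?xml version="1.0" encoding="UTF-8"?>\n} . $octets;

# The e-acute is one character but two octets once encoded as UTF-8.
printf "characters: %d, octets: %d\n", length($xml), length($octets);
```

When printing such octets to a filehandle, a binmode() to :raw (or :bytes) on that handle keeps perl from re-encoding them.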

Re: Problem with SOAP::Lite and accented characters.
by clinton (Priest) on Dec 11, 2006 at 20:43 UTC
    In your simple case, unless you have

     use utf8;

    in your script, the 'joé' literal is not being interpreted as UTF-8. You may want to try:

    $ref->{data}->{firstname} = "jo\x{e9}";

    to be sure that it is interpreted correctly.
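A small sketch of the difference, assuming the script file itself is saved as UTF-8. To keep the example independent of the editor's encoding, the octets perl would see for the literal are spelled out with escapes; the variable names are only illustrative.

```perl
use strict;
use warnings;

# What perl sees for a UTF-8 encoded source literal when "use utf8" is absent:
# the two octets 0xC3 0xA9 are treated as two separate characters.
my $without_pragma = "jo\xc3\xa9";

# The escaped form names the code point U+00E9 directly.
my $escaped = "jo\x{e9}";

printf "without pragma: %d chars, escaped: %d chars\n",
    length($without_pragma), length($escaped);
```

The two strings differ in length and do not compare equal, which is one way a single accented character can quietly become two.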

    When I retrieve UTF8 from MySQL, I do:

     utf8::decode($value);

    to make sure that it is correctly interpreted as UTF-8.
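As a rough illustration of what utf8::decode() does, assuming the database driver hands back raw UTF-8 octets (the sample values below are invented): it converts the string in place and returns false when the octets are not well-formed UTF-8, so the return value is worth checking.

```perl
use strict;
use warnings;

# Hypothetical value as it might arrive from the database: UTF-8 octets.
my $value = "jo\xc3\xa9";

# Decodes in place; true means the octets were well-formed UTF-8.
my $ok = utf8::decode($value);

# A lone latin-1 octet 0xE9 is not valid UTF-8, so this decode fails
# and the string is left as it was.
my $latin1    = "jo\xe9";
my $latin1_ok = utf8::decode($latin1);

printf "decoded ok: %s, length now %d\n", ( $ok ? 'yes' : 'no' ), length($value);
printf "latin-1 input ok: %s\n", ( $latin1_ok ? 'yes' : 'no' );
```

A failed decode is a hint that the data is really in another encoding such as latin-1, which would match the symptom described in the question.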