
How to convince SOAP::Lite to return UTF-8 data in responses as UTF-8?

by mithaldu (Monk)
on Feb 20, 2012 at 17:21 UTC (#955106=perlquestion)
mithaldu has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to transmit UTF-8 strings in complex data structures with SOAP::Lite. However, as it turns out, SOAP::Lite quietly converts all UTF-8 strings into base64-encoded octets. The problem is that deserialization does not reverse the conversion: it only does a straight base64 decode, so the receiver ends up with raw octets.

This leaves me confused as to how a user is supposed to ensure that they get UTF-8 data from the SOAP::Lite response. Walking the tree and running `decode_utf8` on all strings seems wasteful.
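For reference, the tree walk mentioned above might look something like the following sketch. It uses only core modules (Encode, MIME::Base64) and simulates the encode/decode round trip instead of calling SOAP::Lite. `decode_tree` is a hypothetical helper name, not part of any library, and it only decodes values, not hash keys.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);
use MIME::Base64 qw(encode_base64 decode_base64);

# Simulate the round trip: characters -> UTF-8 octets -> base64 -> octets.
my $original = "Gr\x{fc}\x{df}e";    # "Gruesse" written with u-umlaut and sharp s
my $received = decode_base64( encode_base64( encode_utf8($original) ) );

# Hypothetical helper: return a copy of the structure in which every
# plain scalar value has been passed through decode_utf8.
sub decode_tree {
    my ($node) = @_;
    if ( ref($node) eq 'HASH' ) {
        my %out;
        $out{$_} = decode_tree( $node->{$_} ) for keys %$node;
        return \%out;
    }
    if ( ref($node) eq 'ARRAY' ) {
        return [ map { decode_tree($_) } @$node ];
    }
    # Leave undef and other reference types (objects, code refs) untouched.
    return $node if !defined($node) || ref($node);
    return decode_utf8($node);
}

my $decoded = decode_tree( { greeting => $received, list => [$received] } );

printf "received: %d octets, decoded: %d characters\n",
    length($received), length( $decoded->{greeting} );
```

It works, but it copies the whole structure and touches every scalar, which is the overhead I'd like to avoid.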

Any suggestions?

Edit: In a nutshell, how do I make this test pass without monkey-patching?

Replies are listed 'Best First'.
Re: How to convince SOAP::Lite to return UTF-8 data in responses as UTF-8?
by Khen1950fx (Canon) on Feb 20, 2012 at 19:33 UTC
    You can turn off base64 encoding:
    my $string = SOAP::Data->type( string => 'UTF-8' );
      That helps with single strings, but with a complex data structure I'd still need to walk the entire tree to mark all of the strings.
Re: How to convince SOAP::Lite to return UTF-8 data in responses as UTF-8?
by gaimrox (Initiate) on Feb 04, 2016 at 03:22 UTC


    I filed a bug regarding this specific issue, you can read it here:

    As noted within this ticket, I have a workaround for your specific problem.

    This module contains an internally defined list of typelookup handlers for each supported primitive. It looks like this:

    typelookup => {
        base64   => [10, sub {$_[0] =~ /[^\x09\x0a\x0d\x20-\x7f]/}, 'as_base64'],
        int      => [20, sub {$_[0] =~ /^[+-]?\d+$/}, 'as_int'],
        double   => [30, sub {$_[0] =~ /^(-?(?:\d+(?:\.\d*)?|\.\d+)|([+-]?)(?=\d|\.\d)\d*(\.\d*)?([Ee]([+-]?\d+))?)$/}, 'as_double'],
        dateTime => [35, sub {$_[0] =~ /^\d{8}T\d\d:\d\d:\d\d$/}, 'as_dateTime'],
        string   => [40, sub {1}, 'as_string'],
    },

    You'll notice that base64 is first, and has a precedence value of 10. That means that any value seen will be duck typed using the specified comparison BEFORE any other comparisons are made.

    You'll also notice that the duck typing comparison here basically accepts "anything but ASCII". That is the reason your UTF-8 strings are being base64 encoded: they are anything but ASCII, and therefore meet this definition.
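To make the precedence behaviour concrete, here is a minimal, self-contained sketch of that kind of lookup. It is a simplification for illustration, not the module's actual code: the table is a trimmed copy of the one above, and `guess_handler` is a hypothetical name.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Trimmed copy of the table above: name => [precedence, test, handler name].
my %typelookup = (
    base64 => [ 10, sub { $_[0] =~ /[^\x09\x0a\x0d\x20-\x7f]/ }, 'as_base64' ],
    int    => [ 20, sub { $_[0] =~ /^[+-]?\d+$/ },               'as_int' ],
    string => [ 40, sub { 1 },                                   'as_string' ],
);

# Hypothetical helper: run the tests in ascending precedence order and
# return the handler name of the first test that matches.
sub guess_handler {
    my ($value) = @_;
    my @names = sort { $typelookup{$a}[0] <=> $typelookup{$b}[0] } keys %typelookup;
    for my $name (@names) {
        my ( undef, $test, $handler ) = @{ $typelookup{$name} };
        return $handler if $test->($value);
    }
    return;
}

my $chars = "Gr\x{fc}\x{df}e";    # character string containing non-ASCII

print guess_handler('42'),                 "\n";    # as_int
print guess_handler('plain ASCII'),        "\n";    # as_string
print guess_handler($chars),               "\n";    # as_base64
print guess_handler( encode_utf8($chars) ), "\n";   # as_base64
```

Both the character string and its UTF-8 octets land in as_base64, because the base64 test runs first and only looks for characters outside the printable ASCII range.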

    Fixing this is relatively trivial, fortunately.

    What I ended up doing was overriding the initialize() method contained within XMLRPC::Lite. Within this override I invoke the original initialize() from XMLRPC::Lite, but then I stomp over the value of the base64 type handler. Instead of matching all non-ASCII data, the new test matches only non-ASCII data that doesn't have the utf8 flag set.

    The result ends up looking like this:

    sub initialize {
        my $self = shift;
        my $config = {XMLRPC::Server::initialize(@_)};
        my $typelookup = $$config{serializer}->typelookup();
        # adjust the definition for base64 data, skip over any scalars with
        # the utf8 property set
        $typelookup->{base64} = [10, sub {$_[0] =~ /[^\x09\x0a\x0d\x20-\x7f]/ && !utf8::is_utf8($_[0])}, 'as_base64'];
        return %{$config};
    }
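A small standalone sketch of how the old and new tests differ, using only core Perl and no SOAP::Lite. The variable names are made up for illustration; the two subs are copies of the predicates shown above.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# The stock test and the replacement test from the override above.
my $old_test = sub { $_[0] =~ /[^\x09\x0a\x0d\x20-\x7f]/ };
my $new_test = sub { $_[0] =~ /[^\x09\x0a\x0d\x20-\x7f]/ && !utf8::is_utf8( $_[0] ) };

my $chars  = "Gr\x{fc}\x{df}e \x{263A}";    # wide character forces the utf8 flag on
my $octets = encode_utf8($chars);           # byte string, flag off
my $binary = "\x00\xff\x10";                # arbitrary binary data, flag off
my $latin1 = "caf\x{e9}";                   # character string stored without the flag

for my $case (
    [ 'flagged characters', $chars ],
    [ 'UTF-8 octets',       $octets ],
    [ 'binary data',        $binary ],
    [ 'unflagged Latin-1',  $latin1 ],
    [ 'plain ASCII',        'hello' ],
  )
{
    my ( $label, $value ) = @$case;
    printf "%-20s old=%s new=%s\n", $label,
        ( $old_test->($value) ? 'base64' : 'pass' ),
        ( $new_test->($value) ? 'base64' : 'pass' );
}
```

One possible critique: utf8::is_utf8 reports Perl's internal storage, not whether a scalar "is text". A character string containing only Latin-1 characters may not carry the flag, so it would still be sent as base64 unless it is upgraded first, for example with utf8::upgrade.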

    Once implemented this seemingly works as expected, although I welcome any critiques that might prove otherwise.

    Moving forward, it appears that the SOAP::Lite project is either dead or moving very slowly. The above definition for base64 should probably be merged right into the project, as it's a substantial improvement over what's being distributed right now, but it appears there is nobody to do that...
