in reply to XML::Smart - undesired decoding of special XML characters

I highly discourage this, but I was curious to try it and didn't see any way around this kind of wackiness in the XML::Smart docs. It will "work" for what you want.

    use strictures;
    use XML::Smart;
    use Aspect;

    open my $fh, "<", "xml.xml" or die $!;
    my $logString = do { local $/; <$fh> };

    around {
        my $return = $_->original->($_->args);
        $_->return_value( $return == 4 ? 2 : $return );
    } call "XML::Smart::_data_type";

    my $test = XML::Smart->new($logString);
    print $test->data;

Docs -> Aspect. The problem is that XML::Smart treats anything outside a very basic set of characters as binary; it's being overly formal, but that seems pretty correct really. So the second you pass in any wide/UTF-8 data, the binary switch flips. I poked around a little and didn't see a way to circumvent or configure around it.
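
To see what the around advice in the snippet above amounts to without installing Aspect, here is a rough sketch in plain Perl. My::Parser::_data_type is a hypothetical stand-in, and its character test is an assumption for illustration only, not XML::Smart's real logic. The only things taken from the snippet are the codes 4 and 2 and the remapping between them.

```perl
use strict;
use warnings;

# Hypothetical stand-in for XML::Smart::_data_type. The character test
# below is an assumption for illustration, not the module's real logic.
package My::Parser;

sub _data_type {
    my ($data) = @_;
    return 4 if $data =~ /[^\x20-\x7E\s]/;    # non-ASCII found: "binary"
    return 2;                                 # otherwise: "content"
}

package main;

# Wrap the original sub: call it, then remap 4 to 2 and pass every
# other return value through unchanged.
{
    no strict 'refs';
    no warnings 'redefine';
    my $original = \&My::Parser::_data_type;
    *{"My::Parser::_data_type"} = sub {
        my $return = $original->(@_);
        return $return == 4 ? 2 : $return;
    };
}

print My::Parser::_data_type("plain text"), "\n";
print My::Parser::_data_type("caf\x{e9}"), "\n";
```

Aspect does the same kind of wrapping, but lets you declare it with a pointcut instead of touching the symbol table by hand.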

My real advice, since XML::Smart is not actively maintained, would be to switch to a different XML library. XML::Twig or XML::LibXML, probably.
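
As a starting point for the XML::LibXML route, here is a minimal sketch. The document content and the element names log and entry are made up for illustration; your real file will differ. It parses UTF-8 input containing a non-ASCII character and an escaped ampersand, then reads the text back as an ordinary character string.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample document, given as UTF-8 bytes ("\xC3\xA9" is an e-acute).
my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
        . qq{<log><entry>caf\xC3\xA9 &amp; more</entry></log>};

my $doc = XML::LibXML->load_xml( string => $xml );

# findvalue returns the decoded text: entities resolved, characters intact.
my $text = $doc->findvalue('/log/entry');

binmode STDOUT, ':encoding(UTF-8)';
print "$text\n";
```

To read from a file instead, load_xml also accepts location => "xml.xml" in place of the string argument.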

Re^2: XML::Smart - undesired decoding of special XML characters
by Anonymous Monk on Oct 17, 2017 at 09:36 UTC

    Hi, thank you very much for your solution. It seems to work; it ran successfully through a short test. I guess the code does the following:

    The internal subroutine _data_type of the module XML::Smart is used to determine whether the content of an XML element should be treated as binary data. Every time the subroutine returns the value 2 (data type binary), the return value is set to the new value 4 (data type content, i.e. no binary data). So XML::Smart never uses Base64 encoding. Is this explanation correct?

      Almost. 4 is binary, which is switched to 2 (content), and anything else is passed through as is.

      Even if it works for you, I think you should be looking for alternatives in your XML handling.

      Update: I'm not positive XML::Smart can't do this correctly. Someone might be able to get it to work for you. I just couldn't get it to; I passed it both encoded and decoded UTF-8 and it turned both into the binary encoding.