http://www.perlmonks.org?node_id=1201430

NeedForPerl has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I'm using XML::Smart and running into some problems if there are special characters in the XML document:

If I create an XML::Smart object that encapsulates an XML document containing special characters like "ä" or "ü", the function data() uses Base64 encoding, but I don't want that. So I decided to call the function with the argument "decode => 1".

After that change everything works fine unless there are special XML characters like "<", ">" or "&" inside an XML element. I guess that the call of data(decode => 1) results in decoding "&amp;" to "&", for instance. Is it possible to avoid that behaviour?

I used the function set_binary('FALSE') but somehow it didn't work:

my $log = `svn log http://... --xml --revision 123`;
my $test = XML::Smart->new($log);
$test->{log}->{logentry}[0]->{msg}->set_binary('FALSE');
print $test->data();

I'm using version 1.78 of the module.

I have tried to contact the author by mail, using the e-mail address given in the XML::Smart FAQ, but that address no longer exists.

Many thanks in advance.

Re: XML::Smart - undesired decoding of special XML characters
by Your Mother (Archbishop) on Oct 16, 2017 at 14:28 UTC

    This should explain your issue. :P

    perl -le 'print "FALSE" ? "is true" : "is false"'

    If that's not clearing it up, try this.

    $test->{log}->{logentry}[0]->{msg}->set_binary(0);
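The one-liner above relies on Perl's truthiness rules: any non-empty string other than '0' is true, so 'FALSE' counts as true. A minimal sketch of those rules follows; the helper name truthiness is made up for illustration and is not part of XML::Smart.

```perl
use strict;
use warnings;

# Hypothetical helper: report how Perl evaluates a value in boolean context.
sub truthiness {
    my ($value) = @_;
    return $value ? 'true' : 'false';
}

# Only undef, 0, '0' and the empty string are false.
# Strings such as 'FALSE', 'false' or '0.0' are all true.
for my $value ('FALSE', 'false', '0.0', 0, '0', '') {
    printf "%-8s => %s\n", "'$value'", truthiness($value);
}
```

This is why set_binary('FALSE') presumably behaves like set_binary(1), while set_binary(0) has the intended effect.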

      Thank you very much. It works fine. I know it's a beginner's mistake; unfortunately I'm not a professional Perl programmer. But there is another problem. The structure of the XML document can become complex: a logentry can contain further logentries, and so on. So do I have to walk recursively through all the elements of the XML tree and call set_binary(0) on each? Several different elements within a logentry could end up encoded in Base64. Maybe the easiest way is to mark every XML element as non-binary. If someone has an idea or a solution for achieving this easily, I would be very grateful.

Re: XML::Smart - undesired decoding of special XML characters
by Your Mother (Archbishop) on Oct 16, 2017 at 19:07 UTC

    I highly discourage this but I was curious to try it and didn't see any way around this kind of wackiness in the XML::Smart docs, and it will "work" for what you want.

    use strictures;
    use XML::Smart;
    use Aspect;
    open my $fh, "<", "xml.xml" or die $!;
    my $logString = do { local $/; <$fh> };
    around {
        my $return = $_->original->($_->args);
        $_->return_value( $return == 4 ? 2 : $return );
    } call "XML::Smart::_data_type";
    my $test = XML::Smart->new($logString);
    print $test->data;

    Docs -> Aspect. The problem is that XML::Smart sees anything outside a very basic set of characters as binary; it's being overly formal but seems pretty correct really. So, the second you pass in any wide/utf-8 stuff, the binary switch flips. I poked around a little and didn't see a way to circumvent or configure around it.

    My real advice, since XML::Smart is not actively maintained, would be to switch to a different XML library. XML::Twig or XML::LibXML probably.
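As a rough illustration of that suggestion, here is a minimal sketch using XML::LibXML, assuming the module is installed. The sample document is a shortened version of the kind of svn log output discussed in this thread, with the ä written as a numeric character reference so the script itself stays pure ASCII.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample input modelled on the thread's svn log output (shortened).
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<log>
  <logentry revision="12345">
    <msg>Characters like &#228; or &amp;.</msg>
  </logentry>
</log>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# textContent returns the decoded text, with a literal "&".
my ($msg) = $doc->findnodes('/log/logentry/msg');
my $text  = $msg->textContent;

# toString serialises the document again; "&" is written back as "&amp;"
# and no Base64 encoding is applied to non-ASCII text.
my $out = $doc->toString;
print $out;
```

The point of the sketch is only that parsing and re-serialising keeps the document well-formed; it does not reproduce XML::Smart's hash-like access syntax.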

      Hi, thank you very much for your solution. It seems to work; it passed a short test. I guess the code does the following:

      The internal subroutine _data_type of the module XML::Smart is used to determine whether the content of an XML element should be treated as binary data. Every time the subroutine returns the value 2 (data type binary), the return value is set to the new value 4 (data type content, i.e. no binary data). So XML::Smart never uses Base64 encoding. Is this explanation correct?

        Almost. 4 is binary which is switched to 2, content, and anything else is passed through as is.

        Even if it works for you, I think you should be looking for alternatives in your XML handling.

        Update: and I'm not positive XML::Smart can't do this correctly. Someone might be able to get it to work for you. I just couldn't; I passed it both encoded and decoded UTF-8 and it turned both into the binary encoding.
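The remapping described in this reply (4 is switched to 2, everything else passes through) can be sketched in plain Perl without Aspect. The sub data_type below is a hypothetical stand-in, not the real XML::Smart::_data_type; its character test is an assumption made only so the example has something to classify.

```perl
use strict;
use warnings;

# Hypothetical stand-in for XML::Smart::_data_type: returns 4 ("binary")
# for anything outside tab, newline, CR and printable ASCII, else 2 ("content").
sub data_type {
    my ($text) = @_;
    return $text =~ /[^\x09\x0A\x0D\x20-\x7E]/ ? 4 : 2;
}

# Keep a reference to the original so the wrapper can delegate to it.
my $original = \&data_type;

{
    no warnings 'redefine';
    # Replace the sub with a wrapper: call the original, turn 4 into 2,
    # and pass any other return value through unchanged.
    *data_type = sub {
        my $result = $original->(@_);
        return $result == 4 ? 2 : $result;
    };
}

print "original: ", $original->("\x{E4}"), "\n";
print "wrapped:  ", data_type("\x{E4}"), "\n";
```

Aspect's around advice does the same kind of interception in a more declarative way; the caveats about patching a module's private subroutine apply to both approaches.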

Re: XML::Smart - undesired decoding of special XML characters
by 1nickt (Canon) on Oct 16, 2017 at 11:14 UTC

    Hi, the doc for XML::Smart states:

    When loading XML data with UTF-8, Perl (5.8+) should make all the work internally.

    Please provide a short sample of the XML you are attempting to work with.


    The way forward always starts with a minimal test.

      Hi, thanks for the fast reply. For instance, I use the following XML file:

      <?xml version="1.0" encoding="UTF-8"?>
      <log>
        <logentry revision="12345">
          <author>someAuthor</author>
          <date>2017-10-11T09:32:15.704935Z</date>
          <msg>This is my SVN message with characters like ä or &amp;.</msg>
        </logentry>
      </log>

      After I execute the following script ...

      use XML::Smart;
      open(my $fh, "<", "test.xml") or die $!;
      my $logString;
      while (<$fh>) {
          $logString .= $_;
      }
      my $test = XML::Smart->new($logString);
      $test->{log}->{logentry}[0]->{msg}->set_binary('FALSE');
      print $test->data();

      ... I get the following result.

      <?xml version="1.0" encoding="UTF-8" ?>
      <?meta name="GENERATOR" content="XML::Smart/1.78 Perl/5.024001 [MSWin32]" ?>
      <log>
        <logentry revision="12345">
          <author>someAuthor</author>
          <date>2017-10-11T09:32:15.704935Z</date>
          <msg dt:dt="binary.base64">VGhpcyBpcyBteSBTVk4gbWVzc2FnZSB3aXRoIGNoYXJhY3RlcnMgbGlrZSDkIG9yICYu</msg>
        </logentry>
      </log>

      If I call the subroutine data(decode => 1), the msg element contains the decoded message:

      <msg>This is my SVN message with characters like ä or &.</msg>

      But this output is invalid because "&" is not replaced by an escape sequence. I need an XML file with no Base64 encoding and with special XML characters like "&" escaped. In short, a valid XML document without Base64 encoding. The XML parser that reads the output of the script can't handle Base64 encoding. I wonder if I can solve the problem by using the subroutine set_binary('FALSE'). PS: I'm using Strawberry Perl v5.24.1 on Windows Server 2012 R2 Datacenter.
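To make the requirement concrete: element text is only valid XML if "&" and "<" are escaped (">" is usually escaped as well). A minimal sketch of that escaping in plain Perl follows; the helper name escape_xml_text is made up for illustration and is not an XML::Smart function.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in XML text.
sub escape_xml_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first, or it would re-escape the others
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $message = 'This is my SVN message with characters like < or &.';
print '<msg>', escape_xml_text($message), "</msg>\n";
```

A real XML library performs this step during serialisation, so hand-written escaping like this is mainly useful for understanding what the output should look like.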

        NeedForPerl:

        I just checked your code with Your Mother's recommendation and it worked just fine. I did, however, have to remove the ä symbol because XML::Smart complained about it being encoded incorrectly. After that, and changing 'FALSE' to 0, it gave me (the presumably expected):

        <?xml version="1.0" encoding="UTF-8" ?>
        <?meta name="GENERATOR" content="XML::Smart/1.78 Perl/5.022004 [cygwin]" ?>
        <log>
          <logentry revision="12345">
            <author>someAuthor</author>
            <date>2017-10-11T09:32:15.704935Z</date>
            <msg>This is my SVN message with characters like or &amp;.</msg>
          </logentry>
        </log>

        I could easily have something munged in my various Windows/Cygwin/vim settings to have messed up the 'ä', but I'm mentioning it just in case you need to know of it.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.
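For anyone wanting to inspect what the Base64 payload from the earlier output actually contains, a small sketch with the core module MIME::Base64 can decode it and show non-ASCII bytes as hex escapes. The encoded string is copied from the posted output with the line-wrap break joined; the variable names are illustrative.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Base64 content of the <msg> element from the output posted above.
my $encoded = 'VGhpcyBpcyBteSBTVk4gbWVzc2FnZSB3aXRoIGNoYXJhY3RlcnMgbGlrZSDkIG9yICYu';

my $bytes = decode_base64($encoded);

# Render every byte outside printable ASCII as \xHH so it is visible.
(my $printable = $bytes) =~ s/([^\x20-\x7E])/sprintf('\\x%02X', ord $1)/ge;

print "$printable\n";
```

The ä shows up as the single byte \xE4 rather than the two-byte UTF-8 sequence \xC3\xA4. That could mean the test file was saved in a Latin-1 style encoding despite its UTF-8 declaration, or that XML::Smart decoded the text before encoding it; the snippet alone cannot tell which.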