http://www.perlmonks.org?node_id=1054275

physi has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm currently going mad with a German umlaut problem and XML::Twig. Here's the code:
use strict;
use warnings;
use XML::Twig::XPath;

$/ = undef;
my $data = <DATA>;

my $twig = XML::Twig::XPath->new(
    pretty_print  => 'nice',
    keep_encoding => 1,
    twig_handlers => { 'add' => \&_checkAdd, },
);
$twig->parse($data);    # build it
my $out = $twig->sprint;
print $out;

sub _checkAdd {
    my ( $t, $addAttr ) = @_;
    my $elt;
    $addAttr->set_tag('check');
    $elt = parse XML::Twig::Elt( qq(<p>test</p>) );
    $elt->paste( 'last_child', $addAttr );
}

__DATA__
<?xml version="1.0" encoding="ISO-8859-1"?>
<doc><url><irl>with &#xDC; here</irl></url><add></add></doc>
I would like to get this output:
<?xml version="1.0" encoding="ISO-8859-1"?>
<doc>
  <url>
    <irl>with &#xDC; here</irl>
  </url>
  <check>
    <p>test</p>
  </check>
</doc>
but I get:
<?xml version="1.0" encoding="ISO-8859-1"?>
<doc>
  <url>
    <irl>with &amp;#xDC; here</irl>
  </url>
  <check>
    <p>test</p>
  </check>
</doc>

When there is no <add> tag in the XML (the element that gets modified in the handler), the output is OK. So the problem seems to occur only when Twig calls that handler.
Any help is very welcome.
Thanks
Christian

-----------------------------------
--the good, the bad and the physi--
-----------------------------------

Replies are listed 'Best First'.
Re: XML::Twig modify data, and I don't want that
by mirod (Canon) on Sep 16, 2013 at 13:09 UTC

    This is a funny one. And it shows a bug in the module.

    The problem is the line in the handler that creates the element using parse. It creates a new twig, and messes up the options on the "main" one.

    So the workaround is to replace that line with this:

    $elt= XML::Twig->parse( keep_encoding=>1, qq(<p>test</p>) )->root->cut;

    This way keep_encoding is preserved and you get the result you want.

    Then I have to fix this in the module, and hopefully in the next version your code will run as-is (you should really write XML::Twig::Elt->parse instead of using the indirect object syntax, though).
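
For reference, here is a minimal sketch of the question's script with the workaround above applied. A few things are assumptions made for illustration and not part of the original posts: it loads plain XML::Twig instead of XML::Twig::XPath (the script uses no XPath), the handler is renamed check_add, and the input comes from a heredoc instead of __DATA__.

```perl
use strict;
use warnings;
use XML::Twig;    # assumption: plain XML::Twig, since no XPath calls are needed

# Input from the question: Latin-1 declaration plus a numeric character entity.
my $xml = <<'XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<doc><url><irl>with &#xDC; here</irl></url><add></add></doc>
XML

my $twig = XML::Twig->new(
    pretty_print  => 'nice',
    keep_encoding => 1,
    twig_handlers => { add => \&check_add },
);
$twig->parse($xml);

my $out = $twig->sprint;
print $out;

# Handler for <add>: rename it to <check> and append a <p> child.
sub check_add {
    my ( $t, $add ) = @_;
    $add->set_tag('check');

    # Workaround from the reply: build the fragment with a twig that also
    # has keep_encoding set, then detach its root element.
    my $elt = XML::Twig->parse( keep_encoding => 1, '<p>test</p>' )->root->cut;
    $elt->paste( last_child => $add );
}
```

According to the reply, the entity in <irl> should then come through as written instead of being re-escaped.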

      Cheers for that!
      I hadn't thought of using keep_encoding in the parse part.

      And thank you for that great module!

Re: XML::Twig modify data, and I don't want that (preserve entities keep_encoding)
by Anonymous Monk on Sep 16, 2013 at 11:05 UTC
      Thanks, but I can't figure out how this helps.
      Or is the short answer "It's not possible!"?
      -----------------------------------
      --the good, the bad and the physi--
      -----------------------------------
      

        physi:

        I believe the AM means that you can use the keep_encoding option to retain your original string data.

        However, I think the problem is that you're specifying the encoding as ISO-8859-1. If I understand properly, that tells XML::Twig that there's no Unicode data in your input. However, you have an entity in there. Since the ampersand is special, XML::Twig is properly escaping it so that it generates the proper output when the document is decoded later.

        If you specify a Unicode encoding, I expect that XML::Twig will then read the string as a Unicode character, and then emit it properly when it rewrites the file.

        Disclaimer: What I know about unicode you can write on a pinhead with lipstick.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.