PerlMonks  

XML::Twig modify data, and I don't want that

by physi (Friar)
on Sep 16, 2013 at 10:53 UTC
physi has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm currently going mad over a German umlaut problem with XML::Twig. Here's the code:
use strict;
use warnings;
use XML::Twig::XPath;

$/ = undef;
my $data = <DATA>;

my $twig = XML::Twig::XPath->new(
    pretty_print  => 'nice',
    keep_encoding => 1,
    twig_handlers => { 'add' => \&_checkAdd, },
);
$twig->parse($data); # build ito
my $out = $twig->sprint;
print $out;

sub _checkAdd {
    my ($t, $addAttr) = @_;
    my $elt;
    $addAttr->set_tag('check');
    $elt = parse XML::Twig::Elt( qq(<p>test</p>) );
    $elt->paste('last_child', $addAttr);
}

__DATA__
<?xml version="1.0" encoding="ISO-8859-1"?>
<doc><url><irl>with &#xDC; here</irl></url><add></add></doc>
I'd like to get this output:
<?xml version="1.0" encoding="ISO-8859-1"?>
<doc>
  <url>
    <irl>with &#xDC; here</irl>
  </url>
  <check>
    <p>test</p>
  </check>
</doc>
but I get:
<?xml version="1.0" encoding="ISO-8859-1"?>
<doc>
  <url>
    <irl>with &amp;#xDC; here</irl>
  </url>
  <check>
    <p>test</p>
  </check>
</doc>

When there is no <add> tag in the XML (the element that the handler modifies), the output is OK. So the problem seems to occur only when the twig calls that handler!?
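A simplified model of what the two outputs suggest (an assumption about the mechanism, not XML::Twig's actual code): as the expected output implies, with keep_encoding => 1 the twig keeps character data exactly as it appeared in the input, so the text of <irl> still holds the literal characters &#xDC;. If that text is later treated as raw data and escaped on output, its & is escaped a second time. The helper escape_text is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: escape character data the way an XML writer does
# when it believes the text is raw (not yet escaped).
sub escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must run first, or it would re-escape the others
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# With keep_encoding the stored text still contains the entity reference.
my $kept = 'with &#xDC; here';

# While the option is honoured, the text is written back unchanged.
my $ok = $kept;

# Once the option is lost, the text is escaped again and the '&' of the
# entity reference turns into '&amp;', matching the unwanted output above.
my $broken = escape_text($kept);

print "$ok\n";      # with &#xDC; here
print "$broken\n";  # with &amp;#xDC; here
```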
Any help is very welcome.
Thanks
Christian

-----------------------------------
--the good, the bad and the physi--
-----------------------------------

Re: XML::Twig modify data, and I don't want that (preserve entities keep_encoding)
by Anonymous Monk on Sep 16, 2013 at 11:05 UTC
      Thanks, but I can't figure out how this helps.
      Or is the short answer "It's not possible!"?
      

        physi:

        I believe the AM means that you can use the keep_encoding option to retain your original string data.

        However, I think the problem is that you're specifying the encoding as ISO-8859-1. If I understand properly, that tells XML::Twig that there's no Unicode data in your input, yet you have a character entity in there. Since the ampersand is special, XML::Twig escapes it so that decoding the output later yields the original text.

        If you specify a Unicode encoding, I expect XML::Twig will read the entity as a Unicode character and emit it properly when it rewrites the file.

        Disclaimer: What I know about Unicode you can write on a pinhead with lipstick.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: XML::Twig modify data, and I don't want that
by mirod (Canon) on Sep 16, 2013 at 13:09 UTC

    This is a funny one, and it reveals a bug in the module.

    The problem is the line in the handler that creates the element using parse. Parsing creates a new twig, and that messes up the options on the "main" one.

    So the workaround is to replace that line with this:

    $elt= XML::Twig->parse( keep_encoding=>1, qq(<p>test</p>) )->root->cut;

    This way keep_encoding is preserved and you get the result you want.

    Then I have to fix this in the module; hopefully in the next version your code will run as-is (you should really write XML::Twig::Elt->parse instead of using the indirect object syntax, though).
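Putting the thread together, here is a sketch of the question's script with mirod's workaround applied in the handler. Assumptions: the input is kept in a heredoc instead of __DATA__, and plain XML::Twig is used because the snippet uses no XPath (XML::Twig::XPath is a subclass of XML::Twig, so swapping it back in should not change the behaviour). Whether the workaround is still needed depends on your XML::Twig version, since mirod plans to fix the bug in the module.

```perl
use strict;
use warnings;
use XML::Twig;

# Input from the question, held in a string rather than __DATA__.
my $data = <<'XML';
<?xml version="1.0" encoding="ISO-8859-1"?>
<doc><url><irl>with &#xDC; here</irl></url><add></add></doc>
XML

my $twig = XML::Twig->new(
    pretty_print  => 'nice',
    keep_encoding => 1,
    twig_handlers => { add => \&_checkAdd },
);
$twig->parse($data);

my $out = $twig->sprint;
print $out;

sub _checkAdd {
    my ($t, $add) = @_;
    $add->set_tag('check');

    # Workaround from this thread: parse the fragment with keep_encoding => 1
    # so building it does not reset that option on the main twig, then cut
    # the new root out of its temporary twig before pasting it here.
    my $elt = XML::Twig->parse(keep_encoding => 1, '<p>test</p>')->root->cut;
    $elt->paste(last_child => $add);
}
```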

      Cheers for that!
      I hadn't thought of using keep_encoding in the parse call.

      And thank you for that great module!

Node Type: perlquestion [id://1054275]
Approved by Corion