PerlMonks  

Twig delete not deleting the entire section?

by paisani (Acolyte)
on Mar 14, 2018 at 16:19 UTC ( [id://1210894]=perlquestion )

paisani has asked for the wisdom of the Perl Monks concerning the following question:

use strict;
use warnings;
use XML::Twig;

my $xml = q(
<sites>
  <site siteid="ONE">
    <name>name1</name>
    <address>address1</address>
    <contact>contact1</contact>
  </site>
  <site siteid="TWO">
    <name>name2</name>
    <address>address2</address>
    <contact>contact2</contact>
  </site>
</sites>
);

my %handlers = (
    'name[string() =~ /name2/]' => sub {
        my ($twig, $cnt) = @_;
        $cnt->parent->delete;
    }
);

my $twig = new XML::Twig( PrettyPrint => 'indented', twig_handlers => \%handlers );
$twig->parse($xml);
print $twig->sprint;
Gives me this output -
<sites>
  <site siteid="ONE">
    <name>name1</name>
    <address>address1</address>
    <contact>contact1</contact>
  </site>
  <address>address2</address>
  <contact>contact2</contact>
</sites>
What am I missing?? I wanted to delete the entire section for siteid=TWO.

Replies are listed 'Best First'.
Re: Twig delete not deleting the entire section?
by choroba (Cardinal) on Mar 14, 2018 at 16:46 UTC
    When you're deleting the node, the remaining address and contact elements haven't been parsed yet, so they aren't removed.

    Create a handler for the node you want to remove. Unfortunately, you can't use something like

    site[name[string() =~ /name2/]]
    because XML::Twig doesn't support the full XPath syntax:

    > XPath expressions are limited to using the child and descendant axis (indeed you can't specify an axis), and predicates cannot be nested.

    You can do part of the work in Perl, though:

    my %handlers = (
        'site' => sub {
            my ($twig, $cnt) = @_;
            $cnt->delete if grep $_->text =~ /name2/, $cnt->children('name');
        }
    );

    BTW, in XML::XSH2, you'd just

    delete /sites/site[xsh:match(name,"name2")] ;
    ($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord }map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,
      That would be a straight-up bug, wouldn't it? But I'm not sure that's right. I think we may be missing something: first, it does delete the closing /site tag; second, the code below should then have worked, because I'm specifically processing the children -
      use strict;
      use warnings;
      use XML::Twig;

      my $xml = q(
      <sites>
        <site siteid="ONE">
          <name>name1</name>
          <address>address1</address>
          <contact>contact1</contact>
        </site>
        <site siteid="TWO">
          <name>name2</name>
          <address>address2</address>
          <contact>contact2</contact>
        </site>
      </sites>
      );

      my %handlers = (
          'name[string() =~ /name2/]' => sub {
              my ($twig, $cnt) = @_;
              my $parent = $cnt->parent;
              foreach ($parent->children) {
                  print "Deleting: " . $_->text . "\n";
                  $_->delete;
              }
              $parent->delete;
          }
      );

      my $twig = new XML::Twig( PrettyPrint => 'indented', twig_handlers => \%handlers );
      $twig->parse($xml);
      print $twig->sprint;
      Gave the following output -
      Deleting: name2
      <sites>
        <site siteid="ONE">
          <name>name1</name>
          <address>address1</address>
          <contact>contact1</contact>
        </site>
        <address>address2</address>
        <contact>contact2</contact>
      </sites>

        When the handler triggers on name only

        <site siteid="TWO"> <name>name2</name>

        has been parsed so name is the only child the parent has.

        Add a $cnt->parent->print statement to see it. If you change the order of elements your original code works.

        <sites>
          <site siteid="ONE">
            <name>name1</name>
            <address>address1</address>
            <contact>contact1</contact>
          </site>
          <site siteid="TWO">
            <address>address2</address>
            <contact>contact2</contact>
            <name>name2</name>
          </site>
        </sites>

        If the handler is on site like choroba said, that is triggered after all the children have been parsed.

        poj
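
To make the timing described above visible, here is a minimal sketch that records which children the parent has when each handler fires. The variable names @seen_at_name and @seen_at_site are made up for this illustration, and the XML is a cut-down version of the sample from the question.

```perl
use strict;
use warnings;
use XML::Twig;

# Cut-down copy of the sample document from the question.
my $xml = q(<sites>
  <site siteid="TWO">
    <name>name2</name>
    <address>address2</address>
    <contact>contact2</contact>
  </site>
</sites>);

# Hypothetical names, used only to record what each handler can see.
my (@seen_at_name, @seen_at_site);

my $twig = XML::Twig->new(
    twig_handlers => {
        # Fires when </name> is reached: later siblings are not parsed yet.
        name => sub {
            my ($t, $name) = @_;
            @seen_at_name = map { $_->tag } $name->parent->children;
        },
        # Fires when </site> is reached: all children have been parsed.
        site => sub {
            my ($t, $site) = @_;
            @seen_at_site = map { $_->tag } $site->children;
        },
    },
);
$twig->parse($xml);

print "children seen in name handler: @seen_at_name\n";
print "children seen in site handler: @seen_at_site\n";
```

If the explanation in this subthread holds, the name handler should report only name, while the site handler should report all three children.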
Re: Twig delete not deleting the entire section?
by Discipulus (Canon) on Mar 14, 2018 at 19:56 UTC
    Hello paisani,

    Indeed, as the wise monks above (choroba and poj) already said, the (sub)tree is not fully parsed (I never realized this, btw) at the moment you catch <name>name2</name>. If you instead put your logic on the site element, it works as you expect, just by doing $_->delete if $_->children ('name[string() =~ /name2/]');

    use strict;
    use warnings;
    use XML::Twig;

    my $xml = q(
    <sites>
      <site siteid="ONE">
        <name>name1</name>
        <address>address1</address>
        <contact>contact1</contact>
      </site>
      <site siteid="TWO">
        <name>name2</name>
        <address>address2</address>
        <contact>contact2</contact>
      </site>
    </sites>
    );

    my %handlers = (
        site => sub {
            # my ($twig, $cnt) = @_;
            # $cnt is not needed: iirc $_ is mapped to $_[1]
            $_->delete if $_->children ('name[string() =~ /name2/]');
        }
    );

    my $twig = new XML::Twig( PrettyPrint => 'indented', twig_handlers => \%handlers );
    $twig->parse($xml);
    $twig->print;

    # output:
    # <sites>
    #   <site siteid="ONE">
    #     <name>name1</name>
    #     <address>address1</address>
    #     <contact>contact1</contact>
    #   </site>
    # </sites>

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Twig delete not deleting the entire section?
by Anonymous Monk on Mar 15, 2018 at 12:25 UTC
    I think that you're being caught by the fact that Twig is, by design, an incremental parser. It never builds an in-memory data structure as LibXML2 does. But maybe in this case this is what you need to be doing. Twig will call a handler as soon as it recognizes the need for it and before it has processed anything else in the input. It is therefore good for processing arbitrarily large XML files without a correspondingly-high memory burden, but it is not well-suited to structure manipulation or modification for the reasons stated. If you need to modify, use LibXML2 which will parse the entire thing into a data structure that you can then manipulate and rewrite.
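
For anyone who wants to try the parse-everything-first route suggested above, here is a minimal sketch. It assumes the Perl module meant by "LibXML2" is XML::LibXML (the Perl binding to the libxml2 library); the XML is a cut-down version of the sample from the question. Full XPath is available there, so the nested predicate can be written directly.

```perl
use strict;
use warnings;
use XML::LibXML;

# Cut-down copy of the sample document from the question.
my $xml = q(<sites>
  <site siteid="ONE">
    <name>name1</name>
    <address>address1</address>
  </site>
  <site siteid="TWO">
    <name>name2</name>
    <address>address2</address>
  </site>
</sites>);

# Parse the whole document into a DOM tree first.
my $doc = XML::LibXML->load_xml( string => $xml );

# Select every site whose name child contains "name2" and detach it.
for my $site ( $doc->findnodes('/sites/site[contains(name, "name2")]') ) {
    $site->unbindNode;
}

my $out = $doc->toString;
print $out;
```

The whitespace that surrounded the removed element stays in the tree, so the serialized result may contain a blank line where the site used to be.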
Re: Twig delete not deleting the entire section?
by paisani (Acolyte) on Mar 14, 2018 at 20:35 UTC

    Under what circumstance is this desired behavior? This just leaves orphans.

    I even specifically processed each of the children as shown above and it still didn't work. Further, it deleted the closing /site tag even though, by your explanations, that hadn't been parsed yet?

    By the powers vested in me as a Perl Monk supplicant, I call a bug!

      Children handlers get triggered before parent handlers -- by design -- as soon as an element is completely parsed, its handler is triggered -- it's depth first -- because that's how XML works

      If you are trying to delete children from parents, based on criteria in the children, you have to do it from parent handlers; simply use findnodes there instead of child handlers

      So sorry, but it's not a bug :)
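
One way to read the findnodes suggestion is to skip handlers entirely: let XML::Twig parse the full document, then select the site elements and decide in plain Perl which ones to delete. The following is a sketch under that assumption, using a cut-down version of the sample from the question; $result is a name chosen for this example.

```perl
use strict;
use warnings;
use XML::Twig;

# Cut-down copy of the sample document from the question.
my $xml = q(<sites>
  <site siteid="ONE">
    <name>name1</name>
    <address>address1</address>
  </site>
  <site siteid="TWO">
    <name>name2</name>
    <address>address2</address>
  </site>
</sites>);

# No handlers: the whole tree is available once parse() returns.
my $twig = XML::Twig->new( pretty_print => 'indented' );
$twig->parse($xml);

# Pick the sites with a simple path, then test the name child in Perl.
for my $site ( $twig->findnodes('//site') ) {
    $site->delete if $site->first_child_text('name') =~ /name2/;
}

my $result = $twig->sprint;
print $result;
```

This holds the whole document in memory, so it trades the streaming benefit of handlers for simpler logic.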
