XML::Twig removing tags from content

by slugger415 (Scribe)
on Sep 26, 2011 at 16:06 UTC (#927894, perlquestion)
slugger415 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I've been looking through the Monks site and other resources for answers but haven't come up with anything. (xmltwig.com seems to have disappeared.)

I'm looking for a way to delete an XML tag without deleting its contents. For example, I have this:

<li>
  <p>
   Some <b>text</b>
  </p>
</li>

And I want it to be just:

<li>
   Some <b>text</b>
</li>

I tried this, but I get an error ("cannot paste an element that belongs to a tree"):

  if($p->parent('li')){
   my(@children) = $p->children;
   $p->delete;
   foreach my $c (@children){
    $c->paste;
   }  
  }

Forgive me as I'm a newbie at Twig and am still trying to figure out how it works.

Many thanks, Scott

Re: XML::Twig removing tags from content (XML::Twig)
by toolic (Bishop) on Sep 26, 2011 at 17:07 UTC
    cut_children removes the 'p' tags and gets you closer...
    use warnings;
    use strict;
    use XML::Twig;

    my $str = <<EOF;
    <li>
      <p>
       Some <b>text</b>
      </p>
    </li>
    EOF

    my $t = XML::Twig->new(
        twig_handlers => {li => \&li},
        pretty_print => 'indented',
    );
    $t->parse($str);
    $t->print();

    sub li {
        my ($t, $elt) = @_;
        for my $p ($elt->children('p')) {
            for my $c ($p->cut_children()) {
                $c->paste($elt);
            }
            $p->delete();
        }
    }

    __END__

    <li><b>text</b> Some </li>

      Hi Canon, getting closer, but for some reason the children are coming out in the wrong order (reversed?):

      Original:

      <li class="c2">
      	<p class="Number1">In the XYZ pane, select the <span class="c1">PageID</span> ruleset and click the <span class="c1">Lock/Unlock ruleset</span> button. Then expand the
      	<span class="c1">PageID</span> ruleset to view the two rules.</p>
      </li>
      
      

      Result:

            <li class="c2"> ruleset to view the two rules.<span class="c1">PageID</span> button. Then expand the
                  <span class="c1">Lock/Unlock ruleset</span> ruleset and click the <span class="c1">PageID</span>In the XYZ pane, select the </li>
      

      Thoughts?

        To answer my own question, using reverse seems to fix the problem:

        for my $p ($elt->children('p')) {
            for my $c (reverse($p->cut_children())) {
                $c->paste($elt);
            }
            $p->delete();
        }

        I don't understand why it's needed, but it works.
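On why reverse is needed: paste defaults to the first_child position, so each child pasted in document order lands in front of the one pasted before it. A minimal sketch of an alternative, assuming a small made-up input string, is to keep the original order and name the position explicitly:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input, not taken from the thread.
my $twig = XML::Twig->new;
$twig->parse('<li><p>Some <b>text</b> here</p></li>');

my $li = $twig->root;
for my $p ($li->children('p')) {
    for my $c ($p->cut_children) {
        # Appending keeps document order; the default position
        # ('first_child') would put each child in front of the last one.
        $c->paste(last_child => $li);
    }
    $p->delete;
}

my $result = $twig->sprint;
print "$result\n";
```

Note that last_child appends after any siblings that followed the p, so this only preserves order when the p is the last (or only) child of the li.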

Re: XML::Twig removing tags from content
by Jenda (Abbot) on Sep 26, 2011 at 18:32 UTC
    use strict;
    use XML::Rules;

    my $parser = XML::Rules->new(style => 'filter',
      rules => {
        _default => 'raw',
        p => sub { return $_[1]->{_content}},
    });

    $parser->filter(\*DATA);

    __DATA__
    <li>
      <p>
       Some <b>text</b>
      </p>
    </li>

    If you wanted to remove only <p> directly inside <li>, the code would look like this:

    use strict;
    use XML::Rules;

    my $parser = XML::Rules->new(style => 'filter',
      rules => {
        _default => 'raw',
        p => sub {
          if ($_[2][-1] eq 'li') {
            return $_[1]->{_content};
          } else {
            return $_[0] => $_[1];
          }
        },
    });

    $parser->filter(\*DATA);

    __DATA__
    <root>
    <li>
      <p>
       Some <b>text</b>
      </p>
    </li>
    <p>
     Other <b>text</b>
    </p>
    </root>

    The $_[2] is a reference to an array containing the names of the opened tags, so the condition just checks whether the enclosing tag of the currently processed <p> is <li>.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

Re: XML::Twig removing tags from content
by mirod (Canon) on Sep 27, 2011 at 07:49 UTC

    First, a few comments on your code: delete completely deletes the element, including its descendants. So you should do this after moving the children. Then you cannot paste an element that's already part of a tree. You need to cut it, then paste it. Or in short, to move it. move, like paste, needs a position and a referent as arguments: $c->move( before => $p) would work.
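The move-then-delete order described above could look roughly like this. It is only a sketch: the input string is made up for illustration, and the direct-parent check uses tag rather than a condition on parent.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input, not taken from the thread.
my $twig = XML::Twig->new;
$twig->parse('<ul><li><p>Some <b>text</b></p></li></ul>');

for my $p ($twig->root->descendants('p')) {
    # Only unwrap a <p> that sits directly inside an <li>.
    next unless $p->parent->tag eq 'li';

    # Move each child (text included) in front of the <p>, in document
    # order, and only then delete the now-empty <p>.
    $_->move(before => $p) for $p->children;
    $p->delete;
}

my $result = $twig->sprint;
print "$result\n";
```

Because every child is placed immediately before the p, the content stays where the p was, even if the li has other children around it.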

    Also, I am not sure about the test on $p->parent('li'): it will look for an li ancestor of $p, which may yield false positives (a p buried in a table within a li), so you probably want $p->parent->tag eq 'li', or $p->parent->is( 'li').
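To make the false positive concrete, here is a small sketch with a made-up document in which the p is buried in a table inside the li. The variable names are only for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input: the <p> is a descendant of <li>, not a child.
my $twig = XML::Twig->new;
$twig->parse('<li><table><tr><td><p>buried</p></td></tr></table></li>');

my $p = $twig->first_elt('p');

# With a condition, parent() walks up until some ancestor matches.
my $via_condition = $p->parent('li') ? 'li found' : 'no li';

# Without a condition, parent() is the direct parent only.
my $direct_parent = $p->parent->tag;

print "parent('li'): $via_condition\n";
print "direct parent: $direct_parent\n";
```

Here the conditional form still finds the li, while the direct parent is the td, so only the second test tells you whether the p sits immediately inside a li.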

    That said, try $p->erase if( $p->parent->is( 'li')), that might just do what you want.
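A minimal sketch of the erase approach, assuming a made-up input string and using a 'li/p' handler path in place of the explicit parent test, so that only a p directly inside a li is unwrapped:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input, not taken from the thread.
my $xml = '<root><li><p>Some <b>text</b></p></li><p>Other <b>text</b></p></root>';

my $twig = XML::Twig->new(
    twig_handlers => {
        # Fires only for a <p> whose direct parent is <li>.
        # erase removes the element and leaves its children in its place.
        'li/p' => sub { $_->erase },
    },
);
$twig->parse($xml);

my $result = $twig->sprint;
print "$result\n";
```

The p outside the li is left alone, which is the same distinction Jenda's second XML::Rules example makes.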

      Thanks -- yeah I noticed that 'parent' and 'ancestor' seem to be the same thing -- thanks for clarifying how to eliminate the false positives. Scott
Re: XML::Twig removing tags from content
by choroba (Canon) on Sep 27, 2011 at 13:37 UTC
    I usually use XML::XSH2 for XML manipulation. For a task like yours, I'd use something like
    for //li/* mv text() replace . ;
    Update: Ouch, this works better:
    for //li/* { xmove (text()|*) after . ; rm . ; }
    or, even shorter:
    for //li/* xmove (text()|*) replace . ;
Re: XML::Twig removing tags from content
by Lotus1 (Chaplain) on Sep 27, 2011 at 16:56 UTC

    Here is one way to do this with XML::LibXML.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use XML::LibXML;

    my $xmlfile = 'test.xml';
    my $parser  = XML::LibXML->new();
    my $doc     = $parser->parse_file($xmlfile);

    my ($li_node)   = $doc->findnodes("//li");
    my ($p_element) = $doc->findnodes("//li/p");

    foreach ($p_element->childNodes) {
        $li_node->appendChild( $_ );
    }
    $li_node->removeChild( $p_element );

    print $doc->toString;

    Contents of test.xml:

    <?xml version="1.0" encoding="utf-8"?>
    <li>
      <p>
       Some <b>text<c1> here</c1></b>
      </p>
    </li>

    Result:

    <?xml version="1.0" encoding="utf-8"?>
    <li> Some <b>text<c1> here</c1></b> </li>
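The XML::LibXML version above handles only the first li/p pair and appends the children at the end of the li. A possible generalization, sketched here with a made-up input string, loops over every match and inserts the children in front of the p, so content that follows the p keeps its position:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input, not taken from the thread.
my $doc = XML::LibXML->load_xml(
    string => '<ul><li><p>Some <b>text</b></p> tail</li><li><p>Two</p></li></ul>',
);

for my $p ($doc->findnodes('//li/p')) {
    my $li = $p->parentNode;

    # Put each child where the <p> currently is, in document order,
    # then drop the emptied <p>.
    $li->insertBefore($_, $p) for $p->childNodes;
    $li->removeChild($p);
}

my $result = $doc->documentElement->toString;
print "$result\n";
```

Printing the document element rather than the whole document just leaves out the XML declaration; $doc->toString works as in the original.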
