PerlMonks  

XML::Twig removing tags from content

by slugger415 (Monk)
on Sep 26, 2011 at 16:06 UTC ( [id://927894]=perlquestion )

slugger415 has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I've been looking through the Monks site and other resources for answers but haven't come up with anything. (xmltwig.com seems to have disappeared.)

I'm looking for a way to delete an XML tag without deleting its contents. For example, I have this:

<li>
  <p>
   Some <b>text</b>
  </p>
</li>

And I want it to be just:

<li>
   Some <b>text</b>
</li>

I tried this but I get an error ("cannot paste an element that belongs to a tree"):
  if($p->parent('li')){
   my(@children) = $p->children;
   $p->delete;
   foreach my $c (@children){
    $c->paste;
   }  
  }

Forgive me as I'm a newbie at Twig and am still trying to figure out how it works.

Many thanks, Scott

Replies are listed 'Best First'.
Re: XML::Twig removing tags from content (XML::Twig)
by toolic (Bishop) on Sep 26, 2011 at 17:07 UTC
    cut_children removes the 'p' tags and gets you closer...
    use warnings;
    use strict;
    use XML::Twig;

    my $str = <<EOF;
    <li>
      <p>
       Some <b>text</b>
      </p>
    </li>
    EOF

    my $t = XML::Twig->new(
        twig_handlers => {li => \&li},
        pretty_print  => 'indented',
    );
    $t->parse($str);
    $t->print();

    sub li {
        my ($t, $elt) = @_;
        for my $p ($elt->children('p')) {
            for my $c ($p->cut_children()) {
                $c->paste($elt);
            }
            $p->delete();
        }
    }

    __END__

    <li><b>text</b> Some </li>

      Hi Canon, getting closer, but for some reason the children are coming out in the wrong order (reversed?):

      Original:

      <li class="c2">
      	<p class="Number1">In the XYZ pane, select the <span class="c1">PageID</span> ruleset and click the <span class="c1">Lock/Unlock ruleset</span> button. Then expand the
      	<span class="c1">PageID</span> ruleset to view the two rules.</p>
      </li>
      
      

      Result:

            <li class="c2"> ruleset to view the two rules.<span class="c1">PageID</span> button. Then expand the
                  <span class="c1">Lock/Unlock ruleset</span> ruleset and click the <span class="c1">PageID</span>In the XYZ pane, select the </li>
      

      Thoughts?

        To answer my own question, using reverse seems to fix the problem:

        for my $p ($elt->children('p')) {
            for my $c (reverse($p->cut_children())) {
                $c->paste($elt);
            }
            $p->delete();
        }

        I don't understand why it's needed, but it works.
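A likely explanation for the reversal: when paste is called without a position, XML::Twig pastes the element as the first child of the target, so each child lands in front of the one pasted before it. The sketch below asks for last_child explicitly, which should keep document order without reverse. The function name unwrap_p_in_li and the sample input are made up for illustration, and the approach assumes the <p> is the only child of its <li> (appended nodes go to the end of the <li>).

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: unwrap every <p> that is a direct child of an <li>.
sub unwrap_p_in_li {
    my ($xml) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            li => sub {
                my ($t, $li) = @_;
                for my $p ($li->children('p')) {
                    # last_child appends each node at the end of <li>,
                    # so the children keep their original order.
                    $_->paste(last_child => $li) for $p->cut_children;
                    $p->delete;
                }
            },
        },
    );
    $twig->parse($xml);
    return $twig->sprint;
}

print unwrap_p_in_li('<li><p>Some <b>text</b> here</p></li>'), "\n";
```

If the <li> holds other children besides the <p>, moving each child with a before position relative to the <p> is probably the safer choice.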

Re: XML::Twig removing tags from content
by Jenda (Abbot) on Sep 26, 2011 at 18:32 UTC
    use strict;
    use XML::Rules;

    my $parser = XML::Rules->new(
        style => 'filter',
        rules => {
            _default => 'raw',
            p        => sub { return $_[1]->{_content} },
        },
    );
    $parser->filter(\*DATA);

    __DATA__
    <li>
      <p>
       Some <b>text</b>
      </p>
    </li>

    If you wanted to remove only <p> directly inside <li>, the code would look like this:

    use strict;
    use XML::Rules;

    my $parser = XML::Rules->new(
        style => 'filter',
        rules => {
            _default => 'raw',
            p        => sub {
                if ($_[2][-1] eq 'li') {
                    return $_[1]->{_content};
                } else {
                    return $_[0] => $_[1];
                }
            },
        },
    );
    $parser->filter(\*DATA);

    __DATA__
    <root>
      <li>
        <p>
         Some <b>text</b>
        </p>
      </li>
      <p>
       Other <b>text</b>
      </p>
    </root>

    The $_[2] is a reference to an array containing the names of the opened tags, so the condition just checks whether the enclosing tag of the currently processed <p> is <li>.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

Re: XML::Twig removing tags from content
by mirod (Canon) on Sep 27, 2011 at 07:49 UTC

    First, a few comments on your code: delete completely deletes the element, including its descendants. So you should do this after moving the children. Then you cannot paste an element that's already part of a tree. You need to cut it, then paste it. Or in short, to move it. move, like paste, needs a position and a referent as arguments: $c->move( before => $p) would work.

    Also, I am not sure about the test on $p->parent('li'): it looks for an li ancestor of $p, which may produce false positives (a p buried in a table within an li), so you probably want $p->parent->tag eq 'li', or $p->parent->is( 'li').
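To make the difference between the two tests concrete, here is a small sketch. The helper li_checks and the sample document are invented for illustration. For each <p> it records whether parent('li') finds an li anywhere up the tree and whether the immediate parent is an li.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: report both checks for every <p> in the document.
sub li_checks {
    my ($xml) = @_;
    my $twig = XML::Twig->new;
    $twig->parse($xml);

    my @report;
    for my $p ($twig->root->descendants('p')) {
        push @report, {
            text         => $p->text,
            # parent('li') walks up until it finds an li ancestor
            li_ancestor  => $p->parent('li')     ? 1 : 0,
            # parent->is('li') only looks at the immediate parent
            li_is_parent => $p->parent->is('li') ? 1 : 0,
        };
    }
    return @report;
}

my $xml = '<ul>'
        . '<li><table><tr><td><p>nested</p></td></tr></table></li>'
        . '<li><p>direct</p></li>'
        . '</ul>';

printf "%-7s parent('li')=%d  parent->is('li')=%d\n",
    @{$_}{qw(text li_ancestor li_is_parent)} for li_checks($xml);
```

The "nested" paragraph should pass the first test but fail the second, which is the false positive described above.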

    That said, try $p->erase if( $p->parent->is( 'li')), that might just do what you want.
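For reference, a minimal sketch of the erase approach as a complete script. The function name erase_p_in_li and the sample input are assumptions made for the example. erase removes the element and leaves its children in its place, so no cut/paste loop is needed.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: drop <p> tags that sit directly inside an <li>,
# keeping their content where it was.
sub erase_p_in_li {
    my ($xml) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            p => sub {
                my ($t, $p) = @_;
                my $parent = $p->parent;    # undef if <p> is the root
                $p->erase if $parent && $parent->is('li');
            },
        },
    );
    $twig->parse($xml);
    return $twig->sprint;
}

print erase_p_in_li(
    '<root><li><p>Some <b>text</b></p></li><p>Other <b>text</b></p></root>'
), "\n";
```

Only the <p> inside the <li> should lose its tags; the top-level <p> is left alone.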

      Thanks -- yeah I noticed that 'parent' and 'ancestor' seem to be the same thing -- thanks for clarifying how to eliminate the false positives. Scott
Re: XML::Twig removing tags from content
by choroba (Cardinal) on Sep 27, 2011 at 13:37 UTC
    I usually use XML::XSH2 for XML manipulation. For a task like yours, I'd use something like
    for //li/* mv text() replace . ;
    Update: Ouch, this works better:
    for //li/* { xmove (text()|*) after . ; rm . ; }
    or, even shorter:
    for //li/* xmove (text()|*) replace . ;
Re: XML::Twig removing tags from content
by Lotus1 (Vicar) on Sep 27, 2011 at 16:56 UTC

    Here is one way to do this with XML::LibXML.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use XML::LibXML;

    my $xmlfile = 'test.xml';
    my $parser  = XML::LibXML->new();
    my $doc     = $parser->parse_file($xmlfile);

    my ($li_node)   = $doc->findnodes("//li");
    my ($p_element) = $doc->findnodes("//li/p");

    foreach ($p_element->childNodes) {
        $li_node->appendChild( $_ );
    }
    $li_node->removeChild( $p_element );

    print $doc->toString;

    Contents of test.xml:

    <?xml version="1.0" encoding="utf-8"?>
    <li> <p> Some <b>text<c1> here</c1></b> </p> </li>

    Result:

    <?xml version="1.0" encoding="utf-8"?>
    <li> Some <b>text<c1> here</c1></b> </li>
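The script above handles the first li/p pair only and appends the moved nodes at the end of the <li>. As a hedged variation, the sketch below loops over every //li/p match and uses insertBefore so the content stays at the position the <p> occupied. The function name unwrap_li_p and the inline sample document are assumptions for the example, not part of the original script.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: unwrap every <p> that is a direct child of an <li>.
sub unwrap_li_p {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);

    for my $p ($doc->findnodes('//li/p')) {
        my $li = $p->parentNode;
        # Move each child to just before <p>, which keeps document order.
        $li->insertBefore($_, $p) for $p->childNodes;
        $li->removeChild($p);
    }
    return $doc->documentElement->toString;
}

print unwrap_li_p(
    '<ul><li><p>Some <b>text</b></p></li><li><p>More</p></li></ul>'
), "\n";
```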
