mldvx4 has asked for the wisdom of the Perl Monks concerning the following question:
Thanks for the previous.
I have a question about either HTML::TreeBuilder::XPath or HTML::Element, and the interaction between them. I would like to manipulate the content of an element while leaving all its children in place. I have not found a way to do that, because replace_with() appears to escape the < and > signs automatically and unavoidably. The example below uses ~literal, but I've also tried creating a new element. Either way, the child elements within the selected element get escaped despite my best efforts. How would it be possible to do something like the following (using a different workflow if necessary) such that the tags for the child elements remain intact and unescaped?
    #!/usr/bin/perl

    use HTML::TreeBuilder::XPath;
    use HTML::Element;
    use warnings;
    use strict;

    my $xhtml = HTML::TreeBuilder::XPath->new;
    $xhtml->implicit_tags(1);
    $xhtml->no_space_compacting(1);

    $xhtml->parse_file(\*DATA)
        or die("Could not parse file handle for 'DATA' : $!\n");

    for my $item ($xhtml->findnodes('//div/ul/li')) {
        my $li = $item->as_XML;
        $li =~ s/^\s+//;

        # ... omitting rest of the stuff which happens to $li ...

        my $new = HTML::Element->new('~literal', 'text' => $li);
        $item->replace_with($new);
    }

    print $xhtml->as_XML_indented;

    $xhtml->delete;

    exit(0);

    __DATA__
    <html>
    <head>
    <title>Foo Bar</title>
    </head>
    <body>
    <div><a href=" http://foo.example.com/ ">Foo Bar</a>
    <ul>
    <li> foo foo foo foo <em>bar</em> foo foo foo foo foo </li></ul></div>
    <div><a href=" http://bar.example.com/ ">Bar Foo</a>
    <ul>
    <li> foo foo foo foo <em>bar</em> foo foo foo foo foo
    <ul>
    <li>alpha</li>
    <li>b<em>et</em>a</li>
    <li>gamma</li>
    </ul>
    </li></ul></div>
    </body>
    </html>
The output I get is as follows:
    <html>
    <head>
    <title>Foo Bar</title>
    </head>
    <body>
    <div><a href=" http://foo.example.com/ ">Foo Bar</a>
    <ul><li> foo foo foo foo <em>bar</em> foo foo foo foo foo </li>
    </ul>
    </div>
    <div><a href=" http://bar.example.com/ ">Bar Foo</a>
    <ul><li> foo foo foo foo <em>bar</em> foo foo foo foo foo <ul><li>alpha</li><li>b<em>et</em>a</li><li>gamma</li></ul></li>
    </ul>
    </div>
    </body>
    </html>
The output I would like to get instead would look like this:
    <html>
    <head>
    <title>Foo Bar</title>
    </head>
    <body>
    <div><a href=" http://foo.example.com/ ">Foo Bar</a>
    <ul><li>foo foo foo foo <em>bar</em> foo foo foo foo foo </li>
    </ul>
    </div>
    <div><a href=" http://bar.example.com/ ">Bar Foo</a>
    <ul><li>foo foo foo foo <em>bar</em> foo foo foo foo foo <ul><li>alpha</li><li>b<em>et</em>a</li><li>gamma</li></ul></li>
    </ul>
    </div>
    </body>
    </html>
I'm not sure if HTML::TreeBuilder::XPath can be made to work like that. If it can, what has to change?
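For comparison, here is a minimal sketch of one direction that might avoid the round trip through as_XML altogether: editing the text children of each li in place with HTML::Element's content_refs_list, so the child elements are never serialized and never re-inserted as a string. The cut-down markup and the variable names ($html, $tree, $out) are assumptions made for illustration, and only the whitespace trimming is shown, since the rest of the processing is omitted above. Whether this fits depends on what else has to happen to the content.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Cut-down sample markup (an assumption, modelled on the DATA section above).
my $html = <<'HTML';
<html><head><title>Foo Bar</title></head>
<body>
<div><ul><li>
   foo foo <em>bar</em> foo
   <ul><li>alpha</li><li>b<em>et</em>a</li></ul>
</li></ul></div>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->no_space_compacting(1);
$tree->parse($html);
$tree->eof;

for my $li ($tree->findnodes('//div/ul/li')) {
    # content_refs_list returns one reference per direct child of the element.
    my @refs = $li->content_refs_list;
    next unless @refs;

    # Child elements are objects; plain text children are bare strings.
    # Only the leading text child is edited, so the <em> and nested <ul>
    # elements stay in the tree untouched.
    ${ $refs[0] } =~ s/^\s+// unless ref ${ $refs[0] };
}

my $out = $tree->as_XML_indented;
print $out;

$tree->delete;
```

Because the tree is modified in place, nothing is handed back to the serializer as a string, so there is no markup left for it to escape. The same loop could walk all of @refs if every text child needs changing.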
Replies are listed 'Best First'.

Re: Avoiding escaped child elements with HTML::TreeBuilder::XPath or HTML::Element
by haukex (Archbishop) on Nov 15, 2021 at 10:39 UTC
Re: Avoiding escaped child elements with HTML::TreeBuilder::XPath or HTML::Element
by marto (Cardinal) on Nov 15, 2021 at 10:44 UTC
by mldvx4 (Friar) on Nov 15, 2021 at 12:39 UTC
by haukex (Archbishop) on Nov 15, 2021 at 12:50 UTC
Re: Avoiding escaped child elements with HTML::TreeBuilder::XPath or HTML::Element
by tangent (Parson) on Nov 15, 2021 at 14:19 UTC
Re: Avoiding escaped child elements with HTML::TreeBuilder::XPath or HTML::Element
by Anonymous Monk on Nov 15, 2021 at 12:45 UTC