PerlMonks
Re^4: XML::LibXML::XPathContext->string_value - should ALL of the descendant's text be there?
by bobn (Chaplain) on Aug 10, 2020 at 06:01 UTC ( [id://11120533]=note )
$node->string_value();

Yes, that's true. I can't reconstruct exactly what happened when I wrote this code; I got into the documentation for an apparently unrelated module, where string_value was documented. I'm tempted to erase the whole thing.

However, this code of yours:

my @texts = map { $_->data } $node->findnodes('./text()');

actually shows *exactly* what I'm talking about: the "innermost_text" appears ONLY in the output for its innermost containing element, which is the last <div> element/node/whatever that you found with $doc->findnodes('//*'). It does not show up in every element that it is nested inside, like <body> or <html>. That's what I was looking for! Thank you!!!

What I was working on: I've been doing some Python XHTML parsing, and over there the documentation talks about "tail text". It's really weird: it says that text following an element's closing tag belongs to *that* element as its "tail text", NOT to the element it is inside of. If you care, go to https://lxml.de/tutorial.html and search for "document-style". Anyhow, I was testing in Perl to see if it had anything like that, and I don't see it.

As for SAX parsers, I've used something similar: HTML::Parser and XML::Parser work in a comparable way, I think, in that you create callbacks for events that happen during parsing. Having discovered XPath, the event-driven approach now seems crude and primitive to me, though I'm sure there are still places where it applies.

--Bob Niederman, All code given here is UNTESTED unless otherwise stated.
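For anyone comparing the three views of text discussed in this thread, here is a minimal sketch, assuming XML::LibXML is installed. The sample document and the helper names direct_text and tail_text are hypothetical, made up for illustration. It uses textContent (a documented XML::LibXML::Node method) for the "all descendant text" view, which matches the XPath string-value of an element, rather than relying on a string_value method. tail_text is NOT an XML::LibXML API; it only emulates lxml's "tail" by looking at the text node that follows an element's closing tag, which in the DOM is that element's next sibling and a child of the enclosing element.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document: "outer_text", "innermost_text" and
# "after_inner" are placeholder strings for illustration only.
my $xml = '<html><body><div>outer_text<div>innermost_text</div>'
        . 'after_inner</div></body></html>';
my $doc = XML::LibXML->load_xml(string => $xml);

# Hypothetical helper: text nodes that are *direct* children of $node,
# i.e. the ./text() approach quoted above, joined with '|'.
sub direct_text {
    my ($node) = @_;
    return join '|', map { $_->data } $node->findnodes('./text()');
}

# Hypothetical helper emulating lxml's "tail": the text right after
# $node's closing tag. In the DOM this is just $node's next sibling,
# owned by the parent element. (Comment and CDATA nodes subclass
# XML::LibXML::Text, so they would also pass this isa() check.)
sub tail_text {
    my ($node) = @_;
    my $sib = $node->nextSibling;
    return (defined $sib && $sib->isa('XML::LibXML::Text')) ? $sib->data : '';
}

# Walk every element in document order and show all three views.
for my $node ($doc->findnodes('//*')) {
    printf "%-4s direct=[%s] all=[%s] tail=[%s]\n",
        $node->nodeName, direct_text($node), $node->textContent,
        tail_text($node);
}
```

Run against this sample, innermost_text is the only direct text of the inner <div>, while textContent for <html> and <body> includes every descendant string. The text after_inner is listed as a direct child of the outer <div> (DOM view), yet tail_text reports it for the inner <div> (the lxml view).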
In Section
Seekers of Perl Wisdom