
Re^4: XML::LibXML::XPathContext->string_value - should ALL of the descendant's text be there?

by bobn (Chaplain)
on Aug 10, 2020 at 06:01 UTC ( [id://11120533]=note )


in reply to Re^3: XML::LibXML::XPathContext->string_value - should ALL of the descendant's text be there?
in thread XML::LibXML::XPathContext->string_value - should ALL of the descendant's text be there?

$node->string_value();
Note this method is undocumented (there's a method with that name in XML::LibXML::NodeList, but your $nodes are XML::LibXML::Elements), you should use textContent instead.
Yes, that's true. I can't reconstruct exactly what happened when I wrote this code, but I ended up in the documentation for an apparently unrelated module, where string_value was documented. I'm tempted to erase the whole thing.

However, this code of yours: my @texts = map { $_->data } $node->findnodes('./text()');

actually shows *exactly* what I'm talking about: the "innermost_text" appears in the output ONLY for its innermost containing element, which is the last <div> element/node/whatever that you found with $doc->findnodes('//*'). It's not in every element that it is inside of, like <body> or <html>. That's what I was looking for! Thank you!!!

What I was working on: I've been doing some Python XHTML parsing with lxml, and its documentation talks about "tail text". It's really weird: text that follows an element's closing tag belongs to *that* element as "tail text", NOT to the element that it is inside of. If you care, go to https://lxml.de/tutorial.html and search on "document-style". Anyhow, I was testing in Perl to see if it had anything like that, and I don't see that it does.
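
For anyone following along, here is a minimal sketch of that "tail text" model. It assumes Python's standard-library xml.etree.ElementTree, which exposes the same .text and .tail attributes the lxml tutorial describes, and the sample document is made up purely for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
doc = "<body><div>head text<b>bold</b>tail text</div></body>"

root = ET.fromstring(doc)
div = root.find("div")
b = div.find("b")

# Text before the first child is stored on the containing element.
print(repr(div.text))            # 'head text'

# Text inside <b> is stored on <b>.
print(repr(b.text))              # 'bold'

# Text after </b> is stored on <b> as its tail, not on <div>.
print(repr(b.tail))              # 'tail text'

# <body> has no direct text of its own.
print(repr(root.text))           # None

# Walking the subtree collects text and tails in document order.
print("".join(div.itertext()))   # head textboldtail text
```

The last line is the rough equivalent of textContent on the Perl side: every descendant's text, in document order, whichever element the model attaches it to.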

As far as SAX parsers go, I've used something similar: HTML::Parser and XML::Parser work the same way, I think, in that you create callbacks for events that happen during parsing. Having discovered XPath, the event-driven parser now seems to me like a crude, primitive approach. I'm sure there are still places where it applies.

--Bob Niederman,

All code given here is UNTESTED unless otherwise stated.
