PerlMonks  

Re: Seeking a more robust variant of HTML::TreeBuilder::XPath

by hippo (Bishop)
on Jul 12, 2019 at 12:33 UTC [id://11102726]


in reply to Seeking a more robust variant of HTML::TreeBuilder::XPath

> Using this component, sometimes happens I'm limited to fully parse some more complex or broken pages.

Those are two quite separate problems. If the pages are merely complex, HTML::TreeBuilder::XPath should parse them. If it fails on a valid page, that is a bug, and you would be helping everyone, yourself included, by reporting it to the maintainer (ideally with an SSCCE: a Short, Self-Contained, Correct Example) so that it can be fixed. Make sure you file the bug against the right dist. HTML::TreeBuilder::XPath builds on HTML::TreeBuilder and XML::XPathEngine, and one of those dependencies may actually be at fault.
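A bug report is much easier to act on when it is a single runnable script. The skeleton below is a minimal sketch of such an SSCCE; the HTML fragment, the XPath expression and the expected count are hypothetical placeholders that you would replace with the smallest input that still reproduces your problem. It uses only `new_from_content` and `findnodes`, and prints the module versions so the maintainer knows what you ran.

```perl
#!/usr/bin/env perl
# Hypothetical SSCCE skeleton for reporting a parse/query problem.
# Replace $html, the XPath and $expected with your own minimal case.
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Smallest HTML fragment that still shows the problem (placeholder).
my $html = '<html><body><p class="x">first</p><p class="x">second</p></body></html>';

my $tree  = HTML::TreeBuilder::XPath->new_from_content($html);
my @nodes = $tree->findnodes('//p[@class="x"]');    # list context: a list of nodes

# State the expected result explicitly so "wrong" is unambiguous.
my $expected = 2;
printf "found %d node(s), expected %d\n", scalar(@nodes), $expected;

# Versions matter when deciding which dist the bug belongs to.
printf "perl %s, HTML::TreeBuilder::XPath %s\n",
    $], $HTML::TreeBuilder::XPath::VERSION // 'unknown';

$tree->delete;    # release the tree (usual practice for HTML::Element trees)
```

If the count is wrong, try the same input with plain HTML::TreeBuilder (`look_down`, `dump`) to see whether the parse tree or the XPath layer is responsible.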

If the pages are broken, then it's quite fair for HTML::TreeBuilder::XPath to fail to parse them. Instead, you need a way to repair the markup before parsing it. Have you tried HTML::Valid (a Perl interface to HTML Tidy)?
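One way to wire this together is to let HTML::Valid tidy the markup first and then hand the cleaned output to HTML::TreeBuilder::XPath. This is a sketch assuming HTML::Valid's default options are acceptable; it relies on `run` returning the repaired document and a string of Tidy messages. The broken sample markup and the `//li` query are illustrative only.

```perl
#!/usr/bin/env perl
# Sketch: repair broken HTML with HTML::Valid (HTML Tidy), then query it
# with HTML::TreeBuilder::XPath. Sample markup and XPath are illustrative.
use strict;
use warnings;
use HTML::Valid;
use HTML::TreeBuilder::XPath;

# Deliberately sloppy markup: unclosed <li> and <b>, no <html> or <body>.
my $broken = '<ul><li>one<li><b>two</ul>';

# run() returns the tidied document and any warnings/errors Tidy reported.
my $htv = HTML::Valid->new;
my ($fixed, $errors) = $htv->run($broken);
print "Tidy reported:\n$errors\n" if $errors;

# Parse the repaired output rather than the original page.
my $tree  = HTML::TreeBuilder::XPath->new_from_content($fixed);
my @items = map { $_->as_text } $tree->findnodes('//li');
print "$_\n" for @items;

$tree->delete;    # release the tree
```

Logging `$errors` alongside the page URL is also a cheap way to find out which of your "complex" pages are really broken ones.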
