PerlMonks  

Re: HTML::Tree(Builder) in 6 minutes

by Anonymous Monk
on Nov 30, 2004 at 19:16 UTC (#411251)


in reply to •Re: HTML::Tree(Builder) in 6 minutes
in thread HTML::Tree(Builder) in 6 minutes

XML::LibXML is very fast, but it can parse barely 1% of the web pages found on the Internet, because it expects HTML that is far stricter than what real sites serve. That's why the 8-line Perl program at the end of your column doesn't work. HTML::TreeBuilder is very slow and provides neither DOM nor XPath. I don't think there is anything in Perl that can parse real web pages while being fast and giving access to DOM or XPath.

fred


Re^2: HTML::Tree(Builder) in 6 minutes
by mirod (Canon) on Nov 07, 2009 at 07:53 UTC

    A little late to the party... but for future reference, HTML::TreeBuilder::XPath gives you XPath on an HTML::Tree object.

    And I agree with XML::LibXML not being great at dealing with "real" HTML.
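
For future readers, here is a minimal sketch of what that looks like in practice, assuming HTML::TreeBuilder::XPath is installed from CPAN. The sample HTML, the XPath expressions, and the variable names are made up for illustration; the markup is deliberately sloppy (unclosed tags, an unquoted attribute) to resemble what real pages serve.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;    # CPAN module, not part of core Perl

# Illustrative input only: unclosed <p> and <li> tags, unquoted href.
my $html = <<'HTML';
<html><head><title>Sample page</title></head>
<body>
<p>First paragraph
<p>Second paragraph with a <a href=http://example.com/one>link</a>
<ul><li>item one<li>item two</ul>
</body></html>
HTML

# Build an HTML::Tree-style tree that also answers XPath queries.
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

# findvalue returns the text of the match; findnodes returns HTML::Element nodes.
my $title = $tree->findvalue('//title');
my @hrefs = map { $_->attr('href') } $tree->findnodes('//a[@href]');
my @items = map { $_->as_text } $tree->findnodes('//li');

print "title: $title\n";
print "link:  $_\n" for @hrefs;
print "item:  $_\n" for @items;

# Free the tree explicitly, as the HTML::Tree documentation recommends.
$tree->delete;
```

This says nothing about speed, which was the other half of the complaint; it only shows that XPath queries work on a tree built from lenient parsing.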
