Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Re: HTML Parser suggestions

by tobyink (Abbot)
on Jan 11, 2013 at 21:27 UTC ( #1012982=note: print w/ replies, xml ) Need Help??


in reply to HTML Parser suggestions

I'm biased, but I'll suggest HTML::HTML5::Parser. It uses the HTML5 parsing algorithm, so if faced with messy tag soup HTML, should very closely match how most desktop browsers parse HTML.

Quick example:

use 5.010; use strict; use warnings; use HTML::HTML5::Parser; use XML::LibXML::QuerySelector; my @elements = HTML::HTML5::Parser:: -> load_html(location => "http://www.perlmonks.org/?node_id=101298 +0") -> querySelectorAll("title"); say for @elements;
perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'


Comment on Re: HTML Parser suggestions
Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1012982]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (7)
As of 2015-07-30 11:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (271 votes), past polls