PerlMonks  

HTML Parser suggestions

by spatterson (Monk)
on Jan 11, 2013 at 21:20 UTC (#1012980)
spatterson has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks, I'm returning to Perl after a long absence and need to parse some autogenerated HTML. I want to work with it as a tree and search for tags with specific class attributes.

As there are loads of HTML parsing modules, which ones do fellow monks suggest?

Re: HTML Parser suggestions
by blue_cowdawg (Prior) on Jan 11, 2013 at 21:22 UTC
        As there are loads of HTML parsing modules, which ones do fellow monks suggest?

    I've used HTML::Parser a few times.


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
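Note that HTML::Parser is event-driven rather than tree-based. You register handlers that fire as tags stream past, and you never get a document tree. HTML::TreeBuilder builds its tree on top of it. Below is a minimal sketch of finding start tags by class with a `start_h` handler. The sample markup and the class name `item` are hypothetical stand-ins for the real autogenerated HTML.

```perl
use 5.010;
use strict;
use warnings;
use HTML::Parser;

# Sample input standing in for the autogenerated HTML (hypothetical).
my $html = '<div class="item">One</div>'
         . '<div class="other">Two</div>'
         . '<span class="item note">Three</span>';

my $wanted = 'item';   # hypothetical class name to search for
my @found;             # names of start tags carrying that class

my $parser = HTML::Parser->new(
    api_version => 3,
    # Fires once per start tag; the "tagname, attr" argspec passes the
    # (lowercased) tag name and a hashref of its attributes.
    start_h => [
        sub {
            my ($tagname, $attr) = @_;
            my $class = $attr->{class} // '';
            # class may hold several space-separated names
            push @found, $tagname if grep { $_ eq $wanted } split ' ', $class;
        },
        'tagname, attr',
    ],
);
$parser->parse($html);
$parser->eof;   # signal end of input and flush any buffered text

say for @found;
```

Because there is no tree, anything like "the text inside this element" has to be tracked by hand across `start_h`, `text_h` and `end_h` handlers. That bookkeeping is why a tree builder is usually the easier fit for this question.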
Re: HTML Parser suggestions
by tobyink (Abbot) on Jan 11, 2013 at 21:27 UTC

    I'm biased, but I'll suggest HTML::HTML5::Parser. It implements the HTML5 parsing algorithm, so when faced with messy tag-soup HTML it should very closely match how most desktop browsers parse it.

    Quick example:

        use 5.010;
        use strict;
        use warnings;
        use HTML::HTML5::Parser;
        use XML::LibXML::QuerySelector;

        my @elements = HTML::HTML5::Parser::
            -> load_html(location => "http://www.perlmonks.org/?node_id=1012980")
            -> querySelectorAll("title");

        say for @elements;

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
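The example above fetches a live URL and selects elements by tag name. The original question is about class attributes on tag soup, so here is an offline sketch that does that. It assumes HTML::HTML5::Parser's `parse_string` method, which returns an XML::LibXML document. For the filtering it sticks to plain XML::LibXML DOM methods (`getElementsByTagName`, `getAttribute`, `textContent`) rather than a CSS selector. The markup, the class name `item` and the helper `has_class` are hypothetical.

```perl
use 5.010;
use strict;
use warnings;
use HTML::HTML5::Parser;

# Tag soup: none of the <p> elements is explicitly closed.
my $html = '<p class="item">One<p class="other">Two<p class="item note">Three';

# The HTML5 algorithm closes each open <p> when the next one starts and
# synthesizes the missing <html>/<body>, much as a browser would.
my $dom = HTML::HTML5::Parser->new->parse_string($html);

my $wanted = 'item';   # hypothetical class name

# Hypothetical helper: true if the element's space-separated class
# list contains $name.
sub has_class {
    my ($el, $name) = @_;
    my $class = $el->getAttribute('class') // '';
    return grep { $_ eq $name } split ' ', $class;
}

# getElementsByTagName returns a plain list in list context.
my @texts = map  { $_->textContent }
            grep { has_class($_, $wanted) }
            $dom->getElementsByTagName('p');

say for @texts;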
Re: HTML Parser suggestions
by LanX (Abbot) on Jan 11, 2013 at 21:28 UTC
    Parsing HTML is a frequently asked topic, and I suspect it can't be answered without more details about your specific problem.

    Searching the Monastery turns up plenty of discussions. Maybe you want to dig in first and then ask again?

    EDIT: A quick look seems to suggest that HTML::TreeBuilder is popular.

    Cheers Rolf
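HTML::TreeBuilder matches the tree-based approach the question asks for. It parses the document into HTML::Element nodes, and `look_down` searches that tree by attribute. Here is a minimal sketch, assuming the target class is named `item` (a hypothetical name). Because the class attribute can hold several names, the criterion is a regex and not an exact string.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Sample input standing in for the autogenerated HTML (hypothetical).
my $html = '<html><body>'
         . '<div class="item">One</div>'
         . '<div class="other">Two</div>'
         . '<span class="item note">Three</span>'
         . '</body></html>';

# Build a tree of HTML::Element nodes.
my $tree = HTML::TreeBuilder->new_from_content($html);

# look_down walks the whole tree in document order; the regex matches
# "item" as one of several space-separated class names. Elements with
# no class attribute never match.
my @texts = map { $_->as_text }
            $tree->look_down(class => qr/(?:^|\s)item(?:\s|$)/);

print "$_\n" for @texts;

# Older HTML::Element versions need an explicit delete() to free the
# tree's circular references; it is harmless on newer ones.
$tree->delete;
```

`look_down` also accepts a tag criterion (`_tag => 'div'`) and code references for more complex tests. You can use these to narrow the search further.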

Re: HTML Parser suggestions
by moritz (Cardinal) on Jan 11, 2013 at 21:44 UTC
Re: HTML Parser suggestions
by Anonymous Monk on Jan 12, 2013 at 02:58 UTC

Node Type: perlquestion [id://1012980]
Approved by LanX