http://www.perlmonks.org?node_id=1012980

spatterson has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks, I'm returning to Perl after a long absence with a need to parse some autogenerated HTML in a tree-based fashion & search for specific class attributes on tags.

As there are loads of HTML parsing modules, which ones do fellow monks suggest?

Re: HTML Parser suggestions
by tobyink (Canon) on Jan 11, 2013 at 21:27 UTC

    I'm biased, but I'll suggest HTML::HTML5::Parser. It uses the HTML5 parsing algorithm, so if faced with messy tag soup HTML, should very closely match how most desktop browsers parse HTML.

    Quick example:

    use 5.010;
    use strict;
    use warnings;
    use HTML::HTML5::Parser;
    use XML::LibXML::QuerySelector;

    my @elements = HTML::HTML5::Parser::
        -> load_html(location => "http://www.perlmonks.org/?node_id=1012980")
        -> querySelectorAll("title");

    say for @elements;

    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
Re: HTML Parser suggestions
by moritz (Cardinal) on Jan 11, 2013 at 21:44 UTC
Re: HTML Parser suggestions
by LanX (Saint) on Jan 11, 2013 at 21:28 UTC
    Parsing HTML is a frequently asked topic, and I suppose it can't be answered w/o more details about your specific problem.

    Searching the monastery shows plenty of discussions; maybe you wanna dig in and ask again?

    EDIT: A quick look seems to suggest that HTML::TreeBuilder is popular.

    Cheers Rolf
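
For the tree-based, class-attribute search the question asks about, HTML::TreeBuilder (mentioned above) could be used roughly as follows. This is only a minimal sketch: the HTML snippet and the class name "item" are invented for illustration, and the regex is one possible way to match a single class token among several.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical input; the markup and class names are made up for this sketch.
my $html = <<'HTML';
<html><body>
  <div class="item first">Alpha</div>
  <div class="other">Beta</div>
  <p class="item">Gamma</p>
</body></html>
HTML

# Build the element tree from a string.
my $tree = HTML::TreeBuilder->new_from_content($html);

# look_down accepts attribute => regex pairs. The regex matches "item"
# as one whitespace-separated token of the class attribute, so
# class="item first" matches but class="items" would not.
my @items = $tree->look_down( class => qr/(?:^|\s)item(?:\s|$)/ );

printf "%s: %s\n", $_->tag, $_->as_trimmed_text for @items;

# Free the tree explicitly (needed on older HTML::Tree releases).
$tree->delete;
```

Passing a plain string instead of a regex (class => 'item') would only match elements whose class attribute is exactly "item", which is why the sketch uses a pattern.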

Re: HTML Parser suggestions
by blue_cowdawg (Monsignor) on Jan 11, 2013 at 21:22 UTC
        As there are loads of HTML parsing modules, which ones do fellow monks suggest?

    I've used HTML::Parser a few times.
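
Note that HTML::Parser is event-driven rather than tree-based: it calls handlers as it meets tags instead of building a document tree. A minimal sketch of a class search in that style, assuming made-up input and the hypothetical class name "item", might look like this:

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical input, invented for this sketch.
my $html = '<div class="item">Alpha</div>'
         . '<p class="other">Beta</p>'
         . '<p class="item big">Gamma</p>';

my @hits;
my $parser = HTML::Parser->new(
    api_version => 3,
    # The start handler is called for every opening tag; the argspec
    # 'tagname, attr' passes the tag name and a hashref of attributes.
    start_h => [
        sub {
            my ($tag, $attr) = @_;
            # Split the class attribute into individual tokens.
            my %classes = map { $_ => 1 } split ' ', $attr->{class} // '';
            push @hits, $tag if $classes{item};
        },
        'tagname, attr',
    ],
);

$parser->parse($html);
$parser->eof;    # signal end of input

print "matched <$_>\n" for @hits;
```

Because no tree is built, anything that depends on parent or sibling relationships has to be tracked by hand in the handlers.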


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
Re: HTML Parser suggestions
by Anonymous Monk on Jan 12, 2013 at 02:58 UTC