I have been screen scraping for a few years with WWW::Mechanize and HTML::TokeParser, and they have served me well. However, I recently encountered a set of pages that use complex table structures and numerous tab divs. I need a module (or methodology) that will let me search for sections of HTML in a jQuery find()-like manner, rather than simply consuming tokens from a stream of HTML.
I read through the post "The State of Web spidering in Perl" and, while helpful, its focus is more on spidering than scraping. I would appreciate recommendations from the Monks on higher-order methods of finding constructs in HTML using Perl, beyond regular expressions and token parsing.
I read about Mahmoud's jquery module on CPAN with interest, but it appears not to have been maintained since 2013 and I am uncertain whether it can query table structures. To be fair, jQuery itself is limited when querying unlabeled table structures.
Here is a small example of what I am trying to accomplish:
1) Find the 6th and 9th rows in a named table (given an id) and pull out the name/value pairs.
2) Slurp in every row in a named table and parse out the name/value pairs.
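To make the two cases concrete, here is a minimal sketch of the kind of selector-based lookup I have in mind. It assumes Mojo::DOM (part of the Mojolicious distribution) is installed; the table id "settings", the generated markup, and the helper name_value_pairs are hypothetical placeholders, not the real pages, and other modules may well be a better fit.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical markup: ten two-cell rows in a table with the made-up id "settings".
my $html = '<table id="settings">'
    . join('', map { "<tr><td>name$_</td><td>value$_</td></tr>" } 1 .. 10)
    . '</table>';

# Hypothetical helper: turn row elements into [name, value] pairs,
# skipping rows that do not have at least two <td> cells.
sub name_value_pairs {
    my @pairs;
    for my $row (@_) {
        my ($name, $value) = $row->find('td')->map('all_text')->each;
        push @pairs, [$name, $value] if defined $value;
    }
    return @pairs;
}

my $dom = Mojo::DOM->new($html);

# Every <tr> under the table, in document order (rows of nested tables included).
my @rows = $dom->find('table#settings tr')->each;

# 1) The 6th and 9th rows; array indices are zero-based, hence 5 and 8.
my @picked = name_value_pairs(grep { defined } @rows[5, 8]);

# 2) Every row in the table.
my @all = name_value_pairs(@rows);

print join('=', @$_), "\n" for @picked;
```

Slicing the collected row list by index, rather than relying on a positional selector, keeps the row count independent of whether the rows sit directly under the table or inside thead/tbody wrappers.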
Cheers!