Dear fellow monks,
I have been looking into the various web scraping frameworks in Perl, and gathered the following ones bit by bit from various Perlmonks discussions and blog posts. I'm listing them here for posterity, but my main purpose is to get the community's feedback on the current status of these and the best way to go about web spidering in modern Perl.
(Note that the comments are quick first impressions and may be wildly inaccurate; corrections welcome.)
- Good old WWW::Mechanize and HTML::TreeBuilder (Mojo::UserAgent and Mojo::DOM seem to be basically equivalent, I haven't tried them).
- Comments: Gets the job done and gives you full control (edit the HTML before parsing if you want, get HTML dumps easily, etc.), but the code ends up quite verbose and boilerplate-heavy.
- YADA - came across it only recently, but it's what I'm using now.
- Comments: The DSL syntax is nice though a bit under-documented, and I had to peek into the sources quite a bit to either understand or customize many things.
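To illustrate the verbosity point above, here is a minimal sketch of the classic Mech + TreeBuilder workflow. The URL and the h1 selector are made-up placeholders, not a real site, and error handling is kept to the bare minimum:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
use WWW::Mechanize;
use HTML::TreeBuilder;

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get('http://example.com/');

# You get the raw HTML back, so you can massage it before parsing if needed
my $html = $mech->content;

my $tree = HTML::TreeBuilder->new_from_content($html);
for my $node ( $tree->look_down( _tag => 'h1' ) ) {
    say $node->as_trimmed_text;
}
$tree->delete;    # TreeBuilder trees must be freed explicitly
```

The upside is that every intermediate step (the response, the raw HTML, the parse tree) is right there in a variable you can inspect or dump; the downside is that even a trivial extraction takes this much ceremony.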
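For comparison, the same toy extraction with Mojo::UserAgent and Mojo::DOM (which I haven't used in anger, so treat this as a sketch from the docs; example.com is again a placeholder) is considerably terser thanks to CSS selectors and method chaining:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Mojo::UserAgent;

my $ua  = Mojo::UserAgent->new;
my $dom = $ua->get('http://example.com/')->result->dom;

# find() takes a CSS selector and returns a Mojo::Collection
$dom->find('h1')->each( sub { print $_->text, "\n" } );
```
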
The main reason I'm making this post is that I seem to be stumbling upon good scraping frameworks randomly, so it's quite possible I'm missing some really good framework that Google just hasn't deigned to show me. So, I'd like to get the opinion of revered monks on this topic.