Web scraping toolkit?
by mzedeler (Pilgrim) on Jan 26, 2012 at 15:41 UTC
mzedeler has asked for the wisdom of the Perl Monks concerning the following question:
Hi fellow Perl monks.
I need to organize the development of some 50+ small web scrapers for a similar number of pages on the Internet. The scrapers parse and extract data of similar structure across the different data sources.
So far, a few scripts have been written using WWW::Mechanize, HTML::TreeBuilder::XPath or HTML::TokeParser. This has worked fairly well, but I can see that there is a lot of boilerplate code across the scripts that could be reused. Also, I know that in some respects we need a toolkit that doesn't give us too many ways to solve the same problem, so that we can standardize the code to some degree.
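To make the boilerplate concern concrete, here is a rough sketch of the kind of structure I have in mind: one shared extraction routine built on HTML::TreeBuilder::XPath, with each site reduced to a small table of XPath expressions. The site name `example_shop`, the field names, the sample HTML and the `scrape_html` helper are all made up for illustration; fetching (for instance with WWW::Mechanize) is left out so the example runs offline.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical per-site definitions: ideally the only part that
# differs from one scraper to the next.
my %SITES = (
    example_shop => {
        # XPath selecting one node per record (data rows only)
        row    => '//table[@id="items"]//tr[td]',
        # XPath per field, evaluated relative to each row
        fields => {
            name  => './td[1]',
            price => './td[2]',
        },
    },
);

# Shared code: parse the HTML and build one hash per record,
# using the site's XPath table.
sub scrape_html {
    my ($site, $html) = @_;
    my $def = $SITES{$site} or die "Unknown site '$site'\n";

    my $tree = HTML::TreeBuilder::XPath->new_from_content($html);
    my @records;
    for my $row ($tree->findnodes($def->{row})) {
        my %record;
        for my $field (sort keys %{ $def->{fields} }) {
            my $value = $row->findvalue($def->{fields}{$field});
            $value =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
            $record{$field} = $value;
        }
        push @records, \%record;
    }
    $tree->delete;    # free the parse tree explicitly
    return \@records;
}

# Sample input standing in for a fetched page (invented for this sketch).
my $html = <<'HTML';
<html><body>
<table id="items">
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td> Widget </td><td>9.95</td></tr>
  <tr><td>Gadget</td><td>19.50</td></tr>
</table>
</body></html>
HTML

my $records = scrape_html(example_shop => $html);
printf "%s costs %s\n", $_->{name}, $_->{price} for @$records;
```

With something like this, adding a new source would mostly mean adding another entry to `%SITES`, which also limits the number of ways the same problem gets solved. Whether an existing toolkit already offers this in a well-documented form is exactly what I am asking.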
I took a look at Scrappy, but the fact that it uses Web::Scraper, which in turn seems to be only partly documented, has put me off somewhat.
Does anyone have any recommendations for good web scraping toolkits?