mzedeler has asked for the wisdom of the Perl Monks concerning the following question:
Hi fellow perl monks.
I need to organize the development of some 50+ small web scrapers for a similar number of pages on the Internet. The scrapers parse and extract data of similar structure across the different data sources.
So far, a few scripts have been written using WWW::Mechanize, HTML::TreeBuilder::XPath or HTML::TokeParser. This has worked fairly well, but I can see that there is a lot of boilerplate code across the scripts that could be reused. Also, I know that in some respects we need a toolkit that doesn't give us too many ways to solve the same problem, so that we can standardize the code to some degree.
I took a look at Scrappy, but the fact that it uses Web::Scraper, which in turn seems to be only partly documented, has somewhat put me off.
Does anyone have any recommendations for good web scraping toolkits?
Regards,
Michael.
Replies are listed 'Best First'.
Re: Web scraping toolkit?
by Corion (Patriarch) on Jan 26, 2012 at 16:03 UTC
by mzedeler (Pilgrim) on Jan 27, 2012 at 08:44 UTC
Re: Web scraping toolkit?
by Anonymous Monk on Jan 26, 2012 at 15:44 UTC
by mzedeler (Pilgrim) on Jan 27, 2012 at 08:38 UTC