PerlMonks  

Re^2: Web scraping toolkit?

by mzedeler (Pilgrim)
on Jan 27, 2012 at 08:44 UTC (#950285)


in reply to Re: Web scraping toolkit?
in thread Web scraping toolkit?

I think that App::scrape may turn out to be insufficient, since it may not cover some edge cases that need handling. But again, that's just my general worry, as I haven't tried any of the scraping modules yet (the same goes for Web::Scraper and Scrappy).

WWW::Mechanize::Firefox looks very promising, and implementing the few extra features that Scrapie has (logging and such) shouldn't be a problem. The real drawback is having to rely on Firefox (or some similar component) in both development and production.

I'll go back to the drawing board and see what to do. Thanks for the pointers.

