
Re: Scrappy user_agent error

by Anonymous Monk
on Jan 04, 2012 at 00:12 UTC (#946170)

in reply to Scrappy user_agent error

Hello all. I am attempting to write a web crawler in Scrappy.

Well, there's your problem!

Even you say it is hard to find examples of a working Scrappy script. There is a good reason for that: Scrappy is too much pee :)

I would not recommend Scrappy; use Web::Scraper instead.
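To give an idea of what that looks like, here is a minimal Web::Scraper sketch. It is only an illustration, not a port of your script: the HTML snippet, the example.com base URL, and the key names title and links are made up for the demo. It scrapes from a string so it runs without network access; passing a URI object to scrape() instead should fetch the page for you.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical sample page, used so the example runs offline.
my $html = <<'HTML';
<html>
  <head><title>Example page</title></head>
  <body>
    <a href="/a.html">First</a>
    <a href="/b.html">Second</a>
  </body>
</html>
HTML

# Declare what to extract: the page title as text,
# and the href attribute of every <a> element as a list.
my $scraper = scraper {
    process 'title', 'title'   => 'TEXT';
    process 'a',     'links[]' => '@href';
};

# The second argument is the base URL used to resolve relative links
# (an assumed address, only for illustration).
my $result = $scraper->scrape( $html, 'http://example.com/' );

print "Title: $result->{title}\n";
print "Link:  $_\n" for @{ $result->{links} || [] };
```

The selectors given to process can be CSS selectors or XPath expressions, so the same pattern covers most of the extraction threads linked below.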

See WWW::Mechanize, subclass WWW::Scripter::Plugin::JavaScript, WWW::Mechanize::Firefox, Web::Scraper, App::scrape

extracting data from html using xpath

extract data from html with xpath

extract a substring between two elements, its css/xpath time again

creating a web crawler with Mechanize

Super Search for Mechanize/Scripter examples

See Re^5: can't get WWW::Mechanize to sign in on JustAnswer, Re: Mimicking Internet Explorer (IE) via LWP or Mechanize?

Are there any memory-efficient web scrapers?, Get 10,000 web pages fast, Async DNS with LWP
