in reply to Scrappy user_agent error
Hello all. I am attempting to write a web crawler in Scrappy.
Well, there's your problem!
Even you say it is hard to find examples of a working Scrappy script. There is a good reason for that: Scrappy is too much pee :)
I would not recommend Scrappy; I would use Web::Scraper instead.
extracting data from html using xpath
extract data from html with xpath
extract a substring between two emements, its css/xpath time again
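To give a feel for Web::Scraper, here is a minimal sketch. The HTML snippet and the key names (title, links) are made up for illustration, and the page is inlined so nothing touches the network.

```perl
use strict;
use warnings;
use Web::Scraper;

# Hypothetical sample page, inlined so the sketch needs no network access.
my $html = <<'HTML';
<html>
  <head><title>Sample page</title></head>
  <body>
    <ul>
      <li><a href="/one">First</a></li>
      <li><a href="/two">Second</a></li>
    </ul>
  </body>
</html>
HTML

# Selectors may be CSS (as here) or XPath (e.g. '//li/a').
my $page = scraper {
    process 'title', 'title'   => 'TEXT';   # single value
    process 'a',     'links[]' => 'TEXT';   # [] collects every match
};

# scrape() also accepts a URI object or an HTTP::Response.
my $result = $page->scrape($html);

print "Title: $result->{title}\n";
print "Link:  $_\n" for @{ $result->{links} || [] };
```

To scrape a live page, you would pass URI->new($url) to scrape() instead of the string.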
creating a web crawler with Mechanize
Super Search for Mechanize/Scripter examples
See Re^5: can't get WWW::Mechanize to sign in on JustAnswer, Re: Mimicking Internet Explorer (IE) via LWP or Mechanize?
Are there any memory-efficient web scrapers?, Get 10,000 web pages fast, Async DNS with LWP
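Since the original question was about a user_agent error, here is a rough sketch of setting the User-Agent with WWW::Mechanize. The agent string 'MyCrawler/0.1' and the example.com URL are placeholders I made up, and the fetch is commented out so the snippet runs offline.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# 'MyCrawler/0.1' is a made-up User-Agent string for illustration.
my $mech = WWW::Mechanize->new(
    agent     => 'MyCrawler/0.1',
    autocheck => 1,    # die on HTTP errors instead of failing silently
);

# Alternatively, pick one of the built-in browser-like aliases:
# $mech->agent_alias('Windows Mozilla');

print "User-Agent: ", $mech->agent, "\n";

# Placeholder URL; uncomment to fetch a page and list its links.
# $mech->get('http://example.com/');
# print $_->url_abs, "\n" for $mech->links;
```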