PerlMonks  

Re: Scrappy user_agent error

by Anonymous Monk
on Jan 04, 2012 at 00:12 UTC


in reply to Scrappy user_agent error

"Hello all. I am attempting to write a web crawler in Scrappy."

Well, there's your problem!

Even you say "It is hard to find examples of a working Scrappy script" -- there is a good reason for that: scrappy is too much pee :)

I would not recommend Scrappy; use Web::Scraper instead.
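To give you a taste of the Web::Scraper API, here is a minimal sketch run against canned HTML (the markup and field names are invented for illustration; scrape() also takes a URI to fetch a live page):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Web::Scraper;

# Sample HTML -- stands in for a fetched page
my $html = <<'HTML';
<ul>
  <li class="monk"><a href="/u/1">vroom</a></li>
  <li class="monk"><a href="/u/2">tye</a></li>
</ul>
HTML

# Declare what to extract: one record per li.monk,
# pulling the link text and href via CSS selectors
my $monks = scraper {
    process 'li.monk', 'monks[]' => scraper {
        process 'a', name => 'TEXT', url => '@href';
    };
};

my $res = $monks->scrape(\$html);
print "$_->{name} => $_->{url}\n" for @{ $res->{monks} };
```

The declarative process/scraper nesting is the whole trick: you say what you want, not how to walk the tree.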

See WWW::Mechanize, subclass WWW::Scripter::Plugin::JavaScript, WWW::Mechanize::Firefox, Web::Scraper, App::scrape

extracting data from html using xpath

extract data from html with xpath

extract a substring between two elements, its css/xpath time again
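For the XPath side of those threads, a minimal sketch using HTML::TreeBuilder::XPath (my choice of parser here is an assumption; other XPath-capable modules work the same way):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = '<div><p id="first">alpha</p><p>beta</p></div>';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# findvalues returns the text content of every matching node
my @texts = $tree->findvalues('//p');
print "$_\n" for @texts;

# findnodes gives the nodes themselves, for attributes etc.
my ($first) = $tree->findnodes('//p[@id="first"]');
print $first->as_text, "\n";
```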

creating a web crawler with Mechanize

Super Search for Mechanize/Scripter examples

See Re^5: can't get WWW::Mechanize to sign in on JustAnswer, Re: Mimicking Internet Explorer (IE) via LWP or Mechanize?

Are there any memory-efficient web scrapers?, Get 10,000 web pages fast, Async DNS with LWP
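The crawler pattern behind those threads is a queue plus a seen-set. Here is a sketch with the fetching stubbed out by canned pages so it runs offline (the URLs and pages are invented); in a real crawler you would replace fetch() with a WWW::Mechanize or LWP request, and mind robots.txt and rate limits:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;
use URI;

# Canned pages stand in for network fetches
my %pages = (
    'http://example.com/'  => '<a href="/a">A</a> <a href="/b">B</a>',
    'http://example.com/a' => '<a href="/b">B again</a>',
    'http://example.com/b' => 'no links here',
);
sub fetch { $pages{ $_[0] } }

# Breadth-first crawl; %seen ensures each URL is visited once
my @queue = ('http://example.com/');
my %seen;
while (my $url = shift @queue) {
    next if $seen{$url}++;
    my $html = fetch($url) or next;

    # Collect href attributes of <a> tags
    my @found;
    my $ex = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @found, $attr{href} if $tag eq 'a' and $attr{href};
    });
    $ex->parse($html);

    # Absolutize relative links against the current page
    push @queue, map { URI->new_abs($_, $url)->as_string } @found;
}
print "visited $_\n" for sort keys %seen;
```

Everything else -- politeness delays, async fetching, memory limits -- bolts onto this skeleton, which is what the threads above argue about.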
