Re^2: Scrappy user_agent error

by docster (Novice)
on Jan 03, 2012 at 16:54 UTC


in reply to Re: Scrappy user_agent error
in thread Scrappy user_agent error

Some sites I visit automatically block robots. I was under the impression that setting the user_agent would change the default "Browser" id the scraper sends... Am I wrong?

From CPAN:

    The user_agent attribute holds the Scrappy::Scraper::UserAgent object which is used to set and manipulate the user-agent header of the scraper.

        use Scrappy qw/:syntax/;

        user_agent random_ua;
        # or
        user_agent random_ua 'firefox';           # firefox only
        user_agent random_ua 'firefox', 'linux';  # firefox on linux only


Re^3: Scrappy user_agent error
by jethro (Monsignor) on Jan 03, 2012 at 17:51 UTC

    Could it be that you are using a recent Scrappy (0.9xxx) but reading the documentation for an older version (like 0.6xxx)? I could find code like "qw/:syntax/" only in older documentation and in example scripts on the web (with a quick google search).
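    One quick way to double-check which Scrappy release is actually installed (this is plain Perl, nothing Scrappy-specific):

        # prints the version of the Scrappy module that perl finds in @INC
        perl -MScrappy -e 'print Scrappy->VERSION, "\n"'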

      Yes, it is entirely possible. I took the code above from the author's blog post. It is hard to find examples of a working Scrappy script. As of now I am using Scrappy 0.94112090.

      The CPAN docs for module version 0.94112090 are here: http://search.cpan.org/dist/Scrappy/lib/Scrappy.pm#user_agent

      user_agent: The user_agent attribute holds the Scrappy::Scraper::UserAgent object which is used to set and manipulate the user-agent header of the scraper.

          my $scraper = Scrappy->new;
          $scraper->user_agent;
      So in that context, how would I set the user_agent correctly to be firefox using Scrappy 0.94112090? There used to be a way. Maybe it was removed. I seem to be missing the entire picture somehow :)
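      This is the sort of thing I would have expected to work with the 0.9x object interface, though it is an untested guess: it assumes Scrappy::Scraper::UserAgent exposes a name accessor for the raw agent string, and the Firefox string and example.com URL are only illustrative.

          use strict;
          use warnings;
          use Scrappy;

          my $scraper = Scrappy->new;

          # ASSUMPTION: the user_agent object has a 'name' accessor for the
          # agent string -- confirm against the Scrappy::Scraper::UserAgent POD
          $scraper->user_agent->name(
              'Mozilla/5.0 (X11; Linux x86_64; rv:8.0) Gecko/20100101 Firefox/8.0'
          );

          # fetch a page with the agent string set above
          $scraper->get('http://example.com/');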

        docster:

        I've not used Scrappy, but perhaps you could check out the tests for examples on how to change the user agent name.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re^3: Scrappy user_agent error
by GrandFather (Cardinal) on Jan 03, 2012 at 23:43 UTC

    Have you checked (in their terms of use) that the sites that "automatically block robots" allow scraping? It would be pretty unusual to block robots and allow scraping!
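    If you want to check a site's robots policy programmatically before scraping, something along these lines should work. It uses WWW::RobotRules rather than anything in Scrappy itself, so treat it as a sketch of the general idea (it follows that module's documented usage; example.com is a placeholder):

        use strict;
        use warnings;
        use WWW::RobotRules;
        use LWP::Simple qw(get);

        # identify the robot by the same agent name it will crawl with
        my $rules = WWW::RobotRules->new('MyScraper/1.0');

        # fetch and parse the site's robots.txt
        my $robots_url = 'http://example.com/robots.txt';
        $rules->parse($robots_url, get($robots_url));

        # only fetch pages the rules allow
        my $page = 'http://example.com/some/page.html';
        print $rules->allowed($page) ? "allowed\n" : "disallowed\n";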

    True laziness is hard work
