http://www.perlmonks.org?node_id=946094


in reply to Re: Scrappy user_agent error
in thread Scrappy user_agent error

Some sites I visit automatically block robots. I was under the impression that setting the user agent would change the default "Browser" id the scraper sends... Am I wrong?
From CPAN: "The user_agent attribute holds the Scrappy::Scraper::UserAgent object which is used to set and manipulate the user-agent header of the scraper."

    use Scrappy qw/:syntax/;

    user_agent random_ua;
    # or
    user_agent random_ua 'firefox';           # firefox only
    user_agent random_ua 'firefox', 'linux';  # firefox on linux only
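
One way to check whether the user-agent header really is what gets me blocked would be to test outside Scrappy with plain LWP::UserAgent (ordinary LWP, not the Scrappy API; the Firefox string below is only an example):

    use strict;
    use warnings;
    use LWP::UserAgent;

    my $ua = LWP::UserAgent->new;
    # Send a Firefox-style string instead of the default "libwww-perl/x.xx".
    $ua->agent('Mozilla/5.0 (X11; Linux x86_64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1');

    my $res = $ua->get('http://example.com/');
    print $res->status_line, "\n";

If a request like that gets through but Scrappy is still blocked, then it is the Scrappy user_agent setting that isn't taking effect.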

Re^3: Scrappy user_agent error
by jethro (Monsignor) on Jan 03, 2012 at 17:51 UTC

    Could it be that you are using a recent Scrappy (0.9xxx) but reading the documentation for an older version (like 0.6xxx)? With a quick Google search I could find code like "qw/:syntax/" only in older documentation and in example scripts on the web.

      Yes, it is entirely possible. I took the code above from the author's blog post. It is hard to find examples of a working Scrappy script. But as of now I am using Scrappy 0.94112090.

      And CPAN's docs for module version 0.94112090 (http://search.cpan.org/dist/Scrappy/lib/Scrappy.pm#user_agent) say:

      user_agent: The user_agent attribute holds the Scrappy::Scraper::UserAgent object which is used to set and manipulate the user-agent header of the scraper.

          my $scraper = Scrappy->new;
          $scraper->user_agent;
      So in that context, how would I set the user_agent correctly to be Firefox using Scrappy 0.94112090? There used to be a way. Maybe it was removed. I seem to be missing the entire picture somehow :)

        docster:

        I've not used Scrappy, but perhaps you could check out the tests for examples on how to change the user agent name.
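
        Going only by the 0.94 POD quoted above, something along these lines might work. It is untested and assumes the Scrappy::Scraper::UserAgent object returned by user_agent() has a name accessor for the user-agent string, which is worth confirming against the POD or the distribution's tests:

            use Scrappy;

            my $scraper = Scrappy->new;

            # Assumption: 'name' is the accessor for the user-agent string on
            # the Scrappy::Scraper::UserAgent object -- verify in the docs/tests.
            $scraper->user_agent->name(
                'Mozilla/5.0 (X11; Linux x86_64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
            );

            # Fetch a page with the customised user-agent header in place.
            $scraper->get('http://example.com/');

        If name() turns out not to be the right accessor, a quick grep through the distribution's t/ directory for "user_agent" should show what the tests actually call.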

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re^3: Scrappy user_agent error
by GrandFather (Saint) on Jan 03, 2012 at 23:43 UTC

    Have you checked (in their terms of use) that the sites that "automatically block robots" allow scraping? It would be pretty unusual to block robots and allow scraping!

    True laziness is hard work