
Re^2: Scrappy user_agent error

by docster (Novice)
on Jan 03, 2012 at 16:54 UTC ( #946094=note )

in reply to Re: Scrappy user_agent error
in thread Scrappy user_agent error

Some sites I visit automatically block robots. I was under the impression that this would change the default "Browser" id... Am I wrong?
From CPAN: The user_agent attribute holds the Scrappy::Scraper::UserAgent object which is used to set and manipulate the user-agent header of the scraper.

    use Scrappy qw/:syntax/;

    user_agent random_ua;
    # or
    user_agent random_ua 'firefox';          # firefox only
    user_agent random_ua 'firefox', 'linux'; # firefox on linux only

Replies are listed 'Best First'.
Re^3: Scrappy user_agent error
by jethro (Monsignor) on Jan 03, 2012 at 17:51 UTC

    Could it be that you are using a recent Scrappy (0.9xxx) but reading the documentation for an older version (like 0.6xxx)? I could find code like "qw/:syntax/" only in older documentation and in example scripts on the web (with a quick Google search).

      Yes, that is entirely possible. I took the code above from the author's blog post. It is hard to find examples of a working Scrappy script. As of now I am using Scrappy 0.94112090.

      And the CPAN docs for module version 0.94112090 say:

      user_agent — The user_agent attribute holds the Scrappy::Scraper::UserAgent object which is used to set and manipulate the user-agent header of the scraper.

          my $scraper = Scrappy->new;
          $scraper->user_agent;
      So in that context, how would I correctly set the user agent to be Firefox using Scrappy 0.94112090? There used to be a way. Maybe it was removed. I seem to be missing the entire picture somehow :)
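      For reference, here is the kind of thing I have been trying. This is only a sketch, not something confirmed by the 0.94112090 docs: it assumes the Scrappy::Scraper::UserAgent object returned by user_agent exposes a name accessor for the agent string (that accessor is my assumption, and it may have been renamed or removed in this version).

          use strict;
          use warnings;
          use Scrappy;

          my $scraper = Scrappy->new;

          # Assumption: the UserAgent object has a 'name' accessor that
          # takes the raw agent string. Unverified in 0.94112090.
          $scraper->user_agent->name(
              'Mozilla/5.0 (X11; Linux x86_64; rv:9.0.1) '
            . 'Gecko/20100101 Firefox/9.0.1'
          );

          $scraper->get('http://example.com');

      If that accessor turns out not to exist, another avenue would be reaching the underlying WWW::Mechanize object (if Scrappy exposes it) and calling its agent method, which is a documented WWW::Mechanize/LWP::UserAgent API.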


        I've not used Scrappy, but perhaps you could check out the tests for examples on how to change the user agent name.


        When your only tool is a hammer, all problems look like your thumb.

Re^3: Scrappy user_agent error
by GrandFather (Sage) on Jan 03, 2012 at 23:43 UTC

    Have you checked (in their terms of use) that the sites that "automatically block robots" allow scraping? It would be pretty unusual to block robots and allow scraping!

    True laziness is hard work
