
Re: Scrappy user_agent error

by marto (Bishop)
on Jan 03, 2012 at 16:43 UTC ( #946093 )

in reply to Scrappy user_agent error

Where are you copying and pasting your code from? What do you expect user_agent random_ua; to do?

Replies are listed 'Best First'.
Re^2: Scrappy user_agent error
by docster (Novice) on Jan 03, 2012 at 16:54 UTC
    Some sites I visit automatically block robots. I was under the impression that this would change the default "Browser" id... Am I wrong?
    From CPAN: "The user_agent attribute holds the Scrappy::Scraper::UserAgent object which is used to set and manipulate the user-agent header of the scraper."

        use Scrappy qw/:syntax/;

        user_agent random_ua;
        user_agent random_ua 'firefox';          # firefox only
        user_agent random_ua 'firefox', 'linux'; # firefox on linux only

      Could it be that you are using a recent Scrappy (0.9xxx) but reading the documentation for an older version (like 0.6xxx)? I could find code like "qw/:syntax/" only in older documentation and in example scripts on the web (with a quick Google search).

        Yes, it is entirely possible. I took the code above from the author's blog post. It is hard to find examples of a working Scrappy script. But as of now I am using Scrappy 0.94112090.

        And from the CPAN docs for module version 0.94112090:

        user_agent
            The user_agent attribute holds the Scrappy::Scraper::UserAgent object which is used to set and manipulate the user-agent header of the scraper.

            my $scraper = Scrappy->new;
            $scraper->user_agent;
        So in that context, how would I set the user_agent correctly to be Firefox using Scrappy 0.94112090? There used to be a way. Maybe it was removed. I seem to be missing the entire picture somehow :)
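        Whatever sugar a given Scrappy release provides, the underlying task is just setting the User-Agent request header. The 0.9x docs for Scrappy::Scraper::UserAgent appear to expose a name accessor (something like $scraper->user_agent->name($ua)), but verify that against your installed version. As a hedged, core-Perl-only illustration of the idea (HTTP::Tiny, not Scrappy's API; the UA string below is just an example value):

            use strict;
            use warnings;
            use HTTP::Tiny;   # core module since Perl 5.14

            # Example Firefox-style User-Agent string (pick any real one you like).
            my $ua = 'Mozilla/5.0 (X11; Linux x86_64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1';

            # HTTP::Tiny sends this string as the User-Agent header on every request.
            my $http = HTTP::Tiny->new(agent => $ua);

            print $http->agent, "\n";   # confirm the configured agent string

        The same header-level approach works with LWP::UserAgent ($ua->agent(...)) or WWW::Mechanize, which Scrappy builds on.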

      Have you checked (in their terms of use) that the sites that "automatically block robots" allow scraping? It would be pretty unusual to block robots and allow scraping!

      True laziness is hard work
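      On the robots point: before spoofing a browser UA, it is worth actually reading the site's robots.txt. A real crawler should use WWW::RobotRules from CPAN; purely as an illustrative core-Perl sketch, a minimal check of the catch-all (*) group might look like:

          use strict;
          use warnings;

          # Illustrative only: returns true if $path is not covered by a
          # Disallow prefix in the 'User-agent: *' group of $robots_txt.
          sub robots_allows {
              my ($robots_txt, $path) = @_;
              my $applies = 0;
              for my $line (split /\n/, $robots_txt) {
                  $line =~ s/#.*//;                       # strip comments
                  if ($line =~ /^\s*User-agent:\s*(\S+)/i) {
                      $applies = ($1 eq '*');             # only honor the catch-all group
                  }
                  elsif ($applies && $line =~ /^\s*Disallow:\s*(\S+)/i) {
                      return 0 if index($path, $1) == 0;  # path falls under a Disallow prefix
                  }
              }
              return 1;
          }

          my $robots = "User-agent: *\nDisallow: /private/\n";
          print robots_allows($robots, '/private/page'), "\n";  # 0 (blocked)
          print robots_allows($robots, '/public/page'),  "\n";  # 1 (allowed)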
