PerlMonks  

Re: Scrappy user_agent error

by marto (Chancellor)
on Jan 03, 2012 at 16:43 UTC ( #946093 )


in reply to Scrappy user_agent error

Where are you copying and pasting your code from? What do you expect user_agent random_ua; to do?


Re^2: Scrappy user_agent error
by docster (Novice) on Jan 03, 2012 at 16:54 UTC
    Some sites I visit automatically block robots. I was under the impression that this would change the default "Browser" id... Am I wrong?
    From CPAN: The user_agent attribute holds the Scrappy::Scraper::UserAgent object which is used to set and manipulate the user-agent header of the scraper.

        use Scrappy qw/:syntax/;

        user_agent random_ua;

    or

        user_agent random_ua 'firefox';          # firefox only
        user_agent random_ua 'firefox', 'linux'; # firefox on linux only

      Could it be that you are using a recent Scrappy (0.9xxx) but reading the documentation for an older version (like 0.6xxx)? I could find code like "qw/:syntax/" only in older documentation and in example scripts on the web (with a quick Google search).

        Yes, it is entirely possible. I took the code above from the author's blog post. It is hard to find examples of a working Scrappy script. As of now I am using Scrappy 0.94112090.

        The docs for CPAN's module version 0.94112090 are at: http://search.cpan.org/dist/Scrappy/lib/Scrappy.pm#user_agent

        user_agent
            The user_agent attribute holds the Scrappy::Scraper::UserAgent object which is used to set and manipulate the user-agent header of the scraper.

                my $scraper = Scrappy->new;
                $scraper->user_agent;

        So in that context, how would I set the user_agent correctly to be firefox using Scrappy 0.94112090? There used to be a way; maybe it was removed. I seem to be missing the entire picture somehow :)
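        [Editor's sketch] Reading the 0.9x docs quoted above, the DSL form (user_agent random_ua;) is gone and the user-agent is reached through the object API instead. A minimal sketch of setting a Firefox user-agent string, assuming the Scrappy::Scraper::UserAgent object exposes a name accessor as the 0.9x docs suggest (the exact UA string here is just an example):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scrappy;

my $scraper = Scrappy->new;

# In 0.9x, user_agent holds a Scrappy::Scraper::UserAgent object.
# Setting its name attribute (an assumption based on the 0.9x docs)
# replaces the User-Agent header sent with each request.
$scraper->user_agent->name(
    'Mozilla/5.0 (X11; Linux x86_64; rv:9.0.1) Gecko/20100101 Firefox/9.0.1'
);

$scraper->get('http://example.com/');
```

        This is untested against 0.94112090; if name is not the accessor, dumping the object ($scraper->user_agent) should reveal what it actually provides.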

      Have you checked (in their terms of use) that the sites that "automatically block robots" allow scraping? It would be pretty unusual to block robots and allow scraping!

      True laziness is hard work
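      [Editor's sketch] On the robots point above: rather than disguising the scraper, the polite route is to check robots.txt before fetching. A sketch using WWW::RobotRules from libwww-perl (the agent name "MyScraper/1.0" and the URLs are placeholders):

```perl
use strict;
use warnings;
use WWW::RobotRules;
use LWP::Simple qw(get);

# Placeholder agent name; robots.txt rules are matched against it.
my $rules = WWW::RobotRules->new('MyScraper/1.0');

# Fetch and parse the site's robots.txt, if it has one.
my $robots_url = 'http://example.com/robots.txt';
my $robots_txt = get($robots_url);
$rules->parse($robots_url, $robots_txt) if defined $robots_txt;

# Only fetch pages the rules permit for our agent name.
my $url = 'http://example.com/some/page.html';
if ($rules->allowed($url)) {
    print "allowed by robots.txt: $url\n";
}
else {
    print "disallowed by robots.txt: $url\n";
}
```

      A site that blocks robots by User-Agent and also disallows them in robots.txt is telling you the same thing twice.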
