
Re^2: LWP::UserAgent Bad and Forbidden requests

by Corion (Pope) on Dec 15, 2011 at 19:30 UTC (node #943816)

in reply to Re: LWP::UserAgent Bad and Forbidden requests
in thread LWP::UserAgent Bad and Forbidden requests

LWP::UserAgent does not respect robots.txt. LWP::RobotUA does.
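A minimal sketch of switching from LWP::UserAgent to LWP::RobotUA, assuming LWP 6 and its key/value constructor form. The agent name, e-mail address and delay are placeholders, not values from this thread. As far as I know, when a URL is disallowed by the site's robots.txt, RobotUA does not send the request at all and instead returns a locally generated 403 "Forbidden by robots.txt" response, which is worth telling apart from a 403 that really came from the server.

```perl
use strict;
use warnings;
use LWP::RobotUA;

# LWP::RobotUA insists on an agent name and a contact address containing '@'.
# Both values here are placeholders.
my $ua = LWP::RobotUA->new(
    agent => 'MyBot/0.1',
    from  => 'me@example.com',
    delay => 0.5,    # minimum delay between requests to one host, in MINUTES
);

# Only touch the network when a URL is passed on the command line.
if (my $url = shift @ARGV) {
    # robots.txt for the host is fetched and consulted before this request.
    my $res = $ua->get($url);
    print $res->status_line, "\n";
}
```

Note that `delay` is measured in minutes (the default is 1), so a value of 0.5 means at most one request every 30 seconds to the same host.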

Re^3: LWP::UserAgent Bad and Forbidden requests
by 1arryb (Acolyte) on Dec 15, 2011 at 19:54 UTC

    Hi Corion,

    True, but... all LWP::RobotUA gets you is a) client-side processing of robot rules (i.e., once the user agent has downloaded robots.txt for a site, it will abort a banned URL before making the request); and b) an optional, configurable delay between requests, so your program can be a good "netizen" and avoid hammering websites too hard. None of this prevents the web server from evaluating your user agent identification string and processing its robot rules to accept or reject your request.
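Point a) can be seen in isolation with WWW::RobotRules, the module LWP::RobotUA uses by default to parse and apply robots.txt. A minimal offline sketch; the robots.txt content, host and agent name below are hypothetical. Everything happens on the client: no request is made to the server.

```perl
use strict;
use warnings;
use WWW::RobotRules;

# Hypothetical robots.txt content for an example host.
my $robots_txt = <<'END';
User-agent: *
Disallow: /private/
END

# The agent name decides which User-agent record applies to us.
my $rules = WWW::RobotRules->new('MyBot/0.1');
$rules->parse('http://www.example.com/robots.txt', $robots_txt);

# allowed() only checks the parsed rules; it never contacts the server.
for my $url ('http://www.example.com/index.html',
             'http://www.example.com/private/data.html') {
    printf "%s => %s\n", $url, $rules->allowed($url) ? 'allowed' : 'blocked';
}
```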



      A web server in general does not care about robots.txt and does not enforce any of the rules in it. Rejecting particular user agents has to be configured separately in the web server.
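To illustrate that server-side rejection is a separate mechanism that looks only at the request headers, here is a minimal sketch written as a plain PSGI application (a code reference that takes the request environment and returns status, headers and body). In practice this is usually done in the web server's own configuration; PSGI is used here only to keep the example in Perl. The blocklist patterns are hypothetical, though LWP::UserAgent's default agent string does begin with "libwww-perl/", which is why setting your own agent name changes what such a filter sees.

```perl
use strict;
use warnings;

# Hypothetical blocklist of patterns matched against the User-Agent header.
my @blocked_agents = (qr/^libwww-perl/i, qr/BadBot/i);

# Minimal PSGI app: robots.txt plays no part; only the header is inspected.
my $app = sub {
    my ($env) = @_;
    my $agent = $env->{HTTP_USER_AGENT} // '';

    if (grep { $agent =~ $_ } @blocked_agents) {
        return [403, ['Content-Type' => 'text/plain'], ["Forbidden\n"]];
    }
    return [200, ['Content-Type' => 'text/plain'], ["Hello\n"]];
};
```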
