
Re^3: LWP::UserAgent Bad and Forbidden requests

by 1arryb (Acolyte)
on Dec 15, 2011 at 19:54 UTC

in reply to Re^2: LWP::UserAgent Bad and Forbidden requests
in thread LWP::UserAgent Bad and Forbidden requests

Hi Corion,

True, but all LWP::RobotUA gets you is (a) client-side processing of robot rules (i.e., once the user agent has downloaded robots.txt for a site, it will abort a request for a banned URL before sending it) and (b) an optional, configurable delay between requests, so your program can be a good "netizen" and avoid hammering websites too hard. None of this prevents the web server from evaluating your user agent identification string and processing its robot rules to accept or reject your request.
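To make point (a) concrete, here is a minimal sketch of the client-side check, using WWW::RobotRules (the rules parser that LWP::RobotUA relies on) directly so that it runs without touching the network. The robot name, the example.com URLs, and the robots.txt body are all made up for illustration; a real LWP::RobotUA fetches robots.txt itself before applying the same kind of check.

```perl
use strict;
use warnings;
use WWW::RobotRules;

# Hypothetical robot name and site, chosen only for this example.
my $agent_name = 'ExampleBot/1.0';
my $robots_url = 'http://www.example.com/robots.txt';

# A made-up robots.txt body; in real use the user agent downloads this.
my $robots_txt = <<'END_ROBOTS';
User-agent: *
Disallow: /private/
END_ROBOTS

# Parse the rules once, then consult them before each request.
my $rules = WWW::RobotRules->new($agent_name);
$rules->parse($robots_url, $robots_txt);

for my $url (
    'http://www.example.com/index.html',
    'http://www.example.com/private/data.html',
) {
    # The decision is made entirely on the client; no request is sent here.
    my $verdict = $rules->allowed($url) ? 'allowed' : 'blocked client-side';
    print "$url: $verdict\n";
}
```

Note that the check happens purely in the client: nothing in it tells you how the server itself will respond to a request that does go out.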

Re^4: LWP::UserAgent Bad and Forbidden requests
by Corion (Pope) on Dec 16, 2011 at 07:26 UTC
    A web server in general does not care about robots.txt and does not enforce any of the rules in it. Rejection based on the user agent string needs to be configured separately on the web server.
