PerlMonks  

Re: Web Robot

by Abigail-II (Bishop)
on Jul 16, 2003 at 22:22 UTC


in reply to Web Robot

For me, there's just one "don't":
  • Don't piss off the owners of the site.

From that rule, many others can be deduced:

  • Obey robots.txt.
  • Don't flood a site.
  • Don't republish, especially not anything that might be copyrighted.
  • Be very conservative when visiting sites that maintain themselves by showing ads. Any time you fetch something without fetching the ad(s), it costs them money without any gain for them.
  • For anything you need to register for, don't do anything that conflicts with their terms of service.

Remember that your robot will be a guest in other people's territory. Act accordingly.
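Two of these rules, obeying robots.txt and not flooding a site, are mechanical enough to build into the robot itself. What follows is only a minimal sketch, in Python rather than Perl, using the standard library's urllib.robotparser; the PoliteRobot class, its 5-second default delay and the caller-supplied get() function are hypothetical names and choices for illustration, not anything from this thread.

```python
import time
import urllib.robotparser


class PoliteRobot:
    """Hypothetical helper for one site: obeys robots.txt and spaces out requests."""

    def __init__(self, user_agent, robots_lines, min_delay=5.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.user_agent = user_agent
        self.rules = urllib.robotparser.RobotFileParser()
        # parse() takes the robots.txt lines directly, so this sketch does no network I/O
        self.rules.parse(robots_lines)
        # If the site asks for a Crawl-delay slower than our default, honour it
        delay = self.rules.crawl_delay(user_agent)
        self.min_delay = max(min_delay, delay or 0)
        # clock and sleep are injectable so the pacing can be tested without waiting
        self.clock = clock
        self.sleep = sleep
        self.last_request = None

    def allowed(self, url):
        """True if robots.txt permits this user agent to fetch url."""
        return self.rules.can_fetch(self.user_agent, url)

    def wait_turn(self):
        """Sleep just long enough that requests are at least min_delay seconds apart."""
        now = self.clock()
        if self.last_request is not None:
            remaining = self.min_delay - (now - self.last_request)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_request = now

    def fetch(self, url, get):
        """Fetch url with the caller's (hypothetical) get() function; None if disallowed."""
        if not self.allowed(url):
            return None
        self.wait_turn()
        return get(url)
```

The actual download is left to the caller's get() function. With real network access you would load the file with RobotFileParser's set_url() and read() instead of parse(), and keep one such object per site, since robots.txt rules are per host.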

Abigail

Re: Re: Web Robot
by Anonymous Monk on Jul 17, 2003 at 01:58 UTC

    You'll note that most major search engines violate the majority, if not all, of these rules. So don't take them too seriously.

    Don't let that stop you from playing nice though :)

      Most major search engines don't stick to those rules, true. So if you rely on a robots.txt on your site, be aware of that.

      But just because some big companies/search engines don't stick to the rules doesn't mean that you should do the same. I always go by the maxim: don't do unto others what you wouldn't want done to you (or your site).

      Just my 2 Rappen (the Swiss equivalent of cents).

      --cs

      There are nights when the wolves are silent and only the moon howls. - George Carlin

        On the topic of robots.txt, why would someone even use it? If you don't want a page accessed, restrict access to it. Depending on every computer to play nice isn't a very smart move; they have many hidden motives :)

      The question is... does this really matter? I mean, if your pages are indexed, they are more likely to be found, so you get more hits, more ad views and, in the end, more money. So I would not care much if a search engine flooded my server with requests once a month.

      So IMHO the only violation that might matter is not obeying robots.txt. Actually, could someone give me an example of reasonable robots.txt usage?

      Jenda
      Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
         -- Rick Osborne

      Edit by castaway: Closed small tag in signature
