Re: (OT) Robots disallow
by marto (Cardinal) on Apr 01, 2009 at 15:05 UTC
|
| [reply] |
Re: (OT) Robots disallow
by kennethk (Abbot) on Apr 01, 2009 at 15:03 UTC
|
You can't. There is no reliable way for a server to tell the difference between a robot and a human user. The Robots exclusion standard is purely voluntary for robot scripters, and you have already followed the protocol. | [reply] |
|
If there's a specific signature of the bot that you're having problems with (a specific domain, or an identifiable user-agent), look into configuring your webserver to send them alternative content.
Back when it was easy to identify e-mail harvesters, I'd send them to a CGI that slowly feeds them bogus e-mail addresses and the abuse address from the netblock they're coming from. These days, most harvesters are coming from botnets, so the abuse one isn't so useful. (and yes, it _is_ a Perl script)
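A minimal sketch of the "alternative content" idea, written as a plain Perl CGI. The signatures in @BAD_AGENTS and both response bodies are made-up placeholders, not anything from this thread; you would substitute patterns taken from your own access logs. It only helps against robots that send an identifiable User-Agent header, which a badly behaved one is free to fake.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical signatures; replace with whatever shows up in your own logs.
my @BAD_AGENTS = (qr/EvilHarvester/i, qr/BadBot/i);

# Return 1 if the User-Agent string matches a known-bad signature, else 0.
sub is_unwanted_robot {
    my ($user_agent) = @_;
    return 0 unless defined $user_agent;
    for my $pattern (@BAD_AGENTS) {
        return 1 if $user_agent =~ $pattern;
    }
    return 0;
}

# Pick the body to serve for this request (placeholder text in both cases).
sub content_for {
    my ($user_agent) = @_;
    return is_unwanted_robot($user_agent)
        ? "Nothing to see here.\n"
        : "Welcome to the real page.\n";
}

# Under CGI the web server passes the request header in HTTP_USER_AGENT.
print "Content-Type: text/plain\r\n\r\n";
print content_for($ENV{HTTP_USER_AGENT});
```

The same test can usually be done in the web server's own configuration instead; the script form just makes the logic easy to see and to change.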
| [reply] |
Re: (OT) Robots disallow
by CountZero (Bishop) on Apr 01, 2009 at 15:06 UTC
|
Robots and crawlers are free to disregard the robots.txt directives entirely. Certainly that is not nice, but the world is full of less-than-nice people (and robots and crawlers and ...). I would not worry much about this. If you do not want the world to know about the info on your website, then don't publish it where everyone can see it, or put it behind password protection. Some more info can be found at The Web Robots Pages.
CountZero "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
| [reply] |
Re: (OT) Robots disallow
by kyle (Abbot) on Apr 01, 2009 at 15:24 UTC
|
As others have noted, you're already doing the Right Thing to rid yourself of robots that play by the rules. You might be able to discourage badly behaved robots by creating a tar pit for them to wander into. The problems with this are many, and I'm not inclined to discuss them in any detail, but if you're in desperate times and looking for desperate measures, it's an idea worth consideration and probably rejection.
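For the curious, a tar pit can be sketched as a CGI that dribbles out an endless-looking supply of links back into itself. This is only an illustration under assumptions: the /tarpit/ path, the delay, and the link count are all invented here, and the path would need to be disallowed in robots.txt so that polite robots never enter. One of the many problems is visible right in the code: every trapped robot keeps one of your own server processes asleep for the duration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Handle;

# Build one bogus link that points back into the tar pit.
# "/tarpit/" is a hypothetical path, not something from this thread.
sub bogus_link {
    my $name = join '', map { ('a' .. 'z')[ int rand 26 ] } 1 .. 8;
    return qq{<a href="/tarpit/$name.html">$name</a><br>\n};
}

# Write a CGI header and $count links to $fh, pausing $delay seconds
# after each link. Returns the number of links written.
sub serve_tarpit {
    my ($fh, $delay, $count) = @_;
    $fh->autoflush(1);    # send each link as soon as it is printed
    print {$fh} "Content-Type: text/html\r\n\r\n";
    for (1 .. $count) {
        print {$fh} bogus_link();
        sleep $delay if $delay;
    }
    return $count;
}

# Only serve when actually running under a CGI-capable web server.
serve_tarpit(\*STDOUT, 2, 20) if $ENV{GATEWAY_INTERFACE};
```

Because the links are random, a robot that follows them never runs out of new URLs, which is the whole point and also the reason to think twice before deploying it.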
| [reply] |
Re: (OT) Robots disallow
by VinsWorldcom (Prior) on Apr 01, 2009 at 15:16 UTC
|
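#!/usr/bin/perl
use strict;
use warnings;

# Write a robots.txt that asks every compliant robot to stay out of the whole site.
open(my $out, '>', 'robots.txt') or die "Cannot open robots.txt: $!\n";
print {$out} "User-agent: *\n";
print {$out} "Disallow: /\n";
close($out) or die "Cannot close robots.txt: $!\n";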
<sarcasm off> | [reply] [d/l] |