PerlMonks  

Re: blocking site scrapers

by pboin (Deacon)
on Feb 07, 2006 at 14:07 UTC ( [id://528519] )


in reply to blocking site scrapers

There are a lot of things to think about here, as other monks have already noted. One thing I'd add for you to consider: you could inadvertently block a router that's doing NAT for an entire organization. Everyone behind that router would appear to come from the same IP address in your logs. You may end up deciding that blocking a whole organization is OK, but at least know what you're dealing with.

One of the more clever ways to stop robots, IMO, is to have a tarpit link or image that triggers a penalty period. Bots are dumb, and they'll fall for it every time, unless a human codes around your particular tarpit.
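A minimal sketch of the idea, in Perl. Everything here is hypothetical: the tarpit URL, the penalty length, and the in-memory hash are assumptions for illustration (a real setup would likely persist the list to a DBM file or database, and would disallow the tarpit path in robots.txt so well-behaved crawlers never see it):

```perl
#!/usr/bin/perl
# Tarpit sketch (hypothetical): a link hidden from humans (e.g. a 1x1
# image) points at the tarpit URL. Any client that requests it gets
# its IP recorded, and all of its requests are refused for a while.
use strict;
use warnings;

my %penalized;               # IP address => epoch time of the offense
my $PENALTY_SECONDS = 3600;  # how long an offender stays blocked

# Record an offender; call this from the handler for the tarpit URL.
sub tarpit_hit {
    my ($ip) = @_;
    $penalized{$ip} = time;
}

# Returns true while the IP is still inside its penalty window.
# Call this early in every request handler.
sub is_blocked {
    my ($ip) = @_;
    return 0 unless exists $penalized{$ip};
    if ( time - $penalized{$ip} >= $PENALTY_SECONDS ) {
        delete $penalized{$ip};    # penalty expired; forgive
        return 0;
    }
    return 1;
}
```

The point of the penalty window (rather than a permanent ban) is exactly the NAT problem above: an hour-long block annoys a scraper without permanently locking out a whole organization that happens to share one address.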

My favorite example is the tarpit for SQLite on their wiki.
