Re: perl regex or module that identifies bots/crawlers

by shigetsu (Hermit)
on Mar 20, 2007 at 19:18 UTC (#605732)

Re^2: perl regex or module that identifies bots/crawlers
by argv (Pilgrim) on Mar 20, 2007 at 21:56 UTC
    Perhaps HTTP::BrowserDetect's robot() method?
    While I retain my enthusiasm for this module, and while it does precisely what I wanted (a simplified, generic series of regexes that determine whether a browser is a robot), it suffers from a problem that plagues everyone who ventures into this area: it's impossible to keep up with the robots. I've found numerous databases of known robot names, and every one of them states that its list is incomplete. It is an unsolvable problem, which is the primary reason for the cryptic glyphs you see on pages (the ones that make you type something to prove you're a human). That said, the robot() method does a good enough job for now, and it was certainly worth it to avoid spending more time on this problem. Great bang for the buck. PerlMonks rescued me once again...
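
For illustration, here is a minimal sketch of the approach discussed above: ask HTTP::BrowserDetect's robot() method first, then check a small local list of patterns for crawlers the module does not know about. The helper name looks_like_robot and the pattern list are assumptions made for this example and are not part of the module. The sketch also assumes that an empty User-Agent should count as "not a robot", and it falls back to the local list alone if the module is not installed.

```perl
use strict;
use warnings;

# Hypothetical supplement list for this example only. These patterns are
# NOT the module's internal list; extend them with names from your own logs.
my @EXTRA_ROBOT_PATTERNS = (
    qr/bot\b/i,
    qr/crawl/i,
    qr/spider/i,
    qr/slurp/i,
);

# Load HTTP::BrowserDetect at runtime if it is installed.
my $HAVE_DETECT = eval { require HTTP::BrowserDetect; 1 } ? 1 : 0;

# Returns 1 if the User-Agent string looks like a robot, 0 otherwise.
sub looks_like_robot {
    my ($ua_string) = @_;

    # Assumption: a missing or empty User-Agent is treated as "not a robot".
    return 0 unless defined $ua_string && length $ua_string;

    if ($HAVE_DETECT) {
        my $browser = HTTP::BrowserDetect->new($ua_string);
        return 1 if $browser->robot;
    }

    # No list is ever complete, so also consult the local patterns.
    for my $re (@EXTRA_ROBOT_PATTERNS) {
        return 1 if $ua_string =~ $re;
    }
    return 0;
}

# Usage: classify a couple of sample User-Agent strings.
for my $ua (
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0',
) {
    printf "%-5s %s\n", (looks_like_robot($ua) ? 'robot' : 'human'), $ua;
}
```

Keeping the local patterns separate from the module means they can be updated from server logs without waiting for a new module release, though a determined robot can still send a browser-like User-Agent and slip past both checks.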
