
Re: perl regex or module that identifies bots/crawlers

by shigetsu (Hermit)
on Mar 20, 2007 at 19:18 UTC (#605732)

Re^2: perl regex or module that identifies bots/crawlers
by argv (Pilgrim) on Mar 20, 2007 at 21:56 UTC
    Perhaps HTTP::BrowserDetect's robot() method?
    I retain my enthusiasm for this module, and it does precisely what I wanted: it provides a simplified, generic series of regexes that determine whether a browser is a robot. Still, it suffers from the problem that plagues everyone who ventures into this area: it's impossible to keep up with the robots. I've found numerous databases of known robot names, and every one of them states that its list is incomplete. It is an unsolvable problem, which is the primary reason for the cryptic glyphs you see on pages (the ones that make you type something to prove you're human). That said, the robot() method does a good enough job for now, and it was certainly worth it not to have to spend more time on this problem. Great bang for the buck. PerlMonks rescued me once again...
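
    For anyone landing here later, a minimal sketch of the approach discussed above. It assumes HTTP::BrowserDetect is installed from CPAN and relies only on its constructor and its robot() method. The is_robot() helper, the fallback pattern, and the sample user-agent strings are my own illustrative assumptions, not part of the module, and the fallback is no more complete than any other robot list.

```perl
use strict;
use warnings;

# HTTP::BrowserDetect is a CPAN module, not core Perl; load it if available.
my $HAVE_DETECT = eval { require HTTP::BrowserDetect; 1 };

# Hypothetical catch-all for agents the module does not know about yet.
# The word list is an illustrative assumption and is certainly incomplete.
my $FALLBACK_RE = qr/bot|crawl|spider|slurp/i;

# is_robot() is a hypothetical helper name, not something the module exports.
sub is_robot {
    my ($ua) = @_;

    # Without an argument the constructor falls back to $ENV{HTTP_USER_AGENT},
    # so a missing or empty string is handled explicitly (treated as "not a robot" here).
    return 0 unless defined $ua && length $ua;

    if ($HAVE_DETECT) {
        my $browser = HTTP::BrowserDetect->new($ua);
        return 1 if $browser->robot;
    }

    # Second chance for agents missing from the module's list.
    return $ua =~ $FALLBACK_RE ? 1 : 0;
}

# Sample user-agent strings, chosen only for illustration.
for my $ua (
    'Googlebot/2.1 (+http://www.google.com/bot.html)',
    'Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0',
) {
    printf "%-5s %s\n", (is_robot($ua) ? 'robot' : 'human'), $ua;
}
```

    The fallback regex trades accuracy for coverage: it will catch many self-identifying crawlers that are too new for the module, but a robot that sends a browser-like User-Agent gets past both checks.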
