
Re: perl regex or module that identifies bots/crawlers

by shigetsu (Hermit)
on Mar 20, 2007 at 19:18 UTC

in reply to perl regex or module that identifies bots/crawlers

Perhaps HTTP::BrowserDetect's robot() method?

Re^2: perl regex or module that identifies bots/crawlers
by argv (Pilgrim) on Mar 20, 2007 at 21:56 UTC
    Perhaps HTTP::BrowserDetect's robot() method?
    I'm enthusiastic about this module, and it does precisely what I wanted: it provides a simplified, generic series of regexes that determine whether a browser is a robot. Still, it suffers from the problem that plagues everyone who ventures into this area: it's impossible to keep up with the robots. I've found numerous databases of known robot names, and every one of them states that its list is incomplete. It is an unsolvable problem, which is the primary reason for the cryptic glyphs you see on pages (the ones that make you type something to prove you're a human). That said, the robot() method does a good enough job for now, and it was certainly worth it to avoid spending more time on this problem. Great bang for the buck. PerlMonks rescued me once again...
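
    To make the "series of regexes" idea concrete, here is a minimal core-Perl sketch of that style of check. The pattern list and the looks_like_robot() helper are hypothetical and written only for illustration; they are not the patterns HTTP::BrowserDetect ships, and the last User-Agent string in the loop is made up.

```perl
use strict;
use warnings;

# Hypothetical, deliberately incomplete pattern list (an assumption for
# illustration; NOT the list HTTP::BrowserDetect actually uses).
my @ROBOT_PATTERNS = (
    qr/googlebot/i,
    qr/slurp/i,                        # Yahoo! Slurp
    qr/msnbot/i,
    qr/\b(?:bot|crawler|spider)\b/i,   # generic catch-all words
);

# Return 1 if the User-Agent string matches any known pattern, else 0.
sub looks_like_robot {
    my ($user_agent) = @_;
    return 0 unless defined $user_agent && length $user_agent;
    for my $re (@ROBOT_PATTERNS) {
        return 1 if $user_agent =~ $re;
    }
    return 0;
}

# Try a few User-Agent strings; the last one is an invented robot name.
for my $ua (
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2',
    'ExampleHarvester/0.1',
) {
    printf "%-5s %s\n", ( looks_like_robot($ua) ? 'robot' : 'human' ), $ua;
}
```

    The invented ExampleHarvester agent is reported as a human, which is the limitation described above: a robot whose name is not on the list slips through until someone adds a pattern for it. That maintenance burden is the reason to lean on a maintained module's robot() method rather than a private list.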
