http://www.perlmonks.org?node_id=565054

I've been researching ways to keep bots from stuffing my registration form, and the only thing I've seen that could be effective is CAPTCHA (Authen::Captcha). However, with today's technology, even the best methods can be thwarted by a bot with really good OCR capabilities. While some individuals champion this approach, far more look down on it because the accessibility of a site goes right out the window when a CAPTCHA is implemented. The blind cannot read images. So how would we go about thwarting bots from our forms while still keeping things accessible so no one makes a big stink?

BMaximus

Replies are listed 'Best First'.
Re: If CAPTCHA isn't the answer. What is?
by Zaxo (Archbishop) on Aug 01, 2006 at 19:39 UTC

    Tell either a joke or an aimless story. Ask if it's funny.

    You may get in trouble with humorless people.

    After Compline,
    Zaxo

      I don't get it.

      ;)

Re: If CAPTCHA isn't the answer. What is?
by jhourcle (Prior) on Aug 01, 2006 at 19:45 UTC

    gellyfish explained some of the problems with using CAPTCHAs last month.

    Specifically, one of the links he sends you to discusses attacks against CAPTCHA by tricking humans into doing the work -- and if the goal is to let humans in, you're always going to be at least somewhat vulnerable to it. (we assume that they're using onion routing or something similar to keep you from limiting their submissions by IP).

    Captchas just make things more difficult to abuse, not impossible to abuse. There is no perfect solution, and you risk leaving out legitimate users.

Re: If CAPTCHA isn't the answer. What is?
by GrandFather (Saint) on Aug 01, 2006 at 19:26 UTC

    Ask the applicant a topical question. For example, ask what the featured article is on Wikipedia (and provide a link to Wikipedia's front page), or ask for the title of the lead item at The Monastery Gates. Note that you could easily automate this by scraping the target page for the current item of interest.

    Bear in mind that anything you do along these lines will eventually get circumvented, but it should give you a few days' head start. :)
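
    A minimal sketch of the answer check for such a topical question, in Python. It assumes the current title has already been scraped and stored by some other job (not shown); normalize and answer_matches are hypothetical names, and the matching rules are only one reasonable choice.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, drop accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[a-z0-9]+", ascii_only.lower())
    return " ".join(words)


def answer_matches(expected_title: str, submitted: str) -> bool:
    """Compare the applicant's answer with the scraped title.

    expected_title is assumed to come from a separate scraping step.
    An empty expected title never matches, so a failed scrape cannot
    let every submission through.
    """
    expected = normalize(expected_title)
    return bool(expected) and normalize(submitted) == expected
```

    Normalizing both sides keeps humans from failing over capitalization or stray punctuation, which matters more here than strictness.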


    DWIM is Perl's answer to Gödel
Re: If CAPTCHA isn't the answer. What is?
by eric256 (Parson) on Aug 01, 2006 at 20:24 UTC

    Stopping bots is a form of security. Like all forms of security, it involves trade-offs. You need to decide which trade-offs are acceptable and work within those limits. I've implemented CAPTCHAs on sites for unregistered comments, and they worked wonderfully. It probably helps that it's a low-load site, so no one has focused on attacking it; that was a trade-off I made.

    Maybe for your site, requiring a response to an email, clicking a link in an email, sending the password by email, or whatever else is better. CAPTCHAs are breakable, but they stop tons and tons of abuse currently. Eventually the hackers will get smarter, but I'm not going to worry overly about that until it happens.
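
    As an illustration of the "click a link in an email" option, here is a minimal Python sketch of a signed, expiring confirmation token that could be embedded in such a link. SECRET_KEY, make_token, and check_token are hypothetical names, and actually sending the email is left out.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumption: a server-side secret, loaded from configuration in practice.
SECRET_KEY = b"replace-with-a-real-secret"


def _sign(email: str, issued: int) -> str:
    payload = f"{email}|{issued}".encode()
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def make_token(email: str, now: Optional[float] = None) -> str:
    """Return 'issued.signature' for use in a confirmation link."""
    issued = int(time.time() if now is None else now)
    return f"{issued}.{_sign(email, issued)}"


def check_token(email: str, token: str, max_age: int = 86400,
                now: Optional[float] = None) -> bool:
    """Accept the token only if it is well formed, signed for this
    email address, and no older than max_age seconds."""
    current = time.time() if now is None else now
    try:
        issued_text, signature = token.split(".", 1)
        issued = int(issued_text)
    except ValueError:
        return False
    expected = _sign(email, issued)
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    return 0 <= current - issued <= max_age
```

    Because the signature covers the address and the issue time, nothing has to be stored until the link is actually clicked.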

    Normally security should be layered to achieve the best result. So use some IP filtering, use some smart matches that look for obvious spam (links in the name field, whatever), use a CAPTCHA with an email bypass to receive the response by email, etc. In the end, a determined person will just sit there and register all 20 accounts if that's what they want, so focus on the general bots that just wander around looking for forms, and figure out ways to fool them more often than you fool the humans visiting your site.
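
    Two of those layers, IP filtering and a check for links in the name field, might look roughly like this in Python. The blocked network is a placeholder from the documentation address range, and BLOCKED_NETWORKS, ip_is_blocked, name_has_link, and should_reject are hypothetical names, not part of any module mentioned in this thread.

```python
import ipaddress
import re

# Placeholder blocklist; 203.0.113.0/24 is reserved for documentation.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]

# Patterns that rarely belong in a person's name.
LINK_PATTERN = re.compile(r"https?://|www\.|<a\s|\[url", re.IGNORECASE)


def ip_is_blocked(remote_addr: str) -> bool:
    """Layer 1: reject submissions from known-bad networks."""
    address = ipaddress.ip_address(remote_addr)
    return any(address in network for network in BLOCKED_NETWORKS)


def name_has_link(form: dict) -> bool:
    """Layer 2: obvious spam, such as a URL in the name field."""
    return bool(LINK_PATTERN.search(form.get("name", "")))


def should_reject(form: dict, remote_addr: str) -> bool:
    """Apply the cheap layers in order; any hit rejects the form."""
    return ip_is_blocked(remote_addr) or name_has_link(form)
```

    Each layer is cheap and imperfect on its own; the point is that a wandering bot has to get past all of them.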


    ___________
    Eric Hodges
Re: If CAPTCHA isn't the answer. What is?
by samtregar (Abbot) on Aug 01, 2006 at 22:00 UTC
    It won't help with the accessibility problem, but KittenAuth is probably proof against OCR for now:

    http://arstechnica.com/news.ars/post/20060407-6554.html

    I bet it'd be pretty hard to produce a bot that could pass that test, given a large enough DB of cuddly animals. Of course an attacker can still trick people into doing the work in exchange for porn.

    -sam

      Given the small number of guesses required to guess correctly on average (42 when choosing 3 from 9, 2184 when choosing 5 from 16), KittenAuth is useless without a properly configured firewall. Without monitoring, it's an invitation to hammer the server.
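
      Those figures appear to be half the number of possible selections. A quick check in Python, assuming the order of the picks doesn't matter and the bot guesses uniformly at random without repeating itself against a fixed set of pictures:

```python
from math import comb


def selections(pool: int, picks: int) -> int:
    """Number of ways to choose `picks` pictures out of `pool`."""
    return comb(pool, picks)


for pool, picks in [(9, 3), (16, 5)]:
    total = selections(pool, picks)
    # Guessing without repeats hits the answer about halfway through.
    print(f"{picks} from {pool}: {total} selections, ~{total // 2} guesses")

# Prints:
# 3 from 9: 84 selections, ~42 guesses
# 5 from 16: 4368 selections, ~2184 guesses
```

      If every attempt is an independent random guess instead, the expected number of attempts is the full count (84 and 4368) rather than half of it.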
        It's been a while since I read the article, but I would assume that you are presented with a new set of pictures after an incorrect guess. With a large enough DB to avoid repeats I think your averages are way, way off.

        But yes, of course hammering is to be avoided. There are other tools for that, like my module CGI::Application::Plugin::RateLimiter for example.
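
        For readers who want the general idea, here is a minimal sliding-window limiter sketched in Python. It is not the interface of CGI::Application::Plugin::RateLimiter; SlidingWindowLimiter and its parameters are hypothetical, and state is kept in memory only.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most max_hits per key within window_seconds."""

    def __init__(self, max_hits: int, window_seconds: float) -> None:
        self.max_hits = max_hits
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of recent hits

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        current = time.monotonic() if now is None else now
        recent = self.hits[key]
        # Forget hits that have aged out of the window.
        while recent and current - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False
        recent.append(current)
        return True
```

        Keyed on the client address, something like SlidingWindowLimiter(5, 60).allow(remote_addr) would cap guesses at five per minute per address.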

        -sam

Re: If CAPTCHA isn't the answer. What is?
by radiantmatrix (Parson) on Aug 03, 2006 at 18:59 UTC

    The definition of "CAPTCHA" is not "text that's been put into an image and made hard to OCR". It's a problem that's hard for a computer, but easy for a human, to solve. There are a lot of things that meet the criteria. The real challenge is to create a practical CAPTCHA, which meets all of the following:

    1. Impractical for average computing resources to solve ("hard enough")
    2. Simple for almost any human in your target audience to solve correctly
    3. Has solutions which can be easily and automatically checked for correctness

    The last two are the hard things. Leaving aside multi-lingual concerns, you still have cultural and experiential differences. If, for example, you have an interrogation that deals with any kind of conceptual or qualitative measure, you're going to have people who can't pass because they disagree. It's a similar problem to writing good test questions.

    The final item isn't hard, but it is limiting. You need to keep your answers simple enough that they can be reliably tested for correctness -- commonly, this means looking up the answer in a database. Right now, the two classes of captcha that work reasonably well for this are the single-word response (like the usual "obfuscated text" captchas) and the multiple-choice test.

    The former can be defeated with OCR technology, and the latter can be trivially brute-forced.

    There are actually audio-based captchas out there, where if you have a blind user, they can click on an accessible link and receive an audio clip of the word/letters to type in. Unfortunately, speech-recognition software is getting pretty good, too.

    Of course, there are other ways to solve the authentication problem. These have varying degrees of practicality depending on your application. Some examples:

    1. Provide some sort of physical authenticator: for example, send a postal letter containing the initial password for the new account. This isn't practical for free or cheap services, or where your customers will leave if they have to wait a few days to get an account.
    2. Use phone verification. PayPal used to, for certain things, have an automated system call the phone number you provided and ask you for validation (entering the last 4 digits of your bank account by touch-tone, for example). They warned you about it on the enrollment form, and they called within a couple of minutes. This is more expensive than the above, but much faster.
    3. Use another messaging technology. Send an SMS message with a verification code to the user's mobile phone: never send to the same phone twice in a given amount of time (one month?). Google does something like this (you must sign up to Gmail by referral or by using a mobile phone). It's conceivable that you could use a common IM client as well, though that is much more open to bot abuse.
    4. Use referral: no one can join unless someone invites them. This can be problematic for sites that value disparate membership, or that are targeted to the public at large. Especially if you're trying to attract paying members, this is difficult.
    5. Use approval escalation. On message boards and the like, require that a new account's posts be approved until they've had d days and p posts without submitting any spam. Of course, this is either expensive or requires a pool of volunteer labor, which in turn requires a lot of management time. It also doesn't stop bots from consuming resources, since they still submit the spam and store it in your database (at least for a while).
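
    The rule in item 3 (one verification code per phone per period) can be sketched in Python as follows. CodeIssuer, the six-digit code format, and the 30-day default are assumptions for illustration; delivering the SMS is out of scope and state is kept in memory only.

```python
import hmac
import secrets
import time
from typing import Optional

COOLDOWN_SECONDS = 30 * 24 * 3600  # assumption: "one month" as 30 days


class CodeIssuer:
    """Issue at most one verification code per phone per cooldown."""

    def __init__(self, cooldown: float = COOLDOWN_SECONDS) -> None:
        self.cooldown = cooldown
        self.issued = {}  # phone -> (code, time issued)

    def issue(self, phone: str, now: Optional[float] = None) -> Optional[str]:
        """Return a fresh 6-digit code, or None if this phone was
        already sent one within the cooldown period."""
        current = time.time() if now is None else now
        previous = self.issued.get(phone)
        if previous is not None and current - previous[1] < self.cooldown:
            return None
        code = f"{secrets.randbelow(10**6):06d}"
        self.issued[phone] = (code, current)
        return code

    def verify(self, phone: str, code: str) -> bool:
        """Check the code the user typed against the one issued."""
        previous = self.issued.get(phone)
        if previous is None:
            return False
        return hmac.compare_digest(previous[0].encode(), code.encode())
```

    The cooldown is what turns a phone number into a scarce resource: a bot needs a new number for every account it wants to create.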

    I guess the point of all this isn't "how do we make universally-acceptable captchas", but understanding how to solve the general "prove you're a person" problem. It's not trivial, and you'll have to either invest some time and/or money in the problem, or accept a failure rate.

    <radiant.matrix>
    A collection of thoughts and links from the minds of geeks
    The Code that can be seen is not the true Code
    I haven't found a problem yet that can't be solved by a well-placed trebuchet
Re: If CAPTCHA isn't the answer. What is?
by Anonymous Monk on Aug 01, 2006 at 21:00 UTC
    Require everyone who wishes to visit your web site to apply, in person, for a web key. Surgically embed a key-generating mini-computer just under their skin. Rig it to explode if tampered with.

    That should be reasonably tamper-proof. You may have a few user-acceptance issues to work out, however...

      One problem with biometric systems is that they only work while you still have that body part.

        If the device is rigged to explode when the body part is captured, then it neatly solves the entire problem. No end user == no end user complaints! ;-)