http://www.perlmonks.org?node_id=350785


in reply to Another way to get around automated bots

And again, this is a technical solution that has legal ramifications preventing its use for all but tiny toy sites.

I also don't see the advantage over just using a simple image. For one, you just punished dialup users pretty badly.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.


Re: •Re: Another way to get around automated bots
by davido (Cardinal) on May 05, 2004 at 17:32 UTC
    I was amazed the other day when I set up a PayPal account. I found that in order to set up the account I had to repeat back the numbers that I read from a slightly obscured graphic image.

    PayPal is hardly a "toy" site. My immediate thought was, "Isn't this what merlyn's always talking about?" Unbelievable that they would use such a limiting method to authenticate users.


    Dave

Re: •Re: Another way to get around automated bots
by AssFace (Pilgrim) on May 05, 2004 at 14:37 UTC

    Yes, dealing with disabled users is tough: for them to make use of it, the computer has to be able to see it (and then speak it, in the case of someone who is blind or sight-limited, or whatever the current PC terms are), and if the computer can see it, then any bot can see it.
    I guess I should consider myself fortunate that I don't ever have to program for that - every application I have written has been for an environment where sight is assumed.

    The advantage over using an image is that it can't be scanned by a bot and read - if you have an image on a page, the bot can look for the image and then pull the data from the image (easiest way is using neural net training - well, I guess not "easiest" but most effective for varied image types).

    But if there is no image there, then that particular bot can't find anything. The text is also not on the page, so it can't find anything either. The bot then has to parse the appropriate content on the page (which is easy if it is the only thing on the page, harder as you add more content and dynamically change how you reference the classes) and rebuild it as an image, and then do the analysis on it.
    There are ways of making it much harder for the bot to rebuild it.

    Yeah, it takes about 10K to represent what a 1K PNG could have done - certainly not ideal for showing images - but this wouldn't be something that you would do on every page either. That is about a 2 to 3 second download for a 33kbps modem user.
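For anyone who wants to check the 2 to 3 second figure, here is a rough back-of-the-envelope sketch. It assumes "33kbps" means a 33,600 bit/s modem and "K" means 1024 bytes, and it ignores latency, protocol overhead and modem compression, so treat the numbers as an idealised lower bound rather than a measurement.

```python
def download_seconds(size_bytes: int, link_bits_per_second: int) -> float:
    """Ideal transfer time: payload bits divided by link speed."""
    return size_bytes * 8 / link_bits_per_second


MODEM_BPS = 33_600  # assumption: "33kbps" read as a 33.6k modem

markup_seconds = download_seconds(10 * 1024, MODEM_BPS)  # ~10K of markup
png_seconds = download_seconds(1 * 1024, MODEM_BPS)      # ~1K PNG

print(f"10K of markup: {markup_seconds:.1f} s, 1K PNG: {png_seconds:.1f} s")
```

Under those assumptions the markup version lands inside the quoted 2 to 3 second range, and the PNG takes a tenth of that.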



    -------------------------------------------------------------------
    There are some odd things afoot now, in the Villa Straylight.
      "I guess I should consider myself fortunate that I don't ever have to program for that - every application I have written has been for an environment where sight is assumed."

      I'd be careful about those assumptions (disclaimer: people pay me for accessibility work :-)

      For government or government-funded sites in the UK, the US and other countries, accessibility is a major issue, contractually or legally depending on locale. For business sites it's becoming a potential legal/PR minefield.

      "The advantage over using an image is that it can't be scanned by a bot and read."

      Yes it can. Automating a web browser and a screen grab program isn't hard. With a little more effort they can just parse and interpret the HTML directly.
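To give a feel for how little work "parse and interpret the HTML directly" can be, here is a minimal sketch. The markup is an assumption, since the original scheme isn't reproduced in this thread: it supposes the picture is drawn as a table whose "dark" cells carry a known CSS class. The names `CellGridParser`, `rebuild_bitmap` and the class names `a`/`b` are all made up for illustration. A real bot would also have to read the stylesheet to learn which classes are dark.

```python
from html.parser import HTMLParser


class CellGridParser(HTMLParser):
    """Collects one list of booleans per <tr>; True where a <td> is 'dark'."""

    def __init__(self, dark_classes):
        super().__init__()
        self.dark_classes = set(dark_classes)
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td" and self.rows:
            # The class attribute may be missing or valueless (None).
            classes = (dict(attrs).get("class") or "").split()
            self.rows[-1].append(any(c in self.dark_classes for c in classes))


def rebuild_bitmap(html, dark_classes):
    """Turn the table back into rows of '#' (dark) and '.' (light)."""
    parser = CellGridParser(dark_classes)
    parser.feed(html)
    return ["".join("#" if cell else "." for cell in row) for row in parser.rows]


# Hypothetical markup: class "a" is assumed to be styled dark, "b" light.
SAMPLE = """
<table>
<tr><td class="a"></td><td class="b"></td><td class="a"></td></tr>
<tr><td class="b"></td><td class="a"></td><td class="b"></td></tr>
</table>
"""

bitmap = rebuild_bitmap(SAMPLE, dark_classes={"a"})
print("\n".join(bitmap))
```

Once the grid is back in this form it is just another image, and whatever recognition the bot would have run on a PNG applies unchanged.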

      The question is - is it worth the effort for somebody to do this on your site?

      The WAI have a nice working paper on the topic, "Inaccessibility of Visually-Oriented Anti-Robot Tests", for those who are interested.

      Personally I have found heuristic server-side solutions much more effective. For example:

      • Require a response from the user via email
      • Keep an eye out for registrations coming from the same IP/domain
      • Keep an eye out for registrations with similar data
      • Feedback forms with "random" names and a tracking ID to make them do a lot more work to automate the submission.
      • ... I'm sure you get the idea...
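As one illustration of the "random" names plus tracking ID idea from the list above, here is a minimal sketch. It reads "names" as form field names, which is an interpretation, and every identifier in it (`SERVER_SECRET`, `LOGICAL_FIELDS`, `issue_form`, `decode_submission`) is hypothetical. Each form gets a fresh tracking ID, and the field names are derived from that ID with an HMAC, so a script that replays last week's field names gets nowhere.

```python
import hashlib
import hmac
import secrets

SERVER_SECRET = b"change-me"  # hypothetical; keep the real one out of the source
LOGICAL_FIELDS = ("name", "email", "comment")


def field_name(tracking_id: str, logical_name: str) -> str:
    """Derive the per-form field name for one logical field."""
    message = f"{tracking_id}:{logical_name}".encode()
    digest = hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()
    return "f_" + digest[:16]


def issue_form():
    """Return a fresh tracking ID and the field names to render in the form."""
    tracking_id = secrets.token_hex(8)
    names = {logical: field_name(tracking_id, logical) for logical in LOGICAL_FIELDS}
    return tracking_id, names


def decode_submission(tracking_id: str, posted: dict):
    """Map posted fields back to logical names; None if any field is missing."""
    decoded = {}
    for logical in LOGICAL_FIELDS:
        key = field_name(tracking_id, logical)
        if key not in posted:
            return None
        decoded[logical] = posted[key]
    return decoded


tracking_id, names = issue_form()
posted = {
    names["name"]: "Ann",
    names["email"]: "ann@example.com",
    names["comment"]: "hello",
}
print(decode_submission(tracking_id, posted))
```

This doesn't stop a bot that fetches and parses the form before every submission; it only forces that extra round trip. A fuller version would presumably also record each tracking ID server-side, expire it, and refuse to accept it twice.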

      Depending on your application it may be worth thinking how much a captured registration is worth in the currency of your choice, and then thinking about how many registrations a minimum wage worker could make on your site in an hour. If the math comes out the wrong way you're going to have to rethink anyway.
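The comparison can be written down in a few lines. The wage, throughput and value figures below are invented purely for illustration; plug in your own.

```python
def cost_per_registration(hourly_wage: float, registrations_per_hour: float) -> float:
    """What one hand-made registration costs the person paying for the labour."""
    return hourly_wage / registrations_per_hour


def manual_abuse_pays(value_per_registration: float,
                      hourly_wage: float,
                      registrations_per_hour: float) -> bool:
    """True when a captured registration is worth more than it costs to make by hand."""
    return value_per_registration > cost_per_registration(hourly_wage, registrations_per_hour)


# Hypothetical figures: 6.00 per hour, one registration a minute,
# each captured registration worth 0.25 to whoever is collecting them.
cost = cost_per_registration(hourly_wage=6.00, registrations_per_hour=60)
print(f"cost per registration: {cost:.2f}")
print("worth doing by hand:", manual_abuse_pays(0.25, 6.00, 60))
```

If that second line comes out true for your site, no anti-robot test helps, because the attacker doesn't need a robot.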