http://www.perlmonks.org?node_id=1058461


in reply to solve reCAPTCHA

The standard method for getting around CAPTCHA pages is to either
  1. hire cheap labor
  2. create a very popular page of your own - usually pron
    1. copy the CAPTCHA image to your page
    2. get the input from someone who happens by your page
    3. send that input to the original page
Otherwise, you'd need to create some seriously smart image processing. Perl would be a great choice for that; once you have some code and get stuck, post it here and we'll be happy to help.

Re^2: solve reCAPTCHA
by zentara (Archbishop) on Oct 16, 2013 at 16:15 UTC
    <OT>

    This strikes me as interesting, because on an NPR show not too long ago, I heard the inventor of Captcha say that he was working on a system whereby users could be novelly employed as book translators. The way it worked was that if a computer scan couldn't reproduce the book's page photo as readable text, those text fragments would be forwarded to the Captcha database and sent out to end users to see how many different eyes might translate them.


    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh

      That's probably referring to reCAPTCHA, or one of the newer CAPTCHA derivatives such as Duolingo. The original CAPTCHA was just a means of increasing defense in the escalating arms race between bots and those who wanted to avoid them.


      Dave

Re^2: solve reCAPTCHA
by quentin (Initiate) on Oct 16, 2013 at 14:40 UTC
    "you'd need to create some seriously smart image processing" That's what I thought, but I hoped there would be another way. But well, I guess I'll have a go at that then... lol. As soon as I have got something I'll post it here!

      reCAPTCHA is intended to raise the probability that the entity on the other end of the line is a real person by making it very expensive (in developer costs, developer knowledge, and computational time) to create software that masquerades as a person. The only way for you to programmatically get around reCAPTCHA is to reduce these costs: drop the value of your developer time to near zero, do lots of research (get smart), and, if you're hitting it often enough, use hardware powerful enough that the computational cost is mitigated. In other words, you have to be rich, studious, and willing to work a long time for free. If you possess all of these qualities, I would think there are more worthwhile projects to donate your time and resources to.


      Dave