
solve reCAPTCHA

by quentin (Initiate)
on Oct 16, 2013 at 14:07 UTC
quentin has asked for the wisdom of the Perl Monks concerning the following question:

Hey guys! I need to solve a CAPTCHA on a website to advance to the next page. The CAPTCHA is from reCAPTCHA™. Does anyone have ideas for this? I know a lot of Perl but have no idea how to solve this one! By the way: I spoke to the site admin and he said it's no problem, so nothing illegal! Greetings, Quentin

Re: solve reCAPTCHA
by moritz (Cardinal) on Oct 16, 2013 at 14:12 UTC
      He will not just "remove" the CAPTCHA because I ask him to! But you're right, actually he doesn't even need it then... But that's his decision! I am looking for a Perl-related answer here!
Re: solve reCAPTCHA
by atcroft (Abbot) on Oct 16, 2013 at 14:20 UTC

    Quickest and easiest approach to come to mind: If you "spoke to the siteadmin and he said its no problem", then ask them if they can provide you with an alternate method of accessing the data you seek: either values to enter that will be accepted in lieu of the reCAPTCHA response, or another address that will provide access to the data without the reCAPTCHA requirement. (If the latter, they may also wish to restrict access by IP address, user agent, credentials you provide, or some combination thereof, to prevent malicious bypass.)
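    As a rough illustration of the IP-restriction idea, here is a minimal server-side sketch in Perl that the site admin could put in front of the alternate, CAPTCHA-free address. It assumes IPv4 only, and the allowlist (@ALLOWED_NETS), the networks in it, and the helper names (ipv4_to_int, ip_in_cidr, client_allowed) are all hypothetical; in practice this kind of rule often lives in the web server or firewall configuration instead.

```perl
use strict;
use warnings;

# Hypothetical allowlist of networks permitted to use the CAPTCHA-free
# address (IPv4 only; documentation address ranges used as examples).
my @ALLOWED_NETS = ('192.0.2.0/24', '198.51.100.7/32');

# Convert a dotted-quad string to a 32-bit integer, or return undef if the
# string is not a valid IPv4 address. Parsed by hand so that no DNS lookup
# is ever attempted on odd input.
sub ipv4_to_int {
    my ($ip) = @_;
    return unless defined $ip;
    my @octets = $ip =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/
        or return;
    for (@octets) { return if $_ > 255 }
    return ($octets[0] << 24) | ($octets[1] << 16) | ($octets[2] << 8) | $octets[3];
}

# True (1) if $ip falls inside the CIDR block $cidr, e.g. "192.0.2.0/24".
sub ip_in_cidr {
    my ($ip, $cidr) = @_;
    my ($net, $bits) = split m{/}, $cidr;
    $bits = 32 unless defined $bits;
    my $ip_n  = ipv4_to_int($ip);
    my $net_n = ipv4_to_int($net);
    return 0 unless defined $ip_n && defined $net_n && $bits >= 0 && $bits <= 32;
    # Build the netmask: /24 -> 0xFFFFFF00, /32 -> 0xFFFFFFFF, /0 -> 0
    my $mask = $bits == 0 ? 0 : (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF;
    return (($ip_n & $mask) == ($net_n & $mask)) ? 1 : 0;
}

# True (1) if the client address matches any allowlisted network.
sub client_allowed {
    my ($ip) = @_;
    for my $cidr (@ALLOWED_NETS) {
        return 1 if ip_in_cidr($ip, $cidr);
    }
    return 0;
}
```

    In a CGI script the check would typically be fed the standard REMOTE_ADDR variable, e.g. client_allowed($ENV{REMOTE_ADDR}), and could be combined with a credential or API-key check rather than relied on alone.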

    Hope that helps.

      Well, you're right! But he is not willing to make any effort. All he said was "okay" when I asked whether I could use a Perl bot. Besides that, I find the problem very interesting and was hoping someone could come up with an idea!

        The whole point of a CAPTCHA is to thwart bots' attempts to access a site. You may well have permission from the sysadmin, but that doesn't make it a simple problem to solve from your end of the line. If it were simple, it would be ineffective. This is one case where hard things really are hard. Any effort to thwart the CAPTCHA will take considerable work on your part, raising the cost above what you are willing to commit. Your sysadmin probably knows this and is secretly laughing behind your back.

        Sure! Go ahead! No problem! **smirk**


Re: solve reCAPTCHA
by marto (Archbishop) on Oct 16, 2013 at 14:27 UTC

    "I spoke to the sideadmin and he said its no problem!"

    Well, if you say so. Seriously, is this supposed to sound like some sort of consent that we should accept as genuine?

    A sensible sysadmin would provide you with access via another method rather than give you the OK to automate CAPTCHA completion. If they don't have a problem with you accessing this data or automating this system, then they should provide you with a more sensible way of doing so.

      It may be true that "a sensible sysadmin" would do that. But he didn't. To be honest, I don't see the point in questioning it! I specifically wrote an email to the admin because I thought people might get suspicious or something. This is the way I want to do it. As I said, I find it interesting!

        "To be honest I dont see the point in questioning it!"

        Because you don't know how to get past the page with the reCAPTCHA ("no idea how to solve this one")? Because the effort required to get past it is greater than that of accessing the data via a sensible mechanism? As interesting as you claim to find this, you've offered none of your own insights into how to achieve it.

        If you know as much Perl as you claim, you could offer to work with the sysadmin to develop a sensible mechanism/API that lets you and others get the data you need. For all I or anybody else knows, you could just be looking to bypass a registration/submission page to spam some forum, since this kind of challenge/response is meant to ensure that the end user is a human and not a script.

        Interesting. I wrote an email to the admin too, and *she* has never heard of you!

Re: solve reCAPTCHA
by keszler (Priest) on Oct 16, 2013 at 14:35 UTC
    The standard method for getting around CAPTCHA pages is to either
    1. hire cheap labor
    2. create a very popular page of your own - usually pron
      1. copy the CAPTCHA image to your page
      2. get the input from someone who happens by your page
      3. send that input to the original page
    Otherwise, you'd need to create some seriously smart image processing. Perl would be a great choice for that; once you have some code and get stuck, post it here and we'll be happy to help.

      This strikes me as interesting, because on an NPR show not too long ago, I heard the inventor of CAPTCHA say that he was working on a system whereby users could be novelly employed as book translators. The way it worked: if a computer scan couldn't reproduce a photo of a book's page as readable text, those text fragments would be forwarded to the CAPTCHA database and sent out to end users, to see how many different eyes might translate them.

      I'm not really a human, but I play one on earth.
      Old Perl Programmer Haiku ................... flash japh

        That's probably referring to reCAPTCHA, or one of the newer CAPTCHA derivatives such as Duolingo. The original CAPTCHA was just a means of strengthening defenses in the escalating arms race between bots and those who wanted to keep them out.


      "you'd need to create some seriously smart image processing" That's what I thought, but I hoped there would be another way. Well, I guess I'll have a go at that. As soon as I have something, I'll post it here!

        reCAPTCHA is intended to raise the probability that the entity on the other end of the line is a real person by making it very expensive (in developer costs, developer knowledge, and computational time) to create software that masquerades as a person. The only way for you to programmatically get around reCAPTCHA is to reduce these costs by dropping the value of your developer time to near zero, doing lots of research (getting smart), and, if you're hitting it often enough, using hardware powerful enough that the computational cost is mitigated. In other words, you have to be rich, studious, and willing to work a long time for free. If you possess all of these qualities, I would think there are more worthwhile projects to donate your time and resources to.


Re: solve reCAPTCHA
by marinersk (Priest) on Oct 16, 2013 at 19:34 UTC
    LOL at this whole thread.

    Good luck with your reCAPTCHA project. I don't think the Monks are likely to help you much on this project, for the two main reasons already noted repeatedly above:

    1. It's more effort than it's worth to most people;
    2. Nothing you say will convince us you aren't just a hacker. You can insist your SysAdmin says it's okay, but that's what a hacker would have said, so you earn no points there.

    After you get it working, you could publish it as Data::Hacking::CAPTCHA or something similar.


Re: solve reCAPTCHA
by james2vegas (Chaplain) on Oct 16, 2013 at 15:24 UTC
    The site admin should provide you with a mechanism such as an API key or a login, so that you can bypass the CAPTCHA and your accesses can be tracked by key in case you abuse your access.
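    To make the API-key suggestion concrete, here is a minimal sketch of the server-side bookkeeping in Perl. Everything in it is hypothetical: the key table (%API_KEYS), the key string, the quota ($MAX_REQUESTS, deliberately tiny so the behaviour is easy to see), and the helpers authorize_request and request_count. A real site would store keys in a database and persist the counts, but the principle is the same: each accepted request is tied to one key, so abuse can be traced and the key revoked.

```perl
use strict;
use warnings;

# Hypothetical table of issued API keys => owner. A real site would keep
# these in a database or config file, never hard-coded in the script.
my %API_KEYS = (
    'example-key-0001' => 'quentin',
);

# Hypothetical per-key quota (tiny for demonstration purposes).
my $MAX_REQUESTS = 3;

# owner => number of accepted requests, so usage can be traced to a key
my %requests_by_owner;

# Return the owner's name if the request may proceed, or undef if the key
# is missing, unknown, or over its quota.
sub authorize_request {
    my ($key) = @_;
    return unless defined $key && exists $API_KEYS{$key};
    my $owner = $API_KEYS{$key};
    return if ($requests_by_owner{$owner} // 0) >= $MAX_REQUESTS;
    $requests_by_owner{$owner}++;    # record the access against this key
    return $owner;
}

# How many requests have been accepted for this owner so far.
sub request_count {
    my ($owner) = @_;
    return $requests_by_owner{$owner} // 0;
}
```

    Under CGI, a key sent by the client in a custom X-API-Key request header would show up as $ENV{HTTP_X_API_KEY}, so the page that normally shows the reCAPTCHA could skip it whenever authorize_request($ENV{HTTP_X_API_KEY}) returns an owner.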
Re: solve reCAPTCHA
by Anonymous Monk on Jan 01, 2016 at 19:30 UTC
    "I know a lot of Perl but got no idea how to solve this one!"


Node Type: perlquestion [id://1058451]
Approved by ww