PerlMonks  

Re: A little fun with merlyn

by boo_radley (Parson)
on Nov 12, 2001 at 12:59 UTC ( #124775=note )


in reply to A little fun with merlyn

Firstly, welcome back. I think. It's nice to see you posting, even if the jcwren-bot isn't hovering about in the CB.
If I remember correctly, this came up as a way to defeat votebots, and a lot of people liked it and cited examples -- a long-dead news-archiving website used a similar scheme -- but it's clear (especially now that you've implemented the idea) that simple fixed-width fonts are too easy to break for use in this type of verification.

My suggestions were

  • to mix up fonts of varying proportions and styles -- this removes your ability to chop an image into segments of the same dimensions (9 x 17 in this case) for easier processing. It may also allow characters to overlap each other's boundaries (is this called kerning? I'm not down with fonts like that).
  • to introduce noise into the image, ruining the ability of an OCR engine to detect the outlines of the characters. With minimal noise this might be foiled by applying some sort of smoothing algorithm over the image, though, and in large amounts the noise may overtake the signal.
  • to provide contextual data about an image, like "how many blocks in this image are hollow?" or "how many stars are pointing up?" and similar challenges.
Of course, any type of image recognition should take into account potential user handicaps -- a blind person could never register his favorite ice cream, someone who's color blind may be foiled if the challenge relies on sorting things by color, and so on.
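
To make the first two points concrete, here is a rough Python sketch (not jcwren's code, and not anyone's real OCR). It assumes the image is just a list of rows of 0/1 pixels; the function names are made up for illustration. It shows why equal-width cells are trivial to cut apart, and how a 3x3 majority filter, one possible "smoothing algorithm", wipes out light noise.

```python
import random

def split_fixed_cells(image, cell_w):
    """Cut a bitmap (list of rows of 0/1) into equal-width character cells."""
    width = len(image[0])
    return [[row[x:x + cell_w] for row in image]
            for x in range(0, width, cell_w)]

def add_noise(image, rate, rng):
    """Flip each pixel independently with probability `rate`."""
    return [[1 - p if rng.random() < rate else p for p in row]
            for row in image]

def majority_smooth(image):
    """3x3 majority filter: a pixel becomes 1 only if most of its
    neighbourhood (clipped at the edges) is 1."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            block = [image[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(1 if 2 * sum(block) > len(block) else 0)
        out.append(row)
    return out

if __name__ == "__main__":
    rng = random.Random(2001)
    blank = [[0] * 18 for _ in range(17)]      # two 9 x 17 cells side by side
    cells = split_fixed_cells(blank, 9)
    print(len(cells), "cells of", len(cells[0][0]), "x", len(cells[0]))
    noisy = add_noise(blank, 0.02, rng)        # light noise
    print("noise pixels before smoothing:", sum(map(sum, noisy)))
    print("noise pixels after smoothing: ", sum(map(sum, majority_smooth(noisy))))
```

With variable-width or overlapping glyphs there is no single `cell_w` to pass in, which is the whole point of the first suggestion. The same filter applied to heavy noise would start eating the characters too.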

Off topic -- I think this makes you a terrorist in the U.S. now.

Update: as for laziness, jcwren does note in his comments that he has ...

A small 'C' program then read the .BMP files, and built the Perl code for the characters.
So, no foul there :-)


Re: Re: A little fun with merlyn
by tstock (Curate) on Nov 13, 2001 at 07:09 UTC
      AltaVista's technique may seem very complicated to break with OCR, but the solution is not to try OCR at all.

      They aren't generating their "skewed letter" images on the fly (that would be hard to do for the same reasons it would be hard to parse). They have a finite set of images, and by finite I mean on the order of about 200. It would take about 15 minutes to write a script that downloads all of them, and about 45 minutes to do the data entry necessary to map each image number to its secret code.

      Not that any of us would want to do that. :)
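
The weakness of a finite image pool boils down to a dictionary lookup. Below is a minimal Python sketch of that idea only; it does not touch any real site. The function names, the placeholder byte strings and the codes are all invented for illustration, and hashing the raw bytes is an assumption (keying on the image number, as described above, would work the same way).

```python
import hashlib

def fingerprint(image_bytes):
    """Identify an image by a digest of its raw bytes."""
    return hashlib.sha256(image_bytes).hexdigest()

def build_table(labelled_images):
    """labelled_images: iterable of (image_bytes, code) pairs typed in by hand."""
    return {fingerprint(data): code for data, code in labelled_images}

def lookup(table, image_bytes):
    """Return the known code for an image, or None if it was never seen."""
    return table.get(fingerprint(image_bytes))

if __name__ == "__main__":
    # Placeholder data standing in for a small, fixed pool of images.
    table = build_table([(b"image-one", "QX7"), (b"image-two", "M2K")])
    print(lookup(table, b"image-one"))    # seen before
    print(lookup(table, b"image-new"))    # freshly generated image: no match
```

Generating a new image for every challenge makes the table useless, since no fingerprint ever repeats.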

Re: Re: A little fun with merlyn
by mexnix (Pilgrim) on Nov 14, 2001 at 08:52 UTC
    Also, boo's second suggestion, "introducing noise", is what happens when you create a Yahoo! Personals ID. Not that I have done that.... :)

    __________________________________________________

    s mmgfbs nf, nfyojy m,tr yb-zya-zy,s zfzphz,print;
    - thanks japhy :)

    mexnix.perlmonk.org

Re^2: A little fun with merlyn
by Aristotle (Chancellor) on Nov 16, 2001 at 02:35 UTC
    None of these suggestions makes it anywhere near impossible to break, though.

    My proposition is rather tricky and consists of two parts. First, you allocate multiple palette entries to slight variations of the same color. Then you use these colors to form dithered colors, like red and green pixels forming a yellow shape, using a different red palette index for each red pixel at random (and the same for green, or whatever other color). If the background and foreground colors share some dithering component (say, there are green pixels in both the background and the foreground), the contours of the symbols are "washed out" a bit and the contrast between background and foreground is low, so you get a pretty much insurmountable obstacle for OCR, at least in its current form.
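
A rough Python sketch of how the two parts might fit together, under several assumptions: the image is a grid of palette indexes, the dither is a plain random mix rather than an ordered pattern, and the names `make_palette`, `render`, `fg_mix` and `bg_mix` are hypothetical.

```python
import random

def make_palette(base_colors, variants, jitter, rng):
    """Part one: several slightly different palette entries per base colour.
    Returns (palette, groups), where groups maps a colour name to its indexes."""
    palette, groups = [], {}
    for name, rgb in base_colors.items():
        groups[name] = []
        for _ in range(variants):
            color = tuple(min(255, max(0, c + rng.randint(-jitter, jitter)))
                          for c in rgb)
            groups[name].append(len(palette))
            palette.append(color)
    return palette, groups

def render(mask, groups, fg_mix, bg_mix, rng):
    """Part two: dither. mask is rows of 0/1 (1 = glyph pixel). Each pixel
    picks a base colour from its mix, then a random variant of that colour."""
    return [[rng.choice(groups[rng.choice(fg_mix if bit else bg_mix)])
             for bit in row]
            for row in mask]

if __name__ == "__main__":
    rng = random.Random(42)
    base = {"red": (200, 0, 0), "green": (0, 200, 0), "blue": (0, 0, 200)}
    palette, groups = make_palette(base, variants=4, jitter=6, rng=rng)
    mask = [[0, 1, 1, 0],
            [0, 1, 1, 0]]
    # Foreground dithers red+green (reads as yellow); background dithers
    # green+blue, so green appears on both sides of every contour.
    for row in render(mask, groups, ("red", "green"), ("green", "blue"), rng):
        print(row)
```

Because a green index can belong to either side, neither a single palette index nor a single colour separates glyph from background; an attacker has to reason about neighbourhoods instead.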

      Hmmm... I'm not sure but wouldn't that make things a little hard on colorblind people?

Pedant Point
by malloc (Pilgrim) on Nov 28, 2001 at 00:02 UTC
    Yes indeed, the process of moving letters into a single unit is called kerning (though usually this is done for stylistic reasons). The new pleasing-to-the-eye unit is called a ligature. Sorry, I had to throw in my two cents, I am in the midst of reading the TeXbook (pg. 4) ;)
    -malloc
