PerlMonks  

A little fun with merlyn

by jcwren (Prior)
on Nov 12, 2001 at 07:52 UTC (#124732=CUFP)

For December's column in WebTechniques, merlyn has written an anti-robot voting script, located here (source for the script is here; the actual column isn't up at WT yet). (BTW, I hear this is your last column, which is unfortunate. Sorry to hear that, Randal.) Update: The column text is located here. I hadn't seen the last paragraph of the article until well after the column was published.

Using some on-the-fly image generation with GD, the script requires that you type the 8-digit security code shown in the image into the text box before you're allowed to vote on your favorite flavor of ice cream (which, as any self-respecting individual knows, is Strawberry).

Since I can't resist a good challenge, as long as it's pointless and a complete waste of time, I decided I needed a script to auto-vote for Strawberry (the respectable choice of ice cream flavors).

Below is a script which does just that, using very primitive brute force OCR techniques. The code is commented for those who are interested, in particular regarding what Randal could have done to make this more difficult.

I voted a bunch of times, and it always seems to work, unless you hit his server throttling, in which case you might get a 500 error. I thought about cron-jobbing it for a few days, but since he doesn't actually record votes, a little of the fun is removed.

The code follows the readmore tag.

--Chris

e-mail jcwren

#!/usr/local/bin/perl -w
#
#  anti-antirobot - An automated icecream flavor voting system
#
#  A while back, comrade merlyn wrote a little script that defeats automatic
#  voters, as part of one of his columns.  I decided it would be fun if it
#  could automatically be voted, so I wrote this little widget.
#
#  http://www.stonehenge.com/merlyn/WebTechniques/col68.listing.txt is the
#  home of the original text for his article.
#
#  It requires that 'convert' from the ImageMagick suite be installed.  This
#  is used to convert the .PNGs that merlyn returns to .BMP files, so I can
#  take them apart to a bitmap.  It's probably easy enough to do this with
#  .PNG, but I'm not familiar with it.
#
#  The theory is quite simple.  You grab the page with the security image,
#  break down to a bit map, and brute force match known characters against
#  it.  As you get a match, you add that character to the secret word string.
#  And after the last character is matched, *poof*, you have the code.
#
#  The OCR code is not very sophisticated.  It counts on the fact that the
#  characters in the security image are generated on the fly, and contain no
#  noise, misregistration, etc.  It only handles two-color images and doesn't
#  do anything smart like trimming blank lines prior to compare.
#
#  Note that merlyn never claimed that the antirobot was foolproof.  I'm just
#  a better fool (for having spent the time to do this, when I could have
#  been watching re-runs of BayWatch).  There's a lot of tricks he could do,
#  such as changing colors (which could be compensated), variable fonts, edge
#  dithering, etc.  These could be defeated, given time.  Subtly varying the
#  shades of the cells could be handled with a high-pass filter, so it's
#  either black or white.  Fonts could be accumulated so you have a complete
#  OCR map.  There's a lot of things he could implement to make it more
#  difficult, and given a little time, it could be defeated.
#
#  Hopefully, if merlyn updates his script to defeat the antiantirobot, he'll
#  leave the original in place as part of this demonstration, and call the
#  new one 'auntierobot', or 'antirobot2'
#
#  I re-used my OCR routines from stuff I wrote that took radar loops from
#  www.weathertap.com and built multi-day national animated radar maps.  They
#  time stamp the images graphically, and I extracted that and used it to
#  arrange the frames (a subsequent loop may contain frames the previous
#  loop did not contain, due to the refresh period of the radar).
#

use strict;
use LWP::UserAgent;
use HTML::LinkExtor;
use HTML::Form;

#
#  These are the bitmaps for the OCR, gleaned by running his antirobot script
#  multiple times, and cutting the files up in PaintShopPro, and saving them
#  as .BMPs.  A small 'C' program then read the .BMP files, and built the
#  Perl code for the characters.
#
my @char_2 = (
    '.........',
    '.........',
    '.........',
    '.........',
    '...####..',
    '..##..##.',
    '.##....##',
    '.......##',
    '......##.',
    '.....##..',
    '....##...',
    '...##....',
    '..##.....',
    '.########',
    '.........',
    '.........',
    '.........'
);
my @char_3 = (
    '.........',
    '.........',
    '.........',
    '.........',
    '..#####..',
    '.##...##.',
    '.......##',
    '......##.',
    '....###..',
    '......##.',
    '.......##',
    '.......##',
    '.##...##.',
    '..#####..',
    '.........',
    '.........',
    '.........'
);
my @char_4 = (
    '.........',
    '.........',
    '.........',
    '.........',
    '......##.',
    '.....###.',
    '....####.',
    '...##.##.',
    '..##..##.',
    '.##...##.',
    '.########',
    '......##.',
    '......##.',
    '......##.',
    '.........',
    '.........',
    '.........'
);
my @char_5 = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.#######.',
    '.##......',
    '.##......',
    '.##.###..',
    '.###..##.',
    '.......##',
    '.......##',
    '.##....##',
    '..##..##.',
    '...####..',
    '.........',
    '.........',
    '.........'
);
my @char_6 = (
    '.........',
    '.........',
    '.........',
    '.........',
    '...####..',
    '..##..##.',
    '.##....#.',
    '.##......',
    '.##.###..',
    '.###..##.',
    '.##....##',
    '.##....##',
    '..##..##.',
    '...####..',
    '.........',
    '.........',
    '.........'
);
my @char_7 = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.########',
    '.......##',
    '.......##',
    '......##.',
    '.....##..',
    '....##...',
    '...##....',
    '..##.....',
    '.##......',
    '.##......',
    '.........',
    '.........',
    '.........'
);
my @char_8 = (
    '.........',
    '.........',
    '.........',
    '.........',
    '...####..',
    '..##..##.',
    '.##....##',
    '..##..##.',
    '...####..',
    '..##..##.',
    '.##....##',
    '.##....##',
    '..##..##.',
    '...####..',
    '.........',
    '.........',
    '.........'
);
my @char_9 = (
    '.........',
    '.........',
    '.........',
    '.........',
    '...####..',
    '..##..##.',
    '.##....##',
    '.##....##',
    '..##..###',
    '...###.##',
    '.......##',
    '..#....##',
    '..##..##.',
    '...####..',
    '.........',
    '.........',
    '.........'
);
my @char_a = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '...#####.',
    '..##...##',
    '.......##',
    '..#######',
    '.##....##',
    '.##...###',
    '..####.##',
    '.........',
    '.........',
    '.........'
);
my @char_b = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.##......',
    '.##......',
    '.##......',
    '.##.###..',
    '.###..##.',
    '.##....##',
    '.##....##',
    '.##....##',
    '.###..##.',
    '.##.###..',
    '.........',
    '.........',
    '.........'
);
my @char_c = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '...#####.',
    '..##...##',
    '.##......',
    '.##......',
    '.##......',
    '..##...##',
    '...#####.',
    '.........',
    '.........',
    '.........'
);
my @char_d = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.......##',
    '.......##',
    '.......##',
    '...###.##',
    '..##..###',
    '.##....##',
    '.##....##',
    '.##....##',
    '..##..###',
    '...###.##',
    '.........',
    '.........',
    '.........'
);
my @char_e = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '...####..',
    '..##..##.',
    '.##....##',
    '.########',
    '.##......',
    '..##...##',
    '...#####.',
    '.........',
    '.........',
    '.........'
);
my @char_f = (
    '.........',
    '.........',
    '.........',
    '.........',
    '....####.',
    '...##..##',
    '...##..##',
    '...##....',
    '...##....',
    '.######..',
    '...##....',
    '...##....',
    '...##....',
    '...##....',
    '.........',
    '.........',
    '.........'
);
my @char_g = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '..#####.#',
    '.##...###',
    '.##...##.',
    '.##...##.',
    '..#####..',
    '.##......',
    '..######.',
    '.##....##',
    '..######.',
    '.........'
);
my @char_h = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.##......',
    '.##......',
    '.##......',
    '.##.###..',
    '.###..##.',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.........',
    '.........',
    '.........'
);
my @char_k = (
    '.........',
    '.........',
    '.........',
    '.........',
    '..##.....',
    '..##.....',
    '..##.....',
    '..##..##.',
    '..##.##..',
    '..####...',
    '..####...',
    '..##.##..',
    '..##..##.',
    '..##...##',
    '.........',
    '.........',
    '.........'
);
my @char_m = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.#.##.##.',
    '.##.##.##',
    '.##.##.##',
    '.##.##.##',
    '.##.##.##',
    '.##.##.##',
    '.##.##.##',
    '.........',
    '.........',
    '.........'
);
my @char_n = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.##.###..',
    '.###..##.',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.........',
    '.........',
    '.........'
);
my @char_p = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.##.###..',
    '.###..##.',
    '.##....##',
    '.##....##',
    '.##....##',
    '.###..##.',
    '.##.###..',
    '.##......',
    '.##......',
    '.........'
);
my @char_q = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '...###.##',
    '..##..###',
    '.##....##',
    '.##....##',
    '.##....##',
    '..##..###',
    '...###.##',
    '.......##',
    '.......##',
    '.........'
);
my @char_r = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.##.####.',
    '..###..##',
    '..##.....',
    '..##.....',
    '..##.....',
    '..##.....',
    '..##.....',
    '.........',
    '.........',
    '.........'
);
my @char_s = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '..######.',
    '.##....##',
    '.##......',
    '..######.',
    '.......##',
    '.##....##',
    '..######.',
    '.........',
    '.........',
    '.........'
);
my @char_t = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '...##....',
    '...##....',
    '.######..',
    '...##....',
    '...##....',
    '...##....',
    '...##....',
    '...##..##',
    '....####.',
    '.........',
    '.........',
    '.........'
);
my @char_u = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '..##..###',
    '...###.##',
    '.........',
    '.........',
    '.........'
);
my @char_v = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.##....##',
    '..##..##.',
    '..##..##.',
    '...####..',
    '...####..',
    '....##...',
    '.........',
    '.........',
    '.........'
);
my @char_w = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.##....##',
    '.##.##.##',
    '.##.##.##',
    '.##.##.##',
    '.########',
    '..##..##.',
    '.........',
    '.........',
    '.........'
);
my @char_x = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '..##..##.',
    '...####..',
    '....##...',
    '...####..',
    '..##..##.',
    '.##....##',
    '.........',
    '.........',
    '.........'
);
my @char_y = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '..##..###',
    '...###.##',
    '.#.....##',
    '..######.',
    '.........'
);
my @char_z = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '.........',
    '..######.',
    '......##.',
    '.....##..',
    '....##...',
    '...##....',
    '..##.....',
    '..######.',
    '.........',
    '.........',
    '.........'
);
my @char_A = (
    '.........',
    '.........',
    '.........',
    '.........',
    '....##...',
    '...####..',
    '..##..##.',
    '.##....##',
    '.##....##',
    '.##....##',
    '.########',
    '.##....##',
    '.##....##',
    '.##....##',
    '.........',
    '.........',
    '.........'
);
my @char_B = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.######..',
    '.##...##.',
    '.##....##',
    '.##...##.',
    '.######..',
    '.##...##.',
    '.##....##',
    '.##....##',
    '.##...##.',
    '.######..',
    '.........',
    '.........',
    '.........'
);
my @char_C = (
    '.........',
    '.........',
    '.........',
    '.........',
    '...#####.',
    '..##...##',
    '.##.....#',
    '.##......',
    '.##......',
    '.##......',
    '.##......',
    '.##.....#',
    '..##...##',
    '...#####.',
    '.........',
    '.........',
    '.........'
);
my @char_D = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.######..',
    '.##...##.',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##...##.',
    '.######..',
    '.........',
    '.........',
    '.........'
);
my @char_E = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.#######.',
    '.##......',
    '.##......',
    '.##......',
    '.######..',
    '.##......',
    '.##......',
    '.##......',
    '.##......',
    '.#######.',
    '.........',
    '.........',
    '.........'
);
my @char_F = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.########',
    '.##......',
    '.##......',
    '.##......',
    '.######..',
    '.##......',
    '.##......',
    '.##......',
    '.##......',
    '.##......',
    '.........',
    '.........',
    '.........'
);
my @char_G = (
    '.........',
    '.........',
    '.........',
    '.........',
    '...#####.',
    '..##...##',
    '.##......',
    '.##......',
    '.##......',
    '.##...###',
    '.##....##',
    '.##....##',
    '..##...##',
    '...#####.',
    '.........',
    '.........',
    '.........'
);
my @char_H = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.########',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.........',
    '.........',
    '.........'
);
my @char_K = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.##...##.',
    '.##..##..',
    '.##.##...',
    '.####....',
    '.####....',
    '.##.##...',
    '.##..##..',
    '.##...##.',
    '.##....##',
    '.........',
    '.........',
    '.........'
);
my @char_M = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.###..###',
    '.########',
    '.##.##.##',
    '.##.##.##',
    '.##.##.##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.........',
    '.........',
    '.........'
);
my @char_N = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.###...##',
    '.####..##',
    '.####..##',
    '.##.##.##',
    '.##.##.##',
    '.##..####',
    '.##...###',
    '.##...###',
    '.##....##',
    '.........',
    '.........',
    '.........'
);
my @char_P = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.#######.',
    '.##....##',
    '.##....##',
    '.##....##',
    '.#######.',
    '.##......',
    '.##......',
    '.##......',
    '.##......',
    '.##......',
    '.........',
    '.........',
    '.........'
);
my @char_Q = (
    '.........',
    '.........',
    '.........',
    '.........',
    '...####..',
    '..##..##.',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##.##.##',
    '.##..####',
    '..##..##.',
    '...####.#',
    '.........',
    '.........',
    '.........'
);
my @char_R = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.#######.',
    '.##....##',
    '.##....##',
    '.##....##',
    '.#######.',
    '.#####...',
    '.##..##..',
    '.##...##.',
    '.##....##',
    '.##....##',
    '.........',
    '.........',
    '.........'
);
my @char_S = (
    '.........',
    '.........',
    '.........',
    '.........',
    '..######.',
    '.##....##',
    '.##......',
    '.##......',
    '..######.',
    '.......##',
    '.......##',
    '.......##',
    '.##....##',
    '..######.',
    '.........',
    '.........',
    '.........'
);
my @char_T = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.########',
    '....##...',
    '....##...',
    '....##...',
    '....##...',
    '....##...',
    '....##...',
    '....##...',
    '....##...',
    '....##...',
    '.........',
    '.........',
    '.........'
);
my @char_U = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '..##..##.',
    '...####..',
    '.........',
    '.........',
    '.........'
);
my @char_V = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.##....##',
    '.##....##',
    '..##..##.',
    '..##..##.',
    '..##..##.',
    '...####..',
    '...####..',
    '....##...',
    '....##...',
    '.........',
    '.........',
    '.........'
);
my @char_W = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##....##',
    '.##.##.##',
    '.##.##.##',
    '.##.##.##',
    '.########',
    '.###..###',
    '.##....##',
    '.........',
    '.........',
    '.........'
);
my @char_X = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.##....##',
    '..##..##.',
    '...####..',
    '....##...',
    '....##...',
    '...####..',
    '..##..##.',
    '.##....##',
    '.##....##',
    '.........',
    '.........',
    '.........'
);
my @char_Y = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.##....##',
    '.##....##',
    '..##..##.',
    '...####..',
    '....##...',
    '....##...',
    '....##...',
    '....##...',
    '....##...',
    '....##...',
    '.........',
    '.........',
    '.........'
);
my @char_Z = (
    '.........',
    '.........',
    '.........',
    '.........',
    '.#######.',
    '......##.',
    '......##.',
    '.....##..',
    '....##...',
    '...##....',
    '..##.....',
    '.##......',
    '.##......',
    '.#######.',
    '.........',
    '.........',
    '.........'
);

my %charlist = (
  '2' => \@char_2, '3' => \@char_3, '4' => \@char_4, '5' => \@char_5,
  '6' => \@char_6, '7' => \@char_7, '8' => \@char_8, '9' => \@char_9,
  'a' => \@char_a, 'b' => \@char_b, 'c' => \@char_c, 'd' => \@char_d,
  'e' => \@char_e, 'f' => \@char_f, 'g' => \@char_g, 'h' => \@char_h,
  'k' => \@char_k, 'm' => \@char_m, 'n' => \@char_n, 'p' => \@char_p,
  'q' => \@char_q, 'r' => \@char_r, 's' => \@char_s, 't' => \@char_t,
  'u' => \@char_u, 'v' => \@char_v, 'w' => \@char_w, 'x' => \@char_x,
  'y' => \@char_y, 'z' => \@char_z,
  'A' => \@char_A, 'B' => \@char_B, 'C' => \@char_C, 'D' => \@char_D,
  'E' => \@char_E, 'F' => \@char_F, 'G' => \@char_G, 'H' => \@char_H,
  'K' => \@char_K, 'M' => \@char_M, 'N' => \@char_N, 'P' => \@char_P,
  'Q' => \@char_Q, 'R' => \@char_R, 'S' => \@char_S, 'T' => \@char_T,
  'U' => \@char_U, 'V' => \@char_V, 'W' => \@char_W, 'X' => \@char_X,
  'Y' => \@char_Y, 'Z' => \@char_Z,
);

#
#  The OCR routine
#
sub ocr_it
{
  my $data = shift;
  my @image = ();
  my $passphrase = "";

  my ($hdr, $filesize, $rsvrd, $bdo, $hdrsize, $width, $height, $planes,
      $bpp, $compression, $datasize, $hres, $vres, $colors, $icolors)
      = unpack ("a2IIIIIISSIIIIII", $data);

  $data = substr ($data, 54 + ($icolors * 4));

  for (my $i = $height - 1; $i >= 0; $i--)
  {
    my $s = unpack ("B$width", substr ($data, $i * (int (($width + 31) / 32) * 4)));
    $s =~ s/0/./g;
    $s =~ s/1/#/g;
    push @image, $s;
  }

  print join ("\n", @image), "\n";

  #
  #  For the width of the bitmap
  #
  for (my $column = 0; $column < $width; $column++)
  {
    #
    #  For each character we can match against
    #
    foreach my $char (keys %charlist)
    {
      #
      #  For the number of rows in this character
      #
      my $match = 1;
      my $charwidth = length (@{$charlist {$char}}[0]);

      for (my $row = 0; $row < scalar @{$charlist {$char}}; $row++)
      {
        #
        #  For the number of columns in this character
        #
        if (substr ($image [$row], $column, $charwidth) ne @{$charlist {$char}}[$row])
        {
          $match = 0;
          last;
        }
      }

      if ($match)
      {
        $passphrase .= $char;
        $column += ($charwidth - 1);
        last;
      }
    }
  }

  return $passphrase;
}

#
#  This is "main"
#
{
  my $baseurl = "http://www.stonehenge.com";
  my $url = "$baseurl/cgi/antirobot";
  my $tempname = "/tmp/$$." . time . ".tmp";
  my @images;

  #
  #  Get the base page with the choices and the security code
  #
  my $ua = LWP::UserAgent->new;
  $ua->agent ("Strawberry/1.0 (chocolate sucks; vanilla is dull; anti-antirobot 1.0)");
  my $req = new HTTP::Request ('GET' => $url, HTTP::Headers->new ('Content-Type' => 'application/x-www-form-urlencoded'));
  my $res = $ua->request ($req);
  $res->is_error && die "Can't get page";
  my $form = HTML::Form->parse ($res->content (), $baseurl);

  #
  #  Extract links.  There can be only one!  (And it should be the link
  #  to the .PNG image that contains the security code.)  In retrospect,
  #  HTML::SimpleLinkExtor may have made more sense.
  #
  my $p = HTML::LinkExtor->new (\&cb, $baseurl);
  sub cb
  {
    my ($tag, %links) = @_;
    push @images, $links{src} if ($tag =~ m/img/i);
  }
  $p->parse ($res->content ());
  die "Not just 1 image!" if (scalar @images != 1);

  #
  #  Get the security image
  #
  $req = new HTTP::Request ('GET' => $images [0], HTTP::Headers->new ('Content-Type' => 'application/x-www-form-urlencoded'));
  $res = $ua->request ($req);
  $res->is_error && die "Can't get security image";

  #
  #  Now the fun part.  Save to a temporary file, convert to a .BMP
  #  file (so we can get the bits easily, I don't know how .PNG works),
  #  then OCR it.  This gives us the secret number.
  #
  open (FH, ">$tempname") || die $!;
  print FH $res->content ();
  close FH;
  my $bmpimg = `convert $tempname bmp:-`;
  my $secretword = ocr_it ($bmpimg);
  unlink $tempname;

  #
  #  Tell the user what it is, then vote for Strawberry (my favorite)
  #
  print "\nThe secret word is: $secretword\n\n";
  $form->value ("flavor", "Strawberry");
  $form->value ("verify", $secretword);
  $res = $ua->request ($form->click);
  $res->is_error && die "Can't post vote";

  #
  #  Finally, print the result of what the antibot sent back.  If it's
  #  good, we should see some text about thanking us.
  #
  if ($res->content () =~ m/thank/i)
  {
    print "Looks like he liked us.  Strawberry it is!\n\n";
  }
  else
  {
    print "Uh oh, we didn't pass a good security code\n\n";
  }
}
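
For anyone who wants to see the matching loop on its own, here is a minimal sketch using made-up 3x3 glyphs. The %glyphs table, the match_glyphs name and the test image are illustrative assumptions, not part of the script. It slides across the image one column at a time and appends a character whenever every row of a glyph matches exactly, which is the same idea the script applies to its 9x17 bitmaps.

```perl
use strict;
use warnings;

# Hypothetical 3x3 glyphs ('#' = ink, '.' = background).
my %glyphs = (
    'I' => [ '.#.', '.#.', '.#.' ],
    'L' => [ '#..', '#..', '###' ],
    'T' => [ '###', '.#.', '.#.' ],
);

# Slide across the image one column at a time; at each column try every
# glyph and require all of its rows to match exactly.
sub match_glyphs {
    my ($image, $glyphs) = @_;
    my $width = length $image->[0];
    my $text  = '';
    for (my $col = 0; $col < $width; $col++) {
        GLYPH: for my $char (sort keys %$glyphs) {
            my $rows = $glyphs->{$char};
            my $w    = length $rows->[0];
            for my $r (0 .. $#$rows) {
                next GLYPH if substr($image->[$r], $col, $w) ne $rows->[$r];
            }
            $text .= $char;
            $col  += $w - 1;    # skip past the glyph we just matched
            last GLYPH;
        }
    }
    return $text;
}

# A tiny test image spelling T, I, L with blank columns in between.
my @image = (
    '.###..#..#...',
    '..#...#..#...',
    '..#...#..###.',
);
my $decoded = match_glyphs(\@image, \%glyphs);
print "$decoded\n";
```

Because the comparison is exact, a single stray pixel in the image makes every glyph fail at that column, which is why the approach only works on clean, fixed-font images.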

recurring perl bug: capital Z's.
by Vynce (Friar) on Nov 12, 2001 at 08:05 UTC

    hm. from jcwren's code:

    'X' => \@char_X, 'Y' => \@char_Y, 'z' => \@char_z,

    i'm sure you mean capital Z's there. and really, couldn't this whole chain of stuff be done automatically? laziness, man. you've typed basically the same line 50 times.

    the reason i call this a recurring bug is that if you own the magnetic perl poetry kit (sadly apparently no longer available), as i do, you will notice upon close inspection that it comes with each of the 25 lowercase letters a through y and a capital Z. apparently perlers are bad with z-capitalization.

    .

      and really, couldn't this whole chain of stuff be done automatically? laziness, man. you've typed basically the same line 50 times.

      How do you know he didn't do it automatically? :) At any rate, yes, you could have it generated programmatically. If they weren't lexically declared and were stuck in their own package you could just slurp from the symbol table. Something like this would work, with the caveat that symbolic references only see package variables, so the arrays would have to be declared with our (or use vars) rather than my.

      {
          no strict 'refs';
          my @chrs = ( 'a'..'h', 'k', 'm', 'n', 'p'..'z' );
          push @chrs, map uc, @chrs;
          push @chrs, 2..9;
          $charlist{ $_ } = \@{"char_$_"} foreach @chrs;
      }
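
A small self-contained sketch of the same idea, for experimenting. The three tiny arrays are stand-ins invented for the example, and it assumes the glyph arrays are package variables: a symbolic reference such as @{"main::char_a"} looks in the symbol table and cannot reach arrays declared with my.

```perl
use strict;
use warnings;

# Stand-in glyph arrays (hypothetical data), declared with "our" so that
# they live in the symbol table where symbolic references can find them.
our @char_a = ('.#.', '#.#');
our @char_B = ('##.', '##.');
our @char_2 = ('##.', '.##');

my %charlist;
{
    no strict 'refs';    # allow the string-to-variable lookup in this block only
    for my $c ('a', 'B', '2') {
        $charlist{$c} = \@{"main::char_$c"};
    }
}

print join(', ', map { "$_ => $charlist{$_}[0]" } sort keys %charlist), "\n";
```

If the arrays have to stay lexical, the simplest alternative is probably to skip the named arrays entirely and have the generator emit the rows straight into the hash.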
Re: A little fun with merlyn
by Starky (Chaplain) on Nov 12, 2001 at 11:43 UTC
    Curiously enough, not only does this circumvent merlyn's anti-bot scheme, but (based on the last article I read about PayPal) it could potentially be used to circumvent an important PayPal fraud-prevention scheme that uses a GIF with numbers and letters that a user must type in during a transaction.

    The scheme was considered a brilliant accomplishment by industry observers, and is widely credited with the almost instant cessation of certain kinds of fraud on PayPal.

    Apparently the bad guys haven't heard of Perl. Of course, a real monk would only use Perl for goodness and the greater benefit of mankind.

    P.S. Sorry I don't have a reference to the article mentioned above. It was in some Newsweekesque magazine I was browsing in the gym ....

      it could potentially be used to circumvent an important PayPal fraud-prevention scheme that uses a GIF with numbers and letters

      I must say that I find that highly unlikely. Take a look at PayPal registration to see an example of the images generated. Even though I am a firm believer that anything can be written in Perl, eventually, this really funny little trick isn't coming close to breaking the PayPal images. Nor was it intended to, of course. :)

      Reading a character in an image is old news as such, so PayPal uses a lot of different "blurring" techniques, such as drawing lines at random intervals, moving the characters inside the image and using a font that is hard to interpret - and probably other things too.

      Just so any paypal users can sleep a little tonight...


      You have moved into a dark place.
      It is pitch black. You are likely to be eaten by a grue.
        Interesting. I wonder if it might be possible to use something like a Hopfield network to defeat such schemes. Ages ago, I wrote a simple implementation, and it's available here, but that code is probably both ugly and not as algorithmically good as it could be.
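
For the curious, a minimal Hopfield sketch. This is not the implementation mentioned above; the function names and the two 8-pixel patterns are made up for illustration. It stores patterns with the Hebbian rule and then repairs a pattern that has one pixel flipped.

```perl
use strict;
use warnings;

# Train with the Hebbian rule: W[i][j] = sum over patterns of x_i * x_j,
# with a zero diagonal. Patterns are arrays of +1 / -1.
sub hopfield_train {
    my @patterns = @_;
    my $n = scalar @{ $patterns[0] };
    my @w = map { [ (0) x $n ] } 1 .. $n;
    for my $p (@patterns) {
        for my $i (0 .. $n - 1) {
            for my $j (0 .. $n - 1) {
                $w[$i][$j] += $p->[$i] * $p->[$j] unless $i == $j;
            }
        }
    }
    return \@w;
}

# Recall: update the units one at a time until a full sweep changes nothing.
sub hopfield_recall {
    my ($w, $input, $max_sweeps) = @_;
    my @state = @$input;
    for (1 .. ($max_sweeps || 10)) {
        my $changed = 0;
        for my $i (0 .. $#state) {
            my $sum = 0;
            $sum += $w->[$i][$_] * $state[$_] for 0 .. $#state;
            my $new = $sum >= 0 ? 1 : -1;
            $changed++ if $new != $state[$i];
            $state[$i] = $new;
        }
        last unless $changed;
    }
    return \@state;
}

# Helpers to convert between '#'/'.' strings and +1/-1 arrays.
sub to_bits { return [ map { ($_ eq '#') ? 1 : -1 } split //, $_[0] ]; }
sub to_text { return join '', map { ($_ > 0) ? '#' : '.' } @{ $_[0] }; }

my @stored   = map { to_bits($_) } ('##..##..', '#.#.#.#.');
my $weights  = hopfield_train(@stored);
my $noisy    = to_bits('##..##.#');    # first pattern with its last pixel flipped
my $recalled = to_text(hopfield_recall($weights, $noisy));
print "$recalled\n";
```

Whether this scales to full glyph bitmaps with many stored characters is a separate question: a Hopfield network can hold only a limited number of patterns relative to its size, so treat this as a toy.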
Re: A little fun with merlyn
by boo_radley (Parson) on Nov 12, 2001 at 12:59 UTC
    Firstly, welcome back. I think. It's nice to see you posting, even if the jcwren-bot isn't hovering about in the CB.
    If I remember correctly, this came up as a way to defeat votebots, and a lot of people liked it and cited examples -- a long dead news archiving website used a similar scheme -- but it's clear (especially now that you've implemented the idea) that simple fixed width fonts are too susceptible to breaking for use in this type of verification.

    My suggestions were

    • to mix up fonts of varying proportions and styles -- this removes your ability to chop up an image into segments of the same dimension (9 x 17 in this case) for easier processing. This may also allow characters to overlap each other's boundaries (is this called kerning? I'm not down with fonts like that).
    • to introduce noise into the image, ruining the ability of an OCR engine to detect the outline of the characters. This might be foiled by applying some sort of smoothing algorithm over the image in cases of minimal noise, though, and in large amounts, the noise may overtake the signal.
    • to provide contextual data about an image, like "how many blocks in this image are hollow?" or "how many stars are pointing up?" and similar challenges.
    Of course, any type of image recognition should take into account potential user handicaps -- a blind person could never register his favorite ice cream, someone who's color blind may be foiled if the challenge relies on sorting things by color, and so on.
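
A rough illustration of the noise point, using hypothetical 3x3 glyphs and made-up function names (pixel_distance, nearest_glyph). One stray pixel is enough to break an exact string comparison, while a matcher that counts differing pixels and accepts the closest glyph under a threshold still recovers the character. Real noise would need a much larger tolerance, and at some point glyphs start to collide.

```perl
use strict;
use warnings;

# Count the pixels that differ between two equally sized bitmaps.
sub pixel_distance {
    my ($rows_a, $rows_b) = @_;
    my $diff = 0;
    for my $r (0 .. $#$rows_a) {
        for my $c (0 .. length($rows_a->[$r]) - 1) {
            $diff++ if substr($rows_a->[$r], $c, 1) ne substr($rows_b->[$r], $c, 1);
        }
    }
    return $diff;
}

# Return the closest glyph, or undef if even the best one is too far off.
sub nearest_glyph {
    my ($cell, $glyphs, $max_diff) = @_;
    my ($best, $best_diff);
    for my $char (sort keys %$glyphs) {
        my $d = pixel_distance($cell, $glyphs->{$char});
        ($best, $best_diff) = ($char, $d)
            if !defined $best_diff || $d < $best_diff;
    }
    return (defined $best_diff && $best_diff <= $max_diff) ? $best : undef;
}

my %glyphs = (
    'L' => [ '#..', '#..', '###' ],
    'T' => [ '###', '.#.', '.#.' ],
);
my @noisy_T = ('###', '.#.', '##.');    # a 'T' with one stray pixel

# Exact comparison, as in the original script: finds nothing.
my ($exact) = grep { join('|', @{ $glyphs{$_} }) eq join('|', @noisy_T) }
              sort keys %glyphs;

# Tolerant comparison: allow up to two differing pixels.
my $fuzzy = nearest_glyph(\@noisy_T, \%glyphs, 2);

print "exact: ", (defined $exact ? $exact : 'none'), "\n";
print "fuzzy: ", (defined $fuzzy ? $fuzzy : 'none'), "\n";
```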

    Off topic -- I think this makes you a terrorist in the U.S. now. Update : as for laziness, jcwren does note in his comments that he has ...

    A small 'C' program then read the .BMP files, and built the Perl code for the characters.
    So, no foul there :-)
        Altavista's technique may seem very complicated to break with OCR, but the solution is not to try with OCR.

        They aren't generating their "skewed letter" images on the fly (that would be hard to do for the same reasons it would be hard to parse). They have a finite set of images, and by finite I mean on the order of about 200. It would take about 15 minutes to write a script that downloads all of them, and about 45 minutes to do the data entry necessary to map an image number to its secret code.

        Not that any of us would want to do that. :)
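
As a sketch of why a finite image set is weak: once someone has typed in the code for each image, "recognition" is just a hash lookup. The byte strings and codes below are placeholders invented for the example, and keying on an MD5 digest of the file contents is an assumption; the post above talks about mapping image numbers, which would be simpler still.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Placeholder data: [ image file contents, code typed in by a human ].
my @known = (
    [ 'fake-image-bytes-1', 'K7PQ' ],
    [ 'fake-image-bytes-2', 'M3XA' ],
);

# Build the lookup table once, keyed on a digest of the image bytes.
my %code_for;
$code_for{ md5_hex($_->[0]) } = $_->[1] for @known;

# Return the stored code for an image, or undef if it has not been seen.
sub lookup_code {
    my ($image_bytes) = @_;
    return $code_for{ md5_hex($image_bytes) };
}

my $hit  = lookup_code('fake-image-bytes-2');
my $miss = lookup_code('never-seen-before');
print "hit:  ", (defined $hit  ? $hit  : 'unknown'), "\n";
print "miss: ", (defined $miss ? $miss : 'unknown'), "\n";
```

The obvious countermeasure is the one merlyn's script already uses: generate every image on the fly so no table can be built ahead of time.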

      Also, boo's second suggestion "introducing noise" is done when creating a Yahoo! Personals id. not that I have done that.... :)

      __________________________________________________

      s mmgfbs nf, nfyojy m,tr yb-zya-zy,s zfzphz,print;
      - thanks japhy :)

      mexnix.perlmonk.org

      All these suggestions do not really make it anywhere near impossible to break though.

      My proposition is rather tricky and consists of two parts. First, you allocate multiple palette entries to slight variations of the same color. Then you use these colors to form dithered colors, like red and green pixels forming a yellow shape, using a different red palette index for each red pixel at random (and the same for green, or whatever other color). If the background and foreground colors share some dithering component (say, there are green pixels in both the background and the foreground), the contours of the symbols are "washed out" a bit and the contrast between background and foreground is low, so you get a pretty much insurmountable obstacle for OCR, at least in its current form.
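
One caveat on the first part, as a sketch. The palette values below are invented for the example, and snapping each channel to 0 or 255 is just one crude way to do it. Spreading a color over several near-identical palette entries does not by itself hide anything, because a reader can quantize the palette before matching. It does nothing about the second part, where foreground and background share a dithering component.

```perl
use strict;
use warnings;

# Hypothetical palette (index => [R, G, B]): three near-identical reds and
# two near-identical greens standing in for the "slight variations".
my %palette = (
    0 => [ 255, 255, 255 ],
    1 => [ 250, 10, 10 ], 2 => [ 245, 0, 5 ], 3 => [ 255, 5, 0 ],
    4 => [ 10, 250, 10 ], 5 => [ 0, 245, 5 ],
);

# Snap every channel to 0 or 255 so that small variations collapse.
sub quantize {
    my ($rgb) = @_;
    return join ',', map { $_ >= 128 ? 255 : 0 } @$rgb;
}

my %class_of;
$class_of{$_} = quantize($palette{$_}) for keys %palette;

my @row     = (1, 4, 2, 5, 3, 0, 0);         # one image row of palette indexes
my @classes = map { $class_of{$_} } @row;    # the same row after quantizing
print join(' ', @classes), "\n";
```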

        Hmmm... I'm not sure but wouldn't that make things a little hard on colorblind people?

      Yes indeed, the process of moving letters into a single unit is called kerning (though usually this is done for stylistic reasons). The new pleasing-to-the-eye unit is called a ligature. Sorry, I had to throw in my two cents, I am in the midst of reading the TeXbook (pg. 4) ;)
      -malloc
Re: A little fun with merlyn
by mortis (Pilgrim) on Nov 20, 2001 at 20:38 UTC
    Wouldn't simply drawing a line (not necessarily straight) through each of the letters/digits (in the exact same color) to connect them defeat most OCR algorithms?

    Or how about just overlapping the characters just a little bit with an adjacent character.

    I'd think you can still do edge detection even if you're mixing up pixels just to simulate dithering.

    just a couple of thoughts...

Re: A little fun with merlyn
by hacker (Priest) on Jun 03, 2002 at 11:54 UTC
    I have been playing with merlyn's original code, and caught this (thanks to those helpful monks on CB). I have some ideas around this, which may do one better than PayPal and Yahoo's schemes.
    • What if you had an image, an animated .gif file of two layers, where the image "shifted" every 2 seconds by one pixel to the left/right/diagonal?

    • What about using a transparent .png file, where the pixels that make up the "visible" letters/numbers on the image are actually spread randomly across several layers in the image? Basically like painting on several layers of glass, then stacking them up and looking through them.

    • How about an image where the entire image was transparent, but the bgcolor of the page, rendered with CSS changed to make the image lettering "visible" in that <div> tag when the user hovered over that section?

    Just a few ideas, but then again.. anyone who would try to stuff the ballots on a poll system (versus a login/authentication system) would probably not go to this length... unless that person were jcwren of course.

      An animated gif doesn't help, because the animation is really just a series of still frames, and breaking the animation into the individual frames and analyzing just one of them is trivial. Transparency doesn't help either, because to the program doing the OCR, 'transparent' is as much a color as blue or green, it's simply another index into the colormap.

        Not that anyone will read this after all this time... I see a way in which transparency could really fark up an OCR. Using cascading style sheets, layer multiple transparent images directly on top of one another, so that the signal is broken across two or more files. If that's not enough, then you can add plain HTML text as well behind it.

        So you have to find multiple images, which can be placed anywhere (dynamically) in the body segment, and also add in some text that you have to parse out with a tokenizer. :-)
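
For what it's worth, once a robot has found the layers and knows how they are positioned, putting them back together is cheap. A minimal sketch, assuming equally sized layers that are already aligned and using '.' for a transparent pixel (the composite name and the sample layers are made up):

```perl
use strict;
use warnings;

# Overlay layers of equal size; '.' is treated as transparent, '#' as ink.
sub composite {
    my @layers = @_;
    my @out = @{ $layers[0] };    # start from a copy of the bottom layer
    for my $layer (@layers[1 .. $#layers]) {
        for my $r (0 .. $#out) {
            for my $c (0 .. length($out[$r]) - 1) {
                substr($out[$r], $c, 1, '#')
                    if substr($layer->[$r], $c, 1) eq '#';
            }
        }
    }
    return \@out;
}

# Two layers that each hold part of a 'T'.
my @layer_a = ('#.#', '...', '.#.');
my @layer_b = ('.#.', '.#.', '...');
my $merged  = composite(\@layer_a, \@layer_b);
print join("\n", @$merged), "\n";
```

So the hard part of that scheme would be locating and aligning the pieces from the page markup, not the image processing itself.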
