Matching Binary Files

by Itatsumaki (Friar)
on Jan 10, 2004 at 18:55 UTC ( #320353=perlquestion )

Itatsumaki has asked for the wisdom of the Perl Monks concerning the following question:

Howdy all,

There is an open-source program that creates an image as its output file (this comes from my earlier question about drawing chromosomes). I suggested to the primary developer that perhaps it would be useful to add testing to see if "back-end" changes ended up impacting the final product. He said "go for it". So I'm now, err, trying to do that!

So two basic questions:

  1. Is there a general way to compare two binary files and see if they are identical?
  2. Even better, is there a way to calculate the "distance" between two images of the same size?

So far, this is what I came up with:

  1. encode both files (with something like ROT), write the encodings to a file, and do a diff on the encoded files
  2. walk over each image, pixel by pixel and calculate the Euclidean distance between corresponding pixels on the two images

In case #2 isn't clear, I would use something like this:

    # two image objects (placeholder constructor: no image library chosen yet)
    my $image1 = new <some image object>;
    my $image2 = new <some image object>;

    # ensure images are same size
    if ($image1->size != $image2->size) {
        die "Images different sizes\n";
    }

    # locals
    my $position            = 0;
    my $cumulative_distance = 0;
    my %bad_pixels;
    my $threshold = 100;

    # loop over all pixels
    while ($position < $image1->size) {

        # get current pixels
        my $pixel1 = $image1->getpixel($position);
        my $pixel2 = $image2->getpixel($position);
        my $distance = 0;

        # calculate distance for each colour
        # (** is exponentiation in Perl; ^ is bitwise XOR)
        $distance += ($pixel1->red   - $pixel2->red  ) ** 2;
        $distance += ($pixel1->green - $pixel2->green) ** 2;
        $distance += ($pixel1->blue  - $pixel2->blue ) ** 2;
        $distance = sqrt($distance);

        # if distance very large, add to bad pixel-list
        if ($distance > $threshold) {
            $bad_pixels{$position} = $distance;
        }

        $cumulative_distance += $distance;
        $position++;
    }

    my $average_distance = $cumulative_distance / $position;
    my $bad_pixels = scalar(keys(%bad_pixels));
    print "Total Distance: $cumulative_distance\n",
          "Avg Distance: $average_distance\n",
          "Deviant Pixels: $bad_pixels\n";

Of course that all requires some image library that lets me walk through pixel-by-pixel and extract the colour values.

Any other (easier?) approaches I could go for? Also, any comments/criticisms of what I've come up with are always appreciated.


Replies are listed 'Best First'.
Re: Matching Binary Files
by Zero_Flop (Pilgrim) on Jan 10, 2004 at 22:29 UTC
    One way to do this is with a Fourier transform. By comparing the magnitude and phase of the two transforms, you can determine whether one image is the same as another up to some type of transformation such as scaling, rotation, or translation.

    Here are some sites that may be useful:

    Google for:
    Fourier-Based Image Registration Techniques &
    Fourier-Mellin Transforms for Image Registration

    I worked a little on this with Matlab, and I think there is a Perl FFT module.
    It is popular in research because it has possibilities with image watermarking as well as image
    recognition for robots and the like.
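
As a toy illustration of why the magnitude of the transform is useful here, the sketch below runs a naive one-dimensional DFT over a single row of pixel values and over the same row circularly shifted by two positions. The row values and the `dft_magnitudes` helper are made up for this example, and a real image registration would use a 2D FFT (and the phase, to recover the actual offset). It only shows that the magnitudes do not change under a circular shift.

```perl
use strict;
use warnings;

my $PI = 4 * atan2(1, 1);

# Naive O(n^2) discrete Fourier transform of a real-valued list.
# Returns the magnitude of each frequency bin.
sub dft_magnitudes {
    my @x = @_;
    my $n = @x;
    my @mag;
    for my $k (0 .. $n - 1) {
        my ($re, $im) = (0, 0);
        for my $t (0 .. $n - 1) {
            my $angle = -2 * $PI * $k * $t / $n;
            $re += $x[$t] * cos($angle);
            $im += $x[$t] * sin($angle);
        }
        push @mag, sqrt($re ** 2 + $im ** 2);
    }
    return @mag;
}

# One hypothetical row of grey values, and the same row shifted right by 2
# (with wrap-around).
my @row     = (0, 0, 9, 7, 3, 0, 0, 0);
my @shifted = (@row[6, 7], @row[0 .. 5]);

my @m1 = dft_magnitudes(@row);
my @m2 = dft_magnitudes(@shifted);

# Find the largest difference between corresponding magnitudes.
my $max_diff = 0;
for my $k (0 .. $#m1) {
    my $d = abs($m1[$k] - $m2[$k]);
    $max_diff = $d if $d > $max_diff;
}
printf "Largest magnitude difference: %.2e\n", $max_diff;
```

The printed difference should be at floating-point noise level, while a plain pixel-by-pixel comparison of `@row` and `@shifted` would report most positions as different.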

    I collected a volume of examples when I was working on it. If you think you are interested, let me know
    and I can try to pull that stuff out and send it. It's far too much to post.

    Another thing you could do is simply take one image and subtract the other. This would probably be the easiest. I know Jimage can do this and I am sure there are a lot of others (or write one in Perl ;) ). Basically, subtract pixel for pixel and take the absolute value of the result. The resultant image should be black (zero) where the images match and colored where they do not. Depending on how accurate you want to be, if you write it in Perl, then you can count the non-zero points.
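
The subtract-and-count idea can be sketched in plain Perl once the pixel data is available as raw bytes. This is only a sketch under assumptions: the images are taken to be already decoded into headerless byte strings with a fixed number of bytes per pixel, and `count_differing_pixels` is a name invented for this example rather than a function from any image library.

```perl
use strict;
use warnings;

# Count pixels whose summed absolute per-channel difference is non-zero.
# Both arguments are raw byte strings with $channels bytes per pixel.
sub count_differing_pixels {
    my ($raw1, $raw2, $channels) = @_;
    $channels ||= 3;
    die "Images different sizes\n" if length($raw1) != length($raw2);
    die "Buffer length is not a multiple of $channels\n"
        if length($raw1) % $channels;

    my @a = unpack 'C*', $raw1;
    my @b = unpack 'C*', $raw2;

    my $differing = 0;
    for (my $i = 0; $i < @a; $i += $channels) {
        my $sum = 0;
        $sum += abs($a[$_] - $b[$_]) for $i .. $i + $channels - 1;
        $differing++ if $sum > 0;
    }
    return $differing;
}

# Two hypothetical 2-pixel RGB images; only the second pixel differs.
my $img1 = pack 'C*', 10, 20, 30,  40, 50, 60;
my $img2 = pack 'C*', 10, 20, 30,  40, 55, 60;
print count_differing_pixels($img1, $img2), " pixel(s) differ\n";   # prints "1 pixel(s) differ"
```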

    If you come up with anything I would love to see it.

Re: Matching Binary Files
by BrowserUk (Pope) on Jan 10, 2004 at 20:24 UTC

    The answer to question 1 is easy. Load both files as scalars and eq will tell you if they are identical.

        #! perl -slw
        use strict;

        die "usage: $0 binfile1 binfile2" unless @ARGV == 2;

        open my $f1, '< :raw', $ARGV[ 0 ] or die "Couldn't open $ARGV[ 0 ]: $!";
        open my $f2, '< :raw', $ARGV[ 1 ] or die "Couldn't open $ARGV[ 1 ]: $!";

        my( $d1, $d2 );
        sysread( $f1, $d1, -s $ARGV[ 0 ] ) or die "Couldn't read $ARGV[ 0 ]";
        sysread( $f2, $d2, -s $ARGV[ 1 ] ) or die "Couldn't read $ARGV[ 1 ]";
        close( $f1 ) and close( $f2 );

        print "$ARGV[ 0 ] and $ARGV[ 1 ] are ", $d1 eq $d2 ? 'the same' : 'different';

        __END__
        P:\test>320353 fox1.jpg fox1.jpg
        fox1.jpg and fox1.jpg are the same

        P:\test>320353 fox1.jpg fox2.jpg
        fox1.jpg and fox2.jpg are different
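
For large files, slurping both into memory may be undesirable. One alternative, not mentioned in the thread, is the core `File::Compare` module, whose `compare` function reads the files in chunks and returns 0 if they are equal, 1 if they differ, and -1 on error. The `files_identical` wrapper below is a hypothetical helper for this sketch.

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Returns 1 if the two files are byte-identical, 0 if they differ;
# dies if either file cannot be read.
sub files_identical {
    my ($path1, $path2) = @_;
    my $result = compare($path1, $path2);   # 0 = equal, 1 = different, -1 = error
    die "Could not compare $path1 and $path2: $!\n" if $result < 0;
    return $result == 0 ? 1 : 0;
}

# Command-line use: perl script.pl binfile1 binfile2
if (@ARGV == 2) {
    print "$ARGV[0] and $ARGV[1] are ",
        files_identical(@ARGV) ? 'the same' : 'different', "\n";
}
```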

    The answer to question 2 is either relatively trivial, just requiring large amounts of processor power, or much, much harder, depending upon whether the registration between the two images is accurate.

    If the two images are accurately aligned, then you could load the images using GD and that will allow you to perform your distance algorithm quite easily (if rather slowly).

    If the two images are even 1 pixel out of alignment, the problem becomes 9x harder (and slower). If you are going to allow for the images being 2 pixels out of alignment, it gets 25x harder; 3 pixels, 49x harder; and so on.
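
The 9x, 25x, 49x figures are consistent with having to test every horizontal and vertical shift of up to k pixels in either direction, which gives (2k+1)^2 candidate alignments. A small sketch that enumerates them (the `candidate_offsets` helper is made up for this illustration):

```perl
use strict;
use warnings;

# All (dx, dy) shifts with |dx| <= $k and |dy| <= $k.
sub candidate_offsets {
    my ($k) = @_;
    my @offsets;
    for my $dy (-$k .. $k) {
        for my $dx (-$k .. $k) {
            push @offsets, [$dx, $dy];
        }
    }
    return @offsets;
}

for my $k (1 .. 3) {
    my @offsets = candidate_offsets($k);
    printf "%d-pixel tolerance: %d alignments to test\n", $k, scalar @offsets;
}
# prints 9, 25 and 49 alignments for tolerances of 1, 2 and 3 pixels
```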

    If you intend to do this in Perl, then you would probably be better off converting the JPGs to a raw file format (no headers, compression, etc., just 3 (or 4) bytes per pixel in a contiguous stream), then loading them up and using something like PDL, which will allow you to perform the math in C.
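
To make the raw-format idea concrete, here is a pure-Perl sketch of the distance calculation from the question applied to two headerless RGB buffers (3 bytes per pixel). It deliberately uses `unpack` rather than PDL so that it runs with core Perl only, which will be much slower on large images. The `rgb_distance` name and the sample pixel values are assumptions made for this example.

```perl
use strict;
use warnings;

# Compare two raw RGB buffers (3 bytes per pixel, no header).
# Returns the total distance, the average distance per pixel, and a
# hash ref mapping pixel index => distance for pixels above $threshold.
sub rgb_distance {
    my ($raw1, $raw2, $threshold) = @_;
    die "Images different sizes\n" if length($raw1) != length($raw2);
    die "Buffer length is not a multiple of 3\n" if length($raw1) % 3;

    my @a = unpack 'C*', $raw1;
    my @b = unpack 'C*', $raw2;
    my $pixels = @a / 3;

    my $total = 0;
    my %bad;
    for my $p (0 .. $pixels - 1) {
        my $i = 3 * $p;
        # Euclidean distance between the two pixels in RGB space
        my $d = sqrt(
              ($a[$i]     - $b[$i]    ) ** 2
            + ($a[$i + 1] - $b[$i + 1]) ** 2
            + ($a[$i + 2] - $b[$i + 2]) ** 2
        );
        $bad{$p} = $d if $d > $threshold;
        $total += $d;
    }
    my $average = $pixels ? $total / $pixels : 0;
    return ($total, $average, \%bad);
}

# Two hypothetical 2-pixel images: pixel 0 matches, pixel 1 is off by (3, 4, 0).
my $img1 = pack 'C*', 10, 20, 30,  100, 100, 100;
my $img2 = pack 'C*', 10, 20, 30,  103, 104, 100;

my ($total, $average, $bad) = rgb_distance($img1, $img2, 4);
print "Total Distance: $total\n",
      "Avg Distance: $average\n",
      "Deviant Pixels: ", scalar(keys %$bad), "\n";
```

With these sample values the second pixel is at distance 5 from its counterpart, so the total is 5, the average is 2.5, and one pixel exceeds the threshold of 4.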

    Have fun:

    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Timing (and a little luck) are everything!

Re: Matching Binary Files
by neuroball (Pilgrim) on Jan 10, 2004 at 21:12 UTC

    To weed out images that aren't exact copies you might want to use Digest::MD5 with special attention to $md5->addfile($io_handle).

    It would allow you to get MD5 checksums of your images in hex, which you could then compare. If the checksums are off, you could start to search for shifted pixels.
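
A minimal sketch of that checksum comparison, using `Digest::MD5` with `addfile` as suggested. The `md5_of_file` helper name is invented for this example, and the files are read in binary mode so that image data is not altered by newline translation.

```perl
use strict;
use warnings;
use Digest::MD5;

# Return the hex MD5 digest of a file, read in binary mode.
sub md5_of_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Couldn't open $path: $!\n";
    binmode $fh;
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Command-line use: perl script.pl image1 image2
if (@ARGV == 2) {
    my ($sum1, $sum2) = map { md5_of_file($_) } @ARGV;
    print "$ARGV[0]: $sum1\n$ARGV[1]: $sum2\n";
    print $sum1 eq $sum2 ? "Checksums match\n" : "Checksums differ\n";
}
```

Matching checksums only say the files are byte-identical; differing checksums say nothing about how far apart the images are, which is where the pixel-level comparison comes in.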

    As to the image library you might want to take a look at ImageMagick and the perl module PerlMagick, which allows the use of the library from within perl.

    I hope this helps in scratching your programmer's itch.


Re: Matching Binary Files
by zentara (Archbishop) on Jan 11, 2004 at 15:44 UTC
    perldoc Imager::Filters has a section on image difference:
    You can create a new image that is the difference between 2 other images.

        my $diff = $img->difference(other=>$other_img);

    For each pixel in $img that is different to the pixel in $other_img, the pixel from $other_img is given, otherwise the pixel is transparent black.
