
Compare Binary Files

by drodinthe559 (Monk)
on Sep 07, 2011 at 22:19 UTC ( #924696=perlquestion )
drodinthe559 has asked for the wisdom of the Perl Monks concerning the following question:

What should I use to compare two binary files? It appears File::Compare will compare text files. I tried using Test::BinaryData, but it didn't work for me. Your help would be appreciated. Thanks, David

Replies are listed 'Best First'.
Re: Compare Binary Files
by ikegami (Pope) on Sep 07, 2011 at 23:15 UTC

    If you're just trying to see if they are different:

    open(my $fh1, '<', $file1) or die $!;
    binmode($fh1);
    open(my $fh2, '<', $file2) or die $!;
    binmode($fh2);

    local $/ = \(64*1024);  # Tweak as desired.

    for (;;) {
        my $blk1 = <$fh1>;
        my $blk2 = <$fh2>;
        last if !defined($blk1) && !defined($blk2);
        die("diff\n") if !defined($blk1) || !defined($blk2) || $blk1 ne $blk2;
    }

    Might want to save time by pre-checking the size of the files.
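A minimal sketch of that size pre-check, assuming both paths can be stat'ed. The names `file_size` and `sizes_differ` are hypothetical helpers, not part of the code above:

```perl
use strict;
use warnings;

# Hypothetical helper: size of a file in bytes, taken from stat().
sub file_size {
    my ($file) = @_;
    my @st = stat($file) or die "Cannot stat $file: $!\n";
    return $st[7];    # element 7 of stat() is the size in bytes
}

# Hypothetical helper: true if the sizes differ, in which case the
# contents cannot be identical and the block-by-block read can be skipped.
sub sizes_differ {
    my ($file1, $file2) = @_;
    return file_size($file1) != file_size($file2);
}
```

Equal sizes prove nothing by themselves, so the block-by-block comparison is still needed when `sizes_differ` returns false.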

    It would be trivial to modify this code to output the bytes that differ.
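One way that modification might look, as a sketch rather than a definitive version: it reports only the offset of the first differing byte. The name `first_difference` is hypothetical, and the string XOR trick is my choice here, not something the code above uses.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the zero-based offset of the first byte
# at which the two files differ, or undef if they are identical.
sub first_difference {
    my ($file1, $file2) = @_;
    open(my $fh1, '<', $file1) or die "Cannot open $file1: $!\n";
    binmode($fh1);
    open(my $fh2, '<', $file2) or die "Cannot open $file2: $!\n";
    binmode($fh2);

    local $/ = \(64*1024);    # fixed-size records, as in the code above
    my $offset = 0;
    for (;;) {
        my $blk1 = <$fh1>;
        my $blk2 = <$fh2>;
        return if !defined($blk1) && !defined($blk2);
        $blk1 = '' if !defined($blk1);
        $blk2 = '' if !defined($blk2);

        # Compare only the part both blocks have in common. String XOR
        # yields "\0" wherever the bytes agree.
        my $common = length($blk1) < length($blk2) ? length($blk1) : length($blk2);
        my $xor = substr($blk1, 0, $common) ^ substr($blk2, 0, $common);
        return $offset + $-[0] if $xor =~ /[^\0]/;

        # No mismatch in the common part: one file simply ended early.
        return $offset + $common if length($blk1) != length($blk2);
        $offset += $common;
    }
}
```

To list every differing byte instead of the first, the regex match could be run with `/g` in a loop, but that only makes sense when nothing has been inserted or deleted.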

    Detecting insertions and deletions is a whole other story, though.

Re: Compare Binary Files
by Khen1950fx (Canon) on Sep 07, 2011 at 23:11 UTC
    Diff::LibXDiff does a binary diff. It's also the diff engine used by git.
Re: Compare Binary Files
by morgon (Curate) on Sep 07, 2011 at 22:38 UTC
    If all you want is to check whether the files have identical content or not I would simply compare the MD5 (or SHA-1) hashes.

    Digest::MD5 will calculate the hash for you.

      Why? It seems to me that reading an entire file to calculate its hash is a waste of time compared to just reading the entire file and comparing it directly.

      Calculating hashes is useful in two circumstances: 1) When comparing one document to many (or many to many), 2) When a compact signature of the file is needed (for ease of communication or storage).
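For the many-to-many case, a rough sketch using the core Digest::MD5 module (my assumption about which module is meant). The helper names `file_md5` and `group_by_digest` are hypothetical:

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: hex MD5 digest of a file's raw bytes.
sub file_md5 {
    my ($file) = @_;
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    binmode($fh);
    return Digest::MD5->new->addfile($fh)->hexdigest;
}

# Hypothetical helper: group file names by digest, so that files with
# the same digest end up in the same bucket. Each file is read once.
sub group_by_digest {
    my (@files) = @_;
    my %groups;
    push @{ $groups{ file_md5($_) } }, $_ for @files;
    return \%groups;
}
```

Each file is read once no matter how many others it is compared against, which is where the hash pays for itself. For a single pair, the direct comparison reads the same data and can stop at the first difference.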

Re: Compare Binary Files
by pemungkah (Priest) on Sep 08, 2011 at 03:59 UTC
    sub files_differ { system "cmp file1 file2"; return $? >> 8; }
    If you're not comparing hundreds of files, this will do fine. Remember "the simplest thing that could possibly work"? This is it. Plus cmp gives you the option to skip a certain number of bytes before starting the compare and a lot of other options. (Tip of the hat to MJD, who talked about this in detail in Twelve Views of Mark-Jason Dominus.)
      sub files_differ { system "cmp file1 file2"; return $? >> 8; }

      Windows has no "cmp" command. $? is set to 0x0100, files_differ always returns 1, even with missing files, even with identical files. Instant fail.

      H:\>perl -e "system 'cmp file1 file2';print $?,' ',$? >> 8"
      'cmp' is not recognized as an internal or external command,
      operable program or batch file.
      256 1
      H:\>

      And by the way: Omitting quotes around the file name begs for trouble as soon as you replace the constants with variables. Using the multiple argument form of system would prevent that problem.
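A sketch of the multiple-argument form, assuming a POSIX-style cmp is on the PATH (the `-s` flag silences its output). The explicit checks on `$?` are my addition, so that a missing cmp is not mistaken for "files differ":

```perl
use strict;
use warnings;

# Sketch: returns 0 if the files are identical, 1 if they differ.
# Dies if cmp could not be run or reported trouble (exit status > 1).
sub files_differ {
    my ($file1, $file2) = @_;

    # List form: no shell is involved, so spaces and shell
    # metacharacters in the file names are passed through untouched.
    system('cmp', '-s', $file1, $file2);

    die "Could not run cmp: $!\n" if $? == -1;
    die "cmp died with signal " . ($? & 127) . "\n" if $? & 127;

    my $status = $? >> 8;
    die "cmp failed with exit status $status\n" if $status > 1;
    return $status;
}
```

With this version, a missing file makes cmp exit with a status above 1, so the sub dies instead of quietly returning a "different" result.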


      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

        I have a cmp command on my Windows system. There is also the comp command, which has been available on all versions of Windows:

        C:\test>cmp
        cmp: missing operand
        cmp: Try `cmp --help' for more information.

        C:\test>comp /?
        Compares the contents of two files or sets of files.

        COMP [data1] [data2] [/D] [/A] [/L] [/N=number] [/C] [/OFF[LINE]]

          data1      Specifies location and name(s) of first file(s) to compare.
          data2      Specifies location and name(s) of second files to compare.
          /D         Displays differences in decimal format.
          /A         Displays differences in ASCII characters.
          /L         Displays line numbers for differences.
          /N=number  Compares only the first specified number of lines in each file.
          /C         Disregards case of ASCII letters when comparing files.
          /OFF[LINE] Do not skip files with offline attribute set.

        To compare sets of files, use wildcards in data1 and data2 parameters.

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        Quite agree on the list form invocation; that's a much better choice. I always forget it, for some reason.

        Re no cmp on Windows: that just means this isn't the simplest possible thing for Windows. I'm sure there is an equivalent.

Re: Compare Binary Files
by zentara (Archbishop) on Sep 08, 2011 at 10:24 UTC

Node Type: perlquestion [id://924696]
Approved by ikegami