PerlMonks  

Bitwise comparison of files

by kelly (Initiate)
on Feb 28, 2012 at 17:39 UTC (#956715)
kelly has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I am new to the Perl language and was wondering how one can bitwise compare two files (that is, the contents of the files).

EDIT: Test case:
/dir1 /dir2 -- file1 -- file1 -- file2 -- file2 -- file3 -- .... -- ... ---/subDir1 --file1 --file2

file1 of dir1 contains "foo bar"; file1 of dir2 contains "foo". Result: Fail.
file1 of dir1 contains "foo bar"; file1 of dir2 contains "foo bar". Result: Pass.

The script should essentially pick out files with the same names present in the two directories and compare their contents. I hope I am clear with the problem statement. Thanks, Kelly

Re: Bitwise comparison of files
by JavaFan (Canon) on Feb 28, 2012 at 17:43 UTC
    Can you describe what you mean by "bitwise" compare? If all you want to know is whether two files are identical or not, just use the diff utility.
      Well, the script should read all the files from the first directory and all its subdirectories and compare each to its corresponding file in the second directory. The two directory names are command line arguments. Result: FAIL if at least one file is not bitwise equal to the corresponding file in the second directory, or the second directory has no corresponding file. Otherwise the result is PASS.
        If all you want to know is if there are any differences between the two directories, it might make sense to hash the tarballs of the directories and just see if the hashes match up.
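One caveat with hashing tarballs directly: a tar archive also records metadata such as modification times, ownership, and member order, so two trees with identical contents can still produce different archive hashes. A minimal sketch of the same idea that hashes only relative paths and file contents follows. `tree_hash` is a hypothetical helper name, not something from this thread, and the sketch assumes GNU `find`, `sort`, `xargs`, and `md5sum` (for `-print0`, `-z`, and `-0`).

```shell
#!/bin/sh
# tree_hash DIR: print one digest covering the relative path and content
# of every regular file under DIR. (Hypothetical helper; assumes GNU tools.)
tree_hash() {
    (
        cd "$1" || exit 1
        # Sort the NUL-separated file list so the digest does not depend
        # on the order in which find happens to walk the directory.
        find . -type f -print0 | LC_ALL=C sort -z | xargs -0 md5sum |
            md5sum | cut -d' ' -f1
    )
}
```

Usage would look like `[ "$(tree_hash dir1)" = "$(tree_hash dir2)" ] && echo PASS || echo FAIL`. Note that this treats an extra file in either directory as a difference.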
        kelly,
        Based on this explanation, you are going to want to take a look at File::Find. There are other modules on CPAN that try to make up for the horrendous call-back interface but the point is - don't try and walk the directory structure yourself - you will make a mistake.

        You are also going to want to take a look at File::Spec. There are a bunch of functions that will help you say things like chop off this portion of the path and replace it with this other path in order to determine if the file even exists in the other directory prior to comparing the actual file contents.

        You probably also want to take a look at Digest::MD5. It isn't a great choice for cryptographic purposes, but for determining whether two files are the same it has a really simple interface and should work just fine.

        If you want more help than that, you are going to have to show some more effort first.

        Cheers - L~R
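The three steps described above (walk the first tree, map each path onto the second tree, compare contents) can be sketched in shell, to stay in the same language as the other commands in this thread. This is not the Perl module approach itself: `find` stands in for File::Find, parameter expansion for the File::Spec path rewriting, and `cmp` for a digest comparison. `compare_trees` is a hypothetical name, and the sketch assumes file names contain no newlines.

```shell
#!/bin/sh
# compare_trees DIR1 DIR2: print PASS if every regular file under DIR1 has a
# byte-identical counterpart at the same relative path under DIR2, else FAIL.
# (Hypothetical helper; files that exist only in DIR2 are ignored.)
compare_trees() {
    dir1=${1%/}   # drop one trailing slash so the prefix strip below matches
    dir2=${2%/}
    result=$(
        find "$dir1" -type f | while IFS= read -r path; do
            rel=${path#"$dir1"/}      # path relative to DIR1
            other=$dir2/$rel          # same relative path under DIR2
            # cmp -s prints nothing and exits non-zero on any byte difference.
            if [ ! -f "$other" ] || ! cmp -s "$path" "$other"; then
                echo FAIL
                break
            fi
        done
    )
    echo "${result:-PASS}"
}
```

Called as `compare_trees dir1 dir2`, it stops at the first mismatch or missing file, which matches the PASS/FAIL rule given in the question.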

        The problem statement doesn't appear to be very well specified. Are you trying to enforce that, say, rootdir2 is a subset of rootdir1, assuming that the file structure is the same underneath both rootdirs? diff will compare two single files. So the problem now appears to be how to select the two files to compare.

        There are a number of recent posts about File::Find. Do a super search on that.

        Ah, so you want something other than what you first described. Hence, instead of using just diff, you should use diff -r, as in:
        diff -r directory1 directory2
        That's much faster than figuring out how to use File::Find, and doing all the comparison yourself.
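To turn `diff -r` into the PASS/FAIL result asked for, its exit status can be used: 0 means no differences, 1 means differences were found, and anything higher means an error. A minimal sketch, with `dirs_match` as a hypothetical wrapper name:

```shell
#!/bin/sh
# dirs_match DIR1 DIR2: print PASS when diff -r reports no differences.
# (Hypothetical wrapper; any non-zero exit status, including errors such as
# a missing directory, is reported as FAIL.)
dirs_match() {
    if diff -r "$1" "$2" >/dev/null 2>&1; then
        echo PASS
    else
        echo FAIL
    fi
}
```

One design consequence: `diff -r` also reports files that exist in only one of the two directories, so an extra file in the second directory yields FAIL here. That is stricter than the rule described in the question.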
Re: Bitwise comparison of files
by Eliya (Vicar) on Feb 28, 2012 at 18:41 UTC

    Here's a non-Perl variant.

    In directory 1, run

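    $ find . -type f ! -path ./checksums -print0 | xargs -0 md5sum >checksums

    to create a list of checksums (the list file itself is excluded, because it is still being written while find runs).  Then, from within directory 2, verify the list of checksums

    $ md5sum -c checksums >/dev/null 2>&1 || echo FAIL

    md5sum -c exits with a non-zero status if any listed file fails verification or cannot be read, so a file missing from directory 2 also counts as a failure.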

    (The advantage of using the checksumming technique is that the directories don't need to be accessible from the same machine (as a direct block-by-block comparison would require). You only need to copy the checksums file.)

    This assumes that directory 2 isn't a subdirectory of directory 1.  Also, you haven't specified what is supposed to happen if there are additional files in directory 2 that are not present in directory 1 (they would be ignored by this approach).
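If additional files in directory 2 should be detected as well, the two file lists can be compared separately from the checksums. A minimal sketch, assuming both directories are reachable from the same machine and file names contain no newlines; `only_in_second` is a hypothetical helper name, and `mktemp` is assumed to be available.

```shell
#!/bin/sh
# only_in_second DIR1 DIR2: list relative paths of regular files that exist
# under DIR2 but not under DIR1. (Hypothetical helper.)
only_in_second() {
    list1=$(mktemp) && list2=$(mktemp) || return 1
    # comm needs both inputs sorted in the same collation order.
    (cd "$1" && find . -type f | LC_ALL=C sort) > "$list1"
    (cd "$2" && find . -type f | LC_ALL=C sort) > "$list2"
    # -13 suppresses lines unique to list1 and lines common to both,
    # leaving only the paths unique to list2.
    LC_ALL=C comm -13 "$list1" "$list2"
    rm -f "$list1" "$list2"
}
```

An empty result from `only_in_second dir1 dir2` means directory 2 has no files beyond those in directory 1. When the directories live on different machines, the same comparison could be made by copying the sorted file list along with the checksums file.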

      I'd use diff -r directory1 directory2.

        As I mentioned in parentheses, the advantage of using the checksumming technique is that the directories don't need to be accessible from the same machine.

        Whether that matters here, I don't know, but I figured it might perhaps be useful to someone, sometime.
