
File Comparison

by sunadmn (Curate)
on Aug 20, 2003 at 18:28 UTC (#285249=perlquestion)
sunadmn has asked for the wisdom of the Perl Monks concerning the following question:

Good day, Monks. I am trying to find a way to compare three separate files and build an output of the differences among the three. I have looked at the documentation for File::Compare, and it appears to handle only two files. I have also gone through CPAN looking for an array-comparison module, and the only one I found that seemed suitable also handles only two elements (files). Any help, or just a pointer in the right direction, would be great. Thanks all, -Stephen

Re: File Comparison
by liz (Monsignor) on Aug 20, 2003 at 18:37 UTC
    Eh... isn't the difference between 3 files equal to the differences between:
    • 1 and 2
    • 1 and 3
    • 2 and 3

    ? And isn't that what File::Compare can do already (in three runs)? Or am I missing something?
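Liz's suggestion amounts to running File::Compare once per pair of files. Below is a minimal sketch, assuming the file names are passed on the command line; compare_pairs is a hypothetical helper name, not part of File::Compare. Keep in mind that compare() only reports whether two files are identical (0), different (1), or could not be compared (-1); it does not say which lines differ.

```perl
use strict;
use warnings;
use File::Compare;

# Compare every distinct pair of files exactly once. For three files this
# is the three runs listed above: 1 vs 2, 1 vs 3, 2 vs 3.
# compare() returns 0 if equal, 1 if different, -1 on error.
sub compare_pairs {
    my @files = @_;
    my %result;
    for my $i (0 .. $#files - 1) {
        for my $j ($i + 1 .. $#files) {
            $result{"$files[$i] vs $files[$j]"} = compare($files[$i], $files[$j]);
        }
    }
    return \%result;
}

# Usage (hypothetical script name): perl compare_pairs.pl file1 file2 file3
if (@ARGV >= 2) {
    my $res = compare_pairs(@ARGV);
    print "$_: $res->{$_}\n" for sort keys %$res;
}
```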


Re: File Comparison
by tcf22 (Priest) on Aug 20, 2003 at 18:46 UTC
    How about looping through the files and comparing each one with the files after it in the array? Because a comparison is symmetric, you can copy in the results of earlier comparisons instead of redoing them, so you end up with a full matrix of results.

    use strict;
    use warnings;
    use Data::Dumper;
    use File::Compare;

    my @files = qw( file0 file1 file2 );
    my @results;

    foreach my $i (0 .. $#files) {
        $results[$i] = [];
        # Already compared these: reuse the symmetric earlier result
        foreach my $j (0 .. ($i - 1)) {
            $results[$i][$j] = $results[$j][$i];
        }
        # Same file: compare() reports identical files as 0
        $results[$i][$i] = 0;
        # New comparisons: 0 = identical, 1 = different, -1 = error
        foreach my $j (($i + 1) .. $#files) {
            $results[$i][$j] = compare($files[$i], $files[$j]);
        }
    }

    print Dumper \@results;
    So $results[0][1] holds the result of comparing file0 with file1, and $results[0][2] the result of comparing file0 with file2: 0 if the files are identical, 1 if they differ, and -1 if an error occurred.
      That's close to what I am trying to do, but not exactly. Let me give you an example of what I want to achieve. I have a parse script that reads a log file and builds three separate files from the output; these files hold BIND transfer stats. What I would like to do now is parse the three files and do something like an sdiff on all three, but sdiff only handles two files. In the end I want a list of the lines that do not exist in all three files, and which file each line is missing from. Does that make sense?
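A minimal sketch of the "which lines are missing from which file" report, under assumptions the thread has not yet confirmed: lines are compared as whole strings, their order and duplicates do not matter, and all three files fit in memory together. missing_lines is a hypothetical helper name. It records, for every line, the set of files that contain it, then reports the files where each line is absent.

```perl
use strict;
use warnings;

# Assumes order and duplicate lines are irrelevant (set comparison).
# Returns { line => [ files that do NOT contain it ] } for every line
# that is absent from at least one of the given files.
sub missing_lines {
    my @files = @_;
    my %seen;    # line => { file name => 1, ... }
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            chomp $line;
            $seen{$line}{$file} = 1;
        }
        close $fh;
    }
    my %missing;
    for my $line (keys %seen) {
        # grep keeps @files order, so the report lists files in input order
        my @absent = grep { !$seen{$line}{$_} } @files;
        $missing{$line} = \@absent if @absent;
    }
    return \%missing;
}

# Usage (hypothetical script name): perl missing_lines.pl file1 file2 file3
if (@ARGV) {
    my $missing = missing_lines(@ARGV);
    for my $line (sort keys %$missing) {
        print "$line\tmissing from: @{ $missing->{$line} }\n";
    }
}
```

If line order does matter, as it does for sdiff, this set-based approach is not enough, and a sequence-diff algorithm would be needed instead.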
        If you already have a parse script written, why the extra step of going to three files? Why not inspect the data and output the real result in one step?


        Sounds like an interesting problem, but I still can't quite picture the data and the result you want. These questions might clear things up for me:

        Are the lines guaranteed to be unique?

        Does the order of the lines matter as it does in diff?

        If I see a line "XYZ" in file 1 and "XYZ" in file 2, and "XYZ" in file 3, are these the same line no matter where they show up in the respective files?

        How big are the files? Would it be feasible to load them all into memory at the same time?

        Is it ok to sort the files before doing the comparison or does your output need to be in a specific order?

        Pretend letters are lines. What should be the output if the following are the contents of the three files?

        file 1: A B C D E G
        file 2: B A D E G H
        file 3: A B D E G I

        -- Eric Hammond
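One possible answer to Eric's example, sketched under the assumption that order and position do not matter and each letter stands for a whole line. Under that assumption the swapped "B A" at the start of file 2 is not reported as a difference; only letters absent from some file are.

```perl
use strict;
use warnings;

# Eric's example data; each letter stands for one line.
my %files = (
    'file 1' => [qw(A B C D E G)],
    'file 2' => [qw(B A D E G H)],
    'file 3' => [qw(A B D E G I)],
);
my @names = sort keys %files;

# Record which files contain each line (order is ignored).
my %in;
for my $name (@names) {
    $in{$_}{$name} = 1 for @{ $files{$name} };
}

# Report every line that is absent from at least one file.
my @report;
for my $line (sort keys %in) {
    my @absent = grep { !$in{$line}{$_} } @names;
    push @report, "$line missing from: " . join(', ', @absent) if @absent;
}
print "$_\n" for @report;
```

This prints:

    C missing from: file 2, file 3
    H missing from: file 1, file 3
    I missing from: file 1, file 2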
