http://www.perlmonks.org?node_id=11284

ChuckularOne has asked for the wisdom of the Perl Monks concerning the following question:

Is there an easy way to do a diff of two hashes?

I have two hashes (each approx 57,000 elements) that should be nearly identical, but finding the tiny differences is very important.
Any ideas?

Your humble servant.
-Chuck

Re: diff of two hashes.
by ZZamboni (Curate) on May 12, 2000 at 19:21 UTC
    If you want to find the keys in one hash that are not in the other, you can use this (from the Perl Cookbook, 5.11):
        my @this_not_that = ();
        foreach (keys %hash1) {
            push(@this_not_that, $_) unless exists $hash2{$_};
        }
    If the two hashes have the same keys and you want to see which elements have different values, you could use something like this (assuming the hashes contain strings, change the comparison as necessary):
        my @different = grep { $hash1{$_} ne $hash2{$_} } keys %hash1;
        foreach (@different) {
            print "hash1{$_}: $hash1{$_}\n";
            print "hash2{$_}: $hash2{$_}\n--\n";
        }
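    A loop-free alternative for the first question (a sketch of a common idiom, not the Cookbook code quoted above): copy one hash, then delete every key of the other hash from the copy with a hash slice; whatever remains is unique to the first hash. The sample %hash1 and %hash2 are made up, and the approach costs one temporary copy of a hash.

```perl
use strict;
use warnings;

# Small made-up sample hashes; substitute your own
my %hash1 = (a => 1, b => 2, c => 3);
my %hash2 = (b => 2, c => 30, d => 4);

# Keys in %hash1 but not in %hash2: copy %hash1, then delete every key of
# %hash2 from the copy with a hash slice; whatever is left is unique to %hash1
my %only1 = %hash1;
delete @only1{ keys %hash2 };
my @this_not_that = sort keys %only1;

# The same trick in the other direction
my %only2 = %hash2;
delete @only2{ keys %hash1 };
my @that_not_this = sort keys %only2;

print "only in hash1: @this_not_that\n";    # only in hash1: a
print "only in hash2: @that_not_this\n";    # only in hash2: d
```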

    --ZZamboni

RE: diff of two hashes.
by turnstep (Parson) on May 12, 2000 at 19:16 UTC
    Hrmmm...are the keys all the same, and do you need to know which hash the differences came from? A quick solution that pops to mind is:
        for my $x (keys %one) {
            delete $one{$x} if $two{$x} eq $one{$x};
        }
        ## Show non-matching keys:
        for my $x (keys %one) {
            print "$x ($one{$x})\n";
        }

    It destroys %one, and does not check for keys that are in %two but not in %one. For that, perhaps something like this:

        for my $x (keys %one) {
            print "DIFF: $x\n" if $one{$x} ne $two{$x};
        }
        # exists, not a truth test, so keys whose value is 0 or '' aren't misreported
        for my $x (keys %two) {
            print "DIFF2: $x\n" unless exists $one{$x};
        }
Re: diff of two hashes.
by Russ (Deacon) on May 12, 2000 at 20:59 UTC
    <Updated>
    I like nuance's idea (see his reply in this thread). I now store undef as the value when a key is missing from one hash. See nuance's description for an explanation of the "return" values.
    </Updated>

    Here's some punctuation for you:

    (This is a short, concentrated way to do it - only 4 lines)

        # Keys 7 and 8 are unique, keys 2, 4 and 6 have different values
        my %R = (1=>1, 2=>2,   3=>3, 4=>4,   5=>5, 6=>6,   7=>7);
        my %S = (1=>1, 2=>'b', 3=>3, 4=>'d', 5=>5, 6=>'f', 8=>8);

        # 1) Keys in %R which are not in %S
        # 2) Keys in %S which are not in %R
        # 3) Keys in both which have different values
        my %Diffs = ((map(($_ => [$R{$_}, undef]), grep {not exists $S{$_}} keys %R)),
                     (map(($_ => [undef, $S{$_}]), grep {not exists $R{$_}} keys %S)),
                     (map(($_ => [$R{$_}, $S{$_}]),
                          grep {exists $S{$_} and $R{$_} ne $S{$_}} keys %R)));

        # Print out what we found
        for (sort keys %Diffs) {
            print $_, ': ', join(', ', map(defined $_ ? $_ : 'undef', @{$Diffs{$_}})), "\n";
        }
    A couple points to note:
    • if your values are numeric, change the 'ne' in the third section of %Diffs to '!='
    • we use an anonymous array ref to store the values in %Diffs, so remember to dereference it when you use %Diffs
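    To make the first point concrete, here is a sketch that builds the same three kinds of %Diffs entry, written as loops rather than map/grep for readability, and takes the comparison as a code ref so numeric data can use '!=' instead of 'ne'. The sub name diff_hashes and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sub: same [first, second] records as %Diffs, with the
# "values differ" test passed in as a code ref (default: string ne)
sub diff_hashes {
    my ($r, $s, $differ) = @_;
    $differ ||= sub { $_[0] ne $_[1] };
    my %diffs;
    for my $k (keys %$r) {
        if (!exists $s->{$k}) {
            $diffs{$k} = [ $r->{$k}, undef ];        # only in first hash
        }
        elsif ($differ->($r->{$k}, $s->{$k})) {
            $diffs{$k} = [ $r->{$k}, $s->{$k} ];     # in both, values differ
        }
    }
    for my $k (keys %$s) {
        $diffs{$k} = [ undef, $s->{$k} ] unless exists $r->{$k};   # only in second
    }
    return \%diffs;
}

# '1.0' and 1 are equal under == but not under ne
my %R = (x => '1.0', y => 2, z => 3);
my %S = (x => 1,     y => 2, w => 4);
my $num = diff_hashes(\%R, \%S, sub { $_[0] != $_[1] });
my $str = diff_hashes(\%R, \%S);

for my $k (sort keys %$num) {
    print $k, ': ', join(', ', map { defined $_ ? $_ : 'undef' } @{ $num->{$k} }), "\n";
}
# w: undef, 4
# z: 3, undef
```

    With the default string test, key x would also be reported, since '1.0' ne 1.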

    Enjoy!

    Russ

Re: diff of two hashes.
by snowcrash (Friar) on May 12, 2000 at 23:14 UTC
    I would do something like this, where %a and %b are the hashes to compare:
        foreach (sort keys %{{%a, %b}}) {
            if (!exists($b{$_})) {
                print "$_ only in a\n";
            }
            elsif (!exists($a{$_})) {
                print "$_ only in b\n";
            }
            elsif ($a{$_} ne $b{$_}) {
                print "$_: values in a and b differ\n";
            }
        }
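    One caveat for this loop and the other replies that compare with ne: ne treats undef and the empty string as equal, and warns under use warnings. If values can be undef, a small helper can replace the ne test. This is a sketch; the helper same_value, the hash names %h1/%h2 and the data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: are two scalars the same value? undef equals only
# undef, so undef and '' count as different, and no warnings are raised
sub same_value {
    my ($x, $y) = @_;
    return !defined $y if !defined $x;
    return 0 if !defined $y;
    return $x eq $y;
}

my %h1 = (k1 => 'x', k2 => undef, k3 => '');
my %h2 = (k1 => 'x', k2 => '',    k3 => '', k4 => 1);

my %union = (%h1, %h2);    # keys of both hashes, as %{{%a, %b}} does
my @report;
for my $k (sort keys %union) {
    if    (!exists $h2{$k})               { push @report, "$k only in h1" }
    elsif (!exists $h1{$k})               { push @report, "$k only in h2" }
    elsif (!same_value($h1{$k}, $h2{$k})) { push @report, "$k: values differ" }
}
print "$_\n" for @report;    # prints "k2: values differ", then "k4 only in h2"
```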
    sc
RE: diff of two hashes.
by nuance (Hermit) on May 12, 2000 at 20:44 UTC
    I would probably use something like this.

    The first foreach creates a hash that contains all of the keys present in both hash one and hash two whose contents don't match. Each value is a reference to a two-element array, where $differences{$_}[0] is the data from hash one and $differences{$_}[1] is the data from hash two. It also has an entry for each key in hash one that does not appear in hash two, where $differences{$_}[1] is undefined.

    The second foreach performs the same check using the keys of hash two, except that when a key in hash two is not present in hash one, $differences{$_}[0] is undefined and $differences{$_}[1] contains the data from hash two. When it is complete, %differences contains 3 types of record:

    • key appears in both hashes, but the data is different.
    • key appears only in the first hash: data in $differences{$_}[0], $differences{$_}[1] is undefined.
    • key appears only in the second hash: data in $differences{$_}[1], $differences{$_}[0] is undefined.
        my %differences = ();
        # exists checks catch a missing key even when the other value is ''
        foreach (keys %hash1) {
            $differences{$_} = [$hash1{$_}, $hash2{$_}]
                if !exists $hash2{$_} or $hash1{$_} ne $hash2{$_};
        }
        foreach (keys %hash2) {
            $differences{$_} = [$hash1{$_}, $hash2{$_}]
                if !exists $hash1{$_} or $hash1{$_} ne $hash2{$_};
        }
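    Once %differences is built, the three record types can be separated like this. This is a sketch with made-up data; it classifies with exists on the original hashes, because an undef slot by itself can't distinguish "key missing" from "value stored as undef".

```perl
use strict;
use warnings;

# Made-up sample data
my %hash1 = (a => 1, b => 2,  c => 3);
my %hash2 = (a => 1, b => 20, d => 4);

# Build %differences, with exists checks so that a key missing from one
# hash is recorded even when the other hash's value is ''
my %differences;
for (keys %hash1) {
    $differences{$_} = [ $hash1{$_}, $hash2{$_} ]
        if !exists $hash2{$_} or $hash1{$_} ne $hash2{$_};
}
# Keys present in both were already checked by the first loop
for (keys %hash2) {
    $differences{$_} = [ $hash1{$_}, $hash2{$_} ] if !exists $hash1{$_};
}

# Split the records into the three types
my (@changed, @only1, @only2);
for my $k (sort keys %differences) {
    if    (!exists $hash2{$k}) { push @only1,   $k }
    elsif (!exists $hash1{$k}) { push @only2,   $k }
    else                       { push @changed, $k }
}
print "changed: @changed\n";        # changed: b
print "only in hash1: @only1\n";    # only in hash1: c
print "only in hash2: @only2\n";    # only in hash2: d
```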

    Nuance

Re: diff of two hashes.
by Maqs (Deacon) on May 12, 2000 at 19:37 UTC
    This is not really a native Perl solution, but I solved a similar problem by writing the hashes to plain text files, sorting them with the sort (and, if needed, uniq) system utilities, and then comparing the two files with cmp (assuming you are running a *nix system).
    In my experience this works much faster (I used it for processing Apache logs).
    These are not native Perl functions, but part of Perl's value is its flexibility and its ability to be integrated with other programs.
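    A sketch of the Perl side of this approach: write each hash as key<TAB>value lines already sorted by key, so the external sort step isn't needed, then compare the two files with a standard tool. Note that cmp only reports the first differing byte, while diff lists every differing line. The helper dump_hash, the file names and the data are hypothetical, and keys or values containing tabs or newlines would break the line-oriented format.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: write one "key<TAB>value" line per entry, sorted by
# key, so that two dumps line up for an external comparison tool
sub dump_hash {
    my ($href, $file) = @_;
    open my $fh, '>', $file or die "Can't write $file: $!";
    for my $k (sort keys %$href) {
        my $v = defined $href->{$k} ? $href->{$k} : '';
        print {$fh} "$k\t$v\n";
    }
    close $fh or die "Can't close $file: $!";
}

my %hash1 = (apple => 1, banana => 2, cherry => 3);
my %hash2 = (apple => 1, banana => 5, date   => 4);

# Temporary directory for this demo only (removed at exit); use real paths in practice
my $dir = tempdir(CLEANUP => 1);
my $f1  = File::Spec->catfile($dir, 'hash1.txt');
my $f2  = File::Spec->catfile($dir, 'hash2.txt');
dump_hash(\%hash1, $f1);
dump_hash(\%hash2, $f2);
print "now compare with: diff $f1 $f2\n";
```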
    --
    With best regards
    Maqs.
Re: diff of two hashes.
by johannz (Hermit) on May 12, 2000 at 21:10 UTC
    My couple of cents worth.

    Two major parts to the compareHashes subroutine.

    1. Build a Hash of all keys, with a count of how many hashes that key appears in
    2. Go through the list of keys and report whether each key is in only one hash, or is in both hashes with different values.
        #!/usr/bin/perl
        use strict;
        use warnings;

        my $skipOdds   = 1000;
        my $randomOdds = 1000;

        my $rHash1 = createHash(57000);
        my $rHash2 = createHash(57000);
        compareHashes($rHash1, $rHash2);
        exit;

        sub createHash {
            my $size     = shift || 100;
            my $rHashRef = shift || {};
            my $key      = '';
            my $value    = '';
            for (my $current = 0; $current < $size; $current++) {
                # If rand number comes up 0, skip.
                next unless int(rand($skipOdds));
                $value = $key = sprintf('%X', $current);
                unless (int(rand($randomOdds))) {
                    $value = sprintf('%X', int(rand($size)));
                }
                $rHashRef->{$key} = $value;
            }
            return $rHashRef;    # Allows use as RV for assignment
        }

        sub compareHashes {
            my $rHash1 = shift || {};
            my $rHash2 = shift || {};
            my %keys;
            # Count how many of the two hashes each key appears in
            $keys{$_}++ for keys %$rHash1;
            $keys{$_}++ for keys %$rHash2;
            foreach my $currentKey (sort keys %keys) {
                if ($keys{$currentKey} == 2) {
                    # In both hashes, let's see if it's the same.
                    if ($rHash1->{$currentKey} ne $rHash2->{$currentKey}) {
                        print "Key $currentKey is different\n";
                    }
                }
                else {
                    # Only in one hash, let's see which one.
                    if (exists $rHash1->{$currentKey}) {
                        print "Key $currentKey only in Hash1\n";
                    }
                    else {
                        print "Key $currentKey only in Hash2\n";
                    }
                }
            }
        }
RE: diff of two hashes.
by lhoward (Vicar) on May 12, 2000 at 20:28 UTC
    Are you looking for different keys, keys/value pairs or both?

    One option is to iterate through both hashes in sync (much like the "merge" step in a mergesort) and spit out any differences. What I present below is meant more as an algorithm than an actual implementation (though it does work). It could be tweaked quite a bit in an actual implementation to get much better performance. This is probably not the best Perl implementation, but it is a good general-purpose algorithm for computing the "difference of lists/difference of hashes".

        my @k1 = sort keys %h1;
        my @k2 = sort keys %h2;
        my ($k1, $k2);
        # Refill whichever side was consumed; stop only when one side is
        # exhausted, so a pending key is never dropped
        while (1) {
            $k1 = shift @k1 unless defined $k1;
            $k2 = shift @k2 unless defined $k2;
            last unless defined $k1 and defined $k2;
            if ($k1 eq $k2) {
                if ($h1{$k1} ne $h2{$k2}) {
                    # .. keys match but contents don't
                }
                undef $k1;
                undef $k2;
            }
            elsif ($k1 lt $k2) {
                # .. key in k1 and not in k2
                undef $k1;
            }
            else {
                # .. key in k2 and not in k1
                undef $k2;
            }
        }
        if (defined $k1) {
            foreach ($k1, @k1) {
                # .. key in k1 and not in k2
            }
        }
        if (defined $k2) {
            foreach ($k2, @k2) {
                # .. key in k2 and not in k1
            }
        }
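    The same merge-style walk can be packaged as a sub that returns its results instead of leaving placeholder comments. This sketch (hypothetical name merge_diff, made-up data) uses array indexes rather than shift plus pending variables, which makes the "one list ran out first" case easy to handle. It relies on the default sort producing string order, so it agrees with the eq and lt tests.

```perl
use strict;
use warnings;

# Hypothetical sub: merge-walk two sorted key lists and return
# (changed, only-in-first, only-in-second) as array refs
sub merge_diff {
    my ($h1, $h2) = @_;
    my @k1 = sort keys %$h1;    # default sort is string order, matching eq/lt below
    my @k2 = sort keys %$h2;
    my (@changed, @only1, @only2);
    my ($i, $j) = (0, 0);
    while ($i < @k1 && $j < @k2) {
        my ($x, $y) = ($k1[$i], $k2[$j]);
        if ($x eq $y) {         # same key: compare the values
            push @changed, $x if $h1->{$x} ne $h2->{$y};
            $i++;
            $j++;
        }
        elsif ($x lt $y) {      # $x can't appear later in @k2
            push @only1, $x;
            $i++;
        }
        else {                  # $y can't appear later in @k1
            push @only2, $y;
            $j++;
        }
    }
    # Whatever remains in either list has no partner in the other
    push @only1, @k1[$i .. $#k1];
    push @only2, @k2[$j .. $#k2];
    return (\@changed, \@only1, \@only2);
}

my %h1 = (a => 1, b => 2, c => 3, e => 5);
my %h2 = (b => 2, c => 4, d => 4, f => 6);
my ($changed, $only1, $only2) = merge_diff(\%h1, \%h2);
print "changed: @$changed\n";     # changed: c
print "only in h1: @$only1\n";    # only in h1: a e
print "only in h2: @$only2\n";    # only in h2: d f
```

    Both key lists are sorted once, so the cost is dominated by the two sorts; the walk itself touches each key a single time.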