ChuckularOne has asked for the wisdom of the Perl Monks concerning the following question:
I have two hashes (each approx 57,000 elements) that should be nearly identical, but finding the tiny differences is very important.
Any ideas?
Your humble servant.
-Chuck
Replies are listed 'Best First'.
Re: diff of two hashes.
by ZZamboni (Curate) on May 12, 2000 at 19:21 UTC
If the two hashes have the same keys and you want to see which elements have different values, you could use something like this (assuming the hashes contain strings; change the comparison as necessary):
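A minimal sketch of that kind of comparison, not necessarily the code originally posted. The hash names `%one` and `%two` and the sample data are made up for illustration, and it assumes both hashes really do share the same keys and hold defined strings:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two large hashes.
my %one = (apple => 'red', pear => 'green',  plum => 'purple');
my %two = (apple => 'red', pear => 'yellow', plum => 'purple');

# Collect every key whose values differ. This assumes identical key sets
# and defined string values; use != instead of ne for numeric values.
my @changed = grep { $one{$_} ne $two{$_} } sort keys %one;

print "$_: '$one{$_}' vs '$two{$_}'\n" for @changed;
```

Sorting the keys is only there to make the report order stable; it can be dropped if order does not matter.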
--ZZamboni
Re: diff of two hashes.
by Russ (Deacon) on May 12, 2000 at 20:59 UTC
Update: I like nuance's idea (above). I now store undef as the value when a key is missing from one hash. See nuance's description above for an explanation of the "return" values.

Here's some punctuation for you. (This is a short, concentrated way to do it: only 4 lines.)

A couple of points to note:
Enjoy! Russ
RE: diff of two hashes.
by turnstep (Parson) on May 12, 2000 at 19:16 UTC
It destroys the hash, and does not check for keys that are in hash two but not in hash one. For that, perhaps something like this:
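A non-destructive sketch along those lines (an illustration, not the code originally posted). The names `%one`, `%two` and the three result arrays are hypothetical, and the values are assumed to be defined and comparable as strings:

```perl
use strict;
use warnings;

# Hypothetical sample data.
my %one = (a => 1, b => 2, c => 3);
my %two = (b => 2, c => 4, d => 5);

# Keys present in only one hash. exists() is used rather than a truth test
# so that false values (0, '') are not mistaken for missing keys.
my @only_in_one = grep { !exists $two{$_} } sort keys %one;
my @only_in_two = grep { !exists $one{$_} } sort keys %two;

# Keys present in both hashes whose values differ.
my @differ = grep { exists $two{$_} && $one{$_} ne $two{$_} } sort keys %one;

print "only in one: @only_in_one\n";
print "only in two: @only_in_two\n";
print "different:   @differ\n";
```

Neither hash is modified, so both remain usable after the comparison.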
RE: diff of two hashes.
by nuance (Hermit) on May 12, 2000 at 20:44 UTC
The first foreach creates a hash that contains all of the keys that are present in both hash one and hash two but whose contents don't match. Its data element is a list where $differences{$_}[0] is the data from hash one and $differences{$_}[1] is the data from hash two. It also has an entry for each key in hash one that does not appear in hash two, where $differences{$_}[1] is undefined.

The second foreach performs the inverse using the keys of hash two, except that when a key in hash two is not present in hash one, $differences{$_}[0] is undefined and $differences{$_}[1] contains the data from hash two.

When it is complete, %differences contains 3 types of record: (1) keys present in both hashes with different data, where both elements are defined; (2) keys present only in hash one, where $differences{$_}[1] is undefined; and (3) keys present only in hash two, where $differences{$_}[0] is undefined.
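The two loops described above could be sketched roughly as follows. This is a reconstruction from the description, not the code originally posted; the sample data is made up, and the second loop here only handles keys missing from hash one because the first loop has already recorded every mismatch. It assumes the stored values themselves are never undef, since undef is what marks a missing key:

```perl
use strict;
use warnings;

# Hypothetical sample data.
my %one = (a => 'x', b => 'y', c => 'z');
my %two = (b => 'y', c => 'Z', d => 'w');
my %differences;

# First pass: keys of hash one that are missing from hash two,
# or present in both with different data.
foreach (keys %one) {
    if (!exists $two{$_}) {
        $differences{$_} = [ $one{$_}, undef ];
    }
    elsif ($one{$_} ne $two{$_}) {
        $differences{$_} = [ $one{$_}, $two{$_} ];
    }
}

# Second pass: keys of hash two that are missing from hash one.
foreach (keys %two) {
    $differences{$_} = [ undef, $two{$_} ] unless exists $one{$_};
}

# Report each difference, showing '(missing)' for an undefined side.
foreach my $key (sort keys %differences) {
    my ($left, $right) =
        map { defined $_ ? $_ : '(missing)' } @{ $differences{$key} };
    print "$key: $left / $right\n";
}
```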
Nuance
Re: diff of two hashes.
by snowcrash (Friar) on May 12, 2000 at 23:14 UTC
sc
Re: diff of two hashes.
by johannz (Hermit) on May 12, 2000 at 21:10 UTC
There are two major parts to the compareHashes subroutine.
Re: diff of two hashes.
by Maqs (Deacon) on May 12, 2000 at 19:37 UTC
This works much faster by all means (I used it for Apache log processing). These are not native Perl functions, but the value of Perl, among other things, is its flexibility and its ability to be integrated with other programs.
-- With best regards, Maqs.
RE: diff of two hashes.
by lhoward (Vicar) on May 12, 2000 at 20:28 UTC
One option is to iterate through both hashes in sync (much like the "merge" step in a mergesort) and spit out any differences. What I present below is meant more as an algorithm than as an actual implementation (though it does work). It could be tweaked quite a bit in a real implementation to get much better performance. This is probably not the best Perl implementation, but it is a good general-purpose algorithm for doing "difference of lists/difference of hashes".
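A sketch of that merge-style walk, written from the description rather than recovered from the original post. The variable names (`@k1`, `@k2`, `@report`) and the sample data are hypothetical, and values are assumed to be defined strings or numbers that compare sensibly with ne:

```perl
use strict;
use warnings;

# Hypothetical sample data.
my %one = (a => 1, b => 2, c => 3, e => 5);
my %two = (b => 2, c => 4, d => 9, e => 5);

# Sort both key lists once, then advance through them together,
# as in the merge step of a mergesort.
my @k1 = sort keys %one;
my @k2 = sort keys %two;
my ($i, $j) = (0, 0);
my @report;

while ($i < @k1 && $j < @k2) {
    my $cmp = $k1[$i] cmp $k2[$j];
    if ($cmp < 0) {        # key appears only in hash one
        push @report, "only in one: $k1[$i]";
        $i++;
    }
    elsif ($cmp > 0) {     # key appears only in hash two
        push @report, "only in two: $k2[$j]";
        $j++;
    }
    else {                 # same key in both: compare the values
        push @report, "differs: $k1[$i]"
            if $one{ $k1[$i] } ne $two{ $k2[$j] };
        $i++;
        $j++;
    }
}

# Anything left over in either list has no partner in the other.
push @report, map { "only in one: $_" } @k1[ $i .. $#k1 ];
push @report, map { "only in two: $_" } @k2[ $j .. $#k2 ];

print "$_\n" for @report;
```

The two sorts dominate the cost; the walk itself touches each key once. For hashes of this size, the direct exists() lookups in the other replies are likely simpler, but the same walk also works on sorted lists or files that do not fit in a hash.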