PerlMonks
Re: Module for Approximate File Comparison
by blokhead (Monsignor) on Sep 24, 2007 at 13:19 UTC ( #640714=note )
A pretty neat and simple method is the one outlined by Zaxo in Re: similar texts !?. Basically, you measure how much a compression algorithm gains from compressing the two files concatenated together, compared to compressing them independently. If you think of a compression algorithm as a rough approximation to Shannon entropy, then this approach essentially computes the corresponding approximation of mutual information (normalized to the range 0.5 - 1), which is the intuitive concept you seem to be looking for.

blokhead
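For what it's worth, here is a minimal sketch of the compression idea described above. It is not Zaxo's code. It assumes zlib as the compressor, and it assumes the score is the compressed size of the concatenation divided by the sum of the two individually compressed sizes, which should come out near 0.5 for identical inputs and near 1 for unrelated ones. It is in Python to keep it short; the function names are made up for illustration, and the same thing should port to Perl with a module such as Compress::Zlib.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (level 9)."""
    return len(zlib.compress(data, 9))


def similarity_ratio(a: bytes, b: bytes) -> float:
    """Compressed size of a+b over the sum of the separate compressed sizes.

    Assumption: this is the normalization meant by "0.5 - 1".
    Lower values mean the inputs share more content.
    """
    separate = compressed_size(a) + compressed_size(b)
    together = compressed_size(a + b)
    return together / separate


if __name__ == "__main__":
    # Hypothetical inputs, just to show the call.
    text_a = b"the quick brown fox jumps over the lazy dog " * 20
    text_b = b"pack my box with five dozen liquor jugs " * 20
    print(similarity_ratio(text_a, text_a))
    print(similarity_ratio(text_a, text_b))
```

One caveat with this particular choice: zlib's deflate only looks back 32 KB, so for files much larger than that, a compressor with a bigger window (bz2 or lzma, say) would probably be a better fit.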