in reply to Module for Approximate File Comparison
A pretty neat and simple method is the one Zaxo outlined in Re: similar texts !?. Basically, you measure how much a compression algorithm gains from compressing the two files concatenated together, compared to compressing them independently.
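As a rough illustration of that idea, here is a minimal sketch in Python using the standard zlib module. The function names are made up for this example, zlib stands in for whatever compressor you prefer, and scoring by the ratio of the concatenated size to the sum of the separate sizes is an assumption about how to turn the idea into a number, not a transcription of Zaxo's code.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def similarity_ratio(a: bytes, b: bytes) -> float:
    """Compressed size of a+b divided by the sum of the separate compressed sizes.

    Assumed scoring: values near 0.5 suggest the inputs are nearly identical,
    values near 1.0 suggest they share almost nothing the compressor can reuse.
    """
    separate = compressed_size(a) + compressed_size(b)
    together = compressed_size(a + b)
    return together / separate


def file_similarity_ratio(path_a: str, path_b: str) -> float:
    """Convenience wrapper (hypothetical helper) that reads both files as bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return similarity_ratio(fa.read(), fb.read())
```

One caveat with this particular choice: DEFLATE only looks back 32 KiB, so for larger files the second file cannot reference most of the first, and a compressor with a bigger window (bz2 or lzma, say) is probably a better fit.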
If you think of a compression algorithm's output size as a rough approximation to Shannon entropy, then this approach essentially computes the corresponding approximation of mutual information (normalized to the range 0.5 to 1), which is the intuitive concept you seem to be looking for.
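To see where the 0.5 to 1 range comes from, suppose (my assumption about the exact normalization) that the score is the ratio r of the concatenated compressed size to the sum of the individual compressed sizes. Writing C(x) for compressed length, and treating C(x) as an estimate of H(x) and C(AB) as an estimate of the joint entropy H(A,B):

```latex
r \;=\; \frac{C(AB)}{C(A) + C(B)}
  \;\approx\; \frac{H(A,B)}{H(A) + H(B)}
  \;=\; 1 - \frac{I(A;B)}{H(A) + H(B)},
\qquad\text{since } I(A;B) = H(A) + H(B) - H(A,B).
```

If the files are identical, I(A;B) = H(A) = H(B), so r = 1/2. If they share nothing, I(A;B) = 0, so r = 1. A real compressor only approximates these limits.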
blokhead