in reply to
Module for Approximate File Comparison
A pretty neat and simple method is one outlined by Zaxo in Re: similar texts !?. Basically, you measure how much smaller the compressed output gets when you concatenate the two files before compressing, compared to when you compress them independently.
If you think of the compressed size as a rough estimate of Shannon entropy, then this approach is essentially computing the corresponding estimate of the mutual information between the two files (normalized to the range 0.5 - 1), which is the intuitive concept you seem to be looking for.
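
To make the idea concrete, here is a minimal sketch in Python using the standard zlib module. It is not Zaxo's code: the function names and the exact ratio, size(A+B) / (size(A) + size(B)), are my own assumptions, chosen because that ratio falls roughly in the 0.5 - 1 range mentioned above.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, used as a rough entropy estimate."""
    return len(zlib.compress(data, 9))


def similarity_ratio(a: bytes, b: bytes) -> float:
    """Compressed size of a+b divided by the sum of the separate sizes.

    Hypothetical helper (an assumption, not from the original post).
    Values near 0.5 suggest the inputs share most of their content;
    values near 1.0 suggest they share very little.
    """
    separate = compressed_size(a) + compressed_size(b)
    together = compressed_size(a + b)
    return together / separate


# Demo on random bytes, which zlib cannot shrink on their own.
noise_a = os.urandom(4096)
noise_b = os.urandom(4096)
print(similarity_ratio(noise_a, noise_a))  # identical inputs: roughly 0.5
print(similarity_ratio(noise_a, noise_b))  # unrelated inputs: roughly 1.0
```

For real files you would read each one in binary mode and pass the bytes in. One caveat with this sketch: zlib only looks back 32 KB, so for files much larger than that it will not notice content shared between them, and a compressor with a larger window would be a better fit.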