PerlMonks
Re: Fingerprinting text documents for approximate comparison
by johndageek (Hermit) on Mar 24, 2005 at 19:06 UTC ( [id://442166]=note )
I would look at creating a fingerprint file for each document (you will need to refine the parameters you use).
In this file I would put perhaps:

You can either use your current checksum, or create a checksum on the fingerprint files. Use similar checksums to select which fingerprint files to compare; those fingerprints that are within a tolerance you set would be deemed matches.

Just my 2 cents worth, good luck!
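For what it's worth, here is a rough sketch of the tolerance comparison described above. The fingerprint fields (word count, non-blank line count, average word length), the sub names `fingerprint` and `fingerprints_match`, and the 10% tolerance are all assumptions picked for illustration; you would swap in whatever parameters suit your documents. It does not cover the checksum pre-selection step.

```perl
use strict;
use warnings;

# Build a small numeric fingerprint for a block of text.
# The fields chosen here are illustrative assumptions only.
sub fingerprint {
    my ($text) = @_;
    my @words = $text =~ /(\w+)/g;
    my $lines = grep { /\S/ } split /\n/, $text;   # count of non-blank lines
    my $chars = 0;
    $chars += length for @words;
    return {
        words   => scalar @words,
        lines   => $lines,
        avg_len => @words ? $chars / @words : 0,
    };
}

# Two fingerprints match when every field differs by no more than
# $tolerance, measured relative to the larger of the two values.
sub fingerprints_match {
    my ($fp_a, $fp_b, $tolerance) = @_;
    for my $field (keys %$fp_a) {
        my ($x, $y) = ($fp_a->{$field}, $fp_b->{$field});
        my $max = $x > $y ? $x : $y;
        next if $max == 0;                          # both zero: identical
        return 0 if abs($x - $y) / $max > $tolerance;
    }
    return 1;
}

# Hypothetical usage: compare two near-identical texts at 10% tolerance.
my $fp_one = fingerprint("the quick brown fox\njumps over the lazy dog\n");
my $fp_two = fingerprint("the quick brown fox\njumps over a lazy dog\n");
print fingerprints_match($fp_one, $fp_two, 0.10) ? "match\n" : "no match\n";
```

A relative tolerance is used so that the same setting behaves sensibly for both short and long documents; an absolute per-field tolerance would work just as well if your documents are all of similar size.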