PerlMonks  

Re: Fingerprinting text documents for approximate comparison

by FitTrend (Pilgrim)
on Mar 24, 2005 at 16:26 UTC (#442098)


in reply to Fingerprinting text documents for approximate comparison

A wacky idea might be to commit these changed files to a Subversion repository (a source control system) using its command-line tools. You can then use Perl to pull out revisions and diff them to see what changes have been made, however small.

This may minimize the amount of code you need to maintain, since you rely on what the system already does.

Alternatively (and possibly more fun), there are modules on CPAN that diff files. What comes to mind is Text::Diff.
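
For what it's worth, here is a rough sketch of the Text::Diff route. The two strings are made-up stand-ins for a pair of document versions, and counting the +/- lines is just one crude way to gauge how much changed, not something the module does for you. It assumes Text::Diff is installed from CPAN.

```perl
use strict;
use warnings;
use Text::Diff;    # CPAN module; exports diff()

# Two hypothetical document versions, held in memory for illustration.
my $old = "The quick brown fox\njumps over the lazy dog\n";
my $new = "The quick brown fox\njumps over the sleepy dog\n";

# diff() accepts file names, string references, array references or
# filehandles; string references are used here.
my $changes = diff \$old, \$new, { STYLE => 'Unified' };

# Crude change measure (an assumption of this sketch): count lines that
# start with a single '-' or '+', skipping any '---' / '+++' header lines.
my $removed = () = $changes =~ /^-(?!--)/mg;
my $added   = () = $changes =~ /^\+(?!\+\+)/mg;

print $changes;
print "removed: $removed, added: $added\n";
```

To compare files on disk instead, pass the two file names to diff() in place of the string references.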


Re^2: Fingerprinting text documents for approximate comparison
by Mur (Pilgrim) on Mar 24, 2005 at 18:29 UTC
    Ack. No, not in the scope of what I'm talking about. I have thousands of these per day, and I don't want to compare every one to every other one.
    --
    Jeff Boes
    Database Engineer
    Nexcerpt, Inc.
    vox 269.226.9550 ext 24
    fax 269.349.9076
     http://www.nexcerpt.com
    ...Nexcerpt...Connecting People With Expertise
