PerlMonks  

RE: RE: RE: Re: HTML Document Comparison

by mdillon (Priest)
on Sep 13, 2000 at 20:49 UTC


in reply to RE: RE: Re: HTML Document Comparison
in thread HTML Document Comparison

so, now that i've made a few changes, i think you could do this by keeping a running total of the hunk sizes and then comparing it to the number of lines in either the original or the revision. however, i'm not really sure what would be an appropriate heuristic. perhaps show both $total_deletions / $original_lines (the fraction of the original that was removed) and $total_additions / $revision_lines (the fraction of the revision that is new).
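
a minimal sketch of the running-total idea, in case it helps. it assumes the hunks have the shape that Algorithm::Diff's diff() returns (a list of hunks, each a list of [sign, position, line] entries). the name change_ratios and the sample data are made up for illustration, and the hunks are built by hand so the snippet runs without any CPAN modules.

```perl
use strict;
use warnings;

# Sum hunk sizes from a diff given in the shape Algorithm::Diff's diff()
# returns: a list of hunks, each hunk a list of [sign, position, line].
# change_ratios is a hypothetical helper name, not part of any module.
sub change_ratios {
    my ($hunks, $original_lines, $revision_lines) = @_;
    my ($total_deletions, $total_additions) = (0, 0);
    for my $hunk (@$hunks) {
        for my $change (@$hunk) {
            my ($sign) = @$change;
            $total_deletions++ if $sign eq '-';
            $total_additions++ if $sign eq '+';
        }
    }
    # Guard against empty documents to avoid division by zero.
    my $deleted = $original_lines ? $total_deletions / $original_lines : 0;
    my $added   = $revision_lines ? $total_additions / $revision_lines : 0;
    return ($deleted, $added);
}

# Hand-built hunks for original (a b c d) vs. revision (a x c d e).
# With Algorithm::Diff installed they would instead come from:
#   my @hunks = diff(\@original, \@revision);
my @hunks = (
    [ [ '-', 1, 'b' ], [ '+', 1, 'x' ] ],
    [ [ '+', 4, 'e' ] ],
);
my ($deleted, $added) = change_ratios(\@hunks, 4, 5);
printf "deleted %.0f%% of original, added %.0f%% of revision\n",
    100 * $deleted, 100 * $added;
```

reporting the two ratios separately keeps a revision that only appends text from looking the same as one that rewrites existing lines.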

you could also use LCS instead of diff and compare the size of the LCS (Longest Common Subsequence) to the size of the original or revised token list. this would allow you to say roughly "Revision is 80% similar to original" if @LCS / @original == 0.8.
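
here is a rough sketch of the LCS ratio. to stay self-contained it computes the LCS length with a plain dynamic-programming table rather than calling a module, so lcs_length and similarity are hypothetical helper names and the token lists are invented sample data. if you already have Algorithm::Diff loaded, its LCS() function should give you the subsequence directly.

```perl
use strict;
use warnings;

# Length of the longest common subsequence of two token lists, using the
# standard dynamic-programming recurrence with two rolling rows.
sub lcs_length {
    my ($x, $y) = @_;
    my @prev = (0) x (@$y + 1);
    for my $i (1 .. @$x) {
        my @curr = (0);
        for my $j (1 .. @$y) {
            $curr[$j] = $x->[$i - 1] eq $y->[$j - 1]
                ? $prev[$j - 1] + 1
                : ($prev[$j] > $curr[$j - 1] ? $prev[$j] : $curr[$j - 1]);
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Fraction of the original token list that survives in the revision.
sub similarity {
    my ($original, $revision) = @_;
    return 0 unless @$original;    # avoid dividing by zero
    return lcs_length($original, $revision) / @$original;
}

my @original = qw(the quick brown fox jumps);
my @revision = qw(the quick red fox jumps high);
printf "Revision is %.0f%% similar to original\n",
    100 * similarity(\@original, \@revision);
```

dividing by the size of the original answers "how much of the old text is still there"; dividing by the size of the revision instead would answer "how much of the new text is old".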

i have updated my old post to include this heuristic.
