PerlMonks
RE: RE: RE: Re: HTML Document Comparison
by mdillon (Priest) on Sep 13, 2000 at 20:49 UTC (#32296=note)
so, now that i've made a few changes, i think you could do this by keeping a running total of the hunk sizes and then comparing it to the number of lines in either the original or the revision. however, i'm not really sure what an appropriate heuristic would be. perhaps show both $total_deletions / $original_lines and $total_additions / $revision_lines.

you could also use LCS (Longest Common Subsequence) instead of diff and compare the size of the LCS to the size of the original or revised token list. this would let you say roughly "Revision is 80% similar to original" if @LCS / @original == 0.8. i have updated my old post to include this heuristic.
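
for illustration, here is a minimal sketch of both heuristics in plain perl. it is not the code from my old post: to avoid depending on any module it computes the LCS length with a simple dynamic-programming table, and the sub names (lcs_length, change_ratios) and the sample token lists are made up for this example. it also assumes the diff is a minimal one, so that deletions = original size - LCS size and additions = revision size - LCS size.

```perl
use strict;
use warnings;

# length of the longest common subsequence of two token lists
# (standard dynamic-programming table, keeping only two rows)
sub lcs_length {
    my ($orig, $rev) = @_;
    my @prev = (0) x (scalar(@$rev) + 1);
    for my $i (1 .. scalar(@$orig)) {
        my @curr = (0);
        for my $j (1 .. scalar(@$rev)) {
            if ($orig->[$i - 1] eq $rev->[$j - 1]) {
                $curr[$j] = $prev[$j - 1] + 1;
            }
            else {
                $curr[$j] = $prev[$j] > $curr[$j - 1] ? $prev[$j] : $curr[$j - 1];
            }
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# returns (similarity, deletion ratio, addition ratio);
# an empty list yields a ratio of 0 instead of dividing by zero
sub change_ratios {
    my ($orig, $rev) = @_;
    my $lcs   = lcs_length($orig, $rev);
    my $n_old = scalar @$orig;
    my $n_new = scalar @$rev;
    my $similarity = $n_old ? $lcs / $n_old : 0;
    my $deleted    = $n_old ? ($n_old - $lcs) / $n_old : 0;
    my $added      = $n_new ? ($n_new - $lcs) / $n_new : 0;
    return ($similarity, $deleted, $added);
}

# hypothetical token lists: one token out of five was changed
my @original = qw(a b c d e);
my @revision = qw(a b x d e);

my ($sim, $del, $add) = change_ratios(\@original, \@revision);
printf "Revision is %.0f%% similar to original\n", $sim * 100;
printf "%.0f%% of original deleted, %.0f%% of revision added\n",
    $del * 100, $add * 100;
```

for long documents the quadratic table gets slow, so a real LCS implementation (such as the one in Algorithm::Diff, if that is what you are already using for the diff) would be the better choice; the ratios are computed the same way either way.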