So, now that I've made a few changes, I think you could do
this by keeping a running total of the hunk sizes and then
comparing it to the number of lines in either the original
or the revision. However, I'm not really sure what would be
an appropriate heuristic. Perhaps show both ratios:
$total_deletions / $original_lines and
$total_additions / $revision_lines.
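
Here is a rough sketch of the running-total idea, not a definitive implementation. It assumes the hunks arrive in the shape that Algorithm::Diff's diff() returns (a list of hunks, each a list of [sign, position, element] triples). The sub name change_ratios and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Count deletions and additions across all hunks, then express each
# as a fraction of the original and of the revision respectively.
# Each hunk is a list of [ sign, position, element ] triples, where
# sign is '-' (removed from original) or '+' (added in revision).
sub change_ratios {
    my ($hunks, $original_lines, $revision_lines) = @_;
    my ($total_deletions, $total_additions) = (0, 0);

    for my $hunk (@$hunks) {
        for my $change (@$hunk) {
            if    ($change->[0] eq '-') { $total_deletions++ }
            elsif ($change->[0] eq '+') { $total_additions++ }
        }
    }

    # Guard against empty inputs to avoid dividing by zero.
    my $deleted = $original_lines ? $total_deletions / $original_lines : 0;
    my $added   = $revision_lines ? $total_additions / $revision_lines : 0;
    return ($deleted, $added);
}

# Hand-written sample: a 10-line original and a 12-line revision
# (one line replaced, two lines added).
my @hunks = (
    [ [ '-', 2, 'old line' ],     [ '+', 2, 'new line' ] ],
    [ [ '+', 7, 'extra line a' ], [ '+', 8, 'extra line b' ] ],
);
my ($deleted, $added) = change_ratios(\@hunks, 10, 12);
printf "%.0f%% of original deleted, %.0f%% of revision added\n",
    $deleted * 100, $added * 100;
```

Reporting the two ratios separately keeps a revision that mostly adds text from looking the same as one that mostly removes it.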
You could also use LCS (Longest Common Subsequence) instead
of diff and compare the size of the LCS to the size of the
original or revised token list. This would allow you to say
roughly "Revision is 80% similar to original"
if @LCS / @original == 0.8.
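
A minimal sketch of the LCS comparison follows. To keep it self-contained it computes the LCS length with a plain dynamic-programming table rather than calling a module; the sub names lcs_length and similarity and the sample token lists are hypothetical. If Algorithm::Diff is available, the size of the list its LCS function returns could be used in place of lcs_length.

```perl
use strict;
use warnings;

# Length of the longest common subsequence of two token lists,
# using the classic DP recurrence and keeping only two rows.
sub lcs_length {
    my ($x, $y) = @_;
    my @prev = (0) x (scalar(@$y) + 1);

    for my $i (1 .. scalar @$x) {
        my @curr = (0);
        for my $j (1 .. scalar @$y) {
            if ($x->[$i - 1] eq $y->[$j - 1]) {
                $curr[$j] = $prev[$j - 1] + 1;
            }
            else {
                $curr[$j] = $prev[$j] > $curr[$j - 1] ? $prev[$j] : $curr[$j - 1];
            }
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Similarity relative to the original token list
# (defined as 0 when the original is empty).
sub similarity {
    my ($original, $revision) = @_;
    return 0 unless @$original;
    return lcs_length($original, $revision) / scalar @$original;
}

my @original = qw(the quick brown fox jumps);
my @revision = qw(the quick red fox jumps high);
printf "Revision is %.0f%% similar to original\n",
    100 * similarity(\@original, \@revision);
```

With these sample lists the LCS is "the quick fox jumps" (4 of 5 original tokens), so the script reports 80%. Dividing by the revised list instead would give a different figure (4 of 6), so it is worth saying which one you report.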
I have updated my old post to include this heuristic.