As noted, there has been some fairly extensive research into "Copy Paste Detection". (Side note: Alex Aiken was by far my favorite professor in college.)
The big problem with a lot of naive approaches to copy paste detection is that it's very rare for whole chunks of code to be duplicated verbatim. Frequently one version gets modified: variable names are changed, lines are inserted, etc.
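To make that concrete, here is a rough sketch (in Python, only because it keeps the example short) of one common workaround: tokenize the code, replace every identifier with a placeholder, and compare sliding windows of tokens instead of raw text. Everything here (`KEYWORDS`, `TOKEN_RE`, `normalize`, `shared_windows`, the window size of 5) is made up for illustration. It is not how PMD's CPD or any published algorithm actually works, and the "tokenizer" is far too crude for real Perl.

```python
import re

# Hypothetical, deliberately tiny keyword list for Perl-like code.
KEYWORDS = {"my", "if", "else", "for", "foreach", "while", "return", "sub"}

# Crude tokenizer: sigil + identifier, integer, or any single non-space char.
IDENT_RE = r"[\$@%]?[A-Za-z_]\w*"
TOKEN_RE = re.compile(IDENT_RE + r"|\d+|\S")


def normalize(source):
    """Tokenize source, mapping identifiers to 'ID' and integers to 'NUM'."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)          # keep keywords as-is
        elif re.fullmatch(IDENT_RE, tok):
            tokens.append("ID")         # variable names no longer matter
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)          # punctuation and operators
    return tokens


def shared_windows(a, b, k=5):
    """Return the set of k-token windows that occur in both token lists."""
    def windows(toks):
        return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}
    return windows(a) & windows(b)


original = "my $total = 0; foreach my $x (@items) { $total += $x; }"
renamed = "my $sum = 0; foreach my $elem (@list) { $sum += $elem; }"
# Same as `renamed`, but with an extra statement inserted in the middle.
edited = 'my $sum = 0; print "hi"; foreach my $elem (@list) { $sum += $elem; }'

print(original == renamed)                        # False: raw text differs
print(normalize(original) == normalize(renamed))  # True: same token shape
print(len(shared_windows(normalize(original), normalize(edited))) > 0)  # True
```

The raw strings differ, so a verbatim comparison misses the duplication, but the normalized token streams are identical. Comparing windows rather than whole snippets means an inserted line only breaks the windows that overlap it, so the rest of the copy still shows up.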
The PMD project (a Java counterpart to Perl::Critic) has a CPD subproject that has gone through several iterations and algorithms. It's implemented in Java and doesn't seem to currently support Perl, but it is free, and adding new language support is (in theory) really straightforward if you know some Java and implement a simple Tokenizer interface.