Re: how to get rid of cut-and-paste sins?
by hossman (Prior)
on Feb 08, 2008 at 23:58 UTC
As noted, there has been some fairly extensive research into "Copy Paste Detection". (Side note: Alex Aiken was by far my favorite professor in college.)
The big problem with a lot of naive approaches to copy-paste detection is that whole chunks of code are very rarely duplicated verbatim ... frequently one version gets modified: variable names are changed, lines are inserted, etc.
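To make that concrete, here is a minimal sketch (in Python, purely for illustration) of one common way around the renaming problem: tokenize both pieces of code, replace every identifier and number with a placeholder, and compare sliding windows of tokens instead of raw text. The names `normalize` and `shared_windows`, the token regex, the keyword list, and the window size `k` are all assumptions made up for this example. This is not how CPD or any particular tool is implemented, and the tokenizer ignores strings, comments, and most of Perl's real syntax.

```python
import re

# Rough token pattern (an assumption, not a real Perl lexer):
# sigil-prefixed variables, bare identifiers, integers, or any single non-space character.
TOKEN_RE = re.compile(r"[$@%]\w+|[A-Za-z_]\w*|\d+|\S")

# A small, incomplete keyword list; keywords are kept as-is so structure still matters.
KEYWORDS = {"my", "if", "else", "for", "foreach", "while", "return", "sub"}


def normalize(source):
    """Tokenize source, mapping every variable/identifier to ID and every integer to NUM."""
    out = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            out.append(tok)
        elif (tok[0] in "$@%" and len(tok) > 1) or tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def shared_windows(a, b, k=8):
    """Return the set of k-token windows (after normalization) that occur in both sources."""
    def windows(tokens):
        return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}
    return windows(normalize(a)) & windows(normalize(b))


original = 'my $total = 0; foreach my $x (@items) { $total += $x * 2; }'
renamed = 'my $sum = 0; foreach my $item (@list) { $sum += $item * 2; }'

# The raw text differs, but the normalized token streams are identical.
print(original == renamed)
print(normalize(original) == normalize(renamed))
print(len(shared_windows(original, renamed)) > 0)
```

Because the comparison works on short windows, an inserted line only breaks the windows that overlap it; the windows before and after the insertion still match. Picking `k` is a trade-off: too small and everything looks duplicated, too large and lightly edited copies slip through.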
The PMD project (a Java analogue of Perl::Critic) has a CPD subproject that has gone through several iterations and algorithms. It's implemented in Java and doesn't seem to currently support Perl, but it is free, and adding new language support is (in theory) really straightforward if you know some Java and implement a simple Tokenizer interface.