<?xml version="1.0" encoding="windows-1252"?>
<node id="667089" title="Re: how to get rid of cut-and-paste sins?" created="2008-02-08 17:28:15" updated="2008-02-08 12:28:15">
<type id="11">
note</type>
<author id="616540">
moritz</author>
<data>
<field name="doctext">
To put your question in different terms: how do I detect plagiarism, even when I'm the one doing it? ;-)

&lt;p&gt;There's a paper on that topic [http://theory.stanford.edu/~aiken/publications/papers/sigmod03.pdf|here]; it's about a program called [http://theory.stanford.edu/~aiken/moss/|moss].
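
&lt;p&gt;As far as I understand it, the paper describes "winnowing". You hash every k-gram (every k-character substring) of the normalized text. From each window of w consecutive hashes you keep the smallest one as a fingerprint. Two documents that share many fingerprints probably share text. Below is a minimal sketch of that idea in Python; it is not moss itself. The function names, the SHA-1 based hash, the whitespace/case normalization and the defaults k=5, w=4 are my own assumptions. It also keeps only fingerprint values, not their positions, and scores similarity as shared fingerprints divided by all fingerprints.

```python
import hashlib

# Sketch of k-gram winnowing; names, hash and defaults are illustrative
# assumptions, not taken from moss.


def kgram_hashes(text, k=5):
    """Hash every k-character substring (k-gram) of the normalized text."""
    # Normalization (an assumption): lowercase and drop all whitespace,
    # so reindented or reformatted copies still match.
    s = "".join(text.lower().split())
    hashes = []
    for i in range(len(s) - k + 1):
        digest = hashlib.sha1(s[i:i + k].encode("utf-8")).digest()
        hashes.append(int.from_bytes(digest[:8], "big"))
    return hashes


def winnow(hashes, w=4):
    """Pick fingerprints: the minimum hash of each window of w hashes."""
    if not hashes:
        return set()
    fingerprints = set()
    last_pos = None
    for start in range(max(len(hashes) - w + 1, 1)):
        window = hashes[start:start + w]
        smallest = min(window)
        # On ties take the rightmost minimum (the paper's rule, as far as
        # I recall); a position is recorded only when it is first selected.
        pos = start + len(window) - 1 - window[::-1].index(smallest)
        if pos != last_pos:
            fingerprints.add(smallest)
            last_pos = pos
    return fingerprints


def similarity(a_text, b_text, k=5, w=4):
    """Share of fingerprints the two texts have in common (0.0 to 1.0)."""
    a = winnow(kgram_hashes(a_text, k), w)
    b = winnow(kgram_hashes(b_text, k), w)
    if not a or not b:
        return 0.0
    return len(a.intersection(b)) / len(a.union(b))
```

&lt;p&gt;Two snippets that differ only in whitespace and case score 1.0, because both normalize to the same string. Partial copies land somewhere between 0.0 and 1.0.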

&lt;p&gt;There are other [http://www.redhillconsulting.com.au/products/simian/|code similarity analyzers] out there, too; they're surely worth a look.

&lt;p&gt;If you only want to detect blatant copy &amp; paste, a simple textual similarity search should be enough. For anything more elaborate, such as copied code whose variables were renamed, you need a parse tree or an AST on which you can perform the similarity checks.
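
&lt;p&gt;To make the AST idea a bit more concrete, here is a rough sketch. It uses Python only because Python's standard library ships a parser (the ast module); for other languages you'd need a parser for that language instead. The function names, the 0.9 threshold and the choice of comparing node-type sequences with difflib are my own assumptions, not what any particular tool does.

```python
import ast
import difflib

# Sketch only: function names and the 0.9 threshold are illustrative
# assumptions, not taken from any existing clone detector.


def shape(node):
    """Node-type names in walk order: the tree's structure with all
    identifiers, literal values and formatting thrown away."""
    return [type(n).__name__ for n in ast.walk(node)]


def find_clones(source, threshold=0.9):
    """Return (name, name, ratio) for pairs of functions whose
    structural shapes are at least `threshold` similar."""
    tree = ast.parse(source)
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    clones = []
    for i, a in enumerate(funcs):
        for b in funcs[i + 1:]:
            ratio = difflib.SequenceMatcher(None, shape(a), shape(b)).ratio()
            if ratio >= threshold:
                clones.append((a.name, b.name, round(ratio, 2)))
    return clones
```

&lt;p&gt;Because only node types are compared, a function total(xs) and a copy renamed to sum_prices(prices) with different variable names have identical shapes and are reported with a ratio of 1.0. The price is false positives between functions that merely have the same structure, which is why the threshold is a knob.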

&lt;p&gt;There has been a lot of research on that topic; you should find some useful papers and implementations with your favorite search engine ;-)</field>
<field name="root_node">
667084</field>
<field name="parent_node">
667084</field>
</data>
</node>
