PerlMonks
Re: general advice finding duplicate code

by GrandFather (Saint)
on Jun 21, 2011 at 05:54 UTC ( [id://910682] )


in reply to general advice finding duplicate code

Interesting. I started solving a very similar problem for planetscape some time ago. In her case she wanted to refactor a web site that contained large chunks of duplicated HTML. The general approach was to normalise the HTML, extract chunks of some minimum size, and populate a hash using each chunk as a key, appending the file location of each occurrence to a list stored in the hash element. The interesting part is then to use the matched chunks as seed points and grow the match area to encompass as large a common region as is sensible.
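A minimal sketch of that chunk-hashing idea in Perl. The window size, the whitespace-only normalisation, and all the names here are my illustrative assumptions, not the actual code:

```perl
#!/usr/bin/perl
# Sketch of the approach described above: normalise, hash fixed-size
# chunks to their locations, then grow matches from the seed points.
use strict;
use warnings;

my $window = 4;    # minimum chunk size, in lines (assumed value)

my %files;     # file name => array ref of normalised lines
my %chunks;    # chunk text => list of [file name, 1-based start line]

sub index_file {
    my ($name, @lines) = @_;

    # "Normalise": collapse whitespace so trivial layout differences
    # don't hide duplicates. Real HTML normalisation would do more.
    for (@lines) {
        s/\s+/ /g;
        s/^ //;
        s/ $//;
    }
    $files{$name} = \@lines;

    # Every $window-line chunk records where it was seen.
    for my $i (0 .. @lines - $window) {
        my $chunk = join "\n", @lines[$i .. $i + $window - 1];
        push @{$chunks{$chunk}}, [$name, $i + 1];
    }
}

# Seed points: chunks that occur in more than one place.
sub duplicates {
    return grep {@{$chunks{$_}} > 1} keys %chunks;
}

# Grow a seed match downward for as long as the two copies stay
# identical; growing upward would be symmetrical.
sub grow {
    my ($p, $q) = @_;    # each is [file name, 1-based start line]
    my ($fp, $ip) = @$p;
    my ($fq, $iq) = @$q;
    my $len = $window;
    while (defined $files{$fp}[$ip - 1 + $len]
        && defined $files{$fq}[$iq - 1 + $len]
        && $files{$fp}[$ip - 1 + $len] eq $files{$fq}[$iq - 1 + $len]) {
        ++$len;
    }
    return $len;
}
```

Run index_file over every file, then grow over the location pairs of each duplicate chunk; overlapping seeds collapse into the same grown region, which gives you the maximal common areas.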

With the HTML matching, choosing "sensible" was something of a trade-off: as the common region was grown, the number of places that matched it tended to shrink. If in your case the code really has just been copied around without change, the regions may be pretty well defined.

My guess is that for something small like 55K LOC the technique would work quite well, and in a timely fashion. You probably don't need to worry so much about the normalise step.

True laziness is hard work
