http://www.perlmonks.org?node_id=115511


in reply to (OT) Rewriting, from scratch, a huge code base

I believe this paper may be the one that motivated Understanding *IS* Better by chromatic. He certainly agreed with it.

That said, I disagree with the thesis. I do not believe that old code is automatically good code, and I do not believe that all code is worth rescuing. There are times when a rewrite is necessary. Here are a few reasons I have done rewrites in the past and would do them again:

  1. The code is scattered across several languages, and rewriting it in just one would allow more consistent argument processing and error handling. For instance, I once replaced a lot of old Expect scripts with Perl for this reason.
  2. The code relies on interfaces that are fundamentally broken. For instance, code built on Text::CSV (as it existed when this was written) cannot handle embedded newlines. If a system processes CSV files and needs to handle embedded newlines, the code has to be fixed and the broken module removed.
  3. The code is of such low quality that fixing it is harder than replacing it. In the 1980s IBM found, when it tracked bugs, that something like 10% of the components accounted for most of the bugs. Rewriting those components from scratch (once they were identified) significantly reduced overall bug counts.
  4. The code is full of hard-coded information (e.g. paths) that you need to track down and replace with something more flexible. A particularly good opportunity for this is when you need to move the code from one machine to another. Choices, choices: do you recreate the environment it needs, undocumented dependencies and all, or replace its functionality with a more portable version?
  5. The system depends on a component that you are trying to eliminate. Fairly often a system will have two parts that do pretty much the same thing, and life would be easier if you used only one of them (less to remember, easier to teach people how things work, etc.). In the process of consolidating, parts that use the losing component get replaced as opportunity permits.
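To make point 2 concrete, here is a small sketch. The post itself contains no code, so the choice of Python and the sample data are my own assumptions. It contrasts reading CSV line by line, which splits a quoted field containing a newline into two broken records, with a quote-aware parser (Python's standard `csv` module) that keeps the record whole.

```python
import csv
import io

# Hypothetical sample: the second data record has a quoted field
# that contains an embedded newline.
DATA = 'id,comment\n1,"simple"\n2,"line one\nline two"\n'


def naive_records(text):
    """Split on newlines first, then on commas.

    This is the broken approach: a quoted multi-line field is cut
    in half and shows up as two bogus records.
    """
    return [line.split(",") for line in text.splitlines()]


def csv_records(text):
    """Let a quote-aware parser decide where each record ends."""
    # newline="" leaves line endings untranslated, as the csv docs
    # recommend, so embedded newlines reach the parser intact.
    return list(csv.reader(io.StringIO(text, newline="")))
```

The point is not this particular library but that the parser, not a line reader, has to decide where a record ends.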
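Point 3 depends on first finding the handful of components that produce most of the bugs. Here is a rough sketch, assuming you can export one component name per logged bug from your tracker. The function name `hot_spots` and the default 80% cut-off are illustrative assumptions, not anything IBM is known to have used.

```python
from collections import Counter


def hot_spots(bug_components, share=0.8):
    """Return the fewest components that together account for `share` of all bugs.

    bug_components: one component name per logged bug (hypothetical
    tracker export). share: fraction of total bugs to cover.
    """
    counts = Counter(bug_components)
    total = sum(counts.values())
    selected, covered = [], 0
    # Walk components from most to least buggy until the target share is covered.
    for component, n in counts.most_common():
        if covered >= share * total:
            break
        selected.append(component)
        covered += n
    return selected
```

The components this returns are the candidates for a from-scratch rewrite. Everything else is probably better left alone or refactored.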
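For point 4, one common form of "something more flexible" is to look every path up by name, so that the environment can override a single table of defaults. This is a minimal sketch. The setting names and default paths are hypothetical, and the `env` parameter exists only so the lookup can be tested without touching the real environment.

```python
import os
from pathlib import Path

# Hypothetical setting names: each one replaces a path that used to be
# hard-coded somewhere in the old code.
DEFAULTS = {
    "APP_DATA_DIR": "/var/lib/myapp",
    "APP_LOG_DIR": "/var/log/myapp",
}


def setting(name, env=None):
    """Return a configured path, preferring the environment over the default."""
    env = os.environ if env is None else env
    if name not in DEFAULTS:
        # Fail loudly on typos instead of silently inventing a path.
        raise KeyError(f"unknown setting: {name}")
    return Path(env.get(name, DEFAULTS[name]))
```

Moving to a new machine then means setting a few variables rather than hunting through the code for literal paths.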
An excellent example of code that needed a major rewrite was Netscape 4's render engine. People say that Netscape 4 was rewritten because of performance issues. That isn't what I remember. What I remember is that it was rewritten because, to support what people were trying to do, it needed to be able to do incremental renders and incremental re-renders. This was not optional: it was required to implement large parts of the HTML spec. The hack used in the old engine was to re-render the whole page from scratch. While this kind of worked in the easy cases, it was fundamentally broken. What people noticed was that it was slow. But it also meant that a Netscape page refused to render until it could place all of the pieces. A single slow image would not block an IE render, yet it would block an old Netscape render of the page until enough of the image had arrived to figure out how big it was. It was also fragile. For instance, you could take Netscape 4, render a dynamically created page, resize the window, and no longer have a rendered page! Tracking this kind of thing down is no fun at all.

In other words, the primary reason for rewriting the renderer in Netscape 4 was not performance; it was that the old render engine tied them to an inherently buggy API that bit them over and over again. Performance was a visible secondary issue. Of course, rewriting the rest of Netscape went beyond that...

However, he is right that the worst way to do a rewrite is to sit down and start writing something completely new from scratch. Instead I like to work as I suggested in Re (tilly) 1: Best way to fix a broken but functional program?. Decide on the overall flow of a new design. Pull some of the existing mess out and make it a shell around the new design. For instance, if you need a new rendering engine, release something with the old one while you spec out the new engine. Then start writing the new one incrementally, scooping pieces out of the old as you go. It may take longer, but you don't lose the existing knowledge, and you don't stop yourself from delivering the product. If you do it right, at some point the old code becomes a small shell that you can kill at the right moment.
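Here is a minimal sketch of the shell idea, assuming the old and new code can expose the same operation names. `LegacyEngine`, `NewEngine`, `EngineShell` and the `migrated` set are hypothetical stand-ins. Callers only ever talk to the shell, and each operation is switched over to the new engine once it has been ported.

```python
class LegacyEngine:
    """Hypothetical stand-in for the old code: everything works, badly."""

    def render(self, page):
        return f"legacy render of {page}"

    def resize(self, page, width):
        return f"legacy resize of {page} to {width}"


class NewEngine:
    """Hypothetical stand-in for the new design: only some operations exist yet."""

    def render(self, page):
        return f"new render of {page}"


class EngineShell:
    """Thin shell so pieces can move from old to new one operation at a time."""

    def __init__(self, legacy, new, migrated):
        self._legacy = legacy
        self._new = new
        self._migrated = set(migrated)  # names of operations already ported

    def __getattr__(self, name):
        # __getattr__ only runs for names not found normally. Refuse
        # private names so a half-initialised object cannot recurse.
        if name.startswith("_"):
            raise AttributeError(name)
        target = self._new if name in self._migrated else self._legacy
        return getattr(target, name)


shell = EngineShell(LegacyEngine(), NewEngine(), migrated={"render"})
shell.render("home")       # served by NewEngine
shell.resize("home", 800)  # still served by LegacyEngine
```

Once every operation is in the migrated set, nothing reaches `LegacyEngine` any more and it can be deleted. That is the point where the old code is reduced to a shell you can kill.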

Now some may tell me that I just reinvented refactoring. I disagree. The fundamental principle of refactoring is to rewrite code incrementally through a series of transformations. This is a technique for writing a new project from scratch, analyzing the old code very carefully for the important things it did, with the plan of throwing the old code away when you can. Conceptually, refactoring is the process of transforming cr*p into soil. This is a process of incremental replacement that lays the groundwork for an apparently catastrophic replacement once the new foundation is in place.

Now, this doesn't mean I think refactoring is a bad idea. It is a great idea. But trying to preserve something just because it is already written is a mistake in my book.