http://www.perlmonks.org?node_id=115456

Wanna learn software management? Go to Joel on Software. Now. He has tremendous insight on software development and management and I have a lot of respect for what he writes. That being said, I had an issue with his article about never rewriting a large software application. Rather than sum up his excellent arguments, I suggest you read his article before replying to this node (and you should read it even if you don't want to reply).

I should also mention that this is a follow-up to Rewriting a large code base. Thanks to the information garnered in that node, we have a much better plan of attack for the rewrite, but I'm hoping to abuse Monks for even more perspective :) As a result, feel free to not vote for this node.

Despite having read his article, I still called for (and got approval on) a rewrite of the code base. What follows is a brief description of the code and my reasoning for a rewrite. The code is a Web-based product that allows for complete manufacturer-to-retailer-to-consumer product ordering and distribution, with integrated inventory management and drop-shipping programs. Without going into detail, it's huge, complex, and the specialized industries that we cater to are very pleased with its potential. I say "potential" because it's not living up to it. The software works well so long as no one does anything unexpected (and it's a Web-based application that relies on cookies and JavaScript for much of its functionality!).

Here's why I reluctantly have asked for a rewrite:

  1. Massive system with massive security holes. I've plugged as many as I can, but I know there are probably more out there.
  2. The database was poorly designed and does not properly support our business rules. Further, the current schema cannot support changes that our customers have insisted upon.
  3. There is extensive use of global variables which has made the system inherently difficult to maintain and has driven up our maintenance costs considerably.
  4. This system was never tested. You will not find a single line of test code and because the code is not orthogonal, fixing one bug often causes all sorts of other bugs to crop up. Since there is no test code, finding other bugs is trial and error.
  5. This system is poorly documented. Often, what little API documentation there is does not accurately describe the API in question. Further, we have many, many similar objects but with radically different APIs.
  6. Extensive creation and use of global temp tables (which are rarely deleted) in our database has caused the database to crash frequently. Because these global temp tables store information for so many of the global variables, trying to pull them out has proved extremely difficult.
  7. Many tricky programming features were added because the IT director (who is no longer with us) thought they were "neat".
  8. We need to port the code to mod_perl to increase performance, but this is literally impossible due to the code design.

I did not design the code base and was brought in to work on it near the end of the project. I have built up knowledge of its functionality only through painful trial and error. We do not have a single person in-house who completely understands the system and after reviewing our options, we felt it would be more cost-effective, in the long run, to rewrite the code base from scratch.

All code would have unit tests written *prior* to coding. Automated test suites would run against all builds to validate functionality. All undocumented code would be automatically rejected. User and developer manuals would be written concurrently with the code. Tests would be written for every bug to ensure that we never have to catch the same bug by hand twice. Further, no bug could be marked as "Fixed" in the bug database (we use Bugzilla, darn it) unless a test has been written for it.
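
To make the "no fix without a test" rule concrete, here is a minimal sketch. It is in Python purely for illustration (our code is Perl), and the function `parse_quantity`, the bug number 1234, and the bug itself are all made up; nothing here comes from the real system.

```python
import unittest


def parse_quantity(raw):
    """Hypothetical order-form helper: turn user input into a positive integer."""
    text = raw.strip()
    # Reject blanks, signs, and anything else that is not plain digits.
    if not text.isdigit():
        raise ValueError(f"invalid quantity: {raw!r}")
    quantity = int(text)
    if quantity == 0:
        raise ValueError("quantity must be at least 1")
    return quantity


class TestBug1234(unittest.TestCase):
    """Regression tests for an invented bug report: bad quantities were accepted."""

    def test_valid_quantity_still_works(self):
        self.assertEqual(parse_quantity(" 3 "), 3)

    def test_blank_quantity_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_quantity("   ")

    def test_negative_quantity_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_quantity("-2")

    def test_zero_quantity_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_quantity("0")
```

Saved in a file, this would run with `python -m unittest`. The idea is that the test class goes in alongside the fix, so the bug can only be marked "Fixed" once these tests exist and pass.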

We currently have at least one hundred clients who have asked how soon they can use our system (and another 700 who have expressed high interest). Unfortunately, we can't support that many clients. There is an upper limit of only four or five clients before the system becomes unusable! Rewriting the code base is extreme and we might lose some clients, but those are clients we can't even accept given our current limitations.

In short, I do think there are times that a large, working application needs to be rewritten from scratch (we can't even reuse any old code due to extensive side effects).

I'd be interested in hearing discussion about this. Have any monks had similar experiences (either with or without the rewrite)? How did you deal with them and what sort of issues did you face? If you didn't rewrite such a large system, how would you go about fixing a problem this huge?

Cheers,
Ovid

Re (tilly) 1: (OT) Rewriting, from scratch, a huge code base
by tilly (Archbishop) on Sep 29, 2001 at 01:24 UTC
    I believe that this paper is the one that may have motivated Understanding *IS* Better by chromatic. He certainly agreed with it.

    That said, I disagree with the thesis. I do not believe that old code is good code. I do not believe that all code is worth rescuing. There are times a rewrite is necessary. Here are a few reasons that I have done rewrites in the past and would do them again:

    1. The code is scattered across several languages and rewriting in just one would allow more consistent argument processing and error handling. For instance I once replaced a lot of old Expect scripts with Perl for this reason.
    2. The code relies on interfaces that are fundamentally broken. For instance code built on Text::CSV is unable to handle embedded newlines. Given a system which processes csv files, and needs to handle embedded newlines, the code has to be fixed and the broken module removed.
    3. The code is of sufficiently low quality that fixing it is harder than replacing it. IBM found in the 80's that when they tracked bugs, something like 10% of the components accounted for most of the bugs. Rewriting those components (once identified) from scratch significantly reduced overall bug counts.
    4. The code is full of hard-coded information (eg paths) that you need to track down and replace with something more flexible. A particularly good opportunity for this is when you need to move it from one machine to another. Choices, choices, recreate the environment that it needs and dependencies that are not documented, or replace its functionality with a version that is more portable?
    5. The system depends on a component that you are trying to eliminate. Fairly often a system will have two parts that do pretty much the same thing. Life would be easier if you were only using one of them. (Less to remember, easier to teach people how things work, etc.) In the process of doing that, parts that use the losing component will be replaced as opportunity permits.
    An excellent example of code that needed a major rewrite was Netscape 4's render engine. People say that Netscape 4 was rewritten because of performance issues. That isn't what I remember. What I remember is that Netscape 4 was rewritten because to support what people were trying to do it needed to be able to do incremental renders and incremental re-renders. This was not optional. This was required to implement large parts of the HTML spec. The hack used in the old engine was to re-render the whole page from scratch. The problem was that while this kind of worked in the easy cases, it was fundamentally broken. What people noticed is that it was slow. However it also meant that a Netscape page refused to render until you could place all of the pieces. A single slow image will not block an IE render. It would block an old Netscape render of a page until it got enough of the image to figure out how big it is. Also it was fragile. For instance you can take Netscape 4, do a render with a dynamically created page, resize, and you don't have a rendered page any more! Tracking this kind of thing down is no fun at all.

    In other words the primary reason for rewriting the renderer in Netscape 4 was not performance, it was that the old render engine tied them to an inherently buggy API that was biting them over and over again. The performance was a visible second issue. Of course rewriting the rest of Netscape went beyond that...

    However he is right that the worst way to do a rewrite is to sit down and start writing something completely new from scratch. Instead I like to work as I suggested in Re (tilly) 1: Best way to fix a broken but functional program?. Decide on an overall flow of a new design. Pull some of the existing mess out and make it a shell around the new design. For instance if you need a new rendering engine, then release something with the old, spec out the new engine. Then start incrementally writing the new, as you go scooping out from the old. It may take longer, but you don't lose the existing knowledge. You don't stop yourself from delivering the product. And if you do it right, at some point the old becomes a small shell that you can kill when you get the right moment.
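
    One way to picture the "shell" step is the sketch below. It is a toy in Python, not code from any real project: `legacy_render`, the `Shell` class, and the "table"/"image" kinds are all invented for illustration. Callers only ever talk to the shell, which hands work to new code where it exists and falls back to the old code everywhere else.

```python
def legacy_render(kind, data):
    """Stand-in for the old code: it handles every kind, for better or worse."""
    return f"legacy:{kind}:{data}"


class Shell:
    """Thin layer that callers use while the old code is scooped out behind it."""

    def __init__(self, fallback):
        self._fallback = fallback    # the old implementation
        self._replacements = {}      # kind -> new handler, filled in over time

    def replace(self, kind, handler):
        """Register new code for one kind of work."""
        self._replacements[kind] = handler

    def render(self, kind, data):
        handler = self._replacements.get(kind)
        if handler is None:
            # Not rewritten yet: keep delivering with the old code.
            return self._fallback(kind, data)
        return handler(data)

    def still_legacy(self, kinds):
        """List the kinds that would still reach the old code."""
        return sorted(k for k in kinds if k not in self._replacements)


shell = Shell(legacy_render)
shell.replace("table", lambda data: f"new:table:{data}")
```

    After the one replacement above, `shell.render("table", "x")` goes to the new handler while `shell.render("image", "x")` still reaches the old code. Once `still_legacy` comes back empty for everything you care about, the fallback is the small shell you can kill.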

    Now some may tell me that I just invented refactoring. I disagree. The fundamental principle of refactoring is to incrementally rewrite code through a series of transformations. This is a technique for writing a new project from scratch, while analyzing the old code very carefully for important things that it had, and with the plan of throwing the old code away when you can. Conceptually refactoring is the process of transforming cr*p into soil. This is a process of incremental replacement laying the ground for apparently catastrophic replacement once the new foundation is there.

    Now this doesn't mean that I don't think that refactoring is a great idea. It is. But trying to preserve something just because it is already written is a mistake in my books.

      Good catch. This is precisely the article I had in mind. Yes, I unashamedly think that rewriting from scratch is the wrong kind of laziness.

      Take a look at File::Find sometime. What a mess. My refactored version passes all of the tests (at least on a Unixy system) and is *half* the code of the original. How long would it have taken to rewrite that from scratch? Far longer than it took to make incremental changes and get it into a better working order. (If anyone's interested and can debug on a different platform, let me know.)

      Mozilla's a great example. Also consider Perl 6. It's been a year and a couple of months, and with all of the brilliant ideas and hard work and smart people, we've got a handful of design documents, a virtual machine that does some math and can print things out, and (admittedly) saner bytecode. This, while we're still fixing bugs and trying to finish the test suite for Perl 5! Do the internals need to be improved? Yeah. Will Perl 6 deliver? Undoubtedly. Why throw away 35 megabytes of source code if Perl 6 will act 95% the same as Perl 5?

      Realistically speaking, if you don't have design documents or tests or even a good idea of what the software is doing, how confident can you be that your rewrite will do what it's supposed to do, at the same level? (If it's not working period, that's a different story.)

      And, yeah, your process sure smells a lot like "improving the design of existing code." I don't know of any Refactoring gurus who'd claim that you should have x% of the original code left when you're done refactoring.

      My point (and maybe Joel would agree) is that though maintenance isn't the fun part of programming, you very rarely have (or should take) the luxury of skipping it.

      Ovid, you'll have to write tests sometime. My recommendation is to do it now, based on the old code. You'll get a handle for what it's really supposed to be doing, you'll immediately see how to fix it, and you'll grow as a programmer very quickly.
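
      A rough sketch of what "tests based on the old code" can look like: record what the old routine does for a set of inputs, then hold any replacement to that record. This is Python for illustration only, and `legacy_shipping_cost`, its pricing rules, and the sample weights are all invented; they stand in for some undocumented routine in the real system.

```python
def legacy_shipping_cost(weight_kg):
    """Hypothetical undocumented legacy routine whose rules nobody remembers."""
    if weight_kg <= 0:
        return 0
    cost = 5 + 2 * weight_kg
    if weight_kg > 10:
        cost = cost - 3
    return cost


def record_behaviour(func, inputs):
    """Run the old code once and keep its answers as the expected values."""
    return {value: func(value) for value in inputs}


def differences(func, recorded):
    """Return {input: (expected, actual)} wherever func disagrees with the record."""
    mismatches = {}
    for value, expected in recorded.items():
        actual = func(value)
        if actual != expected:
            mismatches[value] = (expected, actual)
    return mismatches


def new_shipping_cost(weight_kg):
    """Candidate replacement, written after reading the recorded behaviour."""
    if weight_kg <= 0:
        return 0
    discount = 3 if weight_kg > 10 else 0
    return 5 + 2 * weight_kg - discount


recorded = record_behaviour(legacy_shipping_cost, [-1, 0, 1, 10, 11, 25])
```

      With that record in hand, `differences(new_shipping_cost, recorded)` comes back empty, while a naive replacement that forgot the edge cases would show up immediately. Which recorded behaviours are worth keeping is a separate question; the record only tells you what the code does today.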

      ©

        Colour me unconvinced.

        First take the Netscape/Mozilla project. The article addressed that one and said that the decision to rewrite was an unmitigated disaster. And implies strongly that if they had decided to work with the existing code-base, they would have had better results. Well the way that I remember it, they were getting eaten alive by IE, and development was crippled by having to deal with and work around layers of bug fixes on bug fixes. The fact that one set of nightmares came true doesn't mean that the other would not have. Hindsight isn't 20/20. Rather it is speculation with the comfort of knowing you will never find out if you are wrong.

        Take next Perl 6. Perl 6 had many major goals. The most important was to reinvigorate the Perl community. Others included making it easier to get into the internals and easier to port to different platforms (eg the JVM, C#, or a more aggressively optimized binary). Note that Perl 5 was not doing too well at these tasks despite much energy and interest. Well Perl 6 has done quite well at the first, and I am fairly confident that it will be able to succeed in the others. It just won't do it on an aggressive schedule.

        But I said I don't like to argue from failure. How about a success? Take a look at perl. Scan for the words "version 5". Perl 5 is a complete rewrite of Perl 4. According to Joel that was a horrible mistake, and Perl 5 was bound to fail. Didn't fail that I see. In fact when I take a look at what it resulted in, I don't think that features like lexical scope, removing 2/3 of the reserved keywords, adding references, etc, etc, etc would have happened in the same timeframe doing incremental refactoring. Furthermore I give Larry Wall due credit, he has probably been writing influential free software projects longer than both of us have been writing software combined. Given his trail of successes, when he thinks a rewrite is doable, it probably is. If he thinks that it is a good idea to get where he wants Perl to get, well he is the one whose vision got it where it was.

        Now this is not to say that refactoring is a bad idea. When it works, it works well. It is a useful tool. I am glad that it helped you on File::Find. But I think that most big projects can profitably use multiple modes. For instance the Linux project does both. Most of the time you do incremental ongoing development. But I think that ESR made the right choice when he made CML 2 a complete rewrite. Sometimes you incrementally adapt a component. Sometimes you replace it. You are doing something wrong if you need to replace big components very often.

        And one final thing. Ovid is dealing with a system, one of whose problems is that it had a bunch of features added without much rhyme or reason because an ex-employee thought they were "cool". It does not have a large user base. I don't think, therefore, that he should build tests based on the current behaviour, enshrining the misfeatures in tests. Rather he should do some research about how the system is actually used, and only test for what people use from it. Whether or not he rewrites from scratch, blindly refactoring based on the current behaviour will not solve one of the problems that he wants to solve. And, whether or not he rewrites, he should think about how to solve the business problem. Perl 4 did not stop development just because Larry was working on Perl 5. Perl 5 is not stopping active development just because Perl 6 is being worked on. Creating great software is one thing. But you need to survive to actually do it...

Re: (OT) Rewriting, from scratch, a huge code base
by Maclir (Curate) on Sep 28, 2001 at 23:34 UTC
    First of all, ++Ovid for bringing this (and related) article(s) to my attention. There are many good points in the article. If I could add my 2 cents worth, my first professional programming job involved a major code rewrite.

    <reminisce mode>The system was a large batch Fortran application that predicted telephone exchange growth for Australia's (then only) telephone company. The system had grown over some 10 years to a deck of punch cards you could choke an elephant with; it took about one hour of processing on the CDC Cyber 7000 mainframe, produced some 500 pages of output, and cost $100 (in 1976 dollars) each time it was run. BUT the main problem was political - it was originally written by the engineer who now headed the branch - and even though the staff (his junior engineers) all agreed the program was a "crock of shite", none would publicly go on the record and say that, let alone volunteer to rewrite it.

    I was there as a three month student engineer, and had no qualms about calling a spade a spade. Besides, it was a good project for someone like me. After spending a week going through the system, and realising there were four almost identical programs, each providing the same information in a different format, I spoke with the people who used the reports, and found all but one were thrown out unread.

    Step one - new user requirements.

    Step two - rewrite, using operating system utilities to do things like sorting, not the primitive substitution sort in Fortran.

    Step three - test

    Step four - release the new version - a fraction of the size, execution time less than 5 minutes, cost under $5.

    </reminisce mode>Moral - if your existing system is hopelessly broken, rewrite. Otherwise, try to improve. And only fix one thing at a time.

Re: (OT) Rewriting, from scratch, a huge code base
by ducky (Scribe) on Sep 28, 2001 at 22:35 UTC

    Ok, I read the article and I totally agree with it. Ovid, you should NOT completely rewrite a fully functional, in-production application. So go ahead and re-architect that broken, non-scaling app with the knowledge that you may fully adhere to Joel's statements... once it is a bug-fixed, reliable, documented piece of work. =) Not many, if any, of the arguments for just incremental code improvements he states apply to your inherited app.

    The way I'd go about not rewriting something like this is exactly how Joel states: moving code around while not affecting what it does until it can be segmented and replaced. But from what you say, a *lot* of this puppy was implemented incorrectly, undocumented, and the over-all architecture isn't sound.

    The few times I've had to rework systems like that, I've tried to rewrite some of the base and try to adapt the existing modules to work within that, then slowly work over each part, breaking them into more logical parts, cleaning up the code or just rewriting as necessary... but these were on the order of around 5k-10k lines total (including whitespace, comments, etc). Nothing HUGE, by any means. =/

    My $.02, but I feel like I'm just being a "yes" man. =P

    -Ducky

      I disagree with this. Very often, it's the bloat and bug-fix-but-not-really-'cause-it's-now-unmaintainable creep that makes the application completely unworkable. I've taken applications that I could rework, and I've taken applications that were completely unfeasible to rework, where rewriting was a matter of 2-3 weeks, most of that spent reading the horrid application 'cause there were no requirements. (And, yes, it was large ... before I got my paws into it.) And, I got rid of unnecessary misfeatures and streamlined the code and migrated it from Tk to CGI and sped it up 5-fold.

      So, don't say that rewriting is bad-horrible-evil-unclean. That's like saying "GOTO is bad in every instance". Wrong. It's bad in almost every instance. But, that's why they pay us the big bucks - to figure out that 1-in-100 instance where goto is not only not bad, but even suggested.

      ------
      We are the carpenters and bricklayers of the Information Age.

      Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

        So you say you don't agree when in your example you say the project was poorly implemented, poorly documented, and unmaintainable? This is exactly why a rewrite is justified. I'm also saying that if it is properly implemented, well documented, and maintainable then the arguments for a rewrite are rather thin - just refine what you've got.

        The article Ovid posted was essentially saying, "A rewrite is bad when the code a) works, b) is bug fixed, c) is well documented, and d) has a sound over-all design." If the project is missing too many of those points, a rewrite is justified. How many of those points, and to what degree, are needed to justify a rewrite is, like you said, why we get paid the big bucks =)

        -Ducky

Re: (OT) Rewriting, from scratch, a huge code base
by Sifmole (Chaplain) on Sep 28, 2001 at 22:23 UTC
    Ovid,

    Oftentimes extremely large, complicated code bases don't start out that way. They "organically" grow into such nauseating behemoths over time and change requests. That said, the one thing I would make sure I had an absolute grasp of before I began to rewrite the application is "What is it supposed to do?"

    Since you have nobody around, and apparently little to no useable documentation, you run the risk of the documented requirements not being synchronized with the final requirements that the existing code attempted to satisfy.

    Just because I have been bitten by this type of thing once...

    Just my 2 cents worth (and that is probably all it is worth)

Re: (OT) Rewriting, from scratch, a huge code base
by dragonchild (Archbishop) on Sep 28, 2001 at 22:38 UTC
    Having been handed several different medium-sized systems and been told "Add these features, but keep it running", I've realized that substantial rewrites can occur quite easily.

    I've also flat-out said "I need to rewrite this", and it's been ok, too.

    The trick is that you have to be patient. Rome wasn't built in a day. You cannot do a new system from scratch and be able to keep the old schedules. Throw those out with the old system. Then, make new ones. Make them conservative! Be like Scotty - multiply all estimates by 4, then add 12. Then, double them just to be safe. You actually have a very good chance of getting away with this - if management complains, just quietly point to the old system. They should shut up really quick.

    If you lose some customers, so be it. You would've lost them anyways if you released a bad product.

    This also is a good chance to release incrementally. Get the top 10 customers involved in the design. They're using it, not you. Let them gain some ownership over the product. That way, they don't have much reason to complain over the design.

    I guess the big things are patience and confidence. Build small things first. Remember that every system is a series of little things working together.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.