
Re: Nobody Expects the Agile Imposition (Part VI): Architecture

by mr_mischief (Prior)
on Feb 01, 2011 at 00:07 UTC ( #885380=note )

in reply to Nobody Expects the Agile Imposition (Part VI): Architecture

Your thoughts about rewrites seem unorthodox to me. Let me clarify what I think of new projects, rewrites, and refactoring.

Subversion, git, and Mercurial are not rewrites of CVS. They are new projects with similar goals. If they had the exact same feature sets they'd be called clones. There's no rewriting at all. There's just a fresh writing.

A total rewrite is when you start writing the same project over from scratch. You throw your existing code base in the bin and plan to eventually ship a new version that started from different empty files. It probably won't pass the same external tests, and unit tests likely won't resemble the old ones. It likely uses an improved framework or a completely different one based on different concepts.

A partial rewrite is when you rewrite some portion -- a module, a source file, a few functions -- over from scratch. Most of the external tests will still work so long as you don't change too many features at the same time. Unit tests for the rewritten portions will likely need to change unless you carefully stick to the same API and internal interfaces as before.

Refactoring is when you clean up existing code and don't remove any code until you've got the replacement ready so it passes the same unit tests. You don't violate separation of concerns at all while refactoring. You just clean up what's there between change orders. The APIs between modules don't change. The internal interfaces stay the same except among very closely related functions or methods, and you end up with basically the same program. All external tests of the program pass without change. Most unit tests don't change, and the very few that do are just minor tweaks. The implementation is just clearer and maybe the execution path is shorter for the most common cases. Bugs probably don't even get fixed, although they are likely to be easier to notice by reasoning about the code. You're just cleaning the code, and you can generate a new ticket for the newly found bugs.
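
To make that concrete, here is a minimal sketch in Python. It is not from any real project: the function names, rates, and tests are all made up for illustration. The point is only that the same unit tests run unchanged against the old and the cleaned-up code, and that odd behavior (here, a nonsensical negative weight still costs the base rate) is preserved rather than fixed.

```python
# Hypothetical example: every name and number here is illustrative only.

def shipping_cost_before(weight_kg, express):
    # Original implementation: nested conditionals, duplicated arithmetic.
    if express:
        if weight_kg <= 1:
            return 5.0 * 2
        else:
            return (5.0 + (weight_kg - 1) * 2.0) * 2
    else:
        if weight_kg <= 1:
            return 5.0
        else:
            return 5.0 + (weight_kg - 1) * 2.0


BASE_RATE = 5.0      # cost of the first kilogram
PER_EXTRA_KG = 2.0   # cost of each additional kilogram
EXPRESS_FACTOR = 2   # express doubles the price


def shipping_cost_after(weight_kg, express):
    # Refactored: same inputs, same outputs, clearer structure.
    extra_kg = max(weight_kg - 1, 0)
    cost = BASE_RATE + extra_kg * PER_EXTRA_KG
    return cost * EXPRESS_FACTOR if express else cost


def run_unit_tests(shipping_cost):
    # The same tests, unchanged, are run against both versions.
    assert shipping_cost(0.5, False) == 5.0
    assert shipping_cost(1, False) == 5.0
    assert shipping_cost(3, False) == 9.0
    assert shipping_cost(3, True) == 18.0
    return True


run_unit_tests(shipping_cost_before)
run_unit_tests(shipping_cost_after)
```

Both versions also agree on a negative weight, which is arguably a bug. Under the definition above, that gets a new ticket instead of a fix during the cleanup.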

A change order is executed from any feature requests or bug tickets. This is when functionality changes without a rewrite. Let's talk about bugs first. Generally just enough lines are changed to fix behavior for a bug, and the code around it is only cleaned up at this point if necessary to make the bug fix manageable. The test changes for the bug are to test the fixed behavior and to test for the buggy behavior as well to see if it returns. This often means boundary checking or a little fuzzing.
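
As a rough sketch of what that looks like, again in Python and again with a made-up function and a made-up bug ticket: the fix touches one line, one test pins the fixed behavior, and the boundary tests plus a small sweep (a crude stand-in for fuzzing) are there to catch the bug if it comes back.

```python
# Hypothetical example: page_count and the bug described are illustrative only.

def page_count(total_items, per_page):
    # Buggy original, kept as a comment for reference:
    #     return total_items // per_page
    # It dropped the final partial page: 11 items at 10 per page gave 1.
    # Minimal fix: round up instead of down.
    return (total_items + per_page - 1) // per_page


def test_fixed_behavior():
    # The case from the (hypothetical) bug ticket.
    assert page_count(11, 10) == 2


def test_boundaries():
    # Values on either side of the page-size boundary.
    assert page_count(0, 10) == 0
    assert page_count(9, 10) == 1
    assert page_count(10, 10) == 1
    assert page_count(20, 10) == 2
    assert page_count(21, 10) == 3


def test_small_sweep():
    # Every item must fit, and the last page must not be empty.
    for total in range(0, 200):
        for per_page in range(1, 25):
            pages = page_count(total, per_page)
            assert (pages - 1) * per_page < total <= pages * per_page or total == 0


test_fixed_behavior()
test_boundaries()
test_small_sweep()
```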

The feature request might be to add, change, or remove a feature. The amount of code change can vary. The only tests that should need to change are those relating to the feature itself in the external tests. The unit tests should change for any new or removed APIs and internal interfaces adjusted for the feature.

What I like to do with a project is to take all the bug-fix change orders and implement them. Then I validate against my tests. Then I refactor the whole program. Then I take the feature requests and apply those. Then I refactor the whole program again. Then, if necessary, I optimize. Then, if I can refactor the optimized code without killing the performance, I refactor again. Then the process starts over with new change orders. Does it always happen this way? Of course not. I'd like that, though.

If I took the project over from another team, I'd try to refactor it all up front before making any changes in functionality. Then I'd start with the above process.

This seems quite a bit different from the terminology you're using. I understand not throwing away an important code base. Saying that's what someone writing a new alternative to an unrelated project is doing doesn't seem quite accurate to me, though. Git and Subversion, for example, are based on different ideas for accomplishing tasks that are similar to, but not the same as, what CVS does. People wanting to rewrite CVS would be trying to end up with something that is CVS but with none of the original code. The other change tracking systems were written with something better than CVS in mind and didn't have any code already bugfixed and tested for their something better.

Re^2: Nobody Expects the Agile Imposition (Part VI): Architecture
by eyepopslikeamosquito (Canon) on Feb 01, 2011 at 07:40 UTC

    Thank you for a well thought out response. While the newer word "refactoring" seems to be pretty well-defined, I feel that the older word "rewriting" is not. From Martin Fowler's original Refactoring book:

    Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure. ... In essence when you refactor you are improving the design of the code after it has been written.
    Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior. Its heart is a series of small behavior preserving transformations. Each transformation (called a 'refactoring') does little, but a sequence of transformations can produce a significant restructuring. Since each refactoring is small, it's less likely to go wrong. The system is also kept fully working after each small refactoring, reducing the chances that a system can get seriously broken during the restructuring.
    Hopefully, most folks will agree with those definitions. Now it gets much harder. For example, your opinion:
    Subversion, git, and Mercurial are not rewrites of CVS.
    does not agree with mine. My personal view is that Subversion was a "rewrite" of CVS, while the other two were not. I don't feel strongly though. I may well be "unorthodox", as you claim, yet I was pleasantly surprised to discover that many others, including Joel Spolsky, share my opinion. From Joel Spolsky:
    You may also want to look into Subversion, a ground-up rewrite of CVS with many advantages.
    From Open Source Software Development (wikipedia):
    A good example of a complete rewrite was the Subversion version control system, whose developers started from scratch: they believed the codebase of CVS (an older attempt at creating a version control system), was useless and needed to be completely scrapped.
    From Concurrent Versions System (
    SubVersion is a project to rewrite CVS from scratch, in a more flexible and extendible way - and then to extend it.
    Finally, a probing (and relevant to this thread) question from Shlomi Fish interviews Ben Collins-Sussman:
    Subversion was a re-write from the grounds up done by many of the original CVS workers. Do you think it could have been faster to replace CVS (or CVSNT) component by component, thus yielding Subversion?

    To take another example, while I view Perl 6 as a "rewrite" of Perl 5, I suspect many monks would disagree with that view; a couple of them have already made that plain in this thread. Note however that Larry Wall at least seems to view Perl 6 as a "rewrite" of Perl:

    Perl 5 was my rewrite of Perl. I want Perl 6 to be the community's rewrite of Perl and of the community.
    Admittedly, that quote was taken from State of the Onion, TPC4, and the direction of Perl 6 has changed a bit since then. I'd be interested to know if Larry still views Perl 6 as a "rewrite" of Perl 5.

    Open Source Software Development (wikipedia) neatly summarizes the available rewrite/refactor options:

    Often open source developers feel that their code requires a revamp. This can be either because the code was written or maintained without proper refactoring (as is often the case if the code was inherited from a previous developer), or because a proposed enhancement or extension of it cannot be cleanly implemented with the existing codebase. A final reason for wishing to revamp the code is that the code "smells bad" (to quote Martin Fowler's Refactoring book) and does not meet the developer's standards. There are several kinds of revamps:
    1. Refactoring implies that the code is moved from one place to another, methods, functions or classes are extracted, duplicate code is eliminated and so forth - all while maintaining an integrity of the code. Such refactoring can be done in small amounts (so-called "continuous refactoring") to justify a certain change, or one can decide on large amounts of refactoring to an existing code that last for several days or weeks.
    2. "Partial rewrites" involve rewriting a certain part of the code from scratch, while keeping the rest of the code. Such partial rewrites have been common in the Linux kernel development, where several subsystems were rewritten or re-implemented from scratch, while keeping the rest of the code intact.
    3. Complete rewrites involve starting the project from scratch, while possibly still making use of some old code. A good example of a complete rewrite was the Subversion version control system, whose developers started from scratch: they believed the codebase of CVS (an older attempt at creating a version control system), was useless and needed to be completely scrapped. Another good example of such a rewrite was the Apache web server, which was almost completely re-written between version 1.3.x and version 2.0.x.

    Apart from arguing over semantics, the interesting strategic decision we face is whether to extend an existing legacy code base or throw it away and start from scratch. There is no one "right" answer to that question: it depends on the project, the team, the quality of the existing code base, and many other factors. Perhaps the most important thing is striving to prevent legacy code degenerating into a tangled mess in the first place.

      To take another example, while I view Perl 6 as a "rewrite" of Perl 5, I suspect many monks would disagree with that view; a couple of them have already made that plain in this thread. Note however that Larry Wall at least seems to view Perl 6 as a "rewrite" of Perl:

      Perl 5 was my rewrite of Perl. I want Perl 6 to be the community's rewrite of Perl and of the community.

      Sorry to be pedantic--it's not usually my thing--but I think you are subtly reinterpreting Mr Wall's words in support of your argument.

      The man himself will set me straight if it is of interest to him, but I think that "Perl 6 to be the ... rewrite of Perl" is considerably different from "Perl 6 as a "rewrite" of Perl 5".

      'Perl', unadorned by the version number, is neither an implementation that can be re-written, nor a design evolution that can be reimplemented. It is 'only'--and precisely, completely--a concept; an ethos; an idea.

      As such, Perl 5 wasn't a rewrite of the Perl 4 implementation; but rather a rewrite of the Perl design that was then implemented as Perl 5. Ditto for Perl 6 relative to Perl 5.

      One definition (but a good one) of 'rewrite' in the context of software is:

      A rewrite in computer programming is the act or result of re-implementing a large portion of existing functionality without re-use of its source code. When the rewrite is not using existing code at all, it is common to speak of a rewrite from scratch. ..

      On the basis of both that definition and my limited experience of both, calling the feature-rich Subversion a rewrite of CVS is like calling the Ford Focus a rewrite of the Ford Model-T. They serve a similar niche and target audience; but the way they go about achieving it is so utterly different.

      The goal of re-implementing the same basic functionality is present; but the provision of so much additional functionality makes the term 'rewrite' an inadequate description of the reality.

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      My intent was not to start an argument over semantics nor over anything else. I merely intended to clarify where I think some imprecision and unnecessary disagreement has entered the thread. If we keep using words we define differently as a basis, then we at least need to know how those words are being used by each party. Otherwise we'll talk past one another and nobody really knows where we would agree and disagree no matter how civil or friendly the discussion.

      I also think it helps to remember that intentions toward a project can change over time. What one thinks will be a straightforward rewrite from the beginning can change in focus and gain features before the rewrite is done (or even really started). The new design can be a totally different sort of beast from the old, but since it's still in the same lineage the distinction is blurred. In fact, I suspect that the svn folks intended to rewrite CVS but looking back would only loosely use that term for what they finally did. I think Larry would say Perl 5 is a rewrite of Perl 4 from the point of view of both the language and the perl tool. I would probably say that, anyway. I think he intended originally for Perl 6 to be a rewrite of some sort, but the language is the only thing being rewritten IMO. I think Rakudo and Parrot are definitely not rewrites of perl 5.6 or 5.8 although the language implemented is still in the Perl family. How Larry actually does view things of course would be for Larry to say no matter what I think he might say.

        What one thinks will be a straightforward rewrite from the beginning can change in focus and gain features before the rewrite is done.
        Yes, I suspect this happens rather a lot. I once inherited an "unmaintainable" build system written as a huge DOS .BAT script. Well, it was unmaintainable to me, because I didn't know .BAT very well and, frankly, didn't want to. My strong opinion, expressed in Unix shell versus Perl, is that you should not write non-trivial systems in .BAT (or Unix shell). Luckily, it was only a few thousand lines long and I was able to "rewrite" it in Perl fairly painlessly. Because the original design and interface was so bad, I "improved" it as I went, so my "rewrite" ended up a fair bit different to the original. Naturally, I didn't write it as a monolithic script, but as a number of modules along with a small script mainline. Now you and BrowserUk may claim I didn't rewrite it, I wrote a new build system. Fair enough, but I prefer not to argue about that any more. :) The important strategic question is: should I have "rewritten" it in Perl or improved it by changing the existing .BAT script? From my (biased) point of view, the "rewrite" was a raging success because we were able to extend this system many times over the years (often by adding new modules) and I feel the cost of the rewrite was got back many times over by improved robustness and performance, along with much easier maintenance over a period of many years.

        Of course, rewriting small systems is easy. Suppose this system has now grown to 100,000+ lines of Perl and the person who takes it over dislikes Perl, claims it is a tangled mess, and decides it would be more "maintainable" to rewrite it in Ruby or Python. Is that a wise decision? Though I would normally argue against that, others have been known to argue for it (BTW, as far as I'm aware, sanity prevailed and Bugzilla is still written in Perl).

        Whether it makes sense to rewrite depends on many factors: how large the system is; how ambitiously you want to extend it; how clean its code is; and how skillful the rewriters are. I make the last point because unfortunately I've seen a number of over-confident programmers over the years complain loudly about a crappy old system, boldly rewrite it ... then end up with a crappier system than the original! In my experience, the quality of the developers doing the rewrite is crucial: a creaky old legacy system written by first-rate developers is likely to be better than a shiny new one rewritten by mediocre developers. As Joel Spolsky warns when you rewrite: "there is absolutely no reason to believe that you are going to do a better job".
