http://www.perlmonks.org?node_id=569852

Hi All

I need the wealth of your wisdom and experience in software development, so this is not related to Perl per se. Hence 'OT' in the subject. The software that I've been working on for the past 2.5 years has become very hard to optimise and modify. We had a meeting last week and we decided to re-architect it.

The basic problem is that the software is written as one monolithic mod_perl application, which appears to the client as three different products. To be able to go forward, we need to do deep surgery on it and split it into independent sub-systems and layers. We also need to make major changes to the database structure. Although we might be able to reuse the existing modules, I am sure that the new architecture will be completely different from the current one.

This brings me to the question, "HOW are we gonna do it?" There are two options: the first is to Rewrite, and the second is to Refactor. I'll list the good and the bad of each approach from my point of view. Then I'd like to hear what y'all have experienced when facing problems like this.

Rewrite

My definition of Rewrite is to create the new architecture and then copy the relevant modules from the old code base, probably with some modifications to make them fit the new architecture. So this is not a complete rewrite, but it is not a refactoring either.

The good

  • We can start from a clean slate, which doesn't carry baggage from the old architecture.

The bad

  • We may miss some functionality from the old system while porting the modules.
  • The old code base may change while we're working on the new code, which makes porting a nightmare.
  • The new code base hasn't been tested under production load, which means we are likely to have performance problems in the first few months after launching the new code.

Refactor

My definition of Refactor is to change the code one thing at a time, without affecting functionality.

The good

  • Less likely to miss functionality from the old code base.
  • The original code has already been tested on production.

The bad

  • We might not end up with the new ideal architecture, because refactoring works at the code level, not at the architectural level.

I haven't made up my mind on either approach. And due to my limited experience, I can't come up with a more exhaustive list of the good and the bad of each. Can you help?

Thank you

UPDATE: Thanks to everyone who has provided their insights. More insights are welcome. Up to this minute the best insight is from GrandFather, who suggested refactoring by creating interfaces and test suites.

UPDATE: The Strangler Application pointed out by adrianh seems to be the technique that incorporates the interfaces mentioned by GrandFather. Great suggestions guys!!! Keep them coming.

-cheepy-

Re: OT: Rewrite or Refactor?
by GrandFather (Saint) on Aug 27, 2006 at 05:14 UTC

    Many things affect the most appropriate path. A further one you haven't mentioned is timescale. Generally refactoring can be accomplished more quickly than rewriting.

    In your case it looks like a true rewrite is not possible in any case. Your only real decision is how deep to cut in the process of refactoring your existing code - you have to keep the system going.

    Whatever you do, one of the first things you need is to implement a regression test system to ensure you don't introduce unexpected behaviour during alterations.

    If you need to restructure your database, I'd be inclined to split the database back end off as soon as possible, wrap it up in a fairly abstract interface, then work on it as a separate entity. Again, make sure you write test code to check things don't get broken along the way.

    Actually, I'd imagine the whole process will turn into: Identify an area of code, write a test suite for it, put it behind an interface, work on it in stages until happy. To the extent that it is possible, putting stuff behind an interface is a pretty lightweight operation and isolates clients of the code from changes behind the interface. It also makes for easy and comprehensive testing.

    Note that there are two levels of testing involved here: high level testing of the integrated code - tests written against the code as it is at the moment - and tests written against focused sections of the code when those sections come under the spotlight for refactoring.

    So, your first task is to write a big test suite for the application, then identify sections of code that can be put behind interfaces. Make minimal changes to push each section behind an interface. Write tests against the interfaces. Then you have the freedom to rework each section independently of anything else that may be going on, and separately from the live version of the application (because you are rewriting against the test suite, not in the context of the application).
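
    A minimal sketch of that loop, in Python purely for illustration (the thread is language-neutral and the same shape carries over to Perl). Everything named here is hypothetical: `legacy_invoice_total` stands in for a chunk of the existing application, `Billing` is the interface it gets pushed behind, and `ReworkedBilling` is the replacement being developed against the recorded behaviour.

```python
from typing import Protocol


def legacy_invoice_total(items, tax_rate):
    """Hypothetical stand-in for a chunk of the existing monolith."""
    total = 0.0
    for price, qty in items:
        total += price * qty
    return round(total * (1 + tax_rate), 2)


class Billing(Protocol):
    """The interface that clients of this code are allowed to see."""
    def invoice_total(self, items, tax_rate): ...


class LegacyBilling:
    """Minimal change: the old code, untouched, pushed behind the interface."""
    def invoice_total(self, items, tax_rate):
        return legacy_invoice_total(items, tax_rate)


class ReworkedBilling:
    """The section as reworked offline, written against the tests."""
    def invoice_total(self, items, tax_rate):
        subtotal = sum(price * qty for price, qty in items)
        return round(subtotal * (1 + tax_rate), 2)


# Inputs chosen to pin down current behaviour (illustrative values only).
CASES = [
    ([(10.0, 2), (5.0, 1)], 0.10),
    ([], 0.10),
    ([(19.99, 3)], 0.0),
]


def record_expected(impl):
    """Capture what the current code does, right or wrong, for every case."""
    return [impl.invoice_total(items, rate) for items, rate in CASES]


def check_against_recorded(impl, recorded):
    """True only if impl reproduces every recorded result exactly."""
    return record_expected(impl) == recorded
```

    The recorded results come from the legacy implementation, so the test suite describes what the system does today rather than what anyone believes it should do. A reworked section is ready to swap in once `check_against_recorded` passes for it.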


    DWIM is Perl's answer to Gödel

      Thanks for your reply.

      But how do you refactor the code architecture? I mean splitting one monolithic application into multiple applications that work together. Additionally, can I use refactoring techniques to change my database structure from multiple databases with the same schema to one database?

      I admit that I haven't learned the refactoring techniques in depth. I just haven't heard that refactoring can work at such a high level as the software architecture.

      See this for my explanation about the architecture of my code.

      Thank you.

      -cheepy-

        In a sense refactoring is just moving chunks of code around - sometimes combining chunks, sometimes splitting them. No matter how you cut it, the process you have described is refactoring rather than rewriting. Also in a sense refactoring is all about architecture. When you refactor you are rearchitecting some part of your "design". However, labels don't help you achieve your goal.

        Database stuff I don't know much about. However, if you take a representative snapshot of your databases that you can work with offline, then you can first generate a test framework against the databases and the code that services them. Then rework the code and databases into the new architecture (developing porting tools as needed along the way), remembering to check against the test framework. If you need to change the live system, make sure anything pertinent is checked by the common test framework and that the reworked system is updated to pass the new tests. Then when the reworked system is deemed ready and passes all tests you can quickly and confidently move the changes back into the live system. So long as both the live system and the reworked system behave as expected against the test framework at the time of the merge, everything should go smoothly.
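
        As a rough illustration of a porting tool plus its check, here is a sketch using Python's standard `sqlite3` module with in-memory databases standing in for the offline snapshot. The `orders` table, its columns, and the added `client_id` column are assumptions made up for the example; nothing in the thread describes the real schema.

```python
import sqlite3


def make_client_db(rows):
    """Build one per-client database with the shared (hypothetical) schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")
    db.executemany(
        "INSERT INTO orders (id, item, qty) VALUES (?, ?, ?)", rows)
    return db


def consolidate(client_dbs):
    """Porting tool: copy every per-client database into a single database,
    adding a client_id column so rows from different clients stay apart."""
    merged = sqlite3.connect(":memory:")
    merged.execute(
        "CREATE TABLE orders (client_id TEXT, id INTEGER, item TEXT, "
        "qty INTEGER, PRIMARY KEY (client_id, id))")
    for client_id, db in client_dbs.items():
        rows = db.execute("SELECT id, item, qty FROM orders").fetchall()
        merged.executemany(
            "INSERT INTO orders (client_id, id, item, qty) VALUES (?, ?, ?, ?)",
            [(client_id, *row) for row in rows])
    return merged


def orders_for(merged, client_id):
    """Read one client's orders back out of the consolidated database."""
    return merged.execute(
        "SELECT id, item, qty FROM orders WHERE client_id = ? ORDER BY id",
        (client_id,)).fetchall()


def migration_preserves_data(client_dbs, merged):
    """The check: each client sees exactly the rows it had before the move."""
    return all(
        db.execute("SELECT id, item, qty FROM orders ORDER BY id").fetchall()
        == orders_for(merged, client_id)
        for client_id, db in client_dbs.items())
```

        The same comparison can be rerun every time the snapshot is refreshed from the live system, which is what makes the eventual merge a routine step rather than a leap.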

        The two key elements are identifying and implementing interfaces, and implementing test frameworks against those interfaces. Once the interfaces are in place you can mix and match client and server code (code on either side of the interface) as you like. To the client code it doesn't matter how the server code does its job, and it doesn't matter if it is in the same process, a different process on the same CPU, or a process running on hardware across the other side of the Earth.

        Interfaces to provide isolation. Tests to make sure you don't break stuff. It really doesn't matter what you call it.
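
        To make the "same process or other side of the Earth" point concrete, here is a small sketch using Python's standard `xmlrpc.client` marshalling functions. The catalog classes and `order_total` are hypothetical names invented for the example. No socket is opened: the second implementation only pushes each call through XML-RPC serialisation, the way it would travel to another process.

```python
import xmlrpc.client


class InProcessCatalog:
    """Server-side code called directly, in the same process."""
    def price_of(self, sku):
        return {"apple": 30, "pear": 45}[sku]


class MarshalledCatalog:
    """Same interface, but each call is serialised as an XML-RPC request
    and response. The 'wire' here is just a string, not a network link."""
    def __init__(self, backend):
        self._backend = backend

    def price_of(self, sku):
        # Client side: encode the call.
        request = xmlrpc.client.dumps((sku,), methodname="price_of")
        # Server side: decode it, dispatch, encode the result.
        params, method = xmlrpc.client.loads(request)
        result = getattr(self._backend, method)(*params)
        response = xmlrpc.client.dumps((result,), methodresponse=True)
        # Client side again: decode the response.
        (value,), _ = xmlrpc.client.loads(response)
        return value


def order_total(catalog, skus):
    """Client code: knows only the interface, not where the work happens."""
    return sum(catalog.price_of(sku) for sku in skus)
```

        `order_total` is unchanged whichever catalog it is handed, so one test suite written against the interface covers both arrangements.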


        DWIM is Perl's answer to Gödel
Re: OT: Rewrite or Refactor?
by dws (Chancellor) on Aug 27, 2006 at 05:32 UTC

    Less likely to miss functionalities from the old code base.

    Indeed. In any large code base that's grown up over a period long enough for the original developers to have moved on, there are going to be chunks of essential functionality that aren't well understood. When you rewrite (and miss the misunderstood functionality), you're forced into reactive rediscovery under fire when something that people depend on suddenly no longer works. Not a fun place to be.

    Joel Spolsky had a post a while back called Things You Should Never Do, wherein he claims that rewriting working software is a huge mistake. He lays out some good arguments.

    If you choose to go the refactoring route, Michael Feathers' book Working Effectively with Legacy Code might provide you with some good tools. He tackles the problem of how you retrofit tests onto legacy code in preparation for adding new features (which often involves first making the code safe to refactor).

Re: OT: Rewrite or Refactor?
by bobf (Monsignor) on Aug 27, 2006 at 04:30 UTC

    Without knowing anything about the code in the one application/three products, it will be hard to give you specific advice.

    In past projects I've used a combination approach that attempts to balance the positives and negatives of refactoring and rewriting. In your case, I might suggest designing the architecture how you want it to be and then start to rewrite it, but take as many snippets as possible from the current code and refactor them as needed when you plug them into the new design (the latter might be aided by doing some minor refactoring of the existing code first - breaking out code into subroutines, adding blocks to tighten the scope of variables, etc). That would give you the new design that you need, but it would leverage some of the functionality and strengths of the existing code so you don't have to rewrite the whole thing from scratch (provided there are pieces that are good enough to salvage).

    Regardless of the approach you take, the Perl Medic will likely help. In addition, a few recent threads may be of interest, including Strategies for maintenance of horrible code?, Consideration for others code, and Perl in the Enterprise (the latter mentions Devel::Refactor and the PPI refactoring editor).

    Good luck.

    Update: ++ to GrandFather for the comments about adding a test suite. I thought of it, then forgot to include it (that's what I get for posting while tired). Read his node - he said it better than I would have, anyway. :-)

      Firstly, thanks for the reply.

      The structure of the code is roughly a three-tier architecture, with all the tiers running in one Apache process (compiled with mod_perl), so they are only logically separated. I have a database layer, which abstracts SQL creation and database connections. Then there is the business logic layer, which implements the (surprise!) business logic, with all modules coded in an object-oriented way. The top layer is the application layer, which is basically a web-application-framework-wannabe that I wrote myself because no available framework met my needs.

      The new architecture that I am planning splits the three layers into different applications that communicate over XML-RPC or another communication protocol. We will probably also split off one or two subsystems that warrant running on their own as independent applications. We need to do this to allow us to scale to the next level.

      Another big thing in this new architecture is the new database structure. Currently we have hundreds of databases with the same schema that we keep up to date using a script. We want to move all of them to one database so we can use facilities provided by the database to improve performance and maintain data integrity.

      I have the Perl Medic book, but I haven't read it yet. But I will. Looking forward to hearing from you.

      Thanks

      -cheepy-

        What do you expect to gain from the XML-RPC?!? Apart from the need for beefier hardware. Do you need to be able to run those three layers on separate servers (even using several computers for the same layer)? Maybe there is a good reason, I'm just curious.

Re: OT: Rewrite or Refactor?
by adrianh (Chancellor) on Aug 27, 2006 at 09:27 UTC

    My vote would be for refactor. Two reasons. You've missed one big disadvantage of the rewrite option: you won't have a working system for some time. If you're refactoring you'll always have a working system.

    We might not end up with the new ideal architecture, because refactoring works on the code level, not on architectural level.

    I think you will get your ideal architecture, or at least you can get there with some tiny rewrites on top of a lot of refactoring. That's been my experience anyway. The architecture is, after all, embodied in the code. In general I've found incremental change to be a far more effective technique than a rewrite from scratch.

    Take a look at Working Effectively with Legacy Code and the Strangler Application pattern for some ideas.
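
    For a feel of the Strangler Application idea, here is a minimal sketch in Python. The handler functions and URL prefixes are hypothetical. A thin front end sends requests for already-migrated areas to new code and lets everything else fall through to the untouched legacy application, so there is a working system at every step.

```python
def legacy_app(path):
    """Hypothetical stand-in for the existing monolith."""
    return f"legacy:{path}"


def new_reports(path):
    """Hypothetical stand-in for one area rebuilt on the new architecture."""
    return f"new:{path}"


class Strangler:
    """Routes each request to new code if its area has been migrated,
    otherwise to the legacy application."""
    def __init__(self, legacy):
        self._legacy = legacy
        self._migrated = {}

    def migrate(self, prefix, handler):
        """Declare that paths under this prefix are now served by handler."""
        self._migrated[prefix] = handler

    def handle(self, path):
        # The longest matching prefix wins, so a sub-area can be migrated
        # separately from its parent.
        for prefix in sorted(self._migrated, key=len, reverse=True):
            if path.startswith(prefix):
                return self._migrated[prefix](path)
        return self._legacy(path)
```

    Each call to `migrate` moves one more area off the old code. When nothing falls through to `legacy_app` any longer, the old application can be retired.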

Re: OT: Rewrite or Refactor?
by nmerriweather (Friar) on Sep 01, 2006 at 04:50 UTC
    I think the largest problem you've probably had is that the application is in mod_perl

    I love mod_perl - i develop in it extensively - but if you have legacy mod_perl code, I bet that's probably the scariest f'ing thing in the world. the design patterns people used in mod_perl 5-10 years ago are just plain frightening

    now, i'm not sure if this would be a strangler method or not, but this is what i would do:

    i'd spend some time designing a whole new application. i'd make it 100% modular and extensible, easy to upgrade, easy to extend.

    i'd create an abstraction layer using an ORM like Rose::DB::Object ( i usually hate them, but in this case, i think it makes sense ) to map new objects onto the existing database (and vice versa). with a little bit of smoke and mirrors, you can get your object classes talking to two very separate database models, but with the same user interface.

    then, bit by bit, I'd migrate sections from the old system onto the new one- as they're built

    in essence, I'd rewrite your system , but do it in a manner that is more in-line with refactoring.
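
    A sketch of what that mapping layer could look like, written with Python's standard `sqlite3` module rather than Rose::DB::Object. The table and column names for both the old and new models are invented for the example. The point is only that both classes hand back the same `Customer` object, so calling code does not know which database model it is talking to.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Customer:
    """The object the rest of the application works with."""
    id: int
    name: str
    email: str


class OldSchemaCustomers:
    """Maps Customer objects onto the (hypothetical) existing table layout."""
    def __init__(self, db):
        self._db = db

    def get(self, customer_id):
        row = self._db.execute(
            "SELECT cust_no, cust_nm, cust_em FROM cust WHERE cust_no = ?",
            (customer_id,)).fetchone()
        return Customer(*row) if row else None


class NewSchemaCustomers:
    """Maps the same objects onto the (hypothetical) restructured layout."""
    def __init__(self, db):
        self._db = db

    def get(self, customer_id):
        row = self._db.execute(
            "SELECT id, name, email FROM customers WHERE id = ?",
            (customer_id,)).fetchone()
        return Customer(*row) if row else None


def greeting(customers, customer_id):
    """Calling code: identical whichever database model sits underneath."""
    customer = customers.get(customer_id)
    return f"Hello, {customer.name}" if customer else "Unknown customer"
```

    Sections can then be moved from the old model to the new one a piece at a time by changing which class is handed to the calling code.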

      Thanks for your suggestion. I take it that your suggestion is related to the database restructuring.

      i'd create an abstraction layer using an ORM like Rose::DB::Object ( i usually hate them, but in this case, i think it makes sense ) to map new objects onto the existing database (and vice versa).

      I have a good enough database abstraction layer. Do you mean I need to create another abstraction layer? Or do I only need to make the existing layer deal with two database models? I am more inclined toward database refactoring when thinking about database restructuring.

      I don't know about the scariest f'ing thing stuff. This is my first professional experience so I don't know any better. I just learn as I go, and perlmonks has been the one that helps me to get to this point. So this rewrite/refactoring is exciting, challenging, and scary at the same time.

      Thanks

      -cheepy-