PerlMonks  

Rewriting a large code base

by Ovid (Cardinal)
on Jun 28, 2001 at 03:48 UTC (#92124=perlquestion)
Ovid has asked for the wisdom of the Perl Monks concerning the following question:

I have been tasked with overseeing the rewrite of a very large Web-based application. This application was written before I started working here and I'm not terribly familiar with it. I've reviewed some of the code and it is terrible. Some issues:

  • Extensive use of globals shared across multiple programs.
  • Often fails to check status of system calls.
  • strict is not consistently used.
  • No documentation.
  • Extensive HERE documents.
  • An in-house routine to parse form data was used, but was converted to CGI.pm, yet retained the old interface which often discarded data.
  • Many duplicate functions, often with different interfaces.
  • Many code features have side effects (such as changing those #$@%! global vars!).

This is a far more serious task than I have ever undertaken before and I could use all the advice I can get. I have been informed of this just a few minutes ago and here are my rough thoughts on approaching this:

  • Create an inventory of all components necessary for this to run.
  • Document the inputs, outputs, and purpose of all programs.
  • Identify a set of standards for the rewrite. In particular, ensure that functions have standard interfaces (I'm sick of some returning a HoH and others returning a reference to an HoH).
  • Identify what logic should be handled by the database and what should be in the Perl code.
  • Strip out all HERE docs and start using Template Toolkit.

That's where I get stuck. Should I start working on the modules first? I think that's the best approach. However, do I rewrite them so that they set the globals and return the values, so that old code doesn't break? Simply stripping out globals will break every program. Later, when the conversion's done, strip out the 'globals' code? I don't like that, as I'm leery that the "stripping out" will never get done.
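
A minimal sketch of that "set the globals and return the values" transition, assuming a hypothetical legacy global `%CONFIG` and a hypothetical function `load_config` (neither name comes from the application described here):

```perl
use strict;
use warnings;

# Legacy global that old scripts still read directly (hypothetical name).
our %CONFIG;

# New-style function: returns its result instead of relying on side effects.
# During the transition it ALSO fills the legacy global so old callers keep
# working. The marked line is the part to delete once every caller is ported.
sub load_config {
    my (%args) = @_;
    my %config = (
        site_name => $args{site_name} || 'unknown',
        debug     => $args{debug} ? 1 : 0,
    );

    %CONFIG = %config;    # TRANSITIONAL: remove when no caller reads %CONFIG

    return \%config;
}

# New code uses the return value...
my $config = load_config( site_name => 'test site', debug => 1 );
print "new style: $config->{site_name}\n";

# ...while unported code still finds the global populated.
print "old style: $CONFIG{site_name}\n";
```

Tagging each compatibility line with a fixed marker such as `TRANSITIONAL` means a single grep lists everything that still has to be stripped, which may ease the worry that the cleanup never happens.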

Start porting this application to a test site and build it a piece at a time? That's my preference, but it does mean that we won't have the more robust features available during development.

This is -- for me -- a very large rewrite. It's only about 30 or so programs, as far as I can tell, but many of them are thousands of lines long (though that's due in part to the HERE docs). Am I looking at this the wrong way? Is there a better way I can organize and direct things to get this done? Further, we're not being paid for the rewrite, so this will be done part-time in addition to our other work.

Cheers,
Ovid

Vote for paco!

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: Rewriting a large code base
by lemming (Priest) on Jun 28, 2001 at 04:08 UTC

    Have you thought of writing a spec on what this set of programs is supposed to do? It may be faster to start over than to figure out what was done before and then write in incrementals. Plus you may go insane if you start contemplating the why.

    Most of what you're talking about is really doing a spec of the old program and then improving it. With a new perspective, you may have a better idea of how long it will take to write it.

      Hear, hear! That's a very important part of it. But this is what I would do:
      1. Understand the purpose of the entire system
      2. Understand the function of each component
      3. Understand the interfaces between components
      4. Choose what interfaces you want to use. Make them consistent. Also look at components that can be merged. Be sure to plan what the whole thing will look like when you're done, rather than haphazardly hacking away at it.
      5. Change the necessary components to implement these interfaces one at a time, hopefully without breaking anything in between.

      One key that's very easy to forget is to know the architecture of the system very well. Find the original author if you can, or read design documents if there are any. Just know what you're changing before you start accidentally breaking things that you don't understand. (I speak from experience)
      See you, space cowboy
Re: Rewriting a large code base
by BMaximus (Chaplain) on Jun 28, 2001 at 04:33 UTC
    This reminds me of my last job. The original code was a huge big lump of code. Most of it was in one module and it was a MESS. So my coworkers and I were given the task to turn the mess into what was to be known as 2nd gen. It too had extensive HERE docs. So here's what we did.

    We had made a decision to extract all HERE docs and template the entire site. It was our belief that HTML does not belong in Perl code; it only serves to bloat it. We then documented where everything went in the layout of the application: inputs, outputs, database access, etc. It was a complete rewrite, though we salvaged code where we could.

    We split the site up into different elements.

    • Application
    • Business Logic
    • Database interface
    • Utilities
    • Initialization
    • Miscellaneous


    Each of these was organized into a separate directory, and CVS was used to coordinate each of our tasks and contributions.

    Application: where the main applications were put. It took care of presenting the information to the user.

    Business Logic: took care of any calculations that were needed. Subroutines for commonly accessed things lived here as well, to keep things from becoming redundant.

    Database Interface: self-explanatory. It held all the routines that put and retrieved data from the database.

    Utilities: held modules with subroutines for calculating dates or implementing encryption.

    Initialization: held startup.pl for mod_perl

    Miscellaneous: things we needed but couldn't really put into a category.

    Doing this and of course commenting and commenting some more made it very easy to maintain and change.

    We created standards as to what subroutines would output. For instance any subroutine that needed to output a hash or an array would always give a ref of that type.
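
A small sketch of what such a return-value standard could look like. The subroutine names and data are hypothetical; the only point is that anything producing a hash or an array hands back a reference of that type:

```perl
use strict;
use warnings;

# Convention sketched here: a sub that produces a hash or an array always
# returns a reference, never a flattened list. All names are hypothetical.
sub get_user {
    my ($id) = @_;
    my %user = ( id => $id, name => "user$id" );
    return \%user;    # always a hash ref
}

sub get_user_ids {
    my @ids = ( 1, 2, 3 );
    return \@ids;     # always an array ref
}

my $user = get_user(2);
my $ids  = get_user_ids();

print "name: $user->{name}\n";
print "ids:  @$ids\n";
```

With one rule for every subroutine, callers never have to check whether to write `%result = f()` or `$result = f()`.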

    For the most part the theme was break it down and simplify, which seemed to work really well for us. So breaking things up into separate elements may work for you, and it seems like you're headed in that direction.

    BMaximus

    Update: I hope that if you're putting in "overtime" and staying late to do this for your company, then once it's finished they do something to show their appreciation for you and your coworkers. One of the best things about where I worked was that they did show us how much we meant to the company.
Re: Rewriting a large code base
by Henri Icarus (Beadle) on Jun 28, 2001 at 05:55 UTC
    Ovid,

    Nice question for us. This is fun to chew on. I'm in a similar situation except that the big application that needs re-writing is my own! The other folks have given most of the important advice that needs to be given, I'd just add two things:

    1) It's a mistake to think you can be rigorous and implement a system top down, or bottom up. My experience is that you've got to go both directions at the same time to get good code.

    2) Don't try to migrate the current code into the new solution. No matter how good you are, you're bound to break it at least once as you go along, and I'll bet you don't end up with anything nearly as nice when you're done. Also, if you develop on a test system, you'll find that some of the "new" features that you add into the new system are completely independent of the current code base, so you can "paste" them in to the live site and get that functionality anyway.

    -I went outside... and then I came back in!!!!

Re: Rewriting a large code base
by Zapawork (Beadle) on Jun 28, 2001 at 06:32 UTC
    Hey Ovid,

    I am also re-writing someone else's code. In my case it's an open source project that has been rewritten and evolved over 3 generations of code, without ever using strict or documenting. Very very nasty now.

    I would tell you that in my opinion

    1) use strict. This may seem like a lot of code rewriting at first, but it's worth it. You will then know all the actual variables defined within the main loop, while also identifying which values are imported from modules. If this has been done already, hoorah for good intentions on their part.

    2) Begin to trace the actual execution of your program: what sub gets called first, what it calls, what modules it depends on. This map is essential for a complete understanding of what the old program does and how modifications will affect it, as well as the best place to put your modifications.

    3) Once you have this you can begin to create a standard way to pass data between the modules and work from the main routine outwards to the subroutines as they are called. This way when you break it, and you will, you will know at what logical point in the execution you are in.

    You should definitely do this in a test environment so that current users will not bitch too much. In addition to this you should take care of the documentation you described earlier. My advice only relates to a rewrite of the code, not starting from scratch, so I hope it gave you some value. It's working for me so far.

    Dave

Re: Rewriting a large code base
by cLive ;-) (Parson) on Jun 28, 2001 at 07:53 UTC
    Funny,

    I'm just gonna do the same thing - well sort of. I'm starting to rewrite c. 12,000 lines of code that I originally started coding 3 years ago.

    And I have similar issues, except I wrote the original and should have some idea what most of it did (thank god I learnt to comment before I learnt to code!)

    From experience, I know that stripping out as many global vars as possible will make life easier when work needs doing later.

    I'm also beginning to do the following coz my memory's so bad:

    • comment fully each sub - list what input and output are expected.
    • place all global vars in their own namespace (package 'global'), and document each one in the package
    • write install script for software that checks module dependencies *before* installation (yes, I know...) - software is used on various servers, few of which we control.
    • avoid export - over a large codebase, I find namespace clashes can confuse more than help save time (I'm now using OO instead)
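
The dependency check from the list above could be sketched roughly as follows. The module list is a made-up example (the last entry is deliberately bogus so the failure path shows), and a real installer would presumably exit with a non-zero status rather than just print:

```perl
use strict;
use warnings;

# Returns the names of the modules in the list that cannot be loaded.
sub missing_modules {
    my @modules = @_;
    my @missing;
    for my $module (@modules) {
        # Turn Some::Module into Some/Module.pm so require() takes a path
        # and no string eval is needed.
        ( my $file = "$module.pm" ) =~ s{::}{/}g;
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

# Hypothetical dependency list; the last entry is deliberately bogus.
my @required = qw(File::Spec Data::Dumper No::Such::Module::Hopefully);
my @missing  = missing_modules(@required);

if (@missing) {
    print "Missing modules: @missing\n";
    print "Install them before running the installer.\n";
}
else {
    print "All dependencies found.\n";
}
```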

    I'm also finding that amending __DIE__ to log errors in a file is particularly useful (in the case of CGI scripts, anyway), especially when the output of CGI::Carp's fatalsToBrowser is ambiguous.
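
A rough sketch of such a `__DIE__` handler. The log location (a temporary file) is an assumption made so the example is self-contained; a real CGI script would use a fixed path writable by the web server user. The demo raises the error inside an `eval` only so that the script survives long enough to print the log; since the handler also fires for trapped exceptions, a production version would probably check `$^S` and skip those.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Temporary log file, assumed here purely for the sake of the demo.
my ( $log_fh, $log_file ) = tempfile( UNLINK => 1 );
close $log_fh;

# Append every fatal error to the log before die() proceeds as normal.
$SIG{__DIE__} = sub {
    my ($message) = @_;
    if ( open my $fh, '>>', $log_file ) {
        print {$fh} scalar(localtime) . " [$0] $message";
        close $fh;
    }
    # Returning lets die() carry on as usual with the original message.
};

# Trigger a fatal error; the eval keeps the demo script alive.
eval { die "could not open orders table\n" };

open my $in, '<', $log_file or die "Cannot read $log_file: $!";
my @lines = <$in>;
close $in;
print "logged: $_" for @lines;
```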

    I think if you can identify the problem global vars and isolate them one at a time, you can gradually amend. Of course, if the code is really bad, it might just be quicker to replan the whole system and rewrite everything from scratch.

    Yes, port to a test/transitional site - there's no reason you can't gradually port, just make sure all new scripts use only the new modules. From bitter experience, I've found it's better to create robust modules/subs and then start to use them, rather than trying to gradually port a bunch of scripts.

    HERE docs - depends on the context. If they are simple "My name is #name_here#", then strip if you can. Templates aren't always the solution though...
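
For the simple `#name_here#` case, a few lines of core Perl may be enough. This is only a sketch: `fill_template` is a hypothetical helper, not part of any templating module, and it does no HTML escaping.

```perl
use strict;
use warnings;

# Replace #key# markers with values from a hash ref. Unknown markers are
# left in place so that a missing value is easy to spot in the output.
sub fill_template {
    my ( $template, $values ) = @_;
    $template =~ s/#(\w+)#/exists $values->{$1} ? $values->{$1} : "#$1#"/ge;
    return $template;
}

my $template = "My name is #name_here# and I work at #company#.\n";
print fill_template( $template, { name_here => 'Ovid' } );
```

Anything beyond flat substitution (loops, conditionals, includes) is where a real template system starts to pay for itself.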

    Good luck - If I find anything useful on my quest over the next few months, I'll let you know.

    .02

    cLive ;-)

Re: Rewriting a large code base
by jbert (Priest) on Jun 28, 2001 at 12:22 UTC

    Whichever direction you go in (full re-write or cleanup), you might want to build some test cases first. You can go for both system tests (prod your application with HTTP requests and compare the output against saved runs, for example) and unit tests (call into some of your components directly from test scripts).
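
A sketch of the "compare against saved runs" idea using Test::More. `render_page` is a stand-in for whatever actually prods the application (an HTTP request, a CGI script run from the command line), and the temporary directory is an assumption that keeps the example self-contained; real reference files would more likely live under version control.

```perl
use strict;
use warnings;
use Test::More;
use File::Temp qw(tempdir);
use File::Spec;

# Stand-in for "prod the application": in a real suite this would fetch a
# page over HTTP or run a CGI script and capture its output.
sub render_page {
    my (%params) = @_;
    return "<h1>Order $params{order_id}</h1>\n";
}

# Compare output with a saved run. On the first run there is nothing to
# compare against, so the output is stored as the reference copy.
sub matches_saved_run {
    my ( $dir, $name, $output ) = @_;
    my $file = File::Spec->catfile( $dir, "$name.expected" );

    if ( !-e $file ) {
        open my $out, '>', $file or die "Cannot write $file: $!";
        print {$out} $output;
        close $out;
        return 1;
    }

    open my $in, '<', $file or die "Cannot read $file: $!";
    my $expected = do { local $/; <$in> };
    close $in;
    return $output eq $expected;
}

my $dir = tempdir( CLEANUP => 1 );

ok( matches_saved_run( $dir, 'order_42', render_page( order_id => 42 ) ),
    'first run records the reference output' );
ok( matches_saved_run( $dir, 'order_42', render_page( order_id => 42 ) ),
    'unchanged output matches the saved run' );
ok( !matches_saved_run( $dir, 'order_42', render_page( order_id => 43 ) ),
    'changed output is detected' );

done_testing();
```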

    Of course, it's only worth putting unit tests in for interfaces which you want to keep. This is a good thing, because it forces you to think through the interfaces you want.

    Once you have your test suite (I am assuming you don't want to significantly change the app behaviour), you can go ahead and clean up an item at a time, for example by hunting down a particular global var.

    Hmmm...probably worth getting all the code to run 'strict' and '-w' as a first step. If you are going to be removing and renaming variables the last thing you want to be doing is referring to the old variables in places and not knowing about it.
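
To illustrate the point about stale variable names, here is a small demonstration that compiles the same snippet with and without strict. The variable names are invented; `$totl` is a deliberate typo for `$total`.

```perl
use strict;
use warnings;

# The same snippet compiled twice: once without strict, once with it.
# $totl is a deliberate typo for $total.
my $snippet = 'my $total = 1; $totl = 2; return $total;';

# Without strict the typo silently creates a new package variable.
my $without = eval "no strict; no warnings; $snippet";
print "without strict: ran, returned $without\n";

# With strict the snippet does not even compile.
my $with  = eval "use strict; $snippet";
my $error = $@;
print "with strict: refused to compile:\n$error" unless defined $with;
```

Without strict, the assignment to the misspelled name succeeds and nothing ever reports it; with strict, the mistake surfaces at compile time.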

    Yes the above is shamelessly stolen from 'Refactoring' - but it is a good idea nonetheless.

    I'm not sure I'd have the discipline to do this in your situation, but hey - you asked for advice :-)

Re: Rewriting a large code base
by Malkavian (Friar) on Jun 28, 2001 at 13:49 UTC
    Ouch, Ovid..
    Sounds exactly like what I'm doing here (re-writing the core system of two old, messy and (to this company) rather vital systems).
    The code I have to deal with is a mish mash of shell (either csh, bash or ksh, with no real standard), perl and C. And it evolved over about 8 years of use, from the very start of the company, to a system processing gigs of data a day.
    If things are truly in the disarray you mention (like it was here), and nobody knows the 'big picture', or at least, cannot document it properly, then don't expect to build this just once.
    Because there were too many caveats to write down exactly what the code did from reading various sections of code, I wrote a prototype 'new' system for various areas. This gave me the familiarity with the way things operated, to the level that I could actually get the system working the way it should.
    Then, after having this prototype ready, and documented with notes, it was far easier to do a production level set.
    It's time consuming, and a tad annoying, but, once you know what you're really writing, it's easier to build the correct modules for what's actually required, and from there, more efficient to build the production system.
    This may not be very applicable to you where you are, but, it worked for me. :)

    Cheers,

    Malk.
Re: Rewriting a large code base
by Anonymous Monk on Jun 28, 2001 at 15:04 UTC
    1) Define the functions (business) of the programs 2) Code it your way, Start from there. :)
Re: Rewriting a large code base
by toadi (Chaplain) on Jun 28, 2001 at 15:16 UTC
    HAHA,

    Lucky git. You get to rewrite the application :) I just have to write new specs for an application like the one you describe.

    So I have to live with such an application and don't get permission to rewrite the *pain in the butt*....

    --
    My opinions may have changed,
    but not the fact that I am right

      Count me in on that Toadi. I'm a sub-sub-sub contractor on my current project and the code is nasty with globals and functions defined in required files that are used by each application, but there seems to be no rhyme or reason for this approach.

      There is also a ton of duplicated code because it's a website with an adminsite and a public site and no code is shared between them. Eww.

      Plus no strict, warnings, or taint. And I can't rewrite or fix it because I'm here to add new features ASAP because I'm "expensive".

      It will never get rewritten or refactored because it "works fine" for the end-client.

      Thankfully I'm not supposed to be here that long but I'm writing the worst Perl code of my life (and Perl is my first language!) in this job.

      Enough venting for today, back to the grind!
      Clayton aka "Tex"

        Actually the code I write is OK, because I can't stand to deliver bad code.

        What you do is the broken window stuff from Pragmatic Programmer. And I try to avoid the temptation :)



        --
        My opinions may have changed,
        but not the fact that I am right

Re: Rewriting a large code base
by one4k4 (Hermit) on Jun 28, 2001 at 16:45 UTC
    The first thing I would do, besides document and burn everything, would be to format the code. Go through the program line by line, tabbing here, spacing there, and follow its flow. This way, you can gauge what needs to be moved where, and what needs to be stripped away.

    Turn on strict/-w and see what breaks. Of course, do it in a test instance. Just my $.02. Sometimes I'm given projects like this as well, only it's usually code I wrote months ago before I had this huge learning spurt.

    You could always start over, with a fresh code base, and cut-n-paste things in as you need them. Rewritten as you see fit?

    _14k4 - perlmonks@poorheart.com (www.poorheart.com)
Re: Rewriting a large code base
by mattr (Curate) on Jun 28, 2001 at 17:06 UTC
    If I can add anything to the above, I would say that it will help if your initial effort is spent on laying the foundation to start out with a certain degree of relaxation and then to actively maintain sanity along the way. Then you will be able to cheerfully go about your tasks.

    I can tell you that if you just let this thing loom in a way that can pull a dozen gotchas on you along the way, you will not have fun.

    Perhaps you would start by making a list of subroutines and globals to start with and try to understand general program flow, what the main operational modes or usage scenarios are. This will help you think about architecture and perhaps simplify the interface.

    I would recommend spending time doing profiling - what sections are really ugly or have the most lines, and therefore will take the longest - and set yourself up for a string of successes every few days. You can practically schedule them.. and this data actually will help you understand how long it's going to take your team to finish the job by doing resource allocation - how much time is each person going to be able to spend on it. A skeleton program with placeholder subs - which you would get after fleshing out your new architecture from scratch - will then work from the beginning and then just work better and better as time goes on. Maybe the first iteration just shows the screens you would see so that your client can try it out. You may get some more requests for changes then, which is okay as long as you can manage the load (treat it like a real project with a deadline and manage risk) so it is not a never ending story.

    Another thing I could say is to document each routine with input and output parameters, and how it is expected to be used. I do this and try to include a "Usage: $ret = &myfunc($x)" line inside the sub's comments so it becomes like a little black box. Maybe you have a better way to do API documentation, but I like to keep it to 3-5 lines at the top of a function so I get a brushup whenever I go look at it. You can do this before the heavy coding. Maybe you'll also take the time to write some pseudocode in comments in those subs. This might allow more people to help you by working in parallel, and it also reduces the amount of work anybody has to do on a given task too. Keeping tasks small and well documented is important if you are going to be putting little bits of time into it over a long period and not just churn on it for a few days straight. If possible I'd really recommend getting 2-3 sequential days to hit a given subroutine and finish it so you can save the time it would take to get the project back into your memory.
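
One possible shape for such a short header, shown on a made-up subroutine (`days_between` and its fields are purely illustrative, not a standard layout):

```perl
use strict;
use warnings;

# days_between
#   Usage:   $days = days_between($start_epoch, $end_epoch);
#   Input:   two epoch timestamps in seconds
#   Output:  whole days from start to end (negative if end is earlier)
#   Effects: none; reads no globals
sub days_between {
    my ( $start, $end ) = @_;
    return int( ( $end - $start ) / 86_400 );
}

print days_between( 0, 3 * 86_400 ), "\n";
```

An explicit "Effects" line may be worth the extra row in a code base like this one, where hidden changes to globals are a large part of the problem.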

    Also I would recommend considering this to be a new application, and even if abbreviated, to carry out all the steps you would normally do if it was a brand new system, including writing a short "What this is supposed to do" report, then a spec, and then sample interface mockups (coding unnecessary), and then for each of these steps get your "client" to agree with you, or go through iterations of each until you get agreement, on each step. This way, you can be sure that your client is supportive and will be happy because he knows what he's getting. And you have a chance to reconsider everything.. are you sure you want it to look like *that*? and so on. Get all the bad vibes out before coding, then you get to spend your time on controlling feature creep and thinking up great code instead of always battling with issues that "aren't your fault".

    Maybe you do two iterations on top of that product development dialogue, in which the "client" is first a programmer on your team; talking it over with someone else helps you keep perspective too. I suppose this leads into what XP says, where you have two programmers to a terminal. I think I'd prefer two programmers in close proximity working on different sections, since they can then help each other, get pizza, read PerlMonks, etc.

    Have fun!

Re: Rewriting a large code base
by dragonchild (Archbishop) on Jun 28, 2001 at 18:05 UTC
    Break it up into smaller pieces. Modularize it as much as possible. This will include breaking out the global variables into their own namespace.

    At least in the beginning, this is a perfect use for Exporter. Do a find . -name RCS -prune -o -name '*.pl' -print | xargs grep GlobalVar1, then replace every instance of those with a function that does a get (or set) on the appropriate variable. One way to structure your global classes is as such:

        use strict;
        use 5.6.0;

        package Global::Var_Type_1;

        use Exporter;
        our @ISA       = qw(Exporter);
        our @EXPORT_OK = qw(get_var_1);

        my $var1 = 0;

        sub get_var_1 {
            return $var1;
        }

        1;

    This may sound stupid, but that means you now control every single one of these variables. And, more importantly, the biggest obstacle to getting strict turned on is removed. And, getting strict turned on is, in my opinion, the most important coding action you can do for a production system.
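
To show what "control" can mean in practice, here is a single-file sketch that adds a setter next to the getter. The package name, the variable, and the validation rule are all hypothetical, and the calls are fully qualified instead of imported so that the example fits in one file.

```perl
use strict;
use warnings;

{
    package Global::Session;    # hypothetical replacement for a global $timeout

    my $timeout = 30;           # lexical: unreachable except via the subs

    sub get_timeout { return $timeout }

    # Every write now passes through one place, so it can be validated
    # (or logged while hunting down who changes the value).
    sub set_timeout {
        my ($seconds) = @_;
        die "timeout must be a positive integer\n"
            unless defined $seconds && $seconds =~ /^[1-9]\d*$/;
        $timeout = $seconds;
        return $timeout;
    }
}

# Old code:  $timeout = 60;  print $timeout;
Global::Session::set_timeout(60);
print Global::Session::get_timeout(), "\n";

# A bad write is rejected instead of silently corrupting shared state.
my $ok = eval { Global::Session::set_timeout('soon'); 1 };
print "rejected: $@" unless $ok;
```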

    After that, everything everyone up above has said is perfect advice. It's all stuff I did when I was in your shoes a year ago. :)

Re: Rewriting a large code base
by jplindstrom (Monsignor) on Jun 28, 2001 at 18:22 UTC
    Lots of good advice here. I think I would approach your problem something like this:

    • You say the system contains a lot of scripts/programs. Document the behaviour and create test cases at this highest level. Feed it data in various ways to get an idea of what the programs do. It will probably help you getting a good system understanding.
    • Make sure it runs using strict and warnings. It will probably give you quite a few errors and bugs to fix. Fix them, hopefully without breaking anything. If you do, the tests will (maybe) tell you, but don't count on it, because you still don't "know" the system and have probably not created a complete test set. But when you're done with this, you will have a lot better code quality.
    • You're gonna change a _lot_ of code, so version control is probably very useful.
    • Get rid of the HERE docs to make it easier to read and refactor code. Boring but uncomplicated.
    • Get rid of the globals. Labour intensive.
    • _Now_ you're ready to start refactoring for real :)

    /J

Re: Rewriting a large code base
by frag (Hermit) on Jun 28, 2001 at 20:32 UTC
    In a previous job I was on a team involved in doing just this, only the original code was crufty but not really awful like yours. To improve things, a programmer on the team drew up a plan for a complete object-oriented redesign. This involved these steps:
    1. Breaking up the system into objects/classes
    2. Cataloging all of the scripts that would have to be re-written using the new design
    3. Formulating a plan/project schedule for completion of all of these modules and scripts, in a logical order based on what modules use what other modules
    4. Writing the API for each class, i.e. some basic POD for all the API calls, just listing what arguments are required/expected and what would be returned
    5. Writing tests for each API call
    6. Beginning to fill in the empty API methods with real code, and running tests as you go
    7. Constantly revisiting the OO API, tweaking it when it was realized that we had forgotten something or that something was unnecessary or belonged to a different (or even new) class, and changing the tests to match the API changes
    8. When a module/script passed all tests, fully documenting the POD
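
Steps 4 to 6 might look something like the following in miniature: basic POD stating what each call takes and returns, then one test per documented promise. `Shop::Cart` and its methods are invented for illustration and have nothing to do with the project described above.

```perl
use strict;
use warnings;
use Test::More;

{
    package Shop::Cart;    # hypothetical class, for illustration only

=head2 new

    my $cart = Shop::Cart->new;

Takes no arguments. Returns an empty cart object.

=head2 add_item

    my $count = $cart->add_item($sku, $quantity);

Requires a SKU string and a positive quantity. Returns the total number
of items now in the cart.

=cut

    sub new {
        my ($class) = @_;
        return bless { items => {} }, $class;
    }

    sub add_item {
        my ( $self, $sku, $quantity ) = @_;
        die "add_item needs a SKU and a positive quantity\n"
            unless defined $sku && defined $quantity && $quantity > 0;
        $self->{items}{$sku} += $quantity;
        my $total = 0;
        $total += $_ for values %{ $self->{items} };
        return $total;
    }
}

# One test per documented promise.
my $cart = Shop::Cart->new;
isa_ok( $cart, 'Shop::Cart' );
is( $cart->add_item( 'A1', 2 ), 2, 'add_item returns the running total' );
is( $cart->add_item( 'B7', 1 ), 3, 'totals accumulate across SKUs' );
ok( !eval { $cart->add_item('A1'); 1 }, 'missing quantity is rejected' );

done_testing();
```

Writing the POD and the tests before the method bodies means the tests start out failing against empty stubs and turn green as the real code is filled in.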

    This is, of course, very oversimplified, especially the order. The steps weren't done as discretely as this list sounds; the project schedule wasn't really done until after the basic API POD was written, and kept being adjusted; etc. We used CVS. The API POD was passed through a pod2html variant, so there was an easy reference to it all with a browser.

    -- Frag.

Re: Rewriting a large code base
by Ovid (Cardinal) on Jul 18, 2004 at 03:00 UTC

    And my quick follow-up three years later: with a task like this today, I would just shrug. Tests, refactoring, slow-n-steady conversion. I'm astonished at how trivial a task this now seems. Interesting what three years of perspective will do.

    Cheers,
    Ovid

    New address of my CGI Course.
