Ovid has asked for the wisdom of the Perl Monks concerning the following question:
I have been tasked with overseeing the rewrite of a very large Web-based application. This application was written before I started working here and I’m not terribly familiar with it. I’ve reviewed some of the code and it is terrible. Some issues:
- Extensive use of globals shared across multiple programs.
- Often fails to check status of system calls.
- strict is not consistently used.
- No documentation.
- Extensive HERE documents.
- An in-house routine to parse form data was used and later converted to CGI.pm, but it retained the old interface, which often discarded data.
- Many duplicate functions, but often different interfaces to them.
- Many code features have side effects (such as changing those #$@%! global vars!).
This is a far more serious task than I have ever undertaken before and I could use all the advice I can get. I have been informed of this just a few minutes ago and here are my rough thoughts on approaching this:
- Create an inventory of all components necessary for this to run.
- Document the inputs, outputs, and purpose of all programs.
- Identify a set of standards for the rewrite. In particular, ensure that functions have standard interfaces (I'm sick of some returning a HoH and others returning a reference to an HoH).
- Identify what logic should be handled by the database and what should be in the Perl code.
- Strip out all HERE docs and start using Template Toolkit.
That's where I get stuck. Should I start working on the modules first? I think that's the best approach. However, do I rewrite them so that they set the globals and return the values, so that old code doesn't break? Simply stripping out globals will break every program. Later, when the conversion's done, do we strip out the 'globals' code? I don't like that, as I'm leery that the "stripping out" won't be done.
Start porting this application to a test site and build it a piece at a time? That's my preference, but it does mean that we won't have the more robust features available during development.
This is -- for me -- a very large rewrite. It's only about 30 or so programs, as far as I can tell, but many of them are thousands of lines long (though that's due in part to the HERE docs). Am I looking at this the wrong way? Is there a better way I can organize and direct things to get this done? Further, we're not being paid for the rewrite, so this will be done part-time in addition to our other work.
Cheers,
Ovid
Vote for paco!
Re: Rewriting a large code base
by BMaximus (Chaplain) on Jun 28, 2001 at 04:33 UTC
This reminds me of my last job. The original code was a huge big lump of code. Most of it was in one module and it was a MESS. So my coworkers and I were given the task to turn the mess into what was to be known as 2nd gen. It too had extensive HERE docs. So here's what we did.
We had made a decision to extract all HERE docs and template the entire site. It was our belief that HTML does not belong in Perl code. It only serves to bloat it. We then documented where everything went as far as the layout of the application: inputs, outputs, database access, etc. It was a complete rewrite while basically salvaging code where we could.
We split the site up into different elements.
- Application
- Business Logic
- Database interface
- Utilities
- Initialization
- Miscellaneous
Each of these was organized into a separate directory, and CVS was used to coordinate each of our tasks and contributions.
Application: where the main applications were put. It took care of presenting the information to the user.
Business Logic: took care of any calculations that were needed. Subroutines for things that were commonly accessed lived here as well, to keep things from becoming redundant.
Database Interface: self explanatory. It held all the routines that put and retrieved data from the database.
Utilities: held modules with subroutines for calculating dates or implementing encryption.
Initialization: held startup.pl for mod_perl
Miscellaneous: Things we needed but couldn't really put into a category.
Doing this and of course commenting and commenting some more made it very easy to maintain and change.
We created standards as to what subroutines would output. For instance any subroutine that needed to output a hash or an array would always give a ref of that type.
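A minimal sketch of that convention, assuming a made-up in-memory data set. The sub names (get_user, list_users) and the %USERS hash are hypothetical and only there for illustration; the point is that every sub hands back a reference, so callers never have to guess which kind of value they got.

```perl
use strict;
use warnings;

# Hypothetical data store, for illustration only.
my %USERS = (
    1 => { name => 'alice', role => 'admin' },
    2 => { name => 'bob',   role => 'user'  },
);

# Returns a hash reference, or nothing if the id is unknown.
sub get_user {
    my ($id) = @_;
    return unless exists $USERS{$id};
    return { %{ $USERS{$id} } };    # shallow copy, so callers cannot alter the store
}

# Always returns an array reference, even when no user matches.
sub list_users {
    my ($role) = @_;
    my @names = map  { $_->{name} }
                grep { $_->{role} eq $role } values %USERS;
    return [ sort @names ];
}

my $user  = get_user(1);
my $names = list_users('user');
print "$user->{name}: @$names\n";
```

Because the return type never varies, calling code can always dereference with -> or @$ without first checking what it was given.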
For the most part the theme was break it down and simplify, which seemed to work really well for us. So breaking things up into separate elements may work for you, and it seems like you're headed in that direction.
BMaximus
Update: I hope that if you're putting in "overtime" and staying late to do this for your company, once it's finished they do something to show their appreciation for you and your coworkers. One of the best things about where I worked was that they did show us how much we meant to the company.
Re: Rewriting a large code base
by lemming (Priest) on Jun 28, 2001 at 04:08 UTC
Have you thought of writing a spec on what this set
of programs is supposed to do? It may be faster to
start over than to figure out what was done before and
then write in incrementals. Plus you may go insane if
you start contemplating the why.
Most of what you're talking about is really doing
a spec of the old program and then improving it. With
a new perspective, you may have a better idea of how
long it will take to write it.
Hear, hear! That's a very important part of it. But this is what I would do:
1. Understand the purpose of the entire system
2. Understand the function of each component
3. Understand the interfaces between components
4. Choose what interfaces you want to use. Make them consistent. Also look at components that can be merged. Be sure to plan what the whole thing will look like when you're done, rather than haphazardly hacking away at it.
5. Change the necessary components to implement these interfaces one at a time, hopefully without breaking anything in between.
One key that's very easy to forget is to know the architecture of the system very well. Find the original author if you can, or read design documents if there are any. Just know what you're changing before you start accidentally breaking things that you don't understand. (I speak from experience)
See you, space cowboy
Re: Rewriting a large code base
by cLive ;-) (Prior) on Jun 28, 2001 at 07:53 UTC
Funny,
I'm just gonna do the same thing - well sort of. I'm starting to rewrite c. 12,000 lines of code that I originally started coding 3 years ago.
And I have similar issues, except I wrote the original and should have some idea what most of it did (thank god I learnt to comment before I learnt to code!)
From experience, I know that stripping out as many global vars as possible will make life easier when work needs doing later.
I'm also beginning to do the following coz my memory's so bad:
- comment fully each sub - list what input and output are expected.
- place all global vars in their own namespace (package 'global'), and document each one in the package
- write install script for software that checks module dependencies *before* installation (yes, I know...) - software is used on various servers, few of which we control.
- avoid export - over a large codebase, I find namespace clashes can confuse more than help save time (I'm now using OO instead)
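A rough sketch of the dependency check an install script might run first. The module list here (CGI, DBI, Template) is an assumption for illustration, not a statement of what any particular application needs, and missing_modules is a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical list of modules the application needs.
my @required = qw(CGI DBI Template);

# Return (as an array ref) the names of modules that cannot be loaded here.
sub missing_modules {
    my @modules = @_;
    my @missing;
    for my $module (@modules) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
        push @missing, $module unless eval { require $file; 1 };
    }
    return \@missing;
}

my $missing = missing_modules(@required);
if (@$missing) {
    print "Missing modules: @$missing\n";
}
else {
    print "All required modules are installed.\n";
}
```

Running this before copying any files means a server you don't control fails early, with a list of what to ask its admin for.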
I'm also finding that amending __DIE__ to log errors in a file is particularly useful (in the case of CGI scripts, anyway), especially when "Carp fatalsToBrowser" is ambiguous.
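One way the __DIE__ idea could look, as a sketch only. The log file name, the APP_ERROR_LOG environment variable and the log_error helper are all assumptions made for this example; a real CGI script would need a path the web server user can write to.

```perl
use strict;
use warnings;

# Assumed log location (hypothetical); override with the APP_ERROR_LOG variable.
my $ERROR_LOG = $ENV{APP_ERROR_LOG} || 'app_error.log';

# Append one timestamped line to the log file.
sub log_error {
    my ($message) = @_;
    open my $fh, '>>', $ERROR_LOG or return;    # never die inside the logger
    print {$fh} scalar(localtime), " [$0] $message";
    close $fh;
    return 1;
}

# Log fatal errors; die() then carries on as usual after the handler returns.
$SIG{__DIE__} = sub {
    my ($message) = @_;
    return if !defined $^S || $^S;    # compiling, or inside an eval that will handle it
    log_error($message);
};
```

The $^S check keeps errors that are trapped by eval out of the log, so the file only records the ones that actually killed the script.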
I think if you can identify the problem global vars and isolate them one at a time, you can gradually amend. Of course, if the code is really bad, it might just be quicker to replan the whole system and rewrite everything from scratch.
Yes, port to a test/transitional site - there's no reason you can't gradually port, just make sure all new scripts use only the new modules. From bitter experience, I've found it's better to create robust modules/subs and then start to use them, rather than trying to gradually port a bunch of scripts.
HERE docs - depends on the context. If they are simple "My name is #name_here#", then strip if you can. Templates aren't always the solution though...
Good luck - If I find anything useful on my quest over the next few months, I'll let you know.
.02
cLive ;-)
Re: Rewriting a large code base
by Henri Icarus (Beadle) on Jun 28, 2001 at 05:55 UTC
Ovid,
Nice question for us. This is fun to chew on. I'm in a similar situation except that the big application that needs re-writing is my own! The other folks have given most of the important advice that needs to be given, I'd just add two things:
1) It's a mistake to think you can be rigorous and implement a system top down, or bottom up. My experience is that you've got to go both directions at the same time to get good code.
2) Don't try to migrate the current code into the new solution. No matter how good you are, you're bound to break it at least once as you go along, and I'll bet you don't end up with anything nearly as nice when you're done. Also, if you develop on a test system, you'll find that some of the "new" features that you add into the new system are completely independent of the current code base, so you can "paste" them in to the live site and get that functionality anyway.
-I went outside... and then I came back in!!!!
Re: Rewriting a large code base
by Zapawork (Beadle) on Jun 28, 2001 at 06:32 UTC
Hey Ovid,
I am also re-writing someone else's code. In my case it's an open source project that has been rewritten and evolved in 3 generations of code, without ever using strict or documenting. Very very nasty now.
I would tell you that in my opinion
1) use strict. This may seem like a lot of code rewriting at first, but it's worth it. You will then know all the actual variables defined within the main loop, while also identifying which values are imported from modules. If this has been done already, hoorah for good intentions on their behalf.
2) Begin to trace the actual execution of your program. What sub gets called first, what does it call, what modules is it dependent on. This map is essential for a complete understanding of what the old program does and how modifications will affect it, as well as the best place to put your modifications.
3) Once you have this you can begin to create a standard way to pass data between the modules and work from the main routine outwards to the subroutines as they are called. This way when you break it, and you will, you will know at what logical point in the execution you are in.
You should definitely do this in a test environment so that current users will not bitch too much. In addition to this, you should take care of the documentation you described earlier. My advice only relates to a rewrite of the code, not starting from scratch, so I hope it gave you some value. It's working for me so far.
Dave
Re: Rewriting a large code base
by jbert (Priest) on Jun 28, 2001 at 12:22 UTC
Whichever direction you go in (full re-write or cleanup), you might want to build some test cases first. You can go for both system tests (prod your application with HTTP requests and compare the output against saved runs, for example) and unit tests (call into some of your components directly from test scripts).
Of course, it's only worth putting unit tests in for interfaces which you want to keep. This is a good thing, because it forces you to think through the interfaces you want.
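Both kinds of test can be sketched with the core Test::More module. Everything application-specific below is made up for illustration: format_price stands in for a real component, and matches_saved_run is a hypothetical helper that records output on the first run and compares against it afterwards. A real system test would fetch pages over HTTP rather than call a sub directly.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for one of the application's components.
sub format_price {
    my ($cents) = @_;
    return sprintf '$%d.%02d', int($cents / 100), $cents % 100;
}

# Compare output with a saved run; record it if no saved run exists yet.
sub matches_saved_run {
    my ($name, $output) = @_;
    my $file = "saved_$name.txt";
    if (!-e $file) {
        open my $out, '>', $file or die "Cannot write $file: $!";
        print {$out} $output;
        close $out;
        return 1;
    }
    open my $in, '<', $file or die "Cannot read $file: $!";
    my $saved = do { local $/; <$in> };
    close $in;
    return $saved eq $output;
}

# Unit tests: call the component directly.
is(format_price(1999), '$19.99', 'formats dollars and cents');
is(format_price(5),    '$0.05',  'pads cents to two digits');

# System-style test: the whole output must match what an earlier run produced.
my $page = join '', map { format_price($_) . "\n" } 5, 1999, 120000;
ok(matches_saved_run('price_list', $page), 'price list matches the saved run');

done_testing();
```

The saved-run files need to be recorded from the old code before any cleanup starts, otherwise they only prove the new code agrees with itself.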
Once you have your test suite (I am assuming you don't want to significantly change the app behaviour), you can go ahead and clean up an item at a time - for example, hunt down a particular global var.
Hmmm...probably worth getting all the code to run 'strict' and '-w' as a first step. If you are going to be removing and renaming variables the last thing you want to be doing is referring to the old variables in places and not knowing about it.
Yes the above is shamelessly stolen from 'Refactoring' - but it is a good idea nonetheless.
I'm not sure I'd have the discipline to do this in your situation, but hey - you asked for advice :-)
Re: Rewriting a large code base
by mattr (Curate) on Jun 28, 2001 at 17:06 UTC
If I can add anything to the above, I would say
that it will help if your initial effort is spent
on laying the foundation to start out with a certain
degree of relaxation and then to actively maintain
sanity along the way. Then you will be able to
cheerfully go about your tasks.
I can tell you that if you just let this thing loom
in a way that can pull a dozen gotchas on you along the
way, you will not have fun.
Perhaps you would start by making a list of subroutines
and globals to start with and try to understand general
program flow, what the main operational modes or usage
scenarios are. This will help you think about architecture
and perhaps simplify the interface.
I would recommend spending time doing profiling -
what sections are really ugly or have the most lines,
and therefore will take the longest - and set yourself
up for a string of successes every few days.
You can practically
schedule them.. and this data actually will help
you understand how long it's going to take your team
to finish the job by doing resource allocation - how
much time each person is going to be able to spend on it.
A skeleton program
with placeholder subs - which you would get after fleshing
out your new architecture from scratch - will then work
from the beginning and then just work better and better
as time goes on. Maybe the first iteration just shows
the screens you would see so that your client can try it
out. You may get some more requests for changes then,
which is okay as long as you can manage the load (treat it
like a real project with a deadline and manage risk) so
it is not a never ending story.
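A small sketch of what such a skeleton might look like. The screen names and the placeholder text are invented for this example; the idea is only that the program runs end to end from day one, and each placeholder gets swapped for real code later.

```perl
use strict;
use warnings;

# Hypothetical screens of the application; each handler is a placeholder for now.
my %screen = (
    login  => \&show_login,
    search => \&show_search,
    report => \&show_report,
);

sub show_login  { return placeholder('login') }
sub show_search { return placeholder('search') }
sub show_report { return placeholder('report') }

# Stand-in output until the real handler is written.
sub placeholder {
    my ($name) = @_;
    return "[$name screen: not implemented yet]\n";
}

# Dispatch a request to its screen, falling back to the login screen.
sub dispatch {
    my ($name) = @_;
    my $handler = $screen{ $name || '' } || $screen{login};
    return $handler->();
}

print dispatch($_) for qw(login search report);
```

Replacing one show_* body at a time gives the string of small, schedulable successes described above, and the client can click through the flow long before it is finished.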
Another thing I could say is to document each routine
with input and output parameters, and how it is expected to
be used. I do this and try to include a "Usage: $ret = &myfunc($x)"
line inside the sub's comments so it becomes like a little
black box. Maybe you have a better way to do API
documentation, but I like to keep it to 3-5 lines at the
top of a function so I get a brushup whenever I go look at it.
You can do this before the heavy coding. Maybe you'll
also take the time to write some pseudocode in comments
in those subs. This
might allow more people to help you by working in parallel,
and it also reduces the amount of work anybody has to do on
a given task too. Keeping tasks small and well documented is
important if you are going to be putting little bits of time
into it over time and not just churn on it for a few days straight.
If possible I'd really recommend getting 2-3 sequential days to
hit a given subroutine and finish it so you can save the time
it would take to get the project back into your memory.
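Here is one way the short "Usage:" header and the pseudocode comments could look on a small routine. The sub (days_between) and its date format are made up for illustration; Time::Local is a core module.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# days_between - whole days from one date to another
# Usage:  $days = days_between($from, $to);
# Input:  two dates as 'YYYY-MM-DD' strings
# Output: integer number of days (negative if $to is earlier than $from)
sub days_between {
    my ($from, $to) = @_;

    # 1. turn each date into epoch seconds at midnight UTC
    my @epoch = map {
        my ($year, $month, $day) = split /-/;
        timegm(0, 0, 0, $day, $month - 1, $year);
    } $from, $to;

    # 2. subtract and convert seconds to days
    return int( ($epoch[1] - $epoch[0]) / 86_400 );
}

print days_between('2001-06-28', '2001-07-18'), "\n";
```

The header and the numbered comments can be written first and handed to someone else, who then only has to fill in the body.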
Also I would recommend considering this to be a new application,
and even if abbreviated, to carry out all the steps you would
normally do if it was a brand new system, including
writing a short "What this is supposed to do" report, then
a spec, and then sample interface mockups (coding unnecessary),
and then for each of these steps get your "client" to
agree with you, or go through iterations of each until
you get agreement, on each step. This way, you can be
sure that your client is supportive and will be happy because
he knows what he's getting. And you have a chance to
reconsider everything.. are you sure you want it to look like
*that*? and so on. Get all the bad vibes out before coding,
then you get to spend your time on controlling feature creep
and thinking up great code instead of always battling with
issues that "aren't your fault".
Maybe you do two iterations
on top of that product development dialogue, in which the "client" is first a programmer
on your team; talking it over with someone else helps you
keep perspective too. I suppose this leads into what XP
says, where you have two programmers to a terminal. I think
I'd prefer two programmers in close proximity working on
different sections, since they can then help each other,
get pizza, read PerlMonks, etc.
Have fun!
Re: Rewriting a large code base
by Malkavian (Friar) on Jun 28, 2001 at 13:49 UTC
Ouch, Ovid..
Sounds exactly like what I'm doing here (re-writing the core system of two old, messy and (to this company) rather vital systems).
The code I have to deal with is a mish mash of shell (either csh, bash or ksh, with no real standard), perl and C. And it evolved over about 8 years of use, from the very start of the company, to a system processing gigs of data a day.
If things are truly in the disarray you mention (like it was here), and nobody knows the 'big picture', or at least, cannot document it properly, then don't expect to build this just once.
Because there were too many caveats to write down exactly what the code did from reading various sections of code, I wrote a prototype 'new' system for various areas. This gave me the familiarity with the way things operated, to the level that I could actually get the system working the way it should.
Then, after having this prototype ready, and documented with notes, it was far easier to do a production level set.
It's time consuming, and a tad annoying, but, once you know what you're really writing, it's easier to build the correct modules for what's actually required, and from there, more efficient to build the production system.
This may not be very applicable to you where you are, but, it worked for me. :)
Cheers,
Malk.
Re: Rewriting a large code base
by frag (Hermit) on Jun 28, 2001 at 20:32 UTC
In a previous job I was on a team involved in doing just this, only the original code was crufty but not really awful like yours. To improve things, a programmer on the team drew up a plan for a complete object-oriented redesign. This involved these steps:
- Breaking up the system into objects/classes
- Cataloging all of the scripts that would have to be re-written using the new design
- Formulating a plan/project schedule for completion of all of these modules and scripts, in a logical order based on what modules use what other modules
- Writing the API for each class, i.e. some basic POD for all the API calls, just listing what arguments are required/expected and what would be returned
- Writing tests for each API call
- Beginning to fill in the empty API methods with real code, and running tests as you go
- Constantly revisiting the OO API, tweaking it when it was realized that we forgot something or that something was unnecessary or belonged to a different (or even new) class, and changing the tests to match the API changes
- When a module/script passed all tests, fully documenting the POD
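The "basic POD first, tests second, code third" steps in the list above can be sketched in one file. The class name (Inventory::Item) and its two methods are hypothetical, chosen only to show the shape; in a real project the class and its tests would live in separate files.

```perl
use strict;
use warnings;

package Inventory::Item;    # hypothetical class name

=head1 NAME

Inventory::Item - one stocked item (illustrative stub)

=head1 METHODS

=head2 new( name => $name, quantity => $count )

Returns a new item object. C<quantity> defaults to 0.

=head2 quantity()

Returns the current quantity as an integer.

=cut

sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name}, quantity => $args{quantity} || 0 };
    return bless $self, $class;
}

sub quantity { return $_[0]{quantity} }

package main;
use Test::More;

# One test per documented API call.
my $item = Inventory::Item->new(name => 'widget', quantity => 3);
isa_ok($item, 'Inventory::Item');
is($item->quantity, 3, 'quantity() returns what new() was given');
is(Inventory::Item->new(name => 'gadget')->quantity, 0, 'quantity defaults to 0');

done_testing();
```

Since the POD states the arguments and return values up front, the tests can be written against it before the method bodies exist.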
This is, of course, very oversimplified, especially the order. The steps weren't done as discretely as this list sounds; the project schedule wasn't really done until after the basic API POD was written, and kept being adjusted; etc. We used CVS. The API POD was passed through a pod2html variant, so there was an easy reference to it all with a browser.
-- Frag.
Re: Rewriting a large code base
by dragonchild (Archbishop) on Jun 28, 2001 at 18:05 UTC
Break it up into smaller pieces. Modularize it as much as possible. This will include breaking out the global variables into their own namespace.
At least in the beginning, this is a perfect use for Exporter. Do a find . -name RCS -prune -o -name '*.pl' -print | xargs grep GlobalVar1, then replace every instance of those with a function that does a get (or set) on the appropriate variable. One way to structure your global classes is as such:
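package Global::Var_Type_1;
use 5.006;    # 'our' needs Perl 5.6 or later
use strict;
use warnings;
use Exporter;
our @ISA       = qw(Exporter);
our @EXPORT_OK = qw(get_var_1 set_var_1);
# The variable is private to this file; only the accessors below can reach it.
my $var1 = 0;
sub get_var_1 { return $var1; }
sub set_var_1 { my ($value) = @_; $var1 = $value; return $var1; }
1;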
This may sound stupid, but that means you now control every single one of these variables. And, more importantly, the biggest obstacle to getting strict turned on is removed. And, getting strict turned on is, in my opinion, the most important coding action you can do for a production system.
After that, everything everyone up above has said is perfect advice. It's all stuff I did when I was in your shoes a year ago. :)
Re: Rewriting a large code base
by jplindstrom (Monsignor) on Jun 28, 2001 at 18:22 UTC
Lots of good advice here. I think I would approach your problem something like this:
- You say the system contains a lot of scripts/programs. Document the behaviour and create test cases at this highest level. Feed it data in various ways to get an idea of what the programs do. It will probably help you get a good understanding of the system.
- Make sure it runs using strict and warnings. It will probably give you quite a few errors and bugs to fix. Fix them, hopefully without breaking anything. If you do, the tests will (maybe) tell you, but don't count on it, because you still don't "know" the system and have probably not created a complete test set. But when you're done with this, you will have a lot better code quality.
- You're gonna change a _lot_ of code, so version control is probably very useful.
- Get rid of the HERE docs to make it easier to read and refactor code. Boring but uncomplicated.
- Get rid of the globals. Labour intensive.
- _Now_ you're ready to start refactoring for real :)
/J
Re: Rewriting a large code base
by one4k4 (Hermit) on Jun 28, 2001 at 16:45 UTC
The first thing I would do, besides document and burn everything, would be to format the code. Go through the program line by line, tabbing here, spacing there, and follow its flow. This way, you can gauge what needs to be moved where, and what needs to be stripped away.
Turn on strict/-w and see what breaks. Of course, do it in a test instance. Just my $.02. Sometimes I'm given projects like this as well, only it's usually code I wrote months ago before I had this huge learning spurt.
You could always start over, with a fresh code base, and cut-n-paste things in as you need them. Rewritten as you see fit?
_14k4 - perlmonks@poorheart.com (www.poorheart.com)
Re: Rewriting a large code base
by toadi (Chaplain) on Jun 28, 2001 at 15:16 UTC
HAHA,
Lucky git. You can rewrite the application :) I just have to write new specs for an application like the one you describe.
So I have to live with such an application and don't get permission to rewrite the *pain in the butt*....
-- My opinions may have changed,
but not the fact that I am right
Count me in on that Toadi. I'm a sub-sub-sub contractor on
my current project and the code is nasty with
globals and functions defined in required files that are
used by each application, but there seems to be no rhyme or
reason for this approach.
There is also a ton of duplicated code because it's a
website with an adminsite and a public site and no code is shared
between them. Eww.
Plus no strict, warnings, or taint. And I can't rewrite or
fix it because I'm here to add new features ASAP because
I'm "expensive".
It will never get rewritten or refactored because it
"works fine" for the end-client.
Thankfully I'm not supposed to be here that long but I'm
writing the worst Perl code of my life
(and Perl is my first language!) in this job.
Enough venting for today, back to the grind!
Clayton aka "Tex"
Actually, the code that I write is OK, because I can't stand to deliver bad code.
What you do is the broken window stuff from Pragmatic Programmer. And I try to avoid the temptation :)
-- My opinions may have changed,
but not the fact that I am right
Re: Rewriting a large code base
by Ovid (Cardinal) on Jul 18, 2004 at 03:00 UTC
And my quick follow-up three years later: with a task like this today, I would just shrug. Tests, refactoring, slow-n-steady conversion. I'm astonished at how trivial a task this now seems. Interesting what three years of perspective will do.
Re: Rewriting a large code base
by Anonymous Monk on Jun 28, 2001 at 15:04 UTC
1) Define the functions (business) of the programs
2) Code it your way,
Start from there. :)