Enlightened Monks,

First off, I apologize for the OT. This question is more on the topic of programming in general than Perl.

We've all seen difficult to read code...You know the kind, undescriptive variable names, e.g. $x,$x1,...$xn (where the contents bear no relation to each other), routines that painstakingly produde elaborate results only to have those results later undef'ed without ever being used, comments that are absent or unintelligible, etc.

The list is long, and there are many here who are better qualified to explain obfuscation than I...

However, I am not referring to code in our excellent Obfuscation section. What I am referring to is actual inherited production code. (and I'm not exaggerating)

Dealing with such code can be difficult at best, and after a some time repeatedly throwing up my arms in disgust with a deadline looming, I decided to meditate on ways to approach such code and ways in which one can keep one's sanity in doing so (before quitting my job and becoming a sheepherder or gasp! a PHP web developer). The fact that there are many programmers with many more years of experience than I have, who are still at it, suggests that there must be ways to get the job done without being forcefully relocated to a padded room (although I'm sure that they are comfortable, I rather like living in my apartment).

Here are two resolutions that I have come up with to deal with the situation; I thankfully welcome any additions or suggestions...

I will keep my cool

When reviewing, maintaining, and adding features to code matching my description above, I often find myself frustrated and angry. Starting now, I resolve that when reviewing code, I will no longer ask "WHY!?" when I don't understand the purpose of a chunk of code, but will instead ask "why?". Since this is working production code, it does work. I will make sure that I understand the code, it's inputs and outputs, and it's place in the larger scheme. Also, it's only work, and it's not worth giving myself an ulcer at an early age.

I will resist the urge to unnecessarily rewrite

Even if every important variable is named based on some variation of "temp", the important thing is to produce working code in the timeframe given by my employer. Plus when that variable $a72 that looks like an intermediate variable is used in a routine 100 lines further down the code, resisting the urge to rewrite the code in a more understandable and sane (from my viewpoint) way will reduce the work that I have to do down the line. With that said, I will not hesitate to rewrite code where thoughtful analysis shows that I can do so without a cascading negative effect, and I will take every reasonable opportunity to identify places where I can add code that will assist in debugging.

Whew! (Only two resolutions, but what can I say?) With that said, I would like to say that this is a job, and I realize that, as the saying goes, "If you can't stand the heat...". I welcome any suggestions that I (or another rookie in the same situation) can follow to simplify the task at hand or any resolutions that any of you have come up with when in a similar situation. If there are any good books/articles/etc out there that will help, any references to such would be greatly appreciated.


update fixed minor typo.

Replies are listed 'Best First'.
Re: Difficult code (Resolutions)
by dws (Chancellor) on Aug 21, 2003 at 03:38 UTC
    Also, it's only work, and it's not worth giving myself an ulcer at an early age.

    There'll be plenty of time for that later.

    Even if every important variable is named based on some variation of "temp", the important thing is to produce working code in the timeframe given by my employer.

    There is a middle ground: Clean one room at a time. As you encounter a section of code that you need to extend or do maintenance on, take a few minutes to look it over first. If there are bogus variable names, change them before you change anything else. Then make your changes. By leading with a semantic-preserving refactoring, you eliminate (or greatly reduce) the risk that you'll break things accidentally.

    As much as it pains us to pick up a really foul piece of code without stopping to rewrite it, there often isn't a business case for giving it a thorough scrub. But that doesn't mean we can't do any cleanup. Take it one room at a time.

      Clean one room at a time. As you encounter a section of code that you need to extend or do maintenance on, take a few minutes to look it over first. If there are bogus variable names, change them before you change anything else. Then make your changes. By leading with a semantic-preserving refactoring, you eliminate (or greatly reduce) the risk that you'll break things accidentally.

      Excellent advice. Also make sure that there are tests in place before you change the code so you can be sure that you're not breaking anything.

      My approach when approaching a large nasty lump of Perl code is to do the following:

      1. Write acceptance tests if they don't already exists.
      2. Run it through perltidy so I at least can trust the indentation to not be lying.
      3. Make it compile strict, without warnings and (if appropriate) tainted if it doesn't already.
      4. Review the code making notes of potential bugs, areas of unclear functionality and possible security issues problems.

      This, for me, is the bare minimum needed to get business code into a vaguely safe state. Without acceptance tests we don't know what the code should do, and whether it is actually doing it. Without a common code layout style nobody can read the code. Without strict and warnings we can miss many errors.

      (Yes, I do know that there can be valid reasons for not using strict/warnings. However the people who are bright enough to do this are not the same group of people who write terrible unmaintainable code.)

      In the code review I'm not looking for ugly code - but code that is potentially broken (either functionally or from a security point of view) or that I don't understand. There is an obvious business case for fixing broken code. It should also be obvious that there is a business case for fixing code that is too opaque to be easily understood - because there is a high risk in running things you don't understand.

      Once this has been done I prioritise the requests for new functionality and the changes from the code review with the client and then, as dws says, "Clean one room at a time".

      • If we're adding new functionality write some acceptance tests for it.
      • Find the bit of code that needs updating.
      • Write some unit tests for the existing code if they don't already exist. If we're fixing a bug or a security issue make sure that we have a failing test that illustrates the problem.
      • Test. Code. Refactor. Repeat until done.

      There is a huge temptation when presented with a big pile of... unsavory... code to either throw it all away and start from scratch (almost always a really bad idea) or spend months tidying it up (almost always a waste of time that could be better spent elsewhere). Much better to attack it piecemeal.

      Thank you dws.

      I've read many of your posts, and from what I have read, regard your advice highly (especially in regards to the nature my post). The one room at a time approach seems to be an excellent guide with which to approach the refactoring of the code.

      Thanks again,
Re: Difficult code (Resolutions)
by dragonchild (Archbishop) on Aug 21, 2003 at 13:10 UTC
    Mmmm. Old code. You're a very lucky man. It's easy to be a good programmer when all you have is good code (or no code, which is the same thing) to work with. It's very difficult to be a good programmer when presented with bad code. However, it is only through adverse conditions that one learns about oneself, and it's only through success in obviously adverse conditions that one can impress one's superiors.

    At a previous contract, I was working in a team of, roughly, 5-8 programmers on an application that was pretty important to the business. This app was a replacement of a previous app written on Windows (but was still maintained) and whose replacement already existed (but was still in v0.64beta status). It was also horrendous. I mean, it worked (which is priority 1) and made the business lots of money (which is priority 2), but the code was very hard to maintain.

    I saw an opportunity. There were some 80 reports that had to be outputable in HTML and PDF. We used HTML::Template, but rolled the PDF by hand using PDFLib. There was a PDF::Template out there, but it was very poor. After proving I could support the current PDF wrapper around PDFLib, I went to my group leader and suggested extending PDF::Template to meet our needs. He said that the risk was too great. Ok. I went back to my drawing board and came up with a development cost and a rollout plan. Then, I waited.

    About a month later, a major bug was discovered in the PDF wrapper. I tracked it down and saw my chance. I very slightly overestimated the cost of fixing the bug to where it came close to PDF::Template's cost, then presented my case, suggesting only to convert those reports impacted by the bug. I stressed the benefits of the conversion, including the fact that the 20 new reports coming down the pike would have their development costs cut in half with a greater assurance of no bugs in the PDF section. I stressed the reduced maintenance costs (which were considerable in the PDF area). I got the go-ahead to spend 4 weeks on PDF::Template. (I estimated a 100 hour bug to be 130 hours ... close to 160, neh?)

    Remember - very few bosses want higher development and maintenance costs. What they do want is reduced risk. However, reducing risk next month at the cost of increasing risk this month is unacceptable. You have to demonstrate how you're reducing risk later without increasing risk now. Usually, this means phasing in any improvements over 2-4 releases. And, that's ok.

    Also - you have to prove yourself first. Yeah, they hired you, but that just means you can do the job. In the first 3-6 months is when you prove that you will do the job. After you have some minor successes, go after a medium job. Then, a major job. After I implemented PDF::Template, I was given free rein to do whatever I wanted. So, I put into place a completely new reporting system. However, I only converted those reports that were being touched anyways. That was (and is) the plan until about 60-80% of the reports are in the new system. Then, it makes sense to convert the rest, because it becomes obvious that it is a greater risk to leave them as is than it is to convert them.

    And, follow all the advice above. Tests are your friend.

    We are the carpenters and bricklayers of the Information Age.

    The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Difficult code (Resolutions)
by dbwiz (Curate) on Aug 21, 2003 at 09:12 UTC

    Rewriting or patching?

    It's a question that almost any programmer has to face sooner or later.

    Deciding which side to stand is really depending on many issues, one of which should be preserving valuable content from the old software.

    This matter has been discussed several times in the Monastery, and without repeating what was said, I offer you the links, so you can judge by yourself.

    I liked very much two articles that were referred in the previous links, where a practical approach to rewriting was described in great detail. It's not an easy task, but it makes sense the way it is described.

    There is no better teacher than experience, so I would recommend to try your skills at a module of your project, so that you can appreciate the magnitude of the task. While you are at it, take into account everything dws and adrianh has said about isolating the changes and testing them.

    Good luck!

Re: Difficult code (Resolutions)
by Anonymous Monk on Aug 21, 2003 at 03:45 UTC

    Any good response to your post will require significantly more information.

    Here are a few questions to start with:

    1. How much code? Are we talking less than a few thousand lines, 20,000+, or somewhere inbetween?
    2. What type of code? Is the language still in active development?
    3. What are you skills? Are you capable of making reliable modifications to the code? Do you know the language well, or just enough to rename variables?
    4. Who owns the code Does it belong to a company or an individual? Is it open source? How resistant is the company to change?
    5. What is the owner's current focus Have they switched all new development to a different platform and/or language? Are they in the process of migrating any other code from the same eroa over?
    6. What is the purpose of the code Is it for a pacemaker or an mp3 player? What level of quality is required.
    7. How well does the code work? Have many bugs been found in the past? Are there an unresolved known problems? How extensively is it tested/used?
    8. How much other code is dependant on it? Can you change the interface? Is it so obscure that you can't tell what the interface is?
    9. How well is it documented? How much do you know about the code's purpose? The interface? Internal workings?

    That's a start anyways. These things are extremely dependant on the details, carefully examine any advice that claims to be a silver bullet.

    Best of luck.

      Good questions...
      1. We're talking about ~8,000 lines of code
      2. The language, although not Perl, is in active development, and we plan to continue developing in such (even though I am currently the only programmer at my company)
      3. I would consider myself of an intermediate skill level in the language, I am fluent in the available built-ins and can translate most algorithms that I find in other languages (and in general algorithmic notation) to the language at hand.
      4. The code belongs to my company, is not Open Source, and they are fairly resistant to change (as far as a change in language is concerned), but I have a fairly free reign over implementation as long as all design requirements are met.
      5. Everything is pretty stable in this regard.
      6. The code is intended for computer based training apps, so bugs are annoying (although as everyone, I'd prefer if there were none), but will not cause loss of life or property.
      7. The code as I got it is good beta code, although there were minor bugs. The main thing is the additional functionality that I am required to make, changes in existing functionality, as well as fixing existing bugs when found by our testers.
      8. It is a stand-alone app.
      9. There is little documentation besides the comments in the code itself (which reside in a range from good in parts to unhelpful/nonexistant.
      10. Thanks

      Two more good questions to ask a new codebase are:

      • Are there acceptance tests? If not, how do we know what the application should do, and whether it is actually doing it. We should consider writing some.
      • Are there unit tests? If not, how do we know what each module should do, and whether it is actually doing it. We should consider writing some.
Re: Difficult code (Resolutions)
by yosefm (Friar) on Aug 21, 2003 at 08:20 UTC
    Adding to the 'one room at a time' approach, if you don't have the time to clean up something that you already took the time to understand how it works, put a simple comment that reports your wisdom, on the lines of:
     # $jjf is really $objectFromJava.

    Now when you return to this code you just mentally s/jjf/objectFromJava/ (Of course in a good text editor this would take a second to actually do, but this is just a simple example).

    Another thing: If indentation needs fixing, do it before you even take a second look at the code. This will de-obfuscate things greatly.

Re: Difficult code (Resolutions)
by Vorlin (Beadle) on Aug 22, 2003 at 01:01 UTC
    I think anyone who does anything with programming and/or systems administration can feel your pain. I've inherited tons of code from previous admins, varying shell/c/vb/vc++, etc and it's been a riot trying to figure out what's going on.

    I believe they fall in several categories and for each one, I've figured out a method that keeps my sanity.

    1: Unorganized Q&D - quick and dirty, we've all seen those. No comments, sloppy code, chaotic design, definitely needs being rewritten as soon as possible. Filed away for later perusal.

    2: Organized Q&D - at least an attempt to have a cleaner code of the previous statement, some comments, however brief, spaced/indented code in most areas. Some sections might need work but overall, functional and not necessarily needing rewriting.

    3: Organized code in development - development code/scripting that looks good but may have design structures that are different than yours and since understanding what every line does, a possible rewrite using existing code but substituting your function/variable names can be done. Also might rewrite just for brevity/improvement/efficiency/etc but not needed due to development. This is common of the "if it ain't broke, don't fix it" clause.

    4: Unorganized Q&D in production - same as point 1, but in production and is desparately in need of an immediate rewrite, however on-the-fly that may be. Bug fix to keep going, but filed immediately for cleaner/improved code.

    5: Semi-organized lengthy inherited code in production - hardest kind to maintain. Production states that it's been added on and maintained for the duration by several groups/people and is good in some areas and possibly horrid in others. Possible rewrites through CVS/SourceSafe/etc are worthy, but only if the entire thing is done, otherwise it's just bug-fix and move on.

    IMHO, I try to rewrite everything that isn't well-documented. I learned rather quick that as long as I know the function of the script/code, I can probably rewrite it faster than trying to figure out what variable $xn2 does. That's the beauty of the call-sign of perl - "There's always another way to do it". That applies to everything in code, production and development alike. And fortunately, my sanity's been able to be kept although there are times when I really want to scream 'WTF is this?' when looking at some obscure non-documented code buried in the archives of legacy programs.
Re: Difficult code (Resolutions)
by TomDLux (Vicar) on Aug 21, 2003 at 20:22 UTC

    There is obfuscation---a feeble palimpsest---and then there is inherited production code.

    When there is revision control on the software, employers get upset if you change too much at any one time. You make a change to handle bug 237 or change request 1743, a problem is found during testing, and diff shows 3700 changed lines. ( of course, if there isn't even revision control, you should run, not walk, for the nearest exit.

    One strategy I've used is to take some idle moments and go over the code, thinking what changes I would introduce. Then re-read M-J Dominus's Red Flags articles, or some of merlyn's articles, where long, badly-written programs collapse into short, efficient scripts.

    Once you have a strategy, you can introduce your changes. As far as I'm concerned, for each line of code that I need to modify in the line of duty, it's fair play to modify a line of code because it improves the program.

    If you need to modify a couple of print statements, simplify a loop at the same time. If you need to implement some new capability in the counter which audits and reports what happens to each line of input, replace the existing counter with an object ... duplicated lines are refactored into methods, comprehensability is improved since the intent is clearer, the code runs faster, and fewer lines are changed than if you added the new code in a dozen places throughout the program.

    Then, next month, you can look at the DB handling .....


Re: Difficult code (Resolutions)
by Ryszard (Priest) on Aug 22, 2003 at 08:05 UTC
    How about:

    I will document

    After i've viewed and made sense of a piece of crappy code i will put comments inline and update/create offline documentation if I have the time.

    A penny saved is a penny earned.

      A pithy way to phrase the above might be:

      If you understand what it does, and you know it works, then you don't need to rewrite it!

      You need to document it so you don't have to bang your head against the wall next time you look at it. In my experience, the only time you need to go through rewriting code is if the offending code needs to be extended or fixed and that is made difficult or impossible by the sheer ugliness of it.

      This I guess is a corrolary to if it ain't broke then don't fix it. That law is *especially* true in software, particularly in high reliability production environments (which is all I've ever worked on BTW). If it *is* broke then that is quite another matter.

      A few other methods I've run across:

      1. Modularize. Have one bit of code that does one job and have it do it for EVERY possible consumer of that sort of task. I'd rather have a if-tree of slightly different ways to accomplish a task than have to go back and fix each one.

      2. Test, test, and when you're done, test some more. Plan to spend at least twice as much time testing as you do coding. It stinks and it sucks but that's the only way to come out with good code. Use automated tests to cut the boredom.

      3. Write readable code. Comment. Avoid the elegant approach in favor of the simple approach whereever performance allows. You *will* have to fix every bit of code that you write at some point. It's a certainty.

        This I guess is a corrolary to if it ain't broke then don't fix it.

        I would consider any code that's too complex to be understood without comments to be broken :-)

        If you don't spend the time to make your code simpler it just gets more and more complex over time - until you eventually end up with a big ball of mud.

        Don't accept the evils of code entropy! Continually refactor your code to make it as simple as possible. You'll be surprised how quickly it pays off.

      After i've viewed and made sense of a piece of crappy code i will put comments inline and update/create offline documentation if I have the time.

      ... but only if I don't have the time to write some tests that demonstrate what the code does and refactor the code into something more comprehensible.

      Because if I don't rewrite that code into something more comprehensible I'll regret it six months down the road when a couple of feature requests and bug fixes have caused my comments and offline documentation to become out-of-sync with the code again.

      Sure as onions is onions.

Re: Difficult code (Resolutions)
by gsiems (Deacon) on Aug 22, 2003 at 17:29 UTC
    I used to do maintenance on a in-house system that had originally been developed by some consultants (I used to not like consultants as a result of that). Parts of it were reasonably well written and other parts were really, really ugly.

    While there were a few cases where entire chunks of the app got ripped/replaced as they were too horribly broken to be fixed incrementally, my approach was mostly to fix *only* the piece that needed to changing (that piece, that whole piece, and mostly nothing but that piece). FWIW, I considered a piece to be a function, subroutine, and/or method.

    For that piece,

    1. First I'd fix the indenting/whitespace.

    2. Then I'd pick a variable, figure out a better name and rename it-- I tended to search/replace one by one and checked how the variable was being used as I went. It was amazing how much dead code there was in the form of

       declare $variable...
       set $variable...
       set $variable again...
       set $variable yet again...
       end of code piece-- value of $variable never used.

    3. Repeated step 2 for all variables, removing dead sections and unused variables as they were found.

    4. Ensured that the code still worked the same as before I started.

    5. Refactored the piece if it made sense to do so.

    6. Tested it again.

    7. Made my change.

    8. Tested it again.

    Using this approach, I never rewrote the whole thing application, just the pieces that needed changes whether those changes were for bug fixes or new/different functionality.

    Some observations that I made alng the way was that there tended to be a correlation between:
    1. formatting and code quality (poor formatting ~ poor quality),
    2. global variable useage and code quality (lot's of globals used ~ poor quality), and
    3. unused variables and code quality (lot's of unused variables ~ poor quality).

    my 2 cents.

    The box said "Requires Windows 98 or better" so I installed Linux.

      4. Ensured that the code still worked the same as before I started.

      How did you do that? Just wondering...


        Probably not very well. At that time in my life/learning curve, testing usually consisted of running that portion of the app that used the code being changed, trying various things that the users would do and calling it good. Sometimes I would combine that with logging the imputs/outputs and intermediate values while running the app and the sift through the results to see if it did what I expected. In time I was able to create a test harness for some parts, but realistically, that kind of testing only seemed to work well for those pieces/parts could be isolated from everything else. Very ad-hock. Part of the problem was that almost everything (business rules, data access, etc.), as delivered, was buried in the GUI :-(.