http://www.perlmonks.org?node_id=448222


in reply to Analyzing large Perl code base.

Having recently done this on a fairly large codebase that grew organically (no design, no refactoring) over the course of four years, I feel your pain.

Writing a test suite, at any level, is nearly essential for this. If you're rewriting an existing module, you'll need to ensure it's compatible with the old one, and the only sane way to do that is to test. If the old code is monolithic, it might be difficult to test individual units, but don't let that stop you from testing at a higher level.
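
A minimal sketch of such a higher-level compatibility test, using Test::More; the module and sub names are hypothetical stand-ins for whatever your old and rewritten code actually expose:

    use strict;
    use warnings;
    use Test::More tests => 2;

    # Hypothetical names: Old::Invoice is the legacy module and
    # New::Invoice is the rewrite; both are assumed to expose total().
    use Old::Invoice;
    use New::Invoice;

    my @items = ( { price => 10, qty => 2 }, { price => 5, qty => 1 } );

    # The rewrite must agree with the legacy code on the same input.
    is( New::Invoice::total( \@items ),
        Old::Invoice::total( \@items ),
        'rewrite matches legacy total()' );

    # Pin a known-good value too, in case both are wrong together.
    is( Old::Invoice::total( \@items ), 25, 'legacy baseline' );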

B::Xref helped me make sense of the interactions in the old codebase. I didn't bother with any visualization or graph-creation tools, though. I just took the output of perl -MO=Xref filename for each file, removed some of the cruft with a text editor, ran it through mpage -4 to print, and spent a day with coffee and a pencil figuring out how things worked.
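
Concretely, the workflow was along these lines (file names are illustrative, and on my system mpage's PostScript went to the printer via lpr):

    # Dump a cross-reference for each file of interest...
    perl -MO=Xref lib/Order.pm > xref/Order.txt

    # ...then, after trimming the cruft in an editor, print four
    # logical pages per sheet for offline annotation.
    mpage -4 xref/Order.txt | lpr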

I used pretty much the same tactic on the actual code: print it out, annotate it away from the computer, and then sit down with the notes to implement the refactoring. If your codebase is huge, you might not want to do this (mine was about 4-5k lines in several .pl and .pm files, and was still manageable).

Re^2: Analyzing large Perl code base.
by dmitri (Priest) on Apr 15, 2005 at 18:10 UTC
    With 2.5 megs of Perl code across 237 different modules, I will need a lot of coffee :-)

    Thanks for your suggestions.

      At least you have modules. You should be able to organize those modules into logical groupings. Once you do that, focus on one grouping at a time, writing lots and lots and LOTS of tests. Test everything, anything ... if it moves, test it. Heck, test it even if it doesn't move. (You want to make sure it doesn't start moving!)

      Note: you will find that many of your tests will be wrong ... and that's good. :-)

      Update: As adrianh says, you shouldn't write whitebox tests - you should be writing tests for how the rest of the code expects your modules to work. In other words, API tests. Remember - you're planning on ripping the guts out ASAP. You just want to make sure that the rest of the code doesn't die while you're working.
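
      For example, an API-level test exercises only what callers can see and never the internals (the module and methods here are hypothetical):

          use strict;
          use warnings;
          use Test::More tests => 2;

          use My::Session;    # hypothetical module under test

          # Test only the documented interface that the rest of the
          # codebase calls, not private subs or the object's innards.
          my $s = My::Session->new( user => 'dmitri' );
          is( $s->user, 'dmitri', 'constructor stores the user' );

          $s->expire;
          ok( !$s->is_active, 'expired sessions report inactive' );

          # No peeking at $s->{_state}: that's the whitebox test we'd
          # just have to throw away once the guts are ripped out.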

        Once you do that, focus on one grouping at a time, writing lots and lots and LOTS of tests. Test everything, anything ... if it moves, test it. Heck, test it even if it doesn't move. (You want to make sure it doesn't start moving!)

        While I don't think this is what you're proposing, I think it could be read as "write tests for everything in the legacy code before you change anything", which, IMHO, is a bad practice. As I said here:

        A counterproductive practice I've seen is going through a large piece of legacy code and adding developer tests for everything. Doing this with legacy code that wasn't driven by tests will produce a test suite that is brittle in the face of change. When you get to the refactoring, you'll find yourself continually throwing away many of the new tests, so you get no benefit from them.

        In my experience it's much more effective to build the test suite around the changes you make to the code. Add tests when you add new features. Add tests around code that you're refactoring. Add tests to demonstrate bugs. Just following those three rules naturally builds a test suite around the most important code.
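
        In practice the third rule looks something like this sketch (the module and the bug are invented for illustration):

            use strict;
            use warnings;
            use Test::More tests => 2;

            use My::Parser;    # hypothetical module with a reported bug

            # Bug report: parse() dies on empty input instead of returning
            # an empty list. Write the failing test first; it documents the
            # bug and guards against regression once the fix lands.
            my @tokens = eval { My::Parser::parse('') };
            is( $@, '', 'parse() survives empty input' );
            is_deeply( \@tokens, [], 'and returns an empty token list' );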