Re: Analyzing large Perl code base.
by BrowserUk (Patriarch) on Apr 14, 2005 at 22:17 UTC
Re: Analyzing large Perl code base.
by gam3 (Curate) on Apr 14, 2005 at 22:14 UTC
Re: Analyzing large Perl code base.
by adrianh (Chancellor) on Apr 15, 2005 at 14:51 UTC
Devel::Cover is a great tool for code exploration in combination with tests.
Write an end-to-end test for a bit of functionality. Run it under Devel::Cover. Look at the coverage report to see which chunk of your big ball of mud was related to the task. Refactor that chunk.
Repeat for the next bit of functionality.
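The workflow adrianh describes might start with a test file along these lines (a minimal sketch -- the file name, module, and assertion are placeholders, not anything from the thread):

```perl
# t/feature.t -- an end-to-end test for one bit of functionality.
# Run it under Devel::Cover, then inspect the report to see which
# parts of the big ball of mud the feature actually touches:
#   perl -MDevel::Cover t/feature.t
#   cover    # summarizes cover_db/ and writes an HTML report
use strict;
use warnings;
use Test::More tests => 1;

# Placeholder assertion: replace with a call into your own code, e.g.
#   is( MyApp->handle_request('/orders'), $expected, 'orders page renders' );
ok( 1, 'end-to-end check for the feature under study' );
```

The point is not the test itself but the coverage report it leaves behind: the covered lines mark the chunk worth refactoring next.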
I don't think coverage can be a reliable code-analysis tool in the general case. Too many programs contain complex initialization routines that exercise a lot of code and make the coverage output not very useful.
At times, enabling or disabling something during a test will show up only as a small, barely visible routine call in the coverage logs.
Using coverage for analysis has its merits, but it requires some a priori acquaintance with the code.
Re: Analyzing large Perl code base.
by dave0 (Friar) on Apr 15, 2005 at 15:32 UTC
Having recently done this on a fairly large codebase that grew organically (no design, no refactoring) over the course of four years, I feel your pain.
Writing a testsuite, on any level, is nearly essential for this. If you're rewriting an existing module, you'll need to ensure it's compatible with the old one, and the only sane way to do that is to test. If the old code is monolithic, it might be difficult to test individual units, but don't let that stop you from testing at a higher level.
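A compatibility test of the kind described might look like this sketch (the parse_id function and its expected behaviour are invented for illustration; in real use you would `use Legacy::Module;` and exercise its documented calls instead):

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Stand-in for the legacy module's public API.  The rewritten module
# must reproduce exactly these observable behaviours to pass.
sub parse_id {
    my ($raw) = @_;
    return undef unless defined $raw && $raw =~ /^ID-(\d+)$/;
    return $1 + 0;
}

is( parse_id('ID-042'), 42,    'numeric part is extracted and numified' );
is( parse_id('bogus'),  undef, 'malformed input yields undef' );
is( parse_id(undef),    undef, 'undef input is handled, not fatal' );
```

Run the same file against both the old and the new implementation; when both pass, you have some evidence the rewrite is a drop-in replacement.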
B::Xref helped me make sense of the interactions in the old codebase. I didn't bother with any visualization tools or graph-creation, though. I just took the output of perl -MO=Xref filename for each file, removed some of the cruft with a text editor, ran it through mpage -4 to print, and spent a day with coffee and pencil, figuring out how things worked.
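The "remove some of the cruft" step dave0 did in a text editor can also be scripted. A rough sketch (the skip list and the sample Xref-style lines are guesses, not real output from his codebase):

```perl
use strict;
use warnings;

# Crude filter for `perl -MO=Xref yourfile.pl` output: drop lines that
# mention core/pragma packages so only your own symbols remain.
# Extend the skip list for your codebase.
my @skip = qw( strict warnings vars Exporter Carp UNIVERSAL );

sub keep_xref_line {
    my ($line) = @_;
    return 0 if $line =~ /^\s*$/;
    return 0 if grep { index( $line, $_ ) >= 0 } @skip;
    return 1;
}

# Demo on a few invented Xref-style lines; in practice you would read
# a saved report:  perl -MO=Xref app.pl 2>xref.txt
my @sample = (
    "  Subroutine process_order\n",
    "    &MyApp::Order::total   used\n",
    "    &Carp::croak           called\n",
);
print grep { keep_xref_line($_) } @sample;
```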
Pretty much the same tactic was used on the actual code. Print it out, annotate it away from the computer, and then sit down with the notes to implement the refactoring. If your codebase is huge (mine was about 4-5k lines in several .pl and .pm files, and was still manageable) you might not want to do this, though.
At least you have modules. You should be able to organize those modules into logical groupings. Once you do that, focus on one grouping at a time, writing lots and lots and LOTS of tests. Test everything, anything ... if it moves, test it. Heck, test it even if it doesn't move. (You want to make sure it doesn't start moving!)
Note: you will find that many of your tests will be wrong ... and that's good. :-)
Update: As adrianh says, you shouldn't write whitebox tests - you should be writing tests for how the rest of the code expects your modules to work. In other words, API tests. Remember - you're planning on ripping the guts out ASAP. You just want to make sure that the rest of the code doesn't die while you're working.
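A black-box starting point in that spirit: assert only what the rest of the code can see -- that each module loads and still exposes its documented calls. (List::Util stands in here just so the sketch runs; substitute your own groupings and their public functions.)

```perl
use strict;
use warnings;
use Test::More;

# Map each logical grouping to the public (API) calls the rest of the
# code depends on.  No peeking at internals -- those are about to go.
my %api = (
    'List::Util' => [qw( first sum reduce )],
);

for my $module ( sort keys %api ) {
    use_ok($module);
    can_ok( $module, @{ $api{$module} } );
}

done_testing();
```

Cheap tests like these won't prove the guts work, but they will scream immediately if a refactoring breaks the surface the rest of the code relies on.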
Re: Analyzing large Perl code base.
by eyepopslikeamosquito (Archbishop) on Apr 16, 2005 at 05:52 UTC
The earlier advice from others re writing tests and refactoring is sound. You are, however, most unlikely to be given enough time to do it all, so you must choose wisely which code to clean up first.
How to choose? i) write tests for all recent (and new) bugs; ii) focus on the modules you consider most vital and highest risk; iii) go through the user manual and write a test (and refactor where appropriate) for each example given there (i.e. focus on the client view of the system).
Perhaps more important is to ensure that all new code is developed test-first and with a solid test suite.
I've been (and am still going) through something similar, as mentioned in What is the best way to add tests to existing code?. As expected, and despite earlier assurances, I did not get anywhere near the time and resources I would have liked. Bottom line: this sort of code cleanup, while strategically sound in the longer term, does not bring in immediate revenue.
Update: You might pick up some good ideas from the book Perl Medic by Peter Scott. Ditto from the node starting to write automated tests.
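Point (i) -- a test per recent bug -- might look like this sketch (the bug, the ticket id, and the sub are all invented for illustration):

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Regression test for a hypothetical recently reported bug: trailing
# whitespace in config values broke lookups.  Pinning each new bug
# with a named test like this stops it from quietly coming back
# during the refactor.
sub normalize_value {
    my ($v) = @_;
    $v =~ s/\s+\z//;    # the fix: strip trailing whitespace
    return $v;
}

is( normalize_value("hostname  \n"), 'hostname',
    'RT#1234 (invented id): trailing whitespace is stripped' );
```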
Re: Analyzing large Perl code base.
by dragonchild (Archbishop) on Apr 15, 2005 at 00:08 UTC
The only tool you will need is a test suite.
You don't need to document the dreck you have. You need to document what marketing believes the dreck you have does. The last place you want to look for that is the actual source code.
Now, I only say this because the OP said (and I paraphrase) "I have a bunch of spaghetti that I can't figure out, so how do I clean it up?" The answer is "Write some tests, then refactor, then write some tests, then refactor, then ..."
Your testsuite then becomes the basis for your documentation. Obviously, you convert all the ok() calls into English or Swahili or whatever, but it's still the foundation.
If you do decide to go with documentation tools (before, during, or after, and at the least I would pick after), you might wish to start with these:
Our own castaway pointed me in the direction of podgen, which will jump-start the process of commenting your monolith.
DoxyFilt* is a filter that allows the well-known Javadoc-like source-to-documentation tool Doxygen to understand Perl. Once you have your source commented, documentation becomes absurdly easy.
Using Doxygen before and during the analysis process is often helpful for "getting the lay of the land." There is no reason why you need to limit yourself to using doc tools only once. ;-)
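For a sense of what the seeded documentation looks like: a tool like podgen can give each sub a POD stub for you to flesh out. This is only a sketch of the end result -- the module, sub, and POD text below are invented, and the exact stub format varies by tool:

```perl
package MyApp::Ledger;    # invented module, for illustration only
use strict;
use warnings;

=head2 balance

    my $total = balance(@amounts);

Returns the sum of the amounts passed in.  (Stub text of this sort is
what you flesh out once a generator has seeded each sub with POD.)

=cut

sub balance {
    my (@amounts) = @_;
    my $total = 0;
    $total += $_ for @amounts;
    return $total;
}

1;
```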
HTH,
* Update: 2005-12-28 Kudos to both john_oshea and tfrayner for alerting me to the fact that my link above has been rendered useless by the foul creatures known as spammers... I have found what appears to be a good link to obtain DoxyFilt; the most recent version seems to be from August 24, 2005: Doxygen-0.84.tar.gz. Thanks again, guys!
Re: Analyzing large Perl code base.
by spurperl (Priest) on Apr 15, 2005 at 11:32 UTC