Re: Analyzing Perl Code

Here's the blog post as it stands:

Recently I came across a bunch of server-side code, implemented as Perl scripts and modules, about a hundred of them. Many of them are a few thousand lines long, and they interact with each other in complex ways.

To understand how this system works, I had to figure out the program flow. This was especially difficult because the input was not easy to reproduce, and I didn’t even know where to start. I decided to take the primitive but effective debugging approach of adding print statements to trace the flow of execution. Trace statements, as they are formally called, are easy enough to add by hand in a small program, but not across a hundred files.

To automate the process of adding trace statements, I tried three approaches, one after the other:

Parsing with regular expressions: The first and crudest approach was to modify all these scripts by scanning them with a regular expression, finding subroutine definitions, and adding a print statement right after the opening brace of each. However, the regular expression overlooked a lot of cases: it also matched the word “sub” inside strings and comments, and the result was disastrous. The code base was ruined beyond hope of manual repair, but of course I had backups on my own machine and in a repository.
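To make the failure mode concrete, here is a minimal sketch of the kind of false positive described above. The pattern and the sample input are made up for illustration; they are not the regular expression actually used on the code base.

```perl
use strict;
use warnings;

# Hypothetical input: one real definition plus two traps for a naive pattern.
my $source = <<'END_SOURCE';
sub real_one { return 1 }
my $text = "sub fake_in_string { not code }";
# sub fake_in_comment { also not code }
END_SOURCE

# A deliberately naive pattern: the word "sub", a name, then an opening brace.
my @matched = $source =~ /\bsub\s+(\w+)\s*\{/g;

# Prints: matched: real_one fake_in_string fake_in_comment
print "matched: @matched\n";
```

Only the first match is a real subroutine. Inserting a print statement after the other two braces would corrupt a string literal and a comment.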
PPI: After some research I found a Perl module that parses Perl code. PPI, originally an acronym for Parse::Perl::Isolated, parses Perl code as documents, breaking it down into tokens in a strict hierarchical fashion. More details are in the PPI documentation on CPAN. With PPI, finding subroutine definitions became simpler and more reliable: calling find('PPI::Statement::Sub') on a PPI::Document object was all that was needed. Then, going through the ‘children’ of each sub and looking for a PPI::Structure::Block (by checking their refs) gave the beginning of each sub's body.
```
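# $sub is one PPI::Statement::Sub element; $caller holds the trace statement.
for my $child ( $sub->children ) {
    # The sub's body is the child whose class is PPI::Structure::Block.
    $child->start->add_content($caller)
        if ref $child eq "PPI::Structure::Block";
}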
```
$caller is the print statement passed as a string.
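For context, the PPI steps above could be put together roughly as follows. This is a sketch rather than the original script: the sample source, the wording of the trace message, and the use of the block accessor on PPI::Statement::Sub (instead of looping over the children) are my assumptions. It needs the PPI module from CPAN.

```perl
use strict;
use warnings;
use PPI;

# Hypothetical input; in practice this would be read from one of the scripts.
my $source = <<'END_SOURCE';
sub greet { return "hello" }
my $text = "sub fake { }";
# sub also_fake { }
END_SOURCE

my $doc = PPI::Document->new( \$source )
    or die "PPI could not parse the source";

# find() returns an array reference, or a false value when nothing matches.
my $subs = $doc->find('PPI::Statement::Sub') || [];

for my $sub (@$subs) {
    my $block = $sub->block or next;    # forward declarations have no body
    my $name  = $sub->name;
    # Append the trace statement to the opening brace token of the body.
    $block->start->add_content(qq{ print STDERR "Entering $name\\n";});
}

# The modified source as a string; write it back to a file as needed.
my $instrumented = $doc->serialize;
print $instrumented;
```

Only greet is instrumented here; the “sub” inside the string and the one inside the comment are left alone because PPI sees them as a quote token and a comment token. Note that add_content only changes the text of the brace token, so the document tree is not re-parsed; serialize the result and parse it again if further analysis is needed.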
Hook::LexWrap: If you only want to add a trace statement (or any piece of code) at the beginning or end of a subroutine, Hook::LexWrap is a much cleaner way to do it. It doesn’t require changing the original subroutines in any way; adding a few lines of code at the start of each file is enough. In the following code, @all_subs is an array containing the names of all subroutines in the current file. The “wrap $sub, pre =>” line pre-wraps a subroutine, i.e. it runs a piece of code just before each call to that subroutine.
```
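use Hook::LexWrap;

# Names of the subroutines to trace (listed by hand here).
my @all_subs = qw(sub1 sub2);
for my $sub (@all_subs) {
    # The pre wrapper runs just before each call to the original subroutine.
    wrap $sub, pre => sub {
        print "Calling '$sub' in file: $0\n";
    };
}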
```
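Listing the subroutine names by hand gets tedious across a hundred files. One possible way to build @all_subs automatically is to read the package's symbol table, as in the sketch below. This is my assumption about how the list could be generated, not something from the original post: subs_defined_in is a hypothetical helper, sub1 and sub2 are stand-ins, and the list is per package rather than per file, which only matches "the current file" when each file holds one package. It needs Hook::LexWrap from CPAN; B ships with Perl.

```perl
use strict;
use warnings;
use B ();
use Hook::LexWrap;

# Hypothetical subs standing in for the ones in a real script.
sub sub1 { return 'one' }
sub sub2 { return 'two' }

# Names of the subs compiled into $pkg itself. Imported subs (wrap, for
# example) are skipped by checking which package each one was defined in.
sub subs_defined_in {
    my ($pkg) = @_;
    my @names;
    no strict 'refs';
    for my $name ( sort keys %{"${pkg}::"} ) {
        next unless defined &{"${pkg}::$name"};
        my $code = \&{"${pkg}::$name"};
        my $home = B::svref_2object($code)->GV->STASH->NAME;
        push @names, $name if $home eq $pkg;
    }
    return @names;
}

# Leave the helper itself out of the trace.
my @all_subs = grep { $_ ne 'subs_defined_in' } subs_defined_in(__PACKAGE__);

for my $sub (@all_subs) {
    wrap $sub, pre => sub { print STDERR "Calling '$sub' in file: $0\n" };
}

sub1();
sub2();
```

Because wrap is called in void context, the wrappers stay installed for the rest of the program, so every later call to sub1 or sub2 is traced.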

There must be better ways to analyse how a huge and complicated set of Perl scripts works, but this is what I have discovered so far.

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

Re^2: Analyzing Perl Code by Your Mother (Archbishop) on Sep 20, 2012 at 15:44 UTC
Unless you have permission from the author or there is a posted license/copyright to allow it, wholesale reposts aren't kosher. :|
