comment on

Here's the blog post as it stands:

Recently I came across a bunch of server side code, implemented in the form of Perl scripts and modules – about a hundred of them. Many of them are a few thousand lines long, and they interact with each other in complex ways.

In order to understand how this system works, I had to figure out the program flow. This was difficult especially because it was not easy to reproduce the input, and I didn’t even know where to start. I decided to take the primitive but effective debugging approach of adding print statements for tracing the flow of execution. Trace statements, as they are formally called, are easier to add in smaller programs though.

In order to automate this process of adding trace statements, I tried 3 ways, one after the other:

Parsing with regular expressions: The first and most crude approach was to modify all these scripts, by parsing them, finding subroutine definitions, and adding a print statement right after the beginning of each. However, a lot of cases were overlooked while writing the regular expression, and it ended up matching the word “sub” in strings and comments and the result was disastrous. The code base was ruined beyond hope for manual repair, but of course I had backups on my own machine and also on a repository.
PPI: After some research I found a Perl module to parse Perl code. PPI, originally an acronym for Parse::Perl::Isolated, parses Perl code as documents, breaking it down in tokens in a strict hierarchical fashion. More details here. Using this, the task of finding subroutine definitions was simplified and made more reliable. PPI::Document -> find(‘PPI::Statement::Sub’) was all that was needed. Then, finding all the ‘children’ of each sub, and looking for PPI::Structure::Block (by checking their refs) got the beginning of each sub.
```
for my $child ( $sub->children ) {
 $child->start->add_content($caller)
 if ref $child eq "PPI::Structure::Block";
}
[download]
```
$caller is the print statement passed as a string.
Hook::LexWrap: If you only want to add a trace statement (or any piece of code) at the beginning / end of a subroutine, Hook::LexWrap is a much cleaner way to do this. It doesn’t need you to change the original subroutines in any way. Just adding a few lines of code at the start of each file will suffice. In the following code, @all_subs is the array containing the names of all subroutines in the current file. The “wrap $sub, pre =>”line pre-wraps a subroutine, i.e. executes a piece of code just before the subroutine is executed.
```
my @all_subs = qw (sub1 sub2);
for my $sub (@all_subs) {
 wrap $sub, pre => sub {
 print "Calling '$sub' in file: $0\n";
 };
}
[download]
```

There must be better ways to analyse how a huge and complicated set of Perl scripts works, but this is what I have discovered so far.

Alex / talexb / Toronto

"Groklaw is the open-source mentality applied to legal research" ~ Linus Torvalds

In reply to Re: Analyzing Perl Code by talexb
in thread Analyzing Perl Code by nitin1704

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Come for the quick hacks, stay for the epiphanies.
	PerlMonks