Beginnings
I'm in charge of document and reverse-engineering a
large system composed mainly of unorganized, undocumented, roten and
ragged Perl Code, probably written for Perl 5_0005 in a AIX
machine.
The History
It was being maintained for the last seven years by folks that know
nothing about "good pratices" on software engineering and the code
needs refactoring a.s.a.p.
The Mission
My mission, right now, is rise as much as I can from the existing
code, and I see a big ammount of information running away to the
backyards door if I don't document all the existing subs and its
dependencies.
Gory Details
What I already got: I have a nice hack using File::Find that was
able to put me on my way by using simple regular expressions to
recognize and capture subrotine declarations and qualify them by
packages. Code follows:
#!c:\perl\bin\perl.exe
use strict;
use warnings;
use constant CODE => '/path/to/code/rootdir/';
use File::Find;
my %source;
find( \&wanted, CODE );
##
## pkg( $filename )
## Determina o pacote (package) que o arquivo implementa.
##
sub pkg{
my $file = shift;
my $pack;
open FILE, $file or die $1;
while( <FILE> ){
if( m{^package\s+([\w:]+)\s*} ){
$pack = $1;
last;
}
}
close FILE or die $1;
return $pack;
}
##
## uses( $filename )
## Determina as bibliotecas que este módulo|script usa|requer.
##
sub uses{
my $file = shift;
my %libs;
open FILE, $file or die $1;
while( <FILE> ){
$libs{ $1 }++ if m/^\s*(?:use|require)\s+(\S+).*?;/;
}
close FILE or die $!;
return wantarray ? ( keys %libs ) : [ keys %libs ];
}
##
## loc( $filename )
## Conta as linhas de código existentes em um determinado arquivo.
##
sub loc{
my( $file, $counter ) = ( shift );
open FILE, $file or die $!;
$counter++ while( <FILE> );
close FILE or die $!;
return $counter;
}
##
## subs( $filename )
## Determina o nome das subrotinas declaradas em $filename.
##
sub subs{
my $file = shift;
my @subs;
my $package;
open FILE, $file or die $1;
while( <FILE> ){
$package = $1 if m{^package\s+([\w:]+)\s*};
push @subs, $1 if /^sub\s+([\w:]+)/;
}
close FILE or die $1;
@subs = map { s/$package\:\://o; $_ } @subs if $package;
return wantarray? @subs : \@subs;
}
##
## version( $filename )
## Tenta determinar a versão do módulo ou script
## inspecionando a variável $VERSION definida por ele.
##
sub version{
my $file = shift;
my $version;
open FILE, $file or die $1;
while( <FILE> ){
if( m/VERSION\s*=\s*/ ){
$version = $_;
last;
}
}
close FILE or die $1;
{
no strict;
if( $version ){
eval $version;
$version = $VERSION;
$version = '0.1' if $version =~ /Revision/;
}else{
$version = '';
}
}
return $@? $@ : $version;
}
##
## wanted( $filename )
## Determina se desejamos ou não contar com este arquivo na nossa aval
+iação.
## Chamada como call-back pelo File::Find::find.
##
sub wanted{
my $f;
my $file = $_;
return unless ( -f && /\.(p[lm])$/ );
my $type = ( $1 eq 'pl' ? 'script' : 'module' );
( $f = $File::Find::name ) =~ s{CODE}{}o;
$source{ $f } = { type => $type,
name => $file,
filename => $f,
LoC => loc( $File::Find::name ),
'package' => pkg( $File::Find::name ) || 'main',
libs => uses( $File::Find::name ) || [],
subs => subs( $File::Find::name ) || [],
version => version( $File::Find::name ),
};
}
my $packages;
for my $file ( keys %source ){
push @{$packages->{ $source{$file}->{package}||'main' }}, $source{
+$file};
}
# use Data::Dumper;
# print Dumper( $packages );
# print Dumper( \%source );
print q{
<html>
<head>
<style>
body {
margin: 5% 10% 5% 10%;
background-color: white;
font-size: 130%;
}
div.system {
background-color: #EEE;
padding: 10 20 10 20;
}
p.title{
font-size: 180%;
font-weight: bold;
text-align: center;
}
div.package{
margin: 5 5 0 5;
padding: 0 5 5 5;
border: solid thin black;
}
div.file{
margin: 5 10 5 10;
padding: 5 5 5 5;
background-color: #DDD;
}
</style>
<title>Relatório de Análise do Código Fonte</
+title>
</head>
<body>
<div class="system">
<p class="title">Relatório de Análise do C&oacut
+e;digo Fonte</p>
};
my $total_loc = 0;
my $total_subs = 0;
my $total_packages = scalar keys %$packages;
my $total_files = scalar keys %source;
map { map { $total_loc += $_->{LoC} } @$_ } values %$packages;
map { map { $total_subs += scalar @{$_->{subs}} } @$_ } values %$packa
+ges;
print qq{<p>Total pacotes: $total_packages.<br>Total Arquivos: $total_
+files.<br>Total linhas: $total_loc.<br>Subrotinas: $total_subs.</p>\n
+\n};
foreach my $pack( sort keys %$packages ){
my @files = sort { $a->{name} cmp $b->{name} } @{$packages->{$pack
+}};
my $package = $files[0]->{package} || 'main';
print qq{<div class="package"> <a name="#package_${package}">\n<p>
+Pacote <code>${package}</code> </p>\n\n};
foreach my $f ( @files ){
$f->{filename} =~ s/\'\"//og;
$f->{name} =~ s/\'\"//og;
print qq{<div class="file"> <a name="#file_$f->{name}">\n<p>Arquiv
+o <code>$f->{filename}</code> </p>\n\n<ul>};
print qq{<li>Tipo: <code>$f->{type}</code></li>\n};
print qq{<li>Linhas de Código: $f->{LoC}</li>\n};
print qq{<li>Versão: $f->{version}</li>\n};
if( scalar @{$f->{libs}} ){
print qq{<li>Dependências:\n<ul>\n};
foreach my $lib( sort @{$f->{libs}} ){
$lib =~ s/\'\"//og;
if( exists $$packages{$lib} || grep /\L$lib\E/, map lc, keys %
+$packages ){
print qq{<li> <a href="#package_$lib">$lib</a></li>\n};
}else{
print qq{<li>$lib</li>\n};
}
}
print qq{</ul>\n};
}
if( scalar @{$f->{subs}} ){
print qq{<li>Subrotinas Implementadas:\n<ul>\n};
foreach my $sub( sort @{$f->{subs}} ){
print qq{<li><code>$sub()</code></li>\n};
}
print qq{</li>\n};
}
print qq{</ul></div> <!-- class file -->\n};
}
print qq{</div> <!-- class package -->\n};
}
Considerations
I know this is not the Best Way To Do It, but I'm sure its
working. The output is kind of nicely formatted HTML linking modules
dependencies inside my source code.
Next Steps
My next step would be mapping every subrotine call so I can follow
calls along the modules and determine real modules dependencies
between modules. It seems that the old developer think its easy just
cut'n'paste all possible use Module; from the system than
decide what modules are really needed for a given module or
program.
Looking For
In sintesys, I'm looking for suggestions about what kind of
strategy I should use to parse all this Perl code (about 1.85Mb of
ascii text) and build the information I need about it in a few hours
(this is all I have now, sorry).
Limitations
My limitations: I have no access to other modules than not the
default in ActivePerl 5.6.1 (this is internal politics: I can't use
anything not previously checked by security folks, including
open-source tools). This is kind of a problem, given that the
ActivePerl 5.6.1 lacks the IniFile module, needed to parse the
original system config files.
A Possible Start Point
Talking about a start point, maybe
I should scan a module file looking for subs, and inside each sub,
look for barewords (this will probably lead me to a better chance of
having a subrotine call). After that, I need to scan the module for
the candidate subrotine name and if I can find it, store this
information. If there is no subrotine declaration for the given
bareword, I need to scan modules listed on @ISA, in
order, for the same thing, recursivelly until I find the sub
declaration or run out of modules to scan.
Suggestions?
Of course, it seems to me that this could take a big amount of
time. Maybe there are better approaches for this problem...
Updates
Fixed a little issue about a misplaced <code> tag in the article; Many thanks to jasonk for pointing this.
wazoox just told me that the '<' signs at my code was errouneously translated to '<'. Thank you for that.