Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses

Reverse Engineering Perl Using... Perl.

by monsieur_champs (Curate)
on Jun 17, 2005 at 14:08 UTC ( #467728=perlmeditation: print w/replies, xml ) Need Help??


I'm in charge of document and reverse-engineering a large system composed mainly of unorganized, undocumented, roten and ragged Perl Code, probably written for Perl 5_0005 in a AIX machine.

The History

It was being maintained for the last seven years by folks that know nothing about "good pratices" on software engineering and the code needs refactoring a.s.a.p.

The Mission

My mission, right now, is rise as much as I can from the existing code, and I see a big ammount of information running away to the backyards door if I don't document all the existing subs and its dependencies.

Gory Details

What I already got: I have a nice hack using File::Find that was able to put me on my way by using simple regular expressions to recognize and capture subrotine declarations and qualify them by packages. Code follows:

#!c:\perl\bin\perl.exe use strict; use warnings; use constant CODE => '/path/to/code/rootdir/'; use File::Find; my %source; find( \&wanted, CODE ); ## ## pkg( $filename ) ## Determina o pacote (package) que o arquivo implementa. ## sub pkg{ my $file = shift; my $pack; open FILE, $file or die $1; while( <FILE> ){ if( m{^package\s+([\w:]+)\s*} ){ $pack = $1; last; } } close FILE or die $1; return $pack; } ## ## uses( $filename ) ## Determina as bibliotecas que este módulo|script usa|requer. ## sub uses{ my $file = shift; my %libs; open FILE, $file or die $1; while( <FILE> ){ $libs{ $1 }++ if m/^\s*(?:use|require)\s+(\S+).*?;/; } close FILE or die $!; return wantarray ? ( keys %libs ) : [ keys %libs ]; } ## ## loc( $filename ) ## Conta as linhas de código existentes em um determinado arquivo. ## sub loc{ my( $file, $counter ) = ( shift ); open FILE, $file or die $!; $counter++ while( <FILE> ); close FILE or die $!; return $counter; } ## ## subs( $filename ) ## Determina o nome das subrotinas declaradas em $filename. ## sub subs{ my $file = shift; my @subs; my $package; open FILE, $file or die $1; while( <FILE> ){ $package = $1 if m{^package\s+([\w:]+)\s*}; push @subs, $1 if /^sub\s+([\w:]+)/; } close FILE or die $1; @subs = map { s/$package\:\://o; $_ } @subs if $package; return wantarray? @subs : \@subs; } ## ## version( $filename ) ## Tenta determinar a versão do módulo ou script ## inspecionando a variável $VERSION definida por ele. ## sub version{ my $file = shift; my $version; open FILE, $file or die $1; while( <FILE> ){ if( m/VERSION\s*=\s*/ ){ $version = $_; last; } } close FILE or die $1; { no strict; if( $version ){ eval $version; $version = $VERSION; $version = '0.1' if $version =~ /Revision/; }else{ $version = ''; } } return $@? $@ : $version; } ## ## wanted( $filename ) ## Determina se desejamos ou não contar com este arquivo na nossa aval +iação. ## Chamada como call-back pelo File::Find::find. ## sub wanted{ my $f; my $file = $_; return unless ( -f && /\.(p[lm])$/ ); my $type = ( $1 eq 'pl' ? 'script' : 'module' ); ( $f = $File::Find::name ) =~ s{CODE}{}o; $source{ $f } = { type => $type, name => $file, filename => $f, LoC => loc( $File::Find::name ), 'package' => pkg( $File::Find::name ) || 'main', libs => uses( $File::Find::name ) || [], subs => subs( $File::Find::name ) || [], version => version( $File::Find::name ), }; } my $packages; for my $file ( keys %source ){ push @{$packages->{ $source{$file}->{package}||'main' }}, $source{ +$file}; } # use Data::Dumper; # print Dumper( $packages ); # print Dumper( \%source ); print q{ <html> <head> <style> body { margin: 5% 10% 5% 10%; background-color: white; font-size: 130%; } div.system { background-color: #EEE; padding: 10 20 10 20; } p.title{ font-size: 180%; font-weight: bold; text-align: center; } div.package{ margin: 5 5 0 5; padding: 0 5 5 5; border: solid thin black; } div.file{ margin: 5 10 5 10; padding: 5 5 5 5; background-color: #DDD; } </style> <title>Relat&oacute;rio de An&aacute;lise do C&oacute;digo Fonte</ +title> </head> <body> <div class="system"> <p class="title">Relat&oacute;rio de An&aacute;lise do C&oacut +e;digo Fonte</p> }; my $total_loc = 0; my $total_subs = 0; my $total_packages = scalar keys %$packages; my $total_files = scalar keys %source; map { map { $total_loc += $_->{LoC} } @$_ } values %$packages; map { map { $total_subs += scalar @{$_->{subs}} } @$_ } values %$packa +ges; print qq{<p>Total pacotes: $total_packages.<br>Total Arquivos: $total_ +files.<br>Total linhas: $total_loc.<br>Subrotinas: $total_subs.</p>\n +\n}; foreach my $pack( sort keys %$packages ){ my @files = sort { $a->{name} cmp $b->{name} } @{$packages->{$pack +}}; my $package = $files[0]->{package} || 'main'; print qq{<div class="package"> <a name="#package_${package}">\n<p> +Pacote <code&gt;${package}</code&gt; </p>\n\n}; foreach my $f ( @files ){ $f->{filename} =~ s/\'\"//og; $f->{name} =~ s/\'\"//og; print qq{<div class="file"> <a name="#file_$f->{name}">\n<p>Arquiv +o <code&gt;$f->{filename}</code&gt; </p>\n\n<ul>}; print qq{<li>Tipo: <code&gt;$f->{type}</code&gt;</li>\n}; print qq{<li>Linhas de C&oacute;digo: $f->{LoC}</li>\n}; print qq{<li>Vers&atilde;o: $f->{version}</li>\n}; if( scalar @{$f->{libs}} ){ print qq{<li>Depend&ecirc;ncias:\n<ul>\n}; foreach my $lib( sort @{$f->{libs}} ){ $lib =~ s/\'\"//og; if( exists $$packages{$lib} || grep /\L$lib\E/, map lc, keys % +$packages ){ print qq{<li> <a href="#package_$lib">$lib</a></li>\n}; }else{ print qq{<li>$lib</li>\n}; } } print qq{</ul>\n}; } if( scalar @{$f->{subs}} ){ print qq{<li>Subrotinas Implementadas:\n<ul>\n}; foreach my $sub( sort @{$f->{subs}} ){ print qq{<li><code&gt;$sub()</code&gt;</li>\n}; } print qq{</li>\n}; } print qq{</ul></div> <!-- class file -->\n}; } print qq{</div> <!-- class package -->\n}; }


I know this is not the Best Way To Do It, but I'm sure its working. The output is kind of nicely formatted HTML linking modules dependencies inside my source code.

Next Steps

My next step would be mapping every subrotine call so I can follow calls along the modules and determine real modules dependencies between modules. It seems that the old developer think its easy just cut'n'paste all possible use Module; from the system than decide what modules are really needed for a given module or program.

Looking For

In sintesys, I'm looking for suggestions about what kind of strategy I should use to parse all this Perl code (about 1.85Mb of ascii text) and build the information I need about it in a few hours (this is all I have now, sorry).


My limitations: I have no access to other modules than not the default in ActivePerl 5.6.1 (this is internal politics: I can't use anything not previously checked by security folks, including open-source tools). This is kind of a problem, given that the ActivePerl 5.6.1 lacks the IniFile module, needed to parse the original system config files.

A Possible Start Point

Talking about a start point, maybe I should scan a module file looking for subs, and inside each sub, look for barewords (this will probably lead me to a better chance of having a subrotine call). After that, I need to scan the module for the candidate subrotine name and if I can find it, store this information. If there is no subrotine declaration for the given bareword, I need to scan modules listed on @ISA, in order, for the same thing, recursivelly until I find the sub declaration or run out of modules to scan.


Of course, it seems to me that this could take a big amount of time. Maybe there are better approaches for this problem...


Fixed a little issue about a misplaced <code> tag in the article; Many thanks to jasonk for pointing this.

wazoox just told me that the '<' signs at my code was errouneously translated to '&lt;'. Thank you for that.

Replies are listed 'Best First'.
Re: Reverse Engineering Perl Using... Perl.
by BrowserUk (Pope) on Jun 17, 2005 at 14:22 UTC

    Take a good look at Devel::Xref. I think it could save you a good deal of effort.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
Re: Reverse Engineering Perl Using... Perl.
by rinceWind (Monsignor) on Jun 17, 2005 at 16:27 UTC

    If you want to catch all the named subroutines, these will be in the symbol table hash (stash) %main:: (also known as %::) or %Foo::Bar::.

    One way to get this is through the perl debugger. Run a script and issue the following command:

    DB<1> x %::

    This will pick up all the global variables and subs in main:: and also all the packages. You can walk into each package namespace and get further globals, packages and subs.

    You can also get the debugger to read its commands from a file instead of stdin, which may help automate your process.

    Hope this helps


    Oh Lord, won’t you burn me a Knoppix CD ?
    My friends all rate Windows, I must disagree.
    Your powers of persuasion will set them all free,
    So oh Lord, won’t you burn me a Knoppix CD ?
    (Missquoting Janis Joplin)

Re: Reverse Engineering Perl Using... Perl.
by Solo (Deacon) on Jun 17, 2005 at 18:19 UTC


    You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.
Re: Reverse Engineering Perl Using... Perl.
by planetscape (Chancellor) on Jun 18, 2005 at 06:16 UTC
Re: Reverse Engineering Perl Using... Perl.
by GrandFather (Sage) on Jun 18, 2005 at 05:33 UTC

    I guess part of the process is to put use strict and use warnings in each source file.

    A really neat next trick would be to write some code to take the compiler errors and fix the easy bareword and related issues. This is sort of a version of your Possible Start Point, but using the compiler to do some of the work for you.

    Perl is Huffman encoded by design.
Re: Reverse Engineering Perl Using... Perl.
by Anonymous Monk on Jun 17, 2005 at 16:49 UTC
    Search CPAN for "calltree". I think there's a program of that name that may do what you want...

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://467728]
Approved by samizdat
Front-paged by Old_Gray_Bear
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (10)
As of 2017-06-23 18:27 GMT
Find Nodes?
    Voting Booth?
    How many monitors do you use while coding?

    Results (554 votes). Check out past polls.