Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??


I'm in charge of document and reverse-engineering a large system composed mainly of unorganized, undocumented, roten and ragged Perl Code, probably written for Perl 5_0005 in a AIX machine.

The History

It was being maintained for the last seven years by folks that know nothing about "good pratices" on software engineering and the code needs refactoring a.s.a.p.

The Mission

My mission, right now, is rise as much as I can from the existing code, and I see a big ammount of information running away to the backyards door if I don't document all the existing subs and its dependencies.

Gory Details

What I already got: I have a nice hack using File::Find that was able to put me on my way by using simple regular expressions to recognize and capture subrotine declarations and qualify them by packages. Code follows:

#!c:\perl\bin\perl.exe use strict; use warnings; use constant CODE => '/path/to/code/rootdir/'; use File::Find; my %source; find( \&wanted, CODE ); ## ## pkg( $filename ) ## Determina o pacote (package) que o arquivo implementa. ## sub pkg{ my $file = shift; my $pack; open FILE, $file or die $1; while( <FILE> ){ if( m{^package\s+([\w:]+)\s*} ){ $pack = $1; last; } } close FILE or die $1; return $pack; } ## ## uses( $filename ) ## Determina as bibliotecas que este módulo|script usa|requer. ## sub uses{ my $file = shift; my %libs; open FILE, $file or die $1; while( <FILE> ){ $libs{ $1 }++ if m/^\s*(?:use|require)\s+(\S+).*?;/; } close FILE or die $!; return wantarray ? ( keys %libs ) : [ keys %libs ]; } ## ## loc( $filename ) ## Conta as linhas de código existentes em um determinado arquivo. ## sub loc{ my( $file, $counter ) = ( shift ); open FILE, $file or die $!; $counter++ while( <FILE> ); close FILE or die $!; return $counter; } ## ## subs( $filename ) ## Determina o nome das subrotinas declaradas em $filename. ## sub subs{ my $file = shift; my @subs; my $package; open FILE, $file or die $1; while( <FILE> ){ $package = $1 if m{^package\s+([\w:]+)\s*}; push @subs, $1 if /^sub\s+([\w:]+)/; } close FILE or die $1; @subs = map { s/$package\:\://o; $_ } @subs if $package; return wantarray? @subs : \@subs; } ## ## version( $filename ) ## Tenta determinar a versão do módulo ou script ## inspecionando a variável $VERSION definida por ele. ## sub version{ my $file = shift; my $version; open FILE, $file or die $1; while( <FILE> ){ if( m/VERSION\s*=\s*/ ){ $version = $_; last; } } close FILE or die $1; { no strict; if( $version ){ eval $version; $version = $VERSION; $version = '0.1' if $version =~ /Revision/; }else{ $version = ''; } } return $@? $@ : $version; } ## ## wanted( $filename ) ## Determina se desejamos ou não contar com este arquivo na nossa aval +iação. ## Chamada como call-back pelo File::Find::find. ## sub wanted{ my $f; my $file = $_; return unless ( -f && /\.(p[lm])$/ ); my $type = ( $1 eq 'pl' ? 'script' : 'module' ); ( $f = $File::Find::name ) =~ s{CODE}{}o; $source{ $f } = { type => $type, name => $file, filename => $f, LoC => loc( $File::Find::name ), 'package' => pkg( $File::Find::name ) || 'main', libs => uses( $File::Find::name ) || [], subs => subs( $File::Find::name ) || [], version => version( $File::Find::name ), }; } my $packages; for my $file ( keys %source ){ push @{$packages->{ $source{$file}->{package}||'main' }}, $source{ +$file}; } # use Data::Dumper; # print Dumper( $packages ); # print Dumper( \%source ); print q{ <html> <head> <style> body { margin: 5% 10% 5% 10%; background-color: white; font-size: 130%; } div.system { background-color: #EEE; padding: 10 20 10 20; } p.title{ font-size: 180%; font-weight: bold; text-align: center; } div.package{ margin: 5 5 0 5; padding: 0 5 5 5; border: solid thin black; } div.file{ margin: 5 10 5 10; padding: 5 5 5 5; background-color: #DDD; } </style> <title>Relat&oacute;rio de An&aacute;lise do C&oacute;digo Fonte</ +title> </head> <body> <div class="system"> <p class="title">Relat&oacute;rio de An&aacute;lise do C&oacut +e;digo Fonte</p> }; my $total_loc = 0; my $total_subs = 0; my $total_packages = scalar keys %$packages; my $total_files = scalar keys %source; map { map { $total_loc += $_->{LoC} } @$_ } values %$packages; map { map { $total_subs += scalar @{$_->{subs}} } @$_ } values %$packa +ges; print qq{<p>Total pacotes: $total_packages.<br>Total Arquivos: $total_ +files.<br>Total linhas: $total_loc.<br>Subrotinas: $total_subs.</p>\n +\n}; foreach my $pack( sort keys %$packages ){ my @files = sort { $a->{name} cmp $b->{name} } @{$packages->{$pack +}}; my $package = $files[0]->{package} || 'main'; print qq{<div class="package"> <a name="#package_${package}">\n<p> +Pacote <code&gt;${package}</code&gt; </p>\n\n}; foreach my $f ( @files ){ $f->{filename} =~ s/\'\"//og; $f->{name} =~ s/\'\"//og; print qq{<div class="file"> <a name="#file_$f->{name}">\n<p>Arquiv +o <code&gt;$f->{filename}</code&gt; </p>\n\n<ul>}; print qq{<li>Tipo: <code&gt;$f->{type}</code&gt;</li>\n}; print qq{<li>Linhas de C&oacute;digo: $f->{LoC}</li>\n}; print qq{<li>Vers&atilde;o: $f->{version}</li>\n}; if( scalar @{$f->{libs}} ){ print qq{<li>Depend&ecirc;ncias:\n<ul>\n}; foreach my $lib( sort @{$f->{libs}} ){ $lib =~ s/\'\"//og; if( exists $$packages{$lib} || grep /\L$lib\E/, map lc, keys % +$packages ){ print qq{<li> <a href="#package_$lib">$lib</a></li>\n}; }else{ print qq{<li>$lib</li>\n}; } } print qq{</ul>\n}; } if( scalar @{$f->{subs}} ){ print qq{<li>Subrotinas Implementadas:\n<ul>\n}; foreach my $sub( sort @{$f->{subs}} ){ print qq{<li><code&gt;$sub()</code&gt;</li>\n}; } print qq{</li>\n}; } print qq{</ul></div> <!-- class file -->\n}; } print qq{</div> <!-- class package -->\n}; }


I know this is not the Best Way To Do It, but I'm sure its working. The output is kind of nicely formatted HTML linking modules dependencies inside my source code.

Next Steps

My next step would be mapping every subrotine call so I can follow calls along the modules and determine real modules dependencies between modules. It seems that the old developer think its easy just cut'n'paste all possible use Module; from the system than decide what modules are really needed for a given module or program.

Looking For

In sintesys, I'm looking for suggestions about what kind of strategy I should use to parse all this Perl code (about 1.85Mb of ascii text) and build the information I need about it in a few hours (this is all I have now, sorry).


My limitations: I have no access to other modules than not the default in ActivePerl 5.6.1 (this is internal politics: I can't use anything not previously checked by security folks, including open-source tools). This is kind of a problem, given that the ActivePerl 5.6.1 lacks the IniFile module, needed to parse the original system config files.

A Possible Start Point

Talking about a start point, maybe I should scan a module file looking for subs, and inside each sub, look for barewords (this will probably lead me to a better chance of having a subrotine call). After that, I need to scan the module for the candidate subrotine name and if I can find it, store this information. If there is no subrotine declaration for the given bareword, I need to scan modules listed on @ISA, in order, for the same thing, recursivelly until I find the sub declaration or run out of modules to scan.


Of course, it seems to me that this could take a big amount of time. Maybe there are better approaches for this problem...


Fixed a little issue about a misplaced <code> tag in the article; Many thanks to jasonk for pointing this.

wazoox just told me that the '<' signs at my code was errouneously translated to '&lt;'. Thank you for that.

In reply to Reverse Engineering Perl Using... Perl. by monsieur_champs

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others having an uproarious good time at the Monastery: (8)
    As of 2018-06-20 07:12 GMT
    Find Nodes?
      Voting Booth?
      Should cpanminus be part of the standard Perl release?

      Results (116 votes). Check out past polls.