http://www.perlmonks.org?node_id=1050875

wanna_code_perl has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

With B::Concise, it's possible to get a syntax tree for specific subroutines, CODE refs, or the top "main" program. However, what I need is the syntax tree (or trees) for an entire module. I.e., I need some kind of dump of Perl's parse of a module, which might include "dead" code that is never called. I would also want to recursively do the same thing for any other modules/sources pulled in with use or require (I'd probably ignore core modules and a few others, but haven't decided yet).

This will be part of a sort of static analysis tool I'm working on. This therefore imposes the constraint that the target module is essentially unknown code, and should be considered 'read-only' in the sense that I can't require the developer to add instrumentation to the code itself.

Also, requiring a recompile of perl is not an option since this code will be widely distributed. So, unfortunately all perl -D flags are a no-go.

One rather ugly hack I explored briefly was to extract all sub names and use/require'd modules from the source and run B::Concise on each result. Beyond the fact that it's a terrible idea, there's no way that I know of to run B::Concise on anonymous subs without having their compiled CODE refs handy, and in general I wouldn't have them and couldn't obtain them. (Whenever sub { ... } shows up in the code, the entire sub just shows up as a single anoncode line.)

Is there a way to do what I'm describing? Efficiency is not high on my list of priorities.

Re: Optree for entire module
by Corion (Patriarch) on Aug 26, 2013 at 07:34 UTC

    In this area of introspection, Perl is quite good. Each package has a symbol-table hash (a "stash") that contains all of its global names, each mapped to a glob. If you iterate over the code slots of these globs, you find all code that is connected to a name in that package.

    This approach will not find anything that has been declared lexically; you can't get at that in a convenient way. For that, you have to look at PadWalker, or hit the module author until they make globally accessible what is in fact a variable with (module) global scope.

        #!perl -wl
        use strict;
        use File::Basename;

        print "Subroutines in File::Basename";
        print $_ for keys %File::Basename::;

    If you want to make this more parametric, the easiest approach is to switch off strict for the section where you go through the namespace:

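        use strict;
        use File::Basename;

        sub dump_keys {
            my ($package) = @_;
            print "Subroutines in $package\n";
            no strict 'refs';    # needed for the symbolic reference to the stash
            # Prints every name in the stash, not only those with a code slot
            print "$_\n" for keys %{"$package\::"};
        }

        dump_keys('File::Basename');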
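The key listing above can be narrowed to names that actually have a defined sub, and each of those can then be handed to B::Concise, which is what the original question was after. The following is only a minimal sketch under some assumptions: named_subs and optree_text are hypothetical helper names, File::Basename stands in for the module under analysis, and anonymous or lexically held subs are still not reached this way.

```perl
use strict;
use warnings;
use B::Concise ();
use File::Basename ();    # stand-in for the module being analysed

# Hypothetical helper: names in $package's stash that have a defined sub.
sub named_subs {
    my ($package) = @_;
    no strict 'refs';    # symbolic references, as in dump_keys
    return grep { defined &{"${package}::$_"} } sort keys %{"${package}::"};
}

# Hypothetical helper: B::Concise rendering of one named sub, as a string.
sub optree_text {
    my ($package, $name) = @_;
    no strict 'refs';
    my $code = \&{"${package}::$name"};
    B::Concise::walk_output(\my $buf);           # capture instead of printing
    B::Concise::compile('-basic', $code)->();    # walk the sub's optree
    B::Concise::walk_output(\*STDOUT);           # restore the default handle
    return $buf;
}

for my $name (named_subs('File::Basename')) {
    print "== File::Basename::$name ==\n", optree_text('File::Basename', $name);
}
```

Passing a CODE ref rather than a name to B::Concise::compile avoids any ambiguity about which package the name is resolved in.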

    I think you can also work your way down the namespace hierarchy by starting at the %:: hash and going to the File:: entry and then to the Basename:: entry, but I find this too much hassle compared to switching off strict for a small block.
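For comparison, the walk down the namespace hierarchy might look roughly like the sketch below. It starts at %main:: (the same hash as %::) and follows the "File::" and "Basename::" entries without switching off strict. stash_for is a hypothetical helper name, and the sketch assumes every intermediate entry is an ordinary glob whose hash slot holds the nested stash.

```perl
use strict;
use warnings;
use File::Basename ();

# Hypothetical helper: return a reference to the stash of $package,
# or nothing if the package does not exist.
sub stash_for {
    my ($package) = @_;
    my $stash = \%main::;
    for my $part (split /::/, $package) {
        # Nested packages live under keys that end in "::"
        my $glob = $stash->{ $part . '::' } or return;
        # The glob's hash slot is the nested package's stash
        $stash = *{$glob}{HASH} or return;
    }
    return $stash;
}

my $stash = stash_for('File::Basename');
print "$_\n" for sort keys %$stash;
```

This stays strict-clean because it only dereferences real globs, at the cost of the extra loop.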