PerlMonks  

exports -- which module exports are used?

by tye (Cardinal)
on Sep 16, 2013 at 16:26 UTC (#1054312=CUFP)

use POSIX;

I hate that line of code. It imports over 500 symbols, the vast majority of which surely aren't being used. But the real crime is the bad documentation it provides.

use POSIX qw< ceil floor >;

That is a reasonable line of Perl code. Now, when somebody is reading the file of code that contains it and runs across "floor( ... )", they don't have to rely on having memorized even a tiny fraction of the hundreds of possible exports from POSIX.pm in order to figure out that "perldoc POSIX" will tell them what floor() does.

And, if they find that 'floor' and 'ceil' (and 'POSIX') are not mentioned anywhere else in the code, then they can remove the whole "use POSIX" line. That can be important information when refactoring code.
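One way to see what an explicit import list changes is to check which names actually end up in the calling package. The sketch below assumes only core Perl and POSIX.pm. It imports ceil and floor explicitly and shows that strftime, which a bare "use POSIX;" would also have imported, is still available but only as POSIX::strftime.

```perl
use strict;
use warnings;
use POSIX qw< ceil floor >;    # import only the two functions this file calls

# Only the listed names were installed in this package...
my $has_floor    = defined &main::floor;
my $has_ceil     = defined &main::ceil;
# ...while every other POSIX export stays behind the POSIX:: prefix.
my $has_strftime = defined &main::strftime;

printf "floor(2.7) = %d, ceil(2.1) = %d\n", floor(2.7), ceil(2.1);
printf "strftime imported? %s\n", $has_strftime ? 'yes' : 'no';
print 'Fully qualified call still works: ',
    POSIX::strftime( '%Y', gmtime(0) ), "\n";
```

A reader who sees floor() in such a file can find where it comes from by looking at the "use" lines alone.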

So, when I run into "use POSIX;" in code, what do I do about it? I want to replace it with a line that makes the exports explicit. But searching through several hundred lines of code for usages of any of hundreds of symbols is beyond my abilities. So I wrote a simple Perl script:

> exports
Usage: exports [-a] [ Perl::Module [...] ] [ file [...] ]
 Writes out what each listed module by-default exports
 or reports all uses of those exports in the listed files.
 If no module names are listed, then searches each file for
 cases of 'use Perl::Module;' and suggests replacements.
 -a: Searches for *any* exports, not just default ones.
> grep POSIX ASAP/Client.pm
ASAP/Client.pm:use POSIX;
> exports POSIX ASAP/Client.pm
ASAP/Client.pm:
   107: strftime("%Y-%m-%dT%T$fs Z", gmtime($sec));
        strftime
# use POSIX qw< strftime >;

Now I can replace that horrible line of code with the suggested reasonable line of code!

Of course, POSIX.pm is not the only module that has default exports. The really useful mode for exports is to just give it a list of file names:

> exports bin/mktestcalls
bin/mktestcalls:
  1197: openlog( 'mktestcalls', 'pid', 'local3' );
        openlog
  1204: GetOptions(\%opt,
        GetOptions
  1237: -H pretend to be hostname
                         hostname
  1293: my $HOSTNAME = $opt{H} || hostname();
                                  hostname
  1590: out_server => hostname(),
                      hostname
  2229: eval{ syslog( 'debug', $msg ) };
              syslog
# use Asterisk::AGI();  # No default exports
# use Socket();         # Not used?
# use Sys::Hostname     qw< hostname >;
# use Sys::Syslog       qw< openlog syslog >;
# use Getopt::Long      qw< GetOptions >;

Which gives me nice replacements for much of:

> grep use bin/mktestcalls
use strict;
use Asterisk::AGI;
use List::Util 'shuffle';
use Socket;
use Sys::Hostname;
use Sys::Syslog;
use Sys::SigAction 'set_sig_handler';
use Getopt::Long;
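The modules in the suggestions above are exactly the ones whose "use" line has no import list. Here is a minimal sketch of that selection, applied to the "use" lines just listed. It uses the same pattern as FindUsedModules() in the full code below, without the -a variant. The @use_lines array is simply copied from the grep output.

```perl
use strict;
use warnings;

# The "use" lines from bin/mktestcalls, as listed by grep above.
my @use_lines = (
    q{use strict;},
    q{use Asterisk::AGI;},
    q{use List::Util 'shuffle';},
    q{use Socket;},
    q{use Sys::Hostname;},
    q{use Sys::Syslog;},
    q{use Sys::SigAction 'set_sig_handler';},
    q{use Getopt::Long;},
);

# Keep only "use Module;" lines (no import list), skipping strict.
my @bare;
for (@use_lines) {
    push @bare, $1 if /^\s*use\s+([\w:]+)\s*;/ && $1 ne 'strict';
}
print "$_\n" for @bare;
```

Lines such as "use List::Util 'shuffle';" already name their imports, so they are left alone.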

If you have made changes to some code and now you aren't sure if the "use List::Util qw< shuffle max >;" line is still accurate, then you just use exports' "-a" option:

> exports -a List::Util lib/Track.pm
lib/Track.pm:
  1768: push @dial_servers, shuffle(@servers);
                            shuffle
  1820: $maxto = max(
                 max
  1821: min(
        min
  1867: # S() = max call duration in seconds
                max
# use List::Util qw< max min shuffle >;

There are also several other ways to use this script. You can just give it a list of module names and it will tell you what they export by default:

> exports File::Basename File::Glob
File::Basename 2.78:
 fileparse
 fileparse_set_fstype
 basename
 dirname
File::Glob 1.07:

Or what they can export explicitly:

> exports -a File::Basename File::Glob
File::Basename 2.78:
 fileparse
 fileparse_set_fstype
 basename
 dirname
File::Glob 1.07:
 csh_glob
 bsd_glob
 glob
 GLOB_ABEND
 GLOB_ALPHASORT
 GLOB_ALTDIRFUNC
 GLOB_BRACE
 GLOB_CSH
 GLOB_ERR
 GLOB_ERROR
 GLOB_LIMIT
 GLOB_MARK
 GLOB_NOCASE
 GLOB_NOCHECK
 GLOB_NOMAGIC
 GLOB_NOSORT
 GLOB_NOSPACE
 GLOB_QUOTE
 GLOB_TILDE
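The two listings differ in which Exporter.pm package variable they read. @EXPORT holds the names imported by a bare "use Module;". @EXPORT_OK holds names that are imported only on request. GetExports() in the full code reads these variables. Below is a small sketch of reading them through symbolic references. export_lists() is a hypothetical helper name. Unlike GetExports(), it does not call import() first, and that step matters for modules such as POSIX that fill in @EXPORT late.

```perl
use strict;
use warnings;
use File::Basename ();    # load the modules without importing anything
use List::Util ();

# Returns copies of a package's @EXPORT and @EXPORT_OK (hypothetical helper).
sub export_lists {
    my ($pkg) = @_;
    no strict 'refs';
    return ( [ @{"${pkg}::EXPORT"} ], [ @{"${pkg}::EXPORT_OK"} ] );
}

for my $pkg (qw( File::Basename List::Util )) {
    my ( $default, $ok ) = export_lists($pkg);
    print "$pkg by default: @$default\n";
    print "$pkg on request: @$ok\n";
}
```

List::Util exports nothing by default, which is why it appears in "-a" mode above with an explicit "qw< max min shuffle >" list.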

Consider one more short example:

> exports POSIX exports
exports:
   121: or die "Can't rewind handle to $file: $!\n";
                      rewind
# use POSIX qw< rewind >;

I chose this example to point out a couple of things. One is that exports makes no attempt to parse Perl code when looking for uses of exports. So it will point out places where you are just using the same "word" as some export from one of the chosen modules, even if that "word" is in a quoted string and so can't be a use of that export. I could use something that has a good reputation for successfully parsing Perl code. But I find that the false positives are quite few in most cases, and I prefer them to the potential for false negatives (which are much more serious) in the rare case of a parsing failure. It can also often be useful to find mentions of exports in strings or comments (sometimes I found outdated logging and outdated comments).
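Here is what the "word" matching looks like concretely. For a plain function export, MatchWords() in the full code builds a pattern that rejects a preceding sigil or word character and a following word character. It knows nothing about strings or comments. The standalone sketch below copies that pattern and shows which lines it would report for rewind. word_pattern() is a hypothetical name.

```perl
use strict;
use warnings;

# Same shape as MatchWords() uses for a plain function export:
# not preceded by a sigil or word character, not followed by a word character.
sub word_pattern {
    my ($name) = @_;
    return qr/(?<![\$\@%\w])\Q$name\E(?!\w)/;
}

my $re    = word_pattern('rewind');
my @lines = (
    q{or die "Can't rewind handle to $file: $!\n";},    # in a string: reported
    q{my $rewind = 1;},                                  # a scalar: not reported
    q{rewinding();},                                     # longer word: not reported
    q{# rewind the handle first},                        # in a comment: reported
);
my @matched = map { $_ =~ $re ? 1 : 0 } @lines;
printf "%-5s %s\n", ( $matched[$_] ? 'match' : '-' ), $lines[$_] for 0 .. $#lines;
```

The two "false positives" come from text that a real parser would skip. That is the trade-off described above.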

The other thing I wanted to point out is less important: how exports decides whether an argument is the name of a module or the name of a file of Perl code. In my experience, it always just gets it right.

If you had a file named simply "POSIX" in your current directory when you ran the above command, then the "POSIX" command-line argument would not be interpreted as a module name. Conversely, having a file named "my::script" won't prevent exports from trying to do "require my::script;" if you run the command "exports my::script".

That is, a string of more than one \w+ separated by "::" (and containing nothing else) is always assumed to be a module name. Otherwise, if an argument contains any \W character (such as '.' or '/'), it is always assumed to be a file name. For the remaining case (/^\w+$/), exports checks whether a file with that name exists and, if none is found, treats the argument as a module name.

'-' causes STDIN to be read. Also, all module names must come before any file names.
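The classification rules above can be tried on their own. The sketch below restates them as a hypothetical classify() function that mirrors IsModName() in the full code. It returns 2 for a module name, 1 for a probable module name, and 0 for a file name. The sketch runs classify() in an empty temporary directory, first before and then after creating files named "POSIX" and "my::script". It assumes a Unix-like filesystem where "::" is allowed in file names.

```perl
use strict;
use warnings;
use Cwd qw( getcwd );
use File::Temp qw( tempdir );

# Returns 2 (module name), 1 (probably a module name) or 0 (file name).
sub classify {
    my ($name) = @_;
    return 2 if $name =~ /::/ && $name !~ /[^\w:]/;    # Foo::Bar: always a module
    return 1 if $name !~ /\W/ && !-e $name;             # bare word, no such file
    return 0;                                           # anything else: a file
}

my $start_dir = getcwd();
my $dir       = tempdir( CLEANUP => 1 );
chdir $dir or die "Can't chdir to $dir: $!\n";

my %before;
$before{$_} = classify($_) for qw( POSIX my::script lib/Track.pm );

# Create empty files with two of those names in the current directory.
for my $name (qw( POSIX my::script )) {
    open my $fh, '>', $name or die "Can't create $name: $!\n";
    close $fh;
}
my %after;
$after{$_} = classify($_) for qw( POSIX my::script );

chdir $start_dir or die "Can't chdir back to $start_dir: $!\n";

print "$_ => $before{$_}\n" for sort keys %before;
print "$_ (file exists) => $after{$_}\n" for sort keys %after;
```

Once a file named "POSIX" exists, that argument flips to being a file name. "my::script" stays a module name either way.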

Lastly, the "exports POSIX exports" run above is a nice, short example of the output that you get. You get the file name, followed by each line of code from that file where an export is mentioned. Each of those lines is underlined (with a repeat of each export found) to highlight the exports. Then you get comment lines showing how you should probably import from each module. This pattern of output repeats for each file.
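The underline is built from the match offsets that Perl stores in @- and @+. Below is a minimal sketch of that technique in the spirit of ReportExportUse() in the full code. underline() is a hypothetical helper, and the pattern is written by hand for max and min rather than generated.

```perl
use strict;
use warnings;

# Repeat each match directly beneath where it occurs in $line,
# using $-[0] / $+[0] (start / end offsets of the last match).
sub underline {
    my ( $line, $re ) = @_;
    my $under = '';
    while ( $line =~ /$re/g ) {
        my ( $start, $end ) = ( $-[0], $+[0] );
        $under .= ' ' x ( $start - length $under );    # pad up to the match
        $under .= substr( $line, $start, $end - $start );
    }
    return $under;
}

my $re    = qr/(?<![\$\@%\w])(?:max|min)(?!\w)/;
my $code  = 'my $x = max( min( $a, $b ), $c );';
my $under = underline( $code, $re );

# "%6d: " is 8 characters wide, so an 8-character pad keeps the columns aligned.
printf "%6d: %s\n%8s%s\n", 1820, $code, '', $under;
```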

Here's an example that shows that only exports are dealt with:

> exports say
say:
# use Encode(); # Not used?

The suggestion to change "use Encode;" to "use Encode();" is accurate. However, the "# Not used?" is only accurate in that it means that no exports from Encode.pm are used. The module itself is being used:

> grep Encode say
use Encode;
} elsif( Encode::is_utf8($_) ) {
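An empty import list still loads the module, and every fully qualified call keeps working. That is why "use Encode();" is the right suggestion here. A short sketch, assuming only the core Encode module:

```perl
use strict;
use warnings;
use Encode ();    # load Encode.pm but import nothing

my $smiley          = "\x{263A}";                        # a character above 255
my $is_utf8         = Encode::is_utf8($smiley);          # fully qualified call works
my $bytes           = Encode::encode( 'UTF-8', $smiley );  # so does this one
my $imported_encode = defined &main::encode;             # false: nothing was imported

printf "is_utf8: %s, UTF-8 length: %d bytes, encode() imported: %s\n",
    ( $is_utf8 ? 'yes' : 'no' ), length($bytes), ( $imported_encode ? 'yes' : 'no' );
```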

Here is the full code for exports:

#!/usr/bin/perl -w
use strict;

my $Any = 0;    # If -a was given.

Main( @ARGV );
exit;

sub Usage {
    warn @_, $/ if @_;
    die "Usage: exports [-a] [ Perl::Module [...] ] [ file [...] ]\n",
        " Writes out what each listed module by-default exports\n",
        " or reports all uses of those exports in the listed files.\n",
        " If no module names are listed, then searches each file for\n",
        " cases of 'use Perl::Module;' and suggests replacements.\n",
        " -a: Searches for *any* exports, not just default ones.\n",
        ;
}

sub IsModName {
    local( $_ ) = @_;
    return 2    # Looks like 'Foo::Bar'; assume module name.
        if /::/ && ! /[^\w:]/;
    return 1    # Just \w chars and not a file; perhaps a module name.
        if ! /\W/ && ! -e;
    # Contains a non-module character (like '.') or is a file; assume file name:
    return 0;
}

sub ParseArgs {
    my( $mods_av, $files_av, @args ) = @_;
    Usage() if ! @args;
    while( @args ) {
        last if $args[0] !~ /^--?[^-]/;
        local $_ = shift @args;
        if( /^-a/ ) {
            $Any = 1;
        } else {
            Usage( "Unrecognized option: $_" );
        }
    }
    shift @args if '--' eq $args[0];
    while( @args ) {
        last if ! IsModName( $args[0] );
        push @$mods_av, shift @args;
    }
    while( @args ) {
        my $isMod = IsModName( $args[0] );
        die sprintf "Put all module names (%s) before all file names (%s)\n",
            $args[0], $files_av->[-1]
            if 2 == $isMod;
        if( '-' ne $args[0] ) {
            my $isFile = -e $args[0];
            die "Can't find file ($args[0]): $!\n" if ! defined $isFile;
            die "Not a file: $args[0]\n" if ! $isFile || -d _;
        }
        push @$files_av, shift @args;
    }
}

# Returns the list of symbols exported by the given module:
sub GetExports {
    my( $package ) = @_;
    eval { $package->import() };    # POSIX doesn't populate @EXPORT early
    my @exports = do { no strict 'refs'; @{ "${package}::EXPORT" } };
    if( $Any ) {
        no strict 'refs';
        push @exports, @{ "${package}::EXPORT_OK" };
    }
    s/^&// for @exports;    # '&foo' and 'foo' are the same to Exporter.pm
    my %seen;
    @exports = grep ! $seen{$_}++, @exports;    # Remove duplicates
    return @exports;
}

sub PrintExports {
    my( $mod ) = @_;
    my @exports = GetExports( $mod );
    my $pref = '';
    # if( -t STDOUT ) {
        my $version = $mod->VERSION();
        if( $version ) {
            print "$mod $version:\n";
        } else {
            print "$mod:\n";
        }
        $pref = ' ';
    # }
    print "$pref$_\n" for @exports;
}

sub SearchFile {
    my( $file, @mods ) = @_;
    my $fh;
    if( '-' eq $file ) {
        $fh = \*STDIN;
    } else {
        open $fh, '<', $file or die "Can't read $file: $!\n";
    }
    @mods = LoadModules( FindUsedModules( $fh ) ) if ! @mods;
    if( ! @mods ) {
        my $default = $Any ? '' : ' default';
        print "No$default imports: $file\n";
        return;
    }
    $. = 0;
    seek $fh, 0, 0 or die "Can't rewind handle to $file: $!\n";
    print "$file:\n";
    ReportExportUse( $fh, @mods );
}

sub MatchWords {
    my( @exports ) = @_;
    my @res;
    for( @exports ) {
        if( s/^\$// ) {
            push @res, '\$' . "\Q$_" . '(?![\[\{\w])';
        } elsif( s/^\%// ) {
            push @res, '%' . "\Q$_" . '\b';
            push @res, '\$' . "\Q$_" . '\{';
        } elsif( s/^@// ) {
            push @res, '\@' . "\Q$_" . '\b';
            push @res, '\$' . "\Q$_" . '\[';
        } else {
            push @res, '(?<![\$\@%\w])' . "\Q$_" . '(?!\w)';
        }
    }
    return join '|', @res;
}

sub ReportExportUse {
    my( $fh, @mods ) = @_;
    my( @exports, %export_mod, %conflict );
    GroupExports( \( @exports, %export_mod, %conflict ), @mods );
    my %mod_export;
    my $inuse = 0;
    if( @exports ) {
        my $match = MatchWords( @exports );
        $match = qr/$match/;
        local $_;
        while( <$fh> ) {
            my $underline = '';
            my $line = $_;
            if( $inuse ) {
                next if ! s/^([^;]*;)/ ' ' x length($1) /e;
                $inuse = 0;
            } elsif( $Any ) {
                $inuse = 1
                    if s/^(\s*use\s+[\w:]+[^;]*(;?))/ ' ' x length($1) /e
                    && ! $2;
            }
            while( /$match/g ) {
                my( $start, $end ) = ( $-[0], $+[0] );
                my $export = substr( $_, $start, $end - $start );
                s/\$(.*)\[$/\@$1/, s/\$(.*)\{$/\%$1/, for $export;
                my $len = length($export);
                $underline .= ' ' x ( $start - length($underline) );
                $underline .= $export;
                my $mod = $export_mod{$export};
                if( $mod ) {
                    $mod_export{$mod}{$export}++;
                } else {
                    warn "Can't find module that exports '$export'\n";
                }
            }
            printf "%6d: %s%8s%s\n", $., $line, '', $underline
                if $underline;
        }
    }
    for my $mod ( @mods ) {
        my @used = sort keys %{ $mod_export{$mod} };
        if( @used ) {
            Print( "# use $mod\tqw< @used >;\n" );
        } elsif( $export_mod{''}{$mod} ) {
            my $default = $Any ? '' : ' default';
            Print( "# use $mod();\t# No$default exports\n" );
        } else {
            Print( "# use $mod();\t# Not used?\n" );
        }
        my $hv = $conflict{$mod};
        for my $prev ( keys %$hv ) {
            my @e = sort grep { $mod_export{$prev}{$_} } keys %{ $hv->{$prev} };
            print "# Also (see $prev): @e\n" if @e;
        }
    }
}

# Expands tab characters ("\t"s) then prints:
sub Print {
    my @strings = @_;
    my $pos = 0;
    for( @strings ) {
        my $plus = 0;
        s{\t}{
            my $total = $pos + $plus + pos() - 1;
            my $pad = 9 - $total % 8;
            $pad += 8 if $total < 16;
            $pad += 8 if $total < 8;
            $plus += $pad - 1;
            ' ' x $pad
        }gex;
        $pos += length;
    }
    print @strings;
}

# Note duplicate exports and assign each export to only one module:
sub GroupExports {
    my( $exports_av, $export_mod_hv, $conflict_hv, @mods ) = @_;
    for my $mod ( @mods ) {
        my @e = GetExports( $mod );
        if( ! @e ) {
            $export_mod_hv->{''}{$mod} = 1;
            next;
        }
        for my $export ( @e ) {
            my $prev = $export_mod_hv->{$export};
            if( $prev ) {
                $conflict_hv->{$mod}{$prev}{$export} = 1;
            } else {
                push @$exports_av, $export;
                $export_mod_hv->{$export} = $mod;
            }
        }
    }
}

# Find used modules, either all or just those with no arguments given:
sub FindUsedModules {
    my( $fh ) = @_;
    my @mods;
    local $_;
    while( <$fh> ) {
        if(    /^\s*use\s+([\w:]+)\s*;/
            || $Any && /^\s*use\s+([\w:]+)\b/
        ) {
            push @mods, $1 if 'strict' ne $1;
        }
    }
    return @mods;
}

# Returns names of modules successfully loaded ("require"d):
sub LoadModules {
    return grep {
        ( my $file = $_ ) =~ s-::-/-g;
        $file .= ".pm";
        if( ! eval { local $_; require $file; 1 } ) {
            # ... trim error message ...
            warn "$_: $@\n";
            0    # Ignore further work for this module
        } else {
            1    # Keep this module for further work
        }
    } @_;
}

sub Main {
    my( @args ) = @_;
    ParseArgs( \my( @mods, @files ), @args );
    exit 1 if @mods != LoadModules( @mods );    # If some modules not found.
    if( ! @files ) {
        # Just list each module and its exports:
        PrintExports( $_ ) for @mods;
    } else {
        # Search file(s) for uses of exports:
        for my $file ( @files ) {
            SearchFile( $file, @mods );
        }
    }
}

- tye        

Re: exports -- which module exports are used?
by toolic (Chancellor) on Sep 16, 2013 at 18:13 UTC
    This is very useful. I ran it on a bunch of my scripts, and the results look great.

    One of my scripts uses XML::Tidy, and gave me an unexpected result:

    # use XML::Tidy(); # Not used?

    So, I did some more poking at it:

    $ perl -MXML::Tidy -le 'print $XML::Tidy::VERSION'
    1.12.B55J2qn
    $ exports XML::Tidy
    Invalid version format (non-numeric data) at exports line ...

    It points to the my $version = $mod->VERSION(); line. I'm not sure if there is anything that can be done about modules that use icky versions.

      I assumed $mod->VERSION() was well behaved and hoped it would handle some weird new practices that don't appear to set $Module::Name::VERSION. But its failure to cope doesn't really surprise me. I'll keep using it, but detect failure and fall back to just copying the $VERSION global.
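The fallback described here could look something like the sketch below. ModVersion() is a hypothetical helper. Odd::Versioned and Plain::Versioned are made-up packages: the first stands in for a module with a non-standard version string like the 1.12.B55J2qn reported above, and the second has an ordinary version.

```perl
use strict;
use warnings;

# Made-up packages: one with an odd version string, one with a normal one.
{ package Odd::Versioned;   our $VERSION = '1.12.B55J2qn'; }
{ package Plain::Versioned; our $VERSION = '2.05'; }

# Ask UNIVERSAL::VERSION first; if it dies on a version string it cannot
# parse, fall back to copying the package's $VERSION global as-is.
sub ModVersion {
    my ($mod) = @_;
    my $version = eval { $mod->VERSION() };
    if ( !defined $version ) {
        no strict 'refs';
        $version = ${"${mod}::VERSION"};
    }
    return $version;
}

print "$_: ", ModVersion($_), "\n" for qw( Odd::Versioned Plain::Versioned );
```

In PrintExports(), "my $version = ModVersion( $mod );" would then replace the direct $mod->VERSION() call.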

      Though I don't see how this problem could be to blame for your "Not used?" result. As I mentioned above, that line only means "No exports used". If your code doesn't mention any of the *_NODE constants(?) that XML::Tidy exports, then that is the expected / desired result (and the empty parens in "use XML::Tidy();" are recommended to document that no exports are being used).

      - tye        

        I just use the canned bin/xmltidy script that comes packaged with the module, and it does not use any of those constants. Thanks for the clarification.
Re: exports -- which module exports are used?
by QM (Vicar) on Sep 17, 2013 at 08:48 UTC
    Thanks tye, this is really great.

    Now, just thinking about the next step...

    I noticed in your examples that it sometimes matches export identifiers that are in comments, e.g.,

    1867: # S() = max call duration in seconds
                  max

    In this case it's not even valid code. I'm wondering if there's a more robust approach, perhaps using B::Deparse or the like? Granted, that will probably be an order of magnitude more work, and it would really deserve a CPAN home. (And I'm probably not the one to do it =)

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      I mentioned this in my write-up. So far I actually like seeing the false positives (except sometimes for POSIX.pm because it exports such a huge number of simple English words) because sometimes they call out out-of-date comments, out-of-date logging, or commented-out code. Even in the case of POSIX, the overhead of stripping out the false positives by hand is little work, IME.

      I could use PPI or a compiler back-end (such as B::Xref) or an op-tree inspector. But not only would that skip the so-called "false positives" that I'm interested in (as well as the annoying real false positives), but it would also miss uses inside of simple eval $string constructs.

      But I am considering using a compiler back-end to separate out the unambiguous uses that can be summarized more succinctly and keep the current output format for only the "likely false positives". B::Xref sounds almost tailor made for this type of application.

      However, after running B::Xref on a couple of example modules and scripts, I found that not only did its output contain a lot of information that is not applicable (which wasn't a surprise), but I found a shocking 0% of the desired information in the output. I also found quite a bit of information that looked simply invalid.

      Now having looked at B::Xref more extensively, I suspect that I just couldn't find the relevant information amid the huge volume of uninteresting or bizarre information (but I also didn't do the deeper look on as many examples). Unfortunately, giving it the "-d" option not only eliminated the information it is documented to remove (none of which I was interested in), but reduced the output to a tiny amount of information, excluding everything applicable to this problem. This leaves me a bit wary of the soundness of the module.

      So, picking a single symbol so that I could filter out the irrelevant information, I was able to look closely, and it seemed to find all of the uses of that symbol. Unfortunately, the line numbers are only accurate to the "statement" level, which makes lining these up with the string matches I already find significantly more challenging. But if I only show the "false positives" for symbols that have no known positives, then that is probably workable.

      I also hope to evaluate PPI in relation to this. I have other nefarious uses I hope to execute via PPI, after all.

      However, the work required is significant enough and the potential gain minor enough, that I won't be surprised if I never get around to producing any useful improvements. I only have so many files of Perl code to run this against and the major benefit is from the first run. I never had a need for this tool for my own code.

      But I do plan to put this on github which will make it easy for others to cooperate in extending it however they see fit.

      - tye        

Re: exports -- which module exports are used?
by goldenblue (Initiate) on Oct 02, 2013 at 16:14 UTC
    this is really awesome! Soo much cleanup to do...
    Thank you!

    -- the singularity will happen.
Re: exports -- which module exports are used?
by toolic (Chancellor) on Oct 03, 2013 at 14:41 UTC
    Minor issue: I think the line numbers that are reported are incorrect:
    $ cat foo.pl
    #!/usr/bin/env perl
    use Carp;
    carp('boo'); # This is line 3
    $ exports foo.pl
    foo.pl:
         7: carp('boo'); # This is line 3
            carp
    # use Carp qw< carp >;
    $

    Am I reading that right? It looks like it tells me that the carp call is on line 7, but it is really on line 3. If I make this hack to exports, it seems to give me the line number I expect:

    sub ReportExportUse {
        my( $fh, @mods ) = @_;
        my( @exports, %export_mod, %conflict );
        GroupExports( \( @exports, %export_mod, %conflict ), @mods );
        my %mod_export;
        my $inuse = 0;
        if( @exports ) {
            my $match = MatchWords( @exports );
            $match = qr/$match/;
            local $_;
            local $. = 0;    # <------------- HACK
            while( <$fh> ) {

    Here is my new output:

    foo.pl:
         3: carp('boo'); # This is line 3
            carp
    # use Carp qw< carp >;

    There is probably a better solution, and I suspect it has something to do with the seek.


      Indeed. This is how I've fixed it:

      $. = 0;
      seek $fh, 0, 0 or die "Can't rewind handle to $file: $!\n";

      (Update: Actually, you also have to move that code so that it runs before 'print' (for example) gets called; otherwise $. would be tied to STDOUT.)

      Alternately, you can do:

      {
          my $eof = <$fh>;
          $. = 0;
      }
      seek...

      Or I could just open the file twice.
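The root cause is that seek() rewinds the handle but does not reset the per-handle line counter that $. reports. Below is a small self-contained demonstration of both toolic's symptom and the reset-after-rewind fix, assuming only core Perl and File::Temp. Per perlvar, calling seek() on a handle makes $. refer to that handle's counter, so assigning to $. right after the seek resets the correct counter.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# A throwaway three-line file.
my ( $tmp, $path ) = tempfile( UNLINK => 1 );
print {$tmp} "one\n", "two\n", "three\n";
close $tmp or die "Can't close $path: $!\n";

open my $fh, '<', $path or die "Can't read $path: $!\n";
1 while <$fh>;                    # first pass over the whole file
my $after_first_pass = $.;        # 3

seek $fh, 0, 0 or die "Can't rewind $path: $!\n";
my $line = <$fh>;                 # re-read line 1 without resetting $.
my $without_reset = $.;           # 4, not 1: seek() leaves the counter alone

seek $fh, 0, 0 or die "Can't rewind $path: $!\n";
$. = 0;                           # seek() made $. refer to $fh; reset it
$line = <$fh>;                    # re-read line 1 again
my $with_reset = $.;              # 1

print "$after_first_pass $without_reset $with_reset\n";
```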

      - tye        

        Thanks for the patch.

Node Type: CUFP [id://1054312]
Approved by davido
Front-paged by davido