PerlMonks  

Find installed Perl modules matching a regular expression

by toolic (Bishop)
on Sep 15, 2009 at 16:01 UTC (#795418, CUFP)

Here is a handy command-line tool to quickly view installed Perl modules whose name matches a specified regular expression.

Features

  • Perl regular expression syntax, with separate case-sensitive switch.
  • Optional initialization file for faster look-ups.
  • Option to print the module name or the full directory path to the module file.
  • Option to display duplicate modules and other statistics.
  • Uses only core modules.
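As a rough illustration of the first feature, here is a minimal sketch of how a pattern could be compiled. It mirrors what parse_args() in the script below does; the helper name build_regex is hypothetical and not part of the script.

```perl
use strict;
use warnings;

# Compile a user-supplied pattern the way the script's parse_args() does:
# "::" becomes "/" so module-style names match relative file paths, and
# matching is case-insensitive unless $case_sensitive is true.
# build_regex is a hypothetical helper name used only for this sketch.
sub build_regex {
    my ($pattern, $case_sensitive) = @_;
    $pattern = '.' unless defined $pattern && length $pattern;   # no pattern: match everything
    $pattern =~ s{::}{/}g;
    return $case_sensitive ? qr/$pattern/ : qr/$pattern/i;
}

my $insensitive = build_regex('file::find');
my $sensitive   = build_regex('file::find', 1);

print "insensitive match\n" if     'File/Find' =~ $insensitive;
print "sensitive mismatch\n" unless 'File/Find' =~ $sensitive;
```

Converting "::" to "/" before compiling means the same regex can be applied directly to paths relative to an @INC directory.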

Other well-known methods

So... why another way to do it?

Simply put: I could not easily convince these other tools to Do What I Want, as quickly as I want, in (what I consider) a bug-free manner. Obviously, this is not a new idea; it is merely a different implementation. There are many threads here at the Monastery, as a Super Search would reveal, and I have probably read every node of every thread on the topic. I believe HTML::Perlinfo does everything this script does (and much, much more), except that I could not easily figure out how to generate output as simple text, rather than HTML. I consider HTML::Perlinfo to be a valuable companion to this script. I run a daily cronjob to dump out recent versions of both HTML and text.

Impatience

In my opinion, the biggest advantage here is the fast look-up capability. No matter how you slice it, you have to search through the @INC directories via some variant of find, which can take a whole minute or so -- I just do not have the patience to wait that long! Maintaining the initialization file avoids that nonsense.
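The look-up idea can be sketched roughly as follows. This is a simplified variant of the init-file handling in the script below, not a copy of it: the helper name load_cache and its return convention are assumptions made for this example, and unlike the script it reports an empty file as a warning instead of dying.

```perl
use strict;
use warnings;

# Read a cached list of module file paths, one per line.
# Returns (\@files, $warning). \@files is undef when the cache is unusable,
# in which case the caller would fall back to walking @INC.
# load_cache is a hypothetical helper name used only for this sketch.
sub load_cache {
    my ($cache_file, $max_age_days) = @_;
    return (undef, undef) unless defined $cache_file && -e $cache_file;

    open my $fh, '<', $cache_file
        or return (undef, "Warning: $cache_file exists, but cannot be opened: $!");
    chomp(my @files = <$fh>);
    close $fh;

    return (undef, "Warning: $cache_file is empty") unless @files;

    # -M gives the file's age in days, measured from script start time
    my $warning = (-M $cache_file) > $max_age_days
        ? "Warning: $cache_file is older than $max_age_days day(s)"
        : undef;
    return (\@files, $warning);
}

my $cache = defined $ENV{HOME} ? "$ENV{HOME}/.findpm" : undef;
my ($files, $warning) = load_cache($cache, 1);
print $files ? scalar(@$files) . " cached paths\n" : "no usable cache; would search \@INC\n";
warn "$warning\n" if $warning;
```

Reading one small file is what makes the look-up fast; the price is that the list is only as fresh as the last time the file was regenerated.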

The code

use warnings;
use strict;
use Getopt::Long;
use Pod::Usage;
use File::Find;

my $print_path;
my $report;
my $re;

parse_args();

# Clean up @INC
my @dirs;
for my $dirname (@INC) {
    if (-d $dirname) {
        next if $dirname eq '.';
        $dirname =~ s{/+}{/}g;
        $dirname =~ s{/$}{};
        push @dirs, $dirname;
    }
}
@dirs = uniq(@dirs);

# For quicker operation, use init file, if it exists
my @files;
my $use_find = 1;
my $message;
my $init_file = exists $ENV{HOME} ? "$ENV{HOME}/.findpm" : '';
if (-e $init_file) {
    if (open my $fh, '<', $init_file) {
        @files = <$fh>;
        close $fh;
        chomp @files;
        my $days = 1;
        if (-M $init_file > $days) {
            $message = "Warning: $init_file is older than $days day\n";
        }
        die "Error: $init_file is empty" if -z $init_file;
        $use_find = 0;
    }
    else {
        $message = "Warning: $init_file exists, but can not be opened: $!";
    }
}

# Otherwise, use the slower find command
if ($use_find) {
    # Find all .pm files under @INC dirs
    my @find_dirs = reduce_dirs(@dirs);
    find(
        {
            wanted   => sub { push @files, $_ if -f $_ and /\.pm$/ },
            no_chdir => 1,
        },
        @find_dirs
    );
    @files = uniq(@files);
}

# Print those modules/files which match the regex
my %mods;
for my $file (@files) {
    my @ds;
    for my $dir (@dirs) {
        if (index($file, $dir) == 0) {
            #print "$d2 is a substring of $d1, starting at pos 0\n"
            push @ds, $dir;
        }
    }
    my $d = (sort {length($b) <=> length($a)} @ds)[0];
    my $rel = substr($file, (length($d)+1));
    my $name = $rel;
    $name =~ s/\.pm$//;
    next unless $name =~ /$re/;
    push @{ $mods{$rel} }, $d;
    if ($print_path) {
        print "$file\n";
    }
    else {
        $rel =~ s/\.pm$//;
        $rel =~ s{/}{::}g;
        print "$rel\n";
    }
}

if ($report) {
    my $num_dups = 0;
    for (keys %mods) {
        $num_dups++ if (scalar(@{$mods{$_}}) > 1);
    }
    if ($num_dups) {
        print "\nDUPLICATES\n";
        for my $rel (keys %mods) {
            if (scalar(@{$mods{$rel}}) > 1) {
                print "$rel\n";
                for my $dir (@{$mods{$rel}}) {
                    print " $dir/$rel\n";
                }
            }
        }
    }
    print "\nSUMMARY\n";
    print " regex = $re\n";
    print " Used '$init_file' init file instead of 'find'\n" unless $use_find;
    print " INC dirs:\n";
    print " $_\n" for @dirs;
    print ' Total ".pm" files = ', scalar @files, "\n";
    print ' Matching unique ".pm" files = ', scalar keys %mods, "\n";
    print ' Matching duplicate ".pm" files = ', $num_dups, "\n";
}

warn $message if $message;

exit;

sub reduce_dirs {
    # Reduce a list of directory names by eliminating
    # names which contain other names. For example,
    # if the input array contains (/a/b/c/d /a/b/c /a/b),
    # return an array containing (/a/b).
    my @dirs = @_;
    my %substring_count = map { $_ => 0 } @dirs;
    for my $x (@dirs) {
        for my $y (@dirs) {
            next if $x eq $y;
            if (index($x, $y) == 0) {
                # if y is substring of x, starting at position 0
                $substring_count{$x}++;
            }
        }
    }
    my @dsubs;
    for (keys %substring_count) {
        push @dsubs, $_ if $substring_count{$_} == 0;
    }
    return @dsubs;
}

sub uniq {
    # From List::MoreUtils, $VERSION = '0.22'
    my %h;
    map { $h{$_}++ == 0 ? $_ : () } @_;
}

sub parse_args {
    my ($help, $sens);
    GetOptions(
        'sens'   => \$sens,
        'path'   => \$print_path,
        'report' => \$report,
        'help'   => \$help
    ) or pod2usage();
    $help and pod2usage(-verbose => 2);
    my $pat = (@ARGV) ? shift @ARGV : '.';
    $pat =~ s{::}{/}g;
    $re = ($sens) ? qr/$pat/ : qr/$pat/i;
    #print "pat=$pat\n";
    #print "re=$re\n";#exit;
    @ARGV and pod2usage("Error: unexpected args: @ARGV");
}

=head1 NAME

B<findpm> - Find installed Perl modules

=head1 SYNOPSIS

findpm [options] [regex]

    Options:
    -help       verbose help
    -path       print out full directory paths also
    -report     print out detailed report
    -sens       case-sensitive [default is case-insensitive]

=head1 DESCRIPTION

Search through the directories in the Perl C<@INC> variable for Perl module
files (all files with a C<.pm> extension) matching a specified regular
expression. The names of all the modules which match will be
printed to STDOUT.

Any directories listed in C<@INC> which do not exist will be silently ignored.
Excludes the current directory (.).

If you are impatient (like I am) you can optionally use an initialization
file instead of letting the script search through all the C<@INC>
directories every time you run the script. The file must be in your home
directory and must be named C<.findpm>. You must create this file yourself
(see EXAMPLES below), and you should keep it up to date. Since you will
get a warning if the init file is more than a day old, I recommend
creating the file using a cron job that runs once a day. If the init file
does not exist, the script will proceed to search C<@INC>.

=head1 ARGUMENTS

=over 4

=item regex

An optional regular expression may be given. The regex may be a simple
string, such as C<foo>, or it may be a more complicated expression, such
as C<^foo.*bar\d>. The regex syntax is Perl; it should not be confused
with shell wildcard syntax or the syntax for other common Unix utilities,
such as I<sed> or I<grep>. It is best to quote the regex to prevent
interaction with the shell. Do not include the C<.pm> extension as part
of the regex. If no regex is given, find all modules.

=back

=head1 OPTIONS

All options can be abbreviated.

=over 4

=item sens

By default, the regular expression is case-insensitive. So, if the input
regex is C<foo>, it will match C<foo> as well as C<FOO> and C<Foo>, etc.
To use case-sensitive matching, use the C<-sens> option.

    findpm -sens foo

=item path

By default, only the module name is printed. To instead print the full
directory path to the module file, use the C<-path> option.

    findpm -path foo

=item report

To print out additional statistics, use the C<-report> option.
This will show the total number of matching modules, duplicate modules,
etc.

    findpm -report

=item help

Show verbose usage information.

=back

=head1 EXAMPLES

Find xml modules:

    findpm xml

Find modules with case-sensitive "Ext":

    findpm -sens Ext

Find modules like File::Find. The following are equivalent because
C<::> will be converted to C</> (similar to I<perldoc>):

    findpm 'file::find'
    findpm 'file/find'

Find all modules in all C<@INC> directories:

    findpm

Create init file:

    rm -f ~/.findpm; findpm -path > /tmp/.findpm; mv /tmp/.findpm ~/.findpm

=head1 CONFIGURATION AND ENVIRONMENT

Searches for an optional initialization file in the directory specified by
the C<HOME> environment variable:

    ${HOME}/.findpm

=head1 LIMITATIONS

The initialization file is only supported for Unix-type operating systems.

=cut

Constructive criticism, suggestions for improvements and bug reports are welcome.

Update: Now only uses core modules.
Update: Avoid potential warning; small change to POD.
Update: find is more portable.

Re: Find installed Perl modules matching a regular expression
by Anonymous Monk on Sep 16, 2009 at 12:06 UTC
    To get rid of the unix limitation you could die without $ENV{HOME}, or use File::HomeDir.

    Here is my caching version, works on ALL systems :)

    echo pml is module names list
    pminst >pml
    echo pmlf is module filenames list
    pminst -l >pmlf
    echo pmlfl is name tabspace filename
    paste pml pmlf >pmlfl

    grep "^CGI::S[^:]*$" pml
    grep "CGI/S[^/]*$" pmlf
    grep -P "^CGI::S\w+$" pml
    grep -P "^CGI::S\w+\t" pmlfl
    perl -lne "print $_ if /^CGI(::\w+)$/" pml
    perl -lne "print $_ if m!CGI/S[^/]*$!" pmlf
    perl -lne "print $_ if /^CGI(::\w+)\t/" pmlfl
    Update: Whoops, I just realized pminst is broken in two ways. This doesn't match:

    D:\>pminst Wx$

    And this matches, but prints an entry prefixed with MSWin32-x86-multi-thread:

    D:\>pminst Wx.pm$
    Wx
    MSWin32-x86-multi-thread::Wx

    D:\>perl -le"print for @INC"
    C:/perl/5.10.1/lib/MSWin32-x86-multi-thread
    C:/perl/5.10.1/lib
    C:/perl/site/5.10.1/lib/MSWin32-x86-multi-thread
    C:/perl/site/5.10.1/lib
    .
      I appreciate the feedback ++
      To get rid of the unix limitation you could die without $ENV{HOME}, or use File::HomeDir.
      You are correct: the reason for my self-imposed unix limitation is that I was unaware of how to handle $ENV{HOME} in a portable way. Thanks for bringing the File::HomeDir module to my attention. For my purposes, I have come to realize that it is important to only use core modules in this script. The original version of the script used the non-core List::MoreUtils. I ran into problems on one system configuration here @work which, unbelievably, did not have it installed. So I could not even analyze what modules were installed because my script died because it could not use a module!

      I will take a look at the File::HomeDir source code to see if I can incorporate its techniques for making findpm portable.
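One core-only possibility, sketched here under stated assumptions: check a few environment variables in turn and build the init file path with File::Spec. The variables beyond HOME (USERPROFILE, HOMEDRIVE plus HOMEPATH) are a guess at what a typical Windows environment provides; this is not taken from the File::HomeDir source, and the helper names guess_home and init_file_path are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Guess the user's home directory using only core features.
# Checking USERPROFILE and HOMEDRIVE/HOMEPATH is an assumption about
# typical Windows environments; it is NOT how File::HomeDir works.
# An optional hash reference can stand in for %ENV (handy for testing).
sub guess_home {
    my %env = @_ ? %{ $_[0] } : %ENV;
    for my $var (qw(HOME USERPROFILE)) {
        return $env{$var} if defined $env{$var} && length $env{$var};
    }
    if (defined $env{HOMEDRIVE} && defined $env{HOMEPATH}) {
        return $env{HOMEDRIVE} . $env{HOMEPATH};
    }
    return;    # nothing usable; the caller decides whether to die or skip the init file
}

# Build the path to the init file, or return undef if no home was found.
sub init_file_path {
    my $home = guess_home(@_);
    return defined $home ? File::Spec->catfile($home, '.findpm') : undef;
}

my $init_file = init_file_path();
print defined $init_file
    ? "init file: $init_file\n"
    : "no home directory found; skipping init file\n";
```

File::Spec is a core module, so this keeps the script free of CPAN dependencies, at the cost of covering fewer platforms than File::HomeDir presumably does.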

      Whoops, I just realized pminst is broken in 2 ways
      I am also aware of 2 bugs in pminst:
      1. It completely misses some modules.
      2. It unnecessarily duplicates some modules in its output. I believe this is the same as the MSWin32-x86-multi-thread issue you mentioned. It does not seem to handle all of the @INC paths gracefully. At first, I was willing to concede that my sysadmins set @INC in an unconventional manner... until you mentioned that it was also an issue for your system.
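For comparison, here is a minimal sketch of the approach findpm takes to avoid the prefixed names: strip the longest @INC directory that is a prefix of the file path. The helper name module_name is hypothetical, and matching on "$dir/" rather than the bare directory string is a small deviation from the posted script, added so that /a/b is not treated as a prefix of /a/bc. It assumes the directories have no trailing slash.

```perl
use strict;
use warnings;

# Turn a .pm file path into a module name by stripping the LONGEST @INC
# directory that prefixes it, so a file under an architecture-specific
# subdirectory is not reported with that subdirectory in its name.
# module_name is a hypothetical helper name used only for this sketch.
sub module_name {
    my ($file, @inc_dirs) = @_;
    my ($root) = sort { length($b) <=> length($a) }
                 grep { index($file, "$_/") == 0 } @inc_dirs;
    return unless defined $root;          # file is not under any given directory
    my $rel = substr $file, length($root) + 1;
    $rel =~ s/\.pm$//;
    $rel =~ s{/}{::}g;
    return $rel;
}

my @inc = qw(
    C:/perl/5.10.1/lib/MSWin32-x86-multi-thread
    C:/perl/5.10.1/lib
);
print module_name('C:/perl/5.10.1/lib/MSWin32-x86-multi-thread/Wx.pm', @inc), "\n";   # prints "Wx"
```

With the shorter directory alone, the same file would come out as MSWin32-x86-multi-thread::Wx, which matches the symptom reported above.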

      I should file a bug report on CPAN. Unfortunately, it is not obvious to me how to patch the code. I guess this is the reason I created the findpm script in the first place.

      Update: Someone has reported a bug: https://rt.cpan.org/Public/Bug/Display.html?id=50644

Re: Find installed Perl modules matching a regular expression
by toolic (Bishop) on Sep 22, 2009 at 20:55 UTC
    I have updated the code to be more portable to other operating systems. The restriction now is that the initialization file is only supported for Unix-type operating systems. But, even that could be fixed by changing a single line in the source code.
      Here is another idea, I would replace
      if (-M $init_file > $days) {
          $message = "Warning: $init_file is older than $days day\n";
      }
      with a check to see if $init_file is more recently modified than a directory in @INC, like
      for my $init_file ( '.', '..' ) {
          my $mod = ( stat $init_file )[9];
          if ( my @mod = grep { ( stat $_ )[9] > $mod } @INC ) {
              warn "Warning: $init_file is older than (", join( ' , ', @mod ), ") ";
          }
      }
      __END__
      Warning: .. is older than (C:/perl/5.10.1/lib/MSWin32-x86-multi-thread , C:/perl/5.10.1/lib , C:/perl/site/5.10.1/lib/MSWin32-x86-multi-thread , C:/perl/site/5.10.1/lib , .) at - line 4.
        Nice idea. However, it does not seem to work if a sub-directory of a directory in @INC has been modified. Since directories under @INC can be arbitrarily deep, it would be necessary to perform a find on all directories, which is what the init file was designed to avoid.

        Perhaps there is a more efficient way to check if any directory has been modified throughout a tree.
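To make the trade-off concrete, here is a rough sketch of the recursive check. It compares the init file's modification time against the newest directory anywhere under the given roots. It does not solve the efficiency problem: it still walks every tree with File::Find, which is exactly the cost the init file is meant to avoid. The helper names newest_dir_mtime and cache_is_stale are hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Return the most recent modification time (epoch seconds) of any directory
# at or below the given roots, or 0 if none of the roots is a directory.
# Note that this still traverses every tree in full.
sub newest_dir_mtime {
    my @roots = grep { -d $_ } @_;
    my $newest = 0;
    return $newest unless @roots;
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                return unless -d $_;
                my $mtime = (stat _)[9];    # reuse the stat done by -d
                $newest = $mtime if defined $mtime && $mtime > $newest;
            },
        },
        @roots
    );
    return $newest;
}

# True if the cache file is missing or older than some directory in the trees.
sub cache_is_stale {
    my ($cache_file, @dirs) = @_;
    my $cache_mtime = (stat $cache_file)[9];
    return 1 unless defined $cache_mtime;
    return newest_dir_mtime(@dirs) > $cache_mtime ? 1 : 0;
}
```

A call such as cache_is_stale("$ENV{HOME}/.findpm", grep { $_ ne '.' } @INC) would catch changes in sub-directories, but only by paying for the full traversal on every run.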
