I agree with the comments in the previous replies, and would add that there is also File::Finder, which provides something more like the command-line interface of the common Unix "find" utility; like File::Find::Rule, this "amendment" to File::Find makes it a lot easier to come up with working code.
But both the ::Finder and ::Rule extensions are just wrappers around the core File::Find module, and all three end up suffering from the same problem relative to using the basic "find" utility -- they are much slower, and this is the main reason why I hate File::Find and anything based on it.
I'd much rather open a pipeline file handle running the "find" command: this utility is either native or freely available for all common OS's, it's pretty easy to use in a perl script via the file handle idiom, and it runs a lot faster -- typically by a factor of six in wallclock time.
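For anyone who hasn't used the pipeline file handle idiom, here is a minimal sketch of it. The helper name find_files is made up for illustration, and it assumes a "find" that supports -print0 (GNU and BSD versions do) and a perl with list-form pipe opens (5.8 or later on Unix-like systems). It is not the code used in the benchmark, just one way the idiom might be written so that odd file names survive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from any module): return the paths of regular
# files under $dir whose names match the shell-style glob $pattern.
sub find_files {
    my ( $dir, $pattern ) = @_;

    # List-form pipe open: the arguments go straight to find, no shell involved.
    # Assumes the installed find supports -print0.
    open( my $find, '-|', 'find', $dir, '-type', 'f', '-name', $pattern, '-print0' )
        or die "can't run find: $!\n";

    local $/ = "\0";      # records are NUL-terminated, not newline-terminated
    my @files = <$find>;
    chomp @files;         # removes the trailing NUL from each path
    close $find or warn "find exited with status $?\n";
    return @files;
}

my $top = @ARGV ? $ARGV[0] : '.';
my @pm  = find_files( $top, '*.pm' );
print scalar(@pm), " .pm files found under $top\n";
```

Reading NUL-terminated records costs nothing extra and keeps file names containing spaces or newlines intact, which a plain line-by-line read of the pipe would not.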
I posted a benchmark on File::Find four years ago, and another on File::Finder two years ago, so here's a new one for File::Find::Rule (using an example from the module's man page). All of these benchmarks show pretty much the same timing difference between the module and the system "find" utility.
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark;
use File::Find::Rule;

( @ARGV == 1 and -d $ARGV[0] )
    or die "Usage: $0 some/path\n";

print "started at ", scalar localtime, $/;

timethese( 10, {
    'Shell-find pipe'  => \&try_pipe,
    'file::Find::Rule' => \&try_ffr,
});

# Collect *.pm files with File::Find::Rule.
sub try_ffr {
    my @f = File::Find::Rule->file()->name( '*.pm' )->in( $ARGV[0] );
    print scalar @f, " .pm files found under $ARGV[0] at ", scalar localtime, $/;
}

# Collect *.pm paths by reading the output of the system "find" through a pipe.
sub try_pipe {
    open( my $find, "find $ARGV[0] -name '*.pm' |" )
        or die "can't run find: $!\n";
    my @f = <$find>;
    close $find;
    print scalar @f, " .pm files found under $ARGV[0] at ", scalar localtime, $/;
}
__END__
# sample run:
$ ffr-bm.pl /usr
started at Sun Jul 16 23:48:33 2006
Benchmark: timing 10 iterations of Shell-find pipe, file::Find::Rule...
481 .pm files found under /usr at Sun Jul 16 23:48:41 2006
481 .pm files found under /usr at Sun Jul 16 23:48:44 2006
481 .pm files found under /usr at Sun Jul 16 23:48:46 2006
481 .pm files found under /usr at Sun Jul 16 23:48:48 2006
481 .pm files found under /usr at Sun Jul 16 23:48:50 2006
481 .pm files found under /usr at Sun Jul 16 23:48:52 2006
481 .pm files found under /usr at Sun Jul 16 23:48:53 2006
481 .pm files found under /usr at Sun Jul 16 23:48:55 2006
481 .pm files found under /usr at Sun Jul 16 23:48:57 2006
481 .pm files found under /usr at Sun Jul 16 23:48:59 2006
Shell-find pipe: 26 wallclock secs ( 0.03 usr 0.04 sys + 8.73 cusr 5.79 csys = 14.59 CPU) @ 142.86/s (n=10)
481 .pm files found under /usr at Sun Jul 16 23:49:19 2006
481 .pm files found under /usr at Sun Jul 16 23:49:41 2006
481 .pm files found under /usr at Sun Jul 16 23:49:59 2006
481 .pm files found under /usr at Sun Jul 16 23:50:14 2006
481 .pm files found under /usr at Sun Jul 16 23:50:29 2006
481 .pm files found under /usr at Sun Jul 16 23:50:44 2006
481 .pm files found under /usr at Sun Jul 16 23:51:02 2006
481 .pm files found under /usr at Sun Jul 16 23:51:24 2006
481 .pm files found under /usr at Sun Jul 16 23:51:42 2006
481 .pm files found under /usr at Sun Jul 16 23:51:57 2006
file::Find::Rule: 178 wallclock secs (33.39 usr + 53.05 sys = 86.44 CPU) @ 0.12/s (n=10)
The output shows that the OS's own caching behavior gives an "unfair advantage" to F::F::R -- the "shell-find pipe" approach ran first and took 7 sec on its first iteration, and less than 3 sec on each of the remaining nine iterations. But even with the OS caching already done, F::F::R still takes between 15 and 22 sec per iteration, and puts a much heavier load on the CPU. (This is with perl, v5.8.6 built for darwin-thread-multi-2level on Mac OS X 10.4.7; I've seen similar results on FreeBSD and Linux.)
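If you want to take the caching effect out of the comparison, one option is an untimed warm-up pass before the timed runs. The sketch below is only an illustration of that idea, not the benchmark I ran: the sub names count_pipe and count_ff are made up, it uses the core File::Find module rather than File::Find::Rule so that it runs without extra installs, and the iteration count of 5 is arbitrary.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Find ();

my $dir = @ARGV ? $ARGV[0] : '.';

# Count *.pm regular files by reading a pipe from the system find utility.
sub count_pipe {
    open( my $find, '-|', 'find', $dir, '-type', 'f', '-name', '*.pm' )
        or die "can't run find: $!\n";
    my $n = () = <$find>;    # number of output lines, one per path
    close $find;
    return $n;
}

# Count the same files with the core File::Find module.
sub count_ff {
    my $n = 0;
    File::Find::find( sub { $n++ if /\.pm\z/ && -f $_ }, $dir );
    return $n;
}

# Untimed warm-up pass, so neither method pays for a cold disk cache.
count_pipe();

timethese( 5, {
    'pipe'       => \&count_pipe,
    'File::Find' => \&count_ff,
});
```

With the warm-up done first, whichever method Benchmark happens to run first no longer absorbs the cost of the initial disk reads.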
If you aren't doing any really big directory trees, and/or you don't care how long it takes, using some version of File::Find is "good enough", but for serious work on a really large directory tree, it's worthwhile to take advantage of Perl's value as a "glue" language (to make efficient use of existing system resources), rather than taking advantage of these particular modules.