
I agree with the comments in the previous replies, and would add that there is also File::Finder, which provides something more like the command-line interface of the common unix "find" utility; like File::Find::Rule, this "amendment" to File::Find makes it a lot easier to come up with working code.

But both the ::Finder and ::Rule extensions are just wrappers around the core File::Find module, and all three end up suffering from the same problem relative to using the basic "find" utility -- they are much slower, and this is the main reason why I hate File::Find and anything based on it.

I'd much rather open a pipeline file handle running the "find" command: this utility is either native or freely available for all common OS's, it's pretty easy to use in a perl script via the file handle idiom, and it runs a lot faster -- typically by a factor of six in wallclock time.
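For anyone who hasn't used the idiom, here is a minimal sketch of what I mean by a pipeline file handle. The sub name find_by_name is made up for illustration, and the list form of open assumes a unix-like system with "find" on the PATH:

```perl
use strict;
use warnings;

# Hypothetical helper: return the paths of plain files under $dir whose
# names match $glob, as reported by the system "find" utility.
sub find_by_name {
    my ( $dir, $glob ) = @_;

    # List-form pipe open: no shell is involved, so spaces or quotes
    # in $dir can't break the command.
    open( my $find, '-|', 'find', $dir, '-type', 'f', '-name', $glob )
        or die "can't run find: $!\n";
    chomp( my @paths = <$find> );
    close $find or warn "find exited with status $?\n";
    return @paths;
}

my @pm = find_by_name( '.', '*.pm' );
print scalar @pm, " .pm files found\n";
```

Any other "find" options can be added to the argument list in the same way.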

I posted a benchmark on File::Find four years ago, and another on File::Finder two years ago, so here's a new one for File::Find::Rule (using an example from the module's man page). All of these benchmarks show pretty much the same timing difference between the module and the system "find" utility.

#!/usr/bin/perl

use strict;
use Benchmark;
use File::Find::Rule;

( @ARGV == 1 and -d $ARGV[0] )
    or die "Usage: $0 some/path\n";

print "started at ", scalar localtime, $/;
timethese( 10, {
        'Shell-find pipe' => \&try_pipe,
        'file::Find::Rule' => \&try_ffr,
           });

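# Collect the *.pm files using File::Find::Rule
sub try_ffr {
    my @f = File::Find::Rule->file()->name( '*.pm' )->in( $ARGV[0] );
    print scalar @f, " .pm files found under $ARGV[0] at ", scalar localtime, $/;
}

# Collect the *.pm paths by reading the output of the system "find" utility
sub try_pipe {
    open( my $find, '-|', "find $ARGV[0] -name '*.pm'" )
        or die "can't run find: $!\n";
    my @f = <$find>;
    close $find;
    print scalar @f, " .pm files found under $ARGV[0] at ", scalar localtime, $/;
}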

__END__

# sample run:

$ ffr-bm.pl /usr      
started at Sun Jul 16 23:48:33 2006
Benchmark: timing 10 iterations of Shell-find pipe, file::Find::Rule...
481 .pm files found under /usr at Sun Jul 16 23:48:41 2006
481 .pm files found under /usr at Sun Jul 16 23:48:44 2006
481 .pm files found under /usr at Sun Jul 16 23:48:46 2006
481 .pm files found under /usr at Sun Jul 16 23:48:48 2006
481 .pm files found under /usr at Sun Jul 16 23:48:50 2006
481 .pm files found under /usr at Sun Jul 16 23:48:52 2006
481 .pm files found under /usr at Sun Jul 16 23:48:53 2006
481 .pm files found under /usr at Sun Jul 16 23:48:55 2006
481 .pm files found under /usr at Sun Jul 16 23:48:57 2006
481 .pm files found under /usr at Sun Jul 16 23:48:59 2006
Shell-find pipe: 26 wallclock secs ( 0.03 usr  0.04 sys +  8.73 cusr  5.79 csys = 14.59 CPU) @ 142.86/s (n=10)
481 .pm files found under /usr at Sun Jul 16 23:49:19 2006
481 .pm files found under /usr at Sun Jul 16 23:49:41 2006
481 .pm files found under /usr at Sun Jul 16 23:49:59 2006
481 .pm files found under /usr at Sun Jul 16 23:50:14 2006
481 .pm files found under /usr at Sun Jul 16 23:50:29 2006
481 .pm files found under /usr at Sun Jul 16 23:50:44 2006
481 .pm files found under /usr at Sun Jul 16 23:51:02 2006
481 .pm files found under /usr at Sun Jul 16 23:51:24 2006
481 .pm files found under /usr at Sun Jul 16 23:51:42 2006
481 .pm files found under /usr at Sun Jul 16 23:51:57 2006
file::Find::Rule: 178 wallclock secs (33.39 usr + 53.05 sys = 86.44 CPU) @  0.12/s (n=10)

The output shows that the OS's own caching behavior gives an "unfair advantage" to F::F::R -- the "shell-find pipe" approach, which ran first, took about 8 sec on its first iteration, and 3 sec or less on each of the remaining nine iterations. But even with the OS caching already done, F::F::R still takes between 15 and 22 sec per iteration, and puts a much heavier load on the cpu (86.44 vs. 14.59 CPU sec in total). (This is with perl v5.8.6 built for darwin-thread-multi-2level on macosx 10.4.7; I've seen similar results on freebsd and linux.)
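To see the caching effect directly instead of reading it off the timestamps, each pass could be timed on its own. This is only a rough sketch using the core Time::HiRes module; the sub name time_each is made up for illustration, and the stand-in workload would be swapped for \&try_pipe or \&try_ffr from the benchmark script:

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: run $code $n times and return the wallclock
# seconds taken by each run, in order.
sub time_each {
    my ( $n, $code ) = @_;
    my @elapsed;
    for ( 1 .. $n ) {
        my $start = Time::HiRes::time();
        $code->();
        push @elapsed, Time::HiRes::time() - $start;
    }
    return @elapsed;
}

# Stand-in workload so the sketch runs by itself.
my @secs = time_each( 3, sub { my @entries = glob('*') } );
printf "run %d: %.3f sec\n", $_ + 1, $secs[$_] for 0 .. $#secs;
```

If the first number is much larger than the rest, that's the cold-cache pass, and it makes sense to leave it out when comparing methods.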

If you aren't doing any really big directory trees, and/or you don't care how long it takes, using some version of File::Find is "good enough", but for serious work on a really large directory tree, it's worthwhile to take advantage of Perl's value as a "glue" language (to make efficient use of existing system resources), rather than relying on these particular modules.
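On a really large tree it also helps not to slurp the whole list into an array. Here is a sketch of reading the pipe one path at a time; the sub name each_found is made up, and it assumes a "find" that supports -print0 (the GNU and BSD versions do, but it's not guaranteed everywhere):

```perl
use strict;
use warnings;

# Hypothetical helper: call $callback once for each path that "find"
# reports under $dir, without holding the whole list in memory.
sub each_found {
    my ( $dir, $callback, @find_args ) = @_;

    open( my $find, '-|', 'find', $dir, @find_args, '-print0' )
        or die "can't run find: $!\n";

    # Records end with NUL, so file names containing newlines survive.
    # Note that $/ stays "\0" while the callback runs.
    local $/ = "\0";
    while ( my $path = <$find> ) {
        chomp $path;    # strips the trailing NUL, since $/ is "\0"
        $callback->($path);
    }
    close $find or warn "find exited with status $?\n";
    return;
}

# Example: total size of all .pm files under the current directory.
my $bytes = 0;
each_found( '.', sub { $bytes += ( -s $_[0] ) || 0 },
    '-type', 'f', '-name', '*.pm' );
print "$bytes bytes of .pm files\n";
```

The perl side stays trivial, and all of the tree walking is left to "find".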

In reply to Re: What makes File::Find's interface so commonly hated by graff
in thread What makes File::Find's interface so commonly hated by demerphq
