I agree with the comments in the previous replies, and would add that there is also File::Finder, which provides something more like the command-line interface of the common unix "find" utility; like File::Find::Rule, this "amendment" to File::Find makes it a lot easier to come up with working code.

But both the ::Finder and ::Rule extensions are just wrappers around the core File::Find module, and all three end up suffering from the same problem relative to using the basic "find" utility -- they are much slower, and this is the main reason why I hate File::Find and anything based on it.

I'd much rather open a pipeline file handle running the "find" command: this utility is either native or freely available for all common OSes, it's pretty easy to use in a Perl script via the file handle idiom, and it runs a lot faster -- typically by a factor of six in wallclock time.
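For readers who haven't used the file handle idiom, here is a minimal sketch of it. The helper name find_files is made up for illustration. It assumes a Unix-like system, a Perl recent enough for list-form pipe opens (5.8 or later), and a "find" that supports -print0 (GNU and BSD versions do); the benchmark itself uses the simpler shell-string form.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from any module): return the paths of regular
# files under $dir whose names match the glob $pattern, by reading the
# output of the system "find" utility through a pipe.
sub find_files {
    my ( $dir, $pattern ) = @_;

    # List-form open runs find directly, with no shell in between, so
    # spaces or shell metacharacters in $dir are passed through untouched.
    open( my $find, '-|', 'find', $dir, '-type', 'f', '-name', $pattern, '-print0' )
        or die "Cannot run find: $!\n";

    # Assumption: this find supports -print0, which ends each path with a
    # NUL byte, so names containing newlines still come back as one record.
    local $/ = "\0";
    chomp( my @paths = <$find> );

    close($find) or warn "find exited with status $?\n";
    return @paths;
}

# Example: list the .pm files under a directory given on the command line.
if (@ARGV) {
    print "$_\n" for find_files( $ARGV[0], '*.pm' );
}
```

For a really large tree it may be better to read from the handle in a while loop and handle each path as it arrives, instead of slurping the whole list into an array.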

I posted a benchmark on File::Find four years ago, and another on File::Finder two years ago, so here's a new one for File::Find::Rule (using an example from the module's man page). All of these benchmarks show pretty much the same timing difference between the module and the system "find" utility.

    #!/usr/bin/perl
    use strict;
    use Benchmark;
    use File::Find::Rule;

    ( @ARGV == 1 and -d $ARGV[0] ) or die "Usage: $0 some/path\n";

    print "started at ", scalar localtime, $/;

    timethese( 10, {
        'Shell-find pipe'  => \&try_pipe,
        'file::Find::Rule' => \&try_ffr,
    });

    sub try_ffr {
        my @f = File::Find::Rule->file()->name( '*.pm' )->in( $ARGV[0] );
        print scalar @f, " .pm files found under $ARGV[0] at ", scalar localtime, $/;
    }

    sub try_pipe {
        open( FIND, "find $ARGV[0] -name '*.pm' |" );
        my @f = <FIND>;
        print scalar @f, " .pm files found under $ARGV[0] at ", scalar localtime, $/;
    }

    __END__

    # sample run:

    $ ffr-bm.pl /usr
    started at Sun Jul 16 23:48:33 2006
    Benchmark: timing 10 iterations of Shell-find pipe, file::Find::Rule...
    481 .pm files found under /usr at Sun Jul 16 23:48:41 2006
    481 .pm files found under /usr at Sun Jul 16 23:48:44 2006
    481 .pm files found under /usr at Sun Jul 16 23:48:46 2006
    481 .pm files found under /usr at Sun Jul 16 23:48:48 2006
    481 .pm files found under /usr at Sun Jul 16 23:48:50 2006
    481 .pm files found under /usr at Sun Jul 16 23:48:52 2006
    481 .pm files found under /usr at Sun Jul 16 23:48:53 2006
    481 .pm files found under /usr at Sun Jul 16 23:48:55 2006
    481 .pm files found under /usr at Sun Jul 16 23:48:57 2006
    481 .pm files found under /usr at Sun Jul 16 23:48:59 2006
    Shell-find pipe: 26 wallclock secs ( 0.03 usr 0.04 sys + 8.73 cusr 5.79 csys = 14.59 CPU) @ 142.86/s (n=10)
    481 .pm files found under /usr at Sun Jul 16 23:49:19 2006
    481 .pm files found under /usr at Sun Jul 16 23:49:41 2006
    481 .pm files found under /usr at Sun Jul 16 23:49:59 2006
    481 .pm files found under /usr at Sun Jul 16 23:50:14 2006
    481 .pm files found under /usr at Sun Jul 16 23:50:29 2006
    481 .pm files found under /usr at Sun Jul 16 23:50:44 2006
    481 .pm files found under /usr at Sun Jul 16 23:51:02 2006
    481 .pm files found under /usr at Sun Jul 16 23:51:24 2006
    481 .pm files found under /usr at Sun Jul 16 23:51:42 2006
    481 .pm files found under /usr at Sun Jul 16 23:51:57 2006
    file::Find::Rule: 178 wallclock secs (33.39 usr + 53.05 sys = 86.44 CPU) @ 0.12/s (n=10)

The output shows that the OS's own caching behavior gives an "unfair advantage" to F::F::R -- the "shell-find pipe" approach ran first, against a cold cache, and took 7 sec on its first iteration, then less than 3 sec on each of the remaining nine iterations. But even with the OS caching already done, F::F::R still takes between 15 and 22 sec per iteration, and puts a much heavier load on the CPU. (This is with perl v5.8.6 built for darwin-thread-multi-2level on Mac OS X 10.4.7; I've seen similar results on FreeBSD and Linux.)
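Benchmark only reports totals, so the per-iteration figures above have to be read off the printed timestamps. A rough sketch of measuring each run directly, using the core Time::HiRes module, might look like the following. The helper name time_each_run and the choice of five runs are made up for illustration, and this is not how the numbers above were produced.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical helper: run $code $n times and return the wall-clock
# seconds taken by each run, in order, so that a slow cold-cache first
# run stands out from the later warm-cache runs.
sub time_each_run {
    my ( $code, $n ) = @_;
    my @secs;
    for ( 1 .. $n ) {
        my $t0 = Time::HiRes::time();
        $code->();
        push @secs, Time::HiRes::time() - $t0;
    }
    return @secs;
}

# Example: time five runs of the find pipe over a directory given on
# the command line (five is an arbitrary choice).
if ( @ARGV == 1 and -d $ARGV[0] ) {
    my $dir  = $ARGV[0];
    my @secs = time_each_run(
        sub {
            open( my $find, '-|', 'find', $dir, '-name', '*.pm' )
                or die "Cannot run find: $!\n";
            my @f = <$find>;
            close($find);
        },
        5,
    );
    printf "run %d: %.2f sec\n", $_ + 1, $secs[$_] for 0 .. $#secs;
}
```

Passing a code reference that calls File::Find::Rule instead would give the matching per-run numbers for the module.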

If you aren't doing any really big directory trees, and/or you don't care how long it takes, using some version of File::Find is "good enough", but for serious work on a really large directory tree, it's worthwhile to take advantage of Perl's value as a "glue" language (to make efficient use of existing system resources), rather than relying on these particular modules.


In reply to Re: What makes File::Find's interface so commonly hated by graff
in thread What makes File::Find's interface so commonly hated by demerphq
