I'm not sure why you think this is better than File::Find.
Let's just say that I've seen (and posted) evidence that running the "find" utility in a sub-shell was faster than using File::Find on a common task, other things being equal. And I've seen lots of SoPW posts where people have run into a wide variety of problems because they didn't quite figure out the right way to use it -- seems like folks are able to get into all kinds of deep trouble with this module (in fact, this snippet was originally part of a reply to one such SoPW node). In contrast, working through a flat list of directories, and operating on data files within each one, is something that most folks can get their heads around.
You've fulfilled none of your objectives, and only made it more dependant on the outside environment, and slower, and take more net memory.
On the contrary, my goal was to avoid complicated recursion and excess memory consumption within a perl script, and this proposal meets both goals. The C-compiled "find" utility runs with a constant memory footprint, regardless of the size of the directory tree being scanned, and that footprint is very small (less than one meg on both solaris and linux). I'll confess that I haven't looked at how much memory is added to a perl script by using File::Find, so I don't know how that compares; I also haven't checked the memory footprint for "find" in other OS environments.

Compiled "find" handles the recursive part of traversal easily, and allows the perl script to focus on the non-recursive part of the problem. And "find" is faster than File::Find (I wonder whether you have seen any evidence that would contradict this). Dependency on the "outside environment" is certainly not an evil in itself, especially when it saves time during both coding and execution -- it's a good feature of perl that this sort of dependency is easy to exploit (as in "not reinventing the wheel").
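The division of labor described above could be sketched roughly as follows. This is only an illustration, not the snippet from the original post: the helper names `list_dirs` and `plain_files_in` are made up for this example, and it assumes a Unix-like system with a "find" utility on the PATH and a perl recent enough for the list form of a pipe open.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: let find(1) do the recursive part and hand back
# a flat list of directories at or below $top.
sub list_dirs {
    my ($top) = @_;
    # List-form pipe open: no shell is involved, so spaces or quotes
    # in $top need no escaping.
    open( my $find, '-|', 'find', $top, '-type', 'd', '-print' )
        or die "can't run find: $!";
    # One path per line; a directory name containing a newline would
    # be split incorrectly (an accepted limitation of this sketch).
    chomp( my @dirs = <$find> );
    close $find or warn "find exited with status $?\n";
    return @dirs;
}

# Hypothetical helper: the non-recursive part, i.e. the plain files
# sitting directly inside one directory.
sub plain_files_in {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "can't opendir $dir: $!";
    my @names = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return map { "$dir/$_" } @names;
}

# Example driver: only runs when a path is given on the command line.
if (@ARGV) {
    for my $dir ( list_dirs( $ARGV[0] ) ) {
        print "$_\n" for plain_files_in($dir);
    }
}
```

The perl side never recurses: it walks one flat list, and each directory is handled with a single opendir/readdir pass.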

Update: (I think this may be the first time I ever downvoted one of your nodes, merlyn.) I installed File::Finder (along with the "Text::Glob" module that it depends on) just to try it out. I'm sure the OO-style approach is appealing, but I wonder whether you would recommend a different way to benchmark it... The timings shown below are on a linux box, using a target directory that contains nearly 2000 files, 17 of which are sub-directories, going down as far as four levels:

#!/usr/bin/perl
use strict;
use Benchmark;
use File::Finder;
use File::Find;

my $Usage = "$0 some/path\n";
die $Usage unless @ARGV and -d $ARGV[0];

#chdir $ARGV[0] or die "can't chdir to $ARGV[0]";
# (no, don't chdir; just pass the target path to [Ff]ind...

timethese( 50, {
    'File::Finder module' => \&try_Finder,
    'shell-find pipeline' => \&try_pipe,
});

sub try_Finder {
    my $files = File::Finder->type('f');
    find( $files->print, $ARGV[0] );
}

sub try_pipe {
    open( FIND, "find $ARGV[0] -type f |" );
    print while (<FIND>);
    close FIND;
}

__END__
# Output:
Benchmark: timing 50 iterations of File::Finder module, shell-find pipeline...
File::Finder module:  9 wallclock secs ( 8.44 usr +  0.75 sys =  9.19 CPU) @  5.44/s (n=50)
shell-find pipeline:  2 wallclock secs ( 0.47 usr  0.06 sys +  0.38 cusr  0.50 csys =  1.41 CPU) @ 94.34/s (n=50)
(another update: Just to clarify, I ran the above with a command line like this:
perl some_path | grep -v some_path
so that only the benchmark output went to the terminal, and the time to actually send 100 * 2000 file-names to the screen was not part of the comparison.)
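One possible way to keep output cost out of the comparison altogether is to count files instead of printing them. The following is a rough sketch under several assumptions: it uses the core File::Find module directly rather than File::Finder (so it does not reproduce the test above), the function names are invented for this example, and it assumes a "find" utility on the PATH. No timings are claimed for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethese);
use File::Find;

# Hypothetical helper: count plain files using the core File::Find module.
sub count_with_file_find {
    my ($top) = @_;
    my $n = 0;
    # Skip symlinks so the count matches what "find -type f" reports.
    find( sub { $n++ if -f $_ && !-l $_ }, $top );
    return $n;
}

# Hypothetical helper: count plain files by reading a find(1) pipeline.
sub count_with_find_pipe {
    my ($top) = @_;
    # List-form pipe open, so $top is never interpreted by a shell.
    open( my $find, '-|', 'find', $top, '-type', 'f', '-print' )
        or die "can't run find: $!";
    my $n = 0;
    $n++ while <$find>;    # one line per file (assumes no newlines in names)
    close $find;
    return $n;
}

# Only benchmark when a directory is given on the command line.
if ( @ARGV and -d $ARGV[0] ) {
    my $top = $ARGV[0];
    timethese( 50, {
        'File::Find module'   => sub { count_with_file_find($top) },
        'shell-find pipeline' => sub { count_with_find_pipe($top) },
    });
}
```

When reading results from a test like this, the wallclock column is probably the fairer number for the pipeline: in the output quoted above, 94.34/s equals 50 / (0.47 + 0.06), so that rate appears to reflect only the parent's CPU time and not the child "find" process.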

last update: (I promise!) Just to be sure, I tried using different "names" (hash keys) for the two test functions, so that the benchmark would run the shell version first -- just in case there was a "first time through vs. cached" issue when scanning the directory -- and the results came out the same: "find" is many times faster than File::Find (as driven through File::Finder in this test).

In reply to Re: •Re: An alternative to File::Find by graff
in thread An alternative to File::Find by graff
