Re^2: Optimizing performance for script to traverse on filesystem

by Marshall (Monsignor)
on Feb 02, 2012 at 07:46 UTC (#951372)

in reply to Re: Optimizing performance for script to traverse on filesystem
in thread Optimizing performance for script to traverse on filesystem

I guess that I'm the "devil's advocate" to the "devil's advocate"?

re: File::Find - I think that we could cooperate and possibly improve its internal performance (I'm game for that), but the interface is "spot on" - it works!

My suggested modifications to the OP's code represent a massive simplification of program logic.

There is only one file system operation that happens per $File::Find::name. Maybe File::Find does some more "under the covers"? I'm not sure what you are proposing... But basically, I see no problem with code that makes a single decision based upon a single input.
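
To make "a single decision based upon a single input" concrete, here is a minimal sketch (not the OP's actual code) of a wanted callback that performs exactly one file test per entry. The temporary tree and the file names are made up for illustration; whatever File::Find itself does internally is not shown or measured here.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a small throwaway tree so the sketch is self-contained
# (hypothetical layout, purely for illustration).
my $root = tempdir(CLEANUP => 1);
make_path("$root/a/b");
for my $path ("$root/top.txt", "$root/a/mid.txt", "$root/a/b/leaf.txt") {
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

my @files;
find(
    sub {
        # The only file system test made by this callback:
        # one -f check on the current entry ($_).
        push @files, $File::Find::name if -f $_;
    },
    $root,
);

@files = sort @files;
print "$_\n" for @files;
```

The callback keeps the full path ($File::Find::name) but tests the short name ($_), since by default File::Find has already changed into the directory being visited.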

I'm game to increase the performance of File::Find - are you willing to help me do it?
I think that will be a pretty hard undertaking.
I'm not sure that it is even possible.
But if it is, let's go for it!

Replies are listed 'Best First'.
Re^3: Optimizing performance for script to traverse on filesystem
by graff (Chancellor) on Feb 03, 2012 at 04:34 UTC
    Thank you for the invitation. Actually, it might be a worthwhile first step just to make sure my assertion isn't based on faulty evidence. If you get a chance to check out the benchmark in the thread I cited above (specifically at this node: Re^2: Get useful info about a directory tree), it's entirely possible that the timing results there are reflecting something other than a difference between File::Find and straight recursion with opendir/readdir.
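
    For anyone who wants to repeat that kind of comparison, a rough sketch (not the benchmark from the cited node) might look like the following. The sub names and the sample tree are hypothetical; pass a real directory as the first argument to get numbers that mean anything, and note that file system caching can easily distort the result.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical helper: a tiny tree used only when no directory is given.
sub build_sample_tree {
    my $top = tempdir(CLEANUP => 1);
    for my $d (1 .. 3) {
        make_path("$top/dir$d/sub");
        for my $f (1 .. 4) {
            my $path = "$top/dir$d/sub/file$f.txt";
            open my $fh, '>', $path or die "Cannot create $path: $!";
            close $fh;
        }
    }
    return $top;
}

# Count plain files using File::Find.
sub count_with_find {
    my ($dir) = @_;
    my $n = 0;
    find(sub { $n++ if -f $_ }, $dir);
    return $n;
}

# Count plain files using straight recursion with opendir/readdir.
# Symlinked directories are not descended into, matching File::Find's default.
sub count_with_readdir {
    my ($dir) = @_;
    my $n = 0;
    opendir my $dh, $dir or return 0;
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    for my $entry (@entries) {
        my $path = "$dir/$entry";
        if (-d $path && !-l $path) {
            $n += count_with_readdir($path);
        }
        elsif (-f $path) {
            $n++;
        }
    }
    return $n;
}

my $root = @ARGV ? $ARGV[0] : build_sample_tree();

# Run each approach for at least one CPU second and print a comparison table.
cmpthese(-1, {
    'File::Find' => sub { count_with_find($root) },
    'readdir'    => sub { count_with_readdir($root) },
});
```

    Checking that both subs return the same count before trusting the timings is a cheap way to confirm the two traversals really do the same work.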

    (I've seen enough benchmark discussions here at the monastery to know that a proper benchmark can be an elusive creature.)

    If that benchmark happens to be a valid comparison of the two approaches, it would also be a good exercise for a debugger or profiler session, to see what's causing the difference.

    In any case, I definitely don't want to dissuade people from using File::Find or its various derivatives and convenience wrappers -- they do make for much easier solutions to the basic problem, and in the vast majority of cases, a little extra run time is a complete non-issue. (It's just that I've had to face a few edge cases where improving run time when traversing insanely large directories made a big difference.)
