
Re^2: Optimizing performance for script to traverse on filesystem

by Marshall (Prior)
on Feb 02, 2012 at 07:46 UTC (#951372)


in reply to Re: Optimizing performance for script to traverse on filesystem
in thread Optimizing performance for script to traverse on filesystem

I guess that I'm the "devil's advocate" to the "devil's advocate"?

Re: File::Find - I think that we could cooperate and possibly improve its internal performance (I'm game for that), but the interface is "spot on" - it works!

My suggested modifications to the OP's code represent a massive simplification of program logic.

There is only one file system operation that happens per $File::Find::name. Maybe File::Find does some more "under the covers"? I'm not sure what you are proposing... But basically, I see no problem with code that makes a single decision based upon a single input.

I'm game to increase the performance of File::Find - are you willing to help me do it?
I think that will be a pretty hard undertaking.
I'm not sure that it is even possible.
But if it is, let's go for it!
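
For readers following along, here is a minimal sketch of what "one file system operation per $File::Find::name" might look like. It is not the OP's code or the modification discussed above; the task (collecting regular files of at least a given size) and the names big_files and $min_size are hypothetical, chosen only for illustration. The wanted callback issues a single explicit lstat and then reuses the cached result through the special `_` filehandle. File::Find may still perform its own stat calls internally, which this sketch does not control.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical task: return the names of regular files under $root
# that are at least $min_size bytes long.
sub big_files {
    my ($root, $min_size) = @_;
    my @hits;
    find(
        sub {
            # The only explicit file system call made per entry.
            lstat $_ or return;
            # "-f _" and "-s _" reuse the result cached by lstat above,
            # so the decision is made from that single input.
            push @hits, $File::Find::name
                if -f _ && (-s _) >= $min_size;
        },
        $root
    );
    my @sorted = sort @hits;
    return @sorted;
}
```

Calling `big_files('/some/dir', 1024)` would return the matching paths in sorted order; the directory shown is just a placeholder.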


Re^3: Optimizing performance for script to traverse on filesystem
by graff (Chancellor) on Feb 03, 2012 at 04:34 UTC
    Thank you for the invitation. Actually, it might be a worthwhile first step just to make sure my assertion isn't based on faulty evidence. If you get a chance to check out the benchmark in the thread I cited above (specifically at this node: Re^2: Get useful info about a directory tree), it's entirely possible that the timing results there are reflecting something other than a difference between File::Find and straight recursion with opendir/readdir.

    (I've seen enough benchmark discussions here at the monastery to know that a proper benchmark can be an elusive creature.)

    If that benchmark happens to be a valid comparison of the two approaches, it would also be a good exercise for a debugger or profiler session, to see what's causing the difference.

    In any case, I definitely don't want to dissuade people from using File::Find or its various derivatives and convenience wrappers -- they do make for much easier solutions to the basic problem, and in the vast majority of cases, a little extra run time is a complete non-issue. (It's just that I've had to face a few edge cases where improving run time when traversing insanely large directories made a big difference.)
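
The comparison described in this reply (File::Find versus straight recursion with opendir/readdir) could be set up roughly as follows. This is a sketch under stated assumptions, not the benchmark from the cited node: the function names count_with_find, count_with_readdir and compare are hypothetical, and both traversals simply count non-directory entries so that their results can be checked against each other before any timing is trusted. It uses the core Benchmark module.

```perl
use strict;
use warnings;
use File::Find;
use Benchmark qw(cmpthese);

# Count non-directory entries below $root using File::Find.
sub count_with_find {
    my ($root) = @_;
    my $n = 0;
    find(sub { lstat $_ or return; $n++ unless -d _; }, $root);
    return $n;
}

# Count non-directory entries below $dir using plain recursion
# with opendir/readdir. Symlinks are not followed (lstat is used).
sub count_with_readdir {
    my ($dir) = @_;
    my $n = 0;
    opendir my $dh, $dir or return 0;
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    for my $entry (@entries) {
        my $path = "$dir/$entry";
        lstat $path or next;
        if (-d _) { $n += count_with_readdir($path) }
        else      { $n++ }
    }
    return $n;
}

# Time both approaches on the same tree for at least 2 CPU seconds each.
sub compare {
    my ($root) = @_;
    cmpthese(-2, {
        'File::Find' => sub { count_with_find($root) },
        'readdir'    => sub { count_with_readdir($root) },
    });
}

# Run the comparison only when a directory is given on the command line.
compare($ARGV[0]) if @ARGV;
```

Any numbers this produces should be read with caution: the first traversal of a tree typically warms the operating system's file cache, so it is worth confirming that both functions return the same count and running the comparison more than once before drawing conclusions.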
