Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re: Fastest way to recurse through VERY LARGE directory tree

by graff (Chancellor)
on Jan 22, 2011 at 18:25 UTC ( #883696=note: print w/ replies, xml ) Need Help??


in reply to Fastest way to recurse through VERY LARGE directory tree

I've been a devotee of using the compiled unix/linux "find" utility in preference to File::Find or any of its variants/derivatives, because I found that for any directory tree of significant size, something like this:

open( FIND, '-|', 'find', $path, @args ); while (<FIND>) { .... }
was significantly faster. In fact, just for grins, I tried an old benchmark that I posted here several years ago, to see if the results were still true with reasonably modern versions (perl 5.10.0 on macosx, File::Find 1.12), and I found an order of magnitude difference on a reasonably large tree (~30K files, 90 sec using File::Find, 9 sec using "find".)

But then I ran into a case where someone had created a really obscene quantity of files in a single directory on a freebsd file server, and freebsd's "find" utility choked. (Apparently, that version of "find" was building some sort of in-memory storage for each directory, and it hit a massive number of page faults on the path in question.)

I reverted to a recursive opendir/readdir approach for that case, and it succeeded reasonably well. Under "normal" conditions, compiled "find" seems to run about 10% faster than using recursive opendir/readdir, but in that particular case of an "abnormal" directory, freebsd "find" became effectively unusable, while opendir/readdir performance was consistent with normal conditions.

I just posted a utility I wrote for scanning directories, which uses recursive opendir/readdir: Get useful info about a directory tree -- I'm sure it includes a lot of baggage that you don't need, but perhaps it won't be too hard to pick out the useful bits...


Comment on Re: Fastest way to recurse through VERY LARGE directory tree
Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://883696]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (13)
As of 2014-10-01 14:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    What is your favourite meta-syntactic variable name?














    Results (26 votes), past polls