
Re: Fastest way to recurse through VERY LARGE directory tree

by graff (Chancellor)
on Jan 22, 2011 at 18:25 UTC ( #883696=note )

in reply to Fastest way to recurse through VERY LARGE directory tree

I've been a devotee of using the compiled unix/linux "find" utility in preference to File::Find or any of its variants/derivatives, because I found that for any directory tree of significant size, something like this:
open( my $find, '-|', 'find', $path, @args ) or die "Cannot run find: $!"; while (<$find>) { .... }
was significantly faster. In fact, just for grins, I reran an old benchmark that I posted here several years ago, to see whether the results still hold with reasonably modern versions (perl 5.10.0 on macosx, File::Find 1.12), and I found an order of magnitude difference on a reasonably large tree (~30K files: 90 sec using File::Find, 9 sec using "find").
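For anyone who wants to repeat that kind of comparison, here is a minimal sketch of a timing harness. It is not the benchmark mentioned above; the sub names count_with_file_find and count_with_find_pipe are made up for illustration, and it assumes a "find" that supports -print0 (GNU and BSD versions do), so that file names containing newlines are still counted once.

```perl
use strict;
use warnings;
use File::Find ();
use Time::HiRes ();

# Count regular files (not symlinks) using the core File::Find module.
sub count_with_file_find {
    my ($path) = @_;
    my $count = 0;
    File::Find::find(
        sub {
            lstat $_;              # File::Find has chdir'ed, so $_ is enough
            $count++ if -f _;      # test the lstat result, like "find -type f"
        },
        $path
    );
    return $count;
}

# Count regular files by reading the output of the external "find" command.
# Assumes a find that supports -print0; records are NUL-terminated.
sub count_with_find_pipe {
    my ($path) = @_;
    open( my $find, '-|', 'find', $path, '-type', 'f', '-print0' )
        or die "Cannot run find: $!";
    local $/ = "\0";
    my $count = 0;
    $count++ while <$find>;
    close($find);
    return $count;
}

# Only run the comparison when a path is given on the command line.
if (@ARGV) {
    my $path = shift @ARGV;
    for my $case ( [ 'File::Find', \&count_with_file_find ],
                   [ 'find pipe',  \&count_with_find_pipe ] ) {
        my ( $label, $code ) = @$case;
        my $start = Time::HiRes::time();
        my $n     = $code->($path);
        printf "%-10s %d files in %.2f sec\n",
            $label, $n, Time::HiRes::time() - $start;
    }
}
```

Keep in mind that whichever method runs second benefits from the file system cache warmed by the first, so run it more than once (or in both orders) before trusting the numbers.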

But then I ran into a case where someone had created a really obscene quantity of files in a single directory on a freebsd file server, and freebsd's "find" utility choked. (Apparently, that version of "find" was building some sort of in-memory storage for each directory, and it hit a massive number of page faults on the path in question.)

I reverted to a recursive opendir/readdir approach for that case, and it succeeded reasonably well. Under "normal" conditions, compiled "find" seems to run about 10% faster than using recursive opendir/readdir, but in that particular case of an "abnormal" directory, freebsd "find" became effectively unusable, while opendir/readdir performance was consistent with normal conditions.
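A bare-bones version of the recursive opendir/readdir idea might look like the sketch below. This is not the utility mentioned in this post, just an illustration; the name walk_tree and the callback interface are hypothetical. It calls readdir in scalar context, so entries are read one at a time rather than slurped into a list, and it closes each directory handle before descending so that deep trees don't pile up open handles.

```perl
use strict;
use warnings;

# Recursively walk $dir, calling $callback->($path) for every entry
# that is not a real directory. Symlinks are reported, never followed.
sub walk_tree {
    my ( $dir, $callback ) = @_;
    my $dh;
    unless ( opendir( $dh, $dir ) ) {
        warn "Cannot open $dir: $!\n";
        return;
    }
    my @subdirs;
    # Scalar-context readdir returns one entry per call.
    while ( defined( my $entry = readdir $dh ) ) {
        next if $entry eq '.' or $entry eq '..';
        my $path = "$dir/$entry";
        if ( -d $path && !-l $path ) {
            push @subdirs, $path;      # descend later, after closedir
        }
        else {
            $callback->($path);
        }
    }
    closedir $dh;
    walk_tree( $_, $callback ) for @subdirs;
    return;
}

# Only run when a path is given on the command line.
if (@ARGV) {
    my $count = 0;
    walk_tree( shift @ARGV, sub { $count++ } );
    print "$count non-directory entries\n";
}
```

The list of subdirectories is still held in memory for each level, but that is normally far smaller than the full list of entries in a directory stuffed with plain files.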

I just posted a utility I wrote for scanning directories, which uses recursive opendir/readdir: Get useful info about a directory tree -- I'm sure it includes a lot of baggage that you don't need, but perhaps it won't be too hard to pick out the useful bits...
