PerlMonks  

Re: Fastest way to recurse through VERY LARGE directory tree

by graff (Chancellor)
on Jan 22, 2011 at 18:25 UTC


in reply to Fastest way to recurse through VERY LARGE directory tree

I've been a devotee of the compiled Unix/Linux "find" utility in preference to File::Find or any of its variants/derivatives, because I found that for any directory tree of significant size, something like this:
open( my $find, '-|', 'find', $path, @args ) or die "cannot run find: $!";
while (<$find>) { chomp; .... }
was significantly faster. In fact, just for grins, I tried an old benchmark that I posted here several years ago, to see whether the results still held with reasonably modern versions (perl 5.10.0 on Mac OS X, File::Find 1.12). I found an order-of-magnitude difference on a reasonably large tree: ~30K files took 90 sec using File::Find and 9 sec using "find".
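For anyone who wants to repeat that kind of comparison on their own tree, here is a minimal sketch (not the benchmark mentioned above). It assumes a Unix-like system with "find" on the PATH; the sub names count_with_file_find and count_with_find_pipe are hypothetical. Both subs count every entry including the starting directory, so on an ordinary tree they should agree.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find ();
use Time::HiRes qw(time);

# Count entries below $root with the core File::Find module.
# The wanted sub is also called for $root itself.
sub count_with_file_find {
    my ($root) = @_;
    my $n = 0;
    File::Find::find( sub { $n++ }, $root );
    return $n;
}

# Count entries below $root by reading find(1) output through a pipe.
# List-form open: no shell is involved, so $root needs no quoting.
# Assumption: no file name contains a newline (each line = one entry).
sub count_with_find_pipe {
    my ($root) = @_;
    my $n = 0;
    open( my $fh, '-|', 'find', $root ) or die "cannot run find: $!";
    $n++ while <$fh>;
    close($fh) or warn "find exited with status $?\n";
    return $n;
}

# Usage: perl compare.pl /some/big/tree
if (@ARGV) {
    my $root = $ARGV[0];
    for my $impl ( [ 'File::Find' => \&count_with_file_find ],
                   [ 'find pipe'  => \&count_with_find_pipe ] ) {
        my ( $name, $code ) = @$impl;
        my $t0 = time();
        my $n  = $code->($root);
        printf "%-10s %8d entries %7.2f s\n", $name, $n, time() - $t0;
    }
}
```

Whichever method runs second benefits from a warm filesystem cache, so it is worth running the script a few times (or swapping the order) before trusting the ratio.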

But then I ran into a case where someone had created a truly obscene number of files in a single directory on a FreeBSD file server, and FreeBSD's "find" utility choked. (Apparently, that version of "find" was building some sort of in-memory storage for each directory, and it hit a massive number of page faults on the path in question.)

I reverted to a recursive opendir/readdir approach for that case, and it worked reasonably well. Under "normal" conditions, compiled "find" seems to run about 10% faster than recursive opendir/readdir, but in that particular case of an "abnormal" directory, FreeBSD's "find" became effectively unusable, while opendir/readdir performance stayed consistent with normal conditions.
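A recursive opendir/readdir walk can be sketched roughly as follows. This is an illustration under my own assumptions, not the code from the utility mentioned below; the name walk_tree and its callback interface are hypothetical. It reads each directory one name at a time (scalar-context readdir), remembers only the subdirectory names, and uses lstat so that symlinked directories are reported but not followed.

```perl
use strict;
use warnings;

# Walk the tree below $dir, calling $callback->($path) for every entry
# (the starting directory itself is not reported). Names are read one at
# a time; only subdirectory names are kept, so a directory holding a huge
# number of plain files does not need a huge in-memory list.
sub walk_tree {
    my ( $dir, $callback ) = @_;
    opendir( my $dh, $dir ) or do { warn "cannot opendir $dir: $!\n"; return };
    my @subdirs;
    while ( defined( my $entry = readdir $dh ) ) {
        next if $entry eq '.' or $entry eq '..';
        my $path = "$dir/$entry";
        $callback->($path);
        # -d _ reuses the lstat result, which describes a symlink itself,
        # so symlinks to directories are not descended into.
        push @subdirs, $path if lstat($path) && -d _;
    }
    closedir $dh;    # close before recursing: one open handle at a time
    walk_tree( $_, $callback ) for @subdirs;
    return;
}

# Usage: perl walk.pl /some/big/tree
if (@ARGV) {
    my $count = 0;
    walk_tree( $ARGV[0], sub { $count++ } );
    print "$count entries\n";
}
```

Closing each handle before recursing keeps only one directory handle open at a time; the trade-off is holding one directory's subdirectory names in memory, which is usually small next to the file names.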

I just posted a utility I wrote for scanning directories that uses recursive opendir/readdir: "Get useful info about a directory tree". I'm sure it includes a lot of baggage that you don't need, but perhaps it won't be too hard to pick out the useful bits...
