Re: Efficient processing of large directory (n-tier directories and the Grubby Pages Effect)
by grinder (Bishop) on Oct 02, 2003 at 17:16 UTC
By definition, one cannot process a large directory efficiently :)
Find a way to key the file names so that you can move to a multi-level directory structure. The key might be the first two characters or the last two characters of the filename.
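In Perl, deriving such a key is a one-liner. Here is a minimal sketch (the `bucket_for` helper and the last-two-characters scheme are illustrative, not from the original post):

```perl
use strict;
use warnings;

# Map a filename to its bucket directory, keyed on the last two
# characters of the basename with the extension stripped.
sub bucket_for {
    my ($filename) = @_;
    (my $stem = $filename) =~ s/\.[^.]+\z//;   # "78123.txt" -> "78123"
    return length($stem) >= 2 ? substr($stem, -2) : $stem;
}

print bucket_for('78123.txt'), "\n";   # prints "23"
print bucket_for('7.txt'),     "\n";   # prints "7"
```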
The main point to remember is that given the filename, you can derive the directory it should be in. And if it gets moved to the wrong directory, you can check for it programmatically.
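A sketch of that programmatic check (the helper names and the last-two-characters key are assumptions for illustration, not the original site's scheme):

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);
use File::Copy qw(move);
use File::Path qw(make_path);
use File::Spec;

# The directory a file should live in, derived from its name alone.
sub expected_dir {
    my ($filename) = @_;
    (my $stem = $filename) =~ s/\.[^.]+\z//;
    return length($stem) >= 2 ? substr($stem, -2) : $stem;
}

# If a file is not where its name says it belongs, move it there.
sub relocate_if_misplaced {
    my ($root, $path) = @_;
    my $name = basename($path);
    my $want = File::Spec->catdir($root, expected_dir($name));
    return if dirname($path) eq $want;          # already in place
    make_path($want) unless -d $want;
    move($path, File::Spec->catfile($want, $name))
        or die "cannot move $path: $!";
}
```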
I worked on a web site like yours once. When I was called in, there were more than 600 000 files sitting in one directory. This was on a Linux 2.0 kernel on an ext2 filesystem. The directory entry itself was over 3 megabytes. Needless to say, performance suffered...
We managed to get it into a three-level directory structure (7/78/78123.txt), but it took hours of kernel time because the directory traversal was so slow.
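A one-shot migration along those lines might look like this (a sketch only: the helper name is made up, and it keys on the trailing digits rather than the leading ones, per the note about numeric filenames):

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Path qw(make_path);
use File::Spec;

# Move every file in a flat directory into a two-level structure
# keyed on its trailing digits, e.g. 78123.txt -> 3/23/78123.txt.
sub migrate_flat_dir {
    my ($dir) = @_;
    opendir my $dh, $dir or die "opendir $dir: $!";
    my @names = grep { -f File::Spec->catfile($dir, $_) } readdir $dh;
    closedir $dh;
    for my $name (@names) {
        my ($digits) = $name =~ /(\d+)/ or next;   # skip non-numeric names
        my $l1 = substr($digits, -1);
        my $l2 = length($digits) >= 2 ? substr($digits, -2) : $digits;
        my $dest = File::Spec->catdir($dir, $l1, $l2);
        make_path($dest) unless -d $dest;
        move(File::Spec->catfile($dir, $name),
             File::Spec->catfile($dest, $name))
            or die "move $name: $!";
    }
}
```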
Bite the bullet and reorganise your files, before it's too late!
Note that if the filenames are numeric (1232.txt etc.) then you want to key on the last digits, not the first digits, otherwise you'll skew the number of files per directory towards the low-numbered buckets because of the Grubby Pages Effect.
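The skew is easy to demonstrate: simulate a site that has grown to 1234 sequentially numbered files (an arbitrary count, chosen for illustration) and compare bucket sizes under the two schemes:

```perl
use strict;
use warnings;

my (%first, %last);
for my $n (1 .. 1234) {
    $first{ substr($n, 0, 1) }++;   # key on first digit
    $last{  substr($n, -1)   }++;   # key on last digit
}
printf "first-digit bucket 1: %d files\n", $first{1};   # 346 -- badly skewed
printf "first-digit bucket 9: %d files\n", $first{9};   # 111
printf "last-digit bucket 1:  %d files\n", $last{1};    # 124 -- near uniform
printf "last-digit bucket 9:  %d files\n", $last{9};    # 123
```

First-digit buckets mirror how the counter grew (1, 10-19, 100-199, 1000-1234 all land in bucket 1), while last-digit buckets stay within one file of each other.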