comment on

By definition, one cannot process a large directory efficiently :)

Find a way to key the file names so that you can move to a multi-level directory structure. This might be the first two characters or last two characters of the filename:

filename	key	result

dabhds.txt	d	d/dabhds.txt
xyzzy.txt	x	x/xyzzy.txt

43816.txt	16	16/43816.txt
73813.txt	13	13/73813.txt

The main point to remember is that given the filename, you can derive the directory it should be in. And if it gets moved to the wrong directory, you can check for it programmatically.

I worked on a web site like yours once. When I was called in, there were more than 600 000 files sitting in one directory. This was on a Linux 2.0 kernel on an ext2 filesystem. The directory entry itself was over 3 megabytes. Needless to say performance suffered...

We managed to get it into a three-level directory structure (7/78/78123.txt) but it tooks hours of time at the kernel level because the directory traversal was so slow.

Bite the bullet and reorganise your files, before it's too late!

Note that the filenames are numeric (1232.txt etc. etc.) then you want to key on the last digits, not the first digits, otherwise you'll skew the number of files per directory to the low-numbered ones because of the Grubby Pages Effect (more links here),

In reply to Re: Efficient processing of large directory (n-tier directories and the Grubby Pages Effect) by grinder
in thread Efficient processing of large directory by Elliott

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


good chemistry is complicated, and a little bit messy -LW
	PerlMonks