Re: Efficient processing of large directory (n-tier directories and the Grubby Pages Effect)

by grinder (Bishop)
on Oct 02, 2003 at 17:16 UTC (#295976)


in reply to Efficient processing of large directory

By definition, one cannot process a large directory efficiently :)

Find a way to key the file names so that you can move to a multi-level directory structure. The key might be the first character or two of the filename, or the last two characters before the extension:

filename     key   result

dabhds.txt   d     d/dabhds.txt
xyzzy.txt    x     x/xyzzy.txt

43816.txt    16    16/43816.txt
73813.txt    13    13/73813.txt

The main point to remember is that given the filename, you can derive the directory it should be in. And if it gets moved to the wrong directory, you can check for it programmatically.
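To make the "derive the directory from the filename" idea concrete, here is a minimal sketch. It is written in Python for brevity, and the same logic ports directly to Perl. The names shard_key, sharded_path and is_misplaced are made up for this illustration, and the width/from_end parameters are assumptions about how you might choose to key.

```python
from pathlib import PurePosixPath


def shard_key(filename, width=2, from_end=True):
    """Return the directory key taken from the filename without its extension."""
    stem = PurePosixPath(filename).stem  # "43816.txt" -> "43816"
    if not stem:
        raise ValueError(f"cannot derive a key from {filename!r}")
    # Take the last `width` characters, or the first `width` characters.
    return stem[-width:] if from_end else stem[:width]


def sharded_path(filename, width=2, from_end=True):
    """Return the relative path the file should live at, e.g. 16/43816.txt."""
    return f"{shard_key(filename, width, from_end)}/{filename}"


def is_misplaced(path, width=2, from_end=True):
    """True if the file's parent directory is not the one its name implies."""
    p = PurePosixPath(path)
    return p.parent.name != shard_key(p.name, width, from_end)
```

With these hypothetical helpers, sharded_path("43816.txt") gives 16/43816.txt, and is_misplaced("13/43816.txt") reports the stray file, because the name alone says it belongs in 16/.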

I worked on a web site like yours once. When I was called in, there were more than 600 000 files sitting in one directory. This was on a Linux 2.0 kernel on an ext2 filesystem. The directory entry itself was over 3 megabytes. Needless to say performance suffered...

We managed to get it into a three-level directory structure (7/78/78123.txt), but it took hours of time at the kernel level because the directory traversal was so slow.
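A reorganisation along those lines might be sketched as follows. This is not the script we actually used, just an illustration in Python. The function names nested_path and migrate and the levels parameter are invented for the example, and it assumes the source and destination sit on the same filesystem so that a rename is enough.

```python
import os


def nested_path(filename, levels=(1, 2)):
    """Map 78123.txt to 7/78/78123.txt using leading characters of the name."""
    stem = os.path.splitext(filename)[0]
    parts = [stem[:n] for n in levels]  # (1, 2) -> ["7", "78"]
    return "/".join(parts + [filename])


def migrate(flat_dir, dest_root, levels=(1, 2)):
    """Move every regular file in flat_dir into the nested tree under dest_root.

    Returns the number of files moved.
    """
    moved = 0
    # os.scandir yields entries one at a time instead of building the
    # whole listing in memory, which matters with 600 000 entries.
    with os.scandir(flat_dir) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # skip subdirectories and anything that is not a plain file
            relative = nested_path(entry.name, levels).split("/")
            target = os.path.join(dest_root, *relative)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            # Assumption: same filesystem, so this is a cheap rename.
            os.replace(entry.path, target)
            moved += 1
    return moved
```

Even with a streaming listing, each rename still has to find and remove an entry in the huge flat directory, so on an old filesystem expect the run to be slow. Try it on a copy first.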

Bite the bullet and reorganise your files, before it's too late!

Note that if the filenames are numeric (1232.txt etc.), then you want to key on the last digits, not the first digits; otherwise you'll skew the number of files per directory towards the low-numbered ones because of the Grubby Pages Effect (better known as Benford's law).
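You can see the skew for yourself with a quick count. The sketch below (Python again, with a made-up helper bucket_counts) assumes the files are simply numbered 1 to N with no gaps, which is an assumption for the demonstration, not a claim about your data.

```python
from collections import Counter


def bucket_counts(n_files, key):
    """Count how many of the names 1..n_files land in each directory key."""
    return Counter(key(str(i)) for i in range(1, n_files + 1))


# Key on the first digit versus the last digit of 2500 sequential names.
by_first_digit = bucket_counts(2500, lambda name: name[0])
by_last_digit = bucket_counts(2500, lambda name: name[-1])

for digit in "0123456789":
    print(digit, by_first_digit.get(digit, 0), by_last_digit.get(digit, 0))
```

For 2500 sequential files, keying on the first digit puts 1111 files under 1/ and only 111 under 9/, while keying on the last digit gives every directory exactly 250.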

