Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

By definition, one cannot process a large directory efficiently :)

Find a way to key the file names so that you can move to a multi-level directory structure. This might be the first two characters or last two characters of the filename:

filename key result

dabhds.txt d d/dabhds.txt
xyzzy.txt x x/xyzzy.txt

43816.txt 16 16/43816.txt
73813.txt 13 13/73813.txt

The main point to remember is that given the filename, you can derive the directory it should be in. And if it gets moved to the wrong directory, you can check for it programmatically.

I worked on a web site like yours once. When I was called in, there were more than 600 000 files sitting in one directory. This was on a Linux 2.0 kernel on an ext2 filesystem. The directory entry itself was over 3 megabytes. Needless to say performance suffered...

We managed to get it into a three-level directory structure (7/78/78123.txt) but it tooks hours of time at the kernel level because the directory traversal was so slow.

Bite the bullet and reorganise your files, before it's too late!

Note that the filenames are numeric (1232.txt etc. etc.) then you want to key on the last digits, not the first digits, otherwise you'll skew the number of files per directory to the low-numbered ones because of the Grubby Pages Effect (more links here),

In reply to Re: Efficient processing of large directory (n-tier directories and the Grubby Pages Effect) by grinder
in thread Efficient processing of large directory by Elliott

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others cooling their heels in the Monastery: (5)
    As of 2018-05-20 20:25 GMT
    Find Nodes?
      Voting Booth?