Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

By definition, one cannot process a large directory efficiently :)

Find a way to key the file names so that you can move to a multi-level directory structure. This might be the first two characters or last two characters of the filename:

filename key result

dabhds.txt d d/dabhds.txt
xyzzy.txt x x/xyzzy.txt

43816.txt 16 16/43816.txt
73813.txt 13 13/73813.txt

The main point to remember is that given the filename, you can derive the directory it should be in. And if it gets moved to the wrong directory, you can check for it programmatically.

I worked on a web site like yours once. When I was called in, there were more than 600 000 files sitting in one directory. This was on a Linux 2.0 kernel on an ext2 filesystem. The directory entry itself was over 3 megabytes. Needless to say performance suffered...

We managed to get it into a three-level directory structure (7/78/78123.txt) but it tooks hours of time at the kernel level because the directory traversal was so slow.

Bite the bullet and reorganise your files, before it's too late!

Note that the filenames are numeric (1232.txt etc. etc.) then you want to key on the last digits, not the first digits, otherwise you'll skew the number of files per directory to the low-numbered ones because of the Grubby Pages Effect (more links here),


In reply to Re: Efficient processing of large directory (n-tier directories and the Grubby Pages Effect) by grinder
in thread Efficient processing of large directory by Elliott

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others contemplating the Monastery: (7)
    As of 2014-09-21 07:46 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      How do you remember the number of days in each month?











      Results (167 votes), past polls