PerlMonks  

Re^2: Perl solution for storage of large number of small files

by 2xlp (Sexton)
on May 06, 2007 at 02:18 UTC ( #613784=note )


in reply to Re: Perl solution for storage of large number of small files
in thread Perl solution for storage of large number of small files

I forgot to add...

The 'best' way I've found to store the files is to key each one on a base32-encoded digest of the file.
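A minimal sketch of producing such a digest, assuming MD5 over the file's content (the digest is not spelled out beyond "digest of the file") and a lowercased RFC 4648 alphabet. The base32 step is hand-rolled so that only the core Digest::MD5 module is needed; the names md5_base32 and file_md5_base32 are made up for illustration.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);

# Lowercased RFC 4648 base32 alphabet: 32 symbols, 5 bits each.
my @ALPHABET = ('a' .. 'z', '2' .. '7');

# Return the unpadded base32 form of the MD5 digest of $data (26 characters).
sub md5_base32 {
    my ($data) = @_;
    my $bits = unpack 'B*', md5($data);              # 128 bits as a 0/1 string
    $bits .= '0' x ((5 - length($bits) % 5) % 5);    # zero-pad to 130 bits
    return join '', map { $ALPHABET[ oct("0b$_") ] } $bits =~ /(.{5})/g;
}

# Same digest for a file on disk, read in binary mode.
sub file_md5_base32 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "cannot open $path: $!";
    local $/;                                        # slurp the whole file
    my $data = <$fh>;
    close $fh;
    return md5_base32($data);
}
```

For example, md5_base32('') gives 2qoyzwmpaczaj2mabgmoz6ccpy, the base32 form of the familiar hex digest d41d8cd98f00b204e9800998ecf8427e.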

Why? Well, most filesystems start to have issues at some point after 1000 items per directory. Using a base32 representation, you can hit the sweet spot.

With base32 and 2 chars per bin, you'll have 1024 bins per level (32*32). Using the standard hex-based MD5 representation, your options are either 256 bins per level (16*16) or 4096 (16*16*16).
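The arithmetic above, as a quick sketch: the number of bins per level is just the alphabet size raised to the number of characters used for each directory name. The helper name bins_per_level is made up for illustration.

```perl
use strict;
use warnings;

# Directories per level when each directory name is $chars symbols
# drawn from an alphabet of $alphabet_size symbols.
sub bins_per_level {
    my ($alphabet_size, $chars) = @_;
    return $alphabet_size ** $chars;
}

printf "base32, 2 chars: %d\n", bins_per_level(32, 2);   # 1024
printf "hex,    2 chars: %d\n", bins_per_level(16, 2);   # 256
printf "hex,    3 chars: %d\n", bins_per_level(16, 3);   # 4096
```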

In my personal use, I haven't seen all that much of a difference between 1024 and 4096 buckets, though I have seen a slight one. It's not as drastic as the difference between either of those and 10k, though.

Since I'm lazy and don't have tight performance constraints, I just go with 2 levels of 4096-way hashing. But if I had more time, I'd definitely go with 3 levels of 1024-way hashing.
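A rough sketch of the 2-level, 4096-way layout with a plain hex MD5: each level takes the next 3 hex characters of the digest. The root /var/store, the helper name bucket_path, and the choice to use the full digest as the filename are all assumptions for illustration, not something stated above.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;

# Build a bucketed path: $levels directory levels, each named by the next
# $chars characters of the digest, then the full digest as the filename.
sub bucket_path {
    my ($root, $digest, $levels, $chars) = @_;
    my @dirs = map { substr($digest, $_ * $chars, $chars) } 0 .. $levels - 1;
    return File::Spec->catfile($root, @dirs, $digest);
}

my $digest = md5_hex('hello world');   # 5eb63bbbe01eeed093cb22bb8f5acdc3
print bucket_path('/var/store', $digest, 2, 3), "\n";
# On a Unix-like system:
# /var/store/5eb/63b/5eb63bbbe01eeed093cb22bb8f5acdc3
```

The same function covers the 3-level, 1024-way variant: pass a base32 digest with 3 levels of 2 characters. Two levels of 4096 give 16,777,216 leaf directories; three levels of 1024 give 1,073,741,824.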

(The time / laziness is a factor because pretty much every language supports MD5 as base16 or base64 by default. None has a base32 default, which is a PITA if you're managing an architecture accessed by multiple languages at once.)
