Re^2: Design flat files database
by BrowserUk (Patriarch) on Jul 15, 2011 at 15:47 UTC
You could search a few thousand bytes of directory data, even using linear search, in far less time than it would take you to access that data.
Sorry, whilst I'm no expert on *nix filesystems, I think you are wrong, at least as far as ext2 goes, and ext3 as used in its default manner.
The problem is that in the former, and by default in the latter, the filename-to-inode mappings stored in the directory files form a singly linked list that must be searched from the top each time.
Directories
Each directory is a list of directory entries. Each directory entry associates one file name with one inode number, and consists of the inode number, the length of the file name, and the actual text of the file name. To find a file, the directory is searched front-to-back for the associated filename. For reasonable directory sizes, this is fine. But for huge directories this is inefficient, and ext3 offers a second way of storing directories that is more efficient than just a list of filenames.
So, not only does that mean that in a 1 million file directory you need to inspect 500,000 entries on average for each lookup, it also means that the VFS is unlikely to be able to retain the entire directory file in cache, which means frequent re-reads.
Ext3 has a mechanism (HTree) for improving this: "Creating 100,000 files in a single directory took 38 minutes without directory indexing... and 11 seconds with the directory indexing turned on." The trouble is, very few people use it.
A few years ago (from memory, on BSD), moving ~100 million files from a single directory to a three-level hierarchy improved the time taken to locate and read small (a few KB) files from whole seconds to tens of milliseconds. Try it yourself to see the difference it makes.
By reducing the size of the directories to 100 or 256 entries, the entire directory at any level can fit into a single block. The root level effectively gets locked in cache, making the first reduction happen in microseconds. And for an application like the OP's, where most accesses will be for the latest message IDs, the one or two second-level directories that are most accessed will also tend to remain cache resident. So in use, most accesses will not need to hit the disk at all until it comes to reading or writing the file itself.
The benefits are far less when there is no locality of reference -- i.e. the files are accessed randomly -- but for the OP's application, they should be tangible and very worthwhile.
I also agree that going too deep negates the benefits.
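For illustration only, here is a minimal Python sketch of one way such a three-level layout could be built. The function name `bucket_path`, the fan-out of 100 (two decimal digits per level) and the 8-digit zero padding are assumptions made for the example, not details of the setup described above. Because the high-order digits form the upper levels, consecutive IDs land in the same leaf directory, which is what gives the locality of reference.

```python
import os

DIGITS_PER_LEVEL = 2   # assumption: 2 decimal digits per level -> at most 100 entries per directory
LEVELS = 3             # three directory levels above the files


def bucket_path(root, msg_id):
    """Return the path of the file for a numeric ID in a three-level hierarchy."""
    width = DIGITS_PER_LEVEL * (LEVELS + 1)      # 8 digits: 6 for directories, 2 left for the files
    padded = str(msg_id).zfill(width)            # 1234567 -> "01234567"
    if msg_id < 0 or len(padded) > width:
        raise ValueError("ID out of range for this layout")
    # High-order digits name the directories, so neighbouring IDs share them.
    parts = [padded[i:i + DIGITS_PER_LEVEL]
             for i in range(0, DIGITS_PER_LEVEL * LEVELS, DIGITS_PER_LEVEL)]
    return os.path.join(root, *parts, padded)


print(bucket_path("db", 1234567))
```

With these numbers each directory holds at most 100 entries and the layout covers IDs up to 99,999,999; a fan-out of 256 would work the same way with hexadecimal digits.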
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I have recently been trying to nudge the OP in the direction of databases, and that's a nudge I see reflected in many of the responses.
Indeed. I asked a similar question.
Why are you settled upon a "flat file database" rather than one of the other options? (RDBMS, HADOOP, NoSQL etc.)
That said, RDBMSs are pretty shite at handling hierarchical datasets, whereas file-systems are explicitly designed and tuned for exactly that. It would be an interesting exercise to compare the response times for the two using identical, threaded datasets. But then again, neither scales well.
Facebook apparently use hundreds of sharded MySQL instances ensconced behind 1000s of memcache instances with more (PHP!?!) caching in front of that. They seem to make it work, but it sounds like a disaster waiting to happen to me. But we can probably assume that the OP isn't likely to be requiring that scale of things anytime soon.
One nice thing about using the file-system is that it is relatively easy to scale it out across multiple boxes, by partitioning the ID space to pretty much whatever level is required. RAIDed disks in each box take care of your hardware redundancy, and each box trickles off updates in the background to remote off-line storage. Far easier to partition and manage than distributed RDBMSs, and no coherency problems.
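As a rough sketch of what partitioning the ID space could look like (Python, for illustration), the top-level bucket of an ID can be looked up in a small range table to pick a box. The host names, the three-way split and the 8-digit zero padding are all hypothetical; a real deployment would choose its own ranges.

```python
# Hypothetical mapping of top-level buckets (00..99) to storage boxes.
PARTITIONS = [
    (0, 33, "box-a.example"),
    (34, 66, "box-b.example"),
    (67, 99, "box-c.example"),
]


def top_bucket(msg_id, width=8):
    """Return the two high-order digits of the zero-padded ID as an int."""
    padded = str(msg_id).zfill(width)
    if msg_id < 0 or len(padded) > width:
        raise ValueError("ID out of range for this layout")
    return int(padded[:2])


def host_for(msg_id):
    """Pick the box that owns the ID's top-level bucket."""
    bucket = top_bucket(msg_id)
    for low, high, host in PARTITIONS:
        if low <= bucket <= high:
            return host
    raise LookupError("no partition covers bucket %d" % bucket)


print(host_for(1234567))
```

Splitting at a finer level only means using more leading digits in the table, so the same idea extends to whatever level is required.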
Re^2: Design flat files database
by AlfaProject (Beadle) on Jul 15, 2011 at 11:22 UTC
That's the point: I am searching for the sweet spot. How deep do I need to go?
Another thing I was thinking about: each time a user folder is accessed, the filesystem needs to walk a path like:
/var/www/cgi-bin/PROJECT/DB/1/2/3/123
What if I put a link to the DB in the root directory for faster access? What do you think about that?
/user_db/1/2/3/123
Thanks
If your system is busy, the top level directory is likely to end up in buffer cache, no matter where it is rooted. (If it's not busy, a few extra milliseconds won't matter). Take a look at directory sizes for 1-digit/2-digit/3-digit prefixes. A "sweet spot" would be directories that (just) fit into whatever block size your filesystem uses, often 4KB. I'm guessing that will be 2- or 3-digit prefixes, depending on how dense the prefixes are and how the file system structures directories. You might let the top level directory get a bit larger, on the grounds that frequent access will keep it pinned in buffer cache.
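One hedged way to take that look before building anything is to estimate the directory sizes from a list of IDs. The Python sketch below assumes an ext2-style entry layout (an 8-byte header plus the name padded to a 4-byte boundary) and a 4096-byte block; other filesystems structure directories differently, so treat the byte counts as rough. The names `entry_bytes` and `prefix_report` are made up for the example.

```python
from collections import Counter

ENTRY_HEADER = 8   # assumption: ext2-style header (inode, record length, name length, type)


def entry_bytes(name):
    """Estimated on-disk size of one directory entry, name padded to 4 bytes."""
    return ENTRY_HEADER + (len(name.encode()) + 3) // 4 * 4


def prefix_report(ids, prefix_len, block_size=4096):
    """Estimate the top-level directory produced by prefixes of the given length."""
    buckets = Counter(i[:prefix_len] for i in ids)
    # "." and ".." are counted too, then one entry per distinct prefix.
    top_bytes = entry_bytes(".") + entry_bytes("..") + sum(entry_bytes(p) for p in buckets)
    return {
        "prefixes": len(buckets),
        "top_dir_bytes": top_bytes,
        "fits_in_block": top_bytes <= block_size,
        "largest_bucket": max(buckets.values()),
    }


ids = ["%03d" % n for n in range(1000)]   # dense sample IDs 000..999
for n in (1, 2, 3):
    print(n, prefix_report(ids, n))
```

For this dense sample, 2-digit prefixes fit comfortably in one block while 3-digit prefixes do not; sparser real IDs may shift that boundary.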
Don't go nuts with premature optimization. Make it easy to alter the structure of the directories, like having a routine to return an array of components corresponding to the directory entries. Then measure performance with a few alternatives.
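A minimal sketch of such a routine, in Python for illustration: `path_components` is a hypothetical name, and the `depth` and `width` parameters and the zero padding of short IDs are assumptions. With the defaults it reproduces the `/1/2/3/123` layout from the question, and trying another structure is a one-line change at the call site.

```python
def path_components(user_id, depth=3, width=1):
    """Return the directory components for an ID: `depth` prefix chunks of
    `width` characters each, followed by the full ID as the final entry."""
    s = str(user_id)
    padded = s.zfill(depth * width)   # assumption: short IDs are left-padded with zeros
    chunks = [padded[i * width:(i + 1) * width] for i in range(depth)]
    return chunks + [s]


# The layout from the question: /user_db/1/2/3/123
print("/".join(["/user_db"] + path_components("123")))

# An alternative to benchmark: a single level of 2-character prefixes
print("/".join(["/user_db"] + path_components("123", depth=1, width=2)))
```

Keeping every path construction behind one function like this means the rest of the code never needs to know how deep the tree is.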
Nice! Thanks!
What about opening a file?
Does it really matter whether I store the replies to a post in one file, or in one file per reply?
In most cases they will be fetched together from the database, but sometimes they will be edited or removed by users.
I mean, is opening a file slow like directory access, or is it fast?
And a final question :) Is it really better to name user folders by ID and store the username inside the user's data file, or
just to use the username as the directory name?
Thanks a lot! This forum rocks :)