Re^3: Design flat files database

by jpl (Monk)
on Jul 15, 2011 at 12:10 UTC

in reply to Re^2: Design flat files database
in thread Design flat files database

If your system is busy, the top-level directory is likely to end up in buffer cache, no matter where it is rooted. (If it's not busy, a few extra milliseconds won't matter.) Look at the directory sizes you get with 1-digit, 2-digit, and 3-digit prefixes. A "sweet spot" would be directories that (just) fit into whatever block size your filesystem uses, often 4KB. I'm guessing that will be 2- or 3-digit prefixes, depending on how dense the prefixes are and how the filesystem structures directories. You might let the top-level directory get a bit larger, on the grounds that frequent access will keep it pinned in buffer cache.
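To put rough numbers on the "sweet spot" idea before creating any directories, here is a Python sketch (not from the thread) that estimates directory sizes for 1-, 2-, and 3-character prefixes of a list of file names. The per-entry size formula is an assumption modeled on ext2/ext3/ext4 linear (non-indexed) directories, and the function names are hypothetical; other filesystems lay out directories differently, so confirm against real directories (e.g. `ls -ld`) on your own system.

```python
from collections import Counter

def dirent_bytes(name):
    """Approximate size of one entry in an ext2/ext3/ext4 linear (non-indexed)
    directory: an 8-byte header plus the name, padded to a multiple of 4 bytes.
    Other filesystems differ; treat this as an assumption."""
    return (8 + len(name.encode("utf-8")) + 3) & ~3

def prefix_dir_sizes(names, prefix_len):
    """Estimated bytes of entries in each prefix directory when every file
    is filed under the first `prefix_len` characters of its name."""
    sizes = Counter()
    for name in names:
        sizes[name[:prefix_len]] += dirent_bytes(name)
    return sizes

def fanout_report(names, block_size=4096, max_prefix=3):
    """For each prefix length, return a tuple of
    (prefix_len, number of prefix dirs, estimated top-level dir bytes,
     largest prefix dir bytes, number of prefix dirs that fit in one block).
    Ignores the '.' and '..' entries; `names` must not be empty."""
    rows = []
    for n in range(1, max_prefix + 1):
        sizes = prefix_dir_sizes(names, n)
        top = sum(dirent_bytes(p) for p in sizes)
        fits = sum(1 for s in sizes.values() if s <= block_size)
        rows.append((n, len(sizes), top, max(sizes.values()), fits))
    return rows

if __name__ == "__main__":
    # Made-up sample: 20,000 six-digit IDs spread over 0..999999.
    ids = [f"{i:06d}" for i in range(0, 1_000_000, 50)]
    for row in fanout_report(ids):
        print(row)
```

Feed it your real IDs rather than the made-up sample; the interesting column is how many prefix directories stay within one block.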

Don't go nuts with premature optimization. Make it easy to alter the structure of the directories, for example by having a single routine that returns the array of path components (the directory names, then the file name) for a given key. Then measure performance with a few alternatives.
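One way to follow that advice, sketched in Python (the thread gives no code): a single hypothetical `path_components()` function owns the directory layout, so moving from one 2-character level to, say, two 3-character levels is a parameter change rather than a rewrite. The default `prefix_len`/`levels` values and the zero-padding rule for short keys are assumptions.

```python
import os
from typing import List

# Hypothetical defaults; change these (or the function) to try another layout.
PREFIX_LEN = 2  # characters taken from the key for each directory level
LEVELS = 1      # how many directory levels sit above the file

def path_components(key: str, prefix_len: int = PREFIX_LEN, levels: int = LEVELS) -> List[str]:
    """Directory names followed by the file name for `key`.

    Keys shorter than prefix_len * levels are left-padded with '0'
    (an assumption) so every key still maps to a directory.
    """
    padded = key.rjust(prefix_len * levels, "0")
    dirs = [padded[i * prefix_len:(i + 1) * prefix_len] for i in range(levels)]
    return dirs + [key]

def path_for(root: str, key: str, **layout) -> str:
    """Join the components under `root`; the only place paths are built."""
    return os.path.join(root, *path_components(key, **layout))

# path_components("123456", prefix_len=2, levels=2) -> ['12', '34', '123456']
```

To compare alternatives, build the same set of files with each setting and time the lookups your application actually performs.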

Re^4: Design flat files database
by AlfaProject (Beadle) on Jul 15, 2011 at 12:45 UTC
    Nice, thanks! What about opening a file?
    Does it really matter whether I store all the replies to a post in one file, or one file per reply?
    In most cases they will be fetched together, but sometimes a user will edit or remove one.
    I mean, is opening a file slow like directory access, or is it fast?
    And a final question :) Is it really better to name user folders by ID and store the username inside the user's data file,
    rather than just using the username as the directory name?
    Thanks a lot! This forum rocks :)
      Thanks a lot! This forum rocks :)
      Well, you're pretty much guaranteed to get your money's worth :-)

      I like flat files because I can use all my favorite command-line tools to manipulate them, without interacting with a database. And I have a lot of insight into, and control over, what is involved in accessing them. As soon as you start aggregating the files because they may be "grabbed together", some of the advantages of flat files start to be outweighed by the additional structure needed to aggregate them. Don't rule out a real database; they are good at such things.
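      If the replies really are usually "grabbed together", here is a minimal sketch of the database route using Python's built-in `sqlite3` module. The `replies` table, its columns, and the helper names are hypothetical. The index on `(post_id, id)` is what lets the database return every reply to a post in one query while still editing or deleting a single reply by its `id`.

```python
import sqlite3

def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table and index exist."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS replies (
        id      INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,
        author  TEXT NOT NULL,
        body    TEXT NOT NULL)""")
    # Lets "all replies for one post" be answered without scanning the table.
    conn.execute("CREATE INDEX IF NOT EXISTS replies_by_post ON replies(post_id, id)")
    return conn

def add_reply(conn, post_id, author, body):
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO replies (post_id, author, body) VALUES (?, ?, ?)",
            (post_id, author, body))
    return cur.lastrowid

def replies_for(conn, post_id):
    """The common case: fetch every reply to a post together, oldest first."""
    return conn.execute(
        "SELECT id, author, body FROM replies WHERE post_id = ? ORDER BY id",
        (post_id,)).fetchall()

def edit_reply(conn, reply_id, body):
    with conn:
        conn.execute("UPDATE replies SET body = ? WHERE id = ?", (body, reply_id))

def delete_reply(conn, reply_id):
    with conn:
        conn.execute("DELETE FROM replies WHERE id = ?", (reply_id,))
```

      Editing or removing one reply touches only that row; the database handles the bookkeeping that an aggregated flat file would leave to you.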

      Opening a file is not much different from searching a directory: you have to get the contents, and for flat files, that means going out to disk. If you keep related messages in the same file, you may be able to save disk accesses by getting more than one message with one read. But then you've made maintaining the collection of messages more difficult, and searching more complex, because you have different messages in the same file. A database may well be able to do this better.
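      For contrast, here is a sketch of the aggregated approach: all replies to a post in one file, one JSON object per line. The file format and helper names are assumptions, not something from the thread. Reading is a single open and read, but removing one reply means rewriting the whole file; here that is done by writing a temporary file in the same directory and `os.replace()`-ing it over the original, so readers never see a half-written file.

```python
import json
import os
import tempfile

def read_replies(path):
    """One open + read returns every reply stored for the post."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

def append_reply(path, reply):
    """Adding is cheap: append one JSON line."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(reply) + "\n")

def rewrite_replies(path, replies):
    """Editing or removing a single reply means rewriting the whole file.
    The temp file lives in the same directory so os.replace() stays on one
    filesystem and swaps the file in a single step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for r in replies:
                f.write(json.dumps(r) + "\n")
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # clean up the partial temp file, then re-raise
        raise

def remove_reply(path, reply_id):
    """Read everything, drop one reply, write everything back."""
    rewrite_replies(path, [r for r in read_replies(path) if r["id"] != reply_id])
```

      This sketch does not handle two users editing the same post at the same moment; that would need file locking on top, which is exactly the kind of extra structure a database already provides.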

      As with directory structure, try to hide the implementation details. Start with one message per file, grouping related files under a single directory. If the performance is good enough, this is likely to be a lot simpler than storing multiple messages in a single file. Above all, keep the user interface simple: users don't want to know about implementation details. And don't dismiss a database out of hand. It may make your life a lot easier, and perform at least as well as a cleverly crafted flat-file implementation.
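      A sketch of "start with single files" hidden behind a small interface: one file per reply, grouped under a directory per post. The `ReplyStore` class, the `root/<post_id>/<reply_id>` layout, and integer reply IDs are illustrative assumptions. Because callers only use `add`, `get`, `replies` and `delete`, the layout (or a later switch to a database) can change without touching them.

```python
import os

class ReplyStore:
    """Hypothetical flat-file store: root/<post_id>/<reply_id>, one reply per file.
    Assumes reply IDs are integers and nothing else lives in a post directory."""

    def __init__(self, root):
        self.root = root

    def _post_dir(self, post_id):
        return os.path.join(self.root, str(post_id))

    def _reply_path(self, post_id, reply_id):
        return os.path.join(self._post_dir(post_id), str(reply_id))

    def add(self, post_id, reply_id, text):
        """Create or overwrite (i.e. edit) one reply."""
        os.makedirs(self._post_dir(post_id), exist_ok=True)
        with open(self._reply_path(post_id, reply_id), "w", encoding="utf-8") as f:
            f.write(text)

    def get(self, post_id, reply_id):
        with open(self._reply_path(post_id, reply_id), encoding="utf-8") as f:
            return f.read()

    def replies(self, post_id):
        """All replies to a post as (reply_id, text), in ID order."""
        d = self._post_dir(post_id)
        if not os.path.isdir(d):
            return []
        ids = sorted(int(name) for name in os.listdir(d))
        return [(i, self.get(post_id, i)) for i in ids]

    def delete(self, post_id, reply_id):
        os.remove(self._reply_path(post_id, reply_id))
```

      Each edit or removal touches exactly one file, and ordinary command-line tools still work on the data.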