http://www.perlmonks.org?node_id=440889

TedPride has asked for the wisdom of the Perl Monks concerning the following question:

This is probably a stupid question, but what's the best way to store text from forum posts? Assume that there's a large number of posts, that text size may vary from a few lines to as much as 10K or so, that the maximum number of files allowed inside a folder is 1024, that efficient searching is a consideration. I realize that with a double-layered folder system and 16-bit hashing, I can store probably around 33 million files without seriously affecting lookup time (256 * 256 * 512 - I'm allowing a 50% margin), but backing up or transferring these files would be a major pain, and I'd have to have at least one merged file for searching purposes anyhow. But what about database searches? Seems to me that there I'd also have to have a marged copy of the text there in a processed (punctuation and extra spaces removed) format, and just searching the text as it resides in the database won't be much help. So it's basically a matter of storage efficiency. Will storing the posts in the database instead of as individual files take up a lot more space, if so how much, and which is more efficient for lookup / edit? Which would you pick and why?

I know the following node asks pretty much the same question, but there isn't enough information, specifically on the issue of storage space for database vs filesystem, for me to make a decision.
http://www.perlmonks.com/?node_id=433891

Incidently, what if I stored all the post text in a single file outside of the database, using fixed size sectors and just storing the sector start and number of sectors used for each post in the database? Given a reasonably well-designed sector size (factor or multiple of disk sector?) I could reuse sectors from "decayed" posts (new post needs 3 sectors, look for decayed post with 3 sectors) without wasting large amounts of disk space or significantly increasing lookup times. Is this worth the trouble, or does mySQL internally do the same sort of thing for large chunks of text? In other words, if the max size of the text is 64K, and the post is actually only 2K, is the extra 62K wasted? Would it be better to use, say, 8K sectors and only waste 6K?