http://www.perlmonks.org?node_id=440935


in reply to Large chunks of text - database or filesystem?

You still haven't given enough information to make a decision:

  1. What is the maximum posts per minute you expect?
  2. What is the maximum reads per minute?
  3. How many overall posts per day? (estimated growth rate)
  4. How large do you expect the messages to be?
  5. Is it a write once system, or will there be re-editing of messages?
  6. What is your hardware budget for the project, or is there fixed hardware?
  7. What is the required uptime?
  8. Are you going to have an internal search engine?
  9. If so, what sort of information are you going to search on? (metadata, or the message itself?)
  10. What are your disaster recovery requirements?
  11. Do you need to support transactional concurrency?
  12. What are your time constraints?
  13. Do you already have a database to use for this purpose?
  14. Do you already have experience with databases?

Moving lots of files around is not a problem. Tar and rsync are your friends. The only problem with files comes when you try to work with more of them at the same time than your OS supports. Databases for file storage are basically just a way of getting around those limits, plus a way of keeping extra metadata catalogs on hand so the required information can be found more efficiently.

Depending on just what the requirements are, I might go with the file system for the message bodies, and a database for the metadata (posting time, who posted, thread tracking, etc). I might also go with a hierarchical database rather than a relational one, if that fit well with the anticipated characteristics. I might also look at repurposing an NNTP server rather than starting from scratch.
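
To make the split between file system and database concrete, here is a minimal sketch in Python (chosen only for brevity; the same idea ports directly to Perl with DBI). Everything named here is an assumption for illustration: the `posts` table, its columns, the `bodies/` directory, the 256-way sharding by id, and the use of SQLite as the metadata store. The sharding is there only so that no single directory ends up holding every message.

```python
import sqlite3
import time
from pathlib import Path

# Hypothetical schema: metadata only, the body lives in a file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    id        INTEGER PRIMARY KEY,
    author    TEXT NOT NULL,
    parent_id INTEGER,          -- NULL for the first post of a thread
    posted_at REAL NOT NULL,    -- Unix timestamp
    body_path TEXT              -- path relative to the store root
)
"""

def open_store(root):
    """Create the store directory and metadata database under `root`."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    db = sqlite3.connect(str(root / "meta.db"))
    db.execute(SCHEMA)
    return root, db

def body_path_for(post_id):
    """Spread bodies over 256 subdirectories (an assumed layout)."""
    return Path("bodies") / f"{post_id % 256:02x}" / f"{post_id}.txt"

def save_post(root, db, author, body, parent_id=None):
    """Insert the metadata row, then write the body to its own file."""
    cur = db.execute(
        "INSERT INTO posts (author, parent_id, posted_at) VALUES (?, ?, ?)",
        (author, parent_id, time.time()),
    )
    post_id = cur.lastrowid
    rel = body_path_for(post_id)
    path = root / rel
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(body, encoding="utf-8")
    db.execute(
        "UPDATE posts SET body_path = ? WHERE id = ?",
        (rel.as_posix(), post_id),
    )
    db.commit()
    return post_id

def load_post(root, db, post_id):
    """Look up the metadata, then read the body from disk."""
    row = db.execute(
        "SELECT author, parent_id, posted_at, body_path "
        "FROM posts WHERE id = ?",
        (post_id,),
    ).fetchone()
    if row is None:
        return None
    author, parent_id, posted_at, rel = row
    body = (root / rel).read_text(encoding="utf-8")
    return {"id": post_id, "author": author, "parent_id": parent_id,
            "posted_at": posted_at, "body": body}
```

Thread listings and "who posted what" queries then touch only the database, and a body is read from disk only when someone actually opens the message. Note that the row and the file are not written atomically here; whether that matters depends on the answer to question 11.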

Personally, I wouldn't optimize for storage space, unless you're not expecting anyone to read the posts. I'd optimize for reads/writes. Depending on the nature of the forum, I might have an aging system that moves entries from the read- or write-optimized store to an alternate mechanism for long-term storage.
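
An aging system along those lines could be sketched as follows, again in Python and again under stated assumptions: message bodies are plain files in one "hot" directory, age is judged by file modification time, and the long-term store is simply a second directory of gzip-compressed copies. The function names and the choice of gzip are illustrative, not a recommendation.

```python
import gzip
import shutil
import time
from pathlib import Path

def age_out(hot_dir, archive_dir, max_age_seconds, now=None):
    """Move files older than max_age_seconds from hot_dir into
    archive_dir as gzip-compressed copies. Returns the names moved."""
    now = time.time() if now is None else now
    hot_dir, archive_dir = Path(hot_dir), Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for src in sorted(hot_dir.iterdir()):
        if not src.is_file():
            continue
        if now - src.stat().st_mtime < max_age_seconds:
            continue  # still recent: leave it in the fast store
        dst = archive_dir / (src.name + ".gz")
        with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        src.unlink()  # remove the original only after the copy is written
        moved.append(src.name)
    return moved

def read_body(hot_dir, archive_dir, name):
    """Read a body from the hot store, falling back to the archive."""
    hot = Path(hot_dir) / name
    if hot.exists():
        return hot.read_bytes()
    with gzip.open(Path(archive_dir) / (name + ".gz"), "rb") as f:
        return f.read()
```

Run from cron or a similar scheduler, something like `age_out("hot", "archive", 90 * 24 * 3600)` would keep the last ninety days in the fast store. Readers of old threads pay a small decompression cost, which is the trade being made: recent posts stay optimized for reads and writes, and only the rarely read ones are optimized for space.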