Once I went part-way to a database, I would go all the way. There is really no point in having the overhead of multiple data interfaces.
Yes, a database is usually slower than the filesystem. However, the difference is that a database is scalable. It handles concurrency and scalability for you automatically. (Well, I'm used to using commercial db's .. I'm not sure where mySQL or PostgreSQL stack up here - not a flame, I honestly don't know.)
Let's say your website got really, really popular. You want to handle the load better. Upgrading hardware is one - it means you need to backup and restore to the new machine - but you also have to make sure you get all your extra files across, too. Instead, you may just want to add a second machine to the fray, and use IP round-robin to spread the load. (Or any other method of spreading the load.) I'm also presuming you spring for gigabit ethernet to connect the boxes to each other on a private network - your regular internet connection should not see any traffic between your machines.
DB & filesystem
You're looking at a number of options. I'm going to deal with the filesystem first, because the DB will be dealt in the DB-only section.
- Replication of files from one node to another. This may be done via rsync, but it means some files won't be visible to one node until the next rsync. Risky.
- Share via NFS. NFS isn't exactly the most reliable software out there, but then again, this is HTTP we're serving over anyway. However, the NFS server is going to get hit hard. Every read, every write, goes over NFS to the server. The speed is likely to be comparable to the database server now.
- Share via NAS or SAN. Both machines access the files directly. I'm not entirely sure how locking works on these ... presumably it works the same as if it were local. At this point, you'd put your database on the NAS or SAN, too. Expensive, though.
Upgrading again, you may make one machine both an NFS server and a DB server, and two machines are acting as web/cgi servers. Which may mean more moving around. You're running two servers on this machine (NFS,DB). It's not really sounding compelling to me.
DB-only
Put everything on the db. One machine acts as DB server, the other as a client, both as web servers. You can scale this as much as you want - create a cluster of DB servers that act as a single server, and a cluster of web servers that talk to the DB server cluster. Nearly unlimited scalability here. Put your DB on big iron if you want/need. Secure the whole thing by closing down unneeded ports, including NFS.
You can control the web servers as completely independant servers from each other, and control the DB server(s) as completely independant from the web server. Even if they're on the same machine.
To me, the scalability of the database is the clear-out winner. It's not even a contest on short-term efficiency.
Disclaimer: I don't do this for a living :-)
| [reply] |
In terms of cons, (off the top of my inexperienced head)
If you use files, you have twice as many things to manage. Where you had database-handling code, you need to add file-handling code. If you delete a records, you also have to delete a file. You have to backup both the database and the files. If you had load sharing, you have to share the database and the files. In other words, twice as many things can go wrong. Also, a database adds (limited) searchability.
The cons for using the database might include increased database workload, and data size limitations.
| [reply] |
What are you going to need?
On one hand, keeping files in the filesystem will always be faster. You don't have the overhead of the database layer.
On the other hand, keeping the file in the db buys you the ability to search quickly. That's pretty dang useful.
As far as accessing the text data, it doesn't sound like either method is a dealbreaker. 65K isn't /that/ much.
--
jpg
| [reply] |