PerlMonks  

Re: Large chunks of text: in-database or in-filesystem?

by Tanktalus (Canon)
on Feb 24, 2005 at 00:42 UTC (#433918=note)


in reply to Large chunks of text: in-database or in-filesystem?

Once I'd gone part-way to a database, I would go all the way. There's really no point in carrying the overhead of two data interfaces (database plus filesystem).

Yes, a database is usually slower than the filesystem. The difference is that a database scales: it handles concurrency and scaling for you automatically. (Well, I'm used to commercial DBs... I'm not sure where MySQL or PostgreSQL stack up here. Not a flame; I honestly don't know.)
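To make "it handles concurrency for you" concrete, here is a minimal sketch. It assumes Python's built-in `sqlite3` as a stand-in for the commercial databases mentioned above. Several threads, each playing a web server, write large text chunks into the same database. None of them does any locking of its own: the database serializes the writers and makes each waiter retry for up to `timeout` seconds. The table `chunks` and the helpers `store_chunk` and `demo` are hypothetical. SQLite takes a whole-database write lock, so this illustrates the "no application-level locking" point, not the scalability one.

```python
import os
import sqlite3
import tempfile
import threading


def store_chunk(db_path, name, text):
    # Each "web server" thread opens its own connection (hypothetical helper).
    # isolation_level="IMMEDIATE" takes the write lock at BEGIN; if another
    # writer holds it, SQLite's busy handler retries for up to `timeout` seconds.
    conn = sqlite3.connect(db_path, timeout=30, isolation_level="IMMEDIATE")
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO chunks (name, body) VALUES (?, ?)", (name, text))
    finally:
        conn.close()


def demo(n_writers=8):
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)  # an empty file is a valid, empty SQLite database
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE chunks (name TEXT PRIMARY KEY, body TEXT NOT NULL)")
    conn.commit()
    conn.close()

    # Concurrent writers with no application-level locking at all.
    threads = [
        threading.Thread(target=store_chunk, args=(db_path, f"page{i}", "x" * 100_000))
        for i in range(n_writers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    conn = sqlite3.connect(db_path)
    count = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
    conn.close()
    os.remove(db_path)
    return count


if __name__ == "__main__":
    print(demo())
```

With a client/server database the pattern is the same. Only the connect call changes, and the web servers could then live on different machines.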

Let's say your website got really, really popular, and you want to handle the load better. One option is upgrading the hardware, but that means backing up and restoring to the new machine, and you also have to make sure all your extra files come across too. Instead, you may just want to add a second machine to the fray and use IP round-robin (or any other method of spreading the load). I'm also presuming you spring for gigabit Ethernet to connect the boxes to each other on a private network, so your regular internet connection sees none of the traffic between your machines.
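IP (DNS) round-robin spreads the load by handing successive clients the server addresses in rotation. The minimal sketch below models only that rotation in-process. Real DNS round-robin is coarser than this, because resolvers and browsers cache answers. The class `RoundRobin` and the host names are hypothetical.

```python
import itertools


class RoundRobin:
    """Hand out backend hosts in strict rotation (hypothetical helper)."""

    def __init__(self, hosts):
        if not hosts:
            raise ValueError("need at least one host")
        self._cycle = itertools.cycle(hosts)

    def next_host(self):
        # Each call returns the next host, wrapping around at the end.
        return next(self._cycle)


balancer = RoundRobin(["web1.example", "web2.example"])
picks = [balancer.next_host() for _ in range(5)]
print(picks)
# ['web1.example', 'web2.example', 'web1.example', 'web2.example', 'web1.example']
```

Any request can land on any box, so every web server needs to see the same data. That is why the storage question below matters.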

DB & filesystem

You're looking at a number of options. I'm going to deal with the filesystem first, because the DB will be dealt with in the DB-only section.

  1. Replication of files from one node to another. This may be done via rsync, but it means some files won't be visible to one node until the next rsync. Risky.
  2. Share via NFS. NFS isn't exactly the most reliable software out there, but then again, this is HTTP we're serving over anyway. However, the NFS server is going to get hit hard: every read and every write goes over NFS to the server. At that point, the speed is likely to be comparable to going to the database server.
  3. Share via NAS or SAN. Both machines access the files directly. I'm not entirely sure how locking works on these ... presumably it works the same as if it were local. At this point, you'd put your database on the NAS or SAN, too. Expensive, though.
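With shared files, unlike with a database, the application has to do its own locking. Here is a minimal sketch, assuming a Unix system and Python's `fcntl.flock`, of what every reader and writer of a shared text file would need to do. The helpers `append_locked` and `read_locked` are hypothetical. `flock` locks are advisory, so they only help if every process that touches the file uses them. Whether they are honoured across machines on NFS or a shared SAN filesystem depends on the filesystem and OS (see the `flock(2)` man page); that is exactly the uncertainty raised above.

```python
import fcntl
import os
import tempfile


def append_locked(path, text):
    """Append text while holding an exclusive advisory lock (hypothetical helper)."""
    with open(path, "a", encoding="utf-8") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)  # block until no other holder
        try:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # push the data to storage before unlocking
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


def read_locked(path):
    """Read the whole file under a shared lock: many readers, no writer."""
    with open(path, "r", encoding="utf-8") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_SH)
        try:
            return fh.read()
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "article.txt")
    append_locked(path, "hello ")
    append_locked(path, "world")
    print(read_locked(path))
```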

Upgrading again, you might make one machine both the NFS server and the DB server, with two machines acting as web/CGI servers. That may mean more moving around, and you're now running two servers (NFS and DB) on one machine. It doesn't sound very compelling to me.

DB-only

Put everything in the DB. One machine acts as the DB server and the other as a client; both act as web servers. You can scale this as much as you want: create a cluster of DB servers that act as a single server, and a cluster of web servers that talk to the DB server cluster. Nearly unlimited scalability here. Put your DB on big iron if you want or need to. Secure the whole thing by closing down unneeded ports, including NFS.
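"Put everything in the DB" means the web servers keep no local files. Titles and large text bodies go through one data interface, so any web server can serve any request. Here is a minimal sketch under that assumption. `TextStore` and the `articles` table are hypothetical, and an in-memory `sqlite3` database stands in for a networked DB server. In a real multi-machine setup, the constructor would connect to the shared database server instead.

```python
import sqlite3


class TextStore:
    """Single data interface: metadata and large text bodies both live in the DB.

    Hypothetical wrapper; sqlite3 stands in for a networked database server.
    """

    def __init__(self, dsn=":memory:"):
        self.conn = sqlite3.connect(dsn)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS articles ("
            " id INTEGER PRIMARY KEY,"
            " title TEXT NOT NULL,"
            " body TEXT NOT NULL)"
        )
        self.conn.commit()

    def put(self, title, body):
        with self.conn:  # one transaction: title and body are stored together
            cur = self.conn.execute(
                "INSERT INTO articles (title, body) VALUES (?, ?)", (title, body)
            )
        return cur.lastrowid

    def get(self, article_id):
        # Returns (title, body), or None if there is no such article.
        return self.conn.execute(
            "SELECT title, body FROM articles WHERE id = ?", (article_id,)
        ).fetchone()

    def close(self):
        self.conn.close()


store = TextStore()
aid = store.put("Large chunks of text", "lorem ipsum " * 10_000)
title, body = store.get(aid)
print(title, len(body))  # Large chunks of text 120000
store.close()
```

Because there is no second code path for files, moving the database to its own machine (or to a cluster) touches only the connection setup.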

You can manage the web servers as completely independent of each other, and manage the DB server(s) as completely independent of the web servers, even if they're on the same machine.

To me, the database's scalability makes it the clear-cut winner, even if it doesn't win on short-term efficiency.

Disclaimer: I don't do this for a living :-)
