in reply to Hardware scalable web architecture

"truly scalable" basically means paralellisable - that is each machine can operate independently of each other machine. There aren't so many interesting web apps where this is true. For example, anything where users can create content that needs to be visible by all other users will require a database that is read by all machines and this will be your bottleneck to scalability.

An example of something that is truly scalable is a news site (excluding comments). You simply push the new content out to each machine and let them serve. If you need more capacity, you add more machines. They can even be in different parts of the world.

You can mix the two situations to help with scaling. For example, you have 100 main web servers which weave together static and dynamic content. They don't talk to the database; that's left to the 20 dynamic content servers, which generate HTML and pass it to the main web servers. They might also do caching etc. This way you limit the number of machines holding open DB connections. It doesn't matter so much how your static content grows (you could start serving videos instead of text), because that's handled by the main web servers and you can add lots more of those. You're still limited in how much dynamic content you can add, because every piece of dynamic content comes from the DB.
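
A rough sketch of that split, in Python purely for illustration. All the names here (`DynamicBackend`, `FrontEnd`, the `{{dynamic}}` placeholder, the 30 second TTL) are made up for the example, and the "database" is just a dict. The cache is placed on the front end, which is one possible reading of "they might also do caching"; it could equally live on the dynamic servers.

```python
import time


class DynamicBackend:
    """Stands in for a dynamic content server: the only tier that touches the DB."""

    def __init__(self, db):
        self.db = db          # hypothetical "database": a plain dict
        self.queries = 0      # counts how often the DB is actually hit

    def render_fragment(self, key):
        self.queries += 1
        return "<div>%s</div>" % self.db[key]


class FrontEnd:
    """Stands in for a main web server: weaves static and dynamic HTML, never sees the DB."""

    def __init__(self, backend, ttl=30.0, clock=time.monotonic):
        self.backend = backend
        self.ttl = ttl        # assumed cache lifetime in seconds
        self.clock = clock
        self.cache = {}       # key -> (expiry time, html)

    def fragment(self, key):
        now = self.clock()
        hit = self.cache.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]     # still fresh: no backend call, no DB query
        html = self.backend.render_fragment(key)
        self.cache[key] = (now + self.ttl, html)
        return html

    def page(self, static_html, key):
        # Static content is local to the front end; only the fragment is fetched.
        return static_html.replace("{{dynamic}}", self.fragment(key))


backend = DynamicBackend({"latest": "3 new comments"})
front = FrontEnd(backend)
template = "<html><body><h1>News</h1>{{dynamic}}</body></html>"
first = front.page(template, "latest")
second = front.page(template, "latest")
```

Both page renders produce the same HTML, but only the first one reaches the backend. Adding more `FrontEnd` machines raises static capacity without opening any more DB connections; the dynamic tier is still the part bounded by the database.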

The "big boys" are lucky. Plain old web search requires no user data; all you need is a cluster of machines with the data. You can scale with user growth by simply throwing another cluster at it. Scaling with data growth is going to be the hard part.

Re^2: Hardware scalable web architecture
by tirwhan (Abbot) on Apr 10, 2006 at 10:21 UTC

    One way to better scale database interaction is to set up slave DB servers which are used for querying the data. The master DB server is only used for writing data and replicating it to the slaves. Architecturally this is a lot easier than a truly replicated database setup (with multiple masters that can be used for both reading and writing), and several such solutions exist (see for example Slony for PostgreSQL). fergal's point still applies though: not all applications can benefit from this kind of setup.

    So if you want to leave this option open for the future while building your webapp, you should use different handles for reading and writing to the DB. Those can point to the same server during first deployment and later be changed to access the master and slave(s) when you need to scale.
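
A minimal sketch of the two-handle idea, using Python's standard `sqlite3` module only so that it runs anywhere. The `Database` class, its method names and the `posts` table are invented for the example; with a real master/slave setup you would swap in your own driver and connection strings.

```python
import os
import sqlite3
import tempfile


class Database:
    """Keeps separate handles for reads and writes, even if both hit one server."""

    def __init__(self, write_dsn, read_dsn):
        # On first deployment both DSNs can be identical. Later, write_dsn
        # would point at the master and read_dsn at a slave.
        self.writer = sqlite3.connect(write_dsn)
        self.reader = sqlite3.connect(read_dsn)

    def execute_write(self, sql, params=()):
        # The connection as a context manager commits on success
        # and rolls back if the statement raises.
        with self.writer:
            self.writer.execute(sql, params)

    def execute_read(self, sql, params=()):
        return self.reader.execute(sql, params).fetchall()


# Same file for both handles, i.e. the "first deployment" situation.
path = os.path.join(tempfile.mkdtemp(), "app.db")
db = Database(write_dsn=path, read_dsn=path)
db.execute_write("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
db.execute_write("INSERT INTO posts (title) VALUES (?)", ("hello",))
rows = db.execute_read("SELECT title FROM posts")
```

Because application code only ever calls `execute_read` or `execute_write`, moving to a master and slave(s) later means changing the two DSNs, not hunting through every query. One thing to keep in mind: once reads really go to a slave, a row you just wrote may not be visible there until replication catches up.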

    All dogma is stupid.