in reply to Server Load

Personally, I think a better question would be to ask:

What concepts and criteria should I put at the top of my whiteboard before setting out to design my application, to ensure that, should I need to expand my server capacity to cope with demand, scaling can be performed easily, seamlessly and in discrete steps as demand increases?

My answer: In a nutshell, decoupled logic.

The 'traditional' answer: a two-layer or, preferably, three-layer model.

That is: presentation logic, business logic and backend (DBM) logic. In some circumstances the latter two can be combined, but personally I wouldn't.

Architect your application from the outset as if each of these three components is going to run on a separate machine, even if they are actually going to run on a single machine to start with.
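One way to keep that option open, sketched here in Python purely for illustration (the layer names, hosts and ports are all hypothetical), is to have each layer reach the others only through an address looked up by name. Moving a layer to its own box then becomes a configuration change rather than a code change:

```python
import socket
from typing import Dict, Tuple

Address = Tuple[str, int]

# Day one: every layer on the same box (hypothetical ports).
SINGLE_BOX: Dict[str, Address] = {
    "presentation": ("127.0.0.1", 8080),
    "business": ("127.0.0.1", 9001),
    "dbm": ("127.0.0.1", 9002),
}

# Later: the same layers spread over three machines (hypothetical hosts).
# Only this table changes; the calling code stays the same.
THREE_BOXES: Dict[str, Address] = {
    "presentation": ("front.example.com", 8080),
    "business": ("blu1.example.com", 9001),
    "dbm": ("db1.example.com", 9002),
}


def endpoint(config: Dict[str, Address], layer: str) -> Address:
    """Return the address for a layer; callers never hard-code hosts."""
    try:
        return config[layer]
    except KeyError:
        raise ValueError(f"unknown layer: {layer!r}") from None


def connect(config: Dict[str, Address], layer: str, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection to whichever box currently hosts `layer`."""
    return socket.create_connection(endpoint(config, layer), timeout=timeout)
```

Even when everything runs on one machine, going through a socket (rather than a direct function call) keeps the layers honest about not sharing in-process state.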

Note: Everything below is simply my current thinking. I'm (slowly) preparing to resume the web commerce project that brought me to these halls in the first place. So far, I have not even attempted to implement most of this, and I'm posting it mostly in the hope that it will stimulate further discussion that will help me as well as the original poster.

The layers (as I currently see them).

  1. The front door. A single, well-sorted, well-packed machine running the latest stable versions of Apache and mod_perl, whose sole purpose in life is to receive requests, pass them on to whichever business unit is appropriate whilst maintaining session state, and then serve back a (pre-generated) web page.

    All the HTML served would, as far as this machine is concerned, be static. When a request is received, a thread would be spawned or a process forked, and an appropriate token (product ID, user ID, library reference, etc.) would be fed (probably via a socket, as this makes location transparency easy) to one or more BusinessLogicUnits (BLUs). The spawned thread/process would then block until it received a return value in the form of a filename: the HTML page to be served. The thread would simply loop, feeding the tokens generated by the session's requests to the appropriate BLU and blocking until the HTML was available. When the session ends, the state dies with it.

    If anyone has a feel (especially numbers) for how many concurrent sessions I might expect a well-configured mod_perl/Apache server to handle under this model, I'd love to hear it.

  2. One or more BLUs. I can see the need for at least two types:
    • Clearly defined, non-customer-specific, near-static, oft-called 'product' or 'company' informational pages. These would have little or no dependency upon the DBM (prices, dates and phone numbers only) and could be cached, keyed by product ID or page number. They would be generated from templates, but cached for quick delivery. Any major change to the DBM causes the cache to be wiped, and the next call for each page causes it to be regenerated anew.
    • Customer-specific pages: shopping carts, invoices, purchase history, delivery status. These would be generated, delivered (possibly cached until the session ends) and discarded. The generated HTML would be stored in a subdirectory named after the front-end session ID; the discard would be effected by the front end deleting that subdirectory when the session ended.
  3. DBM. I'm still a bit sketchy on this one. As far as it goes, I see it like this:
    • Only one DBM thread/process/server with write access. It would be fed by a queuing process listening on a socket, and would report success/failure status directly to the calling BLU.

      Many of the larger commercial DBMs take care of this themselves, but I am as yet unsure about the capabilities of open-source DBMs in this area.

    • One or more read-only threads/processes/servers. These would be spawned on demand to service individual queries, or sequences of related queries, and return the results. They would communicate with the calling BLU directly (via sockets) and die when the calling BLU dropped the link.

      The logic here is that this could be either a single parent process spawning child threads/processes on a single box, or a single parent process on one box handing off to child processes running on several boxes with replicated DBs and background updates. Again, the better (read: expensive) commercial DBs can handle this internally, but I am unsure of the progress in this area for open-source DBMs.

      Information on the latter would be appreciated.
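The front-door/BLU exchange in item 1 might look something like the following minimal Python sketch. It assumes a line-based protocol (one token per line in, one filename per line back) and a filesystem that both sides can see; the handler, the function names and the output directory are all hypothetical, and a real front door would of course live inside Apache/mod_perl rather than in a standalone script:

```python
import os
import socket
import socketserver
import tempfile
import threading

# Hypothetical directory for generated pages; assumed visible to both sides.
OUT_DIR = tempfile.mkdtemp(prefix="blu-")


class BLUHandler(socketserver.StreamRequestHandler):
    """Toy business-logic unit: one token per line in, one filename per line out."""

    def handle(self):
        for raw in self.rfile:  # loop ends when the front end drops the link
            token = os.path.basename(raw.decode("utf-8").strip())  # crude path guard
            path = os.path.join(OUT_DIR, token + ".html")
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(f"<html><body>Page for {token}</body></html>")
            self.wfile.write((path + "\n").encode("utf-8"))


def start_blu():
    """Run a BLU on an ephemeral localhost port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), BLUHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def session_worker(address, tokens):
    """Front-door side: send each token, block for the filename, read the page."""
    pages = []
    with socket.create_connection(address) as sock, \
            sock.makefile("r", encoding="utf-8") as reader:
        for token in tokens:
            sock.sendall((token + "\n").encode("utf-8"))
            filename = reader.readline().strip()  # blocks until the BLU replies
            with open(filename, encoding="utf-8") as fh:
                pages.append(fh.read())
    return pages
```

Because the front end only ever sees a host/port and a filename, the BLU can move to another box without the session loop changing; the shared-filesystem assumption is the part that would need revisiting at that point.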
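For the first type of BLU in item 2, a lazily filled page cache keyed by product ID and wiped wholesale when the DBM changes could be as simple as this sketch. The class name and the `fetch` callback (standing in for the real DBM lookup) are hypothetical, and `string.Template` is used only as a placeholder for whatever templating system you prefer:

```python
from string import Template
from typing import Callable, Dict


class PageCache:
    """Lazily generated pages, keyed by product id, wiped wholesale on DB change."""

    def __init__(self, template: str, fetch: Callable[[str], Dict[str, str]]):
        self._template = Template(template)
        self._fetch = fetch  # hypothetical DBM lookup: product id -> template fields
        self._pages: Dict[str, str] = {}
        self.renders = 0  # how many times any page has been (re)generated

    def get(self, product_id: str) -> str:
        page = self._pages.get(product_id)
        if page is None:  # miss: regenerate just this one page
            page = self._template.substitute(self._fetch(product_id))
            self._pages[product_id] = page
            self.renders += 1
        return page

    def invalidate_all(self) -> None:
        """Call after a major DBM change; pages rebuild on their next request."""
        self._pages.clear()
```

Wiping everything is crude but safe; a finer-grained version could invalidate only the product IDs touched by a given update.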
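The per-session subdirectory scheme for customer-specific pages (item 2, second bullet) is similarly small. In this sketch (class and method names hypothetical) the BLU writes into a directory named after the session ID, and the front end calls `end_session` to discard it:

```python
import os
import shutil


class SessionPages:
    """Customer-specific pages live in <root>/<session id>/ until the session ends."""

    def __init__(self, root):
        self.root = root

    def _dir(self, session_id):
        # basename() is a crude guard against ids such as "../../etc"
        return os.path.join(self.root, os.path.basename(session_id))

    def write(self, session_id, name, html):
        """BLU side: store a generated page and return its filename."""
        directory = self._dir(session_id)
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, name)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(html)
        return path

    def end_session(self, session_id):
        """Front-end side: discard everything generated for this session."""
        shutil.rmtree(self._dir(session_id), ignore_errors=True)
```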
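The single-writer/many-readers arrangement in item 3 can be approximated on one box with SQLite, used here only as a stand-in for whichever DBM you end up choosing. Every write is queued to one thread that holds the only writable connection and reports success or failure back to the caller, while readers open the same file through SQLite's read-only URI mode. A socket in front of the queue would give the location transparency discussed above; that part is left out for brevity, and the class and function names are hypothetical:

```python
import queue
import sqlite3
import threading


class SingleWriter:
    """All writes funnel through one thread holding the only writable connection."""

    def __init__(self, db_path):
        self._db_path = db_path
        self._requests = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        conn = sqlite3.connect(self._db_path)  # created in, and used only by, this thread
        while True:
            sql, params, reply = self._requests.get()
            try:
                with conn:  # commit on success, roll back on error
                    conn.execute(sql, params)
                reply.put(True)
            except sqlite3.Error:
                reply.put(False)

    def write(self, sql, params=()):
        """Queue a write and block until the writer reports success/failure."""
        reply = queue.Queue(maxsize=1)
        self._requests.put((sql, params, reply))
        return reply.get()


def read_only(db_path):
    """Open a connection that SQLite will refuse to write through."""
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
```

SQLite already serialises writers internally, so on a single box this mostly illustrates the shape of the design; the payoff comes when the readers are moved onto replicated copies elsewhere.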

Any and all critiques, comments, caveats or congrats gratefully received and voraciously devoured.

Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring!