|Pathologically Eclectic Rubbish Lister|
Re: Re: "The First Rule of Distributed Objects is..."by perrin (Chancellor)
|on Oct 21, 2003 at 20:35 UTC||Need Help??|
Imagine a site that keeps it simple and they need to the fast parts to stay fast. Well, things that take up more speed need more resources. Yes, throwing more hardware, complete webservers, does solve something, but in the long run, the entire site will slow down again as it gets more popular.
I'm not sure what your point is here. Yes, scaling typically involves adding hardware as well as tuning your code. What does this have to do with physical tiers vs. logical tiers?
So fine, now you create a seperate farm for one part of your site. Tons of links to change over etc etc..
First, you only separate things if they are so uneven in terms of the resources they require that normal load-balancing (treating all requests for dynamic pages equally) doesn't work. This is very rare. Second, you NEVER change links! There is no reason to change your outside URLs just because the internal ones changed. If you want to change where you route the requests internally, use your load-balancer or mod_rewrite/mod_proxy to do it.
So why not keep the web interface running uber fast, the actual display layer, but move the bottle necks to a seperate area.
How does this help anything? The amount of work is still the same, except you have now added some extra RPC overhead.
The advantage is, that if I hit one interface that is dog slow, it may not mean the others will slow down. That might mean tweaking the aplication layer or db layer.
Can you explain what you're talking about here? Are you saying that some of your requests will not actually need to use the db layer or application layer?
You keep everything on high speed connections, and things will run well. You and i both know that sending 1k of data over RJ45 is very quick to move that type of stuff over.
Running well is a relative term. They will not run anywhere near as fast as they would without the RPC overhead. I'm not making this stuff up; this is coming from real applications in multiple languages that I have profiled over the years. Socket communication is fast, but it's much slower than most of the other things we do in a program that don't require inter-process communication.
And we all know you don't do repeditive rpc calls to get all your data.. .just one fell-swoop with one biz objects.
And that's the other problem: the RPC overhead forces you to replace a clean OO interface of fine-grained methods with "one fell-swoop..." This is one of Fowler's biggest complaints about distrubuted designs.
Also, doing things via proxy server would be fast, but as the scenario gets more complex with a url proxy, things slow down in a way you can't speed back up.
I'm not sure where you're getting this from. mod_rewrite (the most common choice for doing things based on URL) is very fast, and the hardware load-balancers are very very fast.
Load balancing a bunch of proxy servers is just ugly. 'cause then you have something that is just as bad as a regualr web farm. As things slow down, the entire service slows down and not just the parts that are costly.
What's so bad about a web farm? Every large site requires multiple web servers. And which parts are costly?
I think you are imagining that it would take less hardware to achieve the same level of service if you could restrict what is running on each box so that some only run "business objects" while others run templating and form parsing code. I don't see any evidence to support this idea. If the load is distributed pretty evenly among the machines, I would expect the same amount of throughput, except that with the separate physical tiers you are adding additional load in the form of RPC overhead.
Think of it like this: you have a request that takes 10 resource units to handle -- 2 for the display and 8 for the business logic. You have 2 servers that can each do 500 resource units per second for a total of 1000. If you don't split the display and business logic across multiple servers and you distribute the load evenly across them, you will be able to handle 100 requests per second on these servers. If you split things up so that one server handles display and the other handles business logic, you will max out the business logic one at 62 requests per second (496 units on that one box). So you buy another server to add to your business logic pool and now you can handle 125 requests per second, but your display logic box is only half utilized, and if you had left these all together and split them evenly across three boxes you could have been handling 150 at this point. And this doesn't even take the RPC overhead into account!
Distributed objects sound really cool, and there is a place for remote cross-language protocols like SOAP and XML-RPC, but the scalability argument is a red herring kept alive by vendors.