Re: Re: Re: Re: Re: Re: "The First Rule of Distributed Objects is..."

by perrin (Chancellor)
on Oct 22, 2003 at 20:37 UTC


in reply to Re: Re: Re: Re: Re: "The First Rule of Distributed Objects is..."
in thread Multi tiered web applications in Perl

Thanks for making me think about this some more. Let's summarize things a bit.

I think that adding one or more RPC calls to each request will add significant overhead. You think it will get lost in the noise. We probably won't agree on that, and I only have anecdotal evidence to prove my point, so let's agree to disagree.

I think that forcing the communication between the presentation logic, domain objects, and database layer to be done with coarse-grained calls is a problem. You don't think it matters. Fowler talks about this in his recent book, and that chapter is mostly republished in this article (which requires a free registration). Here's a relevant excerpt:

A local interface is best as a fine-grained interface. Thus, if I have an address class, a good interface will have separate methods for getting the city, getting the state, setting the city, setting the state and so forth. A fine-grained interface is good because it follows the general OO principle of lots of little pieces that can be combined and overridden in various ways to extend the design into the future.

A fine-grained interface doesn't work well when it's remote. When method calls are slow, you want to obtain or update the city, state and zip in one call rather than three. The resulting interface is coarse-grained, designed not for flexibility and extendibility but for minimizing calls. Here you'll see an interface along the lines of get-address details and update-address details. It's much more awkward to program to, but for performance, you need to have it.

I would point out that with a fine-grained interface you could throw an error when someone passes in a bad zip code, while a coarse-grained one would necessitate gathering up all the errors from all the input, putting them in some kind of structure to pass back, and then making the client code go through the structure and respond to each issue. It just isn't as fluid. But we will probably not agree on this either. I do recognize that there are situations where everything can be summed up in a single call, but I don't think all communications between layers fit well into that.
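
To make the contrast concrete, here's a small self-contained sketch in Perl. The Address class, the zip validation, and the update_address_details routine are all invented for this illustration; the point is only the shape of the two interfaces.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Fine-grained, local style: one mutator per field, so a bad zip code
    # blows up right where it is set.
    package Address;
    sub new       { bless {}, shift }
    sub set_city  { $_[0]{city}  = $_[1] }
    sub set_state { $_[0]{state} = $_[1] }
    sub set_zip   {
        my ($self, $zip) = @_;
        die "bad zip code '$zip'\n" unless $zip =~ /^\d{5}$/;
        $self->{zip} = $zip;
    }

    package main;

    my $addr = Address->new;
    $addr->set_city('Chicago');
    $addr->set_state('IL');
    eval { $addr->set_zip('6O6O1') };               # fails on this one field
    warn "fine-grained error: $@" if $@;

    # Coarse-grained, remote style: one call carries the whole record, and
    # every validation problem comes back in a structure the caller has to
    # walk through.  (Stubbed locally here; imagine it behind RPC.)
    sub update_address_details {
        my ($details) = @_;
        my @errors;
        push @errors, { field => 'zip', message => 'bad zip code' }
            unless $details->{zip} =~ /^\d{5}$/;
        return { ok => !@errors, errors => \@errors };
    }

    my $result = update_address_details({ city => 'Chicago', state => 'IL', zip => '6O6O1' });
    warn "coarse-grained $_->{field}: $_->{message}\n" for @{ $result->{errors} };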

Finally, you seem to see the primary value of a distributed architecture as the ability to isolate individual sections of the app. You are talking about fairly large sections, like entire pages, so I think this is separate from the original question of whether or not the presentation layer and application layer should be on the same machine. I agree that there are uses for this, but I still think they only apply when you are willing to let a certain section of your application perform badly as long as another section performs well. I don't see how your statement that "some parts will require resources beyond that of others" applies to this. Of course they will, and at that point you can either improve your overall capacity to handle it, or isolate the part that is performing badly and let it continue to perform badly while the rest of the application is fast.

I'll give an example of a use for this. Say you have an e-commerce site that has a feature to track a customer's packages. This involves a query to an external company's web site. It's slow, and there is nothing you can do about it since that resource is out of your control. Letting all of your servers handle requests for this could result in tying up many server processes while they wait for results and could lead to a general slowdown. You could either add resources to the whole site in order to offer the best performance possible for all requests, or you could isolate these package tracking requests on a few machines, making them incredibly slow but allowing the rest of the site (which is making you money) to stay fast. This could be a good compromise, depending on the economics of the situation.

Note that if you then go and add more machines to the slow package tracking cluster to fully handle the load, I would consider the isolation pointless. You could have simply left things all together and added those machines to the general cluster, with the exact same result.

I said this was easy to implement with mod_proxy, and it is, but you correctly pointed out that mod_proxy has significant overhead. There are some other benefits to the mod_proxy approach (caching, serving static files) but for just isolating a particular set of URLs to specific groups of machines you would probably be better off doing it with a hardware load-balancer.
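
For what it's worth, the mod_proxy version of that isolation is only a couple of directives on the front-end proxy. Something like the following would do it; the /track/ URL space and the tracking-backend hostname are made up for the example, and the hostname could just as well be a round-robin DNS name or a load-balanced VIP for the small cluster:

    # Front-end httpd.conf: only the slow package-tracking URLs are sent
    # to the dedicated backend; everything else is handled as usual.
    ProxyPass        /track/  http://tracking-backend.internal/track/
    ProxyPassReverse /track/  http://tracking-backend.internal/track/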


Re: Re: Re: Re: Re: Re: Re: "The First Rule of Distributed Objects is..."
by exussum0 (Vicar) on Oct 22, 2003 at 23:08 UTC
    Finally, you seem to see the primary value of a distributed architecture as the ability to isolate individual sections of the app. You are talking about fairly large sections, like entire pages, so I think this is separate from the original question of whether or not the presentation layer and application layer should be on the same machine. I agree that there are uses for this, but I still think they only apply when you are willing to let a certain section of your application perform badly as long as another section performs well. I don't see how your statement that "some parts will require resources beyond that of others" applies to this. Of course they will, and at that point you can either improve your overall capacity to handle it, or isolate the part that is performing badly and let it continue to perform badly while the rest of the application is fast. I'll give an example of a use for this. ...
    Well, think of it like this. In CS you can use a divide-and-conquer type of architecture, right? That's how merge sort and quicksort work. It's also how many other things work, like matrix multiplication. If you can optimize the heavy parts, everything gets quicker. Same reason you use profilers. Point is, by keeping the heavy parts completely isolated from the quicker parts and paying attention to those heavy parts, things will always run fast. If those heavy parts get bogged down again, the quick parts stay quick. That is the big part of keeping everything separated out, loosely coupled, in one complete architecture. By having things so tight knit, one part CAN slow down the other, and you have to pay attention to the whole.
    Note that if you then go and add more machines to the slow package tracking cluster to fully handle the load, I would consider the isolation pointless. You could have simply left things all together and added those machines to the general cluster, with the exact same result.
    Ah, but measuring need becomes difficult. Adding one machine may make things 5% faster overall.. but if you need that one slow thing to become faster, you can improve its speed greatly. But sometimes slower performance doesn't matter so much. Think of, say, reports. No.. not reports. I'm not talking about sophisticated reports. Say.. all messages you've posted to perlmonks. It's ok if it's a little slow since it's a once-in-a-blue-moon operation. It may take a bit of time and resources, but you know what.. that may be ok. And if you want, you can easily redirect stuff by saying which operation goes to whom internally, without putting up new sysadminny type stuff.
    There are some other benefits to the mod_proxy approach (caching, serving static files) but for just isolating a particular set of URLs to specific groups of machines you would probably be better off doing it with a hardware load-balancer.
    Totally agree with you, but putting some stuff on static pages isn't always an option. And load balancers do solve part of the problem, but not the whole problem.

    But you know, it is true. Adding ONE web server to a system that is at 101% capacity solves the problem. The whole splitting-things-up approach is great for large systems. Large systems that have large APIs.. prolly something you wouldn't do in mod_perl but in more business-directed languages, like Java or even COBOL :)

    Play that funky music white boy..
      If you can optimize the heavy parts, everything gets quicker. Same reason you use profilers.

      I think that's actually a different deal. Optimizing and profiling are about improving the efficiency of the most important pieces of code, reducing the amount of resources needed. Here we're just talking about allocating resources.

      Point is, by keeping the heavy parts completely isolated from the quicker parts and paying attention to those heavy parts, things will always run fast. If those heavy parts get bogged down again, the quick parts stay quick.

      I agree, if you can't afford enough resources to make your whole application run well, you can under-allocate resources to one part and sacrifice its performance in order to keep another part that you consider more important running quickly. This is a popular idea on big-iron systems, where you can do things like pin a certain number of CPUs to one job. It doesn't require distributed objects, though.

      Totally agree with you, but putting some stuff on static pages isn't always an option. And load balancers do solve part of the problem, but not the whole problem.

      I was actually talking about dynamic pages there, not static ones, the same kind you were describing in your login page and preferences page example. Good load-balancers do solve the problem of isolating specific URLs to run on specific groups of machines, and I wouldn't consider load-balancers or mod_proxy any harder to deal with than J2EE deployment and configuration stuff.

      prolly something you wouldn't do in mod_perl but in more business-directed languages, like Java or even COBOL :)

      Hmmm... Java was created for programming toasters and refrigerators. It's a good general-purpose language, but the whole business slant is just marketing.

        feh, I thought I replied once. Went back, and here we are :)
        I think that's actually a different deal. Optimizing and profiling are about improving the efficiency of the most important pieces of code, reducing the amount of resources needed. Here we're just talking about allocating resources.
        Ah, but allocation of resources in terms of this isn't the simple "I need 5MB more RAM" ordeal. It's about CPU resources. It's about routing requests to more capable systems.
        Good load-balancers do solve the problem of isolating specific URLs to run on specific groups of machines, and I wouldn't consider load-balancers or mod_proxy any harder to deal with than J2EE deployment and configuration stuff.
        Absolutely right, but don't think of it in terms of stateless page requests either. Think about a cohesive application, like AOL. The client makes lots of requests, and you get certain data. So you'd prolly load-balance to a certain machine. But that's an argument for or against stateless connections.

        Truth of the matter is, segmentation on a load balancer may work really well, but a config option which tells an application "the preference-changing utility pool is there" and "the forum service pool is here" is a one-time, persistent thing. With URL evaluation, the URL has to be re-evaluated every time to redirect your traffic. It has a less.. general feeling for me. If it was done on the back end to a connected pool, the URL gets evaluated once, and a message is sent to a ready-made pool waiting for requests.

        But you are right on with J2EE and mod_proxy in ease of configuration. It's the performance and the ability to be quite modular with things without frontal impact. What if you had a front page that hits many resources? You can't split that up with mod_proxy easily, but your calls can hit multiple pools. Also, it's not about J2EE alone... it's about RPC, and SOAP, and XML-RPC.. it's about distributed objects :)
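
        Rough sketch of what I mean by one front page hitting multiple pools, SOAP::Lite style. The service URIs, pool hostnames and method names are all made up; the point is just that each call can be routed to whatever group of machines owns that service.

            use strict;
            use warnings;
            use SOAP::Lite;

            # One front page, two back-end pools: each remote call goes to the
            # machines that own that service, not through URL-based routing.
            my $user_id = 42;    # whoever is looking at the page

            my $prefs = SOAP::Lite
                ->uri('urn:PreferenceService')                   # made-up service name
                ->proxy('http://prefs-pool.internal:8080/soap')  # made-up pool address
                ->get_preferences($user_id)
                ->result;

            my $threads = SOAP::Lite
                ->uri('urn:ForumService')
                ->proxy('http://forum-pool.internal:8080/soap')
                ->recent_threads(10)
                ->result;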

        Hmmm... Java was created for programming toasters and refrigerators. It's a good general-purpose language, but the whole business slant is just marketing.
        Maybe, maybe not. Let's take ASM. It's a great system language. Same with C. The problem with C and ASM is that you can't create stuff for easy reuse. Same with Fortran. Great for math, shitty for system programming. Java.. it's really shitty at system programming.. but this isn't an argument by elimination of types. Java has a good way of describing and architecting business-oriented tasks and acting upon that architecture. It's strong on patterns and reuse.
        Play that funky music white boy..
