Re^11: Parrot, threads & fears for the future.

by tilly (Archbishop)
on Nov 08, 2006 at 04:07 UTC


in reply to Re^10: Parrot, threads & fears for the future.
in thread Parrot, threads & fears for the future.

I had missed this continuation of the thread. (No pun intended.)

About #1, there is no contest. There is a lot of literature on how to set up websites with no single point of failure. For instance, you have a pair of load balancers configured for failover: if the primary goes down, the secondary takes over. A system with no single point of failure is more reliable.

Which brings us to Google's map reduce. Suppose you have a job that will run on 2000 machines and takes 5 hours. That's over a year of machine time - the odds that some machine will fail during that job are pretty high. But the odds that the master will fail are very low. And if it does, so what? Google can just re-run the job. It is available after 10 hours, not 5. No big deal. Google is right not to worry.
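
To put rough numbers on that, here is a back-of-the-envelope sketch in Perl. The one-failure-per-machine-per-year rate and the independence of failures are purely illustrative assumptions:

    use strict;
    use warnings;

    my $machines      = 2000;
    my $job_hours     = 5;
    my $hours_in_year = 365 * 24;

    # Total work: 2000 machines * 5 hours = 10,000 machine-hours - over a year of machine time.
    my $machine_hours = $machines * $job_hours;

    # Assume each machine fails about once a year, independently (illustrative only).
    my $expected_failures = $machine_hours / $hours_in_year;          # ~1.14
    my $p_any_failure     = 1 - exp(-$expected_failures);             # ~68%
    my $p_master_failure  = 1 - exp(-$job_hours / $hours_in_year);    # ~0.06%

    printf "machine-hours of work: %d\n",      $machine_hours;
    printf "P(some machine fails): %.0f%%\n",  100 * $p_any_failure;
    printf "P(the master fails):   %.2f%%\n",  100 * $p_master_failure;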

This is very different from an ecommerce site. Suppose that you've got a business that does $10 million of business a year. If your website is down for an hour, you've just lost about $1,000 of business ($10 million spread over the roughly 8,760 hours in a year). However traffic varies. Depending on which hour you lose, you could realistically be out anywhere from $50 to $20,000. Murphy being Murphy (and in this case Murphy gets a lot of assistance from the fact that flaky hardware tends to fold under load), you're more likely to be out $20,000 than $50. And if you have a single server you're depending on, odds are that you can't procure, configure, and install a replacement in just one hour. So your outage is going to cost a lot more than that.

The result is that your reliability needs depend on what you're doing. Google can't afford to depend on every machine during a big computation, but it can afford to depend on one. An ecommerce site doesn't want to depend on only one machine ever. (Unless that machine is bulletproof.)

And there is a final huge win with the cluster. If you have a website on a cluster, it is easy to upgrade. Pick a quiet time, take half your webservers out of the rotation, upgrade their code, restart them, swap your upgraded servers with your non-upgraded servers, upgrade the other half, restart them, and bring them back online. Voila, an upgrade done without taking your website offline! If you have a single machine you can't do this. Restarting a webserver is fairly slow, particularly if you cache stuff in RAM at startup. (Highly recommended.) Having your weekly upgrade not involve an outage is always a win.
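
Here is a rough sketch of what that rolling upgrade might look like scripted in Perl. The lb-ctl, deploy-new-code and restart-webserver commands (and the hostnames) are hypothetical stand-ins for whatever your load balancer and deploy tooling actually provide:

    use strict;
    use warnings;

    # Hypothetical hostnames; replace with your real cluster.
    my @half_a = qw(web1 web2 web3 web4);
    my @half_b = qw(web5 web6 web7 web8);

    for my $batch (\@half_a, \@half_b) {
        # Take this half out of the load balancer rotation.
        for my $host (@$batch) {
            system('lb-ctl', 'disable', $host) == 0
                or die "could not drain $host\n";
        }

        # Upgrade the code and restart the webserver on each drained host.
        for my $host (@$batch) {
            system('ssh', $host, 'deploy-new-code && restart-webserver') == 0
                or die "upgrade failed on $host\n";
        }

        # Put the upgraded half back into rotation before touching the other half.
        for my $host (@$batch) {
            system('lb-ctl', 'enable', $host) == 0
                or die "could not re-enable $host\n";
        }
    }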

OK, let's move on to #2. A big factor that I think you're missing is that keeping RAM powered takes electricity. It probably isn't cost effective for Google to make their reports run faster at the cost of installing that much RAM. You're right that they could do that, but it doesn't make sense for them. However I'm sure it will be for others - biotech, for instance, comes to mind.

And when you talk about AJAX, you've made some big assumptions that are mostly wrong (at least in today's world). Any thread of execution that is doing dynamic stuff takes up lots of resources. Be it memory, database handles, or whatever. As a result, standard high performance architectures go to lengths to make the heavyweight dynamic stuff move as fast as possible from client to client. (e.g. they use reverse proxies so that people on slow modems don't tie up a valuable process.)

Onto #3. I disagree about Perl's main failing. Perl's main failing here is not that Perl doesn't recognize that sometimes you want to be concurrent and sometimes not; it is that there are a lot of operations in Perl that have internal side effects that you wouldn't expect them to. For instance my $foo = 4; print $foo; will update $foo when you do the print. Why? Because Perl stringifies the variable, upgrades the scalar to say it can be either a number or a string, then stores the string. There is so much of this kind of stuff going on behind your back in Perl that it is unreasonable to expect programmers to realize how much they need locks. And attempts to provide the necessary locks behind the programmer's back turned out to be a disaster. (That's why the ithread model was created.)
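
You can watch that scalar upgrade happen with the core Devel::Peek module; a minimal sketch:

    use strict;
    use warnings;
    use Devel::Peek;      # core module; Dump() prints a scalar's internals to STDERR

    my $foo = 4;
    Dump($foo);           # an IV: the scalar holds only an integer
    print $foo, "\n";     # stringification happens here as a side effect
    Dump($foo);           # now a PVIV: the string form has been stored in the scalar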

Perl's duck typing is the problem here. A language like C++ is better for threading not because it is easier to write code whose semantics involve no side effects, but because it is easier in C++ to inspect code and figure out when there will be potential race conditions to worry about. (I'm not saying that C++ is great for writing threaded code, just that it is better than Perl.)
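
For contrast, here is a minimal sketch of the explicit locking that Perl's ithreads model requires (threads and threads::shared ship with Perl); without the lock() the read-modify-write on the shared counter can race and come up short:

    use strict;
    use warnings;
    use threads;
    use threads::shared;

    my $count : shared = 0;

    sub bump {
        for (1 .. 100_000) {
            lock($count);    # held until the end of this block
            $count++;
        }
    }

    my @workers = map { threads->create(\&bump) } 1 .. 4;
    $_->join for @workers;

    print "$count\n";        # 400000 with the lock; typically less without it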

About #4, I wouldn't worry about the practical difficulties. I'm not saying by that that there aren't difficulties - there are. But the database vendors know what they are and are doing their best to produce solutions. (Incidentally I've heard, and believe, that the database that does the best job of running on clusters is actually MySQL. Yes, there is something at which MySQL is technically better than the big boys!)

For the application programmer, it really depends on what your application is. I agree that Google can't just apply the relational database secret sauce and wave their problems goodbye. However for ecommerce, using a database makes a lot of sense.

For ecommerce your priorities are remaining up, response time, and throughput. The economics of the situation say that as long as you have sufficiently good response time and throughput, the goal you really need to maximize is uptime. So that is the goal.

Here is a standard architecture. You have dual load balancers (set up for failover) talking to a cluster of machines (with networking set up so that everything fails over smoothly - there are no single points of failure here), and then those machines talk to a relational database. If you're big, then you replicate this setup in multiple colocation facilities so that you'll remain up even if a bomb goes off. Congratulations! Using off-the-shelf solutions, you've now reduced your single points of failure to one (the database) without your developers needing to do anything! Now you have to bulletproof your database, and that's it.

But it gets better. Database vendors are painfully aware that they tend to be a single point of failure, and if you're willing to pay there are plenty of high availability solutions for databases. (Again using mirroring, instant failover, etc. As a bonus, in some configurations the standby databases can be queried. There is an interruption in service, but it is under a second and only affects the pages that are currently being served.)

The result is that you can pretty much eliminate hardware as a cause of uptime failures by using a standard architecture which involves clusters and relational databases. There, unfortunately, are plenty of other potential causes of uptime failures. But you've gotten rid of a big one.

Re^12: Parrot, threads & fears for the future.
by BrowserUk (Pope) on Nov 08, 2006 at 07:20 UTC
    1. There is a lot of literature about how to set up websites with no single points of failure. For instance you have a pair of load balancers configured for fail-over. If the primary goes down, the secondary takes over. No single points of failure is more reliable.

      Remember that my proposition was that these are tomorrow's commodity machines we're talking about. So, rather than today's 16-way cluster to ensure peak-time bandwidth, we only need two, but the cost of these machines is the same, so throw in a third for posterity.

      For your ecommerce site requirements: instead of running a 16-way by 2-core cluster, you run a 2-way by 8-core cluster. Your site will have the same headroom in processor power to deal with your peaks. You also have fail-over redundancy.

      Which brings us to Google's map reduce. Suppose you have a job that will run on 2000 machines and takes 5 hours.

      See below.

    2. A big factor that I think you're missing is that keeping RAM on takes electricity.

      I don't believe I have missed that. Memory mapped files do not (have to or usually) exist entirely in RAM; they can be, and are, paged on and off disk, often through relatively small windows. The benefit of the 64-bit address space is the simplicity of the mapping, which gets messy when mapping a 2 or 4GB address space over 1 (or more) >4GB sized files.
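
      A minimal sketch of what that looks like from Perl, assuming the CPAN File::Map module and a hypothetical huge_dataset.bin; the OS pages the mapped data in and out of RAM on demand rather than holding it all resident:

          use strict;
          use warnings;
          use File::Map qw(map_file);    # CPAN module; exposes an mmap as an ordinary scalar

          # Map the whole file read-only; no read() calls, no copy into perl's heap.
          map_file my $data, 'huge_dataset.bin', '<';

          # Touching a region faults just those pages in.
          my $chunk = substr $data, 1_000_000, 4096;

          printf "%d bytes mapped\n", length $data;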

      • RAM: 2000 * 4GB -v- 250 * 40 GB.
      • Disks: 4000 * 160 GB -v- 250 * 320 GB.
      • CPUs: 2000 * 2-core -v- 250 * 8-core.
      • Transformers: 2000 -v- 250.
      • Etc.

      RAM uses less power than disks; there is a 1/16 (or greater) reduction in the number of disks and a 1/8 reduction in most other components. That means that even if you keep the same volume of RAM but distribute it into 1/8th as many machines, you're already saving energy. And the power draw of each generation of RAM chips/modules has either remained static or fallen, whilst the capacity has quadrupled or more with each generation.

      Keeping intermediate results in RAM and performing previously serial processes on them means that you are also able to utilise the time and processors more fully.

      In your example above, one job takes 2000 * 5 = 10,000 machine-hours of processing. The same job takes 250 * 2 = 500 machine-hours. 10,000 : 500 == a 20:1 saving in machine time, but that's not (just) an efficiency saving. It's also a 20:1 energy saving, as you don't have to (continually) run 2000 machines to process the job in a timely manner. You also have quiet-time upgrade ability.

      And when you talk about AJAX, you've made some big assumptions that are mostly wrong (at least in today's world). Any thread of execution that is doing dynamic stuff takes up lots of resources. Be it memory, database handles, or whatever. As a result standard high performance architectures go to lengths to make the heavyweight dynamic stuff move as fast as possible from client to client.

      I think you're wrong on this, but I do not know enough of current practice on AJAX sites to counter you.

    3. I disagree about Perl's main failing. Perl's main failing here is not that Perl doesn't recognize that sometimes you want to be concurrent and sometimes not, it is that there are a lot of operations in Perl that have internal side effects that you wouldn't expect to.

      I opened with "Due to its inherently side-effectful nature, Perl is not the right language for multi-threaded programming." and I don't think I said anything later that contradicts that?

    4. I wouldn't worry about the practical difficulties.

      I really wasn't.

      My point was that without "the hidden semantics", the parallelisation of DB operations is a natural big win, but with those semantics it's much less so. Whilst DB operations remain an out-of-box, cross-network, many-to-one communications affair, the communications overheads remain constant and contention dominates.

      Once you have the processor(s), address space and memory to move the DB into the same box as the application programs, the communications overheads disappear. Instead of the DBMS process sharing its limited address space between the demands of multiple callers, the common code runs in the address space of the caller.
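
      As a rough illustration of the in-process idea (an existing embedded engine, not the architecture described here), DBI with DBD::SQLite already runs the database code in the caller's address space: no socket, no separate server process, no serialisation of results across the network. The orders.db file name is just an example:

          use strict;
          use warnings;
          use DBI;    # with DBD::SQLite, the engine is linked into this process

          my $dbh = DBI->connect('dbi:SQLite:dbname=orders.db', '', '',
                                 { RaiseError => 1, AutoCommit => 1 });

          $dbh->do('CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, total REAL)');
          $dbh->do('INSERT INTO orders (total) VALUES (?)', undef, 42.50);

          my ($n) = $dbh->selectrow_array('SELECT COUNT(*) FROM orders');
          print "$n orders\n";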

      With 64-bit address space, instead of storing your tables and indexes in myriad separate files, each database becomes a single huge file mapped into memory on demand, and cached there. Disk/file locks become memory locks.

      Think of it like this. Just as virtual addressing is used to provide swapping for processes now, so it gets used to provide shared memory access to your databases. All the logic remains the same: locking, caching, the works. The difference is that it all happens at RAM speed rather than disk speed; you lose all the communications overhead and the serialisation of results through (relatively) low-speed, high-latency sockets, and the need for DB handles simply disappears.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
