PerlMonks  

Re^3: Parrot, threads & fears for the future.

by tilly (Archbishop)
on Oct 23, 2006 at 19:45 UTC ( #580139=note )


in reply to Re^2: Parrot, threads & fears for the future.
in thread Parrot, threads & fears for the future.

Sorry, but that's just silly.

It is a basic economic fact that the price per unit of performance for commodity hardware is far, far lower than for big servers. Clusters are a way for businesses to take advantage of this to get the performance and reliability they want at a much better price point.

That 64-bit versus 32-bit is irrelevant can be trivially demonstrated. Big 64-bit servers are old news; the big Unix vendors went through that transition a decade ago. (I don't know when IBM's mainframes went through it, but I think it was earlier than that.) Yet over the last decade big iron not only did not replace clusters, it actually lost ground to them. Why? Because clusters are a lot cheaper.

Now I'm not denying that big machines offer performance advantages over clusters. You have correctly identified some of those advantages. And I grant that there are plenty of problems that can only be done on a big machine. If you have one of those problems, then you absolutely must swallow the pricetag and buy big iron. But if you can get away with it, you're strongly advised to get a cluster.

Most problems do not have to run on a huge machine. Clusters are far cheaper than equivalent performance on a big machine. Neither fact seems likely to change in the foreseeable future. As long as they remain true, clusters are going to remain with us.
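The price/performance argument is easy to put rough numbers on. A toy comparison, sketched here in Python with entirely hypothetical prices (for illustration only, not vendor quotes):

```python
# Toy price/performance comparison; all prices are hypothetical,
# for illustration only -- not vendor quotes.
commodity_node = {"price": 2_000, "perf_units": 1.0}     # one commodity box
big_iron       = {"price": 500_000, "perf_units": 100.0} # one large SMP server

def price_per_perf(machine):
    """Dollars per unit of performance."""
    return machine["price"] / machine["perf_units"]

# A cluster of 100 commodity nodes matches the big server's raw
# throughput (ignoring interconnect overhead) at well under half the price.
cluster_price = 100 * commodity_node["price"]

print(price_per_perf(commodity_node))  # 2000.0 dollars per perf unit
print(price_per_perf(big_iron))        # 5000.0 dollars per perf unit
print(cluster_price)                   # 200000, vs 500000 for the big server
```

Of course a real comparison has to price in the interconnect, the load balancers and the extra administration, but the headline gap is usually this large or larger.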


Re^4: Parrot, threads & fears for the future.
by BrowserUk (Pope) on Oct 23, 2006 at 20:20 UTC

    This year's big iron is next year's commodity hardware. This year's commodity hardware is 32-bit, dual processor. Next year's will be 64-bit, dual-core. The year after that, 64-bit, dual-core, hyperthreaded (4 CPUs). The year after that...

    I admit, each of those 'years' is really a Moore's cycle. But basically, commodity hardware has fallen in price (10 to 30%) and doubled in performance at each Moore's cycle for the last few cycles. Speed gains through decreasing die size and upping clock speeds are hitting the limits of silicon, ion-beam frequency and mask resolution. For the first time in the PC's history, the next cycle's increase in performance will come from multi-core, multi-CPU machines.
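    The compounding effect of those cycles is easy to sketch; the doubling and the 10-30% price drop are the rough figures above, not measurements:

```python
# Compounding the claim above: each Moore's cycle, commodity
# performance doubles while price falls 10-30%. The figures are the
# post's rough numbers, not measurements.
def after_cycles(cycles, price_drop=0.2):
    price, perf = 1.0, 1.0
    for _ in range(cycles):
        perf *= 2.0                   # performance doubles per cycle
        price *= (1.0 - price_drop)   # price falls ~20% per cycle
    return price, perf

price, perf = after_cycles(4)   # four cycles at a 20% drop each
print(perf)           # 16.0: sixteen times the performance
print(perf / price)   # roughly a 39x gain in performance per dollar
```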

    Intel and AMD are both talking about moving to quad core processors in 2007.

    There are already 8-way motherboards available.

    Put those together and you get a 32-way cluster in a box. If each of Intel's quad cores is also hyperthreaded, 64 CPUs in a box. What about AMD's HyperTransport and XBAR? A small, 1-gigabit, switched network on a chip? Or IBM's Cell architecture: if they are cheap enough to put into games machines, how long before 2 or 4 of them turn up in a PC?

    Sure, they will not be commodity priced next year, but what of the year after?

    The future is threaded ;)


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      How about we have a bet on whether clusters are going away?

      I'll bet you that in 2010, people will still be building websites on clusters of commodity hardware, and there will still be a healthy market in load balancers. Furthermore I'll bet you that over 10% of the top 500 supercomputers are clusters. And finally I'll bet you that most Perl programmers won't be writing multi-threaded code. If any of those statements are wrong, you win the bet.

      Now it is true that several trends point to commodity PCs having many CPUs. However it is also true that commodity PCs tend to have many programs running on them at any given time. It is further true that for most programming problems there is an embarrassment of excess when it comes to CPU power.

      Furthermore, there are lots of business problems where a single commodity machine won't cut it. And nobody is about to change the fact that, in that case, the cheapest way to scale is to a cluster of commodity machines.

      And another big argument against multi-threading is that it is hard to do. We have enough trouble finding people who can program semi-competently. Competently programming a multi-threaded program is harder than competently programming a single-threaded one. So even if there is a desire for more multi-threaded programming, we're not going to succeed at it until we find far better approaches.
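      The kind of subtlety that makes threaded code hard can be shown in a few lines. A minimal sketch in Python: several threads doing read-modify-write on a shared counter, where correctness hangs on a lock that nothing forces you to take:

```python
# A classic lost-update race: threads doing read-modify-write on a
# shared counter. With the lock the result is deterministic; remove
# it and the answer depends on thread interleaving -- the kind of
# subtle, non-reproducible bug meant by "hard to do" above.
import threading

counter = 0
lock = threading.Lock()

def add(n):
    global counter
    for _ in range(n):
        with lock:          # delete this line and updates can be lost
            counter += 1    # read, increment, write back

threads = [threading.Thread(target=add, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000 with the lock; anything up to that without it
```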

      A final note. Computing did not begin or end with the PC. We have many kinds of computers around us, and we're going to have more. While PCs evolve into something more like a supercomputer, people are programming their cell phones, PDAs, and a host of other mobile devices. These devices have far more modest performance requirements than PCs do.

      In summary, the future holds every kind of computing we know about, and a lot of kinds that we don't.

        How about we have a bet on whether clusters are going away?

        I didn't say or imply that "clusters were going away".

        Only that the bar at which one needs to resort to clusters will be raised. Those currently having to use smaller (4- to 32-way) clusters to get their work done will soon no longer need to deal with the latency, restricted bandwidth and topological problems involved in clusters, nor the complexity and expense of cluster-based software solutions, because they'll be able to use simpler, cheaper, cluster-in-a-box solutions.

        Google, arguably the biggest user of clusters (certainly commercially), also uses commodity hardware. Will Google still be using clusters in 2010? Of course. But what say you that:

        • Instead of their clusters averaging 2000 commodity PCs, they use 256 commodity multi-CPU machines?
        • And that by making that transition, the 'locality optimisation' they employ in their clusters to conserve bandwidth gets a huge boost, because most of the data read and written by each cluster worker is not just on the local hard disk, but 'local' in memory?
        • And the current chunk size of 64MB used by their GFS and their chunk servers becomes (say) 256MB or bigger?
        • And their throughput grows accordingly, because of the reduction in the frequency with which data needs to be transported between machines?
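        A back-of-envelope model of that consolidation, using the hypothetical numbers above (2000 vs 256 machines, 64MB vs 256MB chunks -- not Google's actual figures):

```python
# Back-of-envelope model of the consolidation argued above. All
# numbers are the post's hypotheticals, not Google's figures.
nodes_before, nodes_after = 2000, 256
chunk_before_mb, chunk_after_mb = 64, 256

# With work spread uniformly, the chance that a given chunk is
# already local to a worker scales with 1/nodes, so fewer, fatter
# nodes make locality proportionally more likely.
locality_gain = nodes_before / nodes_after
print(locality_gain)   # 7.8125: ~8x more likely to find the data locally

# Bigger chunks mean proportionally fewer cross-machine fetches for
# the same volume of data.
fetch_reduction = chunk_after_mb / chunk_before_mb
print(fetch_reduction)  # 4.0: one quarter as many chunk fetches needed
```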

        Will google move to using threads? Consider the possibilities.

        For each given MapReduce task, they currently deploy M map tasks and R reduce tasks (where R is usually some multiple of M), each of which lives on a different machine within a (~2000) machine cluster. The intermediate outputs from the M map tasks are written/replicated to the local disks of two or three chunk servers within the same cluster. Each of the reduce tasks then reads these intermediate results from one or other of those chunk servers, processes them, and writes/replicates its results to two or three other chunk servers.

        Now, imagine if each group of 1 map task + N reduce tasks all ran within the same machine. Instead of each piece of intermediate data making 6 network transports, those reads and writes can benefit from the localisation optimisation that Google already uses. That reduces bandwidth consumption immediately, and by quite a large factor.

        Now further imagine that instead of 1 map task and N reduce tasks per machine reading and writing to the local hard disk, you deploy 1 map thread and N reduce threads per process. Now there is no need for the intermediate data to leave RAM.

        You've gone from 6 cross-network transfers for each piece of intermediate data to 1 read and 1 write in local memory. How would that affect performance?
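        The "1 map thread + N reduce threads in one process" idea can be sketched with in-memory queues standing in for the network hops. This is an illustrative Python sketch, not Google's design; the word-count job and the hash partitioning are invented for the example:

```python
# Sketch of "1 map thread + N reduce threads in one process":
# intermediate data moves through in-memory queues instead of making
# network hops. Word-count stands in for a real MapReduce job; this
# is an illustration, not Google's implementation.
import threading
import queue
from collections import Counter

N_REDUCERS = 4
SENTINEL = None                                   # end-of-stream marker
qs = [queue.Queue() for _ in range(N_REDUCERS)]   # one queue per reducer
results = [Counter() for _ in range(N_REDUCERS)]  # per-reducer partial counts

def map_task(records):
    # Partition intermediate (word, 1) pairs by hash -- the in-memory
    # analogue of writing partitioned intermediate files to chunk servers.
    for line in records:
        for word in line.split():
            qs[hash(word) % N_REDUCERS].put((word, 1))
    for q in qs:
        q.put(SENTINEL)          # tell every reducer the stream is done

def reduce_task(i):
    # Consume this reducer's partition straight from RAM.
    while True:
        item = qs[i].get()
        if item is SENTINEL:
            return
        word, count = item
        results[i][word] += count

data = ["the cat sat", "the cat ran", "the dog sat"]
reducers = [threading.Thread(target=reduce_task, args=(i,))
            for i in range(N_REDUCERS)]
for t in reducers:
    t.start()
map_task(data)                   # the single map thread (here: main thread)
for t in reducers:
    t.join()

total = Counter()
for partial in results:
    total.update(partial)
print(total["the"])  # 3
```

Nothing in this toy ever touches a disk or a socket: the queues play the role that the chunk servers play in the clustered version, which is exactly the saving being argued for.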

        And another big argument against multi-threading is that it is hard to do. We have enough trouble finding people who can program semi-competently.

        I really did lose you right at the top of the OP, didn't I? Had you read on, you would have realised that about 70% of my post was spent stating the difficulties (in rather more detail) that currently prevent threaded code being written and deployed. It then went on to suggest that there is a solution, but since you're dead set against threading, I won't bore you further by repeating it here.

        A final note. Computing did not begin or end with the PC.

        I'm well aware of that. I've lived and worked through it. My first programs were written to run on a DEC-10 running TOPS. My college code ran mostly on a PDP11/45. My first database project was on clustered (twinned) PDP11/60s. The first commercial project I independently architected ran on a BBC Micro using 6502 machine code. My first interpreted language was REXX running under CMS over VM 370/XA on an IBM mainframe. Fully half my experience is writing and architecting software that runs on machines other than PCs: from embedded systems on microcomputers, to database work on minis, to Big Stuff on Big Iron.

        From e-commerce (when it was still called EDP); through scientific work using images to visualise huge quantities of data; through database work deploying and retrieving literally millions of paper (OMR) university entrance & examination papers trans-nationally across the breadth of 6 entire West African countries (3 jumbo jets full of paper in either direction), processing and collating the information into another jumbo jet full of paper reports in 3 weeks. And much more.

        You (and merlyn) rail on about your respective depths of experience, but from my perspective, based upon the experience you have outlined here, you both have fewer years than me, and far narrower commercial experience. So please, stop trying to 'put me in my place' with your knowledge and depth of experience.

        But just for grins, even the latest supercomputers are PCs. At least in name :)


