
Re: Re: Out-Of-Date Optimizations? New Idioms? RAM vs. CPU

by liz (Monsignor)
on Jul 21, 2003 at 07:46 UTC ( #276170 )

in reply to Re: Out-Of-Date Optimizations? New Idioms? RAM vs. CPU
in thread Out-Of-Date Optimizations? New Idioms? RAM vs. CPU

I think of it this way:

  • When the first PCs came along, you would try to keep as much in memory as possible, because disk access was so very slow compared to the CPU. (I once rewrote a Clipper (dBase III) program that took > 30 minutes in another language in which I could just use RAM, and it went down to 15 seconds ;-).
  • Lately, CPUs have become much faster. So much faster that RAM (other than the CPU's L1 and L2 caches) has become very slow compared to the CPU. So now you should be trying to keep everything in the CPU caches.
  • One way to achieve this is to not keep temporary values in memory, but to calculate them again and again (as long as the working set stays in the L1 and L2 caches).
Probably not a technically correct view of what's happening, but a model that I'm working with. I'm open to anyone correcting this model.


P.S.: Yes, my background is experimental physics, sometime long ago.

Re: Out-Of-Date Optimizations? New Idioms? RAM vs. CPU
by Abigail-II (Bishop) on Jul 21, 2003 at 08:04 UTC
    If you recalculate something so that it doesn't stay in memory, it won't stay in the cache either. The cache is a memory cache - whatever is in it is also in main memory.

    CPUs have become faster, but main memories have become bigger. Nowadays, computers tend not to swap; if your server swaps on a regular basis, you might want to do some tuning. Memory I/O is faster than disk I/O, and the ratio of memory I/O speed to disk I/O speed is larger than the ratio of cache speed to memory speed.

    Maybe not much of a data point, but of the servers with resource problems I've seen, more of them benefited from getting more memory than from more or faster CPUs. Most computers have more than enough CPU cycles - but they can usually use more main memory.


      Could you recommend any good books/articles/etc on system performance tuning? I've read 'system performance tuning' and 'web performance tuning' from O'Reilly but didn't find them all that useful. Thanks.

        I've seen a number of useful, or useful-looking, books about Solaris system performance tuning - most of them published by Sun. I've found the course material of HP Education quite useful. System performance tuning is hard, and system specific. I would distrust any book that claims to discuss system performance tuning without focusing on a specific OS. I'd also distrust any "cookbook".

        Go for something that spends a lot of time analysing performance problems, that discusses performance measuring tools (like glance for HP-UX), and that goes into detail explaining how the system works.


      True, but nowadays I think of swap as something to keep a computer from crashing during peak loads, rather than something you would need during "normal" operations. If your computer needs swap for "normal" operations (other than as an optimization), then you have a problem. And indeed, then it doesn't matter, because you have bigger problems.

      But I was thinking more of the case where everything already fits in RAM, and you want to make it faster still.


Re: Re: Re: Out-Of-Date Optimizations? New Idioms? RAM vs. CPU
by tilly (Archbishop) on Jul 21, 2003 at 16:02 UTC
    A better way to improve usage of cache without going through a lot of careful tuning is to keep actively accessed data together, and avoid touching lots of memory randomly.

    My understanding (from my view somewhere in the bleachers) is that Parrot's garbage collection will provide both benefits.

    Incidentally correcting a point you made in your original post, the importance of Parrot having lots of registers is not to make efficient use of cache. It is to avoid spending half of the time on stack operations (estimate quoted from my memory of elian's statement about what JVM and .NET do). In a register-poor environment, like x86, you come out even. In a register-rich environment you win big. (Yes, I know that x86 has lots of registers - but most are not visible to the programmer and the CPU doesn't always figure out how to use them well on the fly.)

    Before someone pipes up and says that we should focus on x86, Parrot is hoping to survive well into the time when 32-bit computing is replaced by 64-bit for mass consumers. Both Intel and AMD have come out with 64-bit chips with far more registers available to the programmer than x86 has. That strongly suggests that the future of consumer computing will have lots of registers available. (Not a guarantee though; the way that I read the tea leaves is that Intel is hoping that addressing hacks like PAE will allow 32-bit computing to continue to dominate consumer desktops through the end of the decade. AMD wants us to switch earlier. I will be very interested to see which way game developers jump when their games start needing more than 2 GB of RAM.)

      And a good way to ruin cache hits is to use a garbage-collecting language.

      In reply to Aristotle:
      The theory (and I haven't profiled this myself, just passing on received wisdom) is that when the GC goes off to clear out old memory, it has to read that memory into the cache to do so. If the memory were released as soon as it was finished with, the page could just be discarded as necessary. Of course, the effect on the processor cache is just one factor, and it may be that good GC systems can make up for it in other ways, but I don't like them anyway. I much prefer deterministic release of resources.
      I first heard of this theory from comments by Linus Torvalds, if you'll excuse the name dropping, and it seems to make sense to me. Of course it may be that the pages visited by the GC are pages that are going to be needed real soon. A good reminder that the first rule of optimisation is: don't, and the second is: do some profiling first.

        Any backup for your claim? All arguments I’ve heard so far indicate the opposite – which I’d be inclined to believe, unless you’re allocating and releasing memory in a tight loop. (But that would cause thrashing regardless of a garbage collector anyway…) So what would support a claim to the opposite?

        Makeshifts last the longest.
