Re: Re: STOP Trading Memory for Speed

in reply to Re: STOP Trading Memory for Speed
in thread STOP Trading Memory for Speed

You're missing some rather fundamental points here. On some CPU architectures you can see stall times of up to 200 cycles when accessing memory that isn't in L1 or L2 cache, and with large data sets this can be a significant overhead. Significant enough that in some cases it's actually faster to have a more compact representation in memory (and thus a higher density of real data in cache) with extra code to decode that data than it is to have simple code with relatively loosely packed data.

For example, while it's much more efficient on most RISC architectures to use 32 bit or 64 bit data elements over 8 bit ones (as you generally have a mask and shift operation internally, or often externally), if using only 8 bit data means a larger portion of your data set fits in cache with fewer spills to main memory (or even L2 cache) the increased number of cycles actually used to decode your data is smaller than the number of cycles you pause waiting on memory busses.

Comment on Re: Re: STOP Trading Memory for Speed

Replies are listed 'Best First'.
Re^3: STOP Trading Memory for Speed by Aristotle (Chancellor) on Sep 26, 2002 at 05:40 UTC
I've done a fair bit of assembly programming and am perfectly aware of issues of cache locality, but that issue alone doesn't make his example any more relevant. There's also the opposite case where the inner loops don't fit into L1, which turns out an order of magnitude more costly than fetching data from the slower memory tiers. It's harder to hit that limit of course, but as most CPUs have much smaller caches for code than data it's a real possibility nevertheless, considering the appeal is that we should routinely trade speed for data size. It's rarely quite as easy as "stop trading memory for speed!" or "lookup tables make things faster". Top performance, if at all needed in the first place, cannot be achieved any other way than by benchmarking using a low-level profiler to find out exactly where the bottlenecks are. And several bottlenecks usually interdepend so you often have to be careful not to make an "improvement" that actually worsens the overall situation. Makeshifts last the longest.	[reply]

Replies are listed 'Best First'.

Re^3: STOP Trading Memory for Speed
by Aristotle (Chancellor) on Sep 26, 2002 at 05:40 UTC

I've done a fair bit of assembly programming and am perfectly aware of issues of cache locality, but that issue alone doesn't make his example any more relevant.

There's also the opposite case where the inner loops don't fit into L1, which turns out an order of magnitude more costly than fetching data from the slower memory tiers. It's harder to hit that limit of course, but as most CPUs have much smaller caches for code than data it's a real possibility nevertheless, considering the appeal is that we should routinely trade speed for data size.

It's rarely quite as easy as "stop trading memory for speed!" or "lookup tables make things faster". Top performance, if at all needed in the first place, cannot be achieved any other way than by benchmarking using a low-level profiler to find out exactly where the bottlenecks are. And several bottlenecks usually interdepend so you often have to be careful not to make an "improvement" that actually worsens the overall situation.

Makeshifts last the longest.

[reply]

In Section Meditations