Memory Hierarchy Design
1. How is cache performance evaluated? Explain the various cache optimization categories.
The average memory access time is calculated as follows:
Average memory access time = Hit time + Miss rate × Miss penalty
where Hit time is the time to deliver a block in the cache to the processor (including the time
to determine whether the block is in the cache), Miss rate is the fraction of memory
references not found in the cache (misses/references), and Miss penalty is the additional time
required because of a miss.
The average memory access time due to cache misses is only an indirect predictor of
processor performance. First, there are other reasons for stalls, such as contention due to
I/O devices using memory. Second, the CPU stalls during misses, and the memory stall
time is strongly correlated with average memory access time:
CPU time = (CPU execution clock cycles + Memory stall clock cycles) × Clock cycle time
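The two formulas above can be sketched directly in code. This is a minimal illustration; the cycle counts and rates below are made-up example values, not measurements.

```python
def avg_memory_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time = Hit time + Miss rate x Miss penalty."""
    return hit_time + miss_rate * miss_penalty

def cpu_time(exec_cycles, memory_stall_cycles, clock_cycle):
    """CPU time = (CPU execution cycles + Memory stall cycles) x Clock cycle time."""
    return (exec_cycles + memory_stall_cycles) * clock_cycle

# Example: 1-cycle hit, 2% miss rate, 100-cycle miss penalty.
amat = avg_memory_access_time(1, 0.02, 100)
print(amat)  # 3.0 cycles
```

Note how a small miss rate still dominates the result when the miss penalty is two orders of magnitude larger than the hit time.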
There are 17 cache optimizations, grouped into four categories:
1 Reducing the miss penalty: multilevel caches, critical word first, read miss before
write miss, merging write buffers, victim caches;
2 Reducing the miss rate: larger block size, larger cache size, higher associativity,
pseudo-associativity, and compiler optimizations;
3 Reducing the miss penalty or miss rate via parallelism: nonblocking caches,
hardware prefetching, and compiler prefetching;
4 Reducing the time to hit in the cache: small and simple caches, avoiding address
translation, and pipelined cache access.
2. Explain various techniques for Reducing Cache Miss Penalty
There are five optimization techniques to reduce the miss penalty.
i) First Miss Penalty Reduction Technique: Multi-Level Caches
The first miss penalty reduction technique adds another level of cache between the
original cache and memory. The first-level cache can be small enough to match the clock
cycle time of the fast CPU, and the second-level cache can be large enough to capture
many accesses that would otherwise go to main memory, thereby reducing the effective
miss penalty.
Consider the definition of average memory access time for a two-level cache. Using the
subscripts L1 and L2 to refer, respectively, to a first-level and a second-level cache:
Average memory access time = Hit timeL1 + Miss rateL1 × Miss penaltyL1
and Miss penaltyL1 = Hit timeL2 + Miss rateL2 × Miss penaltyL2
so Average memory access time = Hit timeL1 + Miss rateL1 × (Hit timeL2 + Miss rateL2 ×
Miss penaltyL2)
These formulas involve miss rates measured at different points, so two terms must be
distinguished:
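The two-level formula can be sketched as below. The numbers are illustrative assumptions (a 4% L1 local miss rate, a 50% L2 local miss rate), not taken from any real design.

```python
def two_level_amat(hit_l1, miss_rate_l1, hit_l2, miss_rate_l2, penalty_l2):
    """AMAT = Hit time_L1 + Miss rate_L1 x (Hit time_L2 + Miss rate_L2 x Miss penalty_L2)."""
    miss_penalty_l1 = hit_l2 + miss_rate_l2 * penalty_l2
    return hit_l1 + miss_rate_l1 * miss_penalty_l1

# Example: L1 hits in 1 cycle with a 4% local miss rate; L2 hits in 10
# cycles with a 50% local miss rate and a 200-cycle main-memory penalty.
print(two_level_amat(1, 0.04, 10, 0.5, 200))  # 1 + 0.04*(10 + 0.5*200) = 5.4
```

The nesting makes the point of multilevel caches concrete: L2 turns a 200-cycle penalty seen by L1 into an effective 110-cycle one.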
Local miss rate—This rate is simply the number of misses in a cache divided by the total
number of memory accesses to this cache. As you would expect, for the first-level cache
it is equal to Miss rateL1 and for the second-level cache it is Miss rateL2.
Global miss rate—The number of misses in the cache divided by the total number of
memory accesses generated by the CPU. Using the terms above, the global miss rate for
the first-level cache is still just Miss rateL1, but for the second-level cache it is Miss rateL1
× Miss rateL2.
The local miss rate is large for second-level caches because the first-level cache skims
the cream of the memory accesses. This is why the global miss rate is the more useful
measure: it indicates what fraction of the memory accesses that leave the CPU go all the
way to memory.
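The distinction between local and global miss rates can be made concrete with miss counts. The access and miss counts here are invented for illustration.

```python
def miss_rates(cpu_accesses, l1_misses, l2_misses):
    """Local rates divide by accesses to *that* cache; global rates divide
    by all memory accesses generated by the CPU."""
    local_l1 = l1_misses / cpu_accesses    # L1 sees every CPU access
    local_l2 = l2_misses / l1_misses       # L2 sees only L1 misses
    global_l2 = l2_misses / cpu_accesses   # equals local_l1 * local_l2
    return local_l1, local_l2, global_l2

# 1000 CPU accesses, 40 L1 misses, 20 L2 misses.
print(miss_rates(1000, 40, 20))  # (0.04, 0.5, 0.02)
```

A 50% local L2 miss rate looks alarming, yet only 2% of all CPU accesses actually reach main memory.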
Here is a place where the misses per instruction metric shines. Instead of confusion about
local or global miss rates, we just expand memory stalls per instruction to add the impact
of a second level cache.
Average memory stalls per instruction = Misses per instructionL1 × Hit timeL2 +
Misses per instructionL2 × Miss penaltyL2
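That stall formula avoids local/global miss-rate ambiguity entirely. A minimal sketch, using the same illustrative parameters as before (0.04 L1 misses and 0.02 L2 misses per instruction are assumed values):

```python
def mem_stalls_per_instr(misses_per_instr_l1, hit_l2,
                         misses_per_instr_l2, penalty_l2):
    """Average memory stalls per instruction for a two-level cache:
    every L1 miss pays the L2 hit time, and every L2 miss additionally
    pays the L2 miss penalty."""
    return misses_per_instr_l1 * hit_l2 + misses_per_instr_l2 * penalty_l2

# 0.04 L1 misses/instr, 10-cycle L2 hit time, 0.02 L2 misses/instr,
# 200-cycle L2 miss penalty -> 0.4 + 4.0 = 4.4 stall cycles per instruction.
print(mem_stalls_per_instr(0.04, 10, 0.02, 200))
```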
We can now consider the parameters of second-level caches. The foremost difference
between the two levels is that the speed of the first-level cache affects the clock rate of
the CPU, while the speed of the second-level cache affects only the miss penalty of the
first-level cache.
The initial decision is the size of a second-level cache. Since everything in the first-
level cache is likely to be in the second-level cache, the second-level cache should be
much bigger than the first. If second-level caches are just a little bigger, the local miss
rate will be high.
Figures 5.10 and 5.11 show how miss rates and relative execution time change with the
size of a second-level cache for one design.
ii) Second Miss Penalty Reduction Technique: Critical Word First and Early Restart
Multilevel caches require extra hardware to reduce the miss penalty, but this second
technique does not. It is based on the observation that the CPU normally needs just one
word of the block at a time. The strategy is impatience: don't wait for the full block to
be loaded before sending the requested word and restarting the CPU. There are two
strategies:
Critical word first—Request the missed word first from memory and send it to the CPU
as soon as it arrives; let the CPU continue execution while filling the rest of the words in
the block. Critical-word-first fetch is also called wrapped fetch and requested word first.
Early restart—Fetch the words in normal order, but as soon as the requested word of the
block arrives, send it to the CPU and let the CPU continue execution.
Generally these techniques benefit only designs with large cache blocks, since the benefit
is small when blocks are small. The problem is that, given spatial locality, there is a
better-than-random chance that the next miss is to the remainder of the block. In such
cases, the effective miss penalty is the time from the miss until the second piece arrives.
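A rough timing model shows why critical word first helps large blocks most. This is a simplified sketch: it assumes a fixed latency to the first word and a fixed transfer time per additional word, and the cycle counts are invented.

```python
def full_block_penalty(words_per_block, first_word_latency, per_word_time):
    """Baseline: the CPU waits for the whole block before restarting."""
    return first_word_latency + words_per_block * per_word_time

def critical_word_first_penalty(first_word_latency, per_word_time):
    """Critical word first: the CPU restarts as soon as the requested
    word arrives; the rest of the block fills in afterward."""
    return first_word_latency + per_word_time

# 8-word block, 30 cycles to the first word, 4 cycles per word transferred.
print(full_block_penalty(8, 30, 4))        # 62 cycles
print(critical_word_first_penalty(30, 4))  # 34 cycles
```

With a 1-word block the two figures coincide, which is exactly the observation above that small blocks gain nothing from these techniques.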