Cache Replacement Policies
Prof. Mikko H. Lipasti
University of Wisconsin-Madison
ECE/CS 752 Spring 2012
Cache Design: Four Key Issues
These are:
- Placement: Where can a block of memory go?
- Identification: How do I find a block of memory?
- Replacement: How do I make space for new blocks?
- Write Policy: How do I propagate changes?
Consider these for caches
Also apply to main memory, disks
Placement
Memory Type   | Placement              | Comments
Registers     | Anywhere; Int, FP, SPR | Compiler/programmer manages
Cache (SRAM)  | Fixed in H/W           | Direct-mapped, set-associative, fully-associative
DRAM          | Anywhere               | O/S manages
Disk          | Anywhere               | O/S manages
Placement
Address Range
Map address to finite capacity
- Called a hash
- Usually just masks high-order bits
Direct-mapped
- Block can only exist in one location
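- Hash collisions cause problems
[Figure: a 32-bit address is split into Index and Offset fields. The hash of the address gives the index into the SRAM cache, which produces Data Out; the offset width is set by the block size.]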
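The address-to-index mapping can be sketched in a few lines. This is a minimal illustration, not a description of any particular cache: the function name `split_address` and the default geometry (64-byte blocks, 256 sets) are assumptions chosen for the example.

```python
def split_address(addr, block_size=64, num_sets=256):
    """Split a 32-bit address into (tag, index, offset).

    Assumes block_size and num_sets are powers of two (hypothetical defaults).
    """
    offset_bits = block_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_size - 1)                # low-order bits: byte within block
    index = (addr >> offset_bits) & (num_sets - 1)  # the "hash": mask off high-order bits
    tag = addr >> (offset_bits + index_bits)        # remaining high-order bits
    return tag, index, offset


a = 0x12345678
b = a + 64 * 256  # one full cache "stride" away: same index, different tag
tag_a, index_a, offset_a = split_address(a)
tag_b, index_b, offset_b = split_address(b)
```

In a direct-mapped cache `a` and `b` collide: they share an index but carry different tags, so each one evicts the other.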
Replacement
Cache has finite size
- What do we do when it is full?
Analogy: desktop full?
- Move books to bookshelf to make room
- Bookshelf full? Move least-used to library
- Etc.
Same idea:
- Move blocks to next level of cache
Cache Miss Rates: 3 C’s [Hill]
Compulsory miss or Cold miss
- First-ever reference to a given block of memory
- Measure: number of misses in an infinite cache model
Capacity
- Working set exceeds cache capacity
- Useful blocks (with future references) displaced
- Good replacement policy is crucial!
- Measure: additional misses (beyond compulsory) in a fully-associative cache of the same capacity
Conflict
- Placement restrictions (not fully-associative) cause useful blocks to be displaced
- Think of as capacity within set
- Good replacement policy is crucial!
- Measure: additional misses (beyond the fully-associative cache) in cache of interest
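The three measures above can be turned into a small trace-driven classifier. The sketch below assumes LRU replacement in both the fully-associative model and the cache of interest, a trace of block addresses, and modulo set indexing; the names `lru_misses` and `three_cs` are made up for this example.

```python
from collections import OrderedDict


def lru_misses(trace, num_sets, ways):
    """Count misses of a set-associative LRU cache over a trace of block addresses."""
    sets = [OrderedDict() for _ in range(num_sets)]  # each dict is ordered LRU -> MRU
    misses = 0
    for block in trace:
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)       # hit: becomes MRU
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)  # evict the LRU block
            s[block] = None
    return misses


def three_cs(trace, num_sets, ways):
    """Split misses into compulsory, capacity, and conflict components."""
    compulsory = len(set(trace))                         # misses in an infinite cache
    fully_assoc = lru_misses(trace, 1, num_sets * ways)  # same capacity, one big set
    actual = lru_misses(trace, num_sets, ways)           # cache of interest
    return {
        "compulsory": compulsory,
        "capacity": fully_assoc - compulsory,
        "conflict": actual - fully_assoc,
    }


# Direct-mapped, 4 sets: blocks 0 and 4 share set 0 and keep evicting each other.
conflict_example = three_cs([0, 4, 0, 4], num_sets=4, ways=1)
# One 2-way set: three blocks cycle through two ways.
capacity_example = three_cs([0, 1, 2, 0, 1, 2], num_sets=1, ways=2)
```

Because the fully-associative model uses LRU rather than optimal replacement, the conflict term can occasionally come out negative on traces where LRU thrashes.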
Replacement
How do we choose victim?
- Verbs: Victimize, evict, replace, cast out
Many policies are possible
- FIFO (first-in-first-out)
- LRU (least recently used), pseudo-LRU
- LFU (least frequently used)
- NMRU (not most recently used)
- NRU (not recently used)
- Pseudo-random (yes, really!)
- Optimal
- Etc.
Optimal Replacement Policy?
[Belady, IBM Systems Journal, 1966]
Evict block with longest reuse distance
- i.e. next reference to block is farthest in future
- Requires knowledge of the future!
Can’t build it, but can model it with trace
- Process trace in reverse
- [Sugumar&Abraham] describe how to do this in one pass over the trace with some lookahead (Cheetah simulator)
Useful, since it reveals opportunity
- (X,A,B,C,D,X) in one set of an LRU 4-way set-associative cache: 2nd X will miss; optimal would evict A, B, or C instead and hit
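The (X,A,B,C,D,X) example can be checked with a small simulator. This sketch models a single set and finds the optimal victim by naively scanning forward in the trace. It is not the reverse-pass or Cheetah approach mentioned above, and the function name `simulate` is an assumption for illustration.

```python
def simulate(trace, ways, policy):
    """Return a list of hit (True) / miss (False) outcomes for one cache set.

    policy is "lru" or "opt". The cache list is kept in LRU -> MRU order for LRU.
    """
    cache, outcome = [], []
    for i, block in enumerate(trace):
        if block in cache:
            outcome.append(True)
            if policy == "lru":
                cache.remove(block)
                cache.append(block)  # move to the MRU position
            continue
        outcome.append(False)
        if len(cache) >= ways:
            if policy == "lru":
                victim = cache[0]
            else:
                future = trace[i + 1:]
                # Evict the block whose next reference is farthest away (or never comes).
                victim = max(
                    cache,
                    key=lambda b: future.index(b) if b in future else float("inf"),
                )
            cache.remove(victim)
        cache.append(block)
    return outcome


trace = list("XABCDX")
lru_outcome = simulate(trace, ways=4, policy="lru")
opt_outcome = simulate(trace, ways=4, policy="opt")
```

Under LRU the final reference to X misses because D displaced it. Under optimal replacement, D displaces a block with no future reference and X hits.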
Practical Pseudo-LRU In Action
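[Figure: tree pseudo-LRU state for an 8-way set holding blocks J F C B X Y A Z, referenced in the order J, Y, X, Z, B, C, F, A. Labels: "011: PLRU block B is here"; "110: MRU block is here". Partial order encoded in tree: Z < A, Y < X, B < C, J < F, A > X, C < F, A > F.]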
Practical Pseudo-LRU
Binary tree encodes PLRU partial order
- At each level point to LRU half of subtree
Each access: flip nodes along path to block so they point away from it
Eviction: follow LRU path
Overhead: (a-1)/a bits per block
[Figure: example set with ways J F C B X Y A Z after refs J,Y,X,Z,B,C,F,A, shown on an older-to-newer axis. 011: PLRU block B is here; 110: MRU block is here.]
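A tree pseudo-LRU set can be sketched as an array of a-1 bits stored in heap order. The class name `TreePLRU` and the bit convention (0 means the LRU half is on the left, 1 means it is on the right) are assumptions for this example. It reuses the eight blocks J F C B X Y A Z and the reference order J,Y,X,Z,B,C,F,A from the figure.

```python
class TreePLRU:
    """Tree pseudo-LRU for one set; ways must be a power of two."""

    def __init__(self, ways):
        assert ways >= 2 and ways & (ways - 1) == 0
        self.ways = ways
        self.bits = [0] * (ways - 1)  # heap order: children of node i are 2i+1 and 2i+2

    def touch(self, way):
        """On an access, point every node on the path away from the accessed way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1  # accessed the left half, so LRU half is the right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0  # accessed the right half, so LRU half is the left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the LRU pointers from the root down to a way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


ways = "JFCBXYAZ"  # physical way order, way 0 is J
plru = TreePLRU(len(ways))
for block in "JYXZBCFA":  # reference order
    plru.touch(ways.index(block))
victim_way = plru.victim()
victim_block = ways[victim_way]
```

Following the tree from the root takes the path 0, 1, 1 and lands on way 3, which holds block B, matching the "011: PLRU block B is here" label.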
True LRU Shortcomings
Streaming data/scans: x_0, x_1, …, x_n
- Effectively no temporal reuse
Thrashing: reuse distance > a
- Temporal reuse exists but LRU fails
All blocks march from MRU to LRU
- Other conflicting blocks are pushed out
For n>a no blocks remain after scan/thrash
- Incur many conflict misses after scan ends
Pseudo-LRU sometimes helps a little bit
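The thrashing case is easy to reproduce. The sketch below assumes a single a-way set under true LRU and cycles over a working set of a and then a+1 blocks; the function name `lru_hits` and the chosen sizes are illustrative.

```python
from collections import OrderedDict


def lru_hits(trace, ways):
    """Count hits for one set managed with true LRU."""
    lru = OrderedDict()  # ordered LRU -> MRU
    hits = 0
    for block in trace:
        if block in lru:
            hits += 1
            lru.move_to_end(block)
        else:
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict the LRU block
            lru[block] = None
    return hits


a = 4
fits = list(range(a)) * 10         # working set of a blocks, swept 10 times
thrash = list(range(a + 1)) * 10   # working set of a+1 blocks, swept 10 times
hits_fits = lru_hits(fits, a)
hits_thrash = lru_hits(thrash, a)
```

With a blocks, every reference after the first sweep hits. With a+1 blocks, each block is evicted just before it is reused, so LRU never hits even though reuse exists.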
Segmented or Protected LRU
[I/O: Karedla, Love, Wherry, IEEE Computer 27(3), 1994] [Cache: Wilkerson, Wade, US Patent 6393525, 1999]
Partition LRU list into filter and reuse lists
On insert, block goes into filter list
On reuse (hit), block promoted into reuse list
Provides scan & some thrash resistance
- Blocks without reuse get evicted quickly
- Blocks with reuse are protected from scan/thrash blocks
No storage overhead, but LRU update slightly more complicated
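A minimal sketch of the filter/reuse split for one set follows. The cap on the reuse list (`protected_ways`), the demotion of overflow blocks back to the filter list, and the class name `SegmentedLRUSet` are assumptions for illustration; the cited designs may differ in these details.

```python
from collections import OrderedDict


class SegmentedLRUSet:
    """One cache set split into a filter list and a reuse (protected) list."""

    def __init__(self, ways, protected_ways):
        self.ways = ways
        self.protected_ways = protected_ways  # assumed cap on the reuse list
        self.filter = OrderedDict()  # ordered LRU -> MRU
        self.reuse = OrderedDict()   # ordered LRU -> MRU

    def access(self, block):
        """Return True on a hit, False on a miss."""
        if block in self.reuse:
            self.reuse.move_to_end(block)
            return True
        if block in self.filter:
            # Reuse observed: promote into the reuse list.
            del self.filter[block]
            self.reuse[block] = None
            if len(self.reuse) > self.protected_ways:
                demoted, _ = self.reuse.popitem(last=False)
                self.filter[demoted] = None  # demote the oldest protected block
            return True
        # Miss: evict from the filter list first, then insert into the filter list.
        if len(self.filter) + len(self.reuse) >= self.ways:
            victims = self.filter if self.filter else self.reuse
            victims.popitem(last=False)
        self.filter[block] = None
        return False


s = SegmentedLRUSet(ways=4, protected_ways=2)
for block in ["A", "B", "A", "B"]:  # A and B show reuse and get protected
    s.access(block)
for i in range(10):                 # a scan of blocks that are never reused
    s.access(f"S{i}")
a_survives = s.access("A")
b_survives = s.access("B")
```

The scan blocks only ever occupy the filter list, so A and B are still resident afterwards. Under plain 4-way LRU the same scan would have pushed both out.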
RRIP [Jaleel et al. ISCA 2010]
Re-reference Interval Prediction
Extends NRU to multiple bits
- Start in the middle, promote on hit, demote over time
Can predict near-immediate, intermediate, and distant re-reference
Low overhead: 2 bits/block
Static and dynamic variants (like LIP/DIP)
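The static variant can be sketched for one set as follows. It assumes a 2-bit re-reference prediction value (RRPV) per block, insertion at 2, promotion to 0 on a hit, and aging of all blocks whenever no block has reached 3. The class name `SRRIPSet` and the tie-break by lowest way are choices made for this example.

```python
class SRRIPSet:
    """One set managed with static RRIP, 2 bits per block."""

    MAX_RRPV = 3  # distant re-reference predicted

    def __init__(self, ways):
        self.tags = [None] * ways
        self.rrpv = [self.MAX_RRPV] * ways  # empty ways look like ideal victims

    def access(self, tag):
        """Return True on a hit, False on a miss."""
        if tag in self.tags:
            self.rrpv[self.tags.index(tag)] = 0  # promote: near-immediate re-reference
            return True
        # Demote over time: age every block until one reaches MAX_RRPV.
        while self.MAX_RRPV not in self.rrpv:
            self.rrpv = [v + 1 for v in self.rrpv]
        way = self.rrpv.index(self.MAX_RRPV)  # lowest-numbered candidate
        self.tags[way] = tag
        self.rrpv[way] = self.MAX_RRPV - 1    # insert "in the middle"
        return False


s = SRRIPSet(ways=4)
for tag in ["A", "B", "A", "B"]:  # A and B are reused, so their RRPV drops to 0
    s.access(tag)
for i in range(4):                # a short scan of never-reused blocks
    s.access(f"S{i}")
a_hit = s.access("A")
b_hit = s.access("B")
```

The scan blocks are inserted with RRPV 2 and age out before A and B do, so both survive this short scan. A long enough scan eventually ages A and B out as well.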
Least Frequently Used
Counter per block, incremented on reference
Evictions choose lowest count
- Logic not trivial (a^2 comparison/sort)
Storage overhead
- 1 bit per block: same as NRU
- How many bits are helpful?
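A minimal LFU sketch for one set with saturating counters follows. The counter width, the initial count of 0 for a newly inserted block, and breaking ties in favour of the oldest inserted block are all assumptions; the class name `LFUSet` is made up for this example.

```python
class LFUSet:
    """One cache set with a saturating reference counter per block."""

    def __init__(self, ways, counter_bits=2):
        self.ways = ways
        self.max_count = (1 << counter_bits) - 1
        self.count = {}  # block -> saturating count, in insertion order

    def access(self, block):
        """Return True on a hit, False on a miss."""
        if block in self.count:
            self.count[block] = min(self.count[block] + 1, self.max_count)
            return True
        if len(self.count) >= self.ways:
            # Evict the lowest count; min() keeps the first (oldest) block on ties.
            victim = min(self.count, key=self.count.get)
            del self.count[victim]
        self.count[block] = 0  # assumed starting value
        return False


s = LFUSet(ways=2)
outcomes = [s.access(b) for b in ["A", "A", "B", "C", "A"]]
resident = sorted(s.count)
```

When C arrives, B (count 0) is evicted rather than A (count 1), so the final reference to A hits. With a single counter bit the state reduces to "reused or not", which is the sense in which the overhead matches NRU.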
Pitfall: Cache Filtering Effect
Upper level caches (L1, L2) hide reference stream from lower level caches
Blocks with “no reuse” @ LLC could be very hot (never evicted from L1/L2)
Evicting from LLC often causes L1/L2 eviction (due to inclusion)
Could hurt performance even if LLC miss rate improves
Cache Replacement Championship
Held at ISCA 2010
http://www.jilp.org/jwac-1
Several variants, improvements
Simulation infrastructure
- Implementations for all entries
References
W. Lin et al. “Predicting last-touch references under optimal replacement.” Technical Report CSE-TR-447-02, U. of Michigan, 2002.
H. Liu et al. “Cache Bursts: A New Approach for Eliminating Dead Blocks and Increasing Cache Efficiency.” In Micro-41, 2008.
G. Loh. “Extending the Effectiveness of 3D-Stacked DRAM Caches with an Adaptive Multi-Queue Policy.” In Micro, 2009.
C.-K. Luk et al. “Pin: building customized program analysis tools with dynamic instrumentation.” In PLDI, pages 190–200, 2005.
N. Megiddo and D. S. Modha. “ARC: A self-tuning, low overhead replacement cache.” In FAST, 2003.
E. J. O’Neil et al. “The LRU-K page replacement algorithm for database disk buffering.” In Proc. ACM SIGMOD Conf., pages 297–306, 1993.
M. Qureshi, A. Jaleel, Y. Patt, S. Steely, J. Emer. “Adaptive Insertion Policies for High Performance Caching.” In ISCA-34, 2007.
K. Rajan and G. Ramaswamy. “Emulating Optimal Replacement with a Shepherd Cache.” In Micro-40, 2007.
J. T. Robinson and M. V. Devarakonda. “Data cache management using frequency-based replacement.” In SIGMETRICS Conf., 1990.
R. Sugumar and S. Abraham. “Efficient simulation of caches under optimal replacement with applications to miss characterization.” In SIGMETRICS, 1993.
Y. Xie, G. Loh. “PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches.” In ISCA-36, 2009.
Y. Zhou and J. F. Philbin. “The multi-queue replacement algorithm for second level buffer caches.” In USENIX Annual Tech. Conf., 2001.