Cachereplacementpolicies, Study notes of Advanced Computer Architecture

University of Northampton Advanced Computer Architecture

Cachereplacementpolicies

Typology: Study notes

2015/2016

Uploaded on 09/11/2016

abhineet_bhojak 🇬🇧

ECE/CS 752: Advanced Computer Architecture I

Cache Replacement Policies

Prof. Mikko H. Lipasti

University of Wisconsin-Madison

ECE/CS 752 Spring 2012

Cache Design: Four Key Issues

• These are:
  – Placement: Where can a block of memory go?
  – Identification: How do I find a block of memory?
  – Replacement: How do I make space for new blocks?
  – Write Policy: How do I propagate changes?
• Consider these for caches
  – Usually SRAM
• Also apply to main memory, disks

Placement

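| Memory Type  | Placement              | Comments                                          |
|--------------|------------------------|---------------------------------------------------|
| Registers    | Anywhere; Int, FP, SPR | Compiler/programmer manages                       |
| Cache (SRAM) | Fixed in H/W           | Direct-mapped, set-associative, fully-associative |
| DRAM         | Anywhere               | O/S manages                                       |
| Disk         | Anywhere               | O/S manages                                       |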

Placement

• Address range
  – Exceeds cache capacity
• Map address to finite capacity
  – Called a hash
  – Usually just masks high-order bits
• Direct-mapped
  – Block can only exist in one location
  – Hash collisions cause problems

[Figure: direct-mapped SRAM cache. Labels: 32-bit Address, Index, Offset, Hash, Block Size, Data Out.]
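The masking "hash" on this slide can be sketched as follows. This is only an illustration under assumed parameters: the 64-byte block size, the 256 sets, and the helper name `split_address` are not from the slides.

```python
def split_address(addr, block_size=64, num_sets=256):
    """Split a 32-bit address into (tag, index, offset) for a direct-mapped cache.

    block_size and num_sets are assumed example values; both must be powers of two.
    """
    offset_bits = block_size.bit_length() - 1   # log2(block_size)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)
    offset = addr & (block_size - 1)            # low bits select a byte within the block
    index = (addr >> offset_bits) & (num_sets - 1)  # masks off the high-order bits
    tag = addr >> (offset_bits + index_bits)    # what is left identifies the block
    return tag, index, offset


a = 0x12345678
b = a + 64 * 256  # one full cache size further on: same index, different tag
print(split_address(a))
print(split_address(b))
```

Addresses `a` and `b` map to the same index, so in a direct-mapped cache they collide and cannot both be resident.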

Replacement

• Cache has finite size
  – What do we do when it is full?
• Analogy: desktop full?
  – Move books to bookshelf to make room
  – Bookshelf full? Move least-used to library
  – Etc.
• Same idea:
  – Move blocks to next level of cache

Cache Miss Rates: 3 C’s [Hill]

• Compulsory miss or cold miss
  – First-ever reference to a given block of memory
  – Measure: number of misses in an infinite cache model
• Capacity
  – Working set exceeds cache capacity
  – Useful blocks (with future references) displaced
  – Good replacement policy is crucial!
  – Measure: additional misses in a fully-associative cache
• Conflict
  – Placement restrictions (not fully-associative) cause useful blocks to be displaced
  – Think of as capacity within set
  – Good replacement policy is crucial!
  – Measure: additional misses in cache of interest
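The three "Measure" lines suggest a simple way to classify misses by simulation. The sketch below assumes LRU replacement, a modulo index function, and a trace of block numbers; the function names `lru_misses` and `three_cs` are hypothetical.

```python
from collections import OrderedDict


def lru_misses(trace, num_sets, ways):
    """Count misses of a set-associative LRU cache over a trace of block numbers."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for block in trace:
        s = sets[block % num_sets]       # assumed index function: low-order bits
        if block in s:
            s.move_to_end(block)         # hit: becomes most recently used
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)    # evict the least recently used block
            s[block] = True
    return misses


def three_cs(trace, num_sets, ways):
    """Classify misses following the slide's three measures."""
    compulsory = len(set(trace))                          # infinite cache
    fully_assoc = lru_misses(trace, 1, num_sets * ways)   # same capacity, one set
    actual = lru_misses(trace, num_sets, ways)            # cache of interest
    return {
        "compulsory": compulsory,
        "capacity": fully_assoc - compulsory,
        "conflict": actual - fully_assoc,
    }


# Direct-mapped cache with 4 sets; blocks 0 and 4 share set 0.
print(three_cs([0, 4, 0, 4, 1, 2, 3, 5, 0], num_sets=4, ways=1))
```

With LRU, the "conflict" term can occasionally come out negative, because a set-associative cache sometimes beats a fully-associative LRU cache of the same size; the classification is a measurement convention rather than an exact decomposition.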

Replacement

• How do we choose victim?
  – Verbs: Victimize, evict, replace, cast out
• Many policies are possible
  – FIFO (first-in-first-out)
  – LRU (least recently used), pseudo-LRU
  – LFU (least frequently used)
  – NMRU (not most recently used)
  – NRU (not recently used)
  – Pseudo-random (yes, really!)
  – Optimal
  – Etc.

Optimal Replacement Policy?

[Belady, IBM Systems Journal, 1966]

• Evict block with longest reuse distance
  – i.e. next reference to block is farthest in future
  – Requires knowledge of the future!
• Can’t build it, but can model it with trace
  – Process trace in reverse
  – [Sugumar&Abraham] describe how to do this in one pass over the trace with some lookahead (Cheetah simulator)
• Useful, since it reveals opportunity
  – (X,A,B,C,D,X): with LRU in a 4-way set-associative cache, the 2nd X will miss
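The slide's example can be checked with a small trace-driven model. This sketch uses the simple two-pass approach (a reverse pass to find each reference's next use, then a forward pass), not the one-pass Sugumar and Abraham method; the cache is assumed fully associative and the function names are hypothetical.

```python
from collections import OrderedDict


def opt_misses(trace, capacity):
    """Belady's policy: on a miss in a full cache, evict the block reused farthest in the future."""
    # Reverse pass: next_use[i] is the index of the next reference to trace[i].
    next_use = [0] * len(trace)
    later = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = later.get(trace[i], float("inf"))
        later[trace[i]] = i
    # Forward pass: cache maps each resident block to its next use.
    cache, misses = {}, 0
    for i, block in enumerate(trace):
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                del cache[max(cache, key=cache.get)]  # farthest next reference
        cache[block] = next_use[i]
    return misses


def lru_misses(trace, capacity):
    """Fully-associative LRU, for comparison."""
    cache, misses = OrderedDict(), 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)
            cache[block] = True
    return misses


trace = ["X", "A", "B", "C", "D", "X"]
print(opt_misses(trace, 4), lru_misses(trace, 4))  # prints "5 6"
```

LRU evicts X to make room for D, so the second X misses; the optimal policy evicts a block that is never referenced again and the second X hits.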

Practical Pseudo-LRU In Action

[Figure: 8-way PLRU tree over blocks J, F, C, B, X, Y, A, Z after references J, Y, X, Z, B, C, F, A. Path 011 leads to the PLRU block (block B); path 110 leads to the MRU block. Partial order encoded in the tree: Z < A, Y < X, B < C, J < F, A > X, C < F, A > F.]

Practical Pseudo-LRU

• Binary tree encodes PLRU partial order
  – At each level point to LRU half of subtree
• Each access: flip nodes along path to block
• Eviction: follow LRU path
• Overhead: (a-1)/a bits per block

[Figure: 8-way set holding blocks J, F, C, B, X, Y, A, Z, ordered older to newer. Refs: J,Y,X,Z,B,C,F,A. 011: PLRU block B is here. 110: MRU block is here.]
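A tree PLRU for one set can be sketched as below. The bit convention (0 means the LRU half is the left subtree) and the heap-style node numbering are assumptions of this sketch, and `TreePLRU` is a hypothetical name; the slides' figure may encode the tree differently.

```python
class TreePLRU:
    """Tree pseudo-LRU state for one set with a power-of-two number of ways."""

    def __init__(self, ways):
        assert ways >= 2 and ways & (ways - 1) == 0
        self.ways = ways
        # a-1 bits per set, i.e. (a-1)/a bits per block.
        # bits[n] == 0: LRU half is the left subtree; 1: the right subtree.
        self.bits = [0] * (ways - 1)

    def touch(self, way):
        """On an access, make every node on the path point away from `way`."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1          # accessed left, so LRU half is right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0          # accessed right, so LRU half is left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the LRU pointers from the root down to a way."""
        node, lo, hi = 0, 0, self.ways
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node] == 0:
                node, hi = 2 * node + 1, mid
            else:
                node, lo = 2 * node + 2, mid
        return lo


plru = TreePLRU(8)
for way in range(8):      # fill ways 0..7 in order
    plru.touch(way)
print(plru.victim())      # 0: agrees with true LRU here
plru.touch(0)
print(plru.victim())      # 4: true LRU would pick way 1
```

The second eviction shows why the tree encodes only a partial order: after way 0 is touched again, the root points at the right half, so the true LRU block (way 1) is skipped.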

True LRU Shortcomings

• Streaming data/scans: x_0, x_1, …, x_n
  – Effectively no temporal reuse
• Thrashing: reuse distance > a (a = associativity)
  – Temporal reuse exists but LRU fails
• All blocks march from MRU to LRU
  – Other conflicting blocks are pushed out
• For n > a no blocks remain after scan/thrash
  – Incur many conflict misses after scan ends
• Pseudo-LRU sometimes helps a little bit
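The thrashing case is easy to reproduce. In this sketch a single set with a = 4 ways is assumed, and the trace cycles over either a or a+1 blocks; `lru_hits` is a hypothetical helper.

```python
from collections import OrderedDict


def lru_hits(trace, ways):
    """Count hits in one LRU-managed set with the given number of ways."""
    cache, hits = OrderedDict(), 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)
        else:
            if len(cache) >= ways:
                cache.popitem(last=False)  # evict the LRU block
            cache[block] = True
    return hits


a = 4
fits = list(range(a)) * 10          # working set of a blocks, 40 references
thrash = list(range(a + 1)) * 10    # working set of a+1 blocks, 50 references
print(lru_hits(fits, a))    # 36: only the 4 cold misses
print(lru_hits(thrash, a))  # 0: every block is evicted just before its reuse
```

One extra block in the working set takes LRU from near-perfect to zero hits, even though a policy that simply kept any a-1 of the blocks resident would still hit on most references.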

Segmented or Protected LRU

[I/O: Karedla, Love, Wherry, IEEE Computer 27(3), 1994]
[Cache: Wilkerson, Wade, US Patent 6393525, 1999]

• Partition LRU list into filter and reuse lists
• On insert, block goes into filter list
• On reuse (hit), block promoted into reuse list
• Provides scan & some thrash resistance
  – Blocks without reuse get evicted quickly
  – Blocks with reuse are protected from scan/thrash blocks
• No storage overhead, but LRU update slightly more complicated
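A minimal sketch of the filter/reuse split follows. The list sizes, the choice to demote the oldest reuse-list block back into the filter list when the reuse list overflows, and the class name `SegmentedLRU` are assumptions of this sketch, not details from the cited papers.

```python
from collections import OrderedDict


class SegmentedLRU:
    """One LRU list split into a filter segment and a protected reuse segment."""

    def __init__(self, filter_size, reuse_size):
        self.filter = OrderedDict()   # blocks seen once since insertion
        self.reuse = OrderedDict()    # blocks that have hit at least once
        self.filter_size = filter_size
        self.reuse_size = reuse_size

    def _insert_filter(self, block):
        self.filter[block] = True
        if len(self.filter) > self.filter_size:
            self.filter.popitem(last=False)   # evict from the cache entirely

    def access(self, block):
        """Return True on a hit, False on a miss."""
        if block in self.reuse:
            self.reuse.move_to_end(block)
            return True
        if block in self.filter:
            del self.filter[block]            # promote on reuse
            self.reuse[block] = True
            if len(self.reuse) > self.reuse_size:
                demoted, _ = self.reuse.popitem(last=False)
                self._insert_filter(demoted)  # assumed: demote rather than evict
            return True
        self._insert_filter(block)            # new blocks start in the filter list
        return False


cache = SegmentedLRU(filter_size=2, reuse_size=2)
for block in ["A", "A", "B", "B"]:
    cache.access(block)                # A and B are promoted to the reuse list
for block in range(100, 110):
    cache.access(block)                # a 10-block scan churns only the filter list
print(cache.access("A"), cache.access("B"))  # True True
```

A plain 4-entry LRU would have lost both A and B after the first four scan blocks.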

RRIP [Jaleel et al. ISCA 2010]

• Re-reference Interval Prediction
• Extends NRU to multiple bits
  – Start in the middle, promote on hit, demote over time
• Can predict near-immediate, intermediate, and distant re-reference
• Low overhead: 2 bits/block
• Static and dynamic variants (like LIP/DIP)
  – Set dueling
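A static RRIP set can be sketched as below. The specific choices here (insert at the second-highest value, reset to 0 on a hit, age all blocks when no block has the maximum value) are one common reading of "start in the middle, promote on hit, demote over time" and are assumptions of this sketch; `SRRIPSet` is a hypothetical name.

```python
class SRRIPSet:
    """One cache set using a 2-bit re-reference prediction value (RRPV) per block."""

    MAX_RRPV = 3  # 2 bits per block: 0 = near-immediate, 3 = distant re-reference

    def __init__(self, ways):
        self.blocks = [None] * ways
        self.rrpv = [self.MAX_RRPV] * ways   # empty ways look like ideal victims

    def access(self, block):
        """Return True on a hit, False on a miss."""
        if block in self.blocks:
            self.rrpv[self.blocks.index(block)] = 0   # promote on hit
            return True
        # Demote over time: age every block until one is predicted distant.
        while self.MAX_RRPV not in self.rrpv:
            self.rrpv = [v + 1 for v in self.rrpv]
        way = self.rrpv.index(self.MAX_RRPV)
        self.blocks[way] = block
        self.rrpv[way] = self.MAX_RRPV - 1            # insert "in the middle"
        return False


s = SRRIPSet(4)
for block in ["A", "B", "A", "B"]:
    s.access(block)                 # A and B are reused, so their RRPV drops to 0
for block in ["S0", "S1", "S2", "S3", "S4", "S5"]:
    s.access(block)                 # a scan longer than the associativity
print(s.access("A"), s.access("B"))  # True True
```

The scan blocks are inserted with a high RRPV and evict each other, while the reused blocks need several aging rounds before they become victims.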

Least Frequently Used

• Counter per block, incremented on reference
• Evictions choose lowest count
  – Logic not trivial (a^2 comparison/sort)
• Storage overhead
  – 1 bit per block: same as NRU
  – How many bits are helpful?
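The counter scheme can be sketched with saturating counters. The counter width (2 bits by default), starting new blocks at count 0, and the name `LFUSet` are assumptions of this sketch.

```python
class LFUSet:
    """One cache set that evicts the block with the lowest reference count."""

    def __init__(self, ways, counter_bits=2):
        self.ways = ways
        self.max_count = (1 << counter_bits) - 1   # counters saturate here
        self.count = {}                            # block -> reference counter

    def access(self, block):
        """Return True on a hit, False on a miss."""
        if block in self.count:
            self.count[block] = min(self.count[block] + 1, self.max_count)
            return True
        if len(self.count) >= self.ways:
            # Software stand-in for the comparison/sort logic in hardware.
            victim = min(self.count, key=self.count.get)
            del self.count[victim]
        self.count[block] = 0                      # assumed: new blocks start at 0
        return False


s = LFUSet(ways=2)
for block in ["A", "A", "B", "C"]:
    s.access(block)          # C evicts B (count 0), not A (count 1)
print(s.access("A"), s.access("B"))  # True False
```

With `counter_bits=1` the counter degenerates to a single referenced bit per block, which is the NRU-like storage cost mentioned on the slide.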

Pitfall: Cache Filtering Effect

• Upper level caches (L1, L2) hide reference stream from lower level caches
• Blocks with “no reuse” @ LLC could be very hot (never evicted from L1/L2)
• Evicting from LLC often causes L1/L2 eviction (due to inclusion)
• Could hurt performance even if LLC miss rate improves
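The filtering effect can be illustrated by counting which references ever reach the last-level cache (LLC). This sketch assumes a single fully-associative LRU L1 in front of the LLC and a synthetic trace; `llc_reference_counts` is a hypothetical helper.

```python
from collections import OrderedDict


def llc_reference_counts(trace, l1_size):
    """Count, per block, how many references miss in L1 and are therefore seen by the LLC."""
    l1 = OrderedDict()
    seen_by_llc = {}
    for block in trace:
        if block in l1:
            l1.move_to_end(block)        # L1 hit: the LLC never sees this reference
            continue
        if len(l1) >= l1_size:
            l1.popitem(last=False)
        l1[block] = True
        seen_by_llc[block] = seen_by_llc.get(block, 0) + 1
    return seen_by_llc


# Hot block "H" is referenced between every block of a stream s0..s9.
trace = []
for i in range(10):
    trace += ["H", f"s{i}"]

counts = llc_reference_counts(trace, l1_size=2)
print(trace.count("H"), counts["H"])  # 10 references in total, only 1 visible at the LLC
```

From the LLC's point of view, H looks exactly like a streaming block with no reuse, even though it is the hottest block in the trace.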

Cache Replacement Championship

• Held at ISCA 2010
• http://www.jilp.org/jwac-1
• Several variants, improvements
• Simulation infrastructure
  – Implementations for all entries

References

W. Lin et al. “Predicting last-touch references under optimal replacement.” Technical Report CSE-TR-447-02, U. of Michigan, 2002.
H. Liu et al. “Cache Bursts: A New Approach for Eliminating Dead Blocks and Increasing Cache Efficiency.” In Micro-41, 2008.
G. Loh. “Extending the Effectiveness of 3D-Stacked DRAM Caches with an Adaptive Multi-Queue Policy.” In Micro, 2009.
C.-K. Luk et al. “Pin: building customized program analysis tools with dynamic instrumentation.” In PLDI, pages 190–200, 2005.
N. Megiddo and D. S. Modha. “ARC: A self-tuning, low overhead replacement cache.” In FAST, 2003.
E. J. O’Neil et al. “The LRU-K page replacement algorithm for database disk buffering.” In Proc. ACM SIGMOD Conf., pp. 297–306, 1993.
M. Qureshi, A. Jaleel, Y. Patt, S. Steely, J. Emer. “Adaptive Insertion Policies for High Performance Caching.” In ISCA-34, 2007.
K. Rajan and G. Ramaswamy. “Emulating Optimal Replacement with a Shepherd Cache.” In Micro-40, 2007.
J. T. Robinson and M. V. Devarakonda. “Data cache management using frequency-based replacement.” In SIGMETRICS Conf., 1990.
R. Sugumar and S. Abraham. “Efficient simulation of caches under optimal replacement with applications to miss characterization.” In SIGMETRICS, 1993.
Y. Xie, G. Loh. “PIPP: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches.” In ISCA-36, 2009.
Y. Zhou and J. F. Philbin. “The multi-queue replacement algorithm for second level buffer caches.” In USENIX Annual Tech. Conf., 2001.
