High Performance Computing Lecture 28: Cache Organization and Block Placement, Slides of Computer Science

These slides cover the concepts of cache organization, block placement, and block identification in high performance computing. They cover direct mapping and set associative caching, as well as the four questions (4 Qs) of cache organization and write policies. The slides also touch upon cache replacement policies and memory hierarchy progression.

Typology: Slides

2012/2013

Uploaded on 04/28/2013

dewaan


High Performance Computing

Lecture 28

Block Placement: Direct Mapping (DM)

• Suppose that the cache is large enough to hold N blocks from main memory
  o i.e., Cache size = N blocks
• Direct Mapping: Memory block M is placed uniquely in cache block M mod N

[Figure: Cache and Main Memory]
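
The placement rule can be written as a one-line function. This is a minimal sketch: the function name `direct_mapped_slot` and the value N = 8 are assumptions chosen for illustration, not part of the slides.

```python
def direct_mapped_slot(memory_block: int, num_cache_blocks: int) -> int:
    """Return the cache block that memory block M maps to: M mod N."""
    return memory_block % num_cache_blocks


N = 8  # assumed cache size in blocks, for illustration only
for m in (0, 5, 8, 13, 21):
    print(f"memory block {m:2d} -> cache block {direct_mapped_slot(m, N)}")
```

Memory blocks 5, 13, and 21 all land in cache block 5, so only one of them can be resident at a time.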

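Accessing Block in DM Cache

[Figure: direct-mapped cache lookup. Each cache entry holds Tag, V, and D fields; an AND gate produces the Cache Hit signal, and the cache supplies the Data.]

Address fields: Tag (18 bits) | Index (9 bits) | Offset (5 bits)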
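
The field split in the figure can be sketched with shifts and masks. The helper names `split_address` and `is_hit` are hypothetical, and the hit test assumes the AND gate combines the valid bit with the tag comparison.

```python
OFFSET_BITS = 5  # selects the byte within the block
INDEX_BITS = 9   # selects the cache block
# The remaining 18 bits of a 32-bit address form the tag.


def split_address(addr: int) -> tuple[int, int, int]:
    """Split an address into (tag, index, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def is_hit(entry_valid: bool, entry_tag: int, addr: int) -> bool:
    """Assumed hit condition: valid bit AND stored tag equals address tag."""
    tag, _, _ = split_address(addr)
    return entry_valid and entry_tag == tag


tag, index, offset = split_address(0x12345678)
print(hex(tag), index, offset)  # 0x48d1 179 24
```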

Problem with Direct Mapping

• Main memory block M is uniquely mapped to cache block M mod N
• Consider a program which accesses two memory locations alternately
  o A, B, A, B, A, B, A, B…
  o Where both A and B map to the same cache block
  o Example: N = 8, A is in memory block 6 and B is in memory block 14 (6 mod 8 = 14 mod 8 = 6)
• Every access will result in a cache miss
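
The conflict can be reproduced with a small simulation of the slide's example. This is a sketch that tracks only which memory block occupies each cache block; `count_misses` is a hypothetical helper.

```python
def count_misses(block_trace, num_cache_blocks):
    """Count misses for a direct-mapped cache given a trace of memory block numbers."""
    cache = [None] * num_cache_blocks  # cache[i] = memory block currently resident
    misses = 0
    for m in block_trace:
        slot = m % num_cache_blocks
        if cache[slot] != m:
            misses += 1
            cache[slot] = m  # the new block replaces whatever was there
    return misses


trace = [6, 14] * 4            # A, B, A, B, ... with A in block 6, B in block 14
print(count_misses(trace, 8))  # 8: every access misses
```

If B were in memory block 13 instead, the two blocks would occupy different cache blocks and only the first two accesses would miss.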

Identifying Block in Set Associative

Assume 32 bit address space, 16 KB cache, 32B cache block size, 4-way set associative cache

Number of cache blocks = 16 KB / 32 B = 512
Number of Sets = Cache blocks / 4 = 512/4 = 128

Index field: to identify unique cache set
  log_2(128) = 7 bits
Offset field: to identify desired byte in cache block
  log_2(32) = 5 bits
Tag field: to identify which memory block is currently in this cache block
  remaining 32 - 7 - 5 = 20 bits

Address fields: Tag (20 bits) | Index (7 bits) | Offset (5 bits)
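
The same arithmetic can be captured in a short function. This is a sketch that assumes all sizes are powers of two; the name `address_fields` is hypothetical.

```python
def address_fields(addr_bits, cache_bytes, block_bytes, ways):
    """Return (tag_bits, index_bits, offset_bits); sizes must be powers of two."""
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    offset_bits = block_bytes.bit_length() - 1  # log2 of a power of two
    index_bits = num_sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


print(address_fields(32, 16 * 1024, 32, 4))  # (20, 7, 5)
```

With `ways=1` the same cache gives (18, 9, 5), and with `ways=2` it gives (19, 8, 5), which match the field widths shown in the direct-mapped and 2-way figures.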

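Accessing Block (2-way Set Associative)

[Figure: 2-way set associative cache lookup. Each of the two ways holds Tag, V, and D fields; an OR gate combines the per-way results into the Cache Hit signal, and the cache supplies the Data.]

Address fields: Tag (19 bits) | Index (8 bits) | Offset (5 bits)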
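
A software model of the 2-way lookup might look like the sketch below. Hardware compares both ways in parallel and ORs the results; the loop here is a sequential stand-in. The `Way` class and `lookup` function are hypothetical names.

```python
from dataclasses import dataclass

OFFSET_BITS, INDEX_BITS = 5, 8  # field widths from the figure


@dataclass
class Way:
    valid: bool = False
    dirty: bool = False
    tag: int = 0


def lookup(sets, addr):
    """Return the number of the hitting way, or None on a miss."""
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    for way_number, way in enumerate(sets[index]):
        if way.valid and way.tag == tag:  # per-way match; any match is a hit
            return way_number
    return None


sets = [[Way(), Way()] for _ in range(1 << INDEX_BITS)]
sets[3][1] = Way(valid=True, tag=0x7)   # place a block in set 3, way 1
addr = (0x7 << 13) | (3 << 5) | 12      # tag 7, index 3, offset 12
print(lookup(sets, addr))               # 1
```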

Block Replacement

• Under direct mapped placement
  o No choice is possible: unique placement
• Under set associative placement
  o One of the blocks within the unique set must be selected for replacement
  o First-In-First-Out (FIFO)
  o Least Recently Used (LRU)
  o Random
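
The three policies differ only in how the victim is chosen. The sketch below models a single cache set; `simulate_set` and the sample trace are assumptions made for illustration.

```python
import random


def simulate_set(trace, ways, policy, seed=0):
    """Count misses for one cache set under 'fifo', 'lru', or 'random'."""
    rng = random.Random(seed)
    resident = []  # front of the list is the next victim for FIFO and LRU
    misses = 0
    for block in trace:
        if block in resident:
            if policy == "lru":
                # A hit makes the block most recently used; FIFO ignores hits.
                resident.remove(block)
                resident.append(block)
            continue
        misses += 1
        if len(resident) == ways:
            victim = rng.randrange(ways) if policy == "random" else 0
            resident.pop(victim)
        resident.append(block)
    return misses


trace = [1, 2, 1, 3, 1, 4, 1, 5]
print(simulate_set(trace, 2, "lru"))   # 5
print(simulate_set(trace, 2, "fifo"))  # 6
```

On this trace LRU keeps the frequently used block 1 resident, while FIFO evicts it once because it was the oldest arrival.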

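LRU Block Replacement…

[Figure: cache lookup with two ways and LRU state. Each way holds Tag, V, D, and L fields; an OR gate produces the Cache Hit signal, and the cache supplies the Data. The figure also carries a "4 bits" label.]

Address fields: Tag (19 bits) | Index (8 bits) | Offset (5 bits)

• Hardware must keep track of LRU information
  o Within the cache directory
  o Recall that LRU replacement was not considered feasible for Virtual Memory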

Write Policies

4A: When is Main Memory Updated on Write Hit?

• Write through: Writes are performed both in cache and in Main Memory
  + Cache and memory copies are kept consistent
  -- Multiple writes to the same location/block cause higher memory traffic
  -- Writes must wait for longer time (memory write)
  Solution: Use a Write Buffer to hold these write requests and allow processor to proceed immediately

Write Policies…

• Write back: writes are performed only on cache
  o Modified blocks are written back to memory only on replacement from the cache
  o Need for dirty bit for each cache block
  + Writes will happen faster than with write through
  + Reduced traffic to memory
  -- Cache & main memory copies are not always the same
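
The difference in memory traffic between the two write policies can be illustrated by counting memory writes. This is a simplified sketch: it assumes every written block is already in the cache, and `count_memory_writes` and its event format are hypothetical.

```python
def count_memory_writes(events, policy):
    """events: list of ('write', block) or ('evict', block) for cached blocks."""
    if policy not in ("write-through", "write-back"):
        raise ValueError(policy)
    dirty = set()  # blocks whose dirty bit is set (write-back only)
    memory_writes = 0
    for kind, block in events:
        if kind == "write":
            if policy == "write-through":
                memory_writes += 1   # every write also goes to memory
            else:
                dirty.add(block)     # write-back: only mark the block dirty
        elif kind == "evict" and block in dirty:
            memory_writes += 1       # dirty block is written back once
            dirty.discard(block)
    return memory_writes


events = [("write", 7)] * 10 + [("evict", 7)]
print(count_memory_writes(events, "write-through"))  # 10
print(count_memory_writes(events, "write-back"))     # 1
```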

Putting it all together

• The computer you are using to run your programs contains cache memory
• Cache organization and operation are described by the terms we have seen
• Example: 32 KB 2-way set associative write-back write-allocate LRU replacement cache with block size of 32B
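
The example configuration can be modelled in a few lines. This is a rough sketch of a cache with those properties, not a description of any real processor; the `Cache` class and its counters are hypothetical.

```python
class Cache:
    """2-way set associative, write-back, write-allocate, LRU cache model."""

    def __init__(self, size_bytes=32 * 1024, block_bytes=32, ways=2):
        self.block_bytes = block_bytes
        self.ways = ways
        self.num_sets = size_bytes // block_bytes // ways
        # Each set is a list of [tag, dirty] pairs ordered from LRU to MRU.
        self.sets = [[] for _ in range(self.num_sets)]
        self.hits = self.misses = self.writebacks = 0

    def access(self, addr, is_write=False):
        """Simulate one access; return True on a hit."""
        block = addr // self.block_bytes
        index = block % self.num_sets
        tag = block // self.num_sets
        lines = self.sets[index]
        for line in lines:
            if line[0] == tag:                # hit
                self.hits += 1
                lines.remove(line)
                line[1] = line[1] or is_write  # a write sets the dirty bit
                lines.append(line)             # now most recently used
                return True
        self.misses += 1                       # allocate on read or write miss
        if len(lines) == self.ways:
            victim = lines.pop(0)              # least recently used block
            if victim[1]:
                self.writebacks += 1           # dirty victim goes to memory
        lines.append([tag, is_write])
        return False


cache = Cache()
A, B, C = 0, 16384, 32768  # three addresses that map to set 0
cache.access(A, is_write=True)
for addr in (B, A, C, B):
    cache.access(addr)
print(cache.hits, cache.misses, cache.writebacks)  # 1 4 1
```

The single write-back occurs when the dirty block A is finally evicted, which is the behaviour the write-back policy describes.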

What Drives Computer Development?

[Figure: Performance vs. Year for CPU and DRAM, labelled “Moore’s Law”]

• Processor: 60%/yr. (2X/1.5yr)
• Memory: 9%/yr. (2X/ yrs)
• Processor-Memory Performance Gap: (grows 50% / year)
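
The gap figure follows from the two growth rates. As a quick check using only the percentages quoted on the slide, the ratio of the two rates gives roughly 47% per year, which the slide rounds to 50%.

```python
from math import log

cpu_growth = 1.60   # processor: 60% per year (from the slide)
dram_growth = 1.09  # memory: 9% per year (from the slide)

gap_growth = cpu_growth / dram_growth
cpu_doubling_years = log(2) / log(cpu_growth)

print(f"gap grows by {100 * (gap_growth - 1):.0f}% per year")      # 47%
print(f"processor doubles every {cpu_doubling_years:.1f} years")  # 1.5
```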