CS 213 F’98: Understanding Cache Design and Memory Hierarchy

These slides cover memory hierarchy, locality of reference, and cache design: levels in the memory hierarchy, cache organization, caching principles, and cache implementation. They include examples and simulations to help students understand the concepts.

Topics

  • Memory Hierarchy
  • Locality of Reference
  • Cache Design
    • Direct Mapped
    • Associative

Caches

Oct. 22, 1998

15-213

class18.ppt

Computer System

[Figure: Processor with Cache, connected over the Memory-I/O bus to Memory and to three I/O controllers (two Disks, Display, Network); the I/O controllers signal the processor by interrupt]

Alpha 21164 Chip Photo

Microprocessor Report 9/12/

Caches:

  • L1 data
  • L1 instruction
  • L2 unified
  • TLB
  • Branch history

Alpha 21164 Chip Caches

Caches:

  • L1 data
  • L1 instruction
  • L2 unified
  • TLB
  • Branch history

[Figure: chip photo annotated with cache regions. Labels: Right Half L, Right Half L, L Instr., L Data, L Tags, L3 Control]

Caching: The Basic Idea

Main Memory

  • Stores words A–Z in example

Cache

  • Stores a subset of the words (4 in example)
  • Organized in blocks
    • Multiple words
    • To exploit spatial locality

Access

  • Word must be in cache for processor to access

[Figure: Processor ↔ Small, Fast Cache (A B G H) ↔ Big, Slow Memory (A B C • • • Y Z)]

Basic Idea (Cont.)

Maintaining Cache

  • Every time processor performs load or store, bring block containing word into cache
    • May need to evict existing block
  • Subsequent loads or stores to any word in block performed within cache

Example: Cache holds 2 blocks, each with 2 words

  • Initial: cache holds A B and G H
  • Read C: “Cache miss”
    • Load block C+D into cache
    • Cache now holds A B and C D
  • Read D: “Cache hit”
    • Word already in cache
    • Cache still holds A B and C D
  • Read Z
    • Load block Y+Z into cache
    • Evict oldest entry
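The miss/hit/evict sequence above can be sketched in a few lines of Python. This is only an illustrative model, not the hardware mechanism: the helper names `block_of` and `access` are made up for this sketch, "evict oldest entry" is modeled as a FIFO queue, and the initial load order (G+H before A+B) is an assumption chosen so that reading C replaces G+H.

```python
from collections import deque

WORDS_PER_BLOCK = 2   # each block holds 2 words
CAPACITY = 2          # cache holds 2 blocks

def block_of(word):
    """Return the block containing `word`, e.g. 'C' -> 'C+D' (hypothetical helper)."""
    first = (ord(word) - ord("A")) // WORDS_PER_BLOCK * WORDS_PER_BLOCK
    return "+".join(chr(ord("A") + first + k) for k in range(WORDS_PER_BLOCK))

def access(cache, word):
    """Read `word`; load its block on a miss, evicting the oldest block if full."""
    blk = block_of(word)
    if blk in cache:
        return "hit"
    if len(cache) == CAPACITY:
        cache.popleft()          # evict oldest entry (FIFO assumption)
    cache.append(blk)
    return "miss"

# Assumed initial state: G+H was loaded before A+B.
cache = deque(["G+H", "A+B"])
results = [access(cache, w) for w in "CDZ"]
print(results)   # ['miss', 'hit', 'miss']
```

Under the FIFO assumption, the read of Z evicts A+B, since it is the oldest block remaining after C+D replaced G+H.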

Design Issues for Caches

Key Questions

  • Where should a block be placed in the cache? (block placement)
  • How is a block found in the cache? (block identification)
  • Which block should be replaced on a miss? (block replacement)
  • What happens on a write? (write strategy)

Constraints

  • Design must be very simple
    • Hardware realization
    • All decision making within nanosecond time scale
  • Want to optimize performance for “typical” programs
    • Do extensive benchmarking and simulations
    • Many subtle engineering trade-offs

Direct-Mapped Caches

Simplest Design

  • Given memory block has unique cache location

Parameters

  • Block size $B = 2^b$
    • Number of bytes in each block
    • Typically 2X–8X word size
  • Number of Sets $S = 2^s$
    • Number of blocks cache can hold
  • Total Cache Size $= B \cdot S = 2^{b+s}$
Physical Address

  • Address used to reference main memory
  • n bits to reference $N = 2^n$ total bytes
  • Partition into fields
    • Offset: Lower b bits indicate which byte within block
    • Set: Next s bits indicate how to locate block within cache
    • Tag: Remaining t = n − s − b high-order bits identify this block when in cache

n-bit Physical Address: | tag (t bits) | set index (s bits) | offset (b bits) |
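The field partition can be illustrated with shifts and masks. The function name `split_address` is hypothetical; the sketch assumes the address is a non-negative integer and that `s` and `b` are the set-index and offset widths defined above.

```python
def split_address(addr, s, b):
    """Split a physical address into (tag, set_index, offset) fields."""
    offset = addr & ((1 << b) - 1)             # lower b bits
    set_index = (addr >> b) & ((1 << s) - 1)   # next s bits
    tag = addr >> (s + b)                      # remaining high-order bits
    return tag, set_index, offset

# 4-bit address 1101 with s=2, b=1: tag=1, set=10 (2), offset=1
fields = split_address(0b1101, s=2, b=1)
print(fields)   # (1, 2, 1)
```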

Direct-Mapped Cache Tag Matching

Identifying Block

  • Must have tag match high order bits of address
  • Must have Valid = 1
  • Lower bits of address select byte or word within cache block

Selected Set: | Tag | Valid | 0 | 1 | • • • | B–1 |

Physical Address: | tag (t bits) | set index (s bits) | offset (b bits) |
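As a rough software analogue of the tag-matching rules, the sketch below reports a hit only when the selected line is valid and its tag equals the address tag. The `Line` class and `lookup` function are invented for illustration; real hardware performs the comparison in parallel logic rather than sequential code.

```python
from dataclasses import dataclass

@dataclass
class Line:
    """One cache line: valid bit, tag, and the block's bytes (illustrative only)."""
    valid: bool = False
    tag: int = 0
    data: bytes = b""

def lookup(sets, addr, s, b):
    """Return the addressed byte on a hit, or None on a miss."""
    offset = addr & ((1 << b) - 1)
    index = (addr >> b) & ((1 << s) - 1)
    tag = addr >> (s + b)
    line = sets[index]                      # set index selects exactly one line
    if line.valid and line.tag == tag:      # both conditions are required
        return line.data[offset]            # lower bits select the byte
    return None

# Four sets, 2-byte blocks; only set 2 holds a valid block with tag 1.
sets = [Line() for _ in range(4)]
sets[2] = Line(valid=True, tag=1, data=bytes([0xAA, 0xBB]))

hit_value = lookup(sets, 0b1101, s=2, b=1)      # tag 1, set 2, offset 1 -> 0xBB
tag_mismatch = lookup(sets, 0b0101, s=2, b=1)   # tag 0, set 2 -> miss
invalid_line = lookup(sets, 0b0000, s=2, b=1)   # set 0 is not valid -> miss
```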

Direct Mapped Cache Simulation

N=16 byte addresses (0000–1111), B=2 bytes/block, S=4 sets, E=1 entry/set

Address fields: t=1, s=2, b=1 (x | xx | x)

Address trace (reads): 0 [0000], 1 [0001], 13 [1101], 8 [1000], 0 [0000]

Cache line filled on each miss (v, tag, data):

  • 0 [0000] (miss): set 0 holds v=1, tag=0, data m[1] m[0]
  • 13 [1101] (miss): set 2 holds v=1, tag=1, data m[13] m[12]
  • 8 [1000] (miss): set 0 holds v=1, tag=1, data m[9] m[8]
  • 0 [0000] (miss): set 0 holds v=1, tag=0, data m[1] m[0]; set 2 still holds v=1, tag=1, data m[13] m[12]
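The trace can be replayed with a minimal direct-mapped simulator. This is a sketch under the slide's parameters (s=2 set-index bits, b=1 offset bit); the function name `simulate` is made up here, and only tags are tracked, not data.

```python
def simulate(trace, s=2, b=1):
    """Replay read addresses through a direct-mapped cache; return (addr, result) pairs."""
    lines = [None] * (1 << s)        # tag stored in each set; None means invalid
    results = []
    for addr in trace:
        index = (addr >> b) & ((1 << s) - 1)
        tag = addr >> (s + b)
        if lines[index] == tag:
            results.append((addr, "hit"))
        else:
            lines[index] = tag       # load block, replacing whatever was there
            results.append((addr, "miss"))
    return results

results = simulate([0, 1, 13, 8, 0])
for addr, outcome in results:
    print(f"{addr:2d} [{addr:04b}] {outcome}")
```

Address 1 hits because it shares a block with address 0. The final read of 0 misses because address 8 maps to the same set with a different tag and evicted it in between.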

Direct Mapped Cache Implementation

(DECStation 3100)

  • 32-bit address: tag = bits 31–16, set = bits 15–2, byte offset = bits 1–0
  • 16,384 sets, each holding: valid, tag (16 bits), data (32 bits)
  • Outputs: data, hit
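The DECStation 3100 field widths can be checked against the general parameters. The short derivation below assumes a 32-bit address and one 32-bit (4-byte) word per block, as the data width suggests.

```latex
\begin{align*}
S &= 16{,}384 = 2^{14} &&\Rightarrow\ s = 14 \quad \text{(address bits 15--2)} \\
B &= 4 \text{ bytes} = 2^{2} &&\Rightarrow\ b = 2 \quad \text{(address bits 1--0)} \\
t &= n - s - b = 32 - 14 - 2 = 16 &&\text{(address bits 31--16)} \\
B \cdot S &= 2^{b+s} = 2^{16} \text{ bytes} = 64 \text{ KB of data}
\end{align*}
```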

Properties of Direct Mapped Caches

Strength

  • Minimal control hardware overhead
  • Simple design
  • (Relatively) easy to make fast

Weakness

  • Vulnerable to thrashing
    • Two heavily used blocks have same cache index
    • Repeatedly evict one to make room for other

Thrashing Example

  • Access one element from each array per iteration

[Figure: arrays x[0] … x[1023] and y[0] … y[1023], four consecutive elements per cache block: x[0]–x[3], …, x[1020]–x[1023] and y[0]–y[3], …, y[1020]–y[1023]]

Thrashing Example: Good Case

Access Sequence

  • Read x[0]
    • x[0], x[1], x[2], x[3] loaded
  • Read y[0]
    • y[0], y[1], y[2], y[3] loaded
  • Read x[1]
    • Hit
  • Read y[1]
    • Hit
  • • • •
  • 2 misses / 8 reads

Analysis

  • x[i] and y[i] map to different cache blocks
  • Miss rate = 25%
    • Two memory accesses / iteration
    • On every 4th iteration have two misses

Timing

  • 10 cycle loop time
  • 28 cycles / cache miss
  • Average time / iteration = 10 + 0.25 * 2 * 28 = 24 cycles
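The 25% miss rate and the timing figure can be reproduced with a small simulation. Everything about the memory layout here is an assumption made for the sketch: 4-byte elements, 16-byte blocks (four elements per block), a 512-set direct-mapped cache, and the base addresses chosen for `x` and `y`. The function name `miss_rate` is likewise hypothetical. The second call places `y` so that it collides with `x` in every set, which illustrates the thrashing weakness of direct-mapped caches.

```python
def miss_rate(x_base, y_base, n=1024, elem_size=4, block_size=16, num_sets=512):
    """Miss rate when reading x[i] then y[i] for i in range(n), direct-mapped cache."""
    tags = [None] * num_sets             # None marks an invalid line
    misses = 0
    for i in range(n):
        for base in (x_base, y_base):    # one element from each array per iteration
            block = (base + i * elem_size) // block_size
            index, tag = block % num_sets, block // num_sets
            if tags[index] != tag:
                misses += 1
                tags[index] = tag        # load block, evicting the previous one
    return misses / (2 * n)

# Good case (assumed layout): y follows x directly, so x[i] and y[i] use different sets.
good = miss_rate(x_base=0, y_base=4096)

# Bad case (assumed layout): y starts exactly one cache size after x,
# so x[i] and y[i] share a set and keep evicting each other.
bad = miss_rate(x_base=0, y_base=8192)

avg_cycles = 10 + good * 2 * 28          # 10 cycle loop, 2 accesses, 28 cycles per miss
print(good, bad, avg_cycles)             # 0.25 1.0 24.0
```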