Understanding Cache Memory: Direct Mapped, Set Associative, and Multi-level Caches - Prof., Study notes of Computer Science

An in-depth exploration of cache memory, its purpose, and various cache mapping schemes such as direct mapped, set associative, and multi-level caches. It covers topics like temporal and spatial locality, cache layout, cache size calculation, and cache write policies. The document also discusses the advantages and disadvantages of each cache mapping scheme.

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010

koofers-user-xvw
Memory Hierarchy
Cache Organization
CIT 595
Spring 2008

Memory
- Von Neumann model: stored-program concept
- Memory is a $2^k \times m$ array of stored bits
  - Address: unique k-bit identifier of a location
  - Contents: m-bit value stored in that location
- Memory access time affects CPU performance
  - There is a bound on how fast we can access data from memory
  - This latency inherently slows down the overall processing speed of the processor

Kinds of Memory
Volatile
- Once the power is off, the information is lost
- RAM (Random Access Memory)
  - Access time is the same for all locations, hence "random access"
  - Memory can be read and written
  - Holds the instructions and/or data while your programs execute
Non-Volatile
- E.g. magnetic disk, ROMs, Flash RAM

Kinds of RAM: Type I
SRAM (Static RAM)
- The memory we studied in chapter 3
  - SR flip-flop, D flip-flop
- 1 bit of information (a memory cell) needs cross-coupled gates
  - Consists of 8 transistors per cell (1 NAND/NOR gate requires 4 transistors)
  - Can be optimized to use 6 transistors per cell
Kinds of RAM: Type II
DRAM (Dynamic RAM)
- One memory cell consists of one capacitor and one transistor
  - The capacitor is used to store charge
  - The transistor acts as a switch that allows data to be read or written
- DRAM access is slow
  - The charge on the capacitor needs to be sensed to tell a 0 from a 1
  - Capacitors slowly leak their charge over time and hence must be refreshed every few milliseconds to prevent data loss

DRAM vs. SRAM Technology
- DRAM is denser
  - Stores more bits per unit of surface area
  - It costs about the same to get 4 MB of SRAM as 1 GB of DRAM
  - DRAM is ~250x cheaper than SRAM per unit of capacity
- SRAM has a faster access time
  - SRAM access time is 3 ns to 10 ns
  - DRAM access time is 30 ns to 90 ns
  - DRAM is ~10x slower than SRAM

Performance/Cost/Capacity
- In general
  - Slow memory is cheap and has more storage capacity
  - Fast memory is expensive and has less storage capacity
- Ideal goals
  - Memory that operates at processor speed
    - i.e. the time it takes to compute a basic operation
    - We don't want memory access time to dominate the clock cycle time or add to CPI
  - Memory as large as needed for all running programs
  - Memory that is cost effective
- So how do we get the best of everything?
  - Use a memory hierarchy

Memory Hierarchy
- To provide the best performance at the lowest cost, memory is organized in a hierarchical fashion
- Small, fast storage elements are near the CPU
- Larger, (almost) permanent storage in the form of disk and media storage is still further from the CPU
- Larger, slower memory is accessed through the data bus
- Each level of memory keeps a subset of the data contained in the level below it

Basic Cache Organization
- Memory is divided into blocks
  - Each block contains a fixed number of words
  - Word = size of the data stored in one location, e.g. 8 bits, 16 bits, etc.
- One block is the minimum unit of transfer between main memory and cache
  - Hence, each location in the cache stores 1 block
  - It also stores some extra information, covered later

[Figure: main memory divided into blocks (Block 0, Block 1) of words (Word 0 to Word 3), with a block of words held in the cache]

Cache Mapping Scheme
- The main memory address generated by the processor cannot be used directly to access the cache
- Hence a mapping scheme is required that converts the generated main memory address into a cache location
- The mapping scheme also determines where a block will be placed when it is first copied into the cache

Address Conversion to Cache Location
- Address conversion is done by giving special significance to the bits of the main memory address
- The address is split into distinct groups called fields
  - Just as instruction decoding is done based on certain bit fields
- The fields are a way to find:
  - Which cache location?
  - Which word in the block?
  - Whether it is the data we are looking for (some kind of unique identifier)

Mapping Scheme 1: Direct Mapped Cache
- In a direct mapped cache consisting of N blocks of cache (i.e. N locations), block X of main memory maps to cache block Y = X mod N
- E.g. if we have 10 blocks of cache, block 7 of cache may hold blocks 7, 17, 27, 37, ... of main memory
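The mod rule can be checked with a few lines of Python. This is only a sketch; the function name `cache_block_for` is made up for illustration and is not part of the notes.

```python
def cache_block_for(memory_block, num_cache_blocks):
    """Direct mapped placement: memory block X goes to cache block X mod N."""
    return memory_block % num_cache_blocks

# With N = 10 cache blocks, memory blocks 7, 17, 27 and 37 all compete
# for cache block 7.
for x in (7, 17, 27, 37):
    print(x, "->", cache_block_for(x, 10))
```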

Direct Mapped Scheme: Address Conversion

n-bit main memory address: | Tag | Block | Word |

- Word = which word in the block?
- Block = which location in the cache?
- Tag = unique identifier with respect to one block
- Note: the tag is used to distinguish whether main memory block 7 or block 17 is stored in cache block 7

Cache Layout
[Figure: cache drawn as a table with columns Block No., Tag and Data, and rows 0 to 3. E.g. a cache with 4 blocks and 8 words per block]

Example of Direct Mapped Scheme
- Suppose our memory consists of $2^{14}$ words, the cache has $16 = 2^4$ blocks, and each block holds 8 words
  - Thus main memory is divided into $2^{14} / 2^3 = 2^{11}$ blocks
- Of the 14-bit address, we need 4 bits for the block field and 3 bits for the word field; the tag is what's left over, i.e. 14 - 4 - 3 = 7 bits
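As a rough illustration of the field split for this 14-bit example, the sketch below pulls the tag, block and word fields out of an address with shifts and masks. The constant and function names are assumptions chosen for the example, and the address 0x3AB is treated as a hexadecimal value.

```python
ADDRESS_BITS = 14
WORD_BITS = 3    # 8 words per block
BLOCK_BITS = 4   # 16 cache blocks
TAG_BITS = ADDRESS_BITS - BLOCK_BITS - WORD_BITS  # what's left over

def split_address(address):
    """Return (tag, block, word) for a 14-bit main memory address."""
    word = address & ((1 << WORD_BITS) - 1)                   # lowest 3 bits
    block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)  # middle 4 bits
    tag = address >> (WORD_BITS + BLOCK_BITS)                 # highest 7 bits
    return tag, block, word

tag, block, word = split_address(0x3AB)
print(f"tag={tag:07b} block={block:04b} word={word:03b}")
```

For 0x3AB the block field comes out as 5, so that address lives in cache block 5.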

Direct Mapped Cache with 16 blocks
[Figure: cache table with columns Block No., Tag and Data, and rows 0 to 15]

Direct Mapped Cache with address 3AB
[Figure: the same 16-block cache table, showing where address 3AB maps]

Disadvantage of Direct Mapped Cache
- Suppose a program generates a series of memory references such as: 1AB, 3AB, 1AB, 3AB
  - In the 16-block example, both addresses map to cache block 5 but have different tags
  - The cache will continually evict and replace blocks
  - This is known as thrashing
- The theoretical advantage offered by the cache is lost in this extreme case
- Other cache mapping schemes are designed to prevent this kind of thrashing
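A small simulation makes the thrashing visible. This is a sketch under the assumptions of the earlier example (14-bit addresses, 16 cache blocks, 8 words per block); it tracks tags only, and the name `count_misses` is hypothetical.

```python
WORD_BITS, BLOCK_BITS = 3, 4  # 8 words per block, 16 cache blocks

def count_misses(addresses):
    """Replay addresses through a direct mapped cache and count the misses."""
    stored_tag = [None] * (1 << BLOCK_BITS)  # None marks an empty cache block
    misses = 0
    for address in addresses:
        block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
        tag = address >> (WORD_BITS + BLOCK_BITS)
        if stored_tag[block] != tag:
            misses += 1
            stored_tag[block] = tag  # evict whatever block was there before
    return misses

# 1AB and 3AB share a cache block but not a tag, so they keep evicting each other.
print(count_misses([0x1AB, 0x3AB, 0x1AB, 0x3AB]))
# 1AB and 1AC sit in the same memory block, so only the first reference misses.
print(count_misses([0x1AB, 0x1AC, 0x1AB, 0x1AC]))
```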

Calculating Cache Size
- Whenever cache size is mentioned, it is stated as the capacity of data that the cache holds
  - Tag storage is considered overhead
- Suppose our memory consists of $2^{14}$ locations (or words), the cache has $16 = 2^4$ blocks, and each block holds 8 words
- Cache size = # of blocks * block size
  - There are 16 locations in the cache -> # of blocks
  - Each block stores 8 words
  - Assume 1 word is 8 bits; then block size = 8 bytes
  - Cache size = 16 x 8 bytes = 128 bytes
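The same arithmetic as a short Python sketch. The helper names are illustrative, and the 7-bit tag is an assumption carried over from the 14-bit direct mapped example (14 - 4 - 3 = 7).

```python
def cache_data_bytes(num_blocks, words_per_block, bits_per_word):
    """Cache size as usually quoted: data capacity only, in bytes."""
    return num_blocks * words_per_block * bits_per_word // 8

def tag_overhead_bits(num_blocks, tag_bits):
    """Extra storage for the tags, which is not counted in the cache size."""
    return num_blocks * tag_bits

print(cache_data_bytes(16, 8, 8), "bytes of data")
print(tag_overhead_bits(16, 7), "bits of tag overhead")
```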

Address Breakup

n-bit main memory address: | Tag | Block | Word |

- Why is the address broken up in this particular manner?
  - There is less variation in the higher order bits than in the middle order bits
  - If the higher order bits (i.e. the bits used for the tag) were used to determine the cache location (block), then blocks at consecutive addresses would map to the same location in cache
  - The middle bits are preferred as they cause less thrashing
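To see why the middle bits are the better choice, the sketch below compares the two options on four consecutive memory blocks. It assumes the 14-bit example (3 word bits, 4 block bits, 7 tag bits); `block_from_high_bits` is a hypothetical alternative, not a scheme from the notes.

```python
WORD_BITS, BLOCK_BITS, TAG_BITS = 3, 4, 7  # 14-bit address from the example

def block_from_middle_bits(address):
    """The scheme in the notes: the cache block comes from the middle bits."""
    return (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)

def block_from_high_bits(address):
    """Hypothetical alternative: the cache block comes from the top 4 bits."""
    return address >> (WORD_BITS + TAG_BITS)

# Start addresses of four consecutive 8-word memory blocks.
addresses = [0x000, 0x008, 0x010, 0x018]
print([block_from_middle_bits(a) for a in addresses])  # spread over 4 cache blocks
print([block_from_high_bits(a) for a in addresses])    # all land in one cache block
```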

Valid Cache Block
- How do we know whether the block in cache is valid or not?
- For example:
  - When the processor has just started up, the cache is empty and the tag fields in each location are meaningless
  - Thus the tag fields must be ignored initially, while the cache is starting to fill up
- For validity, another bit called the valid bit is added to the cache to indicate whether the block contains valid information
  - 0 = not valid, 1 = valid
  - All blocks are not valid at start-up
  - When data from main memory is brought into a particular cache block, the valid bit for that block is set
  - The valid bit counts as overhead

Direct Mapped Cache with Valid (V) Field
[Figure: 16-block cache table with columns Block No., V, Tag and Data. Address 3AB is referenced for the first time, so the entire block is brought into cache block 5 and its valid bit is set to 1; the other valid bits stay 0]

Hit or Miss in the Cache
- A hit means that we actually found the data in the cache
  - A hit occurs when the valid bit = 1 AND the tag in the cache matches the tag field of the address
- If either condition fails, we did not find the data in the cache
  - This is known as a miss
  - On a miss, the data is brought from main memory into the cache, and the valid bit is set
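The hit test can be sketched as follows, assuming the 16-block, 8-words-per-block direct mapped cache used earlier. `CacheLine` and `access` are illustrative names, and the data itself is left out so that only the valid bit and tag logic remain.

```python
from dataclasses import dataclass

WORD_BITS, BLOCK_BITS = 3, 4  # 8 words per block, 16 cache blocks

@dataclass
class CacheLine:
    valid: bool = False  # all lines start out not valid
    tag: int = 0         # meaningless until the valid bit is set

def access(cache, address):
    """Return True on a hit; on a miss, fill the line and set its valid bit."""
    block = (address >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = address >> (WORD_BITS + BLOCK_BITS)
    line = cache[block]
    if line.valid and line.tag == tag:
        return True
    line.valid, line.tag = True, tag  # block brought in from main memory
    return False

cache = [CacheLine() for _ in range(1 << BLOCK_BITS)]
print(access(cache, 0x3AB))  # first reference: miss
print(access(cache, 0x3AB))  # same block again: hit
```

Without the valid bit, a first reference to address 0 would look like a hit, because an empty line's tag field happens to hold 0.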

Mapping Scheme 2: Fully Associative Cache
- Instead of placing memory blocks in specific cache locations based on the memory address, we could allow a block to go anywhere in the cache
  - This way, the cache would have to fill up before any blocks are evicted
  - This is how a fully associative cache works
- A memory address is partitioned into only two fields: the tag and the word

Scheme 3: Address Conversion
- Like a direct mapped cache, except that the middle bits of the main memory address indicate the set in the cache

K-Way Set Associative Cache Example
- Suppose we have a main memory of $2^{14}$ locations
- Map this memory to a 2-way set associative cache having 16 blocks, where each block contains 8 words
- Number of sets = number of blocks in cache / blocks per set (K)
  - Since this is a 2-way cache, each set consists of 2 blocks, and there are 16 / 2 = 8 sets
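Working the numbers through in Python gives the field widths for this example. The sketch assumes every size is a power of two; the names are illustrative, and the 3/3/8 split is derived from the figures above rather than quoted from the notes.

```python
ADDRESS_BITS = 14
NUM_BLOCKS, WAYS, WORDS_PER_BLOCK = 16, 2, 8

NUM_SETS = NUM_BLOCKS // WAYS                   # 16 / 2 = 8 sets
WORD_BITS = WORDS_PER_BLOCK.bit_length() - 1    # log2(8) = 3
SET_BITS = NUM_SETS.bit_length() - 1            # log2(8) = 3
TAG_BITS = ADDRESS_BITS - SET_BITS - WORD_BITS  # what's left over

def split_address(address):
    """Return (tag, set_index, word) for the 2-way set associative example."""
    word = address & (WORDS_PER_BLOCK - 1)
    set_index = (address >> WORD_BITS) & (NUM_SETS - 1)
    tag = address >> (WORD_BITS + SET_BITS)
    return tag, set_index, word

print(NUM_SETS, SET_BITS, TAG_BITS)
print(split_address(0x1AB), split_address(0x3AB))
```

Addresses 1AB and 3AB land in the same set with different tags, and since the set has two slots they can both stay in the cache.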

Advantages & Disadvantages of Set Associative Cache
- Advantage
  - Unlike a direct mapped cache, there is less thrashing
    - If an address maps to a set, there is a choice of where to place the new block and which old block to evict
- Disadvantages
  - The tags of all blocks in a set need to be matched (in parallel) to figure out whether the data is present in the cache
    - The cost of matching is less than for fully associative but more than for direct mapped, i.e. k comparators
    - Contributes to access time
  - If all slots in the set are filled (both slots in a 2-way cache), we need an algorithm that decides which old block to evict (as in a fully associative cache)
    - Adds to design complexity

Replacement Algorithm/Policy
Optimal goal
- Keep the blocks that will be required in the near future
- Replace the block that will not be used for the longest period of time
Least recently used (LRU)
- Evicts the block that has been unused for the longest period of time
- Disadvantage: complexity
  - LRU has to maintain an access history for each block, which slows down the cache
  - Usually some approximation is used
    - E.g. Not Most Recently Used (NMRU)
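A software sketch of LRU for a single cache set, using Python's `collections.OrderedDict` to keep the access history. Real hardware uses a few status bits per set instead, so this only illustrates the policy; `LRUSet` is a made-up name and only tags are tracked.

```python
from collections import OrderedDict

class LRUSet:
    """One cache set with `ways` slots and LRU replacement (tags only)."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = OrderedDict()  # least recent use first, most recent last

    def access(self, tag):
        """Return True on a hit, False on a miss (after installing the block)."""
        if tag in self.tags:
            self.tags.move_to_end(tag)     # record this use as the most recent
            return True
        if len(self.tags) == self.ways:
            self.tags.popitem(last=False)  # evict the least recently used tag
        self.tags[tag] = None
        return False

s = LRUSet(ways=2)
hits = [s.access(t) for t in (6, 14, 6, 9, 6, 14)]
print(hits)
```

In this trace, tag 9 replaces tag 14 rather than tag 6, because tag 6 was used more recently.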

Replacement Algorithm/Policy (contd.)
First-in, first-out (FIFO)
- In FIFO, the block that has been in the cache the longest is evicted, regardless of when it was last used
- Easy to implement compared to LRU
- Does not always match temporal locality
Random replacement
- Picks a block at random and replaces it with the new block
- Can evict a block that will be needed often or needed soon, but it never thrashes
- It is difficult to implement truly random replacement
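FIFO needs only the arrival order, which a queue captures. The sketch below models one cache set with Python's `collections.deque`; `FIFOSet` is a hypothetical name and only tags are tracked.

```python
from collections import deque

class FIFOSet:
    """One cache set with `ways` slots and FIFO replacement (tags only)."""

    def __init__(self, ways):
        self.ways = ways
        self.tags = deque()  # arrival order, oldest first

    def access(self, tag):
        """Return True on a hit, False on a miss (after installing the block)."""
        if tag in self.tags:
            return True          # a hit does not change the arrival order
        if len(self.tags) == self.ways:
            self.tags.popleft()  # evict the block that arrived first
        self.tags.append(tag)
        return False

s = FIFOSet(ways=2)
hits = [s.access(t) for t in (6, 14, 6, 9, 6, 14)]
print(hits)
```

Here tag 6 is evicted when tag 9 arrives even though it was used just before, which is the sense in which FIFO does not always match temporal locality.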

What about blocks that have been written to?
- While your program is running, it will modify some locations
- We need to keep main memory and cache consistent if we are modifying data
- There are two choices:
  - Update cache and memory, both at the same time
  - Update cache first, and then memory at a later time
- These two choices are known as cache write policies

Cache Write Policies
Write-through
- Update cache and main memory simultaneously on every write
- Advantage
  - Keeps cache and main memory consistent at all times
- Disadvantage
  - All writes require a main memory access (bus transaction)
  - Slows down the system
  - This is what we were trying to avoid in the first place when we decided to introduce the cache

Cache Write Policies (contd.)
Write-back or copy-back
- Modified data is written back to main memory only when the block is about to be evicted (removed) from the cache
- Advantage
  - Faster than write-through, since time is not spent accessing main memory on every write
- Disadvantage
  - Needs an extra bit in the cache to indicate which blocks have been modified
  - Like the valid bit, another bit called the dirty bit is introduced to indicate a modified cache block
    - 0 = not dirty, 1 = dirty (modified)
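The difference between the two policies shows up when counting main memory writes. The sketch below models a single cache block and counts bus write transactions only, ignoring that a write-back moves a whole block; the class and function names are assumptions made for the example.

```python
class OneBlockCache:
    """A single cache block, just enough to compare the two write policies."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.dirty = False
        self.memory_writes = 0  # main memory (bus) write transactions so far

    def write(self):
        if self.write_back:
            self.dirty = True        # defer the memory update, mark the block
        else:
            self.memory_writes += 1  # write-through: update memory right away

    def evict(self):
        if self.write_back and self.dirty:
            self.memory_writes += 1  # copy the modified block back once
            self.dirty = False

def memory_writes_for(write_back, num_writes):
    """Write to the block num_writes times, evict it, and count memory writes."""
    cache = OneBlockCache(write_back)
    for _ in range(num_writes):
        cache.write()
    cache.evict()
    return cache.memory_writes

print(memory_writes_for(write_back=False, num_writes=10))
print(memory_writes_for(write_back=True, num_writes=10))
```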

Multi-level Caches (contd.)
- In a multi-level cache:
  - If the cache system uses an inclusive cache, the same data may be present at multiple levels of cache
  - Strictly inclusive caches guarantee that all data in a smaller cache also exists at the next higher level
  - Exclusive caches permit only one copy of the data
- The tradeoffs in choosing one over the other involve weighing the variables of access time, memory size, and circuit complexity

Instruction and Data Caches
- The cache we have been discussing is called a unified or integrated cache, where both instructions and data are cached
- Many modern systems employ separate caches for data and instructions
  - This is called a Harvard cache
- The separation of data from instructions provides better locality, at the cost of more hardware

EAT
- The performance of hierarchical memory is measured by its effective access time (EAT)
- EAT is a weighted average that takes into account the hit ratio and the relative access times of successive levels of memory
- The EAT for a two-level memory is given by:

  $$\mathrm{EAT} = H \times \mathrm{AccessTime}_{i} + (1 - H) \times \mathrm{AccessTime}_{i+1}$$

  - H is the hit rate, i.e. the fraction of the time the data is found in level i
- This equation can be extended to any number of memory levels

Example of EAT
- Consider a system with a main memory access time of 200 ns supported by a cache having a 10 ns access time and a hit rate of 99%.
- EAT = 0.99(10 ns) + 0.01(200 ns) = 9.9 ns + 2 ns = 11.9 ns
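The two-level EAT formula is easy to check numerically. In this sketch the function and parameter names are illustrative; times are in nanoseconds and the hit rate is a fraction between 0 and 1.

```python
def effective_access_time(hit_rate, t_upper, t_lower):
    """EAT = H * t_upper + (1 - H) * t_lower for a two-level memory."""
    return hit_rate * t_upper + (1 - hit_rate) * t_lower

# 10 ns cache, 200 ns main memory, 99% hit rate: 9.9 ns + 2 ns.
eat = effective_access_time(0.99, 10, 200)
print(round(eat, 2), "ns")
```

The two terms, 9.9 ns and 2 ns, add up to 11.9 ns.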

Review of Cache Organization
- Q1: Where can a block be placed in the cache level? Mapping scheme
- Q2: How is a block found if it is in the cache? Mapping scheme
- Q3: If the cache is full, where do we put the new block, i.e. which old block should we replace? Block replacement policy
- Q4: If we write to a block in cache, should we update main memory at the same time? Write policy