
Understanding Cache Memory: Direct Mapped, Set Associative, and Multi-level Caches - Prof., Study notes of Computer Science

University of Pennsylvania (UPenn), Computer Science

Prof. D. Palsetia

An in-depth exploration of cache memory, its purpose, and various cache mapping schemes such as direct mapped, set associative, and multi-level caches. It covers topics like temporal and spatial locality, cache layout, cache size calculation, and cache write policies. The document also discusses the advantages and disadvantages of each cache mapping scheme.


Uploaded on 03/28/2010


Memory Hierarchy

Cache Organization

CIT 595

Spring 2008


Memory

Von Neumann model: stored-program concept

Memory is a $2^k \times m$ array of stored bits

Address: unique (k-bit) identifier of a location

Contents: m-bit value stored in the location

Memory access time affects CPU performance

There is a bound on how fast we can access data from memory

This latency inherently slows down the overall processing speed of the processor


Kinds of Memory

Volatile

Once the power is off, the information is lost

RAM (Random Access Memory)

- Access time is the same for all locations, hence "random access"

- Memory can be read and written

- Holds the instructions and/or data while your programs are executing

Non-volatile

E.g. magnetic disks, ROMs, flash memory


Kinds of RAM: Type I

SRAM (Static RAM)

The memory we studied in chapter 3: SR flip-flop, D flip-flop

A 1-bit memory cell needs cross-coupled gates

Consists of 8 transistors per cell (one NAND/NOR gate requires 4 transistors)

Can be optimized to use 6 transistors per cell


Kinds of RAM: Type II

DRAM (Dynamic RAM)

Each memory cell consists of one capacitor and one transistor

The capacitor stores charge

The transistor acts as a switch that allows data to be read or written

DRAM access is slow

The charge on the capacitor must be sensed to determine 0 or 1

Capacitors slowly leak their charge over time and hence must be refreshed every few milliseconds to prevent data loss


DRAM vs. SRAM Technology

DRAM is denser

Stores more bits per unit of surface area

4 MB of SRAM costs about the same as 1 GB of DRAM

DRAM is ~250x cheaper per bit than SRAM

SRAM has a faster access time

SRAM access time is 3 ns to 10 ns

DRAM access time is 30 ns to 90 ns

DRAM is ~10x slower than SRAM
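The "~250x" and "~10x" figures above follow from simple arithmetic. A minimal Python check, assuming binary units (1 MB = 2^20 bytes, 1 GB = 2^30 bytes), which the slide does not state:

```python
MB = 2**20  # assumption: binary megabyte
GB = 2**30  # assumption: binary gigabyte

# The same cost buys 4 MB of SRAM or 1 GB of DRAM.
density_ratio = GB // (4 * MB)

# Compare the two ends of the quoted access-time ranges.
slowdown_fast_end = 30 / 3    # fastest DRAM vs fastest SRAM
slowdown_slow_end = 90 / 10   # slowest DRAM vs slowest SRAM

print(density_ratio)                         # 256, i.e. "~250x"
print(slowdown_fast_end, slowdown_slow_end)  # 10.0 9.0, i.e. "~10x"
```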


Performance/Cost/Capacity

In general

Slow memory is cheap and has more storage capacity

Fast memory is expensive and has less storage capacity

Ideal goals

Memory that operates at processor speeds

- i.e. within the time it takes to compute a basic operation

- We don't want memory access time to dominate the clock cycle time or add to CPI

Memory as large as needed for all running programs

Memory that is cost effective

So how do we get the best of everything? Use a memory hierarchy


Memory Hierarchy

To provide the best performance at the lowest cost, memory is organized in a hierarchical fashion

Small, fast storage elements are near the CPU

Larger, slower memory is accessed through the data bus

Larger, (almost) permanent storage in the form of disk and media storage is still further from the CPU

Each level of memory keeps a subset of the data contained in the next lower (larger, slower) level


Basic Cache Organization

Memory is divided into blocks

Each block contains a fixed number of words

Word = size of data stored in one location, e.g. 8 bits, 16 bits, etc.

One block is the minimum unit of transfer between main memory and cache

Hence, each location in the cache stores 1 block

Each location also stores some extra information (more on this ahead)

[Figure: Main Memory (Block 0, Block 1, each holding Word 0 to Word 3) and Cache]


Cache Mapping Scheme

The main memory address generated by the processor cannot be used directly to access the cache

Hence a mapping scheme is required that converts the generated main memory address into a cache location

The mapping scheme also determines where a block will be placed when it is first copied into the cache


Address Conversion to Cache Location

Address conversion is done by giving special significance to the bits of the main memory address

The address is split into distinct groups called fields

Just as instruction decoding is done based on certain bit fields

The fields are a way to find:

- Which cache location?

- Which word in the block?

- Is it the data we are looking for? (some kind of unique identifier)


Mapping Scheme 1: Direct Mapped Cache

In a direct mapped cache consisting of N blocks of cache (i.e. N locations), block X of main memory maps to cache block Y, where

$Y = X \bmod N$

E.g. if we have 10 blocks of cache, block 7 of cache may hold blocks 7, 17, 27, 37, ... of main memory.
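A minimal Python sketch of the mapping rule Y = X mod N, using the 10-block example. The function name `direct_mapped_block` is hypothetical and chosen only for illustration:

```python
def direct_mapped_block(x: int, n: int) -> int:
    """Cache block Y that main-memory block X maps to in an N-block cache."""
    return x % n

# Which of the first 40 main-memory blocks compete for cache block 7 (N = 10)?
competitors = [x for x in range(40) if direct_mapped_block(x, 10) == 7]
print(competitors)  # [7, 17, 27, 37]
```

Because all of these blocks share one cache location, at most one of them can be cached at a time; the tag field (next slide) records which one it is.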


Direct Mapped Scheme: Address Conversion

n-bit main memory address: | Tag | Block | Word |

Word = which word in the block?

Block = which location in the cache?

Tag = unique identifier with respect to one block

Note: the tag is used to distinguish whether main memory block 7 or 17 is stored in cache block 7


Cache Layout

[Figure: cache layout with Block No. (0 to 3), Tag and Data columns]

E.g. cache with 4 blocks and 8 words per block


Example of Direct Mapped Scheme

Suppose our memory consists of $2^{14}$ words, the cache has $16 = 2^4$ blocks, and each block holds 8 words

Thus main memory is divided into $2^{14} / 2^3 = 2^{11}$ blocks

Of the 14-bit address, we need 4 bits for the block field and 3 bits for the word field; the tag is what's left over: 14 - 4 - 3 = 7 bits
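The field widths in this example can be derived mechanically from the three sizes. A minimal sketch, assuming word-addressable memory and power-of-two sizes; `log2_exact` and `direct_mapped_fields` are hypothetical helper names:

```python
def log2_exact(n: int) -> int:
    """log2 of a power of two; rejects anything else."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1

def direct_mapped_fields(memory_words: int, cache_blocks: int, words_per_block: int):
    """Return (tag_bits, block_bits, word_bits) for a word-addressable memory."""
    address_bits = log2_exact(memory_words)
    block_bits = log2_exact(cache_blocks)
    word_bits = log2_exact(words_per_block)
    tag_bits = address_bits - block_bits - word_bits  # whatever is left over
    return tag_bits, block_bits, word_bits

print(direct_mapped_fields(2**14, 16, 8))  # (7, 4, 3)
print(2**14 // 8)                           # 2048 main-memory blocks (2**11)
```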


Direct Mapped Cache with 16 blocks

[Figure: cache with Block No. 0 to 15, each with Tag and Data fields]


Direct Mapped Cache with address 3AB

[Figure: address 3AB mapped into the 16-block cache (Block No. 0 to 15, Tag, Data)]
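Splitting the hexadecimal address 3AB with the 7/4/3-bit layout from the 2^14-word example shows which cache block it lands in. A minimal Python sketch; `split_address` is a hypothetical helper:

```python
TAG_BITS, BLOCK_BITS, WORD_BITS = 7, 4, 3  # from the 2**14-word example

def split_address(addr: int):
    """Return (tag, block, word) fields of a 14-bit main memory address."""
    if not 0 <= addr < 1 << (TAG_BITS + BLOCK_BITS + WORD_BITS):
        raise ValueError("address does not fit in 14 bits")
    word = addr & ((1 << WORD_BITS) - 1)                     # low 3 bits
    block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)    # middle 4 bits
    tag = addr >> (WORD_BITS + BLOCK_BITS)                   # high 7 bits
    return tag, block, word

tag, block, word = split_address(0x3AB)
print(f"tag={tag:07b} block={block} word={word}")  # tag=0000111 block=5 word=3
```

So address 3AB is word 3 of whatever block sits in cache block 5, and that block must carry tag 0000111 for the access to be a hit.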


Disadvantage of Direct Mapped Cache

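Suppose a program generates a series of memory references such as: 1AB, 3AB, 1AB, 3AB

With the 7-bit tag, 4-bit block and 3-bit word fields of the earlier example, 1AB and 3AB both map to cache block 5 but have different tags (3 and 7)

The cache will continually evict and replace blocks

This is known as thrashing

The theoretical advantage offered by the cache is lost in this extreme case

Other cache mapping schemes are designed to prevent this kind of thrashing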
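The thrashing pattern can be reproduced with a small simulation. This is a minimal sketch, assuming the 3 word bits and 4 block bits of the earlier example and a cache that starts empty; `simulate_direct_mapped` is a hypothetical helper that tracks only tags, not data:

```python
WORD_BITS, BLOCK_BITS = 3, 4  # 8 words per block, 16 cache blocks

def simulate_direct_mapped(addresses):
    """Return 'hit'/'miss' for each address in a direct mapped cache."""
    stored_tags = [None] * (1 << BLOCK_BITS)  # None = nothing loaded yet
    results = []
    for addr in addresses:
        block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
        tag = addr >> (WORD_BITS + BLOCK_BITS)
        if stored_tags[block] == tag:
            results.append("hit")
        else:
            results.append("miss")
            stored_tags[block] = tag  # evict whatever block was there
    return results

print(simulate_direct_mapped([0x1AB, 0x3AB, 0x1AB, 0x3AB]))
# ['miss', 'miss', 'miss', 'miss']
```

Every reference misses because each access evicts exactly the block that the next access needs, even though the other 15 cache blocks sit unused.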


Calculating Cache Size

Whenever cache size is mentioned, it is stated as the capacity of data that the cache holds

Tag storage is considered overhead

Suppose our memory consists of $2^{14}$ locations (or words), the cache has $16 = 2^4$ blocks, and each block holds 8 words

Cache size = # of blocks × block size

There are 16 locations in the cache -> # of blocks = 16

Each block stores 8 words

Assume 1 word is 8 bits; then block size = 8 bytes

Cache size = 16 × 8 bytes = 128 bytes
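The same calculation in Python, with the tag storage computed separately to show that it is overhead on top of the quoted cache size. A minimal sketch, assuming the 8-bit word from the slide and the 7-bit tag from the direct mapped example:

```python
blocks = 16
words_per_block = 8
bits_per_word = 8   # assumption from the slide: 1 word = 8 bits
tag_bits = 7        # 14 - 4 - 3, from the direct mapped example

block_size_bytes = words_per_block * bits_per_word // 8
cache_size_bytes = blocks * block_size_bytes   # what "cache size" refers to
tag_overhead_bits = blocks * tag_bits          # extra storage, not counted

print(block_size_bytes, cache_size_bytes)  # 8 128
print(tag_overhead_bits)                   # 112
```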


Address Breakup

Why is the address broken up in this particular manner?

Higher order bits vary less than middle order bits

If the higher order bits (i.e. the bits used for the tag) were used to determine the cache location (block), then values from consecutive addresses would map to the same location in the cache

The middle bits are preferred, as they cause less thrashing

Address fields: | Tag | Block | Word |
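To make the argument concrete, the sketch below indexes 16 consecutive main-memory blocks two ways: with the middle bits (as the direct mapped scheme does) and, hypothetically, with the top 4 bits of a 14-bit address. The sizes are assumptions carried over from the 16-block, 8-words-per-block example:

```python
WORD_BITS, BLOCK_BITS, ADDRESS_BITS = 3, 4, 14

# First address of each of 16 consecutive main-memory blocks (8 words apart).
addresses = range(0, 16 * 8, 8)

# Index taken from the middle bits (the direct mapped scheme above).
middle_bit_index = [(a >> WORD_BITS) & ((1 << BLOCK_BITS) - 1) for a in addresses]

# Hypothetical alternative: index taken from the top 4 (tag) bits.
high_bit_index = [a >> (ADDRESS_BITS - BLOCK_BITS) for a in addresses]

print(len(set(middle_bit_index)))  # 16: each block gets its own cache location
print(len(set(high_bit_index)))    # 1: all 16 blocks collide in one location
```

A program walking through consecutive addresses therefore spreads across the whole cache with middle-bit indexing, but would thrash on a single location with high-bit indexing.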


Valid Cache Block

How do we know whether the block in the cache is valid or not?

For example:

When the processor just starts up, the cache will be empty and the tag fields in each location will be meaningless

Thus tag fields must be ignored initially, while the cache is starting to fill up

For validity, another bit called the valid bit is added to each cache location to indicate whether the block contains valid information

0 – not valid, 1 – valid

At start-up, all blocks are marked not valid

When data from main memory is brought into the cache for a particular block, the valid bit for that block is set

The valid bit also counts as overhead bits


Direct Mapped Cache with Valid (V) Field

[Figure: 16-block cache with Block No. (0 to 15), V, Tag and Data columns]

Address 3AB referenced for the first time. The entire block is brought into cache block 5.


Hit or Miss in the Cache

A hit means that we actually found the data in the cache

A hit occurs when the valid bit = 1 AND the tag in the cache matches the tag field of the address

If either condition does not hold, then we did not find the data in the cache

This is known as a miss in the cache

On a miss, the data is brought from main memory into the cache, and the valid bit is set
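The hit condition (valid bit = 1 AND matching tag) can be sketched in Python as below, reusing the 16-block, 8-words-per-block layout. The `Line` class and `access` function are hypothetical names, and data storage is omitted:

```python
from dataclasses import dataclass

@dataclass
class Line:
    valid: bool = False  # every line starts invalid at power-up
    tag: int = 0         # meaningless while valid is False

WORD_BITS, BLOCK_BITS = 3, 4
cache = [Line() for _ in range(1 << BLOCK_BITS)]

def access(addr: int) -> str:
    block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = addr >> (WORD_BITS + BLOCK_BITS)
    line = cache[block]
    if line.valid and line.tag == tag:  # both conditions must hold
        return "hit"
    # Miss: bring the block in from main memory and mark it valid.
    line.tag = tag
    line.valid = True
    return "miss"

print(access(0x3AB))  # miss: cold cache, valid bit was 0
print(access(0x3A8))  # hit: same block, only the word field differs
print(access(0x000))  # miss: stored tag 0 matches, but the valid bit is 0
```

The last access shows why the valid bit is needed: without it, an empty line whose leftover tag happens to match would be reported as a hit.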


Mapping Scheme 2: Fully Associative Cache

Instead of placing memory blocks in specific cache locations based on the memory address, we could allow a block to go anywhere in the cache

This way, the cache would have to fill up before any blocks are evicted

This is how a fully associative cache works

A memory address is partitioned into only two fields: the tag and the word
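Running the same 1AB, 3AB, 1AB, 3AB sequence through a fully associative cache shows the thrashing disappear. A minimal sketch, assuming 16 blocks of 8 words (so the tag is the remaining 11 bits) and FIFO eviction once the cache is full; the eviction choice is an assumption, since the replacement policy is discussed separately:

```python
from collections import deque

WORD_BITS = 3     # 8 words per block -> tag is the remaining 11 bits
NUM_BLOCKS = 16

resident = deque()  # tags currently in the cache, oldest first

def access(addr: int) -> str:
    tag = addr >> WORD_BITS              # no block field: only tag + word
    if tag in resident:                  # hardware compares all tags in parallel
        return "hit"
    if len(resident) == NUM_BLOCKS:      # cache full: evict (FIFO, an assumption)
        resident.popleft()
    resident.append(tag)
    return "miss"

print([access(a) for a in (0x1AB, 0x3AB, 0x1AB, 0x3AB)])
# ['miss', 'miss', 'hit', 'hit']
```

The price is the search: a real fully associative cache needs one tag comparator per block to do the `tag in resident` check in a single cycle.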


Scheme 3: Address Conversion

Like the direct mapped cache, except that the middle bits of the main memory address indicate the set in the cache (address fields: | Tag | Set | Word |)


K-Way Set Associative Cache Example

Suppose we have a main memory of $2^{14}$ locations

Map this memory to a 2-way set associative cache having 16 blocks, where each block contains 8 words

Number of sets = number of blocks in cache / blocks per set (K)

Since this is a 2-way cache, each set consists of 2 blocks, and there are 16/2 = 8 sets
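Carrying the example through to the address fields: 8 sets need 3 set bits, 8 words need 3 word bits, leaving an 8-bit tag. A minimal Python sketch under those sizes (`split` is a hypothetical helper), applied to the two addresses that thrashed in the direct mapped cache:

```python
MEMORY_WORDS = 2**14
CACHE_BLOCKS = 16
WORDS_PER_BLOCK = 8
K = 2  # 2-way set associative

num_sets = CACHE_BLOCKS // K                                        # 16 / 2 = 8
word_bits = WORDS_PER_BLOCK.bit_length() - 1                        # 3
set_bits = num_sets.bit_length() - 1                                # 3
tag_bits = (MEMORY_WORDS.bit_length() - 1) - set_bits - word_bits   # 14 - 3 - 3 = 8

def split(addr: int):
    """Return (tag, set_index, word) for a 14-bit address."""
    word = addr & (WORDS_PER_BLOCK - 1)
    set_index = (addr >> word_bits) & (num_sets - 1)
    tag = addr >> (word_bits + set_bits)
    return tag, set_index, word

print(num_sets, tag_bits, set_bits, word_bits)  # 8 8 3 3
print(split(0x1AB), split(0x3AB))               # (6, 5, 3) (14, 5, 3)
```

Both addresses still land in set 5, but the set has two slots, so both blocks can be resident at once instead of evicting each other.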


Advantages & Disadvantages of Set Associative Caches

Advantage

Unlike a direct mapped cache, there is less thrashing

- If an address maps to a set, there is a choice for placing the new block and evicting an old block

Disadvantages

The tags of each block in a set need to be matched (in parallel) to figure out whether the data is present in the cache

- The cost of matching is less than for a fully associative cache but more than for a direct mapped cache, i.e. k comparators

- Contributes to access time

If both slots are filled, then we need an algorithm that will decide which old block to evict (as in a fully associative cache)

- Adds to design complexity


Replacement Algorithm/Policy

Optimal goal

Keep the blocks required in the near future; replace the block that will not be used for the longest period of time

Least recently used (LRU)

Evicts the block that has been unused for the longest period of time

Disadvantage: complexity

LRU has to maintain an access history for each block, which slows down the cache

Usually some approximation is used

- E.g. Not Most Recently Used (NMRU)
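In software, the access history LRU needs can be kept as an ordered list of tags. A minimal sketch of one k-way set using Python's `OrderedDict` (the `LRUSet` class is hypothetical; hardware would use per-set status bits instead):

```python
from collections import OrderedDict

class LRUSet:
    """One k-way cache set with least-recently-used replacement."""

    def __init__(self, ways: int):
        self.ways = ways
        self.tags = OrderedDict()  # access history: least recent first

    def access(self, tag) -> str:
        if tag in self.tags:
            self.tags.move_to_end(tag)     # now the most recently used
            return "hit"
        if len(self.tags) == self.ways:
            self.tags.popitem(last=False)  # evict the least recently used
        self.tags[tag] = None
        return "miss"

s = LRUSet(ways=2)
print([s.access(t) for t in "ABACA"])  # ['miss', 'miss', 'hit', 'miss', 'hit']
```

When C arrives, B is evicted because A was touched more recently, so the final access to A hits. Every hit reorders the history, which is the bookkeeping cost the slide refers to.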


Replacement Algorithm/Policy (contd..)

First-in, first-out (FIFO)

In FIFO, the block that has been in the cache the longest is evicted, regardless of when it was last used

Easy to implement compared to LRU

Does not always match temporal locality

Random replacement

Picks a block at random and replaces it with the new block

Can evict a block that will be needed often or needed soon, but it never thrashes

Difficult to implement truly random replacement
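A minimal Python sketch of FIFO and random replacement for one k-way set (the `FIFOSet` and `RandomSet` classes are hypothetical). Feeding FIFO the same A, B, A, C, A sequence used for LRU shows how it can ignore temporal locality:

```python
import random
from collections import deque

class FIFOSet:
    """One k-way set with first-in, first-out replacement."""

    def __init__(self, ways):
        self.ways = ways
        self.order = deque()  # arrival order, oldest on the left

    def access(self, tag):
        if tag in self.order:
            return "hit"           # a hit does NOT change the eviction order
        if len(self.order) == self.ways:
            self.order.popleft()   # evict the block that arrived first
        self.order.append(tag)
        return "miss"

class RandomSet:
    """One k-way set with (pseudo-)random replacement."""

    def __init__(self, ways, rng=None):
        self.ways = ways
        self.tags = []
        self.rng = rng or random.Random()

    def access(self, tag):
        if tag in self.tags:
            return "hit"
        if len(self.tags) == self.ways:
            self.tags[self.rng.randrange(self.ways)] = tag  # random victim
        else:
            self.tags.append(tag)
        return "miss"

fifo = FIFOSet(ways=2)
print([fifo.access(t) for t in "ABACA"])  # ['miss', 'miss', 'hit', 'miss', 'miss']

rand = RandomSet(ways=2, rng=random.Random(42))
print([rand.access(t) for t in "ABACA"])  # first three: miss, miss, hit; rest vary
```

FIFO evicts A when C arrives, even though A was just used, so the final access to A misses; LRU keeps A on the same sequence. The software `random.Random` generator stands in for the hardware randomness the slide notes is hard to build.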


What about blocks that have been written to?

While your program is running, it will modify some locations

We need to keep main memory and cache consistent if we are modifying data

Update cache and memory:

- Both at the same time, or

- The cache first and memory at a later time

The two choices are known as cache write policies


Cache Write Policies

Write-Through

Update cache and main memory simultaneously on every write

Advantage

Keeps cache and main memory consistent at all times

Disadvantage

All writes require a main memory access (bus transaction)

Slows down the system

This is what we were trying to avoid in the first place when we decided to introduce the cache


Cache Write Policies (contd..)

Write Back or Copy Back

Modified data is written back to main memory only when the block is about to be evicted (removed) from the cache

Advantage

Faster than write-through, since time is not spent accessing main memory on every write

Disadvantage

Needs an extra bit in the cache to indicate which block has been modified

Like the valid bit, another bit called the dirty bit is introduced to indicate a modified cache block

0 – Not Dirty, 1 – Dirty (modified)
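The difference between the two policies shows up in how many main-memory writes a write-heavy loop causes. A minimal sketch under simplifying assumptions: a single cache line (so any new block evicts the old one), write-allocate on a write miss, and a final flush of a dirty block so both policies account for all modified data. `memory_writes` is a hypothetical helper:

```python
def memory_writes(trace, policy):
    """Count main-memory writes for a list of (block, is_write) accesses."""
    resident, dirty, writes = None, False, 0
    for block, is_write in trace:
        if block != resident:                   # miss: current block is evicted
            if policy == "write-back" and dirty:
                writes += 1                     # copy the modified block back
            resident, dirty = block, False
        if is_write:
            if policy == "write-through":
                writes += 1                     # every write goes to memory
            else:
                dirty = True                    # only set the dirty bit
    if policy == "write-back" and dirty:
        writes += 1                             # final flush, for a fair count
    return writes

trace = [(7, True)] * 10 + [(17, False)]  # 10 writes to block 7, then evict it
print(memory_writes(trace, "write-through"))  # 10
print(memory_writes(trace, "write-back"))     # 1
```

Write-back pays for one memory write per eviction of a dirty block, no matter how many times the block was modified while cached.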


Multi-level Caches (contd..)

In a multi-level cache:

If the cache system uses an inclusive cache, the same data may be present at multiple levels of cache

Strictly inclusive caches guarantee that all data in a smaller cache also exists at the next higher level

Exclusive caches permit only one copy of the data

The tradeoffs in choosing one over the other involve weighing access time, memory size, and circuit complexity


Instruction and Data Caches

The cache we have been discussing is called a unified or integrated cache, where both instructions and data are cached

Many modern systems employ separate caches for data and instructions

This is called a Harvard cache

The separation of data from instructions provides better locality, at the cost of more hardware


EAT

The performance of hierarchical memory is measured by its effective access time (EAT)

EAT is a weighted average that takes into account the hit ratio and the relative access times of successive levels of memory

The EAT for a two-level memory is given by:

$\text{EAT} = H \times (\text{access time for level } i) + (1 - H) \times (\text{access time for level } i+1)$

H is the hit rate, i.e. the fraction of the time the data is found in level i

This equation can be extended to any number of memory levels


Example of EAT

Consider a system with a main memory access time of 200 ns supported by a cache having a 10 ns access time and a hit rate of 99%.

EAT = 0.99(10 ns) + 0.01(200 ns) = 9.9 ns + 2 ns = 11.9 ns
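The EAT formula translates directly into code. A minimal sketch (`eat` is a hypothetical helper); extending it to more levels by nesting the two-level formula, so that a miss at one level costs the effective access time of the levels below it, is an assumption about how the extension is done:

```python
def eat(levels, last_level_time):
    """Effective access time of a memory hierarchy.

    levels: list of (hit_rate, access_time) per cache level, fastest first.
    last_level_time: access time of the final level (e.g. main memory),
    which is assumed to always hold the data.
    """
    time = last_level_time
    # Apply EAT = H * T_i + (1 - H) * T_{i+1}, working up from the bottom.
    for hit_rate, access_time in reversed(levels):
        time = hit_rate * access_time + (1 - hit_rate) * time
    return time

print(round(eat([(0.99, 10)], 200), 2))  # 11.9
```

With a single cache level this reduces exactly to the two-level formula used in the example above.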


Review of Cache Organization

Q1: Where can a block be placed in the cache level? Mapping scheme

Q2: How is a block found if it is in the cache? Mapping scheme

Q3: If the cache is full, where do we put the new block, i.e. which old block should we replace? Block replacement policy

Q4: If we write to a block in the cache, should we update main memory at the same time? Write policy
