Memory Hierarchies and Caches - Computer Systems - Lecture Slides, Slides of Computer Science

These are the lecture slides for Computer Systems, which cover Writing to Cache, Memory Access, Simple Direct-Mapped Cache, Inconsistent Memory, Write-Through Caches, Write-Back Caches, Finishing Write Back, Write Misses, etc. Key points are: Memory Hierarchies and Caches, Memory Systems, Cache Introduction, Introducing Caches, Principle of Locality, Spatial Locality, Temporal Locality, Locality in Program, Kinds of Caches.

Typology: Slides

2012/2013

Uploaded on 03/27/2013

agarkar
CSE 410
Computer Systems

Lecture 11 – Memory Hierarchies & Caches

Memory Systems and I/O

- We’ve already seen how to make a fast processor. How can we supply the CPU with enough data to keep it busy?
- Part of CS410 focuses on memory and input/output issues, which are frequently bottlenecks that limit the performance of a system.
- We’ll start off by looking at memory systems and then turn to I/O.
  • How caches can dramatically improve the speed of memory accesses
  • How virtual memory provides security and ease of programming
  • How processors, memory and peripheral devices can be connected

[Figure: block diagram showing the processor, memory, and input/output]

Large and fast

- Today’s computers depend upon large and fast storage systems.
  • Large storage capacities are needed for many database applications, scientific computations with large data sets, video and music, and so forth.
  • Speed is important to keep up with our pipelined CPUs, which may access both an instruction and data in the same clock cycle. Things become even worse if we move to a superscalar CPU design.
- So far we’ve assumed our memories can keep up and our CPU can access memory twice in one cycle, but as we’ll see that’s a simplification.

Small or slow

- Unfortunately there is a tradeoff between speed, cost and capacity.

    Storage       Speed     Cost       Capacity
    Static RAM    Fastest   Expensive  Smallest
    Dynamic RAM   Slow      Cheap      Large
    Hard disks    Slowest   Cheapest   Largest

- Fast memory is too expensive for most people to buy a lot of.
- But dynamic memory has a much longer delay than other functional units in a datapath. If every lw or sw accessed dynamic memory, we’d have to either increase the cycle time or stall frequently.
- Here are rough estimates of some current storage parameters.*

    Storage       Delay              Cost/MB    Capacity
    Static RAM    1-10 cycles        ~$         128KB-2MB
    Dynamic RAM   100-200 cycles     ~$0.       128MB-4GB
    Hard disks    10,000,000 cycles  ~$0.0005   20GB-400GB

*These numbers are a couple of years old now, but the ratios are still about right. More recent numbers are in Sec. 5.1 of the book.

The principle of locality

- It’s usually difficult or impossible to figure out what data will be “most frequently accessed” before a program actually runs, which makes it hard to know what to store into the small, precious cache memory.
- But in practice, most programs exhibit locality, which the cache can take advantage of.
  • The principle of temporal locality says that if a program accesses one memory address, there is a good chance that it will access the same address again.
  • The principle of spatial locality says that if a program accesses one memory address, there is a good chance that it will also access other nearby addresses.

Temporal locality in programs

- The principle of temporal locality says that if a program accesses one memory address, there is a good chance that it will access the same address again.
- Loops are excellent examples of temporal locality in programs.
  • The loop body will be executed many times.
  • The computer will need to access those same few locations of the instruction memory repeatedly.
- For example:

    Loop: lw   $t0, 0($s1)
          add  $t0, $t0, $s
          sw   $t0, 0($s1)
          addi $s1, $s1, -
          bne  $s1, $0, Loop

  • Each instruction will be fetched over and over again, once on every loop iteration.

Spatial locality in programs

- The principle of spatial locality says that if a program accesses one memory address, there is a good chance that it will also access other nearby addresses.

    sub  $sp, $sp, 16
    sw   $ra, 0($sp)
    sw   $s0, 4($sp)
    sw   $a0, 8($sp)
    sw   $a1, 12($sp)

- Nearly every program exhibits spatial locality, because instructions are usually executed in sequence: if we execute an instruction at memory location i, then we will probably also execute the next instruction, at the memory location that follows i.
- Code fragments such as loops exhibit both temporal and spatial locality.

Spatial locality in data

- Programs often access data that is stored contiguously.
  • Arrays, like a in the first fragment below, are stored in memory contiguously.
  • The individual fields of a record or object like employee are also kept contiguously in memory.

    sum = 0;
    for (i = 0; i < MAX; i++)
        sum = sum + a[i];

    employee.name = "Homer Simpson";
    employee.boss = "Mr. Burns";
    employee.age = 45;

- Can data have both spatial and temporal locality?

How caches take advantage of spatial locality

- When the CPU reads location i from main memory, a copy of that data is placed in the cache.
- But instead of just copying the contents of location i, we can copy several values into the cache at once, such as the four bytes from locations i through i + 3.
  • If the CPU later does need to read from locations i + 1, i + 2 or i + 3, it can access that data from the cache and not the slower main memory.
  • For example, instead of reading just one array element at a time, the cache might actually be loading four array elements at once.
- Again, the initial load incurs a performance penalty, but we’re gambling on spatial locality and the chance that the CPU will need the extra data.

[Figure: the CPU connected to a little static RAM (cache), which is backed by lots of dynamic RAM]

Other kinds of caches

- The general idea behind caches is used in many other situations.
- Networks are probably the best example.
  • Networks have relatively high “latency” and low “bandwidth,” so repeated data transfers are undesirable.
  • Browsers like Netscape and Internet Explorer store your most recently accessed web pages on your hard disk.
  • Administrators can set up a network-wide cache, and companies like Akamai also provide caching services.
- A few other examples:
  • Many processors have a “translation lookaside buffer,” which is a cache dedicated to virtual memory support.
  • Operating systems may store frequently-accessed disk blocks, like directories, in main memory... and that data may then in turn be stored in the CPU cache!

A simple cache design

- Caches are divided into blocks, which may be of various sizes.
  • The number of blocks in a cache is usually a power of 2.
  • For now we’ll say that each block contains one byte. This won’t take advantage of spatial locality, but we’ll do that next time.
- Here is an example cache with eight blocks, each holding one byte.

    Block index   8-bit data
    000
    001
    010
    011
    100
    101
    110
    111

Four important questions

  1. When we copy a block of data from main memory to the cache, where exactly should we put it?
  2. How can we tell if a word is already in the cache, or if it has to be fetched from main memory first?
  3. Eventually, the small cache memory might fill up. To load a new block from main RAM, we’d have to replace one of the existing blocks in the cache... which one?
  4. How can write operations be handled by the memory system?

- Questions 1 and 2 are related: we have to know where the data is placed if we ever hope to find it again later!

It’s all divisions…

- One way to figure out which cache block a particular memory address should go to is to use the mod (remainder) operator.
- If the cache contains 2^k blocks, then the data at memory address i would go to cache block index i mod 2^k.
- For instance, with the four-block cache here, address 14 would map to cache block 2.

    14 mod 4 = 2

[Figure: memory addresses 0 through 15 mapped onto a four-block cache with indices 0 through 3]

…or least-significant bits

- An equivalent way to find the placement of a memory address in the cache is to look at the least significant k bits of the address.
- With our four-byte cache we would inspect the two least significant bits of our memory addresses.
- Again, you can see that address 14 (1110 in binary) maps to cache block 2 (10 in binary).
- Taking the least k bits of a binary value is the same as computing that value mod 2^k.

[Figure: memory addresses 0000 through 1111 mapped onto a four-block cache with indices 00, 01, 10 and 11]