10-memory-hierarchy-4up.pdf, Study notes of Geometry

Carnegie Mellon

The Memory Hierarchy
15-213/18-213: Introduction to Computer Systems
10th Lecture, Feb. 16, 2012
Instructors: Todd C. Mowry & Anthony Rowe

Today
- Storage technologies and trends
- Locality of reference
- Caching in the memory hierarchy

Random-Access Memory (RAM)
- Key features
  - RAM is traditionally packaged as a chip.
  - Basic storage unit is normally a cell (one bit per cell).
  - Multiple RAM chips form a memory.
- Static RAM (SRAM)
  - Each cell stores a bit with a four- or six-transistor circuit.
  - Retains value indefinitely, as long as it is kept powered.
  - Relatively insensitive to electrical noise (EMI), radiation, etc.
  - Faster and more expensive than DRAM.
- Dynamic RAM (DRAM)
  - Each cell stores a bit with a capacitor. One transistor is used for access.
  - Value must be refreshed every 10-100 ms.
  - More sensitive to disturbances (EMI, radiation, ...) than SRAM.
  - Slower and cheaper than SRAM.

SRAM vs DRAM Summary
        Trans.    Access   Needs      Needs
        per bit   time     refresh?   EDC?    Cost   Applications
SRAM    4 or 6    1X       No         Maybe   100x   Cache memories
DRAM    1         10X      Yes        Yes     1X     Main memories, frame buffers


Conventional DRAM Organization
- d x w DRAM: dw total bits organized as d supercells of size w bits.
[Figure: a 16 x 8 DRAM chip laid out as 4 rows by 4 columns of supercells (rows 0-3, cols 0-3), with supercell (2,1) highlighted and an internal row buffer. The memory controller (to/from CPU) connects to the chip through a 2-bit addr line and an 8-bit data line.]

Reading DRAM Supercell (2,1)
- Step 1(a): Row access strobe (RAS) selects row 2.
- Step 1(b): Row 2 copied from DRAM array to row buffer.
[Figure: 16 x 8 DRAM chip; the memory controller sends RAS = 2 on the 2-bit addr line, and row 2 is copied into the internal row buffer.]
- Step 2(a): Column access strobe (CAS) selects column 1.
- Step 2(b): Supercell (2,1) copied from buffer to data lines, and eventually back to the CPU.
[Figure: 16 x 8 DRAM chip; the memory controller sends CAS = 1 on the addr line, and supercell (2,1) moves from the internal row buffer over the 8-bit data line to the memory controller and on to the CPU.]
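
The two-step read can be sketched in C. This is only an illustrative model, assuming the 16 x 8 chip is a 4 x 4 array of one-byte supercells; the type `dram_chip` and the functions `dram_ras`, `dram_cas`, and `dram_read` are hypothetical names, not part of the lecture.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ROWS = 4, COLS = 4 };   /* 16 supercells, as in the 16 x 8 chip */

/* Hypothetical model of the chip: a 4 x 4 array of 8-bit supercells
   plus the internal row buffer. */
typedef struct {
    uint8_t cells[ROWS][COLS];
    uint8_t row_buffer[COLS];
} dram_chip;

/* Step 1: RAS selects a row, which is copied into the row buffer. */
void dram_ras(dram_chip *chip, unsigned row)
{
    assert(row < ROWS);
    memcpy(chip->row_buffer, chip->cells[row], COLS);
}

/* Step 2: CAS selects one supercell out of the row buffer. */
uint8_t dram_cas(const dram_chip *chip, unsigned col)
{
    assert(col < COLS);
    return chip->row_buffer[col];
}

/* Read supercell (row, col) the way the memory controller would:
   send the row address first, then the column address. */
uint8_t dram_read(dram_chip *chip, unsigned row, unsigned col)
{
    dram_ras(chip, row);
    return dram_cas(chip, col);
}
```

Calling `dram_read(&chip, 2, 1)` mirrors the slides: RAS = 2 fills the row buffer, then CAS = 1 picks supercell (2,1). Because the row and column addresses are sent one after the other, two address lines are enough to name any of the 16 supercells.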

Memory Modules
[Figure: a 64 MB memory module consisting of eight 8M x 8 DRAMs (DRAM 0 through DRAM 7). The memory controller sends addr (row = i, col = j) to every chip, and each chip returns its supercell (i,j): DRAM 0 supplies bits 0-7, DRAM 1 bits 8-15, DRAM 2 bits 16-23, DRAM 3 bits 24-31, DRAM 4 bits 32-39, DRAM 5 bits 40-47, DRAM 6 bits 48-55, and DRAM 7 bits 56-63. The memory controller combines the eight bytes into the 64-bit doubleword at main memory address A.]
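
How the memory controller might combine the eight supercells into one doubleword can be sketched as follows. The function names `assemble_doubleword` and `split_doubleword` are hypothetical, and the sketch assumes DRAM k supplies bits 8k through 8k+7, as the bit ranges in the figure indicate.

```c
#include <assert.h>
#include <stdint.h>

enum { NUM_CHIPS = 8 };   /* eight 8M x 8 DRAMs on the module */

/* Combine the bytes returned by the eight chips for the same
   supercell (i, j) into one 64-bit doubleword. chip_byte[k] is the
   supercell from DRAM k, which supplies bits 8k .. 8k+7. */
uint64_t assemble_doubleword(const uint8_t chip_byte[NUM_CHIPS])
{
    uint64_t word = 0;
    assert(chip_byte != 0);
    for (int k = 0; k < NUM_CHIPS; k++)
        word |= (uint64_t)chip_byte[k] << (8 * k);
    return word;
}

/* The reverse direction, as a write would need: split a doubleword
   into the byte that each chip stores. */
void split_doubleword(uint64_t word, uint8_t chip_byte[NUM_CHIPS])
{
    assert(chip_byte != 0);
    for (int k = 0; k < NUM_CHIPS; k++)
        chip_byte[k] = (uint8_t)(word >> (8 * k));
}
```

The capacity also checks out: each 8M x 8 chip holds 8M supercells of one byte, so eight chips give 64 MB.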

Memory Read Transaction
Load operation: movl A, %eax
- Main memory reads A from the memory bus, retrieves word x, and places it on the bus.
- CPU reads word x from the bus and copies it into register %eax.
[Figure: the CPU (register file with %eax, ALU, bus interface) connected through the I/O bridge to main memory, which holds word x at address A.]

Memory Write Transaction
Store operation: movl %eax, A
- CPU places address A on bus. Main memory reads it and waits for the corresponding data word to arrive.
- CPU places data word y on the bus.
- Main memory reads data word y from the bus and stores it at address A.
[Figure: the same datapath; register %eax holds y, which ends up stored at address A in main memory.]

What's Inside A Disk Drive?
[Figure: disk drive with spindle, arm, platters, actuator, electronics (including a processor and memory!), and SCSI connector labeled. Image courtesy of Seagate Technology.]

Disk Geometry
- Disks consist of platters, each with two surfaces.
- Each surface consists of concentric rings called tracks.
- Each track consists of sectors separated by gaps.
[Figure: one surface around the spindle, with tracks, track k, sectors, and gaps labeled.]

Disk Geometry (Multiple-Platter View)
- Aligned tracks form a cylinder.
[Figure: platters 0-2 on one spindle, with surfaces 0-5; cylinder k is the set of aligned tracks.]

Disk Structure: top view of single platter
- Surface organized into tracks.
- Tracks divided into sectors.

Disk Access
- Head in position above a track.
- Rotation is counter-clockwise.

Disk Access – Read
- About to read blue sector.
- After reading blue sector.
- Red request scheduled next.

Disk Access – Seek
- Seek to red's track.

Disk Access – Rotational Latency
- Wait for red sector to rotate around.
[Figure: timeline of the steps so far: after BLUE read, seek for RED, rotational latency.]
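
The cost of one access can be estimated by adding up the steps shown above: the seek, the rotational latency, and the read itself. The sketch below is a rough model, not something taken from the slides: it assumes the head waits half a rotation on average and that reading one sector takes 1/(sectors per track) of a rotation. The function names and any parameter values passed in are hypothetical.

```c
#include <assert.h>

/* Average rotational latency in ms: on average the head waits half a
   rotation for the target sector (an assumption of this model). */
double avg_rotation_ms(double rpm)
{
    assert(rpm > 0.0);
    return 0.5 * (60.0 / rpm) * 1000.0;
}

/* Time in ms to read one sector once it is under the head. */
double transfer_ms(double rpm, double sectors_per_track)
{
    assert(rpm > 0.0 && sectors_per_track > 0.0);
    return (60.0 / rpm) * 1000.0 / sectors_per_track;
}

/* Estimated time in ms for one access: seek + rotation + transfer. */
double access_ms(double seek_ms, double rpm, double sectors_per_track)
{
    return seek_ms + avg_rotation_ms(rpm)
                   + transfer_ms(rpm, sectors_per_track);
}
```

With made-up numbers such as a 9 ms seek, 7200 RPM, and 400 sectors per track, the seek and the rotational latency dominate and the transfer itself contributes only a few hundredths of a millisecond.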

Logical Disk Blocks
- Modern disks present a simpler abstract view of the complex sector geometry:
  - The set of available sectors is modeled as a sequence of b-sized logical blocks.
- Mapping between logical blocks and actual (physical) sectors
  - Maintained by hardware/firmware device called disk controller.
  - Converts requests for logical blocks into (surface, track, sector) triples.
- Allows controller to set aside spare cylinders for each zone.
  - Accounts for the difference in “formatted capacity” and “maximum capacity”.
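
The translation from a logical block number to a (surface, track, sector) triple can be illustrated with a deliberately simplified mapping. The sketch assumes a uniform geometry (the same number of sectors on every track, no zones, no spare cylinders) and numbers blocks cylinder by cylinder; a real disk controller's mapping is more involved. The types `disk_geometry` and `disk_addr` and the function `lbn_to_triple` are hypothetical.

```c
#include <assert.h>

/* Hypothetical uniform geometry: every track has the same number of
   sectors. Real disks group tracks into zones and keep spares. */
typedef struct {
    unsigned surfaces;
    unsigned tracks_per_surface;
    unsigned sectors_per_track;
} disk_geometry;

typedef struct {
    unsigned surface;
    unsigned track;
    unsigned sector;
} disk_addr;

/* Map a logical block number to a (surface, track, sector) triple,
   filling one cylinder (all surfaces of a track) before moving on. */
disk_addr lbn_to_triple(const disk_geometry *g, unsigned long lbn)
{
    unsigned long per_cylinder =
        (unsigned long)g->surfaces * g->sectors_per_track;
    disk_addr a;

    assert(lbn < per_cylinder * g->tracks_per_surface);
    a.track   = (unsigned)(lbn / per_cylinder);
    a.surface = (unsigned)((lbn % per_cylinder) / g->sectors_per_track);
    a.sector  = (unsigned)(lbn % g->sectors_per_track);
    return a;
}
```

Filling a whole cylinder before changing tracks is a design choice of this sketch: consecutive logical blocks then stay reachable without a seek for as long as possible.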

I/O Bus
[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory and over the I/O bus to the USB controller (mouse, keyboard), the graphics adapter (monitor), the disk controller (disk), and expansion slots for other devices such as network adapters.]

Reading a Disk Sector
- CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with disk controller.
- Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.
- When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special “interrupt” pin on the CPU).

Solid State Disks (SSDs)
[Figure: a solid state disk (SSD) attached to the I/O bus. Requests to read and write logical disk blocks go to the flash translation layer, which sits in front of the flash memory: blocks 0 to B-1, each holding pages 0 to P-1.]
- Pages: 512 B to 4 KB; blocks: 32 to 128 pages.
- Data read/written in units of pages.
- Page can be written only after its block has been erased.
- A block wears out after repeated writes.

SSD Performance Characteristics
Sequential read tput    MB/s      Sequential write tput   MB/s
Random read tput        MB/s      Random write tput       MB/s
Random read access      us        Random write access     us
- Why are random writes so slow?
  - Erasing a block is slow (on the order of ms).
  - Write to a page triggers a copy of all useful pages in the block:
    - Find a used block (new block) and erase it.
    - Write the page into the new block.
    - Copy other pages from old block to the new block.
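
The copy that a single page write can trigger is easier to see in code. The following is a toy model with made-up sizes (4 pages per block, 8 bytes per page), far smaller than real flash; `flash_block`, `erase_block`, and `rewrite_page` are hypothetical names, and a real flash translation layer does much more.

```c
#include <assert.h>
#include <string.h>

enum { PAGES_PER_BLOCK = 4, PAGE_SIZE = 8 };   /* tiny, made-up sizes */

typedef struct {
    unsigned char page[PAGES_PER_BLOCK][PAGE_SIZE];
    int valid[PAGES_PER_BLOCK];   /* 1 if the page holds useful data */
} flash_block;

/* Erase a whole block: the slow step, and the only way to make
   its pages writable again. */
void erase_block(flash_block *b)
{
    memset(b->page, 0xFF, sizeof b->page);
    memset(b->valid, 0, sizeof b->valid);
}

/* Rewrite page `target` of old_blk with `data`, using new_blk as the
   destination. Returns how many other pages had to be copied. */
int rewrite_page(const flash_block *old_blk, flash_block *new_blk,
                 int target, const unsigned char *data)
{
    int copied = 0;

    assert(target >= 0 && target < PAGES_PER_BLOCK);
    erase_block(new_blk);                           /* erase new block */
    memcpy(new_blk->page[target], data, PAGE_SIZE); /* write the page  */
    new_blk->valid[target] = 1;
    for (int p = 0; p < PAGES_PER_BLOCK; p++) {     /* copy the rest   */
        if (p != target && old_blk->valid[p]) {
            memcpy(new_blk->page[p], old_blk->page[p], PAGE_SIZE);
            new_blk->valid[p] = 1;
            copied++;
        }
    }
    return copied;
}
```

In this model, updating one page of a full block costs one erase plus a copy of every other useful page, which is why small random writes are so much more expensive than their size suggests.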

SSD Tradeoffs vs Rotating Disks
- Advantages
  - No moving parts: faster, less power, more rugged.
- Disadvantages
  - Have the potential to wear out.
    - Mitigated by “wear leveling logic” in flash translation layer.
    - E.g. Intel X25 guarantees 1 petabyte (10^15 bytes) of random writes before they wear out.
  - In 2010, about 100 times more expensive per byte.
- Applications
  - MP3 players, smart phones, laptops.
  - Beginning to appear in desktops and servers.

Locality
- Principle of Locality: Programs tend to use data and instructions with addresses near or equal to those they have used recently.
- Temporal locality: Recently referenced items are likely to be referenced again in the near future.
- Spatial locality: Items with nearby addresses tend to be referenced close together in time.

Locality Example
    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
- Data references
  - Reference array elements in succession (stride-1 reference pattern): spatial locality.
  - Reference variable sum each iteration: temporal locality.
- Instruction references
  - Reference instructions in sequence: spatial locality.
  - Cycle through loop repeatedly: temporal locality.

Qualitative Estimates of Locality
- Claim: Being able to look at code and get a qualitative sense of its locality is a key skill for a professional programmer.
- Question: Does this function have good locality with respect to array a?
    int sum_array_rows(int a[M][N])
    {
        int i, j, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }

Locality Example
- Question: Does this function have good locality with respect to array a?
    int sum_array_cols(int a[M][N])
    {
        int i, j, sum = 0;

        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }
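
One way to approach both questions is to compute the distance, in array elements, between consecutive accesses of the inner loop. The sketch below relies on C storing a[M][N] in row-major order; the sizes M = 3 and N = 4 and the helper names are chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { M = 3, N = 4 };   /* small, made-up dimensions */

/* Offset of a[i][j] from a[0][0], in elements: C lays out a[M][N]
   row by row, so each row starts N elements after the previous one. */
size_t elem_offset(size_t i, size_t j)
{
    assert(i < M && j < N);
    return i * N + j;
}

/* Inner loop of sum_array_rows: j advances, i stays fixed. */
size_t row_scan_stride(void)
{
    return elem_offset(0, 1) - elem_offset(0, 0);
}

/* Inner loop of sum_array_cols: i advances, j stays fixed. */
size_t col_scan_stride(void)
{
    return elem_offset(1, 0) - elem_offset(0, 0);
}
```

Under this layout the row-wise scan moves one element at a time (stride 1), while the column-wise scan jumps N elements on every step, so sum_array_rows should show much better spatial locality than sum_array_cols once rows are longer than a cache block.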

Locality Example
- Question: Can you permute the loops so that the function scans the 3-d array a with a stride-1 reference pattern (and thus has good spatial locality)?
    int sum_array_3d(int a[M][N][N])
    {
        int i, j, k, sum = 0;

        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                for (k = 0; k < N; k++)
                    sum += a[k][i][j];
        return sum;
    }
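
One possible answer is sketched below: order the loops so that the rightmost index of a[k][i][j] changes fastest and the leftmost changes slowest. The function name `sum_array_3d_stride1` and the sizes are hypothetical, and the sketch makes one assumption beyond the slide: each loop bound is matched to the dimension it indexes (k runs over the first dimension M), so that every access stays in bounds when M and N differ.

```c
#include <assert.h>

enum { M = 2, N = 3 };   /* small, made-up dimensions */

/* Scan a[M][N][N] in the order it is laid out in memory. */
int sum_array_3d_stride1(int a[M][N][N])
{
    int i, j, k, sum = 0;

    assert(a != 0);
    for (k = 0; k < M; k++)              /* leftmost index: outermost  */
        for (i = 0; i < N; i++)
            for (j = 0; j < N; j++)      /* rightmost index: innermost */
                sum += a[k][i][j];
    return sum;
}
```

With j in the innermost loop, consecutive iterations touch adjacent elements, which is the stride-1 pattern the question asks for.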

Memory Hierarchies
- Some fundamental and enduring properties of hardware and software:
  - Fast storage technologies cost more per byte, have less capacity, and require more power (heat!).
  - The gap between CPU and main memory speed is widening.
  - Well-written programs tend to exhibit good locality.
- These fundamental properties complement each other beautifully.
- They suggest an approach for organizing memory and storage systems known as a memory hierarchy.

General Cache Concepts: Miss
- Request: 12. Data in block b is needed.
- Block b is not in cache: Miss!
- Block b is fetched from memory.
- Block b is stored in cache.
  - Placement policy: determines where b goes.
  - Replacement policy: determines which block gets evicted (victim).
[Figure: a cache holding blocks 8, 9, 14, and 3 in front of a memory of blocks 0-15. The request for block 12 misses; block 12 is fetched from memory and stored in the cache.]

General Caching Concepts: Types of Cache Misses
- Cold (compulsory) miss
  - Cold misses occur because the cache is empty.
- Conflict miss
  - Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
    - E.g. Block i at level k+1 must be placed in block (i mod 4) at level k.
  - Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
    - E.g. Referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.
- Capacity miss
  - Occurs when the set of active cache blocks (working set) is larger than the cache.
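
Cold and conflict misses can be told apart with a small simulation. The sketch assumes a cache with four block positions and the placement rule position = block number mod 4, which is one instance of the restricted placement described above; the type `dm_cache` and the function names are hypothetical.

```c
#include <assert.h>

enum { CACHE_BLOCKS = 4 };   /* assumed: four block positions at level k */

typedef struct {
    int block[CACHE_BLOCKS];   /* level k+1 block held in each position */
    int valid[CACHE_BLOCKS];   /* 0 while the position is still empty   */
} dm_cache;

void cache_init(dm_cache *c)
{
    for (int s = 0; s < CACHE_BLOCKS; s++)
        c->valid[s] = 0;
}

/* Access one block; returns 1 on a miss and 0 on a hit. */
int cache_access(dm_cache *c, int block)
{
    assert(block >= 0);
    int slot = block % CACHE_BLOCKS;        /* placement policy */
    if (c->valid[slot] && c->block[slot] == block)
        return 0;                           /* hit */
    c->block[slot] = block;                 /* miss: fetch, evicting any victim */
    c->valid[slot] = 1;
    return 1;
}

/* Replay a sequence of block references and count the misses. */
int count_misses(dm_cache *c, const int *trace, int n)
{
    int misses = 0;
    for (int t = 0; t < n; t++)
        misses += cache_access(c, trace[t]);
    return misses;
}
```

Replaying a trace that alternates between two blocks mapping to the same position (for example 0 and 8) misses on every reference even though the cache has room for both, which is a conflict miss. A trace that alternates between blocks in different positions (for example 0 and 1) pays only the two cold misses.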

Examples of Caching in the Hierarchy
Cache Type             What is Cached?        Where is it Cached?    Latency (cycles)   Managed By
Registers              4-8 bytes words        CPU core               0                  Compiler
TLB                    Address translations   On-Chip TLB            0                  Hardware
L1 cache               64-bytes block         On-Chip L1             1                  Hardware
L2 cache               64-bytes block         On/Off-Chip L2         10                 Hardware
Virtual Memory         4-KB page              Main memory            100                Hardware + OS
Buffer cache           Parts of files         Main memory            100                OS
Disk cache             Disk sectors           Disk controller        100,000            Disk firmware
Network buffer cache   Parts of files         Local disk             10,000,000         AFS/NFS client
Browser cache          Web pages              Local disk             10,000,000         Web browser
Web cache              Web pages              Remote server disks    1,000,000,000      Web proxy server

Summary
- The speed gap between CPU, memory and mass storage continues to widen.
- Well-written programs exhibit a property called locality.
- Memory hierarchies based on caching close the gap by exploiting locality.