Lecture 13 The Memory Hierarchy, Lecture notes of Geometry


Typology: Lecture notes

2022/2023

Uploaded on 02/28/2023

markzck
Lecture 13
The Memory Hierarchy
Topics
Storage technologies and trends
Locality of reference
Caching in the memory hierarchy
F13 – 2 – Datorarkitektur 2009

Random-Access Memory (RAM)
Key features
RAM is packaged as a chip.
Basic storage unit is a cell (one bit per cell).
Multiple RAM chips form a memory.
Static RAM (SRAM)
See Lecture 7B CA2 page 12.
Each cell stores a bit with a six-transistor circuit.
Retains value indefinitely, as long as it is kept powered.
Relatively insensitive to disturbances such as electrical noise.
Faster and more expensive than DRAM.
Dynamic RAM (DRAM)
Each cell stores a bit with a capacitor and a transistor.
Value must be refreshed every 10-100 ms.
Sensitive to disturbances.
Slower and cheaper than SRAM.

SRAM vs DRAM Summary
        Trans.    Access
        per bit   time     Persist?   Sensitive?   Cost   Applications
SRAM    6         1X       Yes        No           100X   Cache memories
DRAM    1         10X      No         Yes          1X     Main memories, frame buffers

Conventional DRAM Organization
d x w DRAM:
dw total bits organized as d supercells of size w bits
[Figure: a 16 x 8 DRAM chip drawn as a 4 x 4 array of supercells (rows 0-3, cols 0-3) with an internal row buffer; supercell (2,1) is highlighted. The memory controller (connected to the CPU) drives 2-bit addr lines and 8-bit data lines.]
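
The numbers in the 16 x 8 example can be checked with a short sketch. The function name `dram_geometry` and the assumption that the d supercells form a square array are illustrative and not part of the slides.

```python
import math

def dram_geometry(d: int, w: int) -> dict:
    """Describe a d x w DRAM, assuming the d supercells form a square array."""
    side = math.isqrt(d)
    if side * side != d:
        raise ValueError("this sketch assumes d is a perfect square")
    return {
        "total_bits": d * w,                           # dw total bits
        "rows": side,
        "cols": side,
        "addr_bits": max(1, (side - 1).bit_length()),  # enough bits to name one row or one column
        "data_bits": w,                                # one supercell per transfer
    }

print(dram_geometry(16, 8))
```

For d = 16 and w = 8 this gives 128 total bits, a 4 x 4 array, 2 address bits, and 8 data bits, which matches the figure.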

Reading DRAM Supercell (2,1)
Step 1(a): Row access strobe (RAS) selects row 2.
Step 1(b): Row 2 copied from DRAM array to row buffer.
[Figure: the memory controller sends RAS = 2 over the 2-bit addr lines to the 16 x 8 DRAM chip; row 2 is copied into the internal row buffer.]
Step 2(a): Column access strobe (CAS) selects column 1.
Step 2(b): Supercell (2,1) copied from buffer to data lines, and eventually back to the CPU.
[Figure: the memory controller sends CAS for column 1; supercell (2,1) moves from the internal row buffer over the 8-bit data lines to the memory controller and on to the CPU.]
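
The two-step read can be sketched as a toy model. The class `DramChip`, its method names, and the cell contents are hypothetical and chosen only to mirror the steps above.

```python
class DramChip:
    """Toy model of a DRAM chip: a grid of supercells plus one row buffer."""

    def __init__(self, rows: int, cols: int):
        # Made-up contents: supercell (r, c) holds the value (r << 4) | c.
        self.cells = [[(r << 4) | c for c in range(cols)] for r in range(rows)]
        self.row_buffer = None

    def ras(self, row: int) -> None:
        """Step 1: copy the selected row into the internal row buffer."""
        self.row_buffer = list(self.cells[row])

    def cas(self, col: int) -> int:
        """Step 2: put the selected supercell from the row buffer on the data lines."""
        if self.row_buffer is None:
            raise RuntimeError("CAS issued before RAS")
        return self.row_buffer[col]

chip = DramChip(rows=4, cols=4)
chip.ras(2)          # RAS = 2
value = chip.cas(1)  # CAS selects column 1
print(hex(value))    # 0x21
```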

Memory Modules
64 MB memory module consisting of eight 8Mx8 DRAMs
[Figure: the memory controller sends addr (row = i, col = j) to all eight chips, DRAM 0 through DRAM 7. Each chip returns its supercell (i,j), which supplies one byte of the 64-bit doubleword at main memory address A: bits 0-7, 8-15, 16-23, 24-31, 32-39, 40-47, 48-55, and 56-63. The memory controller assembles the bytes into the 64-bit doubleword.]
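
A minimal sketch of how the controller could assemble the doubleword. It assumes that chip k supplies bits 8k to 8k+7, and it models each chip as a dictionary keyed by (row, col). Both choices, along with the name `read_doubleword`, are illustrative.

```python
def read_doubleword(chips: list, row: int, col: int) -> int:
    """Assemble a 64-bit value from supercell (row, col) of eight 8-bit-wide chips."""
    if len(chips) != 8:
        raise ValueError("this sketch models a module with eight chips")
    value = 0
    for k, chip in enumerate(chips):
        byte = chip[(row, col)] & 0xFF  # each chip returns its 8-bit supercell
        value |= byte << (8 * k)        # assumed: chip k supplies bits 8k .. 8k+7
    return value

# Hypothetical contents: chip k stores the byte value k + 1 at supercell (3, 5).
chips = [{(3, 5): k + 1} for k in range(8)]
word = read_doubleword(chips, 3, 5)
print(hex(word))  # 0x807060504030201
```

The capacity also follows from the organization: each 8Mx8 chip holds 8M bytes, so eight chips hold 64 MB.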

Enhanced DRAMs
All enhanced DRAMs are built around the conventional DRAM core.
Fast page mode DRAM (FPM DRAM)
Access contents of row with [RAS, CAS, CAS, CAS, CAS] instead of [(RAS,CAS), (RAS,CAS), (RAS,CAS), (RAS,CAS)].
Extended data out DRAM (EDO DRAM)
Enhanced FPM DRAM with more closely spaced CAS signals.
Synchronous DRAM (SDRAM)
Driven with rising clock edge instead of asynchronous control signals.
Double data-rate synchronous DRAM (DDR SDRAM)
Enhancement of SDRAM that uses both clock edges as control signals.
Video RAM (VRAM)
Like FPM DRAM, but output is produced by shifting row buffer.
Dual ported (allows concurrent reads and writes).
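
The saving from fast page mode can be illustrated by counting strobes for n supercells read from one row. The function names below are hypothetical, and the count ignores timing, which the slides do not give.

```python
def strobes_conventional(n: int) -> int:
    """Each of n supercell reads needs its own (RAS, CAS) pair."""
    return 2 * n

def strobes_fpm(n: int) -> int:
    """One RAS opens the row, then one CAS per supercell in that row."""
    return 1 + n if n > 0 else 0

print(strobes_conventional(4), strobes_fpm(4))  # 8 5
```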

Memory Read Transaction (3)
CPU reads word x from the bus and copies it into register %eax.
Load operation: movl A, %eax
[Figure: CPU chip (register file with %eax, ALU, bus interface), I/O bridge, and main memory holding word x at address A; x travels over the bus into %eax.]

Memory Write Transaction (1)
CPU places address A on bus. Main memory reads it and waits for the corresponding data word to arrive.
Store operation: movl %eax, A
[Figure: register %eax holds y; address A is on the bus between the bus interface, the I/O bridge, and main memory.]

Memory Write Transaction (2)
CPU places data word y on the bus.
Store operation: movl %eax, A
[Figure: data word y is on the bus between the bus interface, the I/O bridge, and main memory.]

Memory Write Transaction (3)
Main memory reads data word y from the bus and stores it at address A.
Store operation: movl %eax, A
[Figure: main memory now holds y at address A.]
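
The three steps of the write transaction can be sketched as follows. The class `MainMemory`, its methods, and the function `store` are hypothetical names, and the bus is reduced to plain function calls.

```python
class MainMemory:
    """Toy main memory that latches an address, then a data word."""

    def __init__(self):
        self.cells = {}
        self.pending_addr = None

    def latch_address(self, addr: int) -> None:
        # Step 1: memory reads address A from the bus and waits for the data word.
        self.pending_addr = addr

    def latch_data(self, word: int) -> None:
        # Step 3: memory reads data word y from the bus and stores it at address A.
        self.cells[self.pending_addr] = word
        self.pending_addr = None

def store(memory: MainMemory, register_value: int, addr: int) -> None:
    """Model of `movl %eax, A` as bus traffic."""
    memory.latch_address(addr)   # step 1: CPU places address A on the bus
    bus_word = register_value    # step 2: CPU places data word y on the bus
    memory.latch_data(bus_word)  # step 3: memory stores y at address A

mem = MainMemory()
store(mem, register_value=0x1234, addr=0xA0)
print(mem.cells)
```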

Disk Geometry
Disks consist of platters, each with two surfaces.
Each surface consists of concentric rings called tracks.
Each track consists of sectors separated by gaps.
[Figure: a disk surface around the spindle, showing tracks, track k, sectors, and gaps.]

Disk Geometry (Multiple-Platter View)
Aligned tracks form a cylinder.
[Figure: platters 0 to 2 on a common spindle, with surfaces 0 to 5; cylinder k is formed by the aligned tracks.]

Disk Capacity
Capacity: maximum number of bits that can be stored.
Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes.
Capacity is determined by these technology factors:
Recording density (bits/in): number of bits that can be squeezed into a 1-inch segment of a track.
Track density (tracks/in): number of tracks that can be squeezed into a 1-inch radial segment.
Areal density (bits/in^2): product of recording density and track density.
Modern disks partition tracks into disjoint subsets called recording zones.
Each track in a zone has the same number of sectors, determined by the circumference of the innermost track.
Each zone has a different number of sectors/track.

Computing Disk Capacity
Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)
Example:
512 bytes/sector
300 sectors/track (on average)
20,000 tracks/surface
2 surfaces/platter
5 platters/disk
Capacity = 512 x 300 x 20000 x 2 x 5
= 30,720,000,000 bytes
= 30.72 GB
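
The capacity formula translates directly into code. The function name `disk_capacity_bytes` is an illustrative choice; the inputs are the values from the example.

```python
def disk_capacity_bytes(bytes_per_sector, avg_sectors_per_track,
                        tracks_per_surface, surfaces_per_platter,
                        platters_per_disk):
    """Multiply the five geometry factors to get the capacity in bytes."""
    return (bytes_per_sector * avg_sectors_per_track * tracks_per_surface
            * surfaces_per_platter * platters_per_disk)

capacity = disk_capacity_bytes(512, 300, 20_000, 2, 5)
print(capacity)          # 30720000000
print(capacity / 10**9)  # 30.72 (vendor gigabytes, 1 GB = 10^9 bytes)
```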

Logical Disk Blocks
Modern disks present a simpler abstract view of the complex sector geometry:
The set of available sectors is modeled as a sequence of b-sized logical blocks.
Mapping between logical blocks and actual (physical) sectors
Maintained by a hardware/firmware device called the disk controller.
Converts requests for logical blocks into (surface, track, sector) triples.
Allows controller to set aside spare cylinders for each zone.
Accounts for the difference in "formatted capacity" and "maximum capacity".
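
A real controller's mapping is internal to the device and has to deal with recording zones and spare cylinders. The sketch below ignores both and only illustrates the idea of turning a block number into a triple. The function `block_to_triple`, the one-sector-per-block assumption, and the cylinder-by-cylinder ordering are all assumptions made for illustration.

```python
def block_to_triple(block, surfaces, sectors_per_track):
    """Map a logical block number to a (surface, track, sector) triple.

    Assumes one sector per block, a fixed number of sectors per track
    (no recording zones), no spare cylinders, and blocks laid out
    cylinder by cylinder.
    """
    per_cylinder = surfaces * sectors_per_track
    track = block // per_cylinder                          # cylinder index, same track number on every surface
    surface = (block % per_cylinder) // sectors_per_track  # which surface within the cylinder
    sector = block % sectors_per_track                     # position along the track
    return surface, track, sector

print(block_to_triple(0, surfaces=10, sectors_per_track=300))     # (0, 0, 0)
print(block_to_triple(3301, surfaces=10, sectors_per_track=300))  # (1, 1, 1)
```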

I/O Bus
[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory. The I/O bridge also connects to the I/O bus, which hosts a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]

Reading a Disk Sector (1)
CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.
[Figure: CPU chip, main memory, and the I/O bus with USB controller, graphics adapter, and disk controller.]

Reading a Disk Sector (2)
Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.
[Figure: CPU chip, main memory, and the I/O bus with USB controller, graphics adapter, and disk controller.]

Reading a Disk Sector (3)
When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special "interrupt" pin on the CPU).
[Figure: CPU chip, main memory, and the I/O bus with USB controller, graphics adapter, and disk controller.]
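
The three steps of a disk read can be sketched as a small simulation. The class `DiskController`, the `write_port` method, and the use of a callback as the interrupt are hypothetical. A real transfer runs asynchronously while the CPU does other work, whereas this sketch runs the steps one after another.

```python
class DiskController:
    """Toy disk controller that serves reads via DMA and an interrupt callback."""

    def __init__(self, disk_blocks, memory, interrupt_handler):
        self.disk_blocks = disk_blocks              # logical block number -> bytes
        self.memory = memory                        # bytearray standing in for main memory
        self.interrupt_handler = interrupt_handler  # called when the transfer is done

    def write_port(self, command, block_number, dest_addr):
        """Step 1: the CPU writes a command, block number, and destination address."""
        if command != "read":
            raise ValueError("only 'read' is modeled")
        data = self.disk_blocks[block_number]
        # Step 2: DMA transfer straight into main memory, without involving the CPU.
        self.memory[dest_addr:dest_addr + len(data)] = data
        # Step 3: notify the CPU with an interrupt.
        self.interrupt_handler(block_number)

completed = []
memory = bytearray(16)
controller = DiskController({7: b"ABCD"}, memory, completed.append)
controller.write_port("read", block_number=7, dest_addr=4)
print(bytes(memory[4:8]), completed)  # b'ABCD' [7]
```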

Storage Trends
SRAM: $/MB, access (ns)
DRAM: $/MB, access (ns), typical size (MB)
Disk: $/MB, access (ms), typical size (MB)
[Tables: one per technology with the metrics listed above; the data values are missing from this copy.]

CPU Clock Rates
[Table: clock rate (MHz) and cycle time (ns) by processor, with columns including Pent and P-III; the data values are missing from this copy.]

The CPU-Memory Gap
The increasing gap between DRAM, disk, and CPU speeds.
[Figure: time in ns plotted against year for disk seek time, DRAM access time, SRAM access time, and CPU cycle time.]

Memory Hierarchies
Some fundamental and enduring properties of hardware and software:
Fast storage technologies cost more per byte and have less capacity.
The gap between CPU and main memory speed is widening.
Well-written programs tend to exhibit good locality.
These fundamental properties complement each other beautifully.
They suggest an approach for organizing memory and storage systems known as a memory hierarchy.

An Example Memory Hierarchy
L0: registers. CPU registers hold words retrieved from L1 cache.
L1: on-chip L1 cache (SRAM). L1 cache holds cache lines retrieved from the L2 cache memory.
L2: off-chip L2 cache (SRAM). L2 cache holds cache lines retrieved from main memory.
L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: remote secondary storage (distributed file systems, Web servers).
Toward L0: smaller, faster, and costlier (per byte) storage devices.
Toward L5: larger, slower, and cheaper (per byte) storage devices.

Caches
Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
Fundamental idea of a memory hierarchy:
For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
Why do memory hierarchies work?
Programs tend to access the data at level k more often than they access the data at level k+1.
Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
Net effect: A large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
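
The net effect can be illustrated with a simple two-level model that is not taken from the slides: each access costs the fast time on a hit and the slow time on a miss. The function name and the timings (1 and 100 time units) are hypothetical.

```python
def effective_access_time(hit_rate, t_fast, t_slow):
    """Average time per access when hits cost t_fast and misses cost t_slow."""
    return hit_rate * t_fast + (1.0 - hit_rate) * t_slow

# The higher the hit rate at level k, the closer the average gets to t_fast.
for hit_rate in (0.50, 0.90, 0.99):
    print(hit_rate, effective_access_time(hit_rate, t_fast=1.0, t_slow=100.0))
```

Under this model, the average approaches the speed of the fast level only when nearly all accesses hit there, which is why locality matters.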

Caching in a Memory Hierarchy
Level k: Smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1.
Data is copied between levels in block-sized transfer units.
Level k+1: Larger, slower, cheaper storage device at level k+1 is partitioned into blocks.

General Caching Concepts
Program needs object d, which is stored in some block b.
Cache hit
Program finds b in the cache at level k. E.g., block 14.
Cache miss
b is not at level k, so level k cache must fetch it from level k+1. E.g., block 12.
If level k cache is full, then some current block must be replaced (evicted). Which one is the "victim"?
Placement policy: where can the new block go? E.g., b mod 4.
Replacement policy: which block should be evicted? E.g., LRU.
[Figure: a level k cache with four block positions (0 to 3) serving requests for blocks stored at level k+1.]
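
A minimal sketch of hits, misses, and LRU replacement. It assumes a four-block cache in which any block may occupy any position. The class `LruCache` and the request trace are hypothetical.

```python
from collections import OrderedDict

class LruCache:
    """Level k cache holding up to `capacity` blocks, any block in any position."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block number -> data, least recently used first

    def access(self, block):
        """Return 'hit' or 'miss' and update the cache contents."""
        if block in self.blocks:
            self.blocks.move_to_end(block)   # mark as most recently used
            return "hit"
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used victim
        self.blocks[block] = f"data{block}"  # stand-in for fetching from level k+1
        return "miss"

cache = LruCache(capacity=4)
trace = [4, 9, 14, 3, 14, 12, 4]
print([cache.access(b) for b in trace])
# ['miss', 'miss', 'miss', 'miss', 'hit', 'miss', 'miss']
```

The request for block 12 evicts block 4, the least recently used block at that point, so the final request for block 4 misses again.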

General Caching Concepts
Types of cache misses:
Cold (compulsory) miss
Cold misses occur because the cache is empty.
Conflict miss
Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
E.g., block i at level k+1 must be placed in block (i mod 4) at level k.
Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.
Capacity miss
Occurs when the set of active cache blocks (working set) is larger than the cache.
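
The conflict-miss example can be reproduced with a short simulation of the (i mod 4) placement rule. The function name `simulate_direct_mapped` is illustrative.

```python
def simulate_direct_mapped(trace, num_positions=4):
    """Count misses when block i may only live in position i mod num_positions."""
    positions = [None] * num_positions
    misses = 0
    for block in trace:
        slot = block % num_positions
        if positions[slot] != block:
            misses += 1              # cold or conflict miss
            positions[slot] = block  # evict whatever was in that position
    return misses

print(simulate_direct_mapped([0, 8, 0, 8, 0, 8]))  # 6
print(simulate_direct_mapped([0, 1, 0, 1, 0, 1]))  # 2
```

Blocks 0 and 8 both map to position 0, so every reference misses even though three positions stay empty. Blocks 0 and 1 map to different positions, so only the two cold misses occur.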

Examples of Caching in the Hierarchy
Cache Type            What Cached           Where Cached         Managed By
Registers             4-byte word           CPU registers        Compiler
TLB                   Address translations  On-Chip TLB          Hardware
L1 cache              32-byte block         On-Chip L1           Hardware
L2 cache              32-byte block         Off-Chip L2          Hardware
Virtual Memory        4-KB page             Main memory          Hardware+OS
Buffer cache          Parts of files        Main memory          OS
Network buffer cache  Parts of files        Local disk           AFS/NFS client
Browser cache         Web pages             Local disk           Web browser
Web cache             Web pages             Remote server disks  Web proxy server
The table also has a Latency (cycles) column; its values are missing from this copy.