Lecture 13
The Memory Hierarchy
Topics
Storage technologies and trends
Locality of reference
Caching in the memory hierarchy
F13 – 2 –
Datorarkitektur 2009
Random-Access Memory (RAM)
Key features
RAM is packaged as a chip.
Basic storage unit is a cell (one bit per cell).
Multiple RAM chips form a memory.
Static RAM (SRAM)
See Lecture 7B CA2, page 12.
Each cell stores a bit with a six-transistor circuit.
Retains value indefinitely, as long as it is kept powered.
Relatively insensitive to disturbances such as electrical noise.
Faster and more expensive than DRAM.
Dynamic RAM (DRAM)
Each cell stores a bit with a capacitor and a transistor.
Value must be refreshed every 10-100 ms.
Sensitive to disturbances.
Slower and cheaper than SRAM.
SRAM vs DRAM Summary

        Tran.    Access
        per bit  time    Persist?  Sensitive?  Cost   Applications
SRAM    6        1X      Yes       No          100X   Cache memories
DRAM    1        10X     No        Yes         1X     Main memories, frame buffers
Conventional DRAM Organization
d x w DRAM:
dw total bits organized as d supercells of size w bits
[Figure: a 16 x 8 DRAM chip with a 4 x 4 array of supercells (rows x cols) and an internal row buffer; the memory controller (to CPU) drives 2 addr lines and 8 data lines]
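The d x w organization can be illustrated with a minimal Python sketch, assuming the 16 x 8 chip's 16 supercells form a 4 x 4 array and are numbered row by row with a linear index k (a hypothetical convenience, not something the chip exposes). Because the row and the column are sent one after the other over the same address pins, the pin count only has to cover the larger of the two indices, which is where the 2-bit addr lines come from.

```python
import math

def supercell_coords(k, rows=4, cols=4):
    """Map a hypothetical linear supercell index k (numbered row by row)
    to its (row, col) position in a rows x cols array."""
    if not 0 <= k < rows * cols:
        raise ValueError("supercell index out of range")
    return divmod(k, cols)

def addr_pins(rows=4, cols=4):
    """Address pins needed when row and column are sent one after the other
    over the same pins: enough bits for the larger of the two indices."""
    return max(math.ceil(math.log2(rows)), math.ceil(math.log2(cols)))

d, w = 16, 8
print(d * w)                # 128 total bits
print(supercell_coords(9))  # (2, 1)
print(addr_pins())          # 2
```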
Reading DRAM Supercell (2,1)
Step 1(a): Row access strobe (RAS) selects row 2.
Step 1(b): Row 2 copied from DRAM array to row buffer.
[Figure: the memory controller sends RAS = 2 over the 2-bit addr lines to the 16 x 8 DRAM chip; row 2 is copied into the internal row buffer]
Reading DRAM Supercell (2,1)
Step 2(a): Column access strobe (CAS) selects column 1.
Step 2(b): Supercell (2,1) copied from buffer to data lines, and eventually back to the CPU.
[Figure: the memory controller sends CAS = 1 over the addr lines; supercell (2,1) travels from the internal row buffer over the 8-bit data lines to the CPU]
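A toy Python model of the two-step read, assuming a 4 x 4 array of 8-bit supercells; the class and method names (`DRAMChip`, `ras`, `cas`, `read_supercell`) are hypothetical, and the model ignores timing and refresh. It shows why the row buffer matters: once `ras(2)` has run, any column of row 2 can be selected with a CAS alone.

```python
class DRAMChip:
    """Toy model of a conventional d x w DRAM: a rows x cols array of
    w-bit supercells plus one internal row buffer."""

    def __init__(self, rows=4, cols=4, w=8):
        self.mask = (1 << w) - 1
        self.array = [[0] * cols for _ in range(rows)]
        self.row_buffer = None

    def ras(self, i):
        # Steps 1(a)/(b): RAS selects row i; the whole row is copied to the row buffer.
        self.row_buffer = list(self.array[i])

    def cas(self, j):
        # Steps 2(a)/(b): CAS selects column j; that supercell goes to the data lines.
        if self.row_buffer is None:
            raise RuntimeError("CAS issued before RAS")
        return self.row_buffer[j] & self.mask

def read_supercell(chip, i, j):
    """What the memory controller does to read supercell (i, j)."""
    chip.ras(i)
    return chip.cas(j)

chip = DRAMChip()
chip.array[2][1] = 0xAB                 # hypothetical contents of supercell (2,1)
print(hex(read_supercell(chip, 2, 1)))  # 0xab
```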
Memory Modules
[Figure: a 64 MB memory module consisting of eight 8M x 8 DRAMs (DRAM 0 to DRAM 7). The memory controller sends addr (row = i, col = j) to every chip, and each chip returns its supercell (i,j). DRAM 0 supplies bits 0-7, DRAM 1 bits 8-15, and so on up to DRAM 7 with bits 56-63; together they form the 64-bit doubleword at main memory address A.]
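How the module builds one doubleword can be sketched in Python. The sketch assumes DRAM k returns the byte for bits 8k to 8k+7, as in the figure; the function names and the sample byte values are hypothetical.

```python
def assemble_doubleword(chip_bytes):
    """Combine the 8-bit supercells returned by DRAM 0..DRAM 7 into one
    64-bit doubleword: DRAM k supplies bits 8k..8k+7."""
    if len(chip_bytes) != 8:
        raise ValueError("expected one byte from each of the 8 DRAMs")
    word = 0
    for k, byte in enumerate(chip_bytes):
        word |= (byte & 0xFF) << (8 * k)
    return word

def split_doubleword(word):
    """Inverse: the byte each DRAM stores when `word` is written."""
    return [(word >> (8 * k)) & 0xFF for k in range(8)]

# Hypothetical supercell (i, j) contents, DRAM 0 first.
parts = [0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF]
print(hex(assemble_doubleword(parts)))  # 0xefcdab8967452301
```

The capacity checks out the same way: eight chips of 8M one-byte supercells each give 8 x 8M bytes = 64 MB.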
Enhanced DRAMs
All enhanced DRAMs are built around the conventional DRAM core.
Fast page mode DRAM (FPM DRAM)
Access contents of row with [RAS, CAS, CAS, CAS, CAS] instead of [(RAS,CAS), (RAS,CAS), (RAS,CAS), (RAS,CAS)].
Extended data out DRAM (EDO DRAM)
Enhanced FPM DRAM with more closely spaced CAS signals.
Synchronous DRAM (SDRAM)
Driven by the rising clock edge instead of asynchronous control signals.
Double data-rate synchronous DRAM (DDR SDRAM)
Enhancement of SDRAM that uses both clock edges as control signals.
Video RAM (VRAM)
Like FPM DRAM, but output is produced by shifting the row buffer.
Dual ported (allows concurrent reads and writes).
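Counting the control strobes needed to read four supercells from the same row makes the fast page mode saving concrete. This Python sketch only counts signals and says nothing about timing; the function names are hypothetical.

```python
def strobes_conventional(n):
    """n supercells from one row: a full (RAS, CAS) pair per access."""
    return ["RAS", "CAS"] * n

def strobes_fpm(n):
    """Fast page mode: one RAS opens the row, then one CAS per supercell."""
    return ["RAS"] + ["CAS"] * n

print(len(strobes_conventional(4)), len(strobes_fpm(4)))  # 8 5
```

For four supercells, FPM needs 5 strobes instead of 8, because the row stays in the row buffer after the first RAS.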
Memory Read Transaction (3)
CPU reads word x from the bus and copies it into register %eax.
[Figure: word x moves from the bus interface into %eax; main memory, holding x at address A, is connected to the CPU chip through the I/O bridge]
Load operation: movl A, %eax
Memory Write Transaction (1)
CPU places address A on the bus. Main memory reads it and waits for the corresponding data word to arrive.
[Figure: register %eax holds y; address A travels from the bus interface through the I/O bridge to main memory]
Store operation: movl %eax, A
Memory Write Transaction (2)
CPU places data word y on the bus.
[Figure: data word y travels from the bus interface through the I/O bridge toward main memory]
Store operation: movl %eax, A
Memory Write Transaction (3)
Main memory reads data word y from the bus and stores it at address A.
[Figure: main memory now holds y at address A]
Store operation: movl %eax, A
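The load and store transactions can be mimicked with a toy Python model in which memory is a list of words indexed by address and the I/O bridge is folded into a single bus object (both simplifying assumptions; all names and the address value are hypothetical). The store follows the three write steps above; the load places the address on the bus and then reads back the returned word, which is its final step, Memory Read Transaction (3).

```python
class MainMemory:
    def __init__(self, size):
        self.cells = [0] * size  # one word per address (simplification)

class Bus:
    """Toy system bus between the CPU's bus interface and main memory
    (the I/O bridge is folded into this object)."""

    def __init__(self, memory):
        self.memory = memory
        self.trace = []  # records what is placed on the bus, in order

    def read(self, addr):
        # Load: CPU places addr on the bus; memory places the word at addr
        # on the bus; CPU reads it.
        self.trace.append(("addr", addr))
        word = self.memory.cells[addr]
        self.trace.append(("data", word))
        return word

    def write(self, addr, word):
        self.trace.append(("addr", addr))  # (1) CPU places address A on the bus
        self.trace.append(("data", word))  # (2) CPU places data word y on the bus
        self.memory.cells[addr] = word     # (3) memory stores y at address A

registers = {"%eax": 0}
bus = Bus(MainMemory(16))
A = 5                                # hypothetical address

registers["%eax"] = 42               # y
bus.write(A, registers["%eax"])      # movl %eax, A
registers["%eax"] = 0                # clear the register so the load visibly restores it
registers["%eax"] = bus.read(A)      # movl A, %eax
print(registers["%eax"], bus.trace)
```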
Disk Geometry
Disks consist of platters, each with two surfaces.
Each surface consists of concentric rings called tracks.
Each track consists of sectors separated by gaps.
[Figure: one surface around the spindle, with its concentric tracks; track k is divided into sectors separated by gaps]
Disk Geometry (Multiple-Platter View)
Aligned tracks form a cylinder.
[Figure: platters 0-2 stacked on a spindle, giving surfaces 0-5; track k on all six surfaces together forms cylinder k]
Disk Capacity
Capacity: maximum number of bits that can be stored.
Vendors express capacity in units of gigabytes (GB), where 1 GB = 10^9 bytes.
Capacity is determined by these technology factors:
Recording density (bits/in): number of bits that can be squeezed into a 1-inch segment of a track.
Track density (tracks/in): number of tracks that can be squeezed into a 1-inch radial segment.
Areal density (bits/in²): product of recording and track density.
Modern disks partition tracks into disjoint subsets called recording zones.
Each track in a zone has the same number of sectors, determined by the circumference of the innermost track.
Each zone has a different number of sectors/track.
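The two density definitions and the zone rule can be turned into a small Python calculation. The densities and radii below are hypothetical, and the sketch ignores gaps and per-sector overhead, so real sector counts would be lower; it only illustrates that a zone's sector count is fixed by the length of its innermost (shortest) track.

```python
import math

def areal_density(recording_density, track_density):
    """bits/in^2 = (bits/in along a track) x (tracks/in radially)."""
    return recording_density * track_density

def sectors_per_track(innermost_radius_in, recording_density, bytes_per_sector=512):
    """Sectors that fit on every track of a zone, sized by the zone's innermost
    track. Ignores gaps and per-sector overhead (simplifying assumption)."""
    bits_on_innermost = 2 * math.pi * innermost_radius_in * recording_density
    return int(bits_on_innermost // (bytes_per_sector * 8))

# Hypothetical densities: 100,000 bits/in and 10,000 tracks/in.
print(areal_density(100_000, 10_000))     # 1000000000 bits/in^2
print(sectors_per_track(1.0, 100_000))    # zone starting at 1.0 in: 153
print(sectors_per_track(1.5, 100_000))    # zone starting at 1.5 in: 230
```

Outer zones start at a larger radius, so they fit more sectors per track, which is why each zone has its own sectors/track value.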
Computing Disk Capacity
Capacity = (# bytes/sector) x (avg. # sectors/track) x (# tracks/surface) x (# surfaces/platter) x (# platters/disk)
Example:
512 bytes/sector
300 sectors/track (on average)
20,000 tracks/surface
2 surfaces/platter
5 platters/disk
Capacity = 512 x 300 x 20000 x 2 x 5
= 30,720,000,000 bytes
= 30.72 GB
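The capacity formula translates directly into Python. The numbers are the slide's example, and the result is converted with the vendor convention 1 GB = 10^9 bytes; the function name is hypothetical.

```python
def disk_capacity(bytes_per_sector, sectors_per_track, tracks_per_surface,
                  surfaces_per_platter, platters_per_disk):
    """Capacity in bytes; sectors_per_track is the average over all zones."""
    return (bytes_per_sector * sectors_per_track * tracks_per_surface
            * surfaces_per_platter * platters_per_disk)

cap = disk_capacity(512, 300, 20_000, 2, 5)
print(cap)           # 30720000000
print(cap / 10**9)   # 30.72 (vendor GB)
```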
Logical Disk Blocks
Modern disks present a simpler abstract view of the complex sector geometry:
The set of available sectors is modeled as a sequence of b-sized logical blocks.
Mapping between logical blocks and actual (physical) sectors:
Maintained by a hardware/firmware device called the disk controller.
Converts requests for logical blocks into (surface, track, sector) triples.
Allows the controller to set aside spare cylinders for each zone.
Accounts for the difference between “formatted capacity” and “maximum capacity”.
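A minimal Python sketch of the logical-to-physical mapping, assuming one sector per logical block, a uniform number of sectors per track (no recording zones and no spare cylinders), and an ordering that fills every surface of a cylinder before moving to the next track. Real controllers use their own, more complex mapping; `Geometry` and `block_to_chs` are hypothetical names. The geometry reuses the capacity example: 10 surfaces, 20,000 tracks per surface, 300 sectors per track.

```python
from typing import NamedTuple

class Geometry(NamedTuple):
    surfaces: int            # total recording surfaces on the disk
    tracks_per_surface: int
    sectors_per_track: int   # assumed uniform: no zones, no spare cylinders

def block_to_chs(block, g):
    """Map logical block number -> (surface, track, sector), filling all
    surfaces of a cylinder before moving on to the next track."""
    per_cylinder = g.surfaces * g.sectors_per_track
    if not 0 <= block < per_cylinder * g.tracks_per_surface:
        raise ValueError("block number out of range")
    track, rest = divmod(block, per_cylinder)
    surface, sector = divmod(rest, g.sectors_per_track)
    return surface, track, sector

g = Geometry(surfaces=10, tracks_per_surface=20_000, sectors_per_track=300)
print(block_to_chs(0, g))      # (0, 0, 0)
print(block_to_chs(3_001, g))  # (0, 1, 1)
```

With 512-byte blocks this disk has 60,000,000 logical blocks, matching the 30.72 GB capacity example.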
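I/O Bus
[Figure: the CPU chip (register file, ALU, bus interface) connects over the system bus to the I/O bridge, which connects over the memory bus to main memory and to the I/O bus. Attached to the I/O bus: a USB controller (mouse, keyboard), a graphics adapter (monitor), a disk controller (disk), and expansion slots for other devices such as network adapters.]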
Reading a Disk Sector (1)
CPU initiates a disk read by writing a command, logical block number, and destination memory address to a port (address) associated with the disk controller.
[Figure: CPU chip, main memory, and the devices on the I/O bus; the CPU writes to the disk controller's port]
Reading a Disk Sector (2)
Disk controller reads the sector and performs a direct memory access (DMA) transfer into main memory.
[Figure: CPU chip, main memory, and the devices on the I/O bus; data flows from the disk through the disk controller into main memory]
Reading a Disk Sector (3)
When the DMA transfer completes, the disk controller notifies the CPU with an interrupt (i.e., asserts a special “interrupt” pin on the CPU).
[Figure: CPU chip, main memory, and the devices on the I/O bus; the disk controller signals the CPU chip]
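The three steps of a disk read can be sketched as a toy Python controller. The names are hypothetical, the "port" is a method call, and the interrupt is a callback. In a real system the steps are separated in time and the CPU runs other work while the controller performs the DMA transfer, which this sequential sketch does not show.

```python
class DiskController:
    """Toy controller: a port write starts a read; the controller copies the
    sector into memory itself (DMA) and then raises an interrupt."""

    def __init__(self, disk_blocks, memory, interrupt_handler):
        self.disk_blocks = disk_blocks      # logical block -> bytes
        self.memory = memory                # main memory as a bytearray
        self.interrupt = interrupt_handler  # stands in for the interrupt pin

    def write_port(self, command, block, dest_addr):
        # (1) CPU writes command, logical block number, destination address.
        if command != "READ":
            raise ValueError("only READ is modeled")
        data = self.disk_blocks[block]
        # (2) DMA: the controller copies the data into main memory;
        #     the CPU does not move the bytes itself.
        self.memory[dest_addr:dest_addr + len(data)] = data
        # (3) Transfer done: notify the CPU.
        self.interrupt(block)

memory = bytearray(1024)
done = []  # blocks reported complete by the "interrupt"
ctrl = DiskController({7: b"hello sector"}, memory, done.append)
ctrl.write_port("READ", 7, dest_addr=100)
print(done, bytes(memory[100:112]))
```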
Storage Trends
[Tables: SRAM, DRAM, and disk trends over time for $/MB, access time (ns for SRAM and DRAM, ms for disk), and typical size (MB, for DRAM and disk)]
CPU Clock Rates
[Table: clock rate (MHz) and cycle time (ns) for successive processors, including the Pentium and Pentium III (P-III)]
The CPU-Memory Gap
The increasing gap between DRAM, disk, and CPU speeds.
[Figure: time (ns) vs. year for disk seek time, DRAM access time, SRAM access time, and CPU cycle time]
Memory Hierarchies
Some fundamental and enduring properties of hardware and software:
Fast storage technologies cost more per byte and have less capacity.
The gap between CPU and main memory speed is widening.
Well-written programs tend to exhibit good locality.
These fundamental properties complement each other beautifully.
They suggest an approach for organizing memory and storage systems known as a memory hierarchy.
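Good spatial locality can be made concrete by counting how often consecutive accesses leave the current block. This Python sketch assumes a row-major array (as in C), 4-byte elements, and 32-byte blocks; the function names are hypothetical, and the count is only a rough proxy for cache misses.

```python
def block_switches(addrs, block_size=32):
    """Count accesses that touch a different block than the access just
    before them (fewer switches means better spatial locality)."""
    switches, prev = 0, None
    for a in addrs:
        blk = a // block_size
        if blk != prev:
            switches += 1
            prev = blk
    return switches

def traversal_addresses(rows, cols, elem_size=4, row_wise=True):
    """Byte offsets visited when traversing a row-major rows x cols array."""
    if row_wise:
        return [(i * cols + j) * elem_size for i in range(rows) for j in range(cols)]
    return [(i * cols + j) * elem_size for j in range(cols) for i in range(rows)]

print(block_switches(traversal_addresses(4, 8)))                  # 4
print(block_switches(traversal_addresses(4, 8, row_wise=False)))  # 32
```

Row-by-row traversal (stride 1) stays inside each 32-byte row for 8 accesses; column-by-column traversal jumps a whole row each time and switches blocks on every access.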
An Example Memory Hierarchy
Smaller, faster, and costlier (per byte) storage devices at the top; larger, slower, and cheaper (per byte) storage devices toward the bottom.
L0: registers. CPU registers hold words retrieved from the L1 cache.
L1: on-chip L1 cache (SRAM). The L1 cache holds cache lines retrieved from the L2 cache memory.
L2: off-chip L2 cache (SRAM). The L2 cache holds cache lines retrieved from main memory.
L3: main memory (DRAM). Main memory holds disk blocks retrieved from local disks.
L4: local secondary storage (local disks). Local disks hold files retrieved from disks on remote network servers.
L5: remote secondary storage (distributed file systems, Web servers).
Caches
Cache: A smaller, faster storage device that acts as a staging area for a subset of the data in a larger, slower device.
Fundamental idea of a memory hierarchy:
For each k, the faster, smaller device at level k serves as a cache for the larger, slower device at level k+1.
Why do memory hierarchies work?
Programs tend to access the data at level k more often than they access the data at level k+1.
Thus, the storage at level k+1 can be slower, and thus larger and cheaper per bit.
Net effect: A large pool of memory that costs as much as the cheap storage near the bottom, but that serves data to programs at the rate of the fast storage near the top.
Caching in a Memory Hierarchy
Larger, slower, cheaper storage device at level k+1 is partitioned into blocks.
Data is copied between levels in block-sized transfer units.
Smaller, faster, more expensive device at level k caches a subset of the blocks from level k+1.
[Figure: level k holding a subset of the blocks of level k+1]
General Caching Concepts
Program needs object d, which is stored in some block b.
Cache hit
Program finds b in the cache at level k. E.g., block 14.
Cache miss
b is not at level k, so the level k cache must fetch it from level k+1. E.g., block 12.
If the level k cache is full, then some current block must be replaced (evicted). Which one is the “victim”?
Placement policy: where can the new block go? E.g., b mod 4
Replacement policy: which block should be evicted? E.g., LRU
[Figure: requests for blocks 14 and 12 arriving at level k (block positions 0 to 3), which is backed by level k+1]
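With placement b mod 4 a block has exactly one possible slot, so the victim is forced; a replacement policy such as LRU matters when a block may go in more than one place. Below is a minimal Python sketch of a fully associative 4-block cache with LRU replacement, using `collections.OrderedDict` to keep blocks in recency order; the function name and the request sequence are hypothetical.

```python
from collections import OrderedDict

def simulate_lru(requests, capacity=4):
    """Fully associative cache of `capacity` blocks with LRU replacement.
    Returns (block, 'hit' or 'miss', evicted block or None) per request."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    log = []
    for b in requests:
        if b in cache:
            cache.move_to_end(b)        # b becomes the most recently used
            log.append((b, "hit", None))
        else:
            victim = None
            if len(cache) == capacity:  # full: evict the least recently used
                victim, _ = cache.popitem(last=False)
            cache[b] = True
            log.append((b, "miss", victim))
    return log

# Hypothetical trace: fill the cache with 8, 9, 14, 3, then request 14 and 12.
for entry in simulate_lru([8, 9, 14, 3, 14, 12]):
    print(entry)
```

After the hit on 14, block 8 is the least recently used, so the miss on 12 evicts it.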
General Caching Concepts
Types of cache misses:
Cold (compulsory) miss
Cold misses occur because the cache is empty.
Conflict miss
Most caches limit blocks at level k+1 to a small subset (sometimes a singleton) of the block positions at level k.
E.g., block i at level k+1 must be placed in block (i mod 4) at level k.
Conflict misses occur when the level k cache is large enough, but multiple data objects all map to the same level k block.
E.g., referencing blocks 0, 8, 0, 8, 0, 8, ... would miss every time.
Capacity miss
Occurs when the set of active cache blocks (working set) is larger than the cache.
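The three miss types can be told apart by simulation. This Python sketch models a 4-block cache with placement b mod 4 and uses a common classification: a miss is cold if the block was never referenced before; otherwise it is a capacity miss if at least as many distinct blocks were referenced since the block's previous use as the cache can hold (so even a fully associative LRU cache of the same size would have missed), and a conflict miss if not. The function name is hypothetical.

```python
def classify_misses(requests, num_slots=4):
    """Simulate a cache with placement b mod num_slots and label each
    reference 'hit', 'cold', 'conflict', or 'capacity'."""
    slots = [None] * num_slots  # block b may only live in slot b % num_slots
    last_seen = {}              # block -> index of its most recent reference
    out = []
    for t, b in enumerate(requests):
        i = b % num_slots
        if slots[i] == b:
            kind = "hit"
        elif b not in last_seen:
            kind = "cold"
        else:
            # Distinct blocks referenced since the previous reference to b.
            distance = len(set(requests[last_seen[b] + 1:t]))
            kind = "capacity" if distance >= num_slots else "conflict"
        if kind != "hit":
            slots[i] = b        # the previous occupant of slot i is evicted
        last_seen[b] = t
        out.append(kind)
    return out

print(classify_misses([0, 8, 0, 8]))        # ['cold', 'cold', 'conflict', 'conflict']
print(classify_misses([0, 1, 2, 3, 4, 0]))  # five 'cold', then 'capacity'
```

In the first trace the cache is large enough for both blocks, but 0 and 8 share slot 0; in the second, five distinct blocks compete for four slots.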
Examples of Caching in the Hierarchy

Cache Type            What Cached           Where Cached          Managed By
Registers             4-byte word           CPU registers         Compiler
TLB                   Address translations  On-chip TLB           Hardware
L1 cache              32-byte block         On-chip L1            Hardware
L2 cache              32-byte block         Off-chip L2           Hardware
Virtual memory        4-KB page             Main memory           Hardware+OS
Buffer cache          Parts of files        Main memory           OS
Network buffer cache  Parts of files        Local disk            AFS/NFS client
Browser cache         Web pages             Local disk            Web browser
Web cache             Web pages             Remote server disks   Web proxy server