



An in-depth examination of memory systems in computer architecture, focusing on DRAM and SRAM technologies, their organization, timing parameters, and performance. It also covers the memory hierarchy and the impact of memory performance on overall processor performance.
Typology: Study notes




Adapted from D.A.Patterson,
UCB
[Figure: Processor (Control, Datapath), Memory, Input, Output]
DRAM by generation (Year column values not preserved):
Size      Cycle Time
64 Kb     250 ns
256 Kb    220 ns
1 Mb      190 ns
4 Mb      165 ns
16 Mb     145 ns
64 Mb     120 ns
1 Gb      35 ns
[Figure: CPU versus DRAM]
Time of a full cache miss in instructions executed:
1st Alpha (7000): 340 ns / 5.0 ns = 68 clks x 2, or 136 instructions
2nd Alpha (8400): 266 ns / 3.3 ns = 80 clks x 4, or 320 instructions
3rd Alpha (t.b.d.): 180 ns / 1.7 ns = 108 clks x 6, or 648 instructions
1/2X latency x 3X clock rate x 3X Instr/clock
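The arithmetic behind these figures can be sketched as below: divide the miss latency by the cycle time to get clocks, then multiply by the instructions issued per clock. The function name is made up for this illustration, and the usage line uses the first Alpha's numbers. The later generations are rounded on the slide, so recomputing them may not match to the digit.

```python
def miss_cost_in_instructions(miss_latency_ns, cycle_time_ns, instructions_per_clock):
    """Return (clocks, instructions) lost while one full cache miss is serviced."""
    clocks = miss_latency_ns / cycle_time_ns
    instructions = clocks * instructions_per_clock
    return clocks, instructions


# 1st Alpha (7000): 340 ns miss, 5.0 ns cycle, 2 instructions per clock.
clocks, instructions = miss_cost_in_instructions(340, 5.0, 2)
print(round(clocks), round(instructions))
```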
Clock Rate = 200 MHz (5 ns per cycle)
Instruction mix: 50% arith/logic, 30% ld/st, 20% control
CPI breakdown:
Ideal CPI: 1.1 (35%)
Data Miss: 1.6 (49%)
Inst Miss: 0.5 (16%)
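As a rough check of how the three components combine, the sketch below sums them into a total CPI and computes each component's share. The dictionary keys and function name are made up for this illustration. The percentages printed on the chart are rounded, so the computed shares may differ from them by about a point.

```python
# CPI components as read off the chart, in cycles per instruction.
cpi_components = {"ideal": 1.1, "data_miss": 1.6, "inst_miss": 0.5}


def cpi_breakdown(components):
    """Return the total CPI and each component's fraction of that total."""
    total = sum(components.values())
    shares = {name: value / total for name, value in components.items()}
    return total, shares


total_cpi, shares = cpi_breakdown(cpi_components)
cycle_ns = 5.0  # 200 MHz clock
time_per_instruction_ns = total_cpi * cycle_ns

print(f"total CPI = {total_cpi:.1f}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```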
Hierarchy
Parallelism
[Figure: Processor (Control, Datapath) followed by a chain of Memory levels. Moving away from the processor, Speed goes from Fastest to Slowest, Size from Smallest to Biggest, and Cost from Highest to Lowest.]
Programs access a relatively small portion of the address space at any instant of time.
[Figure: probability of reference plotted over the address space, up to 2^n - 1]
=> Keep most recently accessed data items closer to the processor
=> Move blocks consisting of contiguous words to the upper levels
[Figure: Upper Level Memory and Lower Level Memory with blocks Blk X and Blk Y; data moves to and from the processor through the upper level]
Hit Rate: the fraction of memory accesses found in the upper level
Hit Time: time to access the upper level, which consists of
RAM access time + time to determine hit/miss
Miss Rate = 1 - (Hit Rate)
Miss Penalty: time to replace a block in the upper level +
time to deliver the block to the processor
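These terms are commonly combined into an average access time estimate: hit time plus miss rate times miss penalty. The sketch below is a minimal illustration of that standard formula. The function name and the numbers in the usage line are made up, not taken from the slides.

```python
def average_access_time(hit_time, hit_rate, miss_penalty):
    """Average access time = hit time + miss rate * miss penalty."""
    miss_rate = 1.0 - hit_rate
    return hit_time + miss_rate * miss_penalty


# Hypothetical values: 1-cycle hit, 95% hit rate, 50-cycle miss penalty.
amat_cycles = average_access_time(hit_time=1, hit_rate=0.95, miss_penalty=50)
print(f"{amat_cycles:.2f} cycles")
```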
Present the user with as much memory as is available in the cheapest technology.
Provide access at the speed offered by the fastest technology.
[Figure: levels of the memory hierarchy, from the processor outward: Processor (Control, Datapath, Registers, On-Chip Cache), Second Level Cache (SRAM), Main Memory (DRAM), Secondary Storage (Disk), Tertiary Storage (Disk)]
Speed (ns): 1s, 10s, 100s, 10,000,000s (10s ms), 10,000,000,000s (10s sec)
Size (bytes): 100s, Ks, Ms, Gs, Ts
A new control signal, output enable (OE_L), is needed
WE_L is asserted (Low), OE_L is deasserted (High):
  D serves as the data input pin
WE_L is deasserted (High), OE_L is asserted (Low):
  D is the data output pin
Both WE_L and OE_L are asserted:
  Result is unknown. Don't do that!
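The rules for the shared D pin can be sketched as a toy model. This is an illustration only: the class and method names are made up, the both-asserted case is modelled as an error (the slide only says the result is unknown), and treating D as undriven when neither signal is asserted is an assumption of this sketch.

```python
class Sram:
    """Toy SRAM with a shared data pin D and active-low control signals."""

    def __init__(self, n_words):
        self.cells = [0] * n_words

    def access(self, address, we_l, oe_l, d=None):
        write = we_l == 0  # WE_L asserted (Low)
        read = oe_l == 0   # OE_L asserted (Low)
        if write and read:
            # The real result is unknown; this sketch refuses the access.
            raise ValueError("WE_L and OE_L both asserted")
        if write:
            self.cells[address] = d  # D serves as the data input pin
            return None
        if read:
            return self.cells[address]  # D is the data output pin
        return None  # assumption: D is left undriven when neither is asserted


sram = Sram(n_words=4)
sram.access(2, we_l=0, oe_l=1, d=0xAB)   # write 0xAB to word 2
print(hex(sram.access(2, we_l=1, oe_l=0)))  # read it back
```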
[Figure: N words x M bit SRAM block, with lines labeled N and M]
[Figure: SRAM timing diagrams for an N words x M bit SRAM. Write Timing: Write Address, Data In, Write Setup Time, Write Hold Time. Read Timing: Read Address, High Z, Junk, Data Out, Read Access Time.]
[Figure: SRAM cell with Select = 1 and bit lines bit = 1 and bit = 0; transistors marked On or Off]
Write:
1. Drive bit line
2. Select row
Read:
1. Precharge bit line to Vdd
2. Select row
3. Cell and bit line share charges
   Very small voltage changes on the bit line
4. Sense (fancy sense amp)
   Can detect changes of ~1 million electrons
5. Write: restore the value
Refresh:
1. Just do a dummy read to every cell
[Figure: DRAM cell with row select and bit lines]
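The idea that a read ends by restoring the value, and that refresh is just a dummy read of every cell, can be sketched with a toy model. Everything here is a simplifying assumption for illustration: charge is a number between 0 and 1, the sense threshold is 0.5, and `leak` is a made-up helper that stands in for charge leakage over time.

```python
class DramCell:
    """Toy one-transistor DRAM cell storing a bit as a charge level."""

    def __init__(self):
        self.charge = 0.0

    def write(self, bit):
        self.charge = 1.0 if bit else 0.0

    def leak(self, fraction):
        # Hypothetical helper: lose a fraction of the stored charge.
        self.charge *= 1.0 - fraction

    def read(self):
        bit = 1 if self.charge > 0.5 else 0  # sense (assumed threshold)
        self.write(bit)                      # write: restore the value
        return bit


def refresh(cells):
    """Refresh by doing a dummy read on every cell."""
    for cell in cells:
        cell.read()


cells = [DramCell() for _ in range(2)]
cells[0].write(1)
cells[0].leak(0.4)   # charge has decayed but is still above the threshold
refresh(cells)       # the dummy read restores full charge
print(cells[0].charge, cells[1].charge)
```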
[Figure: DRAM organization. The row address feeds a row decoder that drives the word (row) select lines of the RAM cell array; the bit (data) lines feed the Column Selector & I/O Circuits, which take the column address and produce data.]
Each intersection represents a 1-T DRAM cell
Select 1 bit at a time
[Figure: 256K x 8 DRAM with 9 address pins (A) and 8 data pins (D)]
WE_L is asserted (Low), OE_L is deasserted (High):
  D serves as the data input pin
WE_L is deasserted (High), OE_L is asserted (Low):
  D is the data output pin
RAS_L goes low: Pins A are latched in as row address
CAS_L goes low: Pins A are latched in as column address
RAS/CAS edge-sensitive
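Because the same A pins carry the row address and then the column address, a controller has to split each word address in two. The sketch below assumes the 256K x 8 part with 9 address pins, so an 18-bit word address becomes a 9-bit row and a 9-bit column. Putting the row in the high-order bits is an assumption of this sketch, and the function name is made up.

```python
ADDRESS_PINS = 9  # 256K words = 2**18, sent as two 9-bit halves


def split_address(word_address):
    """Split a word address into (row, column) for RAS_L / CAS_L latching."""
    if not 0 <= word_address < 1 << (2 * ADDRESS_PINS):
        raise ValueError("address out of range for a 256K-word DRAM")
    row = word_address >> ADDRESS_PINS                  # latched when RAS_L goes low
    column = word_address & ((1 << ADDRESS_PINS) - 1)   # latched when CAS_L goes low
    return row, column


print(split_address(513))
```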
t_RAC: quoted as the speed of a DRAM
  A fast 4 Mb DRAM has t_RAC = 60 ns
t_RC = 110 ns for a 4 Mbit DRAM with a t_RAC of 60 ns
t_CAC = 15 ns for a 4 Mbit DRAM with a t_RAC of 60 ns
t_PC = 35 ns for a 4 Mbit DRAM with a t_RAC of 60 ns
Perform a row access only every 110 ns (t_RC)
Perform column access (t_CAC) in 15 ns, but time between column accesses is at least 35 ns (t_PC)
  In practice, external address delays and turning around buses make it 40 to 50 ns
Drive parallel DRAMs, external memory controller, bus to turn around, SIMM module, pins…
180 ns to 250 ns latency from processor to memory is good for a "60 ns" (t_RAC) DRAM
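To see why the gap between row timing and column timing matters, the sketch below compares reading several words with a full row access each time against reading them from one open row with repeated column accesses. It is a deliberately simplified model using only the chip-level figures quoted above (60 ns access, 110 ns row cycle, 35 ns between column accesses). It ignores the external delays just mentioned, and the function names are made up.

```python
T_RAC_NS = 60   # row access time
T_RC_NS = 110   # minimum time between row accesses
T_PC_NS = 35    # minimum time between column accesses


def row_by_row_ns(n_words):
    """Each word needs its own row access; the last one completes after t_RAC."""
    return (n_words - 1) * T_RC_NS + T_RAC_NS


def same_row_ns(n_words):
    """One row access, then further column accesses spaced t_PC apart."""
    return T_RAC_NS + (n_words - 1) * T_PC_NS


print(row_by_row_ns(4), same_row_ns(4))
```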
CPU, Cache, Bus, Memory same width (32 bits)
CPU, Cache, Bus 1 word: Memory N Modules (4 Modules); example is word interleaved
CPU/Mux 1 word; Mux/Cache, Bus, Memory N words (Alpha: 64 bits & 256 bits)
Cycle Time to Access Time is about 2:1; why?
Cycle Time: How frequently can you initiate an access?
  Analogy: A little kid can only ask his father for money on Saturday
Access Time: How quickly will you get what you want once you initiate an access?
  Analogy: As soon as he asks, his father will give him the money
What happens if he runs out of money on Wednesday?
[Figure: timeline marking Access Time and Cycle Time]
Access Pattern without Interleaving:
[Figure: a single Memory; timeline: Start Access for D1, D1 available, then the start of the next access]
Access Pattern with 4-way Interleaving:
[Figure: Memory Bank 0, Memory Bank 1, Memory Bank 2, Memory Bank 3; timeline: Access Bank 0, Access Bank 1, Access Bank 2, Access Bank 3, then we can access Bank 0 again]
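The two access patterns can be compared with a small scheduling sketch. It assumes word interleaving (bank = word address modulo the number of banks), a bank that stays busy for one full cycle time after an access starts, and at most one new access issued per time unit. The time units and the 4-unit cycle time are made up for illustration.

```python
def start_times(word_addresses, n_banks, cycle_time, issue_interval=1):
    """Return the time at which each access can start, in arbitrary units."""
    bank_free_at = [0] * n_banks
    now = 0
    starts = []
    for address in word_addresses:
        bank = address % n_banks            # word interleaving
        start = max(now, bank_free_at[bank])
        starts.append(start)
        bank_free_at[bank] = start + cycle_time  # bank busy for a full cycle
        now = start + issue_interval             # next access issues no earlier
    return starts


sequential = list(range(5))
print(start_times(sequential, n_banks=1, cycle_time=4))  # no interleaving
print(start_times(sequential, n_banks=4, cycle_time=4))  # 4-way interleaving
```

In this model the fifth access returns to Bank 0 just as that bank becomes free again, which is the situation the 4-way timeline describes.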
2.5X cells/area, 1.5X die size in ~3 years
DRAM only: density, leakage v. speed
SIMM or DIMM is replaceable unit => computers use any generation DRAM
Little organization innovation in 20 years
page mode, EDO, Synch DRAM
RAMBUS: 10X BW, +30% cost => little impact
Little organization innovation (vs. processors) in 20 years: page mode, EDO, Synch DRAM
Fewer DRAMs per computer over time
Growth bits/chip DRAM : 50%-60%/yr
Nathan Myhrvold M/S: mature software growth (33%/yr for NT) ~ growth MB/$ of DRAM (25%-30%/yr)
Starting to question buying larger DRAMs?
Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon.
Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon.
Present the user with as much memory as is available in the cheapest technology.
Provide access at the speed offered by the fastest technology.
DRAM: good choice for presenting the user with a BIG memory system
SRAM: good choice for providing the user FAST access time
2 dies per package: Proc/I$/D$ + L2$