



















































lec # 10
Fall 2017
11-14-
Mainstream Computer System Components

Double Data Rate (DDR) SDRAM. One channel = 8 bytes = 64 bits wide.

Current DDR3 SDRAM Example: PC3-12800 (DDR3-1600)
• 200 MHz (internal base chip clock)
• 8-way interleaved (8 banks)
• ~12.8 GBytes/sec peak (one 64-bit channel)
• ~25.6 GBytes/sec peak (two 64-bit channels, e.g. AMD x4, x)
• ~38.4 GBytes/sec peak (three 64-bit channels, e.g. Intel Core i)

DDR2 SDRAM Example: PC2-6400 (DDR2-800)
• 200 MHz (internal base chip clock)
• 64-128 bits wide
• 4-way interleaved (4 banks)
• ~6.4 GBytes/sec peak (one 64-bit channel)
• ~12.8 GBytes/sec peak (two 64-bit channels)

DDR SDRAM Example: PC3200 (DDR-400)
• 200 MHz (base chip clock)
• 4-way interleaved (4 banks)
• ~3.2 GBytes/sec peak (one 64-bit channel)
• ~6.4 GBytes/sec peak (two 64-bit channels)

Single Data Rate SDRAM: PC100/PC133
• 100-133 MHz (base chip clock)
• 64-128 bits wide
• 2-way interleaved (2 banks)
• ~900 MBytes/sec peak (64-bit)
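The peak figures listed for each memory type all follow from the data rate encoded in the module name multiplied by the 8-byte channel width and the number of channels. A minimal sketch, assuming the number in a name such as DDR3-1600 is the data rate in millions of transfers per second; the function name `peak_bandwidth_gb_s` is hypothetical and chosen only for illustration:

```python
BYTES_PER_CHANNEL = 8  # one channel = 8 bytes = 64 bits wide


def peak_bandwidth_gb_s(mega_transfers_per_sec, channels=1):
    """Peak bandwidth in GBytes/sec, taking 1 GByte = 10**9 bytes."""
    return mega_transfers_per_sec * BYTES_PER_CHANNEL * channels / 1000


# Data rates taken from the module names used in the slide.
for name, rate in [("DDR-400", 400), ("DDR2-800", 800), ("DDR3-1600", 1600)]:
    for channels in (1, 2, 3):
        bandwidth = peak_bandwidth_gb_s(rate, channels)
        print(f"{name}, {channels} channel(s): ~{bandwidth:.1f} GBytes/sec peak")
```

These are peak numbers only; sustained bandwidth is lower because of row activation, precharge and refresh.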
Example: PCI, 33-66 MHz, 32-64 bits wide, 133-528 MBytes/sec
PCI-X: 133 MHz, 64 bits wide, 1024 MBytes/sec
CPU Core: 2 GHz - 3.5 GHz, 4-way Superscalar (RISC or RISC-core (x86)): Dynamic scheduling, Hardware speculation, Multiple FP and integer FUs, Dynamic branch prediction …
One core or multi-core (2-8) per chip

Caches (SRAM): all non-blocking caches
• 2-8 way set associative (usually separate/split)
• 8-16 way set associative (unified)
• 16-64 way set associative (unified)

System Bus = CPU-Memory Bus = Front Side Bus (FSB). Examples:
• AMD K8: HyperTransport
• Alpha, AMD K7: EV6, 200-400 MHz
• Intel PII, PIII: GTL+, 133 MHz
• Intel P: 800 MHz

[Figure: system block diagram. Labels: Memory Bus; NorthBridge and SouthBridge (AKA System Core Logic); Off or On-chip; System Memory (DRAM); SRAM.]

I/O Subsystem: 4th Edition in Chapter 6 (3rd Edition in Chapter 7)
The Memory Hierarchy
• Reduce Miss Rate
• Reduce Cache Miss Penalty
• Reduce Cache Hit Time
• Benefits, Issues/Strategies
• Basic Virtual → Physical Address Translation: Page Tables
• Speeding Up Address Translation: Translation Lookaside Buffer (TLB)

Cache exploits access locality to:
1. Lower AMAT by hiding long main memory access latency (i.e. memory latency reduction).
2. Lower demands on main memory bandwidth.
Or maximum effective memory bandwidth
DRAM: Slow but high density
SRAM: Fast but low density
4th Edition: Chapter 5.
3rd Edition: Chapter 5.8, 5.
Logical Dynamic RAM (DRAM) Chip Organization (16 Mbit)

[Figure: logical DRAM chip organization. Labels: Row/Column Address pins A0…A14 (shared pins); Row Decoder; Word Line; Storage Cell; Memory Array (16,384 x 16,384); Sense Amps & I/O; Column Decoder; Data In (D); Data Out (Q).]

Control Signals:
1 - Row Address Strobe (RAS): Low to latch row address
2 - Column Address Strobe (CAS): Low to latch column address
3 - Write Enable (WE) or Output Enable (OE)
4 - Wait for data to be ready

Basic Steps:
1 - Supply Row Address
2 - Supply Column Address
3 - Read/Write Data

A periodic data refresh is required by reading every bit.

Now: 16 Gbit/chip
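The basic steps above can be sketched as a split of one cell address into a row part and a column part that are sent one after the other over the shared address pins. This is a minimal sketch using the 16,384 x 16,384 array size shown in the figure; the choice of upper bits for the row and lower bits for the column, and the names `split_cell_address` and `access_steps`, are assumptions made for illustration only.

```python
ROWS = 16_384  # array dimensions as labelled in the figure
COLS = 16_384
COL_BITS = (COLS - 1).bit_length()  # bits needed to select one column


def split_cell_address(cell):
    """Split a linear cell index into (row, column) for the two-step access."""
    if not 0 <= cell < ROWS * COLS:
        raise ValueError("cell index out of range")
    row = cell >> COL_BITS   # assumed: upper bits select the row
    col = cell & (COLS - 1)  # assumed: lower bits select the column
    return row, col


def access_steps(cell):
    """List the basic steps for one access, in order."""
    row, col = split_cell_address(cell)
    return [
        f"1 - RAS low: latch row address {row}",
        f"2 - CAS low: latch column address {col}",
        "3 - read/write data",
    ]


for step in access_steps(16_384 * 3 + 7):
    print(step)
```

Because the row and column addresses share the same pins, they cannot be supplied at the same time, which is why the access is split into two strobed steps.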
Simplified Asynchronous DRAM Read Timing
Source: http://arstechnica.com/paedia/r/ram_guide/ram_guide.part2-1.html

[Figure: non-burst mode memory access example, showing tRAC (memory access time), tRC (memory cycle time), tPC and the recovery time.]

tRAC: Minimum time from RAS (Row Address Strobe) line falling to the valid data output (memory access time).
tRC: Minimum time from the start of one row access to the start of the next (memory cycle time).
tCAC: Minimum time from CAS (Column Address Strobe) line falling to valid data output.
tPC: Minimum time from the start of one column access to the start of the next.

Memory Cycle Time = tRC = tRAC + RAS Precharge Time

Example: Memory Cycle Time = 200 ns
Peak Memory Bandwidth = 8 / (200 x 10^-9) = 40 x 10^6 Bytes/sec
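The peak bandwidth arithmetic for non-burst access can be checked with a short sketch. It assumes that each memory cycle delivers one 8-byte (64-bit) bus transfer, which is how the 8 in the slide's formula is read here; the function name is hypothetical.

```python
def non_burst_peak_bandwidth(bytes_per_access, cycle_time_ns):
    """Peak bandwidth in bytes/sec when every access costs a full memory cycle (tRC)."""
    return bytes_per_access / (cycle_time_ns * 1e-9)


bandwidth = non_burst_peak_bandwidth(bytes_per_access=8, cycle_time_ns=200)
print(f"Peak memory bandwidth = {bandwidth / 1e6:.0f} x 10^6 Bytes/sec")
```

In non-burst mode the full cycle time, including the recovery (precharge) time, is paid for every access, so the cycle time rather than the access time sets the bandwidth.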
Simplified DRAM Speed Parameters

Row Address Strobe (RAS) Time (similar to tRAC):
• Minimum time from RAS (Row Address Strobe) line falling (activated) to the first valid data output.
• A major component of memory latency (and cache miss penalty M).
• Only improves ~5% every year.

Column Address Strobe (CAS) Time/data transfer time (similar to tCAC):
• The minimum time required to read additional data by changing column address while keeping the same row address.
• Along with memory bus width, determines peak memory bandwidth.

Simplified SDRAM Burst-Mode Access Timing Example
For PC100 SDRAM: Clock = 100 MHz. Burst length shown = 4.

[Figure: burst-mode access timeline with marks at 40 ns, 50 ns, 60 ns, 70 ns and 80 ns. After the initial memory latency, the 1st, 2nd, 3rd and 4th 8-byte transfers follow, each taking 1/2 CAS.]
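The burst-mode timeline can be reproduced with a small sketch. It assumes one reading of the figure: a 40 ns initial memory latency, after which each 8-byte transfer occupies one 10 ns clock period (1/2 CAS at 100 MHz), so the four transfers complete at 50, 60, 70 and 80 ns. The constant and function names are hypothetical.

```python
CLOCK_MHZ = 100                 # PC100 SDRAM
CYCLE_NS = 1000 / CLOCK_MHZ     # 10 ns per clock, taken as 1/2 CAS
INITIAL_LATENCY_NS = 40         # assumed from the first mark on the timeline
BYTES_PER_TRANSFER = 8          # 64-bit memory bus


def burst_completion_times(burst_length):
    """Time in ns at which each transfer of the burst completes (assumed model)."""
    return [INITIAL_LATENCY_NS + CYCLE_NS * (i + 1) for i in range(burst_length)]


def effective_bandwidth_mb_s(burst_length):
    """Bytes delivered divided by total burst time, in 10^6 bytes/sec."""
    total_ns = burst_completion_times(burst_length)[-1]
    return burst_length * BYTES_PER_TRANSFER / total_ns * 1000


print(burst_completion_times(4))
print(f"effective: {effective_bandwidth_mb_s(4):.0f} MBytes/sec")
print(f"peak:      {BYTES_PER_TRANSFER * CLOCK_MHZ} MBytes/sec")
```

Under these assumptions a burst of 4 reaches about half of the peak rate, because the initial latency is amortized over only four transfers; longer bursts move the effective figure closer to the peak.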
Page Mode DRAM
Asynchronous DRAM: last system memory type to use non-burst access mode.

Non-burst Mode Memory Access:
1 - Supply Row Address
2 - Supply Column Address
3 - Read/Write Data

[Figure: page mode DRAM read timing, showing the memory cycle time.]
Fast Page Mode (FPM) DRAM
Burst Mode Memory Access: the row address stays constant for the entire burst access, while the column address changes.

[Figure: FPM DRAM read timing, showing the memory access time.]
Simplified Asynchronous Extended Data Out (EDO) DRAM Read Timing (early 90s)
Source: http://arstechnica.com/paedia/r/ram_guide/ram_guide.part2-1.html

[Figure: EDO DRAM read timing, showing the memory access time.]

Typical timing at 66 MHz: 5-2-2-2
For bus width = 64 bits = 8 bytes:
Max. Bandwidth = 8 x 66 / 2 = 264 Mbytes/sec
It takes 5+2+2+2 = 11 memory cycles, or 15 ns x 11 = 165 ns, to read a 32-byte cache block.
One memory cycle at 66 MHz = 1000/66 = 15 ns = 15 CPU cycles at 1 GHz.
Minimum read miss penalty for a CPU running at 1 GHz = M = 11 x 15 = 165 CPU cycles.
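The EDO arithmetic above can be written out as a short sketch. It uses the slide's rounded value of 15 ns per memory cycle and the 5-2-2-2 timing; the function names and default parameters are hypothetical.

```python
def edo_miss_penalty(timing=(5, 2, 2, 2), mem_cycle_ns=15, cpu_clock_ghz=1.0):
    """Return (memory cycles, block read time in ns, CPU cycles) for one burst."""
    memory_cycles = sum(timing)                  # 5 + 2 + 2 + 2
    block_read_ns = memory_cycles * mem_cycle_ns
    cpu_cycles = block_read_ns * cpu_clock_ghz   # 1 GHz means 1 CPU cycle per ns
    return memory_cycles, block_read_ns, cpu_cycles


def max_bandwidth_mb_s(bus_bytes=8, bus_mhz=66, cycles_per_transfer=2):
    """Peak rate inside a burst: one bus-wide transfer every 2 memory cycles."""
    return bus_bytes * bus_mhz / cycles_per_transfer


cycles, nanoseconds, penalty = edo_miss_penalty()
print(f"{cycles} memory cycles = {nanoseconds} ns = {penalty:.0f} CPU cycles")
print(f"max bandwidth = {max_bandwidth_mb_s():.0f} Mbytes/sec")
```

Each entry in the timing tuple delivers 8 bytes, so the four entries together fill one 32-byte cache block.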
Basic Memory Bandwidth Improvement/Miss Penalty (M) Latency Reduction Techniques

• Wider Main Memory (CPU-Memory Bus/Interface): Memory bus width is increased to a number of words ("usually" up to the size of a cache block).
  e.g. Doubling the width of cache and memory doubles potential memory bandwidth available to the CPU.
  e.g. 128-bit (16 bytes) memory bus instead of 64 bits (8 bytes); now 24 bytes (192 bits). (Wider FSB?)
• Interleaved (Multi-Bank) Memory: Memory is organized as a number of independent banks.
• Burst Mode Memory Access

The above three techniques can also be applied to any cache level to reduce cache hit time and increase cache bandwidth.
Four Way (Four Banks) Interleaved Memory
Example: Sequential Mapping of Memory Addresses To Memory Banks

Bank 0:  0  4  8  12  16  20  ..
Bank 1:  1  5  9  13  17  21  ..
Bank 2:  2  6  10  14  18  22  ..
Bank 3:  3  7  11  15  19  23  ..

(Cache Block? Next Cache Block?)
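The sequential mapping shown in the table is a modulo mapping: the bank is the address modulo the number of banks, and the position inside the bank is the integer quotient. A minimal sketch, assuming the addresses in the table are word addresses; the function names are hypothetical.

```python
NUM_BANKS = 4


def bank_of(address):
    """Bank that holds this address under sequential (low-order) interleaving."""
    return address % NUM_BANKS


def index_in_bank(address):
    """Position of this address inside its bank."""
    return address // NUM_BANKS


def bank_contents(num_addresses):
    """Rebuild the table: the list of addresses stored in each bank."""
    banks = [[] for _ in range(NUM_BANKS)]
    for address in range(num_addresses):
        banks[bank_of(address)].append(address)
    return banks


for bank, addresses in enumerate(bank_contents(24)):
    print(f"Bank {bank}: {addresses}")
```

With this mapping, four consecutive addresses always fall in four different banks, so a sequential run of addresses can be served by all banks in parallel.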
Memory Bank Interleaving

[Figure: access timing for one memory bank compared with four memory banks (multi-banked memory). Four banks are similar to the organization of DDR SDRAM memory chips, also DDR2; DDR3 increases the number to 8 banks, while DDR4 increases it to 16 banks. The memory bank cycle time is marked; a very long memory bank recovery time is shown here.]

Number of banks ≥ Number of cycles to access word in a bank

Bank interleaving does not reduce latency of accesses to the same bank.

Can be applied at:
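The condition that the number of banks should be at least the number of cycles needed to access a word in a bank can be illustrated with a small timing model. This is a simplified sketch under stated assumptions: words are accessed sequentially, one new access can be issued per bus cycle, and a bank stays busy for `bank_cycle` cycles per access. The function name and the model itself are hypothetical, not the slide's exact timing diagram.

```python
def sequential_access_time(num_words, num_banks, bank_cycle):
    """Cycle at which the last of num_words sequential words is delivered."""
    bank_free_at = [0] * num_banks  # cycle at which each bank can start a new access
    bus_free_at = 0                 # earliest cycle at which the next access can be issued
    finish = 0
    for word in range(num_words):
        bank = word % num_banks     # sequential (modulo) mapping of words to banks
        start = max(bank_free_at[bank], bus_free_at)
        finish = start + bank_cycle
        bank_free_at[bank] = finish
        bus_free_at = start + 1     # assumed: one access issued per bus cycle
    return finish


for banks in (1, 2, 4, 8):
    cycles = sequential_access_time(num_words=8, num_banks=banks, bank_cycle=4)
    print(f"{banks} bank(s): {cycles} cycles for 8 words")
```

In this model, adding banks beyond the bank cycle time gives no further gain for a sequential stream, and repeated accesses to the same bank still wait for the full bank cycle, matching the note that interleaving does not reduce same-bank latency.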
Synchronous Dynamic RAM (SDR SDRAM) Organization (mid 90s)
DDR SDRAM (late 90s - 2006) organization is similar but … (DDR3 increases the number of banks to 8 banks.)

SDR = Single Data Rate
DDR = Double Data Rate

[Figure: SDRAM organization, showing Address Lines and Data Lines, with a timing comparison of SDR and DDR.]

SDRAM speed is rated at max. clock speed supported:
100 MHz = PC100
133 MHz = PC133

SDR SDRAM Peak Memory Bandwidth = Bus Width / (0.5 x tCAC) = Bus Width x Clock rate
DDR SDRAM Peak Memory Bandwidth = Bus Width / (0.25 x tCAC) = Bus Width x Clock rate x 2
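As a worked check of the two peak bandwidth formulas, the two forms of each formula can be equated to relate tCAC to the clock rate f. The sketch below assumes an 8-byte bus and uses PC100 (f = 100 MHz) and PC3200/DDR-400 (f = 200 MHz) as examples; the tCAC values are derived from the formulas themselves rather than taken from a datasheet.

```latex
\begin{align*}
\text{SDR:}\quad & \frac{\text{Bus Width}}{0.5\,t_{CAC}} = \text{Bus Width} \times f
  \;\Rightarrow\; t_{CAC} = \frac{2}{f} \\
\text{PC100:}\quad & f = 100\ \text{MHz},\quad t_{CAC} = 20\ \text{ns},\quad
  \text{BW}_{peak} = \frac{8\ \text{bytes}}{0.5 \times 20\ \text{ns}}
  = 800 \times 10^{6}\ \text{bytes/sec} \\[6pt]
\text{DDR:}\quad & \frac{\text{Bus Width}}{0.25\,t_{CAC}} = \text{Bus Width} \times f \times 2
  \;\Rightarrow\; t_{CAC} = \frac{2}{f} \\
\text{DDR-400:}\quad & f = 200\ \text{MHz},\quad t_{CAC} = 10\ \text{ns},\quad
  \text{BW}_{peak} = \frac{8\ \text{bytes}}{0.25 \times 10\ \text{ns}}
  = 3.2 \times 10^{9}\ \text{bytes/sec}
\end{align*}
```

The DDR-400 result agrees with the ~3.2 GBytes/sec peak quoted for PC3200 on a single 64-bit channel.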
Comparison of Synchronous Dynamic RAM (SDRAM) Generations: DDR2 vs. DDR and SDR SDRAM
Figure Source: http://www.elpida.com/pdfs/E0678E10.pdf

Single Data Rate (SDR) SDRAM transfers data on every rising edge of the clock, whereas both DDR and DDR2 are double pumped; they transfer data on the rising and falling edges of the clock.

DDR2 vs. DDR:
• DDR2 doubles bus frequency for the same physical DRAM chip clock rate (as shown), thus doubling the effective data rate another time.
• Ability for much higher clock speeds than DDR, due to design improvements (still 4 banks per chip): DDR2's bus frequency is boosted by electrical interface improvements, on-die termination, prefetch buffers and off-chip drivers.
• However, latency vs. DDR is greatly increased as a trade-off.

Shown (internal base frequency = 133 MHz; peak bandwidth given for a single 64-bit memory channel, i.e. 8-byte memory bus width):
• DDR2-533 (PC2-4200): 4 banks, ~4.2 GB/s peak bandwidth, 4256 MB/s = 8 x 133 x 4
• DDR-266 (PC-2100): 4 banks, ~2.1 GB/s peak bandwidth, 2128 MB/s = 8 x 133 x 2
• PC133: 2 banks, ~1.05 GB/s peak bandwidth, 1064 MB/s = 8 x 133

For DDR3: The trend continues with another external frequency doubling.
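The per-generation figures can be reproduced from the shared 133 MHz internal base frequency and the number of transfers each generation makes per base clock. A minimal sketch; the factor of 8 for DDR3 is inferred from the "another external frequency doubling" remark, and the dictionary and function names are hypothetical.

```python
BUS_BYTES = 8  # single 64-bit memory channel

# Transfers per internal base clock. SDR, DDR and DDR2 follow the slide;
# the DDR3 entry assumes one more doubling relative to DDR2.
TRANSFERS_PER_BASE_CLOCK = {"SDR": 1, "DDR": 2, "DDR2": 4, "DDR3": 8}


def peak_mb_s(generation, base_mhz=133):
    """Peak bandwidth in MB/s for one 64-bit channel at the given base clock."""
    return BUS_BYTES * base_mhz * TRANSFERS_PER_BASE_CLOCK[generation]


for generation in TRANSFERS_PER_BASE_CLOCK:
    print(f"{generation}: {peak_mb_s(generation)} MB/s")
```

The same function with `base_mhz=200` gives 12800 MB/s for DDR3, which matches the PC3-12800 (DDR3-1600) rating at a 200 MHz internal base clock.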