


















An in-depth exploration of memory hierarchy and cache systems, covering topics such as memory organization, cache associativity, placement and identification, cache performance, and prefetching. Learn about the different types of caches, their organization, and their role in improving system performance.
Typology: Study notes



















Cache Organization
Cache Associativity
[Figure: indexed and associative lookup structures, built from index/key, tag, data, and decoder blocks]
“Indexed Memory” (“Direct Mapped”): i-bit index, 2^i blocks
“Associative Memory” (“Fully Associative”, “CAM”): no index, unlimited blocks
“N-Way Set-Associative”: i-bit index, 2^i • N blocks
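The block counts for the organizations above can be checked with a short sketch. It also shows one common way to split an address into tag, index, and block offset; the function names and the `offset_bits` parameter are illustrative assumptions, not something taken from the slides.

```python
def cache_blocks(index_bits: int, ways: int) -> int:
    """Blocks in a cache with 2**index_bits sets of `ways` blocks each.

    ways == 1 corresponds to the direct-mapped case (2**i blocks).
    """
    return (1 << index_bits) * ways


def split_address(addr: int, index_bits: int, offset_bits: int) -> tuple[int, int, int]:
    """Split an address into (tag, index, block offset).

    Assumed layout, high bits to low bits: tag | index | offset.
    """
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    print(cache_blocks(index_bits=8, ways=1))   # direct mapped
    print(cache_blocks(index_bits=8, ways=4))   # 4-way set-associative
    print(split_address(0x12345, index_bits=8, offset_bits=4))
```

With an 8-bit index, going from direct mapped to 4-way multiplies the block count by four while the index width stays the same.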
Cache performance
A Typical Memory Hierarchy
[Figure: CPU with register file (RF), L1 instruction cache and L1 data cache, unified L2 cache, and four memory banks]
Multiported register file (part of CPU)
Split instruction & data primary caches (on-chip SRAM)
Large unified secondary cache (on-chip SRAM)
Multiple interleaved memory banks (off-chip DRAM)
Presence of L2 influences L1 design
Inclusion Policy
Victim Caches (Jouppi 1990)
Victim cache is a small associative backup cache, added to a direct-mapped cache, which holds recently evicted lines.
[Figure: RF, direct-mapped L1 data cache, fully associative 4-block victim cache (VC), and unified L2 cache. Data evicted from L1 goes to the VC; on an L1 miss that hits in the VC, the hit data comes from the VC. Where data evicted from the VC goes is posed as a question.]
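A minimal sketch of the victim-cache idea follows, tracking block addresses only. The class name, the FIFO replacement inside the victim cache, and the choice to simply drop lines evicted from the victim cache are assumptions made for illustration; the slide leaves the last point open.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Direct-mapped L1 backed by a small fully associative victim cache."""

    def __init__(self, num_sets: int, victim_blocks: int = 4):
        self.num_sets = num_sets
        self.lines = [None] * num_sets      # block address held by each L1 line
        self.victim = OrderedDict()         # evicted block addresses, oldest first
        self.victim_blocks = victim_blocks

    def access(self, block: int) -> str:
        idx = block % self.num_sets
        if self.lines[idx] == block:
            return "L1 hit"
        evicted = self.lines[idx]
        if block in self.victim:
            # Miss in L1 but hit in the victim cache: swap the two lines.
            del self.victim[block]
            result = "victim hit"
        else:
            result = "miss"                 # would be fetched from the next level
        self.lines[idx] = block
        if evicted is not None:
            self.victim[evicted] = True
            if len(self.victim) > self.victim_blocks:
                self.victim.popitem(last=False)   # assumed FIFO: drop the oldest
        return result


if __name__ == "__main__":
    cache = DirectMappedWithVictim(num_sets=4)
    # Blocks 0 and 4 conflict in the same L1 line.
    for b in (0, 4, 0, 0):
        print(b, cache.access(b))
```

In this trace the third access would be a conflict miss in a plain direct-mapped cache; here it is served from the victim cache instead.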
Way Predicting Caches
(MIPS R10000 off-chip L2 cache)
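Use the processor address to index into the way-prediction table.
Look in the predicted way at the given index, then:
on a hit, return a copy of the data from the cache;
on a miss, look in the other way;
on a hit in the other way (a slow hit), change the entry in the prediction table;
on a miss in the other way as well, read the block of data from the next level of cache.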
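The lookup sequence above can be sketched for a two-way cache as follows. Indexing the prediction table by set index, and filling the non-predicted way on a full miss, are assumptions chosen to keep the example small; a real design may do either differently.

```python
class WayPredictedCache:
    """Two-way set-associative cache with a per-set way prediction."""

    def __init__(self, num_sets: int):
        self.num_sets = num_sets
        self.tags = [[None, None] for _ in range(num_sets)]  # two ways per set
        self.predicted = [0] * num_sets                      # way prediction table

    def access(self, block: int) -> str:
        idx, tag = block % self.num_sets, block // self.num_sets
        way = self.predicted[idx]
        if self.tags[idx][way] == tag:
            return "hit"                      # found in the predicted way
        other = 1 - way
        if self.tags[idx][other] == tag:
            self.predicted[idx] = other       # change entry in prediction table
            return "slow hit"
        # Miss in both ways: the block would be read from the next level.
        self.tags[idx][other] = tag           # assumed policy: fill the other way
        self.predicted[idx] = other
        return "miss"


if __name__ == "__main__":
    cache = WayPredictedCache(num_sets=4)
    for b in (0, 0, 4, 0, 0):
        print(b, cache.access(b))
```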
Prefetching
Speculate on future instruction and data accesses and fetch them into cache(s)
Varieties of prefetching
What types of misses does prefetching affect?
Issues in Prefetching
[Figure: RF, L1 instruction and L1 data caches, and unified L2 cache, with an arrow marking where prefetched data enters the hierarchy]
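As one concrete illustration of speculating on future accesses, the sketch below fetches the next sequential block whenever a block misses. This particular policy, the unbounded cache, and the function name are assumptions for the example; the slides do not say which prefetching schemes they cover.

```python
def count_misses(blocks, prefetch: bool) -> int:
    """Count misses over a block-address stream.

    The cache is modeled as an unbounded set, so only first-reference
    misses occur. With prefetch=True, a miss on block b also brings in
    block b + 1 (an assumed next-block policy).
    """
    cache = set()
    misses = 0
    for b in blocks:
        if b not in cache:
            misses += 1
            cache.add(b)
            if prefetch:
                cache.add(b + 1)
    return misses


if __name__ == "__main__":
    stream = list(range(8))                      # purely sequential accesses
    print(count_misses(stream, prefetch=False))
    print(count_misses(stream, prefetch=True))
```

On a purely sequential stream this policy halves the miss count; on a stream with no spatial locality it would only add traffic, which is one of the issues a prefetcher has to weigh.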
“What” is Computer Architecture?
Levels of abstraction, from top to bottom:
Application
Operating System
Compiler, Firmware
Instruction Set Architecture
Instr. Set Proc., I/O system
Datapath & Control
Digital Design
Circuit Design
Layout

Coordination of many levels of abstraction
Under a rapidly changing set of forces
Design, Measurement, and Evaluation
Memory Management
Absolute Addresses
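Addresses in a program depended upon where the program was loaded in memory.
But it was more convenient for programmers to write location-independent code.
Linker and/or loader: adjust the addresses when the program is placed in memory.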
Dynamic Address Translation
In the early machines, I/O operations were slow and each word transferred involved the CPU.
Higher throughput if CPU and I/O of 2 or more programs were overlapped. How? ⇒ multiprogramming
Programming and storage management ease ⇒ need for a base register
Independent programs should not affect each other inadvertently ⇒ need for a bound register
[Figure: two programs resident in physical memory at the same time]
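A minimal sketch of base-and-bound translation follows: the bound register catches accesses outside the program's region and the base register relocates the rest. The function and exception names are hypothetical.

```python
class BoundViolation(Exception):
    """Raised when a program address falls outside its allotted region."""


def translate(addr: int, base: int, bound: int) -> int:
    """Map a program-relative address to a physical address.

    bound is the size of the program's region; base is where the
    region starts in physical memory.
    """
    if not 0 <= addr < bound:
        raise BoundViolation(f"address {addr} outside [0, {bound})")
    return base + addr


if __name__ == "__main__":
    print(translate(100, base=4096, bound=1024))
    try:
        translate(1024, base=4096, bound=1024)
    except BoundViolation as exc:
        print("fault:", exc)
```

Because every address goes through the same base register, the program can be placed anywhere in memory without rewriting its addresses.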
Memory Fragmentation
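Memory layout, top to bottom, at three points in time:

Initially:           OS Space | user 1 (16K) | user 2 (24K) | free (24K) | user 3 (32K) | free (24K)
Users 4 & 5 arrive:  OS Space | user 1 (16K) | user 2 (24K) | user 4 (16K) | free (8K) | user 3 (32K) | user 5 (24K)
Users 2 & 5 leave:   OS Space | user 1 (16K) | free (24K) | user 4 (16K) | free (8K) | user 3 (32K) | free (24K)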
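The effect of fragmentation can be shown with a small first-fit sketch. It assumes that the free regions left after users 2 and 5 depart are 24K, 8K, and 24K, which is a reading of the figure; the function name is hypothetical.

```python
def first_fit(holes, request):
    """Return the index of the first free hole (sizes in K) that fits, else None."""
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None


if __name__ == "__main__":
    holes = [24, 8, 24]                 # assumed free regions, in K
    print("total free:", sum(holes))    # plenty of memory in aggregate
    print("32K request:", first_fit(holes, 32))
    print("16K request:", first_fit(holes, 16))
```

A 32K request fails even though 56K is free in total, because no single contiguous hole is large enough.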
Paged Memory Systems
Processor-generated address can be interpreted as a pair <page number, offset>.
A page table contains the physical address of the base of each page.
[Figure: pages 0 to 3 of one user's address space, mapped through that user's page table to non-contiguous locations in physical memory]
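The <page number, offset> interpretation can be sketched as below. The 4096-byte page size and the example table contents are assumptions made only so the example runs.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(vaddr: int, page_table: list[int]) -> int:
    """Translate a virtual address using a table of physical page base addresses."""
    page_number, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[page_number] + offset


if __name__ == "__main__":
    # Hypothetical table: pages 0..3 live at scattered physical bases.
    page_table = [0x5000, 0x1000, 0x8000, 0x3000]
    print(hex(translate(0x2010, page_table)))
```

Pages that are adjacent in the user's address space need not be adjacent in physical memory, which is what removes the fragmentation problem of contiguous allocation.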
Private Address Space per User
[Figure: User 1, User 2, and User 3 each issue virtual addresses (VA) through their own page table into a shared physical memory, which also holds OS pages and free pages]
Where Should Page Tables Reside?
⇒ Space requirement is large ⇒ Too expensive to keep in registers
Manual Overlays
Ferranti Mercury, 1956
40k bits main (Central Store)
640k bits drum
Assume an instruction can address all the storage on the drum
Method 1: programmer keeps track of addresses in the main memory and initiates an I/O transfer when required
Method 2: automatic initiation of I/O transfers by software address translation (Brooker's interpretive coding, 1960)
Method 1: difficult, error prone
Method 2: inefficient
Not just an ancient black art: e.g., the explicitly managed local store of the IBM Cell microprocessor has the same issues.
Demand Paging in Atlas (1962)
Secondary (Drum) 32x6 pages
Primary 32 Pages 512 words/page
User sees 32 x 6 x 512 words of storage
Primary memory as a cache for secondary memory
Hardware Organization of Atlas
Effective Address → Initial Address Decode, which selects among:

Store           Size        Access time     Contents
ROM             16 pages    0.4 ~1 μsec     system code (not swapped)
Subsidiary      2 pages     1.4 μsec        system data (not swapped)
Main            32 pages    1.4 μsec
Drum (4)        192 pages
Tape decks (8)              88 sec/word

48-bit words, 512-word pages
1 Page Address Register (PAR) per page frame; PARs 0 to 31 each hold <effective PN, status>

Compare the effective page address against all 32 PARs:
match ⇒ normal access
no match ⇒ page fault; save the state of the partially executed instruction
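The PAR comparison can be sketched in software as below. The hardware compares against all 32 PARs at once, whereas this sketch loops over them; the names and the decision to ignore status bits are assumptions for illustration.

```python
WORDS_PER_PAGE = 512
NUM_FRAMES = 32


class PageFault(Exception):
    """No PAR matched the effective page number."""


def atlas_translate(pars: list, addr: int) -> int:
    """Map a word address to a main-memory word address via the PARs.

    pars[f] holds the effective page number resident in frame f, or None.
    """
    page, offset = divmod(addr, WORDS_PER_PAGE)
    for frame, held in enumerate(pars):     # sequential stand-in for a parallel compare
        if held == page:
            return frame * WORDS_PER_PAGE + offset
    raise PageFault(page)


if __name__ == "__main__":
    pars = [None] * NUM_FRAMES
    pars[5] = 7                              # hypothetical: page 7 resides in frame 5
    print(atlas_translate(pars, 7 * WORDS_PER_PAGE + 3))
    try:
        atlas_translate(pars, 8 * WORDS_PER_PAGE)
    except PageFault as fault:
        print("page fault on page", fault.args[0])
```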
Atlas Demand Paging Scheme
On a page fault:
Linear Page Table
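[Figure: the virtual address is split into VPN and offset. The VPN, together with the PT Base Register, selects an entry in the page table; each entry holds either a PPN or a DPN. The PPN combined with the offset locates the data word within a data page.]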
Size of Linear Page Table
Hierarchical Page Table
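Virtual address layout:
bits 31-22: p1 (10-bit L1 index)
bits 21-12: p2 (10-bit L2 index)
bits 11-0: offset

[Figure: a processor register holds the root of the current page table. p1 indexes the Level 1 page table, p2 indexes a Level 2 page table, and offset selects the word within a data page. Entries may refer to a page in primary memory, a page in secondary memory, or be the PTE of a nonexistent page.]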
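A two-level walk with the 10/10/12-bit split can be sketched as follows. Nested dictionaries stand in for the page tables, and a missing entry stands in for a nonexistent page; the names are hypothetical and pages in secondary memory are not modeled.

```python
class PageFault(Exception):
    """Raised when a level-1 or level-2 entry is missing."""


def split_va(va: int) -> tuple[int, int, int]:
    """Split a 32-bit virtual address into (p1, p2, offset)."""
    p1 = (va >> 22) & 0x3FF      # bits 31-22
    p2 = (va >> 12) & 0x3FF      # bits 21-12
    offset = va & 0xFFF          # bits 11-0
    return p1, p2, offset


def walk(root: dict, va: int) -> int:
    """Translate va using root[p1][p2] -> physical page number."""
    p1, p2, offset = split_va(va)
    level2 = root.get(p1)
    if level2 is None:
        raise PageFault(va)
    ppn = level2.get(p2)
    if ppn is None:
        raise PageFault(va)
    return (ppn << 12) | offset


if __name__ == "__main__":
    root = {3: {5: 0x42}}                    # hypothetical mapping
    va = (3 << 22) | (5 << 12) | 0x1A
    print(hex(walk(root, va)))
```

Only the level-2 tables that are actually in use need to exist, which is the space advantage over a single linear table.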
Address Translation & Protection
Virtual Address = Virtual Page No. (VPN) + offset
Address Translation maps the VPN to a Physical Page No. (PPN)
Protection Check uses Kernel/User Mode and Read/Write, and may raise an exception
Physical Address = Physical Page No. (PPN) + offset
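Translation and the protection check can be combined in one sketch. The PTE fields, the 12-bit page offset, and the exception name are assumptions chosen for illustration; real page table entries carry more state.

```python
from dataclasses import dataclass

PAGE_BITS = 12  # assumed offset width


class ProtectionFault(Exception):
    """Raised when the access mode is not permitted for the page."""


@dataclass
class PTE:
    ppn: int          # physical page number
    writable: bool    # False means read-only
    user_ok: bool     # False means kernel-only


def translate(va: int, table: dict, *, write: bool, kernel: bool) -> int:
    """Translate va and enforce Kernel/User and Read/Write permissions."""
    vpn = va >> PAGE_BITS
    offset = va & ((1 << PAGE_BITS) - 1)
    pte = table[vpn]
    if not kernel and not pte.user_ok:
        raise ProtectionFault("user-mode access to a kernel page")
    if write and not pte.writable:
        raise ProtectionFault("write to a read-only page")
    return (pte.ppn << PAGE_BITS) | offset


if __name__ == "__main__":
    table = {1: PTE(ppn=9, writable=False, user_ok=True)}
    print(hex(translate(0x1004, table, write=False, kernel=False)))
    try:
        translate(0x1004, table, write=True, kernel=False)
    except ProtectionFault as exc:
        print("exception:", exc)
```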