

An overview of caches and memory-level parallelism as taught in the Fall 2008 ECE 587/687 course at Portland State University. Topics include CPU execution time, cache performance, cache performance metrics, memory hierarchy, cache structure, associativity, cache misses, and cache organization. The document also mentions the use of miss status holding (handling) registers (MSHRs) and provides examples and references.


© Copyright by Alaa Alameldeen and Haitham Akkary 2008
Portland State University - ECE 587/687 - Fall 2008
CPI = CPI(Perfect Memory) + Memory stall cycles per instruction
Very long latency (discuss)
Memory stall cycles per instruction = Cache misses per instruction x miss penalty
Processor Performance:
Average memory access time = hit time + miss ratio x miss penalty
Cache hierarchies attempt to reduce average memory access time
Miss rate = miss ratio x memory accesses per inst
Depends on cache design parameters
Bigger caches, higher associativity, and more ports increase hit time
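As a rough worked illustration of the relations above, the sketch below computes misses per instruction, memory stall cycles per instruction, and the resulting CPI. It assumes CPI = CPI(perfect memory) + memory stall cycles per instruction, and it uses the common textbook form of average memory access time (hit time + miss ratio x miss penalty). The function names and all numeric values are hypothetical and chosen only for illustration; they are not course data.

```python
def misses_per_instruction(miss_ratio, accesses_per_inst):
    """Miss rate in the notes' sense: miss ratio x memory accesses per instruction."""
    return miss_ratio * accesses_per_inst


def cpi_with_memory(cpi_perfect, miss_ratio, accesses_per_inst, miss_penalty):
    """CPI assuming stalls simply add to the perfect-memory CPI (no overlap)."""
    stalls_per_inst = misses_per_instruction(miss_ratio, accesses_per_inst) * miss_penalty
    return cpi_perfect + stalls_per_inst


def average_memory_access_time(hit_time, miss_ratio, miss_penalty):
    """Textbook AMAT form; an assumption, since the notes leave it unstated."""
    return hit_time + miss_ratio * miss_penalty


# Hypothetical machine: 2% miss ratio, 1.5 accesses per instruction,
# 200-cycle miss penalty, 2-cycle hit time, perfect-memory CPI of 1.0.
cpi = cpi_with_memory(cpi_perfect=1.0, miss_ratio=0.02,
                      accesses_per_inst=1.5, miss_penalty=200)
amat = average_memory_access_time(hit_time=2, miss_ratio=0.02, miss_penalty=200)
print(round(cpi, 2), round(amat, 2))
```

With these made-up numbers, memory stalls dominate the CPI, which is the motivation for reducing miss ratio and miss penalty with a cache hierarchy.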
Levels in memory hierarchy:
Processor
L2 Cache
Main Memory
Disk
Array of blocks (lines)
Finding a block in cache:
Offset: byte offset within the block
Index: which set in the cache the block is located in
Tag: must match the address tag stored in the cache
[Figure: the address is split into Tag, Index, and Offset fields; each cache entry holds a Tag and Data]
ECE 587/687 - Fall 2008
Alaa R. Alameldeen
Associativity
Set associativity
  Set: group of blocks corresponding to the same index
  Each block in the set is called a way
  2-way set-associative cache: each set contains two blocks
  Direct-mapped cache: each set contains one block
  Fully-associative cache: the whole cache is one set
Need to check all tags in a set to determine hit/miss status
Example: Cache Block Placement
Consider a 4-way, 32KB cache with 64-byte lines
Where is the 48-bit address 0x0000FFFFAB64 placed?
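One way to work this example is sketched below, assuming the usual scheme in which the low-order address bits form the offset, the next bits form the index, and the remaining high-order bits form the tag. A 32KB, 4-way cache with 64-byte lines has 32768 / (64 x 4) = 128 sets, so the address splits into 6 offset bits, 7 index bits, and 35 tag bits. The function name `split_address` is hypothetical.

```python
def split_address(addr, cache_bytes, ways, line_bytes):
    """Split an address into (tag, index, offset).

    Assumes cache size, associativity, and line size are powers of two.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


tag, index, offset = split_address(0x0000FFFFAB64, 32 * 1024, 4, 64)
print(hex(tag), index, offset)  # prints: 0x7fffd 45 36
```

Under these assumptions the block may reside in any of the 4 ways of set 45, and a hit requires one of those ways to hold tag 0x7FFFD.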
Types of Cache Misses
Compulsory (cold) misses: first access to a block
  Prefetching can reduce these misses
Capacity misses: a cache cannot contain all blocks needed in a program, so some blocks are discarded and later accessed again
  Replacement policies should target blocks that won't be used later
Conflict misses: blocks mapping to the same set may be discarded (in direct-mapped and set-associative caches)
  Increasing associativity can reduce these misses
For multiprocessors, coherency misses can also happen
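To illustrate how associativity affects conflict misses, the following minimal sketch counts misses for a tiny set-associative cache. It assumes LRU replacement, which is a choice made here for illustration and is not specified in the notes. The function name, the trace, and the cache sizes are all hypothetical.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets, ways, line_bytes=64):
    """Count misses in a set-associative cache with LRU replacement (assumed)."""
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_sets
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            lines.move_to_end(tag)           # hit: mark as most recently used
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)    # set is full: evict the LRU block
            lines[tag] = True
    return misses


# Two blocks that alternate and map to the same set in both configurations.
trace = [0x0000, 0x0200] * 4
direct_mapped = count_misses(trace, num_sets=8, ways=1)   # 8 lines total
two_way = count_misses(trace, num_sets=4, ways=2)         # 8 lines total
print(direct_mapped, two_way)  # prints: 8 2
```

Both configurations hold 8 lines. In the direct-mapped cache the two blocks keep evicting each other, so every access misses; in the 2-way cache only the two compulsory misses remain.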
Non-Blocking Cache Hierarchy
Superscalar processors require parallel execution units
Cache hierarchies capable of simultaneously servicing multiple memory requests
Miss Status Holding (Handling) Registers
MSHRs facilitate non-blocking memory-level parallelism
Used to track address, data, and status for multiple outstanding cache misses
Need to provide correct memory ordering, respond to CPU requests, and maintain cache coherence
Design details vary widely between different processors
Cache & MSHR Organization
Paper Fig 1: block diagram of cache organization
Main components:
  MSHR: one register for each miss to be handled concurrently
  N-way comparator: compares an address to all block addresses in MSHRs (N = #MSHRs)
  Input stack: buffer space for all misses corresponding to MSHR entries
    Size = #MSHRs x block size
  Status update and collecting networks
Current implementations combine the MSHR and input stack
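The components listed above can be sketched in software as follows. This is a simplified, hypothetical model and not the design of any particular processor: the class `MSHRFile`, its method names, and the labels "primary", "secondary", and "stall" are assumptions introduced for illustration. A dictionary lookup stands in for the N-way comparator, and memory ordering and coherence are not modeled.

```python
class MSHRFile:
    """Hypothetical model of a small MSHR file for a non-blocking cache."""

    def __init__(self, num_mshrs, block_bytes=64):
        self.num_mshrs = num_mshrs
        self.block_bytes = block_bytes
        self.entries = {}  # block number -> ids of requests waiting on it

    def on_miss(self, addr, request_id):
        """Record a cache miss and report how it was handled."""
        block = addr // self.block_bytes
        # Stand-in for the N-way comparator: match against all outstanding blocks.
        if block in self.entries:
            self.entries[block].append(request_id)
            return "secondary"   # merged into an already outstanding miss
        if len(self.entries) == self.num_mshrs:
            return "stall"       # no free MSHR: the request must wait
        self.entries[block] = [request_id]
        return "primary"         # new miss: a request goes to the next level

    def on_fill(self, addr):
        """Data returned: free the MSHR and return the waiting request ids."""
        return self.entries.pop(addr // self.block_bytes, [])

    def input_stack_bytes(self):
        """Input stack size = #MSHRs x block size."""
        return self.num_mshrs * self.block_bytes


mshrs = MSHRFile(num_mshrs=2)
print(mshrs.on_miss(0x1000, 1))  # prints: primary
print(mshrs.on_miss(0x1008, 2))  # prints: secondary (same 64-byte block)
```

The number of MSHRs bounds how many distinct block misses can be outstanding at once, which is what limits memory-level parallelism in this model.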