



These are the Lecture Slides of Parallel Computer Architecture, which cover Conflict Resolution, Cache Miss, Write Serialization, In-Order Response, Multi-Level Caches, Dependence Graph, etc. Key points: Directory Overhead, Cache Coherence, Protocol, Handling Read Miss, Handling Write Miss, Handling Writebacks.
Typology: Slides




- Implements the IEEE SCI directory protocol
- One node is an Intel Pentium Pro quad SMP
- The IQ-Link board connects to the system bus and implements the directory protocol
  - It also contains a 32 MB 4-way set-associative remote access cache (RAC)
- Processors within a node are kept coherent by the MESI snoop-based protocol already implemented in the Pentium Pro quad
- The SCI protocol keeps the RACs coherent across nodes
- The RAC maintains inclusion with the processor caches
Directory structure
- Home contains the id of the most recently queued sharer or the owner (6 bits)
- Sharing list
  - A sharer contains the ids of the next sharer and the previous sharer
  - The last sharer contains the ids of the home node and the previous sharer
  - Together these form a circular doubly linked list
- Three major states in the directory
  - Home: remotely unowned, but may be cached in the local quad
  - Fresh: same as shared
  - Gone: some node has exclusive ownership; memory is stale

Cache states
- Processor cache: MESI
- RAC: 29 stable states and many transient states
  - 7 bits represent the RAC state
  - RAC state names have two parts: the first gives the location of the block in the list (ONLY, HEAD, TAIL, MID); the second gives the actual state (modified, exclusive, fresh, copy, …)
  - We will use some of these, such as HEAD_DIRTY and TAIL_CLEAN, to understand the basics of SCI (the full description is available in the IEEE standard)

Three major operations on the list
- List construction: adds a new sharer to the list
- Rollout: removes a sharer from the list; the sharer must synchronize with its immediate neighbors
- Purge/invalidate: the head node always has write permission, so it can purge the entire list before writing; only the head node has this privilege

Three classes of protocol
- Minimal SCI: sharing not allowed
- Typical SCI (discussed here): supports everything one would normally expect
- Full SCI: many optimizations, including hardware support for synchronization
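The three list operations can be sketched in a few lines of Python. This is a minimal illustration, not the SCI specification: the names `Node`, `SharingList`, `attach`, `rollout`, `purge`, and `position` are hypothetical, the home node at both ends of the circular list is modeled as `None`, and all messages, acknowledgments, and transient states are left out.

```python
class Node:
    """Pointers that one node keeps for a single block (hypothetical model)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.prev = None  # upstream neighbor (toward the head); None stands for home
        self.next = None  # downstream neighbor (toward the tail); None stands for home


class SharingList:
    """Sharing list for one block; `head` plays the role of the home's pointer."""

    def __init__(self):
        self.head = None

    def attach(self, node):
        """List construction: the new sharer becomes the head."""
        old_head = self.head
        node.prev = None
        node.next = old_head
        if old_head is not None:
            old_head.prev = node
        self.head = node

    def rollout(self, node):
        """Rollout: unlink a sharer by patching both of its neighbors."""
        up, down = node.prev, node.next
        if up is not None:
            up.next = down
        else:
            self.head = down
        if down is not None:
            down.prev = up
        node.prev = node.next = None

    def purge(self):
        """Purge: the head removes every other sharer, one at a time."""
        purged = []
        while self.head.next is not None:
            victim = self.head.next
            self.rollout(victim)
            purged.append(victim.node_id)
        return purged

    def position(self, node):
        """First part of the RAC state name for a node that is in the list."""
        if node.prev is None and node.next is None:
            return "ONLY"
        if node.prev is None:
            return "HEAD"
        if node.next is None:
            return "TAIL"
        return "MID"

    def ids(self):
        """Node ids from head to tail."""
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.node_id)
            cur = cur.next
        return out
```

Attaching nodes 1, 2, and 3 in that order leaves node 3 at the head and node 1 at the tail, which mirrors the rule that the home always points at the most recently queued sharer.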
- Note that the directory remains in the GONE state and memory is not updated (similar to an M to O transition)

Handling races
- Suppose that when the message from the requester (say A) reaches the old head (say B), B's RAC line is in the PENDING state
- SCI has no pending state in the directory and uses NACKs only rarely
- A does become the new head (it has to, because the home has already updated the directory), but it inherits the PENDING state from B
- Any subsequent request will come to A, and that requester will become the new pending head
- Ultimately the PENDING state is resolved along the chain, starting from B and moving upstream
- The FIFO nature of the pending list guarantees fairness
- There is also no problem with sizing buffers to hold pending requests (no extra space needed)
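The fairness argument can be illustrated with a small sketch. It assumes a simplified model that is not taken from the slides: the home stores a single pointer to the newest requester, hands each new requester the id of the previous one, and a request is served only after the one it waits on has been served. The name `service_order` is hypothetical, and requester ids are assumed to be unique.

```python
def service_order(arrivals):
    """Return the order in which pending requests get served.

    `arrivals` lists requester ids in the order their requests reach home.
    Each requester stores only one pointer (the node it waits on), so no
    node needs a buffer for queued requests.
    """
    pending_head = None   # the single pointer kept at home
    waits_on = {}         # per-requester pointer to its downstream neighbor
    for requester in arrivals:
        waits_on[requester] = pending_head  # home returns the previous head
        pending_head = requester            # newest requester is the new head

    # Walk from the newest requester down to the oldest ...
    chain = []
    node = pending_head
    while node is not None:
        chain.append(node)
        node = waits_on[node]
    # ... then resolve from the oldest upstream, which yields FIFO order.
    return list(reversed(chain))
```

For any arrival sequence the function returns the same sequence, which is the FIFO property: the pending chain is a distributed queue built from pointers the nodes already hold.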
CASE A: requester is already in HEAD_DIRTY state
- Directory must be in GONE state
- Only need to invalidate sharers
- Requester sends an invalidation to the next sharer
- A sharer, upon receiving an invalidation, sends a roll-out request to its next sharer (unless it is the TAIL); the receiving node sets its upstream pointer properly and sends a roll-out acknowledgment
- Once the roll-out request is acknowledged, the sharer invalidates its RAC line and sends a reply back to the head with the id of the next sharer
- Head moves on to purge the sharer with the received id
- During the entire process the requester's RAC line remains in PENDING state
- Note that home is not involved at all

CASE B: requester is in ONLY_DIRTY state
- No transaction needed

CASE C: requester is in HEAD_FRESH state
- Send a state change request to home (FRESH to GONE)
- Once the acknowledgment from home is received, list purging can start
- What if the home is in a state different from FRESH, with a different head node?
  - This is the only case in SCI in which a NACK is generated
  - The requester, on receiving the NACK, changes its state to PENDING and initiates a new write request to home for transitioning to ONLY_DIRTY

CASE D: requester is in MID_FRESH or TAIL_FRESH state
- First it must roll out from the list and attach itself to the head in HEAD_FRESH state (recall that only the head node can write)
- This roll-out may require acknowledgments from the upstream and downstream neighbors (if MID) or just the upstream neighbor (if TAIL)
- Follow CASE C

CASE E: requester is not a sharer
- First get the block in HEAD_DIRTY state
- Follow CASE A
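The five cases reduce to a small dispatch on the requester's RAC state, sketched below. The function name `write_steps` and the step strings are illustrative only; the NACK path of CASE C and all transient states are omitted, and `None` is an assumed encoding for "not a sharer".

```python
def write_steps(rac_state):
    """List the high-level steps a write takes for a given RAC state.

    `rac_state` is one of the state names used above, or None when the
    requester is not a sharer. The steps are descriptive labels, not
    real SCI message names.
    """
    purge = ["purge sharers one at a time while requester stays PENDING"]

    if rac_state == "ONLY_DIRTY":          # CASE B: nothing to do
        return []
    if rac_state == "HEAD_DIRTY":          # CASE A: home is not involved
        return purge
    if rac_state == "HEAD_FRESH":          # CASE C: FRESH -> GONE at home first
        return ["ask home to change FRESH to GONE",
                "wait for acknowledgment from home"] + purge
    if rac_state in ("MID_FRESH", "TAIL_FRESH"):   # CASE D: become head, then C
        return ["roll out of the sharing list",
                "re-attach as HEAD_FRESH"] + write_steps("HEAD_FRESH")
    if rac_state is None:                  # CASE E: fetch as head, then A
        return ["obtain block in HEAD_DIRTY state"] + write_steps("HEAD_DIRTY")
    raise ValueError(f"state not covered by this sketch: {rac_state!r}")
```

Cases D and E delegate to C and A through recursion, which matches the "Follow CASE C" and "Follow CASE A" instructions and makes clear that every write ends with the head purging the list.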
- The biggest problem is that the MESI protocol is designed for in-order response (so what?)
  - Had to use the deferred response signal for remote requests
- Lesson learned: for hierarchical protocols the bus must be split-transaction with out-of-order response (what happens otherwise?)
- The snoop response is available after four cycles at the earliest
  - The stall wire may be asserted by any processor unable to meet this four-cycle limit
  - The bus controller samples the stall wire every two cycles
- The RAC and the directory (for local requests) are also looked up in parallel
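The snoop timing rule can be made concrete with a short calculation. The sketch below rests on assumptions that go beyond the slide: the first sample of the stall wire is taken at cycle four, the wire stays asserted until the slowest snooper is ready, and the function name `snoop_result_cycle` and its parameters are hypothetical.

```python
def snoop_result_cycle(ready_cycles, earliest=4, sample_period=2):
    """Cycle at which the bus controller sees the stall wire deasserted.

    `ready_cycles` holds, for each snooper, the cycle at which its snoop
    result is ready (must be non-empty). The controller looks at cycle
    `earliest` and then every `sample_period` cycles (assumed behavior).
    """
    slowest = max(ready_cycles)
    cycle = earliest
    while cycle < slowest:       # some snooper still asserts the stall wire
        cycle += sample_period   # wait for the next sample point
    return cycle
```

Under these assumptions a snooper that is ready at cycle five delays the response to cycle six, because the controller only observes the wire at cycles four, six, eight, and so on.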
- NUMA-Q runs protocols in microcode
- The protocol processor is customized with bit-field operations and has a three-stage, dual-issue pipeline
- It has a dedicated cache for holding recently accessed directory entries and RAC tags
- The protocol processor also contains three counters for monitoring performance
  - These counters can be programmed (i.e., read and written) through protocol code