






These notes cover the limitations of superscalar processors and the need for thread-level parallelism (TLP) to increase throughput. They describe multithreading alternatives, including fine-grain multithreading and simultaneous multithreading (SMT), with their respective advantages and disadvantages. SMT addresses both horizontal and vertical waste by allowing any thread to issue instructions during each cycle. The notes also cover SMT models, performance, and side effects, and compare SMT to multiprocessors.







© Copyright by Alaa Alameldeen and Haitham Akkary 2009
Portland State University ECE 587/
Many control, data and functional dependences
Vertical waste: All issue slots in a cycle are not used
Horizontal waste: Some issue slots in a cycle are not used
Paper Figure 1
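The two definitions above can be made concrete with a small accounting sketch. The trace format (one list of booleans per cycle, `True` meaning the slot was used) and the function name `classify_waste` are illustrative assumptions, not something taken from the paper.

```python
def classify_waste(issue_slots):
    """Count used, vertically wasted, and horizontally wasted issue slots.

    issue_slots: list of cycles; each cycle is a list of booleans
    (True = the slot issued an instruction). Hypothetical trace format.
    """
    used = vertical = horizontal = 0
    for cycle in issue_slots:
        filled = sum(cycle)
        used += filled
        if filled == 0:
            # No slot used this cycle: the whole cycle is vertical waste.
            vertical += len(cycle)
        else:
            # Some slots used: the empty remainder is horizontal waste.
            horizontal += len(cycle) - filled
    total = sum(len(cycle) for cycle in issue_slots)
    return {"used": used, "vertical": vertical,
            "horizontal": horizontal, "total": total}


# A made-up 4-wide trace over 4 cycles.
trace = [
    [True, True, False, False],    # 2 slots of horizontal waste
    [False, False, False, False],  # 4 slots of vertical waste
    [True, False, False, False],   # 3 slots of horizontal waste
    [False, False, False, False],  # 4 slots of vertical waste
]
print(classify_waste(trace))
```

Every slot lands in exactly one of the three buckets, so used + vertical + horizontal always equals the total slot count.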
Fine-Grain Multithreading
During each cycle, a single thread is allowed to issue instructions
Removes vertical waste
Still limited by ILP available within each thread
Simultaneous Multithreading (SMT)
During each cycle, any thread can issue instructions (instructions from different threads can be issued at the same time)
Addresses both horizontal and vertical waste
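The difference between the one-thread-per-cycle scheme (fine-grain multithreading) and the any-thread scheme (SMT) described above can be sketched with a toy model. This is only an illustration under strong assumptions: `ready[t][c]` is a made-up count of instructions thread `t` could issue in cycle `c`, unissued work is not carried over to later cycles, and the 4-wide issue width is arbitrary.

```python
WIDTH = 4  # assumed issue width for the toy model


def superscalar(ready, width=WIDTH):
    """Single-threaded baseline: only thread 0 ever issues."""
    return [min(r, width) for r in ready[0]]


def fine_grain(ready, width=WIDTH):
    """One thread per cycle, chosen round-robin among threads with ready work."""
    n = len(ready)
    issued = []
    for c in range(len(ready[0])):
        pick = 0
        for k in range(n):
            t = (c + k) % n
            if ready[t][c] > 0:
                pick = min(ready[t][c], width)
                break
        issued.append(pick)
    return issued


def smt(ready, width=WIDTH):
    """Any mix of threads may fill the issue slots in a cycle."""
    cycles = len(ready[0])
    return [min(sum(thread[c] for thread in ready), width) for c in range(cycles)]


def utilization(issued, width=WIDTH):
    return sum(issued) / (width * len(issued))


# Made-up per-cycle ready counts for three threads over four cycles.
ready = [
    [2, 0, 1, 0],
    [1, 2, 0, 1],
    [1, 1, 2, 2],
]
for name, model in [("superscalar", superscalar),
                    ("fine-grain", fine_grain),
                    ("SMT", smt)]:
    issued = model(ready)
    print(f"{name:12s} issued per cycle {issued}  utilization {utilization(issued):.2f}")
```

In this trace the fine-grain model never has an empty cycle (no vertical waste) but still leaves slots unused within each cycle, while the SMT model fills those slots from other threads.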
Superscalar Processors: Where Have Cycles Gone?
Issue slots are utilized only 19% of the time
Lots of causes for issue stall cycles
Need aggressive latency-hiding techniques
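To get a feel for what 19% utilization means, the figure can be converted into instructions per cycle. The issue width is not given in these notes; the value of 8 below is an assumption chosen purely for illustration.

```python
ISSUE_WIDTH = 8      # assumed issue width (not stated in these notes)
UTILIZATION = 0.19   # fraction of issue slots used, from the slide


def average_ipc(utilization, issue_width):
    """Average instructions issued per cycle at a given slot utilization."""
    return utilization * issue_width


def wasted_slots_per_cycle(utilization, issue_width):
    """Average number of issue slots left empty per cycle."""
    return (1.0 - utilization) * issue_width


print(f"average IPC: {average_ipc(UTILIZATION, ISSUE_WIDTH):.2f}")
print(f"wasted slots per cycle: {wasted_slots_per_cycle(UTILIZATION, ISSUE_WIDTH):.2f}")
```

Under that assumed width, the machine averages about 1.5 instructions per cycle and leaves roughly 6.5 slots empty each cycle.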
Paper Table 3
Simultaneous Multithreading Models (Cont.)
Limited Connection
Each thread is connected to exactly one of each type of functional unit
Limits scheduling choices for functional units to reduce hardware complexity
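The restriction described above, where each thread can reach only one functional unit of each type, can be sketched as follows. The unit counts, the modulo mapping from thread to unit, and the names `assigned_unit` and `schedule` are all hypothetical; the source does not say how threads are wired to units.

```python
UNITS = {"int": 2, "fp": 2, "mem": 2}  # hypothetical functional-unit counts


def assigned_unit(thread_id, unit_type, units=UNITS):
    """The single unit of this type a thread may use (assumed modulo mapping)."""
    return (unit_type, thread_id % units[unit_type])


def schedule(requests, limited, units=UNITS):
    """Grant functional units for one cycle.

    requests: list of (thread_id, unit_type) in priority order.
    limited:  True restricts each thread to its assigned unit;
              False lets a thread take any free unit of the right type.
    """
    busy = set()
    granted = []
    for thread_id, unit_type in requests:
        if limited:
            candidates = [assigned_unit(thread_id, unit_type, units)]
        else:
            candidates = [(unit_type, i) for i in range(units[unit_type])]
        for unit in candidates:
            if unit not in busy:
                busy.add(unit)
                granted.append((thread_id, unit))
                break
    return granted


requests = [(0, "int"), (2, "int"), (1, "fp")]
print("limited connection:", schedule(requests, limited=True))
print("full connection:   ", schedule(requests, limited=False))
```

With the limited mapping, threads 0 and 2 compete for the same integer unit, so one request is stalled even though a second integer unit sits idle. That lost scheduling choice is the price paid for the simpler hardware.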
Four issue or full SMT with 3-4 threads
Dual issue SMT with 4 threads
Limited Connection SMT with 5 threads
Single issue SMT with 6 threads
Area efficiency
Reducing number of threads (i.e., threads becoming idle) allows other threads to progress faster in SMT processors, no change in MP
Granularity and flexibility of design: Unit of design is a whole processor for MP, more flexible in SMT
Disadvantages? (discuss)
Hardware complexity
  Scheduling hardware requirements increase with threads
  Register file size increases
  May need more ports
Pipeline depth
  Bigger structures (e.g., register file) require longer access time
  Leads to increasing the number of pipeline stages
Issue policy
  Fixed thread priority
  Round-Robin priority
  ICOUNT
  Others?
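The three named policies can be compared as functions that return a thread priority order for a cycle. This is a sketch under assumptions: the slide does not define ICOUNT, so it is taken here in its usual sense of favoring the thread with the fewest instructions waiting in the pre-issue pipeline stages, and the `threads` dictionary (thread id mapped to that instruction count) is a hypothetical input format.

```python
def fixed_priority(threads, cycle):
    """Lowest thread id always goes first."""
    return sorted(threads)


def round_robin(threads, cycle):
    """Rotate the starting thread by one position each cycle."""
    ids = sorted(threads)
    start = cycle % len(ids)
    return ids[start:] + ids[:start]


def icount(threads, cycle):
    """Favor threads with the fewest instructions waiting before issue
    (assumed meaning of ICOUNT; ties broken by thread id)."""
    return sorted(threads, key=lambda t: (threads[t], t))


# Hypothetical state: thread id -> instructions waiting in pre-issue stages.
threads = {0: 12, 1: 3, 2: 7}
for cycle in range(3):
    print(f"cycle {cycle}: fixed {fixed_priority(threads, cycle)}"
          f"  round-robin {round_robin(threads, cycle)}"
          f"  ICOUNT {icount(threads, cycle)}")
```

Fixed priority can starve high-numbered threads, round-robin treats all threads equally regardless of their progress, and the ICOUNT-style ordering gives preference to threads that are moving through the pipeline quickly.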