Overcoming Waste in Superscalar Processors: Simultaneous Multithreading - Prof. Alaa R. Al, Study notes of Computer Architecture and Organization

These notes cover the limitations of superscalar processors and the need for thread-level parallelism (TLP) to increase throughput. They discuss multithreading alternatives, including fine-grain multithreading and simultaneous multithreading (SMT), and their respective advantages and disadvantages. SMT addresses both horizontal and vertical waste by allowing any thread to issue instructions during each cycle. The notes also cover SMT models, performance, and side effects, and compare SMT to multiprocessors.

© Copyright by Alaa Alameldeen and Haitham Akkary 2009
Simultaneous Multithreading (SMT)
Portland State University
ECE 587/687

Motivation

ILP limitations of superscalar processors

  • Many control, data and functional dependences

Wide superscalar pipelines cannot use all issue slots

  • Vertical waste: no issue slot in a cycle is used (the whole cycle is idle)
  • Horizontal waste: some issue slots in a cycle are not used
  • Paper Figure 1

To increase throughput, we need to use thread-level parallelism (TLP)
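
The two kinds of waste can be made concrete with a small accounting sketch. It assumes a per-cycle count of issued instructions is available; the function name `classify_waste` and the 4-wide trace are hypothetical and only serve as illustration.

```python
from typing import List, Tuple


def classify_waste(issued_per_cycle: List[int], issue_width: int) -> Tuple[int, int, int]:
    """Return (used, horizontal_waste, vertical_waste), all counted in issue slots."""
    used = horizontal = vertical = 0
    for issued in issued_per_cycle:
        if not 0 <= issued <= issue_width:
            raise ValueError("issued count must be between 0 and issue_width")
        used += issued
        if issued == 0:
            # Completely idle cycle: every slot counts as vertical waste.
            vertical += issue_width
        else:
            # Partially filled cycle: the empty slots are horizontal waste.
            horizontal += issue_width - issued
    return used, horizontal, vertical


# Hypothetical trace of a 4-wide machine over 5 cycles.
trace = [2, 0, 4, 1, 0]
used, horizontal, vertical = classify_waste(trace, issue_width=4)
print(used, horizontal, vertical)  # 7 5 8
```

Of the 20 available slots in this made-up trace, 7 are used, 5 are lost to horizontal waste, and 8 to vertical waste.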

Multithreading Alternatives

Fine-grain multithreading

  • During each cycle, a single thread is allowed to issue instructions
  • Removes vertical waste
  • Still limited by ILP available within each thread

Simultaneous Multithreading

  • During each cycle, any thread can issue instructions (instructions from different threads can be issued at the same time)
  • Addresses both horizontal and vertical waste
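
A toy model may help contrast the two alternatives. It is a sketch under strong assumptions: `ready[t][c]` is a hypothetical count of instructions thread `t` could issue in cycle `c`, instructions that do not issue are not carried over to later cycles, and SMT fills slots in thread order. None of this comes from the paper.

```python
from typing import List


def fine_grain_issue(ready: List[List[int]], width: int) -> int:
    """One thread issues per cycle, chosen round-robin among threads with work."""
    n_threads, n_cycles = len(ready), len(ready[0])
    total = 0
    for c in range(n_cycles):
        for offset in range(n_threads):
            t = (c + offset) % n_threads
            if ready[t][c] > 0:
                # Only this thread may use the issue slots in this cycle.
                total += min(ready[t][c], width)
                break
    return total


def smt_issue(ready: List[List[int]], width: int) -> int:
    """Any thread may fill any free issue slot in the same cycle."""
    n_threads, n_cycles = len(ready), len(ready[0])
    total = 0
    for c in range(n_cycles):
        free = width
        for t in range(n_threads):
            take = min(ready[t][c], free)
            total += take
            free -= take
    return total


# Hypothetical 4-wide machine, two threads, four cycles.
ready = [
    [2, 0, 1, 3],  # thread 0
    [1, 2, 0, 2],  # thread 1
]
print(fine_grain_issue(ready, width=4))  # 7
print(smt_issue(ready, width=4))         # 10
```

In this example fine-grain multithreading never has an idle cycle, yet it still leaves slots empty within each cycle, while the SMT model also fills those slots from the other thread.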

Superscalar Processors: Where Have Cycles Gone? 

Discuss Paper Figure 2

  • Issue slots are utilized only 19% of the time
  • Lots of causes for issue stall cycles
  • Need aggressive latency-hiding techniques

Multiple causes for stalls can be addressed using latency-hiding techniques

  • Paper Table 3

Simultaneous Multithreading Models (Cont.) 

SM: Limited Connection

  • Each thread is connected to exactly one of each type of functional unit
  • Limits scheduling choices for functional units to reduce hardware complexity

Hardware Complexity: Paper Table 4
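
The scheduling restriction of the Limited Connection model can be sketched as follows. This is an illustration only: the static mapping `thread % count`, the unit types, and the request format are assumptions made for the example, not details taken from the paper.

```python
from typing import Dict, List, Set, Tuple

Request = Tuple[int, str]  # (thread id, functional-unit type)


def schedule_full(requests: List[Request], units: Dict[str, int]) -> List[Request]:
    """Unrestricted model: a request may use any free unit of the right type."""
    free = dict(units)
    issued = []
    for thread, fu_type in requests:
        if free.get(fu_type, 0) > 0:
            free[fu_type] -= 1
            issued.append((thread, fu_type))
    return issued


def schedule_limited(requests: List[Request], units: Dict[str, int]) -> List[Request]:
    """Limited connection: each thread is wired to exactly one unit per type."""
    busy: Set[Tuple[str, int]] = set()
    issued = []
    for thread, fu_type in requests:
        count = units.get(fu_type, 0)
        if count == 0:
            continue
        unit = thread % count  # hypothetical static thread-to-unit wiring
        if (fu_type, unit) not in busy:
            busy.add((fu_type, unit))
            issued.append((thread, fu_type))
    return issued


units = {"int": 2, "fp": 1}                  # hypothetical unit counts
requests = [(0, "int"), (2, "int"), (1, "fp")]
print(schedule_full(requests, units))     # [(0, 'int'), (2, 'int'), (1, 'fp')]
print(schedule_limited(requests, units))  # [(0, 'int'), (1, 'fp')]
```

Threads 0 and 2 map to the same integer unit here, so the limited model issues only one of them even though a second integer unit is free. That lost scheduling freedom is the price of the simpler hardware.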

SMT Performance

Paper Figure 3

Fine-grain MT can only increase throughput by a factor of 2.

SMT has much higher speedup

Alternatives to execute 4 instructions per cycle

  • Four issue or full SMT with 3-4 threads
  • Dual issue SMT with 4 threads
  • Limited Connection SMT with 5 threads
  • Single issue SMT with 6 threads

SMT vs. Multiprocessors

Paper Figure 5

SMT outperforms multiprocessing for all scenarios compared

Advantages of SMT vs. MP

  • Area efficiency
  • Reducing number of threads (i.e., threads becoming idle) allows other threads to progress faster in SMT processors, no change in MP
  • Granularity and flexibility of design: Unit of design is a whole processor for MP, more flexible in SMT
  • Disadvantages? (discuss)
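
The point about idle threads can be illustrated with a deliberately simple throughput model. All numbers are hypothetical: a multiprocessor with four 2-wide cores is compared against one 8-wide SMT core, and each thread is assumed to have a fixed number of independent instructions per cycle.

```python
from typing import List


def mp_throughput(thread_ilp: List[int], cores: int, width_per_core: int) -> int:
    """Each thread runs on its own core; cores without a thread contribute nothing.
    Threads beyond the number of cores are ignored in this sketch."""
    active = thread_ilp[:cores]
    return sum(min(ilp, width_per_core) for ilp in active)


def smt_throughput(thread_ilp: List[int], total_width: int) -> int:
    """All threads share one pool of issue slots."""
    return min(sum(thread_ilp), total_width)


# Four threads, each able to issue 3 instructions per cycle (hypothetical).
four = [3, 3, 3, 3]
print(mp_throughput(four, cores=4, width_per_core=2))  # 8
print(smt_throughput(four, total_width=8))             # 8

# Two threads go idle; the remaining two keep the same ILP.
two = [3, 3]
print(mp_throughput(two, cores=4, width_per_core=2))   # 4
print(smt_throughput(two, total_width=8))              # 6
```

With four threads both designs saturate at 8 instructions per cycle in this model. With two threads, each remaining thread issues 3 per cycle on the SMT core but stays capped at 2 on its 2-wide MP core.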

SMT Design Issues

  • Hardware complexity
      - Scheduling hardware requirements increase with threads
      - Register file size increase
      - May need more ports
  • Pipeline depth
      - Bigger structures (e.g., register file) require longer access time
      - Leads to increasing the number of pipeline stages
  • Issue policy
      - Fixed thread priority
      - Round-Robin priority
      - ICOUNT
      - Others?
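
The three named policies can be sketched as thread-selection functions. This is a minimal illustration, assuming each thread exposes a ready flag and, for ICOUNT, a count of its instructions waiting in the front-end and issue queues; the function names and inputs are hypothetical. ICOUNT is modeled here in its commonly described form, where the thread with the fewest such instructions gets priority.

```python
from typing import List


def fixed_priority(ready: List[bool]) -> int:
    """The lowest-numbered ready thread always wins; returns -1 if none is ready."""
    for t, is_ready in enumerate(ready):
        if is_ready:
            return t
    return -1


def round_robin(ready: List[bool], last: int) -> int:
    """Start searching just after the thread that was selected last time."""
    n = len(ready)
    for offset in range(1, n + 1):
        t = (last + offset) % n
        if ready[t]:
            return t
    return -1


def icount(ready: List[bool], in_flight: List[int]) -> int:
    """Prefer the ready thread with the fewest instructions waiting in the pipeline."""
    candidates = [t for t, is_ready in enumerate(ready) if is_ready]
    if not candidates:
        return -1
    # Ties are broken by the lower thread id.
    return min(candidates, key=lambda t: (in_flight[t], t))


ready = [True, False, True, True]
in_flight = [12, 3, 9, 5]  # hypothetical per-thread instruction counts
print(fixed_priority(ready))       # 0
print(round_robin(ready, last=0))  # 2
print(icount(ready, in_flight))    # 3
```

Fixed priority can starve high-numbered threads, round-robin shares slots evenly regardless of progress, and the ICOUNT-style choice favors threads that are moving instructions through the pipeline quickly.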