Instruction Issue Algorithms in Single-threaded Execution, Slides of Computer Science

A set of lecture notes on instruction issue algorithms in single-threaded execution, covering in-order and out-of-order issue, the WAR hazard, register renaming, and the pipeline. It also discusses the limits of ILP (instruction-level parallelism) and cycle time reduction techniques.

Typology: Slides

2012/2013

Uploaded on 03/28/2013

ekana
Module 3: "Recap: Single-threaded Execution"
Lecture 6: "Instruction Issue Algorithms"
The Lecture Contains:
Instruction selection
In-order multi-issue
Out-of-order issue
WAR hazard
Modified bypass
WAR and WAW
Register renaming
The pipeline
What limits ILP now?
Cycle time reduction
Alternative: VLIW
Current research in µP

Instruction selection

Simplest possible design: issue the instructions sequentially (in-order).
Scan the issue queue and stop as soon as you come to an instruction that depends on one already issued.

In the slide's example, the last two instructions cannot issue even though they are independent of the first two: in-order completion is a must for precise exception support.
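
The in-order scan described above can be sketched as follows. This is a minimal illustration rather than the lecture's hardware: the `Instr` tuple, the register names, and the function `select_in_order` are hypothetical, and dependences are checked only against instructions selected in the same cycle.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # destination register (hypothetical names)
    srcs: Tuple[str, ...]   # source registers


def select_in_order(queue, width):
    """Scan the queue in program order and stop at the first instruction
    that depends on one already selected in this cycle."""
    issued = []
    written = set()  # destinations of the instructions selected so far
    for ins in queue:
        if len(issued) == width:
            break
        raw = any(s in written for s in ins.srcs)
        waw = ins.dest in written
        if raw or waw:
            break  # in-order: nothing younger may pass the stalled instruction
        issued.append(ins)
        written.add(ins.dest)
    return issued


queue = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r1", "r5")),    # RAW on r1: the scan stops here
    Instr("r6", ("r7", "r8")),    # independent, but not issued
    Instr("r9", ("r10", "r11")),  # independent, but not issued
]
selected = select_in_order(queue, width=4)
print(len(selected))  # 1
```

Only the first instruction issues, even though the last two have no dependence on the first two.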

In-order multi-issue

Complexity of selection logic: need to check for RAW and WAW.
Comparisons for RAW: N(N-1), where N is the issue width.
Comparisons for WAW: N(N-1)/2.
That gives 18 comparators for 4-issue.
Still need to make sure instructions write back in-order to support precise exceptions.
As instructions issue, they are removed from the issue queue and put in a re-order buffer (ROB), also called the active list in MIPS processors. [Isn’t WAW check sufficient?]
Instructions write back or retire in-order from the ROB.
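
The 18-comparator figure for 4-issue is consistent with N(N-1) RAW comparisons plus N(N-1)/2 WAW comparisons. The short check below assumes two source registers per instruction, which is why the RAW count is twice the number of instruction pairs; the function name is hypothetical.

```python
def selection_comparators(n):
    """Comparator counts for an n-wide in-order issue group."""
    pairs = n * (n - 1) // 2  # ordered (older, younger) instruction pairs
    raw = 2 * pairs           # two sources of the younger vs. the older destination
    waw = pairs               # destination vs. destination
    return raw, waw, raw + waw


print(selection_comparators(4))  # (12, 6, 18)
```

The count grows quadratically with the issue width, which is the cost the slide is pointing at.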

Out-of-order issue

Taking the parallelism to a new dimension: central to all modern microprocessors.
Scan the issue queue completely, select independent instructions, and issue as many as possible, limited only by the number of functional units.
Needs more comparators.
Able to extract more ILP: CPI goes down further.
Possible to overlap the latency of mult/div and load/store with the execution of other independent instructions.
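
A minimal sketch of the out-of-order selection step, under stated assumptions: each queue entry is a hypothetical `(text, unit, sources)` tuple, `ready_regs` is the set of registers whose values are already available, and `free_units` gives the number of free functional units of each kind. Real select logic is a hardware circuit; this only illustrates the policy.

```python
def select_out_of_order(queue, ready_regs, free_units):
    """Scan the whole queue and pick every instruction whose sources are
    ready, as long as a functional unit of the right kind is free."""
    units = dict(free_units)  # copy so the caller's counts are not modified
    issued = []
    for text, unit, srcs in queue:
        if all(s in ready_regs for s in srcs) and units.get(unit, 0) > 0:
            units[unit] -= 1
            issued.append(text)
        # unlike in-order issue, a stalled instruction does not stop the scan
    return issued


queue = [
    ("mul r1, r2, r3", "mul", ("r2", "r3")),
    ("add r4, r1, r5", "alu", ("r1", "r5")),   # waits for r1 from the multiply
    ("lw r6, 0(r7)", "mem", ("r7",)),
    ("sub r8, r9, r10", "alu", ("r9", "r10")),
]
ready = {"r2", "r3", "r5", "r7", "r9", "r10"}
picked = select_out_of_order(queue, ready, {"alu": 2, "mul": 1, "mem": 1})
print(picked)  # ['mul r1, r2, r3', 'lw r6, 0(r7)', 'sub r8, r9, r10']
```

The load and the subtract issue past the stalled add, so their execution overlaps the multiply latency.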

WAR hazard

Modified bypass

An executing instruction must broadcast its result to the issue queue.
Waiting instructions compare their source register numbers with the destination register number of the bypassed value.
A waiting instruction now also needs to make sure that it is consuming the right value in program order, to avoid WAR.

Need to tag every instruction with its last producer.
Can we simplify this?
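
One way to picture the "last producer" tag is sketched below. It is an illustration under assumptions, not the lecture's design: the class `Waiting`, the integer tags, and the `snoop` method are hypothetical. Each pending source remembers the tag of the instruction that last wrote that register in program order, and a broadcast is accepted only when both the register and the tag match.

```python
class Waiting:
    """A waiting issue-queue entry (hypothetical model)."""

    def __init__(self, text, sources):
        self.text = text
        # maps each pending source register to the tag of its last producer
        self.pending = dict(sources)
        self.values = {}

    def snoop(self, reg, tag, value):
        """Observe a broadcast result; accept it only from the right producer."""
        if self.pending.get(reg) == tag:
            self.values[reg] = value
            del self.pending[reg]

    def is_ready(self):
        return not self.pending


# Program order: I1 writes r1, I2 reads r1, I3 writes r1 again (WAR with I2).
i2 = Waiting("I2: r4 = r1 + r5", {"r1": 1})  # r1 must come from I1 (tag 1)

i2.snoop("r1", 3, 99)        # I3 finishes early and broadcasts r1
ready_after_i3 = i2.is_ready()

i2.snoop("r1", 1, 7)         # I1 broadcasts the value I2 actually needs
ready_after_i1 = i2.is_ready()

print(ready_after_i3, ready_after_i1, i2.values)  # False True {'r1': 7}
```

Matching on the register number alone would have let I2 consume the value from I3, which is exactly the WAR violation the tag prevents.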

WAR and WAW

These are really false dependencies.
They arise due to register allocation by the compiler.
Thus far we have assumed that the ROB has space to hold the destination values: this needs wide ROB entries.
These values are written back to the register file when the instructions retire or commit in-order from the ROB.
Also, bypass becomes complicated.
Better way to solve it: rename the destination registers.

More physical registers allow more in-flight instructions, and hence the possibility of more parallelism.
But the register file cannot be made very big: it takes time to access and burns power.
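
A minimal sketch of destination renaming with a map table and a free list. The function `rename`, the `p`-prefixed physical register names, and the three-instruction program are hypothetical, and returning registers to the free list at retire is left out for brevity.

```python
def rename(program, arch_regs, num_physical):
    """Give every destination a fresh physical register.

    program: list of (dest, srcs) pairs in program order.
    """
    # initially, architectural register i lives in physical register i
    map_table = {r: f"p{i}" for i, r in enumerate(arch_regs)}
    free_list = [f"p{i}" for i in range(len(arch_regs), num_physical)]
    renamed = []
    for dest, srcs in program:
        # read the source mappings before updating the destination mapping
        new_srcs = tuple(map_table[s] for s in srcs)
        if not free_list:
            raise RuntimeError("out of physical registers: rename must stall")
        new_dest = free_list.pop(0)
        map_table[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


arch = [f"r{i}" for i in range(1, 8)]  # r1..r7 start in p0..p6
program = [
    ("r1", ("r2", "r3")),  # I1
    ("r4", ("r1", "r5")),  # I2: true (RAW) dependence on I1
    ("r1", ("r6", "r7")),  # I3: WAR with I2 and WAW with I1
]
renamed = rename(program, arch, num_physical=10)
print(renamed)
# [('p7', ('p1', 'p2')), ('p8', ('p7', 'p4')), ('p9', ('p5', 'p6'))]
```

After renaming, I3 writes `p9` while I2 still reads `p7`, so the WAR and WAW hazards disappear and only the true dependence from I1 to I2 remains. The `RuntimeError` branch corresponds to the limit noted above: with too few physical registers, fewer instructions can be in flight.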

The pipeline

Stages: fetch, decode, rename, issue, register file read, ALU, cache, retire.
Fetch, decode, and rename are in-order stages; each handles multiple instructions every cycle.
The ROB entry is allocated in the rename stage.
Issue, register file read, ALU, and cache are out-of-order.
Retire is again in-order, but multiple instructions may retire each cycle: need to free the resources and drain the pipeline quickly.

What limits ILP now?

Instruction cache miss (normally not a big issue).
Branch misprediction: observe that you predict a branch in decode, and the branch executes in the ALU. There are four pipeline stages before you know the outcome. A misprediction amounts to a loss of at least 4F instructions, where F is the fetch width.
Data cache miss: assuming an issue width of 4, a frequency of 3 GHz, and a memory latency of 120 ns, you need to find 1440 independent instructions to issue so that you can hide the memory latency. This is impossible (resource shortage).
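
The two figures above follow from simple arithmetic, reproduced here as a check. The function names are hypothetical, and the fetch width of 4 in the last line is an assumed value for illustration only.

```python
def cycles_of_latency(frequency_ghz, latency_ns):
    # GHz multiplied by ns gives a cycle count
    return frequency_ghz * latency_ns


def instructions_to_hide(issue_width, frequency_ghz, latency_ns):
    # independent instructions needed to keep issuing during one miss
    return issue_width * cycles_of_latency(frequency_ghz, latency_ns)


def misprediction_loss(stages_before_resolve, fetch_width):
    # instructions fetched down the wrong path before the branch resolves
    return stages_before_resolve * fetch_width


print(cycles_of_latency(3, 120))        # 360
print(instructions_to_hide(4, 3, 120))  # 1440
print(misprediction_loss(4, 4))         # 16, assuming F = 4
```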

Cycle time reduction

Execution time = CPI × instruction count × cycle time.
Talked about CPI reduction, or improvement in IPC (instructions retired per cycle).
Cycle time reduction is another technique to boost performance: faster clock frequency.
Pipelining poses a problem: each pipeline stage should be one cycle for balanced progress, so a smaller cycle time means pipe stages must be broken into smaller stages (superpipelining).
Faster clock frequency necessarily means deep pipes: each pipe stage contains a small amount of logic so that it fits in a small cycle time.
May severely degrade CPI if not careful.
Now the branch penalty is even bigger (31 cycles for Intel Prescott): branch mispredictions cause massive loss in performance (93 micro-ops are lost, F=3).
Long pipes also put more pressure on resources such as ROB and registers because instruction latency increases (in terms of cycles, not in absolute terms): instructions occupy ROB entries and registers longer.
The design becomes increasingly complicated (long wires).
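
The trade-off between cycle time and CPI can be made concrete with the execution time formula. In the sketch below, the Prescott figures (31 cycles, F=3) come from the text; the CPI and cycle time values for the three designs are invented purely for illustration, and the function names are hypothetical.

```python
def execution_time_ns(cpi, instruction_count, cycle_time_ns):
    # Execution time = CPI x instruction count x cycle time
    return cpi * instruction_count * cycle_time_ns


def lost_micro_ops(branch_penalty_cycles, fetch_width):
    return branch_penalty_cycles * fetch_width


print(lost_micro_ops(31, 3))  # 93

# Hypothetical designs running the same one million instructions.
baseline = execution_time_ns(cpi=1.0, instruction_count=1_000_000, cycle_time_ns=1.0)
deep = execution_time_ns(cpi=1.5, instruction_count=1_000_000, cycle_time_ns=0.5)
careless = execution_time_ns(cpi=2.5, instruction_count=1_000_000, cycle_time_ns=0.5)

print(deep < baseline)      # True: halving the cycle time wins despite a higher CPI
print(careless < baseline)  # False: the CPI loss outweighs the faster clock
```

Halving the cycle time helps only as long as CPI grows by less than a factor of two.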