Instruction Issue Algorithms in Single-threaded Execution, Slides of Computer Science

All India Institute of Medical Sciences Computer Science

These lecture notes cover instruction issue algorithms in single-threaded execution, including in-order and out-of-order issue, the WAR hazard, register renaming, and the pipeline. They also discuss the limits of ILP (instruction-level parallelism) and cycle time reduction techniques.

Typology: Slides

2012/2013

Uploaded on 03/28/2013

ekana 🇮🇳

Module 3: "Recap: Single-threaded Execution"

Lecture 6: "Instruction Issue Algorithms"

The Lecture Contains:

Instruction selection

In-order multi-issue

Out-of-order issue

WAR hazard

Modified bypass

WAR and WAW

Register renaming

The pipeline

What limits ILP now?

Cycle time reduction

Alternative: VLIW

Current research in µP

Instruction selection

Simplest possible design: issue the instructions sequentially (in-order).

Scan the issue queue and stop as soon as you come to an instruction dependent on one already issued.

In the slide's example, the last two instructions cannot issue even though they are independent of the first two: in-order completion is a must for precise exception support.
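The scan-and-stop rule can be sketched in a few lines. This is a minimal illustration, not the slide's own example: the `Instr` record, the `in_order_issue` function and the four-instruction queue are hypothetical, and only RAW dependences between instructions in the same scan are checked.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record: one destination, some sources."""
    text: str
    dest: str
    srcs: Tuple[str, ...]


def in_order_issue(queue):
    """Scan in program order; stop at the first instruction that reads a
    register written by an instruction already selected in this scan."""
    issued = []
    pending_dests = set()
    for ins in queue:
        if any(src in pending_dests for src in ins.srcs):
            break  # in-order issue: nothing younger may pass this point
        issued.append(ins)
        pending_dests.add(ins.dest)
    return issued


# Illustrative queue (an assumption, not taken from the lecture).
queue = [
    Instr("add r1, r2, r3", "r1", ("r2", "r3")),
    Instr("sub r4, r1, r5", "r4", ("r1", "r5")),      # RAW on r1
    Instr("xor r6, r7, r8", "r6", ("r7", "r8")),      # independent
    Instr("or r9, r10, r11", "r9", ("r10", "r11")),   # independent
]

print([ins.text for ins in in_order_issue(queue)])
```

Only the `add` is selected here: the scan stops at `sub`, so the two independent instructions behind it wait as well.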

In-order multi-issue

Complexity of selection logic: need to check for RAW and WAW.

Comparisons for RAW: N(N-1), where N is the issue width.

Comparisons for WAW: N(N-1)/2.

That is 18 comparators for 4-issue.

Still need to make sure instructions write back in-order to support precise exception.

As instructions issue, they are removed from the issue queue and put in a re-order buffer (also called active list in MIPS processors). [Isn’t WAW check sufficient?]

Instructions write back or retire in-order from the re-order buffer (ROB).
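The comparator counts can be checked with a short calculation. The function names are hypothetical, and the RAW formula is read here as assuming two source operands per instruction, each compared against the destination of every older instruction in the issue group.

```python
def raw_comparators(issue_width):
    """N(N-1): two sources per instruction (an assumption) compared with
    the destination of each older instruction in the group."""
    return issue_width * (issue_width - 1)


def waw_comparators(issue_width):
    """N(N-1)/2: one destination comparison per pair of instructions."""
    return issue_width * (issue_width - 1) // 2


for n in (2, 4, 8):
    total = raw_comparators(n) + waw_comparators(n)
    print(f"{n}-issue: RAW={raw_comparators(n)} "
          f"WAW={waw_comparators(n)} total={total}")
```

For a 4-issue machine this gives 12 RAW and 6 WAW comparisons, matching the 18 comparators quoted above. The count grows quadratically with issue width.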

Out-of-order issue

Taking the parallelism to a new dimension; central to all modern microprocessors.

Scan the issue queue completely, select independent instructions, and issue as many as possible, limited only by the number of functional units.

Need more comparators.

Able to extract more ILP: CPI goes down further.

Possible to overlap the latency of mult/div and load/store with execution of other independent instructions.
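A rough sketch of the full-queue scan follows. The function `out_of_order_issue`, the tuple layout and the sample queue are assumptions for illustration; the sketch only tracks RAW dependences on older instructions still in the queue and ignores WAR and WAW.

```python
def out_of_order_issue(queue, num_units):
    """Scan the whole queue and pick up to num_units instructions whose
    sources are not written by any older instruction still in the queue.
    Each queue entry is (text, dest, srcs)."""
    selected = []
    older_dests = set()
    for text, dest, srcs in queue:
        ready = not any(src in older_dests for src in srcs)
        if ready and len(selected) < num_units:
            selected.append(text)
        # Selected or not, this result is not available in this cycle.
        older_dests.add(dest)
    return selected


# Illustrative queue (an assumption, not taken from the lecture).
queue = [
    ("add r1, r2, r3", "r1", ("r2", "r3")),
    ("sub r4, r1, r5", "r4", ("r1", "r5")),      # waits for r1
    ("xor r6, r7, r8", "r6", ("r7", "r8")),
    ("or r9, r10, r11", "r9", ("r10", "r11")),
]

print(out_of_order_issue(queue, num_units=4))
print(out_of_order_issue(queue, num_units=2))
```

With four functional units the scan skips the dependent `sub` and still issues the two independent instructions behind it. With two units the selection is cut off by the unit count instead.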

WAR hazard

Modified bypass

An executing instruction must broadcast results to the issue queue.

Waiting instructions compare their source register numbers with the destination register number of the bypassed value.

Also, a waiting instruction now needs to make sure that it is consuming the right value in program order to avoid WAR.

Need to tag every instruction with its last producer.

Can we simplify this?
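One way to picture the last-producer tag is shown below. This is a sketch under stated assumptions: the `Waiting` class, its `snoop` method and the integer instruction ids are hypothetical, and real hardware does this matching with comparators in the issue queue rather than with a dictionary.

```python
class Waiting:
    """Hypothetical issue-queue entry waiting for bypassed values."""

    def __init__(self, text, srcs):
        # srcs maps each source register to the id of its last producer
        # in program order, or None if the value is already available.
        self.text = text
        self.pending = {reg: prod for reg, prod in srcs.items()
                        if prod is not None}

    def snoop(self, dest_reg, producer_id):
        """Consume a broadcast only if both the register number and the
        producer tag match; a younger writer of the same register is
        ignored."""
        if self.pending.get(dest_reg) == producer_id:
            del self.pending[dest_reg]

    def ready(self):
        return not self.pending


# Program order (illustrative): 1: mul r1, ...   2: add r3, r1, r2
#                               3: sub r1, ...   (WAR with instruction 2)
add = Waiting("add r3, r1, r2", {"r1": 1, "r2": None})

add.snoop("r1", 3)   # instruction 3 finished early: wrong r1, ignored
print(add.ready())
add.snoop("r1", 1)   # the real producer broadcasts
print(add.ready())
```

Matching on the register number alone would let the `add` pick up the value written by instruction 3, which is exactly the WAR problem described above.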

WAR and WAW

These are really false dependencies: they arise due to register allocation by the compiler.

Thus far we have assumed that the ROB has space to hold the destination values: this needs wide ROB entries.

These values are written back to the register file when the instructions retire or commit in-order from the ROB.

Also, bypass becomes complicated.

Better way to solve it: rename the destination registers.

Register renaming

More physical registers allow more in-flight instructions and the possibility of more parallelism.

But cannot make the register file very big: it takes time to access and burns power.
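Renaming destination registers can be sketched with a map table and a free list. This is a common textbook scheme rather than something taken from these slides: the `rename` function, the `p0`, `p1`, ... naming and the three-instruction program are assumptions, and freeing physical registers at retire is omitted.

```python
def rename(program, arch_regs, num_phys):
    """Rename each (dest, srcs) pair: sources read the current mapping,
    the destination gets a fresh physical register from the free list."""
    map_table = {reg: f"p{i}" for i, reg in enumerate(arch_regs)}
    free_list = [f"p{i}" for i in range(len(arch_regs), num_phys)]
    renamed = []
    for dest, srcs in program:
        new_srcs = tuple(map_table[src] for src in srcs)
        if not free_list:
            raise RuntimeError("no free physical register: rename stalls")
        new_dest = free_list.pop(0)
        map_table[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


# Illustrative program (an assumption, not taken from the lecture).
program = [
    ("r1", ("r2", "r3")),   # writes r1
    ("r4", ("r1", "r2")),   # reads r1 (true RAW dependence)
    ("r1", ("r3", "r3")),   # writes r1 again: WAW with 1st, WAR with 2nd
]

for dest, srcs in rename(program, ["r1", "r2", "r3", "r4"], num_phys=8):
    print(dest, srcs)
```

After renaming, the two writes to `r1` target different physical registers, so the third instruction no longer conflicts with the first two. The RAW dependence from the first to the second instruction is preserved. The `RuntimeError` branch reflects the limit noted above: a small physical register file caps the number of in-flight instructions.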

The pipeline

Fetch, decode, rename, issue, register file read, ALU, cache, retire.

Fetch, decode, and rename are in-order stages; each handles multiple instructions every cycle.

The ROB entry is allocated in the rename stage.

Issue, register file read, ALU, and cache are out-of-order.

Retire is again in-order, but multiple instructions may retire each cycle: need to free the resources and drain the pipeline quickly.

What limits ILP now?

Instruction cache miss (normally not a big issue).

Branch misprediction: observe that you predict a branch in decode, and the branch executes in ALU. There are four pipeline stages before you know the outcome, so a misprediction amounts to a loss of at least 4F instructions, where F is the fetch width.

Data cache miss: assuming an issue width of 4, a frequency of 3 GHz, and a memory latency of 120 ns, you need to find 1440 independent instructions to issue so that you can hide the memory latency (120 ns at 3 GHz is 360 cycles, at 4 instructions per cycle). This is impossible (resource shortage).
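The two estimates above are simple products, sketched here with hypothetical function names. The fetch width of 4 in the first call is an illustrative value; the other figures are the ones quoted in the text.

```python
def branch_mispredict_loss(stages_to_resolve, fetch_width):
    """Instructions fetched down the wrong path before the branch resolves."""
    return stages_to_resolve * fetch_width


def instructions_to_hide_latency(issue_width, freq_ghz, latency_ns):
    """Independent instructions needed to keep issuing during a miss.
    GHz times ns gives the latency in cycles."""
    latency_cycles = round(freq_ghz * latency_ns)
    return issue_width * latency_cycles


print(branch_mispredict_loss(stages_to_resolve=4, fetch_width=4))
print(instructions_to_hide_latency(issue_width=4, freq_ghz=3, latency_ns=120))
```

The second call reproduces the 1440 figure: 360 cycles of memory latency times 4 instructions per cycle.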

Cycle time reduction

Execution time = CPI × instruction count × cycle time.

Talked about CPI reduction or improvement in IPC (instructions retired per cycle).

Cycle time reduction is another technique to boost performance: faster clock frequency.

Pipelining poses a problem: each pipeline stage should be one cycle for balanced progress, so a smaller cycle time means breaking pipe stages into smaller stages (superpipelining).

Faster clock frequency necessarily means deep pipes: each pipe stage contains a small amount of logic so that it fits in a small cycle time.

May severely degrade CPI if not careful.

Now the branch penalty is even bigger (31 cycles for Intel Prescott): branch mispredictions cause massive loss in performance (93 micro-ops are lost, F=3).

Long pipes also put more pressure on resources such as ROB and registers because instruction latency increases (in terms of cycles, not in absolute terms): instructions occupy ROB entries and registers longer.

The design becomes increasingly complicated (long wires).
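The trade-off between cycle time and CPI can be made concrete with the execution time formula. The function names and the CPI and cycle-time values for the three designs are made up for illustration; only the 31-cycle penalty and F=3 come from the text.

```python
def execution_time_ns(cpi, instruction_count, cycle_time_ns):
    """Execution time = CPI x instruction count x cycle time."""
    return cpi * instruction_count * cycle_time_ns


def lost_micro_ops(branch_penalty_cycles, fetch_width):
    """Micro-ops thrown away on one branch misprediction."""
    return branch_penalty_cycles * fetch_width


instructions = 1_000_000  # illustrative program size

# Hypothetical designs: halving the cycle time helps only while the
# CPI increase from the deeper pipeline stays below a factor of two.
designs = [
    ("baseline", 1.0, 1.0),
    ("deep pipe, careful", 1.6, 0.5),
    ("deep pipe, careless", 2.2, 0.5),
]
for name, cpi, cycle_ns in designs:
    time_ns = execution_time_ns(cpi, instructions, cycle_ns)
    print(f"{name}: {time_ns:.0f} ns")

print(lost_micro_ops(branch_penalty_cycles=31, fetch_width=3))
```

The last line reproduces the 93 micro-ops quoted for Prescott.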
