Dynamic Scheduling in Multi-Function Pipelines - Lecture Notes | ECE 4100, Study notes of Computer Architecture and Organization

Material Type: Notes; Professor: Yalamanchili; Class: Adv Computer Architecture; Subject: Electrical & Computer Engr; University: Georgia Institute of Technology-Main Campus; Term: Fall 2003

ECE 4100/6100: Yalamanchili Fall 2003
Module 2: Dynamic Scheduling in Multi-Function Pipelines


Multi-Cycle Execution Units

- Analysis
  – Latency vs. clock rate
  – Presence of forwarding
  – Issue order vs. completion order
  – Latency and initiation interval
  – Hazards: structural, data, and control
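The interaction between latency, initiation interval, and completion order can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the unit names and cycle counts in `UNITS` are hypothetical, "latency" is taken here to mean cycles from issue to result, and issue is in order with at most one instruction per cycle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    latency: int   # assumed meaning: cycles from issue to result
    interval: int  # initiation interval: minimum cycles between issues to this unit


# Hypothetical units; the numbers are illustrative, not taken from the slides.
UNITS = {"int": Unit(1, 1), "fadd": Unit(4, 1), "fdiv": Unit(15, 15)}


def schedule(ops, units=UNITS):
    """In-order issue, at most one op per cycle.

    Returns a list of (issue_cycle, completion_cycle) pairs, one per op.
    """
    next_free = {name: 1 for name in units}  # earliest cycle each unit accepts an op
    cycle = 1
    result = []
    for op in ops:
        unit = units[op]
        issue = max(cycle, next_free[op])       # wait out the initiation interval
        next_free[op] = issue + unit.interval
        result.append((issue, issue + unit.latency))
        cycle = issue + 1                       # later ops cannot issue earlier
    return result
```

With these assumed numbers, `schedule(["fadd", "int"])` issues in order but the integer operation completes first, and two back-to-back `fdiv` operations show the second one waiting for the non-pipelined divider.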


Handling Structural Hazards

- Hazard resolution
  – Structural hazards
    – Functional unit
    – Register file write port
  – Solution: stall in ID
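One way to picture "stall in ID" for the register file write port is a reservation check made before an instruction leaves ID. The sketch below assumes a single write port and that an instruction leaving ID at cycle `c` with latency `n` writes back at cycle `c + n`; the class and function names are hypothetical.

```python
class WritePortReservation:
    """Tracks which future cycles already have the single write port claimed."""

    def __init__(self):
        self.reserved = set()  # absolute cycle numbers with a pending write back

    def try_issue(self, cycle, latency):
        """Try to leave ID at `cycle`; the write back would occur at cycle + latency.

        Returns True and reserves the port, or False (stall in ID) on a conflict.
        """
        write_back = cycle + latency
        if write_back in self.reserved:
            return False
        self.reserved.add(write_back)
        return True


def issue_cycles(latencies):
    """In-order issue of ops with the given latencies; returns the cycle each leaves ID."""
    port = WritePortReservation()
    cycle = 1
    issued = []
    for latency in latencies:
        while not port.try_issue(cycle, latency):
            cycle += 1  # stall in ID until the write-back slot is free
        issued.append(cycle)
        cycle += 1
    return issued
```

For example, with latencies `[4, 3, 1]` the second instruction would collide with the first at write back, so it stalls one cycle in ID, and the third stalls until a free write-back slot appears.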


Handling Exceptions

- Buffering state
  – History file and roll-back
  – Future file and commit
- Software support
- Issue restrictions to reduce overhead
- Live with imprecise exceptions
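The history-file idea can be sketched as a log of overwritten register values that is replayed backwards on an exception. This is a minimal illustration, not a description of any particular machine: the class name and sequence numbering are hypothetical, and it assumes writes to the same register occur in program order (WAW hazards are prevented elsewhere).

```python
class HistoryFile:
    """Registers may be updated out of program order; old values are logged for roll-back."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.log = []  # (sequence number, register, old value), in completion order

    def write(self, seq, reg, value):
        """Log the old value, then update the register."""
        self.log.append((seq, reg, self.regs[reg]))
        self.regs[reg] = value

    def rollback(self, faulting_seq):
        """Undo every write made by the faulting instruction or a later one.

        Entries are undone newest first so the oldest saved value wins.
        """
        kept = []
        for seq, reg, old in reversed(self.log):
            if seq >= faulting_seq:
                self.regs[reg] = old
            else:
                kept.append((seq, reg, old))
        self.log = kept[::-1]
```

If instruction 3 completes before instruction 2 raises an exception, `rollback(2)` restores the register that instruction 3 overwrote while leaving the result of instruction 1 in place, which gives a precise state at the faulting instruction.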


Solution 1: Centralized Scheduling

- Reading: A.8 and first few pages of 3.2
- Goal: further increase issue rate to approach CPI of 1
- What do we need to do?
  – Enforce data dependencies
  – Prevent WAW and WAR hazards
  → centralized control, complete pipeline state

[Figure: Instruction Execution State and Control]


Key ideas

- Allow bypassing in ID of independent (in terms of dataflow) instructions
  – Localize stalls → stall only data-dependent instructions
- Other hazards cause stalls
- Break ID into issue and read operand (RO) steps
  – Permit independent instructions to bypass in RO
  – Check for structural hazards in issue stage
- Enforce WAR during write back
  – Detect and enforce hazards as late as possible
- High bandwidth to and from the register file
- No forwarding (will solve later)
- Retain name dependencies and resulting stalls (will solve later)


Scoreboard Status Information

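- Data structures keep global status that can be queried by the control logic
- Scoreboard implementation is as complex as one functional unit

Functional unit status (one entry per unit: Integer, Mult, Add, Divide, etc.):

- Name: the functional unit
- Busy: whether the unit is in use (Yes/No)
- Op: operation being performed
- Fi: destination register
- Fj, Fk: source registers
- Qj, Qk: function units producing the source values
- Rj, Rk: source registers have value? (Yes/No)

Example entries: the Integer unit is busy with a Load; the Add unit is busy with a Sub.

Register result status (one entry per register):

- Store functional unit that will deliver contents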
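The status fields above map naturally onto a small data structure. The following Python sketch is one possible encoding, not the lecture's implementation: the class and method names are hypothetical, and only the issue check (structural and WAW hazards) and the read-operand check are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None  # destination register
    fj: Optional[str] = None  # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None  # units producing Fj / Fk (None = no pending producer)
    qk: Optional[str] = None
    rj: bool = False          # source register has its value and is not yet read
    rk: bool = False


class Scoreboard:
    def __init__(self, unit_names):
        self.fu = {name: FUStatus() for name in unit_names}
        self.result = {}  # register result status: register -> unit that will write it

    def can_issue(self, unit, dst):
        # Structural hazard: the unit is busy. WAW hazard: another unit targets dst.
        return not self.fu[unit].busy and dst not in self.result

    def issue(self, unit, op, dst, src1, src2):
        if not self.can_issue(unit, dst):
            return False  # stall in the issue stage
        qj, qk = self.result.get(src1), self.result.get(src2)
        self.fu[unit] = FUStatus(True, op, dst, src1, src2, qj, qk,
                                 qj is None, qk is None)
        self.result[dst] = unit
        return True

    def can_read_operands(self, unit):
        # RAW hazards are resolved here: wait until both sources are available.
        status = self.fu[unit]
        return status.rj and status.rk
```

As a usage example with made-up register numbers: after issuing a Load to the Integer unit that writes `F2`, a Sub issued to the Add unit that reads `F2` gets `Qk = "Integer"` and `Rk = False`, so it waits in read operands rather than blocking issue.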


Example: Instruction Flow

Instructions (program order):

L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0,F2,F
SUB.D F8,F6,F
DIV.D F10,F0,F
ADD.D F6,F8,F

[Figure: instructions flow through Issue, RO, Execute, and Write Back; a WAR hazard is marked.]

- Function unit latencies: FPADD = 2 cycles, FPMULT = 5 cycles, FPDIV = 15 cycles, FPLOAD = 2 cycles, Integer = 1 cycle
- Cannot read and write a register in the same cycle
- All units except FPDIV are pipelined


The Essence of Register Renaming

Original code:

DIV.D F0, F2, F
ADD.D F6, F0, F8
S.D F6, 0(R1)
SUB.D F8, F10, F14
MUL.D F6, F10, F

Hazards: WAR (SUB.D writes F8, read earlier by ADD.D; MUL.D writes F6, read earlier by S.D) and WAW (ADD.D and MUL.D both write F6).

After renaming with temporaries S and T:

DIV.D F0, F2, F
ADD.D S, F0, F8
S.D S, 0(R1)
SUB.D T, F10, F14
MUL.D F6, F10, T

- Compiler-based renaming
- Compiler analysis to provide analysis beyond code block
- May extend capabilities beyond that of the compiler (# of reservation stations)
- Note that many forms of storage are used in register renaming
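The renaming step can be sketched as a single pass that gives every destination a fresh name and rewrites later sources to the latest name. This is an illustrative sketch only: the names `P0`, `P1`, ... are hypothetical, and because the last operand of `DIV.D` and `MUL.D` is truncated in the listing above, `F4` and `F8` are assumed here purely so the example runs. Unlike the slide, which renames only the conflicting registers (to `S` and `T`), this version renames every destination.

```python
import itertools

# Assumed operands: F4 (DIV.D) and F8 (MUL.D) are placeholders for truncated text.
PROGRAM = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F6", "F0", "F8"),
    ("S.D", None, "F6", "R1"),    # store: no destination register
    ("SUB.D", "F8", "F10", "F14"),
    ("MUL.D", "F6", "F10", "F8"),
]


def rename(program):
    """program: list of (op, dst, src1, src2); dst is None for stores.

    Returns the program with each destination given a fresh name and each
    source rewritten to the most recent name of its register.
    """
    fresh = (f"P{n}" for n in itertools.count())
    current = {}  # architectural register -> latest name
    renamed = []
    for op, dst, src1, src2 in program:
        src1 = current.get(src1, src1)  # true (RAW) dependences are preserved
        src2 = current.get(src2, src2)
        if dst is not None:
            current[dst] = next(fresh)  # fresh name removes WAR and WAW on dst
            dst = current[dst]
        renamed.append((op, dst, src1, src2))
    return renamed
```

After `rename(PROGRAM)`, no name is written twice and no instruction writes a name that an earlier instruction reads, so only the RAW dependences remain.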


Data Structures

- LD/SD buffers act as reservation stations for memory units
- Instruction execution cannot start until all branches resolved
  – Speculation: a more complete framework

Reservation station fields: Op, Qj, Qk, Vj, Vk (values), A, Busy

Register fields: Qi, value
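The reservation station and register fields listed above can be sketched as follows. This is a simplified illustration under assumptions: the class and method names are hypothetical, each instruction has two register sources, and execution itself is not modeled (the caller supplies the result when broadcasting).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RS:
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None  # operand values, once available
    vk: Optional[float] = None
    qj: Optional[str] = None    # station that will produce Vj (None = value present)
    qk: Optional[str] = None
    a: Optional[int] = None     # immediate or effective address (loads and stores)


class Tomasulo:
    def __init__(self, stations, regs):
        self.rs = {name: RS() for name in stations}
        self.value = dict(regs)            # register -> value
        self.qi = {r: None for r in regs}  # register -> station that will write it

    def _operand(self, reg):
        tag = self.qi[reg]
        return (None, tag) if tag is not None else (self.value[reg], None)

    def issue(self, tag, op, dst, src1, src2):
        if self.rs[tag].busy:
            return False                   # structural hazard: no free station
        vj, qj = self._operand(src1)
        vk, qk = self._operand(src2)
        self.rs[tag] = RS(True, op, vj, vk, qj, qk)
        self.qi[dst] = tag                 # dst is now renamed to this station
        return True

    def ready(self, tag):
        station = self.rs[tag]
        return station.busy and station.qj is None and station.qk is None

    def broadcast(self, tag, result):
        """Deliver a result to every waiting station and register, then free the station."""
        for station in self.rs.values():
            if station.qj == tag:
                station.vj, station.qj = result, None
            if station.qk == tag:
                station.vk, station.qk = result, None
        for reg, producer in self.qi.items():
            if producer == tag:
                self.value[reg], self.qi[reg] = result, None
        self.rs[tag] = RS()
```

An instruction that depends on a pending result records the producer's station name in `Qj` or `Qk` at issue and becomes ready only after the matching `broadcast`.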


Memory Disambiguation

- Detection of RAW dependencies through memory

  SD F6, 44(R4)
  LD F8, 32(R8)

  RAW dependency?

- Loads must be checked with preceding stores (RAW)
- Stores must be checked with preceding loads and stores (WAR and WAW, respectively)
- A simple scheme: all effective address calculations are performed in program order
  – Buffers' A field stores effective address
  – Can use forwarding directly to/from load/store buffers
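The address checks described above reduce to comparing effective addresses held in the buffers' A fields. The sketch below is a simplification with hypothetical function names and made-up register contents; it assumes all accesses are the same size and aligned, so two accesses conflict exactly when their addresses are equal.

```python
def effective_address(offset, base, regs):
    """Effective address of offset(base), e.g. 44(R4)."""
    return offset + regs[base]


def load_may_proceed(load_addr, older_store_addrs):
    """A load must wait if an earlier, still-pending store has the same address (RAW)."""
    return load_addr not in older_store_addrs


def store_may_proceed(store_addr, older_load_addrs, older_store_addrs):
    """A store must wait for earlier pending loads (WAR) and stores (WAW) to its address."""
    return (store_addr not in older_load_addrs
            and store_addr not in older_store_addrs)


# Hypothetical register contents chosen so that 44(R4) and 32(R8) alias.
regs = {"R4": 100, "R8": 112}
sd_addr = effective_address(44, "R4", regs)  # SD F6, 44(R4)
ld_addr = effective_address(32, "R8", regs)  # LD F8, 32(R8)
raw_through_memory = not load_may_proceed(ld_addr, {sd_addr})
```

Whether the `SD`/`LD` pair above has a RAW dependency cannot be decided from the instruction text alone; it depends on the run-time values of `R4` and `R8`, which is why the check is done on computed addresses.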


What Next?

- Reading: 3.6
- Increase the issue rate!
- Now issue multiple instructions/cycle
  – Issue restrictions simplify control
- Increase in
  – Forwarding logic complexity
  – Importance of branch prediction mechanisms
  – Hardware for concurrent decoding and execution

Multiple issue:
  – Superscalar: statically scheduled or dynamically scheduled
  – VLIW/EPIC: statically scheduled


Design Issues

- Issue packet
- Issue restrictions
  – Motivation
    – Match the hardware
    – Trade-off complexity vs. performance
  – Enforcement
  – Impact on penalties
- Multiple issue
  – Checking within and across packets
  – Pipelining the issue logic

[Figure: an issue packet of four consecutive instructions starting at instruction i. Issue and fetch multiple instructions per clock cycle.]
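Checking within and across packets can be sketched as a scan over the packet in program order. The issue restriction used here is a hypothetical one chosen for simplicity: an instruction may not read or write a register written by an earlier instruction in the same packet or by one still pending from an earlier packet, and issue stops at the first violation to stay in order.

```python
def issuable_prefix(packet, pending_dsts=frozenset()):
    """Count how many leading instructions of a packet can issue together.

    packet: list of (dst, srcs) in program order; dst may be None.
    pending_dsts: registers still to be written by earlier packets.
    """
    written = set(pending_dsts)
    count = 0
    for dst, srcs in packet:
        # RAW on a source or WAW on the destination blocks this instruction
        # and, because issue is in order, everything after it.
        if dst in written or any(src in written for src in srcs):
            break
        if dst is not None:
            written.add(dst)
        count += 1
    return count
```

For a four-instruction packet whose third instruction reads the first instruction's destination, only the first two issue in that cycle. A stricter or looser restriction changes how often packets are split, which is the complexity versus performance trade-off noted above.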


Dynamic Scheduling with Multiple Issue

- Widen the issue logic
- Boost instruction issue (remain in-order!) by using reservation stations to move dependence handling to run-time
- Match between available functional units, distribution of dependencies, and amount of real work determines achievable performance
- Examples