Pipeline Control and Data Flow: Scoreboard Example, Study notes of Computer Architecture and Organization

These notes work through an example of the scoreboard method used in pipeline control to manage data flow between functional units and registers. They cover the stages of scoreboard control (issue, read operands, execute, write result) and the instruction status they produce, along with pipeline latencies and the impact of structural hazards on instruction execution.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-2y0
Dynamic Scheduling

• Can’t eliminate true dependences - try to work around them and avoid stalls

  • Forwarding helps to avoid hazards (stalls) but it can’t always mask the delay. E.g., a load that misses the cache.

• Hardware rearranges instruction execution to reduce stalls of dependent instructions (increases ILP) - permit out-of-order execution.

  • Handles dependences not known at compile-time
  • Allows code compiled for one pipeline to run efficiently on another pipeline

Concepts of Dynamic Scheduling

• Consider MIPS pipeline with in-order issue:

DIVD F0,F2,F4
ADDD F10,F0,F8     (true dependence on F0 from DIVD: STALL)
SUBD F12,F8,F14

• SUBD stalls yet isn’t dependent on ADDD or DIVD

• Suppose we let SUBD “move around” the stall and execute out of order?

New Hazards Possible

• Out-of-order execution leads to WAR hazards

DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14     (WAR: SUBD writes F8, which ADDD still has to read)

• Of course, WAW hazards are now also possible
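The three hazard kinds on these slides can be checked mechanically from register names alone. The sketch below is only an illustration: the `hazards` function and the tuple layout `(op, dest, src1, src2)` are assumptions made for this example, not part of the original notes, and the third operand of the modified SUBD is assumed to be F14 as in the earlier sequence.

```python
def hazards(earlier, later):
    """Name the hazards between two instructions given in program order."""
    _, d1, *src1 = earlier
    _, d2, *src2 = later
    found = set()
    if d1 in src2:      # later reads what earlier writes
        found.add("RAW")
    if d2 in src1:      # later overwrites what earlier still has to read
        found.add("WAR")
    if d1 == d2:        # both write the same register
        found.add("WAW")
    return found


divd = ("DIVD", "F0", "F2", "F4")
addd = ("ADDD", "F10", "F0", "F8")
subd_independent = ("SUBD", "F12", "F8", "F14")
subd_war = ("SUBD", "F8", "F8", "F14")   # third operand assumed

print(hazards(divd, addd))               # ADDD truly depends on DIVD
print(hazards(addd, subd_independent))   # nothing stops SUBD from moving ahead
print(hazards(addd, subd_war))           # moving SUBD ahead would clobber F8
```

The second call returns an empty set, which is why stalling SUBD behind ADDD is wasted time; the third shows the hazard that appears once SUBD is allowed to run early.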

Scoreboarding

• Keep track of instructions, functional units, and registers to handle hazards

• Goal: CPI=1 - “pull out” independent instructions later in the “issue window”.

• Wait queue after issue to hold stalled instructions waiting for operands (it may not really exist)

• Multiple or pipelined functional units (recall: during issue, we stall on a structural hazard)

What does the Scoreboard do???

• Records data dependencies of instructions

• Determines when an instruction can read its operands and begin execution

• Can’t execute? Monitor state of operands to decide when instruction can execute

• Determines when an instruction can write its result

⇒ Scoreboard does all hazard detection and resolution

Step 0: Fetch (F)

Step 1: Issue (I)

Step 2: Read Operands (RO)

Step 3: Execute (EX)

[Figure, repeated on each step's slide. Labels: IF Queue, ID, Wait Queue, Scoreboard; Register File; functional units Int ALU, Int ALU, FP multiply, FP divide, Int Divide; write result to memory or registers.]

Hazard Detection

• Structural hazards - in I, ensure that a functional unit is available (makes a “reservation”)

• WAW hazards - in I, ensure that no previous active instruction has the same destination

• RAW hazards - in RO, check that no previous active instruction writes a source register

• WAR hazards - in WR, before writing a result, check if any previous instructions that haven’t gone past RO need that register as a source
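The issue-stage and read-operands checks listed above can be sketched as two small predicates. This is a minimal illustration, not the notes' own implementation: the function names, the dictionary layout, and the hypothetical instruction used for the WAW case are assumptions. The state is transcribed from the CLOCK 2 and CLOCK 8 snapshots later in these notes.

```python
def can_issue(candidate_units, dest, fu_status, result_status):
    """Issue (I): need a free unit (structural) and no pending write to dest (WAW)."""
    has_free_unit = any(fu_status[u] is None for u in candidate_units)
    return has_free_unit and dest not in result_status


def can_read_operands(entry):
    """Read operands (RO): both sources must be ready (no outstanding RAW)."""
    return entry["Rj"] and entry["Rk"]


# CLOCK 2: the first LD occupies the Integer unit and will write F6.
fu_status = {
    "Integer": {"Fi": "F6", "Fk": "R2"},
    "Mult1": None, "Mult2": None, "Add": None, "Divide": None,
}
result_status = {"F6": "Integer"}

# The second LD needs the Integer unit: structural hazard, so it cannot issue.
second_ld_ok = can_issue(["Integer"], "F2", fu_status, result_status)
# Hypothetical multiply that writes F6: a unit is free, but F6 is pending (WAW).
waw_ok = can_issue(["Mult1", "Mult2"], "F6", fu_status, result_status)
# A multiply writing F0 has neither problem.
f0_ok = can_issue(["Mult1", "Mult2"], "F0", fu_status, result_status)

# Mult1 before and after the second LD writes F2 (CLOCK 8(a) and 8(b)).
mult_before = {"Fi": "F0", "Fj": "F2", "Fk": "F4", "Rj": False, "Rk": True}
mult_after = {"Fi": "F0", "Fj": "F2", "Fk": "F4", "Rj": True, "Rk": True}

print(second_ld_ok, waw_ok, f0_ok)
print(can_read_operands(mult_before), can_read_operands(mult_after))
```

These predicates only test hazards. With in-order issue, an instruction also waits for every earlier instruction to issue first, which the sketch does not model.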

Scoreboard Pipeline Control

Status      Wait until                        Bookkeeping
Issue       FU not busy && not result         Busy(FU)←Y; Op(FU)←op; Fi(FU)←D;
                                              Fj(FU)←S1; Fk(FU)←S2; Qj←Res(S1);
                                              Qk←Res(S2); Rj←!Qj; Rk←!Qk;
                                              Res(D)←FU
Read ops    Rj && Rk                          Rj←N; Rk←N
Executed    FU done
Write dest  ∀f((Fj(f)≠Fi(FU) || Rj(f)=N) &&   ∀f(if Qj(f)=FU, Rj(f)←Y);
            (Fk(f)≠Fi(FU) || Rk(f)=N))        ∀f(if Qk(f)=FU, Rk(f)←Y);
                                              Result(Fi(FU))←0; Busy(FU)←N

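Write Destination

Wait until:   ∀f((Fj(f)≠Fi(FU) || Rj(f)=N) && (Fk(f)≠Fi(FU) || Rk(f)=N))

Bookkeeping:  ∀f(if Qj(f)=FU, Rj(f)←Y); ∀f(if Qk(f)=FU, Rk(f)←Y);
              Result(Fi(FU))←0; Busy(FU)←N

Wait until condition says, for every other unit f:

Fj(f)≠Fi(FU)   does this FU write a result used by another FU? If no unit uses the destination as a source, the write is safe.

Rj(f)=N        is the other FU waiting for this result? If so (or if it has already read the operand), it does not need the old value, so the write is safe.

The same two tests apply to the second source through Fk(f) and Rk(f).

Bookkeeping says:

if Qj(f)=FU, Rj(f)←Y     set register ready for all consumer FUs (likewise Qk, Rk)

Result(Fi(FU))←0;        clear entry in the register result table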

Scoreboard Example

Code Sequence

LD F6, 34(R2)

LD F2, 45(R3)

MULT F0, F2, F4

SUBD F8, F6, F2

DIVD F10, F0, F6

ADDD F6, F8, F2

Functional Units

1 Integer, 2 FP multipliers, 1 FP adder, 1 FP divider

Pipeline Latencies

LD 1 cycle

MULT 10 cycles

SUBD, ADDD 2 cycles

DIVD 40 cycles
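The example can be replayed with a small simulator built from the wait conditions and bookkeeping in the pipeline control table. This is a sketch under stated assumptions rather than the notes' own tool: every cycle's decisions are taken on the state left by the previous cycle, at most one instruction issues per cycle, and an instruction finishes execution `latency` cycles after reading its operands. The names `simulate`, `PROGRAM`, `KIND`, and `UNITS` are invented for this illustration.

```python
LATENCY = {"LD": 1, "MULT": 10, "SUBD": 2, "ADDD": 2, "DIVD": 40}
KIND = {"LD": "Integer", "MULT": "Mult", "SUBD": "Add", "ADDD": "Add", "DIVD": "Divide"}
UNITS = {"Integer": "Integer", "Mult1": "Mult", "Mult2": "Mult",
         "Add": "Add", "Divide": "Divide"}

# (op, destination, source 1, source 2); a load's offset is not a register.
PROGRAM = [
    ("LD", "F6", None, "R2"),
    ("LD", "F2", None, "R3"),
    ("MULT", "F0", "F2", "F4"),
    ("SUBD", "F8", "F6", "F2"),
    ("DIVD", "F10", "F0", "F6"),
    ("ADDD", "F6", "F8", "F2"),
]


def simulate(program):
    times = [dict(issue=None, read=None, finish=None, write=None) for _ in program]
    fu = {}       # busy units only: name -> Fi, Fj, Fk, Qj, Qk, Rj, Rk, idx
    result = {}   # register result status: register -> producing unit
    next_issue = clock = 0
    while any(t["write"] is None for t in times):
        clock += 1
        actions = []  # decided on the start-of-cycle state, applied afterwards
        if next_issue < len(program):
            op, d, _, _ = program[next_issue]
            free = [u for u, k in UNITS.items() if k == KIND[op] and u not in fu]
            if free and d not in result:          # structural and WAW checks
                actions.append(("issue", free[0], next_issue))
        for u, e in fu.items():
            t = times[e["idx"]]
            if t["read"] is None:
                if e["Rj"] and e["Rk"]:           # RAW check
                    actions.append(("read", u, e["idx"]))
            elif t["finish"] is None:
                if clock == t["read"] + LATENCY[program[e["idx"]][0]]:
                    actions.append(("finish", u, e["idx"]))
            elif all((o["Fj"] != e["Fi"] or not o["Rj"]) and
                     (o["Fk"] != e["Fi"] or not o["Rk"])
                     for n, o in fu.items() if n != u):   # WAR check
                actions.append(("write", u, e["idx"]))
        for kind, u, i in actions:
            op, d, s1, s2 = program[i]
            if kind == "issue":
                qj, qk = result.get(s1), result.get(s2)
                fu[u] = dict(idx=i, Fi=d, Fj=s1, Fk=s2, Qj=qj, Qk=qk,
                             Rj=qj is None, Rk=qk is None)
                result[d] = u
                next_issue += 1
            elif kind == "read":
                fu[u]["Rj"] = fu[u]["Rk"] = False
            elif kind == "write":
                for o in fu.values():             # wake up waiting consumers
                    if o["Qj"] == u:
                        o["Qj"], o["Rj"] = None, True
                    if o["Qk"] == u:
                        o["Qk"], o["Rk"] = None, True
                del result[d]
                del fu[u]
            times[i][kind] = clock
    return times


TIMES = simulate(PROGRAM)
for (op, d, _, _), t in zip(PROGRAM, TIMES):
    print(f"{op:5}{d:4}", t["issue"], t["read"], t["finish"], t["write"])
```

Under these assumptions the sketch reproduces the issue, read, finish, and write cycles shown in the snapshots that follow (up to clock 17). Cycles it reports beyond that point come from the sketch itself, not from these notes.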

CLOCK 2

Issue 2nd LD?

Instruction status
Op    d    j    k    Issue  Read  Finish  Write
LD    F6   34+  R2   1      2
LD    F2   45+  R3
MULT  F0   F2   F4
SUBD  F8   F6   F2
DIVD  F10  F0   F6
ADDD  F6   F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj   Qk   Rj   Rk
      Integer  Yes   Ld    F6        R2                  Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
      F0     F2   F4   F6   F8   F10  F12  ...
FU                     Int

CLOCK 3 - single cycle load completes

What if we had a multi-cycle load???

Instruction status
Op    d    j    k    Issue  Read  Finish  Write
LD    F6   34+  R2   1      2     3
LD    F2   45+  R3
MULT  F0   F2   F4
SUBD  F8   F6   F2
DIVD  F10  F0   F6
ADDD  F6   F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj   Qk   Rj   Rk
      Integer  Yes   Ld    F6        R2                  Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
      F0     F2   F4   F6   F8   F10  F12  ...
FU                     Int

CLOCK 4 - load completes, write result (WAR?)

Instruction status
Op    d    j    k    Issue  Read  Finish  Write
LD    F6   34+  R2   1      2     3       4
LD    F2   45+  R3
MULT  F0   F2   F4
SUBD  F8   F6   F2
DIVD  F10  F0   F6
ADDD  F6   F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj   Qk   Rj   Rk
      Integer  Yes   Ld    F6        R2                  Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
      F0     F2   F4   F6   F8   F10  F12  ...
FU                     Int

CLOCK 5 - structural hazard for Int unit resolved

Instruction status
Op    d    j    k    Issue  Read  Finish  Write
LD    F6   34+  R2   1      2     3       4
LD    F2   45+  R3   5
MULT  F0   F2   F4
SUBD  F8   F6   F2
DIVD  F10  F0   F6
ADDD  F6   F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj   Qk   Rj   Rk
      Integer  Yes   Ld    F2        R3                  Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No

Register result status
      F0     F2   F4   F6   F8   F10  F12  ...
FU           Int

CLOCK 8(a) - immediately before LD completes

Instruction status
Op    d    j    k    Issue  Read  Finish  Write
LD    F6   34+  R2   1      2     3       4
LD    F2   45+  R3   5      6     7
MULT  F0   F2   F4   6
SUBD  F8   F6   F2   7
DIVD  F10  F0   F6   8
ADDD  F6   F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj   Qk   Rj   Rk
      Integer  Yes   Ld    F2        R3                  Yes
      Mult1    Yes   Mult  F0   F2   F4   Int       No   Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2        Int  Yes  No
      Divide   Yes   Div   F10  F0   F6   M1        No   Yes

Register result status
      F0     F2   F4   F6   F8   F10  F12  ...
FU    Mult1  Int            Add  Div

CLOCK 8(b) - the LD completes. M1 and ADD ready.

Instruction status
Op    d    j    k    Issue  Read  Finish  Write
LD    F6   34+  R2   1      2     3       4
LD    F2   45+  R3   5      6     7       8
MULT  F0   F2   F4   6
SUBD  F8   F6   F2   7
DIVD  F10  F0   F6   8
ADDD  F6   F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj   Qk   Rj   Rk
      Integer  No
      Mult1    Yes   Mult  F0   F2   F4             Yes  Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2             Yes  Yes
      Divide   Yes   Div   F10  F0   F6   M1        No   Yes

Register result status
      F0     F2   F4   F6   F8   F10  F12  ...
FU    Mult1                 Add  Div

CLOCK 9 - MULT and SUBD’s operands ready, go!

Can the ADDD issue next cycle? What happens?

Instruction status
Op    d    j    k    Issue  Read  Finish  Write
LD    F6   34+  R2   1      2     3       4
LD    F2   45+  R3   5      6     7       8
MULT  F0   F2   F4   6      9
SUBD  F8   F6   F2   7      9
DIVD  F10  F0   F6   8
ADDD  F6   F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj   Qk   Rj   Rk
      Integer  No
10    Mult1    Yes   Mult  F0   F2   F4             Yes  Yes
      Mult2    No
2     Add      Yes   Sub   F8   F6   F2             Yes  Yes
      Divide   Yes   Div   F10  F0   F6   M1        No   Yes

Register result status
      F0     F2   F4   F6   F8   F10  F12  ...
FU    Mult1                 Add  Div

CLOCK 11 - SUBD finishes before MULT

Can SUBD write its result?

Instruction status
Op    d    j    k    Issue  Read  Finish  Write
LD    F6   34+  R2   1      2     3       4
LD    F2   45+  R3   5      6     7       8
MULT  F0   F2   F4   6      9
SUBD  F8   F6   F2   7      9     11
DIVD  F10  F0   F6   8
ADDD  F6   F8   F2

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj   Qk   Rj   Rk
      Integer  No
8     Mult1    Yes   Mult  F0   F2   F4             Yes  Yes
      Mult2    No
0     Add      Yes   Sub   F8   F6   F2             Yes  Yes
      Divide   Yes   Div   F10  F0   F6   M1        No   Yes

Register result status
      F0     F2   F4   F6   F8   F10  F12  ...
FU    Mult1                 Add  Div

CLOCK 14 - ADDD reads operands

Instruction status
Op    d    j    k    Issue  Read  Finish  Write
LD    F6   34+  R2   1      2     3       4
LD    F2   45+  R3   5      6     7       8
MULT  F0   F2   F4   6      9
SUBD  F8   F6   F2   7      9     11      12
DIVD  F10  F0   F6   8
ADDD  F6   F8   F2   13     14

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj   Qk   Rj   Rk
      Integer  No
5     Mult1    Yes   Mult  F0   F2   F4             Yes  Yes
      Mult2    No
2     Add      Yes   Add   F6   F8   F2             Yes  Yes
      Divide   Yes   Div   F10  F0   F6   M1        No   Yes

Register result status
      F0     F2   F4   F6   F8   F10  F12  ...
FU    Mult1            Add       Div

CLOCK 15

Instruction status
Op    d    j    k    Issue  Read  Finish  Write
LD    F6   34+  R2   1      2     3       4
LD    F2   45+  R3   5      6     7       8
MULT  F0   F2   F4   6      9
SUBD  F8   F6   F2   7      9     11      12
DIVD  F10  F0   F6   8
ADDD  F6   F8   F2   13     14

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj   Qk   Rj   Rk
      Integer  No
4     Mult1    Yes   Mult  F0   F2   F4             Yes  Yes
      Mult2    No
1     Add      Yes   Add   F6   F8   F2             Yes  Yes
      Divide   Yes   Div   F10  F0   F6   M1        No   Yes

Register result status
      F0     F2   F4   F6   F8   F10  F12  ...
FU    Mult1            Add       Div

CLOCK 16 - ADDD finishes

Instruction status
Op    d    j    k    Issue  Read  Finish  Write
LD    F6   34+  R2   1      2     3       4
LD    F2   45+  R3   5      6     7       8
MULT  F0   F2   F4   6      9
SUBD  F8   F6   F2   7      9     11      12
DIVD  F10  F0   F6   8
ADDD  F6   F8   F2   13     14    16

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj   Qk   Rj   Rk
      Integer  No
3     Mult1    Yes   Mult  F0   F2   F4             Yes  Yes
      Mult2    No
0     Add      Yes   Add   F6   F8   F2             Yes  Yes
      Divide   Yes   Div   F10  F0   F6   M1        No   Yes

Register result status
      F0     F2   F4   F6   F8   F10  F12  ...
FU    Mult1            Add       Div

CLOCK 17

Can we write result of ADDD? What about the Add FU?

Instruction status
Op    d    j    k    Issue  Read  Finish  Write
LD    F6   34+  R2   1      2     3       4
LD    F2   45+  R3   5      6     7       8
MULT  F0   F2   F4   6      9
SUBD  F8   F6   F2   7      9     11      12
DIVD  F10  F0   F6   8
ADDD  F6   F8   F2   13     14    16

Functional unit status
Time  Name     Busy  Op    Fi   Fj   Fk   Qj   Qk   Rj   Rk
      Integer  No
2     Mult1    Yes   Mult  F0   F2   F4             Yes  Yes
      Mult2    No
      Add      Yes   Add   F6   F8   F2             Yes  Yes
      Divide   Yes   Div   F10  F0   F6   M1        No   Yes

Register result status
      F0     F2   F4   F6   F8   F10  F12  ...
FU    Mult1            Add       Div
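The closing question can be answered by evaluating the "Write dest" wait condition on the CLOCK 17 functional unit table. The sketch below is an illustration only: `can_write` and the dictionary layout are assumptions made for this example, and the Rj/Rk values are transcribed from the snapshot as shown.

```python
def can_write(unit, fu_status):
    """WAR check: no other busy unit still has to read the old value of Fi(unit)."""
    fi = fu_status[unit]["Fi"]
    return all(
        (o["Fj"] != fi or not o["Rj"]) and (o["Fk"] != fi or not o["Rk"])
        for name, o in fu_status.items()
        if name != unit and o is not None
    )


# Functional unit status at CLOCK 17 (free units are None).
clock17 = {
    "Integer": None,
    "Mult1": dict(Fi="F0", Fj="F2", Fk="F4", Rj=True, Rk=True),
    "Mult2": None,
    "Add": dict(Fi="F6", Fj="F8", Fk="F2", Rj=True, Rk=True),
    "Divide": dict(Fi="F10", Fj="F0", Fk="F6", Rj=False, Rk=True),
}

print(can_write("Add", clock17))   # DIVD still needs the old F6

# Once DIVD has read its operands, the bookkeeping sets Rj and Rk to N.
after_divd_reads = dict(clock17, Divide=dict(clock17["Divide"], Rj=False, Rk=False))
print(can_write("Add", after_divd_reads))
```

The first check fails because the Divide unit has Fk = F6 with Rk = Yes: DIVD is still waiting on F0 from Mult1 and has not read F6 yet. ADDD therefore holds its result, and the Add unit stays busy, until DIVD reads its operands.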