Comparing ILP and Dynamic Scheduling in CDC 6600 and IBM 360/91 with Tomasulo's Algorithm, Study notes of Electrical and Electronics Engineering

An in-depth comparison of instruction-level parallelism (ILP) and dynamic scheduling techniques in computer architecture through an analysis of the CDC 6600 and the IBM 360/91 (Tomasulo's algorithm). It covers pipeline architecture, reservation stations, scoreboards, and register renaming.

Typology: Study notes

Pre 2010

Uploaded on 03/18/2009

koofers-user-250
ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE, NC State University

High-level view

❍ Out-of-order pipeline
  ‹ Two decoupled pipelines: fetch/dispatch and issue/execute
  ‹ Pipelines decoupled by buffers
    ◊ Many names: reservation stations, issue queues/buffers, scheduling queues
    ◊ “instruction window”

Instruction fetch/dispatch pipeline (in-order): fetch, decode, data dependence checking and register renaming, then DISPATCH (insert into window)
WINDOW
Instruction issue/execute pipeline (out-of-order): ISSUE (take from window), then issue, execute, complete (writeback)
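
The decoupling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `run_window`, `ready_at`, and `window_size` are hypothetical, at most one instruction is dispatched and one issued per cycle, and the cycle at which each instruction's operands become ready is simply given rather than computed.

```python
from collections import deque

def run_window(program, ready_at, window_size=4):
    """Dispatch in program order into a bounded window; issue any ready entry.

    program   -- instruction names in program order
    ready_at  -- name -> first cycle at which its operands are ready (assumed given)
    Returns the list of (cycle, name) issue events.
    """
    fetch_queue = deque(program)
    window = []      # dispatched but not yet issued
    issued = []
    cycle = 0
    while fetch_queue or window:
        # Issue side (out of order): take the oldest entry that is ready.
        for name in window:
            if ready_at[name] <= cycle:
                window.remove(name)
                issued.append((cycle, name))
                break
        # Dispatch side (in order): insert one instruction if there is room.
        if fetch_queue and len(window) < window_size:
            window.append(fetch_queue.popleft())
        cycle += 1
    return issued

# A is dispatched first but its operands arrive late, so B and C issue ahead of it.
print(run_window(["A", "B", "C"], {"A": 5, "B": 1, "C": 2}))
```

Dispatch order is always A, B, C, while the issue order depends only on readiness, which is the sense in which the window decouples the two pipelines.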

Early dynamically scheduled machines

❍ CDC 6600
  ‹ Centralized control: “CDC scoreboard”
  ‹ Many replicated functional units
  ‹ All values pass through the register file
  ‹ Stall on WAR/WAW hazards
❍ IBM 360/91 (Tomasulo’s algorithm)
  ‹ Distributed control: “reservation stations”
  ‹ Several, fully-pipelined functional units (equivalent to replicating functional units)
  ‹ Values broadcast to waiting instructions and register file in parallel (via the Common Data Bus)
  ‹ Introduced register renaming: handles WAR/WAW hazards without stalling

CDC 6600 (MIPS version)

❍ Four stages after fetch
  ‹ Dispatch
    ◊ Check for structural and WAW hazards
      • Structural: Stall in dispatch stage if FU busy.
      • WAW: Stall in dispatch stage if an outstanding instruction in the scoreboard writes the same destination register.
    ◊ Enter instruction into scoreboard and determine data dependences.
    ◊ Route instruction to a free FU, where it waits until data operands are available.
  ‹ Issue
    ◊ Wait for operands to become ready.
    ◊ Scoreboard signals when operands are ready.
      • Instruction reads registers from the register file,
      • then issues to FU for execution.
  ‹ Execute
  ‹ Write result
    ◊ Check for WAR hazard. Stall if an outstanding prior instruction in the scoreboard reads the same register being written, and the read has not yet taken place.
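
The stall conditions at the dispatch and write-result stages can be written as two small predicates. This is a sketch, not the actual CDC 6600 control logic: the function names and the way outstanding reads and writes are passed in (plain sets and a dict) are assumptions made for illustration.

```python
def can_dispatch(fu_busy, pending_writes, fu, dest):
    """Dispatch-stage check: stall on a structural or WAW hazard.

    fu_busy        -- FU name -> True if the unit is in use
    pending_writes -- destination registers of outstanding instructions
    """
    structural = fu_busy.get(fu, False)   # requested FU is already busy
    waw = dest in pending_writes          # an outstanding instruction writes dest
    return not (structural or waw)

def can_write_result(dest, pending_reads):
    """Write-result check: stall on a WAR hazard.

    pending_reads -- source registers of prior outstanding instructions
                     whose register read has not yet taken place
    """
    return dest not in pending_reads

# ADD.D wants to write F6 while an earlier DIV.D has not yet read F6: stall.
print(can_write_result("F6", {"F0", "F6"}))
```

Note that the RAW case does not appear here: it is handled by waiting in the issue stage until the scoreboard signals that the operands are ready.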

Warning – H&P naming

❍ H&P uses different names from the ones we will use

  We             They
  Dispatch       Issue
  Issue          Read operands
  Execute        Execute
  Write result   Write result

CDC 6600 (MIPS version)

[Block diagram: Registers; functional units FP MULT (1), FP MULT (2), FP DIV, FP ADD, and Integer Unit (integer MIPS pipeline); Scoreboard connected by control/status lines.]

Scoreboard

❍ Three data structures
  1. Instruction status
     • Which stage the instruction is in
  2. Functional unit status
     • Busy - FU is busy executing an instruction
     • Op - what instruction is the FU busy with
     • Fi - destination register
     • Fj, Fk - source registers
     • Qj, Qk - functional units producing src regs
     • Rj, Rk - flags indicating src regs are ready
  3. Register result status
     • Which FU is going to write each register
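
The three structures map naturally onto a few records. The sketch below is an assumption-laden illustration: the class and method names are hypothetical, hazard checks are assumed to have passed already, and a load's single source register is arbitrarily placed in `fj`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None   # FUs producing the sources (None: no pending producer)
    qk: Optional[str] = None
    rj: bool = False           # source-ready flags
    rk: bool = False

@dataclass
class Scoreboard:
    instr_stage: Dict[str, str] = field(default_factory=dict)   # 1. instruction status
    fu: Dict[str, FUStatus] = field(default_factory=dict)       # 2. functional-unit status
    reg_result: Dict[str, str] = field(default_factory=dict)    # 3. register-result status

    def dispatch(self, label, fu_name, op, fi, fj, fk):
        """Record a newly dispatched instruction (hazard checks assumed done)."""
        qj = self.reg_result.get(fj)   # who, if anyone, will produce each source
        qk = self.reg_result.get(fk)
        self.fu[fu_name] = FUStatus(True, op, fi, fj, fk, qj, qk, qj is None, qk is None)
        self.reg_result[fi] = fu_name
        self.instr_stage[label] = "DISPATCH"

sb = Scoreboard()
sb.dispatch("L.D F2", "Integer", "L.D", "F2", "R3", None)
sb.dispatch("MUL.D F0", "MULT1", "MUL.D", "F0", "F2", "F4")
print(sb.fu["MULT1"])
print(sb.reg_result)
```

After these two dispatches the MULT1 entry has Qj = Integer, Rj = no, Rk = yes, because F2 is still owed by the load in the Integer unit while F4 has no pending producer.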

CDC 6600 example (2)

Instruction status (stage columns: DISPATCH, ISSUE, EXECUTE, WRITE RESULT)
  L.D   F6,  34(R2)
  L.D   F2,  45(R3)
  MUL.D F0,  F2, F4
  SUB.D F8,  F6, F2
  DIV.D F10, F0, F6
  ADD.D F6,  F8, F2

Functional-unit status (fields: FU, busy, op, Fi, Fj, Fk, Qj, Qk, Rj, Rk)
  Integer  yes  L.D  Fi=F6  src=R2  ready=yes
  MULT1    no
  MULT2    no
  ADD      no
  DIV      no

Register-result status (F0, F2, F4, F6, F8, F10, …)
  F6: Integer

❍ On the next cycle, the instr. is issued. What else happens?

CDC 6600 example (3)

Instruction status (stage columns: DISPATCH, ISSUE, EXECUTE, WRITE RESULT)
  L.D   F6,  34(R2)
  L.D   F2,  45(R3)
  MUL.D F0,  F2, F4
  SUB.D F8,  F6, F2
  DIV.D F10, F0, F6
  ADD.D F6,  F8, F2

Functional-unit status (fields: FU, busy, op, Fi, Fj, Fk, Qj, Qk, Rj, Rk)
  Integer  yes  L.D  Fi=F2  src=R3  ready=yes
  MULT1    no
  MULT2    no
  ADD      no
  DIV      no

Register-result status (F0, F2, F4, F6, F8, F10, …)
  F2: Integer

CDC 6600 example (4)

Instruction status (stage columns: DISPATCH, ISSUE, EXECUTE, WRITE RESULT)
  L.D   F6,  34(R2)
  L.D   F2,  45(R3)
  MUL.D F0,  F2, F4
  SUB.D F8,  F6, F2
  DIV.D F10, F0, F6
  ADD.D F6,  F8, F2

Functional-unit status (fields: FU, busy, op, Fi, Fj, Fk, Qj, Qk, Rj, Rk)
  Integer  yes  L.D    Fi=F2  src=R3  ready=yes
  MULT1    yes  MUL.D  Fi=F0  Fj=F2  Fk=F4  Qj=Integer  Rj=no  Rk=yes
  MULT2    no
  ADD      no
  DIV      no

Register-result status (F0, F2, F4, F6, F8, F10, …)
  F0: MULT1
  F2: Integer

CDC 6600 example (5)

Instruction status (stage columns: DISPATCH, ISSUE, EXECUTE, WRITE RESULT)
  L.D   F6,  34(R2)
  L.D   F2,  45(R3)
  MUL.D F0,  F2, F4
  SUB.D F8,  F6, F2
  DIV.D F10, F0, F6
  ADD.D F6,  F8, F2

Functional-unit status (fields: FU, busy, op, Fi, Fj, Fk, Qj, Qk, Rj, Rk)
  Integer  yes  L.D    Fi=F2  src=R3  ready=yes
  MULT1    yes  MUL.D  Fi=F0  Fj=F2  Fk=F4  Qj=Integer  Rj=no   Rk=yes
  MULT2    no
  ADD      yes  SUB.D  Fi=F8  Fj=F6  Fk=F2  Qk=Integer  Rj=yes  Rk=no
  DIV      no

Register-result status (F0, F2, F4, F6, F8, F10, …)
  F0: MULT1
  F2: Integer
  F8: ADD

CDC 6600 example (6)

Instruction status (stage columns: DISPATCH, ISSUE, EXECUTE, WRITE RESULT)
  L.D   F6,  34(R2)
  L.D   F2,  45(R3)
  MUL.D F0,  F2, F4
  SUB.D F8,  F6, F2
  DIV.D F10, F0, F6
  ADD.D F6,  F8, F2

Functional-unit status (fields: FU, busy, op, Fi, Fj, Fk, Qj, Qk, Rj, Rk)
  Integer  yes  L.D    Fi=F2   src=R3  ready=yes
  MULT1    yes  MUL.D  Fi=F0   Fj=F2  Fk=F4  Qj=Integer  Rj=no   Rk=yes
  MULT2    no
  ADD      yes  SUB.D  Fi=F8   Fj=F6  Fk=F2  Qk=Integer  Rj=yes  Rk=no
  DIV      yes  DIV.D  Fi=F10  Fj=F0  Fk=F6  Qj=MULT1    Rj=no   Rk=yes

Register-result status (F0, F2, F4, F6, F8, F10, …)
  F0: MULT1
  F2: Integer
  F8: ADD
  F10: DIV

CDC 6600 example (7)

Instruction status (stage columns: DISPATCH, ISSUE, EXECUTE, WRITE RESULT)
  L.D   F6,  34(R2)
  L.D   F2,  45(R3)
  MUL.D F0,  F2, F4
  SUB.D F8,  F6, F2
  DIV.D F10, F0, F6
  ADD.D F6,  F8, F2

Functional-unit status (fields: FU, busy, op, Fi, Fj, Fk, Qj, Qk, Rj, Rk)
  Integer  yes  L.D    Fi=F2   src=R3  ready=yes
  MULT1    yes  MUL.D  Fi=F0   Fj=F2  Fk=F4  Rj=yes  Rk=yes
  MULT2    no
  ADD      yes  SUB.D  Fi=F8   Fj=F6  Fk=F2  Rj=yes  Rk=yes
  DIV      yes  DIV.D  Fi=F10  Fj=F0  Fk=F6  Qj=MULT1  Rj=no  Rk=yes

Register-result status (F0, F2, F4, F6, F8, F10, …)
  F0: MULT1
  F2: Integer
  F8: ADD
  F10: DIV

CDC 6600 example (11)

❍ MULTD about to write result…

Instruction status (stage columns: DISPATCH, ISSUE, EXECUTE, WRITE RESULT)
  L.D   F6,  34(R2)
  L.D   F2,  45(R3)
  MUL.D F0,  F2, F4     finishing..
  SUB.D F8,  F6, F2
  DIV.D F10, F0, F6     RAW (F0)
  ADD.D F6,  F8, F2     WAR (F6)

Functional-unit status (fields: FU, busy, op, Fi, Fj, Fk, Qj, Qk, Rj, Rk)
  Integer  no
  MULT1    yes  MULTD  Fi=F0   Fj=F2  Fk=F4  Rj=yes  Rk=yes
  MULT2    no
  ADD      yes  ADDD   Fi=F6   Fj=F8  Fk=F2  Rj=yes  Rk=yes
  DIV      yes  DIVD   Fi=F10  Fj=F0  Fk=F6  Qj=MULT1  Rj=no  Rk=yes

Register-result status (F0, F2, F4, F6, F8, F10, …)
  F0: MULT1
  F6: ADD
  F10: DIV

CDC 6600 timing diagram

[Timing diagram: one row per instruction showing its ID, IS, EX, and WR cycles. Shaded boxes indicate stalls; the numbered labels 1-7 mark the hazards listed below.]

  L.D   F6,  34(R2)
  L.D   F2,  45(R3)
  MUL.D F0,  F2, F4
  SUB.D F8,  F6, F2
  DIV.D F10, F0, F6
  ADD.D F6,  F8, F2

Hazards
  Structural: 1. L.D-L.D (Integer unit)   6. SUB.D-ADD.D (ADD unit)
  RAW:        2. L.D-MUL.D (F2)   3. L.D-SUB.D (F2)   4. MUL.D-DIV.D (F0)   5. SUB.D-ADD.D (F8)
  WAR:        7. DIV.D-ADD.D (F6)

=> Execution latencies: L.D (2 – agen + access), MUL.D (10), DIV.D (40), SUB.D/ADD.D (2)

=> Notice there are always 2 cycles between EX of data dependent instructions (e.g., L.D and MUL.D): the producer does WR, and the consumer then does its last IS cycle, in which registers are read from the register file. This is an artifact of the CDC 6600: all values must first pass through the register file (no bypasses).

Remaining bottlenecks

❍ CDC 6600 does a good job of dynamic scheduling around RAW hazards
❍ Remaining performance limitations
  ‹ Amount of instruction-level parallelism (ILP) in the program
    ◊ Maybe not enough data-independent operations
    ◊ Increase size of window to look farther ahead.
    ◊ Above requires branch prediction.
  ‹ Number of scoreboard entries (window size)
    ◊ Dictates how far processor can look ahead
  ‹ Number and type of functional units, register ports, etc.
    ◊ Structural hazards
  ‹ Anti- and output dependences
    ◊ Dynamic scheduling exposes more WAW+WAR hazards because early (OOO) writes are possible
    ◊ WAR made worse in CDC due to late reads (read operands when finally issuing)
    ◊ WAW handled like a structural hazard in dispatch

Tomasulo’s Algorithm

❍ Born of necessity
  ‹ Used in IBM 360/91 floating-point unit
  ‹ Many long-latency operations
    ◊ Need dynamic scheduling: mitigate long stalls
  ‹ ISA specified only 4 floating-point registers
    ◊ Need register renaming: with only 4 registers, WAW/WAR hazards pop up quickly
    ◊ Especially in floating-point code: loops by definition cause repeated writes to same registers
    ◊ Renaming: recognize and give unique names to different dynamic instances of the same register specifier

Key aspects of Tomasulo’s Alg.

1. Read register operands at dispatch stage
   ❍ If operands are available, the data is buffered along with the instruction in “reservation stations”
   ❍ CDC: only buffers the instruction; operands are read from the register file once all of them are ready (operands read at issue stage)
   ❍ CDC – late reads / Tomasulo – early reads: early reads help WAR condition
2. Unavailable registers are renamed at dispatch stage
   ❍ Waiting instructions replace register specifiers with a “tag” indicating the producer instruction
   ❍ Register specifiers are used only once, at dispatch!
   ❍ Renaming eliminates WAR/WAW hazards
3. Successive writes to a register
   ❍ Only last one is actually used to update register: helps WAW condition
   ❍ CDC: stall in dispatch until WAW hazard goes away
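
Points 1 and 2 can be illustrated with a small dispatch routine. This is a hedged sketch rather than the 360/91's actual logic: the function name, the dictionary layout, the tag names (`load1`, `add1`, ...), and the use of F2 as a second ADD.D operand are all assumptions, and the loads' address registers are left out because they belong to the integer side.

```python
def dispatch(op, dest, sources, station, regs, reg_tag, stations):
    """Tomasulo-style dispatch: read available operands now, rename the rest.

    regs     -- register name -> current value
    reg_tag  -- register name -> tag of its pending producer (absent if up to date)
    stations -- tag -> reservation-station entry
    """
    operands = []
    for r in sources:
        if r in reg_tag:
            operands.append(("tag", reg_tag[r]))   # renamed: wait for this producer
        else:
            operands.append(("val", regs[r]))      # early read: copy the value now
    stations[station] = {"op": op, "operands": operands}
    reg_tag[dest] = station   # register specifier is not consulted again after this

regs, reg_tag, stations = {"F2": 2.0}, {}, {}
dispatch("L.D",   "F0", [],           "load1", regs, reg_tag, stations)
dispatch("ADD.D", "F4", ["F0", "F2"], "add1",  regs, reg_tag, stations)
dispatch("L.D",   "F0", [],           "load2", regs, reg_tag, stations)
dispatch("ADD.D", "F8", ["F0", "F2"], "add2",  regs, reg_tag, stations)
print(stations["add1"]["operands"])
print(stations["add2"]["operands"])
```

The second load can be dispatched without stalling even though F0 is still pending: `add1` already holds the tag `load1`, so overwriting F0's tag with `load2` cannot disturb it.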

Other differences with CDC

❍ Distributed control
  ‹ Reservation stations (versus scoreboard)
❍ Results broadcast to both register file and functional units
  ‹ Result bus called the “Common Data Bus” (CDB)
  ‹ Don’t have to wait for value to go through register file (i.e., use bypasses).
  ‹ Functional units don’t contend for register file ports.
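
A result broadcast on the CDB can be sketched as follows. The data layout (operands stored as `("tag", name)` or `("val", value)` pairs, a `reg_tag` map of pending producers) and the function name are assumptions chosen for illustration, not a description of the real hardware.

```python
def broadcast(tag, value, stations, regs, reg_tag):
    """Deliver one result on the Common Data Bus."""
    # Waiting instructions capture the value directly, without a register-file read.
    for entry in stations.values():
        entry["operands"] = [("val", value) if o == ("tag", tag) else o
                             for o in entry["operands"]]
    # In parallel, update only the registers still waiting on this tag.
    for r in [r for r, t in reg_tag.items() if t == tag]:
        regs[r] = value
        del reg_tag[r]

stations = {"add1": {"op": "ADD.D", "operands": [("tag", "load1"), ("val", 2.0)]}}
regs = {"F0": 0.0}
reg_tag = {"F0": "load2"}   # a later load has already claimed F0

broadcast("load1", 7.5, stations, regs, reg_tag)
print(stations["add1"]["operands"], regs, reg_tag)
broadcast("load2", 9.0, stations, regs, reg_tag)
print(regs, reg_tag)
```

The first broadcast feeds `add1` but leaves F0 untouched, because F0 is now waiting on `load2`; only the last of the successive writes reaches the register file.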

Register renaming

❍ Consider simple example with register reuse
❍ Dataflow graph with both true and false dependences
❍ All instructions execute serially
  ‹ Due to reuse of F0 by the loads
  ‹ But those are 2 distinct instances of F0
  ‹ Use different names for 2 instances of F0

  L.D   F0, 34(R2)
  ADD.D F4, F0, F
  L.D   F0, 45(R3)
  ADD.D F8, F0, F

Dataflow graph
  L.D (1)   -> ADD.D (1): True dependence (RAW / F0)
  ADD.D (1) -> L.D (2):   Anti-dependence (WAR / F0)
  L.D (1)   -> L.D (2):   Output dependence (WAW / F0)
  L.D (2)   -> ADD.D (2): True dependence (RAW / F0)
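
The four dependence edges can be derived mechanically by tracking, for each register, its most recent writer and the readers since that write. The sketch below makes assumptions: the function name and tuple layout are hypothetical, and F2 stands in for the ADD.D instructions' second source operand.

```python
def dependences(program):
    """Return (kind, earlier, later, register) edges for a straight-line program.

    program -- list of (label, dest, sources) in program order
    """
    last_writer = {}   # register -> label of the most recent writer
    readers = {}       # register -> labels that read it since that write
    found = []
    for label, dest, sources in program:
        for r in sources:
            if r in last_writer:
                found.append(("RAW", last_writer[r], label, r))
            readers.setdefault(r, []).append(label)
        for reader in readers.get(dest, []):
            if reader != label:
                found.append(("WAR", reader, label, dest))
        if dest in last_writer:
            found.append(("WAW", last_writer[dest], label, dest))
        last_writer[dest] = label
        readers[dest] = []
    return found

program = [
    ("L.D (1)",   "F0", ["R2"]),
    ("ADD.D (1)", "F4", ["F0", "F2"]),   # F2 is an assumed operand
    ("L.D (2)",   "F0", ["R3"]),
    ("ADD.D (2)", "F8", ["F0", "F2"]),
]
for edge in dependences(program):
    print(edge)
```

Run on the example, this yields two RAW edges, one WAR edge, and one WAW edge, all on F0, which is why the four instructions end up serialized.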

Register renaming

❍ Same program segment with F0 renamed
  ‹ Tomasulo Alg: use reservation station number (tag) of producer instruction (e.g. load buffer 1 = load1)
  ‹ This guarantees unique names for unique values
❍ Dataflow graph with only true dependences
  ‹ Renaming removes output and anti-dependences
  ‹ Parallelism is exposed

  L.D   load1, 34(R2)
  ADD.D F4, load1, F
  L.D   load2, 45(R3)
  ADD.D F8, load2, F

Dataflow graph
  L.D (1) -> ADD.D (1): True dependence (RAW / load1)
  L.D (2) -> ADD.D (2): True dependence (RAW / load2)
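
The rewrite shown above can be reproduced as a static pass over the program. This is a simplified sketch: unlike the slide, it renames every destination (so F4 and F8 become `add1` and `add2`), the tag names are hypothetical, and F2 is an assumed second operand.

```python
def rename(program):
    """Give every destination a unique tag and point later readers at it.

    program -- list of (tag, op, dest, sources) in program order, where tag
               names the buffer or station holding the producer
    """
    current = {}    # architectural register -> newest tag
    renamed = []
    for tag, op, dest, sources in program:
        new_sources = [current.get(r, r) for r in sources]   # unrenamed regs keep their name
        current[dest] = tag
        renamed.append((op, tag, new_sources))
    return renamed

program = [
    ("load1", "L.D",   "F0", ["R2"]),
    ("add1",  "ADD.D", "F4", ["F0", "F2"]),   # F2 is an assumed operand
    ("load2", "L.D",   "F0", ["R3"]),
    ("add2",  "ADD.D", "F8", ["F0", "F2"]),
]
for op, dest, sources in rename(program):
    print(op, dest, sources)
```

Because no destination name is ever reused, no WAW or WAR edge can exist in the renamed program; each ADD.D depends only on its own load.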

How (Tomasulo) renaming works

[Diagram: IBM 360/91 floating-point unit, showing the FLT. PT. OPERATION QUEUE (from IF unit), 4 FP REGISTERS, LOAD BUFFERS (from memory), STORE BUFFERS (to memory), OPERAND BUSES, OPERATION BUS, RESERVATION STATIONS, FP adders, FP multipliers, and the COMMON DATA BUS (CDB). Diagram labels: A, B, C, D.]

Operation queue:
  L.D F0 <-
  ADD.D <- F
  L.D F0 <-
  ADD.D <- F
Other annotations: F

[Same diagram, next step. Diagram labels: A, B, C, D.]

Operation queue:
  ADD.D <- F
  L.D F0 <-
  ADD.D <- F
Other annotations: load; F0 load

[Same diagram, next step. Diagram labels: A, B, C, D.]

Operation queue:
  L.D F0 <-
  ADD.D <- F
Other annotations: load; F0 load; load1 (value)

[Same diagram, next step. Diagram labels: A, B, C, D.]

Operation queue:
  ADD.D <- F
Other annotations: load; F0 load; load1 (value); load