ECE 463/521, Profs. Gehringer, Rotenberg, & Conte, Dept. of ECE, NC State University
High-level view
❍ Out-of-order pipeline
Two decoupled pipelines: fetch/dispatch and issue/execute
Pipelines decoupled by buffers
◊ Many names: reservation stations, issue
queues/buffers, scheduling queues
◊ “instruction window”
[Figure: the in-order instruction fetch/dispatch pipeline (fetch, decode, data dependence checking and register renaming, then DISPATCH, which inserts into the window) feeds the WINDOW; the out-of-order instruction issue/execute pipeline takes instructions from it (ISSUE, which takes from the window, then execute, then complete/writeback).]
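The decoupling can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not a model of any real machine: each instruction is a hypothetical `(name, producers, latency)` tuple, dispatch moves instructions in program order into a bounded window, and issue picks the oldest instructions whose producers have finished. The function `simulate` and its parameters are illustrative names.

```python
from collections import deque


def simulate(program, window_size=4, issue_width=1):
    """Toy core: in-order dispatch into a window, out-of-order issue.

    program: list of (name, producers, latency) tuples (hypothetical format).
    Returns names in the order they issued. Assumes every producer name
    appears earlier in `program`; otherwise the loop never finishes.
    """
    front_end = deque(program)   # in-order fetch/dispatch side
    window = []                  # the instruction window (issue queue)
    ready_at = {}                # name -> cycle from which its result is usable
    issue_order = []
    cycle = 0
    while front_end or window:
        cycle += 1
        # ISSUE (out of order): oldest first among instructions whose
        # producers have all delivered their results by this cycle.
        ready = [ins for ins in window
                 if all(p in ready_at and ready_at[p] <= cycle for p in ins[1])]
        for ins in ready[:issue_width]:
            window.remove(ins)
            name, _, latency = ins
            ready_at[name] = cycle + latency
            issue_order.append(name)
        # DISPATCH (in order): fill free window slots from the front end.
        while front_end and len(window) < window_size:
            window.append(front_end.popleft())
    return issue_order
```

With `program = [("MUL", (), 3), ("ADD", ("MUL",), 1), ("SUB", (), 1)]`, the default window lets the independent SUB issue ahead of the stalled ADD (order MUL, SUB, ADD); with `window_size=1` the front end cannot look past ADD, so the order stays MUL, ADD, SUB.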
Early dynamically scheduled machines
❍ CDC 6600
Centralized control: “CDC scoreboard”
Many replicated functional units
All values pass through the register file
Stall on WAR/WAW hazards
❍ IBM 360/91 (Tomasulo’s algorithm)
Distributed control: “reservation stations”
Several fully pipelined functional units (equivalent to
replicating functional units)
Values broadcast to waiting instructions and register file in
parallel (via the Common Data Bus)
Introduced register renaming: handles WAR/WAW hazards
without stalling
CDC 6600 (MIPS version)
❍ Four stages after fetch
Dispatch
◊ Check for structural and WAW hazards
z Structural: Stall in dispatch stage if FU busy.
z WAW: Stall in dispatch stage if an outstanding instruction in the
scoreboard writes the same destination register.
◊ Enter instruction into scoreboard and determine data
dependences.
◊ Route instruction to a free FU, where it waits until data
operands are available.
Issue
◊ Wait for operands to become ready.
◊ Scoreboard signals when operands are ready.
z Instruction reads registers from the register file,
z then issues to FU for execution.
Execute
Write result
◊ Check for WAR hazard. Stall if an outstanding prior
instruction in the scoreboard reads the same register being written and that read has not yet taken place.
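The hazard rules of the four stages can be written as three checks. The sketch below is a hypothetical Python model of those rules only (no cycle timing); the classes `Instr` and `Scoreboard` and their method names are illustrative. Each instruction records at dispatch which in-flight instructions produce its sources, so a later writer of a source register does not block its issue.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare instructions by identity, not by field values
class Instr:
    op: str
    fu: str            # functional unit it needs, e.g. "ADD"
    dest: str
    srcs: tuple
    producers: list = field(default_factory=list)  # in-flight writers of srcs
    operands_read: bool = False  # set in ISSUE (CDC reads registers late)
    written: bool = False        # set in WRITE RESULT


@dataclass
class Scoreboard:
    busy: set = field(default_factory=set)         # functional units in use
    writer: dict = field(default_factory=dict)     # register -> pending writer
    in_flight: list = field(default_factory=list)  # dispatched, program order

    def can_dispatch(self, ins):
        # DISPATCH stalls on a structural hazard (FU busy) or a WAW hazard.
        return ins.fu not in self.busy and ins.dest not in self.writer

    def dispatch(self, ins):
        ins.producers = [self.writer[s] for s in ins.srcs if s in self.writer]
        self.busy.add(ins.fu)
        self.writer[ins.dest] = ins
        self.in_flight.append(ins)

    def can_issue(self, ins):
        # ISSUE waits until every producer has written the register file (RAW).
        return all(p.written for p in ins.producers)

    def issue(self, ins):
        ins.operands_read = True  # operands are read from the register file now

    def can_write_result(self, ins):
        # WRITE RESULT stalls while an older instruction still has to read
        # the register being written (WAR).
        for older in self.in_flight:
            if older is ins:
                return True
            if not older.operands_read and ins.dest in older.srcs:
                return False
        return True

    def write_result(self, ins):
        ins.written = True
        self.busy.discard(ins.fu)
        del self.writer[ins.dest]
        self.in_flight.remove(ins)
```

Replaying part of the example program: once MUL.D (MULT1) and DIV.D are dispatched, DIV.D cannot issue (RAW on F0); once ADD.D is dispatched and issued, it cannot write F6 until DIV.D has issued and read its operands (WAR on F6).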
Warning: H&P naming
❍ H&P uses different names than what we will use
We: Dispatch They: Issue
We: Issue They: Read operands
We: Execute They: Execute
We: Write result They: Write result
CDC 6600 (MIPS version)
[Figure: block diagram. The Registers connect to the functional units FP MULT (1), FP MULT (2), FP DIV, FP ADD, and the Integer Unit (integer MIPS pipeline); the Scoreboard exchanges control/status signals with them.]
Scoreboard
❍ Three data structures
1. Instruction status
Which stage the instruction is in
2. Functional unit status
Busy - FU is busy executing an instruction
Op - what instruction is the FU busy with
Fi - destination register
Fj, Fk - source registers
Qj, Qk - functional units producing src regs
Rj, Rk - flags indicating src regs are ready
3. Register result status
Which FU is going to write each register
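The functional-unit status and register-result status tables can be written down directly. This is a hypothetical Python sketch (the class and method names are illustrative; instruction status and the issue step are not modeled): `dispatch` fills one functional-unit status row, taking Qj/Qk from the register-result status, and `write_result` clears the Qj/Qk entries that were waiting on the finishing unit and marks those operands ready.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FUStatus:
    """One row of the functional-unit status table."""
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None    # destination register
    fj: Optional[str] = None    # source registers
    fk: Optional[str] = None
    qj: Optional[str] = None    # FU producing Fj / Fk (None if no producer)
    qk: Optional[str] = None
    rj: Optional[bool] = None   # True when Fj / Fk is ready to be read
    rk: Optional[bool] = None


class ScoreboardTables:
    def __init__(self, units):
        self.fu = {name: FUStatus() for name in units}  # functional-unit status
        self.result = {}  # register-result status: register -> FU that will write it

    def dispatch(self, unit, op, fi, fj, fk):
        """Fill a row. Assumes the caller already checked that `unit` is free
        (structural) and that `fi` has no pending writer (WAW)."""
        qj = self.result.get(fj) if fj else None
        qk = self.result.get(fk) if fk else None
        self.fu[unit] = FUStatus(True, op, fi, fj, fk, qj, qk,
                                 rj=None if fj is None else qj is None,
                                 rk=None if fk is None else qk is None)
        self.result[fi] = unit

    def write_result(self, unit):
        """`unit` writes its result: wake up waiting rows and free the unit."""
        dest = self.fu[unit].fi
        for row in self.fu.values():
            if row.qj == unit:
                row.qj, row.rj = None, True
            if row.qk == unit:
                row.qk, row.rk = None, True
        if self.result.get(dest) == unit:
            del self.result[dest]
        self.fu[unit] = FUStatus()
```

Dispatching L.D F2, MUL.D, SUB.D, and DIV.D in that order (with the first L.D already finished) reproduces the functional-unit and register-result rows of CDC 6600 example (6); calling `write_result("Integer")` afterward sets Rj of MULT1 and Rk of ADD to true.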
CDC 6600 example (2)
Instruction status (stages: DISPATCH, ISSUE, EXECUTE, WRITE RESULT)
L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2
Functional-unit status
FU       Busy  Op     Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  yes   L.D    F6   -    R2   -        -        -    yes
MULT1    no
MULT2    no
ADD      no
DIV      no
Register-result status
F0       F2       F4       F6       F8       F10      ...
-        -        -        Integer  -        -
❍ On the next cycle, the instruction is issued. What else happens?
CDC 6600 example (3)
Functional-unit status
FU       Busy  Op     Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  yes   L.D    F2   -    R3   -        -        -    yes
MULT1    no
MULT2    no
ADD      no
DIV      no
Register-result status
F0       F2       F4       F6       F8       F10      ...
-        Integer  -        -        -        -
CDC 6600 example (4)
Functional-unit status
FU       Busy  Op     Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  yes   L.D    F2   -    R3   -        -        -    yes
MULT1    yes   MUL.D  F0   F2   F4   Integer  -        no   yes
MULT2    no
ADD      no
DIV      no
Register-result status
F0       F2       F4       F6       F8       F10      ...
MULT1    Integer  -        -        -        -
CDC 6600 example (5)
Functional-unit status
FU       Busy  Op     Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  yes   L.D    F2   -    R3   -        -        -    yes
MULT1    yes   MUL.D  F0   F2   F4   Integer  -        no   yes
MULT2    no
ADD      yes   SUB.D  F8   F6   F2   -        Integer  yes  no
DIV      no
Register-result status
F0       F2       F4       F6       F8       F10      ...
MULT1    Integer  -        -        ADD      -
CDC 6600 example (6)
Functional-unit status
FU       Busy  Op     Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  yes   L.D    F2   -    R3   -        -        -    yes
MULT1    yes   MUL.D  F0   F2   F4   Integer  -        no   yes
MULT2    no
ADD      yes   SUB.D  F8   F6   F2   -        Integer  yes  no
DIV      yes   DIV.D  F10  F0   F6   MULT1    -        no   yes
Register-result status
F0       F2       F4       F6       F8       F10      ...
MULT1    Integer  -        -        ADD      DIV
CDC 6600 example (7)
Functional-unit status
FU       Busy  Op     Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  yes   L.D    F2   -    R3   -        -        -    yes
MULT1    yes   MUL.D  F0   F2   F4   -        -        yes  yes
MULT2    no
ADD      yes   SUB.D  F8   F6   F2   -        -        yes  yes
DIV      yes   DIV.D  F10  F0   F6   MULT1    -        no   yes
Register-result status
F0       F2       F4       F6       F8       F10      ...
MULT1    Integer  -        -        ADD      DIV
CDC 6600 example (11)
❍ MUL.D is about to write its result; the sequence is finishing.
Functional-unit status
FU       Busy  Op     Fi   Fj   Fk   Qj       Qk       Rj   Rk
Integer  no
MULT1    yes   MUL.D  F0   F2   F4   -        -        yes  yes
MULT2    no
ADD      yes   ADD.D  F6   F8   F2   -        -        yes  yes
DIV      yes   DIV.D  F10  F0   F6   MULT1    -        no   yes
Register-result status
F0       F2       F4       F6       F8       F10      ...
MULT1    -        -        ADD      -        DIV
Outstanding hazards: RAW on F0 (DIV.D waits for MUL.D's result) and WAR on F6 (ADD.D cannot write F6 until DIV.D has read it).
CDC 6600 timing diagram
[Figure: cycle-by-cycle timing diagram (stages ID, IS, EX, WR) for the six instructions below; shaded boxes indicate stalls.]
L.D F6, 34(R2)
L.D F2, 45(R3)
MUL.D F0, F2, F4
SUB.D F8, F6, F2
DIV.D F10, F0, F6
ADD.D F6, F8, F2
Stalls marked in the diagram:
RAW: L.D-MUL.D (F2), L.D-SUB.D (F2), MUL.D-DIV.D (F0), SUB.D-ADD.D (F8)
Structural: L.D-L.D (Integer unit), SUB.D-ADD.D (ADD unit)
WAR: DIV.D-ADD.D (F6)
=> Notice there are always 2 cycles between the EX stages of data-dependent instructions (e.g., L.D and MUL.D): the producer does WR, then the consumer does its last IS cycle, in which it reads its registers from the register file. This is an artifact of the CDC 6600: all values must first pass through the register file (no bypasses).
=> Execution latencies: L.D 2 (address generation + memory access), MUL.D 10, DIV.D 40, SUB.D/ADD.D 2.
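The two-cycle gap compounds along a chain of dependent instructions. A small arithmetic sketch, assuming (hypothetically) that the chain's first instruction starts EX in cycle 1 and ignoring structural and other stalls: a consumer's first EX cycle is its producer's last EX cycle plus `gap + 1`. `gap=2` is the CDC 6600 case described above; `gap=1` models, for comparison only, a result that is forwarded so the consumer can execute right after the producer's WR, which the CDC 6600 does not do. The names `LATENCY` and `last_ex_cycle` are illustrative.

```python
# Execution latencies (cycles) quoted on the slide.
LATENCY = {"L.D": 2, "MUL.D": 10, "DIV.D": 40, "SUB.D": 2, "ADD.D": 2}


def last_ex_cycle(chain, first_ex=1, gap=2):
    """Last EX cycle of a chain of RAW-dependent instructions.

    gap = cycles strictly between a producer's last EX and its consumer's
    first EX: 2 on the CDC 6600 (producer WR, consumer's register-reading IS).
    """
    start, end = first_ex, first_ex - 1
    for op in chain:
        end = start + LATENCY[op] - 1   # last EX cycle of this instruction
        start = end + gap + 1           # first EX cycle of its consumer
    return end
```

For the chain L.D F2 -> MUL.D -> DIV.D, `last_ex_cycle(["L.D", "MUL.D", "DIV.D"])` gives 56, versus 54 with `gap=1`: one cycle lost per dependence edge.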
Remaining bottlenecks
❍ CDC 6600 does a good job of dynamic scheduling around
RAW hazards
❍ Remaining performance limitations
Amount of instruction-level parallelism (ILP) in the program
◊ Maybe not enough data-independent operations
◊ Increase size of window to look farther ahead.
◊ Looking farther ahead requires branch prediction.
Number of scoreboard entries (window size)
◊ Dictates how far processor can look ahead
Number and type of functional units, register ports, etc.
◊ Structural hazards
Anti- and output dependences
◊ Dynamic scheduling exposes more WAW+WAR hazards
because early (OOO) writes are possible
◊ WAR made worse in CDC due to late reads (read operands
when finally issuing)
◊ WAW handled like a structural hazard in dispatch
Tomasulo’s Algorithm
❍ Born of necessity
Used in IBM 360/91 floating-point unit
Many long-latency operations
◊ Need dynamic scheduling: mitigate long stalls
ISA specified only 4 floating-point registers
◊ Need register renaming: with only 4 registers,
WAW/WAR hazards pop up quickly
◊ Especially in floating-point code: loops by
definition cause repeated writes to the same
registers
◊ Renaming: recognize and give unique names to
different dynamic instances of the same register
specifier
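A minimal sketch of the renaming idea, independent of how hardware stores the names: every write to an architectural register gets a fresh name, and each source is rewritten to the most recent name of its register. The `F6.2`-style names, the `(op, dest, srcs)` tuple format, and the function `rename` are hypothetical; Tomasulo's hardware uses reservation-station tags instead.

```python
def rename(program):
    """Give every dynamic write of a register a unique name.

    program: list of (op, dest, srcs) tuples. Registers read before any
    write in the snippet keep their architectural names.
    """
    latest = {}   # architectural register -> its current unique name
    writes = {}   # architectural register -> number of writes seen so far
    renamed = []
    for op, dest, srcs in program:
        # Rename sources first, so an instruction that reads and writes the
        # same register reads the previous instance.
        new_srcs = tuple(latest.get(s, s) for s in srcs)
        writes[dest] = writes.get(dest, 0) + 1
        latest[dest] = f"{dest}.{writes[dest]}"
        renamed.append((op, latest[dest], new_srcs))
    return renamed
```

Applied to the CDC 6600 example program, DIV.D reads `F6.1` (written by the first L.D) while ADD.D writes `F6.2`, so the WAR hazard on F6 no longer forces a stall.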
Key aspects of Tomasulo’s Alg.
1. Read register operands at dispatch stage
❍ If operands are available, the data is buffered along with the instruction in “reservation stations”
❍ CDC: only the instruction is buffered; all operands are read from the register file once they are all ready (operands are read at the issue stage)
❍ CDC: late reads / Tomasulo: early reads. Early reads help with the WAR condition
2. Unavailable registers are renamed at dispatch stage
❍ Waiting instructions replace register specifiers with a “tag” indicating the producer instruction
❍ Register specifiers are used only once, at dispatch!
❍ Renaming eliminates WAR/WAW hazards
3. Successive writes to a register
❍ Only the last one actually updates the register: this helps with the WAW condition
❍ CDC: stall in dispatch until WAW hazard goes away
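The three points can be sketched together at the dispatch stage. This is a hypothetical Python model: the class `TomasuloDispatch`, its dictionary layout, and tags such as `add1` are illustrative (`load1`/`load2` follow the slides' naming), load address operands are omitted, and loads are kept in the same table as reservation stations for brevity. Ready sources are copied into the entry as values, pending sources are replaced by their producer's tag, and the destination's register status is simply overwritten, so a later writer supersedes an earlier one instead of stalling.

```python
class TomasuloDispatch:
    """Dispatch-stage renaming in the style of Tomasulo's algorithm (sketch)."""

    def __init__(self, reg_values):
        self.value = dict(reg_values)  # register file contents
        self.status = {}               # register -> tag of its latest producer
        self.stations = {}             # tag -> reservation-station entry

    def dispatch(self, tag, op, dest, srcs):
        operands = []
        for reg in srcs:
            if reg in self.status:
                # Not ready: rename the register specifier to the producer's tag.
                operands.append(("tag", self.status[reg]))
            else:
                # Ready: read the value now (early read), so a later write to
                # this register cannot create a WAR hazard.
                operands.append(("value", self.value[reg]))
        self.stations[tag] = {"op": op, "operands": operands}
        # No WAW stall: the newest writer simply takes over the register.
        self.status[dest] = tag
        return self.stations[tag]
```

Dispatching L.D F0, ADD.D F4, L.D F0, ADD.D F8 in sequence (with F2 as a hypothetical second ADD.D source) gives the first ADD.D the operand tag `load1` and the second `load2`; F0's status ends up pointing at `load2`.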
Other differences with CDC
❍ Distributed control
Reservation stations (versus scoreboard)
❍ Results broadcast to both register file and
functional units
Result bus called the “Common Data Bus” (CDB)
Don’t have to wait for value to go through register file
(i.e., use bypasses).
Functional units don’t contend for register file ports.
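A sketch of one Common Data Bus broadcast, under a hypothetical layout: reservation-station operands are `("tag", t)` or `("value", v)` pairs, and register status maps a register to the tag of its latest producer. The function `broadcast` and its parameters are illustrative. Every waiting operand that matches the broadcast tag captures the value directly, without a trip through the register file; the register file captures it only if that tag is still the register's latest producer.

```python
def broadcast(tag, value, stations, reg_value, reg_status):
    """Put (tag, value) on the Common Data Bus.

    stations:   tag -> list of operand slots, each ("tag", t) or ("value", v)
    reg_value:  register file contents
    reg_status: register -> tag of the latest instruction that will write it
    """
    # Every reservation station watches the bus and grabs a matching result.
    for slots in stations.values():
        for i, (kind, x) in enumerate(slots):
            if kind == "tag" and x == tag:
                slots[i] = ("value", value)
    # The register file takes the value only if this tag is still the
    # register's latest producer; a superseded (older) write is dropped.
    for reg in [r for r, t in reg_status.items() if t == tag]:
        reg_value[reg] = value
        del reg_status[reg]
    # The producing entry (if it was a reservation station) is now free.
    stations.pop(tag, None)
```

Broadcasting `load1` fills the waiting ADD.D operand but leaves F0 untouched, because F0's status already names `load2`; broadcasting `load2` then updates both the second ADD.D and F0.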
Register renaming
❍ Consider simple example with register reuse
❍ Dataflow graph with both true and false
dependences
❍ All instructions execute serially
Due to reuse of F0 by the loads
But those are 2 distinct instances of F0
Use different names for the 2 instances of F0
L.D F0, 34(R2)
ADD.D F4, F0, F
L.D F0, 45(R3)
ADD.D F8, F0, F
L.D (1) -> ADD.D (1): true dependence (RAW / F0)
L.D (2) -> ADD.D (2): true dependence (RAW / F0)
ADD.D (1) -> L.D (2): anti-dependence (WAR / F0)
L.D (1) -> L.D (2): output dependence (WAW / F0)
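The four edges can be found mechanically. A minimal Python sketch: the function `dependences` and the `(name, dest, srcs)` tuple format are hypothetical, and F2 stands in for the truncated second ADD.D source. A RAW, WAR, or WAW edge between an earlier and a later instruction is reported only when no instruction in between writes the same register.

```python
def dependences(program):
    """Return (earlier, later, kind, register) for each RAW/WAR/WAW pair.

    program: list of (name, dest, srcs). An edge is reported only if no
    instruction strictly between the two writes the same register.
    """
    edges = []
    for j, (name_j, dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            name_i, dest_i, srcs_i = program[i]
            written_between = {program[k][1] for k in range(i + 1, j)}
            if dest_i in srcs_j and dest_i not in written_between:
                edges.append((name_i, name_j, "RAW", dest_i))
            if dest_j in srcs_i and dest_j not in written_between:
                edges.append((name_i, name_j, "WAR", dest_j))
            if dest_j == dest_i and dest_j not in written_between:
                edges.append((name_i, name_j, "WAW", dest_j))
    return edges
```

On the four-instruction sequence this yields exactly the two RAW edges, the WAR edge, and the WAW edge listed above; renaming the load destinations to `load1` and `load2` (and the ADD.D sources accordingly) leaves only the two RAW edges.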
Register renaming
❍ Same program segment with F0 renamed
Tomasulo Alg: use reservation station number (tag) of
producer instruction (e.g. load buffer 1 = load1)
This guarantees unique names for unique values
❍ Dataflow graph with only true dependences
Renaming removes output and anti-dependences
Parallelism is exposed
L.D load1, 34(R2)
ADD.D F4, load1, F
L.D load2, 45(R3)
ADD.D F8, load2, F
L.D (1) -> ADD.D (1): true dependence (RAW / load1)
L.D (2) -> ADD.D (2): true dependence (RAW / load2)
How (Tomasulo) renaming works
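[Figure: IBM 360/91 floating-point unit. Instructions come from the IF unit into the FLT. PT. OPERATION QUEUE; the 4 FP REGISTERS, the LOAD BUFFERS (from memory), the STORE BUFFERS (to memory), and the RESERVATION STATIONS in front of the FP adders and FP multipliers are linked by the OPERATION BUS, the OPERAND BUSES, and the COMMON DATA BUS (CDB). Operation queue: L.D F0, ADD.D, L.D F0, ADD.D.]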
[Figure, next step: the first L.D F0 has moved to a load buffer, and register F0 is now marked with that load's tag. Operation queue: ADD.D, L.D F0, ADD.D.]
[Figure, next step: the first ADD.D has moved to a reservation station whose operands are the tag load1 (in place of F0) and a value; F0 is still marked with the load's tag. Operation queue: L.D F0, ADD.D.]
[Figure, next step: the second L.D F0 has moved to another load buffer, and F0 is now marked with the newer load's tag; the reservation station still holds load1 and a value. Operation queue: ADD.D.]