Speculative Tomasulo Example - Advance Computers Architectures - Lecture Slides, Slides of Computer Architecture and Organization

Main points of this lecture are: Speculative Tomasulo Example, Instruction Level Parallelism, Leverage Implicit Parallelism, Branch Prediction, Loop Unrolling, Value Prediction, Reorder Buffer, Instruction Bandwidth, Data Flow Execution

Typology: Slides

2012/2013

Uploaded on 04/23/2013

atasi

CIS 600 Advanced Computer Architecture
Lecture 6 – Instruction Level Parallelism

Review from Last Time

• Leverage implicit parallelism for performance: Instruction Level Parallelism (ILP)
• Loop unrolling by the compiler to increase ILP
• Branch prediction to increase ILP
• Dynamic hardware exploiting ILP
  – Works when dependences cannot be known at compile time
  – Can hide L1 cache misses
  – Code compiled for one machine runs well on another

Outline

• ILP

• Speculation

• Speculative Tomasulo Example

• Memory Aliases

• Exceptions

• VLIW

• Increasing instruction bandwidth

• Register Renaming vs. Reorder Buffer

• Value Prediction

Speculation to greater ILP

• Greater ILP: overcome control dependence by having the hardware speculate on the outcome of branches and execute the program as if the guesses were correct
  • Speculation ⇒ fetch, issue, and execute instructions as if branch predictions were always correct
  • Dynamic scheduling alone ⇒ only fetches and issues such instructions
• Essentially a data flow execution model: operations execute as soon as their operands are available
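
The data flow idea can be illustrated with a minimal sketch. It assumes unlimited functional units and fixed per-operation latencies; the three-instruction program, the register names, and the latency values are hypothetical and not taken from the slides. Each operation starts in the first cycle in which all of its source operands are available.

```python
def dataflow_schedule(instrs, latency):
    """Return the start cycle of each instruction under a data flow model.

    instrs  : list of (dest, op, sources) in program order
    latency : dict mapping op name to its (assumed) execution latency
    """
    ready_at = {}   # register name -> cycle in which its value becomes available
    start = []
    for dest, op, srcs in instrs:
        # An operation may begin once every source operand is ready.
        begin = max((ready_at.get(s, 0) for s in srcs), default=0)
        start.append(begin)
        ready_at[dest] = begin + latency[op]
    return start


# Hypothetical program: the ADDD depends on the load, the MULD does not.
program = [
    ("F0", "LD",   ["R2"]),
    ("F8", "ADDD", ["F0", "F4"]),
    ("F2", "MULD", ["F6", "F6"]),
]
latency = {"LD": 2, "ADDD": 3, "MULD": 5}   # assumed values
print(dataflow_schedule(program, latency))
```

The independent MULD starts at once, while the ADDD waits only for the load result it actually needs.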

Adding Speculation to Tomasulo

• Must separate execution from allowing an instruction to finish, or "commit"
• This additional step is called instruction commit
• When an instruction is no longer speculative, allow it to update the register file or memory
• Requires an additional set of buffers to hold results of instructions that have finished execution but have not committed
• This reorder buffer (ROB) is also used to pass results among instructions that may be speculated

Reorder Buffer (ROB)

• In Tomasulo’s algorithm, once an instruction writes its result, any subsequently issued instructions will find the result in the register file
• With speculation, the register file is not updated until the instruction commits
  • (that is, until we know definitively that the instruction should execute)
• Thus, the ROB supplies operands in the interval between completion of instruction execution and instruction commit
  • ROB is a source of operands for instructions, just as reservation stations (RS) provide operands in Tomasulo’s algorithm
  • ROB extends the architected registers, as the RS do
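
A minimal sketch of how an operand might be located under this scheme. The names `RobEntry`, `read_operand`, and `reg_status` are hypothetical, introduced only for illustration. An operand comes from the register file if no in-flight instruction will write that register, from the ROB if the producer has finished but not committed, and otherwise the consumer records the ROB number to wait on.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    dest: str                       # architectural register to be written
    value: Optional[float] = None   # result, once execution has completed
    done: bool = False              # True between completion and commit


def read_operand(reg, regfile, reg_status, rob):
    """Return ("value", v) if the operand is available, else ("tag", n).

    reg_status maps a register to the ROB number of the newest in-flight
    instruction that will write it (absent if there is none).
    """
    tag = reg_status.get(reg)
    if tag is None:
        return ("value", regfile[reg])    # committed value in the register file
    entry = rob[tag]
    if entry.done:
        return ("value", entry.value)     # completed but not yet committed
    return ("tag", tag)                   # still executing: wait for this ROB number


regfile = {"F0": 0.0, "F4": 1.5}
rob = {1: RobEntry(dest="F0")}            # ROB entry 1 will write F0
reg_status = {"F0": 1}
print(read_operand("F4", regfile, reg_status, rob))
print(read_operand("F0", regfile, reg_status, rob))
```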

Reorder Buffer operation

  • Holds instructions in FIFO order, exactly as issued
  • When instructions complete, results are placed into the ROB
    • Supplies operands to other instructions between execution complete & commit ⇒ more registers, like RS
    • Tag results with ROB buffer number instead of reservation station
  • Instructions commit ⇒ values at head of ROB placed in registers
  • As a result, easy to undo speculated instructions on mispredicted branches or on exceptions

[Figure: FP Op Queue, Reorder Buffer, FP Regs, Res Stations, FP Adders, Commit path]
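
The operation described above can be sketched as a small FIFO structure. This is an illustrative model only: the class and method names are hypothetical, and ROB numbers are assumed to increase without wrapping around, which real hardware would not do.

```python
from collections import deque


class ReorderBuffer:
    def __init__(self, size):
        self.size = size
        self.entries = deque()   # oldest entry at the left (head)
        self.next_tag = 1        # assumption: tags never wrap around

    def issue(self, dest):
        """Allocate an entry in program order; return its ROB number or None if full."""
        if len(self.entries) >= self.size:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "dest": dest, "value": None, "done": False})
        return tag

    def write_result(self, tag, value):
        """Record a completed result; the register file is not touched yet."""
        for e in self.entries:
            if e["tag"] == tag:
                e["value"], e["done"] = value, True

    def commit(self, regfile):
        """Retire finished entries from the head, in order, updating registers."""
        retired = []
        while self.entries and self.entries[0]["done"]:
            e = self.entries.popleft()
            if e["dest"] is not None:      # branches have no register destination
                regfile[e["dest"]] = e["value"]
            retired.append(e["tag"])
        return retired

    def flush_after(self, tag):
        """Discard every entry younger than tag, e.g. after a mispredicted branch."""
        while self.entries and self.entries[-1]["tag"] > tag:
            self.entries.pop()
```

Because speculative results live only in the buffer until they reach the head, undoing them amounts to dropping the younger entries; the register file never saw them.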

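Recall: 4 Steps of Speculative Tomasulo Algorithm

1. Issue: get instruction from FP Op Queue
   If a reservation station and a reorder buffer slot are free, issue the instruction and send the operands and the reorder buffer number for the destination (this stage is sometimes called "dispatch")

2. Execution: operate on operands (EX)
   When both operands are ready, execute; if not ready, watch the CDB for the result; when both are in the reservation station, execute; checks RAW (sometimes called "issue")

3. Write result: finish execution (WB)
   Write on the Common Data Bus to all awaiting FUs and the reorder buffer; mark the reservation station available.

4. Commit: update register or memory with the reorder buffer result
   When the instruction is at the head of the reorder buffer and its result is present, update the register (or memory, for a store) and remove the instruction from the reorder buffer. A mispredicted branch flushes the reorder buffer.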
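
The execute and write-result steps can be sketched as follows. This is a simplified model under stated assumptions: the `Station` fields follow the usual Vj/Vk (operand values) and Qj/Qk (tags being waited on) naming, and `broadcast` and `execute` are hypothetical helpers, not part of the lecture.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    op: str
    dest: int                     # ROB number that will receive the result
    vj: Optional[float] = None    # operand values, once known
    vk: Optional[float] = None
    qj: Optional[int] = None      # ROB numbers still being waited on
    qk: Optional[int] = None

    def ready(self):
        # Execution may start only when no operand is outstanding (RAW check).
        return self.qj is None and self.qk is None


def broadcast(stations, rob_values, tag, value):
    """Write result: send (tag, value) on the CDB to the ROB and waiting stations."""
    rob_values[tag] = value
    for s in stations:
        if s.qj == tag:
            s.vj, s.qj = value, None
        if s.qk == tag:
            s.vk, s.qk = value, None


def execute(station):
    ops = {"ADDD": lambda a, b: a + b, "MULD": lambda a, b: a * b}
    return ops[station.op](station.vj, station.vk)


rob_values = {}
add = Station("ADDD", dest=2, vj=3.0, qk=1)   # second operand comes from ROB entry 1
stations = [add]
broadcast(stations, rob_values, 1, 4.0)        # a load result arrives on the CDB
if add.ready():
    broadcast(stations, rob_values, add.dest, execute(add))
print(rob_values)
```

Results are tagged with ROB numbers rather than reservation station numbers, so a station can be freed as soon as its result has been broadcast.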

Tomasulo With Reorder buffer:

[Figure: FP Op Queue, Reorder Buffer, Registers, Reservation Stations, FP adders, FP multipliers, paths to and from memory]

Reorder Buffer, newest first:
  Instruction      Dest   Done?
  ADDD F10,F4,F    F      N
  LD F0,10(R2)     F      N

Reservation Stations (Dest, Op, Operands):
  2  ADDD  R(F4),ROB

From Memory (Dest, Address):
  1  10+R

Tomasulo With Reorder buffer:

[Figure: FP Op Queue, Reorder Buffer, Registers, Reservation Stations, FP adders, FP multipliers, paths to and from memory]

Reorder Buffer, newest first:
  Instruction      Dest   Done?
  DIVD F2,F10,F    F      N
  ADDD F10,F4,F    F      N
  LD F0,10(R2)     F      N

Reservation Stations (Dest, Op, Operands):
  3  DIVD  ROB2,R(F6)
  2  ADDD  R(F4),ROB

From Memory (Dest, Address):
  1  10+R

Tomasulo With Reorder buffer:

[Figure: FP Op Queue, Reorder Buffer, Registers, Reservation Stations, FP adders, FP multipliers, paths to and from memory]

Reorder Buffer, newest first:
  Instruction      Dest   Value   Done?
  ST 0(R3),F              ROB5    N
  ADDD F0,F4,F     F              N
  LD F4,0(R3)      F4             N
  BNE F2,<…>       --             N
  DIVD F2,F10,F    F              N
  ADDD F10,F4,F    F              N
  LD F0,10(R2)     F              N

Reservation Stations (Dest, Op, Operands):
  3  DIVD  ROB2,R(F6)
  2  ADDD  R(F4),ROB
  6  ADDD  ROB5,R(F6)

From Memory (Dest, Address):
  1  10+R
  5  0+R

Tomasulo With Reorder buffer:

[Figure: FP Op Queue, Reorder Buffer, Registers, Reservation Stations, FP adders, FP multipliers, paths to and from memory]

Reorder Buffer, newest first:
  Instruction      Dest   Value   Done?
  ST 0(R3),F              M[10]   Y
  ADDD F0,F4,F     F              N
  LD F4,0(R3)      F4     M[10]   Y
  BNE F2,<…>       --             N
  DIVD F2,F10,F    F              N
  ADDD F10,F4,F    F              N
  LD F0,10(R2)     F              N

Reservation Stations (Dest, Op, Operands):
  3  DIVD  ROB2,R(F6)
  2  ADDD  R(F4),ROB
  6  ADDD  M[10],R(F6)

From Memory (Dest, Address):
  1  10+R

Tomasulo With Reorder buffer:

[Figure: FP Op Queue, Reorder Buffer, Registers, Reservation Stations, FP adders, FP multipliers, paths to and from memory]

Reorder Buffer, newest first:
  Instruction      Dest   Value   Done?
  ST 0(R3),F              M[10]   Y
  ADDD F0,F4,F     F              Ex
  LD F4,0(R3)      F4     M[10]   Y
  BNE F2,<…>       --             N
  DIVD F2,F10,F    F              N
  ADDD F10,F4,F    F              N
  LD F0,10(R2)     F              N

Reservation Stations (Dest, Op, Operands):
  3  DIVD  ROB2,R(F6)
  2  ADDD  R(F4),ROB

From Memory (Dest, Address):
  1  10+R

What about memory hazards???

Avoiding Memory Hazards

• WAW and WAR hazards through memory are eliminated with speculation because the actual updating of memory occurs in order, when a store is at the head of the ROB; hence, no earlier loads or stores can still be pending
• RAW hazards through memory are avoided by enforcing two restrictions:
  1. not allowing a load to initiate the second step of its execution if any active ROB entry occupied by a store has a Destination field that matches the value of the A field of the load, and
  2. maintaining the program order for the computation of an effective address of a load with