High Performance Computing
Lecture 24
Solving Data Hazards
1. Interlocks & stalling dependent instructions
2. Forwarding or Bypassing
3. Load delay slot
4. Instruction Scheduling
Reorder the instructions of the program so that
dependent instructions are far enough apart
This could be done either
by the compiler, before the program runs: Static Instruction Scheduling
by the hardware, when the program is running: Dynamic Instruction Scheduling
Static Instruction Scheduling
Reorder the instructions of the program to
eliminate data hazards …
or in general to reduce the execution time of the
program
Reordering must be safe
should not change the meaning of the program
Two instructions can be exchanged if they
are independent of each other
Example: Static Instruction Scheduling
Program fragment:
LW R3, 0(R1)
ADDI R5, R3, 1
ADD R2, R2, R
LW R13, 0(R11)
ADD R12, R13, R
Scheduling:
[Figure: the fragment before and after scheduling. Stall annotations: 1 stall, 1 stall (2 stalls in total) before scheduling; 0 stalls after scheduling]
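The stall counts in this example can be reproduced with a small sketch. It assumes a pipeline with forwarding in which an instruction that reads a register loaded by the immediately preceding LW stalls for 1 cycle. The operands R4 and R14 are placeholders (the listing above truncates the last operand of each ADD), and the function name `load_use_stalls` and the chosen reordering are illustrative, not taken from the slides.

```python
# Each instruction: (opcode, destination register, source registers).
# R4 and R14 are placeholder operands; the source listing truncates them.
program = [
    ("LW",   "R3",  ["R1"]),
    ("ADDI", "R5",  ["R3"]),
    ("ADD",  "R2",  ["R2", "R4"]),
    ("LW",   "R13", ["R11"]),
    ("ADD",  "R12", ["R13", "R14"]),
]

def load_use_stalls(instrs):
    """Count stalls, assuming 1 stall whenever an instruction reads the
    destination of the LW immediately before it (forwarding assumed)."""
    stalls = 0
    for prev, cur in zip(instrs, instrs[1:]):
        if prev[0] == "LW" and prev[1] in cur[2]:
            stalls += 1
    return stalls

# One possible safe reordering: move the second LW up next to the first,
# so that each load is separated from the instruction that uses it.
scheduled = [program[0], program[3], program[1], program[2], program[4]]

stalls_before = load_use_stalls(program)
stalls_after = load_use_stalls(scheduled)
print(stalls_before, stalls_after)
```

Under these assumptions the second LW is independent of the ADDI and the first ADD, so moving it above them does not change the meaning of the program.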
Kinds of Data Dependence
True dependence
ADD R1, R2, R
SUB R4, R1, R
Anti-dependence
ADD R1, R2, R
SUB R2, R4, R
Output dependence
ADD R1, R2, R
SUB R1, R4, R
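The three kinds of dependence can be told apart by comparing the registers each instruction reads and writes. The sketch below is one way to do that check; the helper names `dependences` and `can_exchange` are hypothetical, and R3 and R5 stand in for the operands that are truncated in the listings above.

```python
def dependences(first, second):
    """Return the kinds of dependence of `second` on an earlier `first`.
    Each instruction is (destination register, list of source registers)."""
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = []
    if dest1 in srcs2:
        kinds.append("true")    # second reads what first writes
    if dest2 in srcs1:
        kinds.append("anti")    # second overwrites what first reads
    if dest1 == dest2:
        kinds.append("output")  # both write the same register
    return kinds

def can_exchange(first, second):
    """Two instructions can be exchanged only if no dependence exists."""
    return not dependences(first, second)

# R3 and R5 are placeholder operands (truncated in the source listings).
add = ("R1", ["R2", "R3"])                 # ADD R1, R2, R3
print(dependences(add, ("R4", ["R1", "R5"])))  # SUB R4, R1, R5
print(dependences(add, ("R2", ["R4", "R5"])))  # SUB R2, R4, R5
print(dependences(add, ("R1", ["R4", "R5"])))  # SUB R1, R4, R5
```

This ties back to the safety condition for static scheduling: a pair for which `can_exchange` returns True is independent in the sense used earlier.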
Dynamic Instruction Scheduling
[Figure: pipeline stages IF, ID, EX, MEM, WB]
Instruction Window
Instruction Queue
Functional Units
Floating point Adder
Floating point Multiplier
Integer ALU
Integer Multiplier
Memory Unit
With dynamic instruction scheduling …
Problem: Pipeline Hazards
A situation where an instruction cannot
proceed through the pipeline as it should
1. Structural hazard: When 2 or more
instructions in the pipeline need to use the
same resource at the same time
2. Data hazard: When an instruction depends
on the data result of a prior instruction that
is still in the pipeline
3. Control hazard: A hazard that arises due to
control transfer instructions
Recall: Execution of Branch Instruction
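[Figure: datapath for branch execution across the IF, ID, EX, MEM, and WB stages. Labeled components: PC, Mem, the constant 4, Reg File, Sign extend, ALU, Zero?, Mem]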
Control Hazards
Observation: Since the branch is resolved
only in the EX stage, there must be 2 stall
cycles after every conditional branch
instruction
Reducing Impact of Branch Stall
The execution of a conditional branch
instruction involves 2 activities
1. evaluating the branch condition (determine
whether it is to be taken or not-taken)
2. computing the branch target address
To reduce the branch stall effect, we could
evaluate the condition earlier (in ID stage)
compute the target address earlier (in ID stage)
The number of stall cycles would then be
reduced to 1 cycle
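To see what the reduction from 2 stall cycles to 1 means for performance, the following sketch computes an effective CPI. It assumes an ideal CPI of 1 with no other stalls and a hypothetical workload in which 20% of instructions are conditional branches; neither number comes from the slides, and `effective_cpi` is an illustrative name.

```python
def effective_cpi(branch_fraction, stall_cycles, base_cpi=1.0):
    """CPI when every conditional branch adds `stall_cycles` stall cycles.
    base_cpi=1.0 is an assumption (ideal pipeline, no other hazards)."""
    return base_cpi + branch_fraction * stall_cycles

BRANCH_FRACTION = 0.2  # hypothetical: 20% of instructions are branches

cpi_resolved_in_ex = effective_cpi(BRANCH_FRACTION, 2)  # 2 stall cycles
cpi_resolved_in_id = effective_cpi(BRANCH_FRACTION, 1)  # 1 stall cycle
print(cpi_resolved_in_ex, cpi_resolved_in_id)
```

With these assumed numbers, resolving the branch in the ID stage halves the CPI overhead due to branches.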
Prediction and Correctness
Prediction: guessing what is going to happen
What if the guess is incorrect?
The pipelined processor hardware must be built to
detect the misprediction and take appropriate
corrective action
Control Hazard Solutions
1. Static Branch Prediction
Example: Static Not-Taken policy
The hardware is built to fetch next from PC + 4
After ID stage, if it is found that the branch
condition is false (i.e., not taken), continue with
the fetched instruction (from PC + 4)
Else, squash the fetched instruction and re-fetch
from the branch target address
squash: cancel, annul the processing of that instruction
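Static Not-Taken Branch Prediction
[Figure: pipeline timing diagram for BEQZ R3, out. While the branch is in ID, the next instruction is fetched (label: "Fetch inst i +"). Suppose that the condition evaluates to TRUE: that instruction is squashed (label: "SQUASH inst i+") and the instruction at the branch target address is fetched, etc.]
i.e., ONE BRANCH STALL CYCLE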
Control Hazard Solutions
1. Static Branch Prediction
Static Not-Taken policy: cost per branch
If the branch is not taken (continue with the instruction fetched from PC + 4): 0 stall cycles
If the branch is taken (squash and re-fetch from the branch target address): 1 stall cycle
Thus, average branch penalty < 1 cycle
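The average penalty under the static not-taken policy can be sketched by charging 0 stall cycles to each not-taken branch and 1 stall cycle to each taken branch. The branch outcome trace below is made up for illustration, and `not_taken_policy_stalls` is a hypothetical helper name.

```python
def not_taken_policy_stalls(outcomes):
    """Total stall cycles under static not-taken prediction.
    `outcomes` holds one boolean per executed branch: True means taken.
    A taken branch costs 1 stall cycle (squash and re-fetch);
    a not-taken branch costs 0."""
    return sum(1 for taken in outcomes if taken)

# Hypothetical trace of five conditional branches.
trace = [True, False, False, True, False]

total_stalls = not_taken_policy_stalls(trace)
average_penalty = total_stalls / len(trace)
print(total_stalls, average_penalty)
```

The average stays below 1 cycle as long as at least one branch in the trace is not taken, which is the point of predicting rather than always stalling.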