Pipelining in Computer Architecture: Control Hazards and Solutions - Prof. Sam Hyuk Noh
Chapter 3
Pipelining
(continued)
Hazards
- Structural Hazard
- Data Hazard
- Control Hazard
Control Hazard
- Caused by PC-changing instructions (Branch, Jump, Call/Return)
For a 5-stage pipeline with a 3-cycle branch penalty and a 30% branch frequency: CPI = 1 + 0.30 × 3 = 1.9
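A minimal Python sketch of this stall arithmetic, assuming an ideal base CPI of 1 and that every branch pays the full penalty (the function name `effective_cpi` is illustrative, not from the notes):

```python
def effective_cpi(branch_freq: float, branch_penalty: int, base_cpi: float = 1.0) -> float:
    """CPI when every branch stalls the pipeline for branch_penalty extra cycles."""
    return base_cpi + branch_freq * branch_penalty


# 30% of instructions are branches, each costing a 3-cycle stall
cpi = effective_cpi(branch_freq=0.30, branch_penalty=3)
print(f"CPI = {cpi:.2f}")  # CPI = 1.90
```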
Frequency of Instructions
Branches Taken
Branch Performance
- 67% of conditional branches are taken
- Can compute the forward and backward branch frequency
- 60% of forward branches are taken
- 85% of backward branches are taken
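The overall taken rate is a weighted mix of the forward and backward rates, so the mix itself can be solved for. A short sketch, assuming every conditional branch is either forward or backward and that the three percentages describe the same program mix (`forward_branch_fraction` is a hypothetical helper):

```python
def forward_branch_fraction(p_taken: float, p_fwd_taken: float, p_bwd_taken: float) -> float:
    """Solve p_taken = f * p_fwd_taken + (1 - f) * p_bwd_taken for f, the forward fraction."""
    return (p_bwd_taken - p_taken) / (p_bwd_taken - p_fwd_taken)


f = forward_branch_fraction(p_taken=0.67, p_fwd_taken=0.60, p_bwd_taken=0.85)
print(f"forward: {f:.0%}, backward: {1 - f:.0%}")  # forward: 72%, backward: 28%
```

As a check, 0.72 × 0.60 + 0.28 × 0.85 = 0.432 + 0.238 = 0.67.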
Solutions to Control Hazard
- Optimized branch processing
- Branch prediction
- Delayed branch
Optimized Branch Processing
1. Find out whether the branch is taken or not early (simplified branch condition)
2. Compute the branch target address early (extra hardware)
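Resolving the branch earlier is what shrinks the penalty. A sketch assuming the classic IF/ID/EX/MEM/WB pipeline, that the next fetch waits until the branch is resolved at the end of the named stage, and a 30% branch frequency; resolving in MEM gives a 3-cycle penalty, while moving both the condition test and the target computation into ID leaves 1 cycle (function names are illustrative):

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # classic 5-stage pipeline


def branch_penalty(resolve_stage: str) -> int:
    """Stall cycles when the next fetch must wait until the branch resolves
    at the end of resolve_stage (the branch itself is fetched in IF)."""
    return STAGES.index(resolve_stage) - STAGES.index("IF")


def effective_cpi(branch_freq: float, penalty: int) -> float:
    return 1.0 + branch_freq * penalty


for stage in ("MEM", "ID"):
    p = branch_penalty(stage)
    print(stage, p, f"{effective_cpi(0.30, p):.2f}")
# MEM 3 1.90
# ID 1 1.30
```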
Delayed Branch
- Semantics of delayed branch: the instruction in the branch-delay slot is executed whether or not the branch is taken
- Branch-delay slot: holds the sequential successor instruction
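A toy interpreter makes these semantics concrete: with one delay slot, the instruction after the branch runs on both paths, and only then does control transfer. This is a sketch with a hypothetical instruction encoding, where `taken` fixes the branch outcome instead of evaluating a condition:

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (assembly text, branch target index or None for non-branches)
Instr = Tuple[str, Optional[int]]


def run_delayed(program: List[Instr], taken: bool, delay_slots: int = 1) -> List[str]:
    """Trace of executed instructions under delayed-branch semantics.
    `taken` applies to every branch; branches inside a delay slot are not modeled."""
    if delay_slots < 1:
        raise ValueError("a delayed branch needs at least one delay slot")
    trace: List[str] = []
    pc = 0
    slots_left: Optional[int] = None  # delay-slot instructions still to run before redirecting
    target = 0
    while pc < len(program):
        text, branch_target = program[pc]
        trace.append(text)
        pc += 1
        if slots_left is not None:
            slots_left -= 1
            if slots_left == 0:  # delay slot(s) done: now transfer control
                pc, slots_left = target, None
        elif branch_target is not None and taken:
            target, slots_left = branch_target, delay_slots
    return trace


program: List[Instr] = [
    ("BEQZ R2, L1", 3),
    ("ADD R1, R2, R3", None),     # branch-delay slot: executes on both paths
    ("SUB R4, R5, R6", None),     # fall-through path only
    ("L1: OR R7, R8, R9", None),  # branch target
]
print(run_delayed(program, taken=True))   # ['BEQZ R2, L1', 'ADD R1, R2, R3', 'L1: OR R7, R8, R9']
print(run_delayed(program, taken=False))  # all four instructions in order
```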
Delayed Branch
(a) From before the branch: the branch must not depend on the rescheduled instructions. Always improves performance.
Delayed Branch
(b) From the branch target: it must be OK to execute the rescheduled instructions if the branch is not taken. May need to duplicate instructions. Improves performance when the branch is taken. May enlarge the program if instructions are duplicated.
Delayed Branch
(c) From the fall-through path: it must be OK to execute the instructions if the branch is taken. Improves performance when the branch is not taken.
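Strategy (a) can be checked mechanically with register read/write sets. A simplified sketch, assuming a hypothetical `Instr` record with one destination register and a tuple of source registers, and ignoring memory dependences; it looks for the latest instruction before the branch that can safely be moved past everything after it:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str
    dest: Optional[str]         # register written, if any
    srcs: Tuple[str, ...] = ()  # registers read


def fill_from_before(block: List[Instr], branch: Instr) -> Tuple[List[Instr], Optional[Instr]]:
    """Strategy (a): move one instruction from before the branch into its delay slot.
    Returns (remaining block, slot instruction), or (block, None) if nothing is safe to move."""
    for i in range(len(block) - 1, -1, -1):  # prefer the instruction closest to the branch
        cand, later = block[i], block[i + 1:]
        if cand.dest is not None and cand.dest in branch.srcs:
            continue  # the branch condition depends on cand
        safe = all(
            (cand.dest is None or (cand.dest not in o.srcs and cand.dest != o.dest))  # RAW, WAW
            and (o.dest is None or o.dest not in cand.srcs)                            # WAR
            for o in later  # cand would now run after all of these
        )
        if safe:
            return block[:i] + block[i + 1:], cand
    return block, None


block = [Instr("ADD R1, R2, R3", "R1", ("R2", "R3"))]
rest, slot = fill_from_before(block, Instr("BEQZ R2, L1", None, ("R2",)))
print(slot.text if slot else "NOP")  # ADD R1, R2, R3
_, slot2 = fill_from_before(block, Instr("BEQZ R1, L1", None, ("R1",)))
print(slot2.text if slot2 else "NOP")  # NOP
```

In the second call the branch tests R1, which the ADD writes, so the slot has to be filled some other way or left as a no-op.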
Delayed Branch
- Hardware assist: cancelling (nullifying) branch
  - The instruction includes the predicted direction of the branch
  - The branch-delay slot is turned into a no-op if the prediction is wrong
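The cancelling behavior can be sketched as a function of the predicted and actual branch direction. This is a simplified model with hypothetical names; it counts a slot as useful only when the prediction is right, and the 9-of-10 loop pattern below is made-up input, not measured data:

```python
from typing import List


def slot_result(predict_taken: bool, actual_taken: bool, slot_instr: str) -> str:
    """Cancelling branch: the slot holds an instruction from the predicted path and
    is turned into a no-op when the prediction is wrong."""
    return slot_instr if predict_taken == actual_taken else "NOP"


def useful_slot_fraction(predict_taken: bool, outcomes: List[bool]) -> float:
    """Fraction of executions in which the delay slot does useful work."""
    useful = sum(slot_result(predict_taken, taken, "slot") != "NOP" for taken in outcomes)
    return useful / len(outcomes)


# Made-up pattern: a loop-closing branch taken 9 times, then falling through once.
outcomes = [True] * 9 + [False]
print(slot_result(True, False, "ADD R1, R2, R3"))  # NOP
print(useful_slot_fraction(True, outcomes))        # 0.9
```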
What makes pipelining hard to implement?
- If you thought it was bad up to here
- Life can get worse!!
Saving State
- Force a trap instruction into the pipeline on the next IF
- Until the trap is taken, turn off all writes for the faulting instruction and for all instructions that follow it in the pipeline
  - This prevents any state changes by instructions that will not be completed before the interrupt is handled
- After the interrupt-handling routine gets control, it saves the PC of the faulting instruction
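The squash rule can be sketched as follows, assuming the in-flight instructions are listed oldest first and the faulting PC is known; the `InFlight` record, the instructions, and the PCs are made up for illustration:

```python
from typing import List, NamedTuple, Tuple


class InFlight(NamedTuple):
    pc: int
    text: str


def take_trap(in_flight: List[InFlight], faulting_pc: int) -> Tuple[List[str], List[str], int]:
    """in_flight lists the instructions in the pipeline, oldest first.
    Returns (instructions allowed to finish, instructions with writes turned off, saved PC)."""
    idx = next(i for i, ins in enumerate(in_flight) if ins.pc == faulting_pc)
    finish = [ins.text for ins in in_flight[:idx]]     # older instructions complete normally
    no_writes = [ins.text for ins in in_flight[idx:]]  # faulting instruction and everything after it
    # The next IF fetches a trap instruction; the handler then saves faulting_pc for the restart.
    return finish, no_writes, faulting_pc


pipe = [
    InFlight(100, "LW R1, 0(R2)"),
    InFlight(104, "ADD R3, R1, R4"),  # suppose this one faults
    InFlight(108, "SUB R5, R6, R7"),
    InFlight(112, "AND R8, R9, R10"),
]
finish, no_writes, saved_pc = take_trap(pipe, faulting_pc=104)
print(finish)     # ['LW R1, 0(R2)']
print(no_writes)  # ['ADD R3, R1, R4', 'SUB R5, R6, R7', 'AND R8, R9, R10']
print(saved_pc)   # 104
```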
Precise Exceptions
- Exceptions are precise when the pipeline can be stopped so that the instructions just before the faulting instruction are completed and those after it can be restarted from scratch
- Simplifies the operating system interface
- Precise exceptions may be supported in hardware or with some software support
  - In some cases they are difficult to implement, because instructions may change the state before they are guaranteed to complete
  - They lead to decreased instruction parallelism, and therefore lower performance
Two Modes
- Some high-performance machines have two modes of operation for interrupt handling:
  - Precise interrupt mode
  - Imprecise interrupt mode, which gives higher performance
- In machines such as the Alpha 21064, Power-2, and MIPS R8000, precise mode is more than ten times slower!!
Precise Exceptions in DLX
Stage: exceptions that can occur
- IF: page fault on instruction fetch, misaligned memory access, memory protection violation
- ID: undefined or illegal opcode
- EX: arithmetic exception
- MEM: page fault on data fetch, misaligned memory access, memory protection violation
- WB: none
- Multiple exceptions may occur in one cycle
- Exceptions may occur out of order
Precise Exception Violation
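- If interrupts are taken as soon as they occur, they can be handled in an order different from the order of the instructions
- Example: instruction i takes a data page fault in MEM, while instruction i+1 takes an instruction page fault in IF, which happens in an earlier cycle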
Implementing Precise Exceptions
- Hardware posts each interrupt in a status vector that the instruction carries down the pipe
- The vector is checked when the instruction enters the WB stage; if any interrupts are posted, they are handled in time order
- All writes are prevented once an interrupt flag has been entered
- This guarantees that all interrupts of instruction i are seen before any of instruction i+1
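A small model contrasts taking each exception when it occurs with the status-vector scheme. It assumes a 5-stage pipeline, one instruction fetched per cycle, and a hypothetical pair: a load (`LW`) with a data page fault in MEM followed by an `ADD` with an instruction page fault in IF:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


@dataclass
class Instr:
    name: str
    fetch_cycle: int                                 # cycle in which the instruction is in IF
    faults: List[str] = field(default_factory=list)  # stages in which it raises an exception

    def cycle_of(self, stage: str) -> int:
        return self.fetch_cycle + STAGES.index(stage)


def naive_order(instrs: List[Instr]) -> List[Tuple[str, str]]:
    """Take each exception in the cycle it occurs (not precise)."""
    events = sorted((i.cycle_of(s), i.name, s) for i in instrs for s in i.faults)
    return [(name, stage) for _, name, stage in events]


def precise_order(instrs: List[Instr]) -> List[Tuple[str, str]]:
    """Post faults in a per-instruction status vector and check it only at WB.
    Instructions reach WB in program order, so the oldest faulting instruction wins."""
    for ins in sorted(instrs, key=lambda i: i.fetch_cycle):
        if ins.faults:
            vector = sorted(ins.faults, key=STAGES.index)  # this instruction's faults, in time order
            return [(ins.name, stage) for stage in vector]  # younger instructions restart later
    return []


prog = [Instr("LW", 1, ["MEM"]), Instr("ADD", 2, ["IF"])]
print(naive_order(prog))    # [('ADD', 'IF'), ('LW', 'MEM')]
print(precise_order(prog))  # [('LW', 'MEM')]
```

The ADD's page fault happens first in time, but with the status vector the load's fault is handled first; the ADD is restarted after the handler returns and raises its own fault again then.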
Instruction Set Complications
- When an instruction is guaranteed to complete, it is called committed
- In DLX all instructions are committed once they complete the MEM stage
  - No instruction updates the state before that stage
- Complications:
  - State changed in the middle of execution (example: autoincrement instructions, which update a register mid-instruction)
  - Memory updated in the middle of execution
  - Machines using implicitly set condition codes
  - Multicycle operations
Multicycle Operations:
Floating Point Operations for DLX
- Some floating-point operations may take a long time
- The operations still have to be pipelined
- A couple of changes for DLX:
  - The EX cycle may be repeated multiple times
  - There may be several functional units
Hazards and Forwarding
- Divide unit is not pipelined - structural hazard
- Because instructions have varying running times, more than one register write may be required in a cycle
- WAW hazards are possible, since instructions no longer reach the WB stage in order
- Instructions complete in a different order, which causes exception problems
- Long latency leads to frequent RAW hazards
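Computing when each instruction reaches WB exposes both the register-write conflict and WAW hazards. A sketch with made-up EX occupancies (not official DLX latencies), assuming one MEM cycle after EX, fully pipelined FP adder and multiplier, and one FP register write per cycle:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical EX-stage occupancy per functional unit (illustrative numbers only).
EX_CYCLES = {"int": 1, "fadd": 4, "fmul": 7, "fdiv": 24}


def wb_cycle(ex_start: int, unit: str) -> int:
    """EX occupies EX_CYCLES[unit] cycles starting at ex_start, then one MEM cycle, then WB."""
    return ex_start + EX_CYCLES[unit] + 1


def find_hazards(
    instrs: List[Tuple[str, str, str, int]]
) -> Tuple[Dict[int, List[str]], List[Tuple[str, str]]]:
    """instrs: (text, unit, destination register, EX start cycle), in program order."""
    writes: Dict[int, List[str]] = defaultdict(list)
    for text, unit, _, start in instrs:
        writes[wb_cycle(start, unit)].append(text)
    port_conflicts = {c: names for c, names in writes.items() if len(names) > 1}
    waw = [
        (a[0], b[0])
        for i, a in enumerate(instrs)
        for b in instrs[i + 1:]
        if a[2] == b[2] and wb_cycle(b[3], b[1]) <= wb_cycle(a[3], a[1])  # later one writes first
    ]
    return port_conflicts, waw


prog = [
    ("MULTF F0, F4, F6", "fmul", "F0", 1),  # WB in cycle 9
    ("ADDF F0, F8, F10", "fadd", "F0", 2),  # WB in cycle 7: overtakes MULTF, WAW on F0
    ("ADDF F2, F4, F6", "fadd", "F2", 4),   # WB in cycle 9: same cycle as MULTF
]
ports, waw = find_hazards(prog)
print(ports)  # {9: ['MULTF F0, F4, F6', 'ADDF F2, F4, F6']}
print(waw)    # [('MULTF F0, F4, F6', 'ADDF F0, F8, F10')]
```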
Precise Exceptions
- Out-of-order completion causes imprecise exception conditions
- Four approaches:
  1. Ignore the problem
     - Have two modes of execution
  2. Buffer the results until all operations issued earlier are done
     - Requires a large number of comparators and a large MUX
     - The CYBER 180/990 uses a history file, which keeps track of the original value of a register
     - Alternatively, use a future file, which keeps the newer values of a register
  3. Allow exceptions to become somewhat imprecise, and let the trap-handling routines make the results precise
  4. Issue an instruction only after ascertaining that all earlier instructions will complete
Example: ADDF and SUBF finish before DIVF
    DIVF F0, F2, F4
    ADDF F10, F10, F
    SUBF F12, F12, F14
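The history-file idea can be sketched on the DIVF/ADDF/SUBF sequence: every register write logs the value it overwrites, so when DIVF faults after ADDF and SUBF have already written, those writes can be undone. This is an illustrative model with made-up register values, not the CYBER 180/990 design; a real history file would also discard entries once all earlier instructions have completed, and the future-file variant is not shown:

```python
from typing import Dict, List, Tuple


class HistoryFile:
    """Register file plus a log of (program-order seq, register, old value),
    so writes that completed out of order can be undone on an exception."""

    def __init__(self, regs: Dict[str, float]) -> None:
        self.regs = dict(regs)
        self.log: List[Tuple[int, str, float]] = []

    def write(self, seq: int, reg: str, value: float) -> None:
        self.log.append((seq, reg, self.regs[reg]))  # remember the value being overwritten
        self.regs[reg] = value

    def rollback_after(self, faulting_seq: int) -> None:
        """Undo every write made by instructions later in program order than faulting_seq."""
        for seq, reg, old in reversed(self.log):  # newest first, so the oldest value wins
            if seq > faulting_seq:
                self.regs[reg] = old
        self.log = [entry for entry in self.log if entry[0] <= faulting_seq]


# Program order: 0 DIVF F0,...  1 ADDF F10,...  2 SUBF F12,...  (values are made up)
rf = HistoryFile({"F0": 0.0, "F10": 1.0, "F12": 5.0})
rf.write(1, "F10", 3.0)  # ADDF completes before DIVF
rf.write(2, "F12", 2.0)  # so does SUBF
rf.rollback_after(0)     # DIVF raises an exception before writing F0
print(rf.regs)           # {'F0': 0.0, 'F10': 1.0, 'F12': 5.0}
```

Restoring F10 matters here: ADDF reads F10, so re-executing it after the handler would otherwise use the wrong operand.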
Instruction Set Design and Pipelining:
Complications lead to inefficient pipelining
- Variable instruction lengths and running times lead to imbalance among stages, and complicate hazard detection and precise exceptions
- Caches have a similar effect: on a miss, machines freeze the pipeline
- Complex addressing modes update registers, e.g. autoincrement/decrement
- Allowing writes into instruction space: self-modifying instructions cause pipelining problems
- Implicitly set condition codes
Example - MIPS R4000
- 64-bit instruction set (MIPS-3 instruction set)
- Uses a deeper pipeline
- Uses a clock rate of 100-200 MHz
- Decomposes memory accesses into stages: this approach is known as SUPERPIPELINING
Example - MIPS R
From Hennessy, J.L. and D.A. Patterson, Computer Architecture: A Quantitative Approach, Second Edition, Copyright 1996, Morgan Kaufmann Publishers, San Francisco, CA. All rights reserved.
Eight-stage pipeline
Pipeline Stages
- IF - First half of instruction fetch
- IS - Second half of instruction fetch
- RF - Instruction decode and register fetch
- EX - Execution: effective address calculation, ALU operation, branch target computation, and condition evaluation
- DF - Data fetch, first half
- DS - Data fetch, second half
- TC - Tag check: determine whether the data cache access hit
- WB - Write back for loads and register-register operations
MIPS Pipeline with 2-cycle load delay
Branch Delay of 3 cycles
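Both delays follow from where the stages sit in the eight-stage pipeline. A sketch assuming one instruction fetched per cycle, that load data can be forwarded at the end of DS (before the tag check in TC completes, consistent with the 2-cycle load delay), and that the branch condition is evaluated in EX; the helper names are illustrative:

```python
# MIPS R4000 stage names as listed in the notes.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def load_delay(data_ready: str = "DS", needed_in: str = "EX") -> int:
    """Stall cycles for an instruction right after a load: data is forwarded at the end of
    `data_ready`, and the consumer needs it at the start of `needed_in`."""
    return STAGES.index(data_ready) - STAGES.index(needed_in)


def branch_delay(resolved_in: str = "EX") -> int:
    """Instructions fetched after the branch before the target can be fetched,
    assuming the condition is known at the end of `resolved_in`."""
    return STAGES.index(resolved_in) - STAGES.index("IF")


print(load_delay(), branch_delay())  # 2 3
```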