
























































Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
An overview of pipelining in computer systems, focusing on its implementation and the performance issues that arise due to data and control hazards. The concept of pipeline stages, the impact of pipelining on instruction execution time and throughput, and the different types of hazards that can occur. It also discusses methods to handle these hazards, such as pipeline interlocks and compiler scheduling.
Typology: Study notes
1 / 64
This page cannot be seen from the preview
Don't miss anything!

























































Copyright Josep Torrellas 1999, 2001, 2002
Copyright Josep Torrellas 1999, 2001, 2002
Pipelining
Multiple instructions are overlapped in execution
Each is in a different stage
Each stage is called “pipe stage or segment”
Throughput: # inst completed/cycle
Each step takes a machine cycle
Want to balance the work in each stage
Ideally:Time per instruction = Time per inst in a non-pipelined
# pipe stages
Copyright Josep Torrellas 1999, 2001, 2002
Implementation of RISC Instructions
1. Instruction Fetch cycle (IF)
Mem[PC]
; IR holds the instruction
2. Instruction decode/register fetch cycle (ID)
Regs[rs]
; decode the instruction
Regs[rt]
; in the meantime
Imm
sign-extend imm field of IR
;Regs A, B, Imm
; ok if some of this is not needed
Copyright Josep Torrellas 1999, 2001, 2002
3. Execution /Effective address cycle (EX) •
memory ref: ALU output
A+Imm
Reg-Reg (ALU op): ALU output
A op B
Reg-Immed (ALU op): ALU output
A op Imm
Branch: ALU output
NPC+ (Imm << 2)
;address of target
cond
(A op O)
; op = equal,
= not equal
/ note: no instructions need to do 2 of these operations /
/ note: Imm has word count for branches; need to shift*
*by 2 to get bytes to add to PC /
Copyright Josep Torrellas 1999, 2001, 2002
5. Write-back cycle (WB) •
Reg-Reg ALU instr: Regs[rd]
ALU output
Reg-Imm ALU instr: Regs[rt]
ALU output
Load Instruction: Regs[rt]
Now we will try to pipeline itWe need: At the end of each cycle, the data is stored in
some registers (PC,LMD,Imm,A,B,…). This allowsother instructions to execute too.
Branches
4 cycles
Rest of ins
5 cycles
Copyright Josep Torrellas 1999, 2001, 2002
Why does it work?
Use separate I and D caches
Register file can be read/written in 0.5 cycles
PC: incremented in IF
if branch taken, in EX, add PC+ (Imm << 2)
Cannot keep any state in IR
need to move it to
another register every cycle
see picture
These registers IF/ID, ID/EX, EX/MEM, MEM/WBsubsume the temp ones
e.g. Destination Reg in a LD
Copyright Josep Torrellas 1999, 2001, 2002
Control of the pipeline: set the control of the 4 MUXES
(Figure A.18)
ALU stage MUXES: set depending on instructiontype which is set by ID/EX. IR
top one: branch or not
bottom one: reg-reg ALU or other
MUX in IF:
chooses between PC+4 and EX/MEM. ALUOutputcontrolled by EX/MEM.cond
MUX in WB:
controlled by whether inst. is a LD or anALU op
Copyright Josep Torrellas 1999, 2001, 2002
Example
Unpipelined: 10ns cycle time
4 cycles for ALU (40%), branch (20%)5 cycles for mem (40%)
pipelining: adds 1 ns to clockspeedup in execution rate?Unpipelined: avg inst = clock * avg CPI =
10((40%+20%)4 + 40%5) = 44 ns*
pipelined = 11 ns
Speedup= 44/11 = 4
Copyright Josep Torrellas 1999, 2001, 2002
Pipeline Hazards
Situations that prevent the next instruction from
executing its designated clock cycle
Structural: resource conflicts e.g. not enough multipliers
Data: instruction depends on the result of a previous one. e.g. ADD R1, R2, R
Control: results from instructions that change the PC. e.g. BEQ R1, label
As a result, the pipeline may have to stall
Copyright Josep Torrellas 1999, 2001, 2002
Some Combination of inst. Cannot be accomodated because of resourceconflicts
Usually because some functional unit is not pipelined two instructionsusing it cannot proceed back to back
Some resource has not been replicated enough
Eg 1 register file port
Combined I,D memory
Result : Pipeline stall, like if we had inserted a bubble.
Copyright Josep Torrellas 1999, 2001, 2002
Occurs because pipelining changes the order of read/write accesses tooperands
1 ADD
R1, R2, R
2 SUB
R4,R5,R
3 AND
R6,R1,R
4 OR
R8,R1,R
5 XOR
R10,R1,R
Left to their own devices all these instructions produce wrong resultsTo fix problem 4
Split register access W,R
2,3 Forwarding (bypassing or short circuiting).