









The process of building a pipelined datapath in computer organization, including the need to re-arrange items, add pipeline registers, and handle data hazards through register forwarding and stalls.










CSc 252 — Computer Organization 8 — Pipelining
Read: Chapter 4, Sections 4.5 to 4.8 (4th edition); Chapter 6, Sections 6.1 to 6.5 (3rd edition)
Laundry example: washing (30 minutes), drying (30 minutes), folding (30 minutes), “stashing” (30 minutes).
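If only one person does a load of laundry, it takes 2 hours to complete.
If several people need to do laundry, the sequential solution takes 2 hours for each load, one after another (8 hours for four loads).
But the washer, dryer, “folder”, and “stasher” are independent units, so a new load can start washing as soon as the previous load moves to the dryer.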
Pipeline basics:
- Pipelining does not help the latency of a single task: it still takes 2 hours to do one person’s laundry.
- Pipelining does help the throughput of the entire workload: 3.5 hours vs. 8 hours.
- Multiple tasks operate simultaneously, each using different resources.
- Potential speedup = number of pipe stages.
- The rate is limited by the slowest pipeline stage.
- Unbalanced lengths of pipe stages reduce speedup.
- The time to “fill” the pipeline and the time to “drain” it reduce speedup.
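The laundry numbers can be checked with a short calculation. This is a sketch assuming four loads and four equal 30-minute stages; `sequential_time` and `pipelined_time` are illustrative helper names, not part of the course material.

```python
def sequential_time(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load goes through every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_time(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs `stages` steps to fill the pipeline;
    after that, one more load finishes at every step."""
    return (stages + loads - 1) * stage_minutes


if __name__ == "__main__":
    loads, stages, minutes = 4, 4, 30
    seq = sequential_time(loads, stages, minutes)
    pipe = pipelined_time(loads, stages, minutes)
    print(f"sequential: {seq / 60} hours")   # 8.0 hours
    print(f"pipelined:  {pipe / 60} hours")  # 3.5 hours
    print(f"speedup:    {seq / pipe:.2f}")   # 2.29, below the ideal 4
```

With 1000 loads the same formulas give a speedup of about 3.99, close to the number of stages; at only four loads, the fill and drain time keeps the speedup well below that.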
Pipeline basics (continued):
Consider the load word instruction:
lw $s0, 0($t0)
IFetch (Instruction Fetch): get the instruction from memory.
Reg/Dec (Registers/Decode): fetch values from the registers and decode the instruction.
Exec (Execute): calculate the memory address from which to load the word.
Mem (Memory): read the word from memory.
Write: write the word to the register.
Cycle 1    Cycle 2    Cycle 3    Cycle 4    Cycle 5
IFetch     Reg/Dec    Exec       Mem        Write
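To make the five steps concrete, here is a sketch that walks `lw $s0, 0($t0)` through them in order, using a Python list for the register file and dictionaries for instruction and data memory. It runs one instruction alone (no overlap). The register numbers ($t0 = 8, $s0 = 16, the usual MIPS numbering), the base address, and the memory contents are assumptions for illustration, and `run_lw` is a hypothetical helper name.

```python
# Assumed encoding of lw $s0, 0($t0): opcode 0x23, rs = $t0 (8), rt = $s0 (16), offset 0.
LW_S0_0_T0 = (0x23 << 26) | (8 << 21) | (16 << 16) | 0x0000


def run_lw(pc: int, instr_mem: dict[int, int], regs: list[int],
           data_mem: dict[int, int]) -> None:
    """Execute one lw, one step per cycle. Assumes the instruction is lw;
    a real decoder would also check the opcode in bits 31-26."""
    # Cycle 1, IFetch: get the instruction from memory.
    instr = instr_mem[pc]
    # Cycle 2, Reg/Dec: decode the fields and read the base register.
    rs = (instr >> 21) & 0x1F          # bits 25-21: base register
    rt = (instr >> 16) & 0x1F          # bits 20-16: destination register
    offset = instr & 0xFFFF            # bits 15-0: offset
    if offset & 0x8000:                # sign-extend the 16-bit offset
        offset -= 0x10000
    base = regs[rs]
    # Cycle 3, Exec: the ALU calculates the memory address.
    address = base + offset
    # Cycle 4, Mem: read the word from data memory.
    word = data_mem[address]
    # Cycle 5, Write: write the word to register rt.
    regs[rt] = word


if __name__ == "__main__":
    regs = [0] * 32
    regs[8] = 0x1000                   # $t0 holds the base address (assumed)
    data_mem = {0x1000: 42}            # assumed memory contents
    run_lw(0, {0: LW_S0_0_T0}, regs, data_mem)
    print(regs[16])                    # 42
```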
Pipeline basics (continued):
A more realistic picture: not every step takes the same amount of time:
Memory access is slower.
ALU computation is slower.
Register access is faster.
Figure 4.26, page 333 (4th edition) (There is a similar Figure 6.2, page 439 in the 3rd edition):
Instruction class                   | Instruction fetch | Register read | ALU operation | Data access | Register write | Total time
Load word (lw)                      | 200 ps            | 100 ps        | 200 ps        | 200 ps      | 100 ps         | 800 ps
Store word (sw)                     | 200 ps            | 100 ps        | 200 ps        | 200 ps      | -              | 700 ps
R-format (add, sub, and, or, slt)   | 200 ps            | 100 ps        | 200 ps        | -           | 100 ps         | 600 ps
Branch (beq)                        | 200 ps            | 100 ps        | 200 ps        | -           | -              | 500 ps
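Using the table, a single-cycle design must stretch its clock to the slowest instruction (800 ps for lw), while a pipelined design sets its clock by the slowest stage (200 ps). The sketch below compares the two for a run of `n` instructions, assuming an ideal pipeline with no hazards or stalls; the stage labels and function names are illustrative.

```python
# Stage and instruction times taken from the table (picoseconds).
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}
INSTR_PS = {"lw": 800, "sw": 700, "R-format": 600, "beq": 500}


def single_cycle_ps(n: int) -> int:
    # One clock cycle per instruction, long enough for the slowest instruction (lw).
    return n * max(INSTR_PS.values())


def pipelined_ps(n: int) -> int:
    # One clock cycle per stage, long enough for the slowest stage.
    # Ideal pipeline: the first instruction needs 5 cycles,
    # and each later instruction finishes 1 cycle after the previous one.
    clock = max(STAGE_PS.values())
    return (len(STAGE_PS) + n - 1) * clock


if __name__ == "__main__":
    for n in (1, 3, 1_000_000):
        print(n, single_cycle_ps(n), pipelined_ps(n))
```

A single lw takes 1000 ps in the pipeline versus 800 ps in the single-cycle design, so latency actually gets slightly worse. For long runs the speedup approaches 800/200 = 4 rather than 5, because the stages are unbalanced.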
Building a Pipelined datapath (continued):
Add pipeline registers in-between each pipeline stage.
[Figure: pipelined datapath divided into five stages: Instruction fetch; Instruction decode/register file read; Execute or address calculation; Memory access; Write back.]
Building a Pipelined datapath (continued):
How big is each pipeline register? How many bits are in each? (We’ll need more bits before we are done…)
[Figure: the same five-stage pipelined datapath (Instruction fetch; Instruction decode/register file read; Execute or address calculation; Memory access; Write back).]
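One way to approach the width question is to list what each pipeline register must hold for the datapath so far and add up the fields. The field list below is an assumption based on the figure (PC+4, the instruction, the two register read values, the sign-extended immediate, the branch target, Zero, the ALU result, and the memory read data), before any write-register number or control bits are added.

```python
# Fields carried by each pipeline register, as (name, width in bits).
# Assumed field list for the datapath before control and the write register are added.
PIPELINE_REGISTERS = {
    "IF/ID": [("PC+4", 32), ("instruction", 32)],
    "ID/EX": [("PC+4", 32), ("read data 1", 32), ("read data 2", 32),
              ("sign-extended immediate", 32)],
    "EX/MEM": [("branch target", 32), ("Zero", 1), ("ALU result", 32),
               ("read data 2", 32)],
    "MEM/WB": [("memory read data", 32), ("ALU result", 32)],
}


def width(name: str) -> int:
    """Total number of bits in one pipeline register."""
    return sum(bits for _, bits in PIPELINE_REGISTERS[name])


if __name__ == "__main__":
    for name in PIPELINE_REGISTERS:
        print(f"{name}: {width(name)} bits")
    # IF/ID: 64 bits, ID/EX: 128 bits, EX/MEM: 97 bits, MEM/WB: 64 bits
```

Keeping the fields as a list makes it easy to add entries as the design grows.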
Building a Pipelined datapath (continued):
Problem:
[Figure: the instruction decode/register file read stage (register file and sign-extend unit).]
We know the number of the register to write on the 2nd clock cycle, but we do not have the data to write until the 5th clock cycle. We need to “remember” the register number until the 5th clock cycle…
Building a Pipelined datapath (continued):
The write register number is stored in the ID/EX register on cycle 2, then in EX/MEM on cycle 3, then in MEM/WB on cycle 4. The number is finally used on cycle 5.
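The hand-off can be simulated by treating each pipeline register as a slot that latches the value from the stage before it at the end of every clock cycle. This sketch tracks only the write-register number of a single lw writing register 16 ($s0); a real pipeline would have a different instruction in each slot. `track_write_register` is a hypothetical helper name.

```python
def track_write_register(write_reg: int) -> list[tuple[int, str, int]]:
    """Follow one write-register number from decode to write-back."""
    id_ex = ex_mem = mem_wb = None
    history = []
    for cycle in (2, 3, 4):
        # End of the cycle: every pipeline register latches the value from the
        # stage before it. The right-hand side is evaluated first, so each
        # value moves exactly one stage.
        new_id_ex = write_reg if cycle == 2 else None
        mem_wb, ex_mem, id_ex = ex_mem, id_ex, new_id_ex
        for name, value in (("ID/EX", id_ex), ("EX/MEM", ex_mem), ("MEM/WB", mem_wb)):
            if value is not None:
                history.append((cycle, name, value))
    # Cycle 5: MEM/WB supplies the number to the register file's Write register input.
    history.append((5, "Write register", mem_wb))
    return history


if __name__ == "__main__":
    for cycle, holder, reg in track_write_register(16):
        print(f"cycle {cycle}: {holder} -> ${reg}")
```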
[Figure: pipelined datapath with the write register number carried through the pipeline registers.]
Question: the pipeline registers that now carry the write register number (ID/EX, EX/MEM, and MEM/WB) are larger by how many bits?
Representing Pipelines (continued):
Can help with answering questions such as:
How many clock cycles does it take to execute this code?
What is the ALU doing during clock cycle 4? What else is happening during clock cycle 4?
Can use this representation to help understand datapaths through the CPU.
lw $10, 20($1)
sub $11,$2,$
sw $12,28($4)
[Figure: multiple-clock-cycle pipeline diagram of the three instructions above, in program execution order.]
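A multiple-clock-cycle diagram can also be generated mechanically. The sketch below assumes an ideal five-stage pipeline (one new instruction enters per cycle, no stalls) and records which stage each instruction occupies in every cycle, which answers both sample questions for the three instructions above. Only the mnemonics are used, since the operands do not affect the timing here, and the names are assumed to be unique.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_chart(instrs: list[str]) -> list[list[str]]:
    """Row i lists the stage of instruction i in each clock cycle ('' = not in the pipeline)."""
    total_cycles = len(STAGES) + len(instrs) - 1
    chart = []
    for i in range(len(instrs)):
        row = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage   # instruction i is in stage s during cycle i + s + 1
        chart.append(row)
    return chart


def busy_in_cycle(instrs: list[str], cycle: int) -> dict[str, str]:
    """Map each instruction to its stage during the given (1-based) clock cycle."""
    chart = pipeline_chart(instrs)
    return {name: row[cycle - 1] for name, row in zip(instrs, chart) if row[cycle - 1]}


if __name__ == "__main__":
    program = ["lw", "sub", "sw"]
    chart = pipeline_chart(program)
    print("cycles:", len(chart[0]))                      # cycles: 7
    for name, row in zip(program, chart):
        print(f"{name:4}", " ".join(f"{s:4}" for s in row))
    print("cycle 4:", busy_in_cycle(program, 4))
    # cycle 4: {'lw': 'MEM', 'sub': 'EX', 'sw': 'ID'}
```

In cycle 4 the ALU is busy with sub (EX), while lw reads data memory and sw is being decoded.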
[Figure: single-cycle datapath with control; control wires RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite, Jump; instruction fields [31-26], [25-21], [20-16], [15-11], [15-0], [5-0], [25-0].]
Pipeline Control:
Control wires are (more or less) the same ones used in the single-clock-cycle implementation.
Pipeline Control (continued):
A specific example: the number of the register to which the result is written.
The lw instruction uses bits 20-16.
The arithmetic ( add , sub , and , or , slt ) instructions use bits 15-11.
We do not know which set of bits to use until the end of the second clock cycle.
Therefore, the control wire cannot be turned on or off until the third clock cycle.
The multiplexor has to be in the third clock cycle portion of the CPU.
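The sketch below shows the choice the RegDst multiplexor makes in the third cycle: both candidate fields are taken from the 32-bit instruction word and carried forward, and the RegDst bit picks one (R-format instructions set RegDst to 1, lw sets it to 0). The encodings use standard MIPS fields (lw opcode 0x23, add funct 0x20); `add $8, $9, $10` and `lw $8, 4($9)` are example instructions chosen for illustration, and the helper names are hypothetical.

```python
def encode_r(rs: int, rt: int, rd: int, funct: int) -> int:
    # R-format: opcode 0 | rs | rt | rd | shamt 0 | funct
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def encode_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    # I-format: opcode | rs | rt | 16-bit immediate
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def write_register(instr: int, reg_dst: int) -> int:
    """RegDst multiplexor: 0 selects bits 20-16, 1 selects bits 15-11."""
    rt = (instr >> 16) & 0x1F   # bits 20-16
    rd = (instr >> 11) & 0x1F   # bits 15-11
    return rd if reg_dst == 1 else rt


if __name__ == "__main__":
    add = encode_r(rs=9, rt=10, rd=8, funct=0x20)   # add $8, $9, $10
    lw = encode_i(0x23, rs=9, rt=8, imm=4)          # lw  $8, 4($9)
    print(write_register(add, reg_dst=1))  # 8
    print(write_register(lw, reg_dst=0))   # 8
    print(write_register(lw, reg_dst=1))   # 0: bits 15-11 of the offset, not a register
```

The last line shows why the choice matters: with the wrong RegDst value, lw would write to whatever register number happens to sit in the upper bits of its offset.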
[Figure: pipelined datapath with the write-register multiplexor, choosing instruction bits [20-16] or [15-11], in the third-clock-cycle portion.]
Pipeline Control (continued):
The Control unit is part of the second clock cycle. It gets the opcode from instruction wires 31-26.
Control turns the RegDst control wire on or off.
Store the RegDst control wire in the ID/EX pipeline register for use during the 3rd clock cycle.
[Figure: pipelined datapath with the Control unit reading instruction bits [31-26]; the RegDst control wire is stored in ID/EX.]
How many bits are now in
the ID/EX pipeline register?
Pipeline Control (continued):
[Figure: pipelined datapath with control wires RegDst, ALUOp, ALUSrc (with function bits [5-0]), Branch, MemRead, and MemWrite carried to later stages.]
How many bits are in the EX/MEM pipeline register?
Pipeline Control (continued):
5th clock cycle control wires: MemtoReg, RegWrite.
[Figure: pipelined datapath with control wires RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, MemtoReg, and RegWrite.]
Pipeline Control (continued):
[Figure: pipelined datapath with all control wires: RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, MemtoReg, RegWrite.]
How many bits?
Pipeline Control (continued):
Why are there no control wires for stages 1 and 2?
X = “does not care”
Taken from the single-clock-cycle implementation, but rearranged to fit the pipeline stages.
See figure 4.49, page 360 (4th edition); figure 6.28, page 469 (3rd edition).
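Execution/Address Calculation (3rd stage): RegDst, ALUOp1, ALUOp0, ALUSrc
Memory Access (4th stage): Branch, MemRead, MemWrite
Write-back (5th stage): RegWrite, MemtoReg

Instruction | RegDst | ALUOp1 | ALUOp0 | ALUSrc | Branch | MemRead | MemWrite | RegWrite | MemtoReg
R-format    | 1      | 1      | 0      | 0      | 0      | 0       | 0        | 1        | 0
lw          | 0      | 0      | 0      | 1      | 0      | 1       | 0        | 1        | 1
sw          | X      | 0      | 0      | 1      | 0      | 0       | 1        | 0        | X
beq         | X      | 0      | 1      | 0      | 1      | 0       | 0        | 0        | X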
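Because each group of control wires is used in a later stage, the decode stage can generate all nine bits at once, and each pipeline register passes along only the groups still needed. The sketch encodes the table above as Python data (X stored as 0, which is safe here because RegWrite is 0 in those rows) and shows what ID/EX, EX/MEM, and MEM/WB carry for one instruction; the helper names are illustrative.

```python
EX_BITS = ["RegDst", "ALUOp1", "ALUOp0", "ALUSrc"]   # used in the 3rd stage
MEM_BITS = ["Branch", "MemRead", "MemWrite"]          # used in the 4th stage
WB_BITS = ["RegWrite", "MemtoReg"]                    # used in the 5th stage

# Rows of the control table; X ("does not care") is stored as 0 here.
CONTROL = {
    "R-format": [1, 1, 0, 0, 0, 0, 0, 1, 0],
    "lw":       [0, 0, 0, 1, 0, 1, 0, 1, 1],
    "sw":       [0, 0, 0, 1, 0, 0, 1, 0, 0],
    "beq":      [0, 0, 1, 0, 1, 0, 0, 0, 0],
}


def control_bits(op: str) -> dict[str, int]:
    """All nine control bits produced by the Control unit in the 2nd stage."""
    return dict(zip(EX_BITS + MEM_BITS + WB_BITS, CONTROL[op]))


def carried_by(op: str) -> dict[str, dict[str, int]]:
    """Control bits held in each pipeline register for one instruction."""
    bits = control_bits(op)

    def pick(names: list[str]) -> dict[str, int]:
        return {n: bits[n] for n in names}

    return {
        "ID/EX": pick(EX_BITS + MEM_BITS + WB_BITS),  # still needed in stages 3, 4, 5
        "EX/MEM": pick(MEM_BITS + WB_BITS),           # still needed in stages 4, 5
        "MEM/WB": pick(WB_BITS),                      # still needed in stage 5
    }


if __name__ == "__main__":
    print(carried_by("lw")["MEM/WB"])   # {'RegWrite': 1, 'MemtoReg': 1}
```

For lw, MEM/WB ends up holding only the two 5th-cycle wires, RegWrite and MemtoReg, both set to 1.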
Pipeline Control (continued):
[Figure: complete pipelined datapath with control wires RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, MemtoReg, and RegWrite.]
How many bits?
Data Hazards and Forwarding (continued):
The value we need is already present in a pipeline register: use it instead of waiting for it to reach the register file.
Register file forwarding handles a read and a write to the same register in the same clock cycle:
sub $2, $1, $
and $12,$2,$
or $13,$5,$
add $14,$2,$
sw $15,100($2)
Clock cycle:      CC1  CC2  CC3  CC4  CC5     CC6  CC7  CC8  CC9
Value of reg $2:  10   10   10   10   10/-20  -20  -20  -20  -
There are three cases to consider:
- The value needed is in the register.
- The value needed is in MEM/WB.
- The value needed is in EX/MEM.
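The three cases turn into a priority check for each ALU operand. This sketch is modeled on the textbook's forwarding conditions, under these assumptions: a 2-bit multiplexor select where 00 = registers, 10 = EX/MEM, 01 = MEM/WB; register $0 is never forwarded because it always reads as 0; and the most recent result (EX/MEM) wins when both pipeline registers match. `Dest` and `forward_select` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Dest:
    reg_write: bool  # RegWrite control bit stored in the pipeline register
    rd: int          # destination register number stored in the pipeline register


def forward_select(src: int, ex_mem: Dest, mem_wb: Dest) -> str:
    """Select for one ALU input multiplexor: '00' registers, '10' EX/MEM, '01' MEM/WB."""
    # Value computed on the previous clock cycle, now in EX/MEM (checked first).
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "10"
    # Value computed two clock cycles ago, now in MEM/WB.
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "01"
    # Otherwise the register file already supplies the value.
    return "00"


if __name__ == "__main__":
    nothing = Dest(reg_write=False, rd=0)
    # sub $2, ... is in EX/MEM while the next instruction reads $2 in EX.
    print(forward_select(2, Dest(True, 2), nothing))           # 10
    # The instruction that wrote $2 is now two stages ahead, in MEM/WB.
    print(forward_select(2, Dest(True, 12), Dest(True, 2)))    # 01
    # Neither pipeline register writes $2: use the register file.
    print(forward_select(2, Dest(True, 13), Dest(True, 12)))   # 00
```

Checking EX/MEM first matters when two instructions in a row write the same register: the later one (closer to EX) holds the value the program expects.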
Data Hazards and Forwarding (continued):
[Figure: pipelined datapath with a forwarding unit.]
The forwarding unit's inputs tell it the 2 registers needed for the ALU on this clock cycle (IF/ID Register Rs and IF/ID Register Rt), the register computed on the previous clock cycle, and the register computed two clock cycles ago.
Data Hazards and Forwarding (continued):
[Figure: pipelined datapath with forwarding multiplexors at the ALU inputs.]
Each ALU input multiplexor has 3 inputs: the value from the registers, the value from EX/MEM, and the value from MEM/WB. 2 control wires carry the result from the forwarding unit.
Data Hazards — Stalls (continued):
Stall the pipeline by keeping an instruction in the same stage.
lw $2, 20($1)
and $4, $2, $
or $5, $4, $
[Figure: pipelined datapath with a forwarding unit and a hazard detection unit.]
The hazard detection unit's inputs: MemRead from ID/EX, to determine if memory will be read; Rt from ID/EX, the register to be read from memory; and the registers needed by the current instruction (IF/ID Register Rs and IF/ID Register Rt).
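Forwarding cannot rescue an instruction that needs a loaded value in the very next cycle: lw reads data memory in cycle 4, but the following `and` is already in EX during that same cycle. Using the inputs listed, the hazard detection test can be sketched as below; this is the usual load-use check, the parameter names are illustrative, and the second source register of `and` is a hypothetical $7.

```python
def load_use_stall(id_ex_mem_read: bool, id_ex_rt: int,
                   if_id_rs: int, if_id_rt: int) -> bool:
    """Load-use hazard check.

    id_ex_mem_read: MemRead stored in ID/EX (is the instruction in EX a load?)
    id_ex_rt:       Rt stored in ID/EX (the register to be read from memory)
    if_id_rs/rt:    registers needed by the current instruction in ID
    """
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)


if __name__ == "__main__":
    # lw $2, 20($1) is in EX; and $4, $2, ... is in ID (second source assumed $7).
    print(load_use_stall(True, 2, if_id_rs=2, if_id_rt=7))    # True: stall
    # Same registers, but the instruction in EX is not a load: forwarding suffices.
    print(load_use_stall(False, 2, if_id_rs=2, if_id_rt=7))   # False
```

The check is conservative: it compares against Rt even for instructions in ID that do not actually read Rt, which can cause an occasional unnecessary stall but never a missed one.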
Data Hazards — Stalls (continued):
[Figure: hazard detection unit controlling the PC, the IF/ID pipeline register, and a multiplexor on the control wires.]
The hazard detection unit chooses:
- Control wire values to be passed to later cycles.
- Change PC for next instruction.
- Write instruction in IF/ID.
No stall necessary!
Data Hazards — Stalls (continued):
[Figure: hazard detection unit controlling the PC, the IF/ID pipeline register, and a multiplexor on the control wires.]
The hazard detection unit chooses:
- All control wires turned off for the stall.
- Leave the previous instruction in IF/ID.
- PC stays the same, to re-read the instruction.
Stall is needed!
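The two cases come down to three choices made by the hazard detection unit every cycle. The sketch captures them as a function of the stall decision. The output names (`pc_write`, `if_id_write`, `control_select`) are hypothetical labels for the PC, IF/ID, and control-multiplexor choices; zeroed control bits are what turn the instruction entering EX into a bubble that changes no state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HazardOutputs:
    pc_write: bool        # True: PC advances; False: PC keeps its value
    if_id_write: bool     # True: IF/ID latches the next instruction; False: keeps the old one
    control_select: int   # 1: pass the decoded control bits; 0: pass all zeros


def hazard_outputs(stall: bool) -> HazardOutputs:
    if stall:
        # Stall is needed: re-read the same instruction, leave IF/ID alone,
        # and turn off every control wire entering ID/EX.
        return HazardOutputs(pc_write=False, if_id_write=False, control_select=0)
    # No stall necessary: normal operation.
    return HazardOutputs(pc_write=True, if_id_write=True, control_select=1)


def control_to_id_ex(decoded: list[int], out: HazardOutputs) -> list[int]:
    """Control bits written into ID/EX after the control multiplexor."""
    return decoded if out.control_select == 1 else [0] * len(decoded)


if __name__ == "__main__":
    r_format_bits = [1, 1, 0, 0, 0, 0, 0, 1, 0]   # the "and" waiting in ID
    print(control_to_id_ex(r_format_bits, hazard_outputs(stall=True)))
    # [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Because RegWrite and MemWrite are both 0 in the bubble, it changes nothing as it moves through EX, MEM, and WB. One cycle later lw has reached WB, and the `and` can get the loaded value by forwarding from MEM/WB.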