CDA 4150 – Pipelining
Single-Cycle implementation has poor performance
- Cycle time longer than necessary for all but slowest instruction
Solution: break the instruction into smaller steps
- Execute each step in one clock cycle
- Cycle time: time it takes to execute the longest step
- Design all the steps to have similar length
Advantages of the multiple cycle processor
- Cycle time is much shorter
- Functional units can be used more than once per instruction (less hardware)
Disadvantages of the multiple cycle processor
- More timing paths to analyze and tune
- Additional registers to store intermediate data values
[Figure: five-stage datapath. Stages: Instruction Fetch (IF), Decode (RD), Execute (EX), Memory (MEM), Writeback (WB). Labeled units: pc, I$, IR, regs, extend (16 to 32), alu, D$, smd, lmd.]
Time taken for 1 instruction
• Add up the execution times of each phase
• Each phase may take different amounts of time
• One instruction executes at a time
Example
• Pick some execution times out of the air
• tfetch = 60 ns, tdecode = 30 ns, texec = 50 ns, tmem = 80 ns, twb = 20 ns
• Total execution time per instruction = 240ns
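The total above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the stage times are the made-up values from the example, and the names `STAGE_NS` and `time_per_instruction_ns` are illustrative, not from the slides.

```python
# Phase times in ns, copied from the example above (made-up values).
STAGE_NS = {"fetch": 60, "decode": 30, "exec": 50, "mem": 80, "wb": 20}

def time_per_instruction_ns(stage_ns):
    """With one instruction executing at a time, its latency is the sum of all phases."""
    return sum(stage_ns.values())

def total_time_ns(stage_ns, n_instructions):
    """No overlap: n instructions take n times the single-instruction latency."""
    return n_instructions * time_per_instruction_ns(stage_ns)

print(time_per_instruction_ns(STAGE_NS))  # 240
print(total_time_ns(STAGE_NS, 100))       # 24000
```

The longest phase (80 ns here) is the one that would set the clock if each phase were given its own cycle.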
We can execute multiple instructions at the same time!
Each instruction will be in a different phase of execution
Throughput can increase by up to a factor equal to the number of pipeline stages
Overlap different steps for consecutive instructions
- Steps are called pipeline stages
- Need latches after each stage to hold control/data for later stages
A new instruction enters the pipeline at IF on each clock
- Takes 5 clocks to complete execution and leave the pipeline
- Potential throughput of 1 instruction per clock (CPI = 1)
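The overlap described above can be pictured by printing which stage each instruction occupies on each clock. The sketch below assumes an ideal pipeline (one instruction enters IF per clock, no stalls); the function name `pipeline_diagram` is hypothetical.

```python
STAGES = ["IF", "RD", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    """Return one row per instruction; column c holds the stage occupied on clock c+1.

    Assumption: ideal pipeline, a new instruction enters IF every clock, no stalls.
    """
    total_clocks = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = ["--"] * total_clocks
        for s, stage in enumerate(STAGES):
            row[i + s] = stage  # instruction i reaches stage s on clock i + s + 1
        rows.append(row)
    return rows

for i, row in enumerate(pipeline_diagram(3)):
    print(f"I{i + 1}: " + " ".join(f"{cell:>3}" for cell in row))
```

Each instruction still spends 5 clocks in the pipeline, but once the pipeline is full one instruction completes on every clock.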
[Figure: five-stage datapath with stages IF, RD, EX, MEM, WB]
The major hurdle of pipelining
• Situations where next instruction cannot execute
• Reduce the performance of pipelining
Speedup = Pipeline depth / (1 + pipeline stalls per instruction)
Want incredibly long pipelines, with no pipeline stalls
Good luck!
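A quick way to get a feel for the speedup formula is to plug in a few stall rates. The depth of 5 and the stall values below are illustrative assumptions, not measurements.

```python
def pipeline_speedup(depth, stalls_per_instruction):
    """Speedup = pipeline depth / (1 + pipeline stalls per instruction)."""
    return depth / (1.0 + stalls_per_instruction)

# Hypothetical stall rates for a 5-stage pipeline.
for stalls in (0.0, 0.25, 1.0):
    print(stalls, pipeline_speedup(5, stalls))
```

With no stalls the speedup equals the depth; at one stall per instruction, half of it is gone.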
Long pipes increase likelihood of hazards
• Let's look at pipeline resources used by instruction class
Pipe stage   ALU                  Memory                      Branch
IF           Fetch-PC,            Fetch-PC,                   Fetch-PC,
             Inst Cache           Inst Cache                  Inst Cache
RD           Register Read        Register Read               Register Read
EX           ALU                  ALU (address)               ALU (dest addr),
                                                              Compare logic,
                                                              Fetch-PC (taken)
MEM          N/A                  Cache Tags,                 N/A
                                  Cache Data
WB           PC,                  PC,                         PC
             Register Write       Register Write (Load)
Three classes of hazards
- Data hazards
  - One instruction has a source operand that is the result of a previous instruction in the pipeline (Read-After-Write: RAW)
  - There are other types of data hazards (later)
- Control hazards
  - The execution of an instruction depends on the resolution of a previous branch instruction in the pipeline
  - Becomes a big problem with deep pipelines
- Structural hazards
  - Two or more instructions in the pipeline require the same hardware resource to progress
  - Most common instance is a non-pipelined functional unit (e.g., the multiplier)
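The RAW case in the list above can be sketched as a simple scan over a straight-line instruction sequence. Everything here is illustrative: the `Instr` tuple, the `raw_hazards` name, the example registers, and the `window` parameter (how many following instructions can still see a stale value, which depends on the pipeline) are assumptions.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]

def raw_hazards(program, window=2):
    """List (producer_index, consumer_index, register) for each RAW dependence
    where the consumer follows the producer by at most `window` instructions.

    Assumption: `window` stands in for the distance over which the producer's
    result has not yet been written back; the real value is pipeline-specific.
    """
    found = []
    for j, consumer in enumerate(program):
        for i in range(max(0, j - window), j):
            producer = program[i]
            if producer.dest in consumer.srcs:
                found.append((i, j, producer.dest))
    return found

# Hypothetical sequence; register numbers are chosen only for illustration.
program = [
    Instr("add", "r1", ("r2", "r3")),
    Instr("add", "r4", ("r1", "r5")),
    Instr("add", "r6", ("r4", "r7")),
]
print(raw_hazards(program))  # [(0, 1, 'r1'), (1, 2, 'r4')]
```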
[Figure: five-stage datapath (IF, RD, EX, MEM, WB) with the sequence add r1,r2,r / add r4,r1,r in the pipeline]
[Figure: five-stage datapath (IF, RD, EX, MEM, WB) with the sequence add r1,r2,r / add r4,r1,r / add r6,r4,r in the pipeline]
[Figure: five-stage datapath (IF, RD, EX, MEM, WB) with the sequence add r1,r2,r / add r4,r1,r / add r6,r4,r / … in the pipeline]
[Figure: five-stage datapath (IF, RD, EX, MEM, WB) with the sequence add r1,r2,r / add r4,r1,r / add r6,r4,r / … in the pipeline, with two added paths labeled "ALU bypass" and "Mem to ALU bypass"]
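One way to read the two bypass paths is as a per-operand selection made for the instruction entering EX. The sketch below is an assumption about how the labels map to distances: a result from the immediately preceding instruction arrives over the "ALU bypass", a result from two instructions back arrives over the "Mem to ALU bypass", and anything older comes from the register file. It also assumes both producers are ALU operations (a load would not have its data ready in time for the first path). The function name and example registers are hypothetical.

```python
def operand_source(src_reg, prev_dest, prev2_dest):
    """Pick where the EX stage should take one source operand from.

    prev_dest:  destination of the instruction one ahead in the pipeline (now in MEM)
    prev2_dest: destination of the instruction two ahead (now in WB)
    The younger producer is checked first so the most recent value wins.
    """
    if src_reg == prev_dest:
        return "ALU bypass"
    if src_reg == prev2_dest:
        return "Mem to ALU bypass"
    return "register file"

# Hypothetical third instruction reading r4 (written one instruction ago)
# and r1 (written two instructions ago), plus an unrelated r9.
for reg in ("r4", "r1", "r9"):
    print(reg, "->", operand_source(reg, prev_dest="r4", prev2_dest="r1"))
```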
When a branch instruction is executed, execution of subsequent instructions depends on whether the branch is taken and the location of the destination
A simple but effective approach is to assume the branch is not taken and follow the sequential path
The branch is resolved at the end of EX
- If taken, cancel instructions in the sequential path and start fetching from the destination on the next clock
  - This results in a 2-clock delay for taken branches
- If not taken, continue sequentially
Instruction          Clock Cycle =>
I1                   IF   RD   EX   MEM  WB
beq $t0,$t1,L1            IF   RD   EX   MEM  WB
I3                             IF   RD   --   --
I4                                  IF   --   --
L1: I5                                   IF   RD
Load Delay
• Explicit 1-instruction delay in MIPS ISA
- If no instruction can be scheduled following the load, a nop is required
- MIPS == "Microprocessor without Interlocked Pipeline Stages"
- But other implementations may have different load delays!
Branch Delay
• Explicit 1-instruction delay in MIPS, HP-PA, SPARC
- For MIPS, if no instruction can be scheduled, a nop is required
- Scheduled instruction must be safe to execute whether or not the branch is taken (assembler schedules)
- For HP-PA/SPARC, the instruction following the branch is conditionally executed or squashed
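The load-delay rule can be sketched as a pass over the instruction list that pads with a nop when the instruction right after a load reads the loaded register. This is a deliberately minimal sketch: a real assembler would first try to move an independent instruction into the slot, and the tuple layout `(op, dest, srcs)`, the `lw` mnemonic check, and the function name are assumptions for illustration.

```python
NOP = ("nop", None, ())

def fill_load_delay_slots(program):
    """Insert a nop after each load whose result is read by the very next instruction.

    program: list of (op, dest, srcs) tuples; only "lw" is treated as a load here.
    Assumption: a 1-instruction load delay and no attempt at reordering.
    """
    out = []
    for idx, instr in enumerate(program):
        out.append(instr)
        op, dest, _srcs = instr
        if op != "lw" or idx + 1 >= len(program):
            continue
        next_srcs = program[idx + 1][2]
        if dest in next_srcs:
            out.append(NOP)  # nothing safe was scheduled, so pad the slot
    return out

# Hypothetical sequence: the first load is used immediately, the second is not.
program = [
    ("lw", "r1", ("r2",)),
    ("add", "r3", ("r1", "r4")),
    ("lw", "r5", ("r2",)),
    ("add", "r6", ("r3", "r4")),
]
for instr in fill_load_delay_slots(program):
    print(instr)
```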
[Figure: five-stage datapath with stages IF, RD, EX, MEM, WB]
[Figure: five-stage datapath (IF, RD, EX, MEM, WB) with the sequence bgez r1,offset / nop in the pipeline]
[Figure: five-stage datapath (IF, RD, EX, MEM, WB) with the sequence bgez r1,offset / nop / … in the pipeline]
Non-pipelined, multi-cycle functional units
• Integer multiply, divide
Can also have structural hazards on data cache
• Loads access tags/data in MEM
• Stores access tags in MEM, data in WB
• What if a load follows a store?
Structural hazards are detected in decode, and the affected instruction is stalled there
Only way to remove them is to add functional units
• Or pipeline them
• Or dual port them (caches)
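The "load follows a store" question can be worked out by listing the clock on which each instruction touches the data array of the cache. The sketch below assumes one instruction issues per clock into a 5-stage pipeline and that the data array has a single port; only the data array is modeled (not the tags), and all names are illustrative.

```python
# Stage position relative to the clock on which the instruction enters IF.
STAGE_OFFSET = {"IF": 0, "RD": 1, "EX": 2, "MEM": 3, "WB": 4}

# Stage in which each instruction class touches the cache data array,
# per the bullets above: loads in MEM, stores in WB.
DATA_ARRAY_STAGE = {"load": "MEM", "store": "WB"}

def data_array_conflicts(ops):
    """Return [(clock, [instruction indices])] for clocks on which more than one
    instruction needs the data array.

    ops[i] is "load", "store", or anything else (ignored); instruction i
    enters IF on clock i. Assumption: single-ported data array, no stalls yet.
    """
    users = {}
    for issue_clock, op in enumerate(ops):
        stage = DATA_ARRAY_STAGE.get(op)
        if stage is None:
            continue
        clock = issue_clock + STAGE_OFFSET[stage]
        users.setdefault(clock, []).append(issue_clock)
    return [(clock, idxs) for clock, idxs in sorted(users.items()) if len(idxs) > 1]

print(data_array_conflicts(["store", "load", "alu", "load"]))  # [(4, [0, 1])]
```

The store writes data in WB on the same clock that the immediately following load reads data in MEM, so one of them must stall unless the cache is dual ported.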
More complicated (deeper) pipelines
Data hazards revisited
Code scheduling for pipelines
What makes pipelining hard
• Interrupts
• Precise exceptions
• Branches and long pipes