Pipelining in CDA 4150: Advantages, Disadvantages, and Challenges, Study notes of Computer Architecture and Organization

These CDA 4150 notes cover pipelining in a microprocessor. They start with the multiple-cycle processor: its advantages, such as a shorter cycle time and reuse of functional units, and its disadvantages, such as more timing paths to analyze and additional registers. They then introduce pipeline stages, the execution time of each instruction, and the potential throughput increase, and close with the major hurdles of pipelining: data, control, and structural hazards.

Uploaded on 11/08/2009

koofers-user-25e

CDA 4150 – Pipelining

Single-Cycle implementation has poor performance

  • Cycle time longer than necessary for all but slowest instruction

Solution: break the instruction into smaller steps

  • Execute each step in one clock cycle
  • Cycle time: time it takes to execute the longest step
  • Design all the steps to have similar length

Advantages of the multiple cycle processor

  • Cycle time is much shorter
  • Functional units can be used more than once per instruction (less hardware)

Disadvantages of the multiple cycle processor

  • More timing paths to analyze and tune
  • Additional registers to store intermediate data values

Figure: datapath (pc, I$, regs, extend 16 to 32, alu, D$, smd, lmd) divided into five stages: Instruction Fetch (IF), Decode (RD), Execute (EX), Memory (MEM), Writeback (WB)

Time taken for 1 instruction

• Add up the execution times of each phase

• Each phase may take different amounts of time

• One instruction executes at a time

Example

• Pick some execution times out of the air

• t_fetch = 60 ns, t_decode = 30 ns, t_exec = 50 ns, t_mem = 80 ns, t_wb = 20 ns

• Total execution time per instruction = 240ns
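The arithmetic in this example can be checked with a short sketch. The stage names and the helper functions are illustrative; only the five latencies come from the slide.

```python
# Stage latencies from the example above, in nanoseconds.
stage_ns = {"fetch": 60, "decode": 30, "exec": 50, "mem": 80, "wb": 20}

def time_per_instruction(stages):
    """Latency of one instruction when its phases run back to back."""
    return sum(stages.values())

def time_for_program(stages, n_instructions):
    """Total time when only one instruction executes at a time."""
    return n_instructions * time_per_instruction(stages)

print(time_per_instruction(stage_ns))    # 240
print(time_for_program(stage_ns, 100))   # 24000
```

With no overlap, the total time grows linearly at 240 ns per instruction.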

We can execute multiple instructions at the same time!

Each instruction will be in a different phase of execution

Throughput can increase by up to a factor equal to the number of pipeline stages

Overlap different steps for consecutive instructions

  • Steps are called pipeline stages
  • Need latches after each stage to hold control/data for later stages

A new instruction enters the pipeline at IF on each clock

  • Takes 5 clocks to complete execution and leave the pipeline
  • Potential throughput of 1 instruction per clock (CPI = 1)
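As a rough sketch of these timing claims, the code below counts clocks for an ideal pipeline with no stalls. It reuses the stage latencies from the earlier example and assumes the clock period equals the slowest stage, with latch overhead ignored. The function names are made up for illustration.

```python
stage_ns = [60, 30, 50, 80, 20]  # IF, RD, EX, MEM, WB latencies from the earlier example

def pipelined_cycles(n_instructions, depth):
    """Clocks to finish n instructions in an ideal pipeline (no stalls)."""
    return depth + n_instructions - 1

def pipelined_time_ns(n_instructions, stages):
    clock = max(stages)  # assumption: the clock must fit the slowest stage
    return pipelined_cycles(n_instructions, len(stages)) * clock

def unpipelined_time_ns(n_instructions, stages):
    return n_instructions * sum(stages)  # one instruction at a time

n = 1000
speedup = unpipelined_time_ns(n, stage_ns) / pipelined_time_ns(n, stage_ns)
print(pipelined_cycles(n, 5))   # 1004
print(round(speedup, 2))        # 2.99
```

Under these assumptions the speedup approaches 240/80 = 3 rather than 5, because the stages are unbalanced. This is why the steps should be designed to have similar length.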

The major hurdle of pipelining

• Situations where next instruction cannot execute

• Reduce the performance of pipelining

Speedup = pipeline depth / (1 + pipeline stall cycles per instruction)

Want incredibly long pipelines, with no pipeline stalls

Good luck!
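A minimal sketch of the speedup formula, evaluated for a 5-stage pipeline. The stall rates are arbitrary illustrative values, not measurements.

```python
def pipeline_speedup(depth, stalls_per_instruction):
    """Speedup = pipeline depth / (1 + stall cycles per instruction)."""
    return depth / (1.0 + stalls_per_instruction)

for stalls in (0.0, 0.25, 1.0):
    print(stalls, pipeline_speedup(5, stalls))
# 0.0 5.0
# 0.25 4.0
# 1.0 2.5
```

One stall cycle per instruction on average is enough to halve the ideal speedup.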

Long pipes increase likelihood of hazards

• Let’s look at pipeline resources used by instruction class

Pipe stage | ALU | Memory | Branch
IF | Fetch-PC, Inst Cache | Fetch-PC, Inst Cache | Fetch-PC, Inst Cache
RD | Register Read | Register Read | Register Read
EX | ALU | ALU (address) | ALU (dest addr), Compare logic, Fetch-PC (taken)
MEM | N/A | Cache Tags, Cache Data | N/A
WB | PC, Register Write | PC, Register Write (Load) | PC

Three classes of hazards

  • Data hazards
    • One instruction has a source operand that is the result of a previous instruction in the pipeline (Read-After-Write: RAW)
    • There are other types of data hazards (later)
  • Control hazards
    • The execution of an instruction depends on the resolution of a previous branch instruction in the pipeline
    • Becomes a big problem with deep pipelines
  • Structural hazards
    • Two or more instructions in the pipeline require the same hardware resource to progress
    • Most common instance is a non-pipelined functional unit (FU), such as a multiplier
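The RAW case above can be made concrete with a small detector. This is a sketch under stated assumptions: instructions are modeled as (opcode, dest, src1, src2) tuples, the register operands in the sample program are made-up placeholders, and `window=3` assumes registers are read in RD and written in WB with no bypassing and no same-cycle write-then-read.

```python
def find_raw_hazards(program, window=3):
    """Return (producer_index, consumer_index, register) for each RAW hazard.

    window is the number of following instructions that could read a stale
    value before the producer writes back (an assumption, see the lead-in).
    """
    hazards = []
    for i, (_, dest, _, _) in enumerate(program):
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, j_dest, src1, src2 = program[j]
            if dest in (src1, src2):
                hazards.append((i, j, dest))
            if j_dest == dest:
                break  # a later write to the same register supersedes this producer
    return hazards

# Hypothetical three-instruction dependence chain.
program = [
    ("add", "r1", "r2", "r3"),
    ("add", "r4", "r1", "r5"),
    ("add", "r6", "r4", "r7"),
]
print(find_raw_hazards(program))  # [(0, 1, 'r1'), (1, 2, 'r4')]
```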

Figure: five-stage pipeline datapath (IF, RD, EX, MEM, WB) with instructions in flight: add r1,r2,r add r4,r1,r

Figure: five-stage pipeline datapath (IF, RD, EX, MEM, WB) with instructions in flight: add r1,r2,r add r4,r1,r add r6,r4,r

Figure: five-stage pipeline datapath (IF, RD, EX, MEM, WB) with instructions in flight: add r1,r2,r add r4,r1,r add r6,r4,r …

Figure: five-stage pipeline datapath (IF, RD, EX, MEM, WB) with instructions in flight: add r1,r2,r add r4,r1,r add r6,r4,r … Two forwarding paths are labeled: ALU bypass and Mem to ALU bypass
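One plausible reading of the two labeled paths is sketched below. It assumes an ALU result is forwarded from the end of EX to the next instruction (ALU bypass) and from the end of MEM to the instruction after that (Mem to ALU bypass). The distance-3 case assumes the register file can be written and then read in the same clock. None of these details are spelled out in the slides.

```python
def bypass_for(distance):
    """Pick the forwarding path for a RAW dependence on an ALU result.

    distance = consumer position - producer position in program order.
    """
    if distance == 1:
        return "ALU bypass"          # producer has just left EX
    if distance == 2:
        return "Mem to ALU bypass"   # producer has just left MEM
    return "register file"           # assumption: value is readable from regs

for d in (1, 2, 3):
    print(d, bypass_for(d))
```

With both paths in place, a chain of dependent ALU instructions can issue back to back without stalling.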

When a branch instruction is executed, execution of subsequent instructions depends on whether the branch is taken and the location of the destination

A simple but effective approach is to assume the branch is not taken and follow the sequential path

The branch is resolved at the end of EX

  • If taken, cancel instructions in the sequential path and start fetching from the destination on the next clock
    • this results in a 2-clock delay for taken branches
  • If not taken, continue sequentially

Instruction \ Clock cycle | 1 | 2 | 3 | 4 | 5 | 6
I1 | IF | RD | EX | MEM | WB |
beq $t0,$t1,L1 | | IF | RD | EX | MEM | WB
I3 | | | IF | RD | -- | --
I4 | | | | IF | -- | --
L1: I5 | | | | | IF | RD
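The cost of the predict-not-taken scheme can be estimated with a short sketch. The 2-clock penalty comes from the slide. The branch frequency and taken fraction below are arbitrary illustrative inputs, and all other stalls are ignored.

```python
def cpi_with_branches(branch_fraction, taken_fraction, taken_penalty=2):
    """Average CPI when only taken branches cost extra clocks."""
    return 1.0 + branch_fraction * taken_fraction * taken_penalty

# Hypothetical mix: 20% of instructions are branches, half of them taken.
print(cpi_with_branches(0.2, 0.5))  # 1.2
```

A deeper pipeline that resolves branches later would raise `taken_penalty`, which is why control hazards become a big problem with deep pipelines.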

Load Delay

• Explicit 1-instruction delay in MIPS ISA

  • If no instruction can be scheduled following the load, a nop is required: MIPS == “Microprocessor without Interlocked Pipeline Stages”
  • But other implementations may have different load delays!

Branch Delay

• Explicit 1-instruction delay in MIPS, HP-PA, SPARC

  • For MIPS, if no instruction can be scheduled, NOP required
    • Scheduled instruction must be safe to execute whether or not the branch is taken (assembler schedules)

  • For HP-PA/SPARC, the instruction following the branch is conditionally executed or squashed
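The load-delay rule can be sketched as a pass that pads a program with nops. This is a simplification: a real assembler would first try to move an independent instruction into the slot. The (opcode, dest, sources) tuple format and the sample program are hypothetical.

```python
NOP = ("nop", None, ())

def fill_load_delay_slots(program):
    """Insert a nop after any lw whose result is read by the next instruction.

    program is a list of (opcode, dest, sources) tuples.
    """
    out = []
    for i, instr in enumerate(program):
        out.append(instr)
        op, dest, _ = instr
        if op == "lw" and i + 1 < len(program):
            _, _, next_sources = program[i + 1]
            if dest in next_sources:
                out.append(NOP)  # nothing scheduled in the slot, so pad it
    return out

program = [
    ("lw", "r1", ("r2",)),
    ("add", "r3", ("r1", "r4")),   # reads r1 right after the load
    ("sub", "r5", ("r6", "r7")),
]
for instr in fill_load_delay_slots(program):
    print(instr[0])
# lw
# nop
# add
# sub
```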

Figure: five-stage pipeline datapath (IF, RD, EX, MEM, WB) with instructions in flight: bgez r1,offset nop

Figure: five-stage pipeline datapath (IF, RD, EX, MEM, WB) with instructions in flight: bgez r1,offset nop …

Non-pipelined, multi-cycle functional units

• Integer multiply, divide

Can also have structural hazards on data cache

• Loads access tags/data in MEM

• Stores access tags in MEM, data in WB

• What if a load follows a store?

Structural hazards are detected in decode, and the instruction is stalled there

Only way to remove them is to add functional units

• Or pipeline them

• Or dual port them (caches)
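The load-after-store question can be worked through with a small resource-reservation sketch. It assumes one instruction issues per clock, each stage takes one clock (IF=0, RD=1, EX=2, MEM=3, WB=4), and the cache tag and data arrays are single-ported. The table and function names are illustrative.

```python
# Data-cache use per stage offset, taken from the bullets above.
DCACHE_USE = {
    "load":  {3: {"tags", "data"}},        # tags and data in MEM
    "store": {3: {"tags"}, 4: {"data"}},   # tags in MEM, data in WB
}

def dcache_conflicts(program):
    """Return (cycle, resource, [issue cycles]) for each data-cache clash.

    program is a list of instruction kinds issued one per clock.
    """
    usage = {}
    for issue_cycle, kind in enumerate(program):
        for stage, resources in DCACHE_USE.get(kind, {}).items():
            for res in resources:
                usage.setdefault((issue_cycle + stage, res), []).append(issue_cycle)
    return sorted((cyc, res, users)
                  for (cyc, res), users in usage.items() if len(users) > 1)

print(dcache_conflicts(["store", "load"]))  # [(4, 'data', [0, 1])]
print(dcache_conflicts(["load", "store"]))  # []
```

Under these assumptions, a load issued right after a store needs the data array in the same clock as the store's write, so the load stalls unless the cache is dual ported.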

More complicated (deeper) pipelines

Data hazards revisited

Code scheduling for pipelines

What makes pipelining hard

• Interrupts

• Precise exceptions

• Branches and long pipes