Pipelining Basics: Understanding MIPS Instruction Cycles and Performance Improvements, Slides of Computer Aided Design (CAD)

An overview of pipelining basics, including the five cycles in MIPS, pipelining history, performance possibilities, pipeline performance, and hazards. It covers the principles and benefits of pipelining, as well as its limitations and roadblocks. The document also includes examples of control and data hazards and their solutions.

Typology: Slides

2012/2013

Uploaded on 04/24/2013

baijayanthi

Pipelining Basics
Docsity.com


The 5 Cycles in MIPS

  • MIPS steps:

1. Fetch the instruction from RAM

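2. Decode the instruction and read the registers

3. Execute the operation or calculate the effective address

4. Read/write RAM (loads and stores), or write the result to the registers (R-type completion)

5. Write the value read from RAM into the registers (load completion)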

  • Pipelining principle: multiple instructions are overlapped in execution
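To make the overlap concrete, here is a minimal Python sketch (not from the slides) that lays out the five stages (F, D, E, M, W) for a few instructions. It assumes an ideal pipeline: one instruction enters per cycle and no hazards occur. The names `ideal_schedule` and `render` are hypothetical helpers.

```python
STAGES = ["F", "D", "E", "M", "W"]


def ideal_schedule(n_instructions, stages=STAGES):
    """For each instruction, return a dict mapping cycle -> stage name.

    Assumes one instruction enters per cycle and no hazards (no stalls),
    so instruction i (0-based) is in stage s (0-based) during cycle i + s + 1.
    """
    return [
        {i + s + 1: name for s, name in enumerate(stages)}
        for i in range(n_instructions)
    ]


def render(schedule, labels, width=12):
    """Format the schedule as a text pipeline diagram, one row per instruction."""
    last_cycle = max(max(row) for row in schedule)
    cycles = range(1, last_cycle + 1)
    lines = ["Cycle".ljust(width) + "".join(f"{c:<3}" for c in cycles)]
    for label, row in zip(labels, schedule):
        cells = "".join(f"{row.get(c, ''):<3}" for c in cycles)
        lines.append(label.ljust(width) + cells)
    return "\n".join(line.rstrip() for line in lines)


schedule = ideal_schedule(3)
print(render(schedule, ["instr 1", "instr 2", "instr 3"]))
```

The printout is a staircase: each instruction starts one cycle after the previous one, so three instructions finish in 3 + 4 = 7 cycles instead of 3 × 5 = 15.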

Performance Possibilities

  • Consider 1000 instructions to be pipelined
  • Single-cycle machine (non-pipelined)
    • CCT (clock cycle time) = 8 ns, set by the longest datapath
    • CPI = 1, so 8 ns per instruction
    • 8 ns * 1000 = 8000 ns
  • Pipelined multi-cycle machine
    • CCT = 2 ns, set by the longest stage in the datapath
    • 5 stages → 10 ns per instruction
    • 8 ns (4 cycles × 2 ns to “fill” the pipeline) + 2 ns * 1000 = 2008 ns
  • Speedup = 8000 / 2008 = 3.98 ≈ 4
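The arithmetic above can be checked with a short Python sketch. The helper names are hypothetical, and the model assumes an ideal pipeline with no stalls, where (stages - 1) cycles are spent filling the pipeline before one instruction completes per cycle.

```python
def single_cycle_time_ns(n, cct_ns):
    # CPI = 1: every instruction takes one (long) clock cycle.
    return n * cct_ns


def pipelined_time_ns(n, stages, cct_ns):
    # (stages - 1) cycles to fill the pipeline, then one instruction
    # completes per cycle. Assumes no stalls.
    return (stages - 1) * cct_ns + n * cct_ns


t_single = single_cycle_time_ns(1000, 8)   # 8000 ns
t_pipe = pipelined_time_ns(1000, 5, 2)     # 8 + 2000 = 2008 ns
print(t_single, t_pipe, round(t_single / t_pipe, 2))  # 8000 2008 3.98
```

As the instruction count grows, the speedup approaches 8 ns / 2 ns = 4 rather than 5: the clock is set by the longest stage, and five 2 ns stages (10 ns) take longer than the 8 ns single-cycle path.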

Pipeline Performance

  • A single instruction takes more (or the same amount of) time
  • A group / sequence of instructions takes less time
  • Pipelining increases throughput rather than decreasing execution time for an individual instruction
  • Design principle:
    • Good designs demand good compromises
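The latency/throughput trade-off can be sketched in Python using the numbers from the Performance Possibilities slide (an 8 ns single-cycle machine versus five 2 ns pipeline stages). The helper names are hypothetical, and the throughput figure is the steady-state rate, ignoring fill time and stalls.

```python
def latency_ns(stages, cct_ns):
    # Time for one instruction to travel from fetch to write-back.
    return stages * cct_ns


def throughput_per_ns(cct_ns):
    # Steady state: one instruction completes every clock cycle.
    return 1 / cct_ns


single = {"latency": latency_ns(1, 8), "throughput": throughput_per_ns(8)}
pipe = {"latency": latency_ns(5, 2), "throughput": throughput_per_ns(2)}
print(single)  # {'latency': 8, 'throughput': 0.125}
print(pipe)    # {'latency': 10, 'throughput': 0.5}
```

One instruction gets slower (8 ns to 10 ns), while the completion rate quadruples, which is the compromise the design principle refers to.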

Hazards

Limits to Pipelined Performance

Roadblocks to Pipelining

  • Structural hazards
    • Multiple instructions vying for a single shared resource
    • Ex: RAM, ALU
    • Instruction!
  • Data hazards
    • Later instruction uses the result of an earlier instruction
    • Ex: lw followed by an add that uses the loaded data
  • Control hazards
    • Fetch of a later instruction relies on the result of an earlier instruction to determine the correct control path
    • Ex: conditional branches that are taken

More Structural Hazards

  • Which “instruction” is coming from the I-MEM in any given cycle?
    • Need to replicate it!
  • Structural hazards can (usually) be removed by adding duplicate hardware
  • How do I read and write to the register file at the same time?!?
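One common answer to the register-file question, assumed here rather than spelled out on this slide, is to write the register file in the first half of a cycle and read it in the second half. The toy Python model below (the class name and the 8-register size are hypothetical) shows the effect: a value written in a cycle is visible to a read in that same cycle.

```python
class RegisterFile:
    """Toy register file: writes happen in the first half of a cycle and
    reads in the second half (an assumed convention), so a value written
    in cycle t is already visible to a read in the same cycle t."""

    def __init__(self, n_regs=8):
        self.regs = [0] * n_regs

    def cycle(self, write=None, reads=()):
        # First half: apply the write coming from the W stage, if any.
        if write is not None:
            reg, value = write
            self.regs[reg] = value
        # Second half: serve the reads requested by the D stage.
        return [self.regs[r] for r in reads]


rf = RegisterFile()
# add 1 2 3 writes reg 1 in the same cycle that nand 5 1 4 reads regs 1 and 4.
print(rf.cycle(write=(1, 42), reads=(1, 4)))  # [42, 0]
```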

Control & Data Hazards

Solutions

Hazard # 2 - Data Hazards

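  • Without forwarding, nand cannot read reg 1 until add has written it
  • Since the register file can be written and read in the same cycle, nand's D stage can share a cycle with add's W stage; nand must still stall 2 cycles before it can proceed

Cycle             1  2  3  4  5  6  7  8
add 1 2 3         F  D  E  M  W
nand 5 1 4           F  -  -  D  E  M  W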
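The stall count in the diagram above can be generalized with a small Python sketch. It assumes the 5-stage F D E M W pipeline, no forwarding, and a register file that can be written and read in the same cycle; `distance` (a hypothetical parameter) is how many instructions after the producer the consumer sits, so 1 means adjacent.

```python
def stalls_without_forwarding(distance):
    """Stall cycles a consumer needs when it reads a register written by an
    earlier instruction, with no forwarding.

    Assumes the consumer's D stage may share a cycle with the producer's
    W stage (register file written and read in the same cycle).
    """
    producer_w = 5                 # producer fetched in cycle 1, W in cycle 5
    consumer_d = 1 + distance + 1  # consumer fetched in cycle 1 + distance
    return max(0, producer_w - consumer_d)


for d in (1, 2, 3):
    print(d, stalls_without_forwarding(d))
# 1 2
# 2 1
# 3 0
```

The adjacent case reproduces the 2-cycle stall for add followed by nand; three or more instructions apart, no stall is needed.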

More Forwarding / Bypassing

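  • add's result is forwarded (bypassed) from the pipeline latches to sw, so sw proceeds with no stall

Cycle             1  2  3  4  5  6
add 1 2 3         F  D  E  M  W
sw 1 6 100           F  D  E  M  W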
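The choice a forwarding unit makes can be sketched in Python as below. This is an illustrative model, not the slides' hardware: the latch names "E/M" and "M/W" (the pipeline registers after the E and M stages) and the function name are assumptions. The newest value must win, so the E/M latch is checked before the M/W latch.

```python
def forward_source(src_reg, ex_mem_dest, mem_wb_dest):
    """Pick where the E stage should get the value of src_reg.

    ex_mem_dest / mem_wb_dest: destination register of the instruction
    currently in the E/M or M/W latch (None if it writes no register).
    """
    # Newest result first: the instruction that just left E.
    if ex_mem_dest is not None and ex_mem_dest == src_reg:
        return "E/M latch"
    # Next newest: the instruction that just left M.
    if mem_wb_dest is not None and mem_wb_dest == src_reg:
        return "M/W latch"
    # No in-flight producer: the register file already holds the value.
    return "register file"


# add 1 2 3 then sw 1 6 100: when sw is in E, add's result sits in E/M.
print(forward_source(1, ex_mem_dest=1, mem_wb_dest=None))  # E/M latch
```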

Data Hazards: Load Stalls

  • Cannot forward “back in time” – must permit a “load stall” to wait on the result of the load
  • Forwarding can’t solve everything (unfortunately)

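Without a stall (impossible: lw's data is not available until the end of its M stage):

Cycle             1  2  3  4  5  6  7
lw 1 6 100        F  D  E  M  W
add 2 1 3            F  D  E  M  W

With a one-cycle load stall:

Cycle             1  2  3  4  5  6  7
lw 1 6 100        F  D  E  M  W
add 2 1 3            F  D  -  E  M  W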
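A minimal Python sketch of load-use stall detection, assuming full forwarding and the dest-first operand order the slides' examples appear to use (lw 1 6 100 writes reg 1; add 2 1 3 reads regs 1 and 3). The tuple encoding and function name are hypothetical, and the model assumes every consumer needs its operands in the E stage.

```python
def count_load_use_stalls(program):
    """Count 1-cycle load stalls in a straight-line program.

    program: list of (op, dest, sources) tuples, e.g. ("lw", 1, (6,)) for
    lw 1 6 100 and ("add", 2, (1, 3)) for add 2 1 3. With forwarding, only
    an lw immediately followed by a reader of its destination must stall,
    because the loaded value cannot be forwarded "back in time".
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in cur[2]:
            stalls += 1
    return stalls


program = [("lw", 1, (6,)), ("add", 2, (1, 3))]
print(count_load_use_stalls(program))  # 1
```

Placing an independent instruction between the lw and its consumer removes the stall, which is why compilers try to schedule code that way.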

Hazard # 3 - Control Hazards

  • The lw instruction should only complete if the branch is not taken!

Cycle             1  2  3  4  5  6  7
add 4 5 6         F  D  E  M  W
beq 1 2 loop         F  D  E  M  W
lw 3 0 300              F  D  E  M  W

Control Hazards (2)

  • Stalls are “bubbles” in the pipeline – no useful work is accomplished in a stall
  • The multi-cycle machine “resolves” branches in the E stage
    • Branch resolution could be completed in the D stage if we pass rA and rB through a special “subtractor” and bypass the A and B regs
  • Resolving branches in the D stage requires only a single cycle of stalling in the pipeline (vs 2 if we stick to branch resolution in E)

Cycle             1  2  3  4  5  6  7  8  9
add 4 5 6         F  D  E  M  W
beq 1 2 loop         F  D  E  M  W
lw 3 0 300              F  D
add 2 5 6                  F
next instruction              F  D  E  M  W