CSE 311 Computer Organization: Enhancing Performance with Pipelining, Slides of Computer Architecture and Organization

Computer Organization MIPS pipelining

Typology: Slides

2020/2021

Uploaded on 04/14/2021

CSE 311
Computer Organization
Mostafa I. Soliman
Professor of Computer Engineering
CSE Department
Lecture 10
Enhancing Performance
with Pipelining

Enhancing Performance with Pipelining

• Introduction to Pipelining
• Pipelined vs. Single-Cycle Instruction Execution
• Pipelining MIPS
• What Makes Pipelining Hard?
  - Structural Hazards
  - Control Hazards
  - Data Hazards
• Put All Together: Pipelined Datapath

You can often find in rivers what you cannot find in oceans. (Indian proverb)

Pipelined vs. Single-Cycle Instruction Execution: the Plan

[Figure: execution timelines for lw $1, 100($0); lw $2, 200($0); lw $3, 300($0), in program execution order against time. Each instruction passes through Instruction fetch, Reg, ALU, Data access, Reg. Single-cycle: a new instruction starts every 8 ns. Pipelined: a new instruction starts every 2 ns.]

Assume 2 ns for a memory access or an ALU operation and 1 ns for a register access. The single-cycle clock is then 8 ns (2 + 1 + 2 + 2 + 1), while the pipelined clock cycle is 2 ns, the length of the slowest stage.
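The timing argument above can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the stage names IF/ID/EX/MEM/WB are used as labels for the five boxes in the figure, the function names are illustrative, and the model assumes no hazards or stalls.

```python
# Stage latencies from the slide: memory access and ALU operation take 2 ns,
# register access takes 1 ns. The stage labels are an assumption for this sketch.
STAGE_NS = {"IF": 2, "ID": 1, "EX": 2, "MEM": 2, "WB": 1}


def single_cycle_time(n_instructions, stages=STAGE_NS):
    """Every instruction takes one long cycle equal to the sum of all stage latencies."""
    return n_instructions * sum(stages.values())


def pipelined_time(n_instructions, stages=STAGE_NS):
    """The clock is set by the slowest stage. The first instruction needs one
    cycle per stage; each later instruction finishes one cycle after the previous one."""
    clock = max(stages.values())
    return (len(stages) + n_instructions - 1) * clock


if __name__ == "__main__":
    print(single_cycle_time(3))  # three lw instructions, single-cycle: 24 ns
    print(pipelined_time(3))     # the same three lw instructions, pipelined: 14 ns
```

For the three lw instructions this gives 24 ns against 14 ns. The gap widens as more instructions are executed, because the pipeline fill time is paid only once.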

Pipelining: Keep in Mind

  • Pipelining does not reduce the latency of a single task; it increases the throughput of the entire workload
  • The pipeline rate is limited by the longest stage
  • Potential speedup = number of pipe stages
  • Unbalanced pipe stage lengths reduce speedup
  • The time to fill the pipeline and the time to drain it, when there is slack in the pipeline, reduce speedup
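The two effects that reduce speedup (unbalanced stages and fill/drain time) can be made concrete with a small calculation. This is a sketch under simplifying assumptions: no hazards, one instruction issued per cycle, and a single-cycle baseline whose clock equals the sum of the stage latencies. The function name `speedup` is illustrative.

```python
def speedup(n_instructions, stage_ns):
    """Speedup of a pipeline over a single-cycle design for n instructions.

    stage_ns is a list of stage latencies in ns. The pipelined clock equals
    the slowest stage, and filling the pipeline costs len(stage_ns) - 1 extra cycles.
    """
    single_cycle = n_instructions * sum(stage_ns)
    clock = max(stage_ns)
    pipelined = (len(stage_ns) + n_instructions - 1) * clock
    return single_cycle / pipelined


if __name__ == "__main__":
    balanced = [2, 2, 2, 2, 2]      # hypothetical perfectly balanced stages
    unbalanced = [2, 1, 2, 2, 1]    # stage latencies assumed in this lecture
    for n in (3, 100, 1_000_000):
        print(n, speedup(n, balanced), speedup(n, unbalanced))
```

With balanced stages the speedup approaches 5, the number of stages, as the instruction count grows. With the 2/1/2/2/1 ns stages it approaches only 8 / 2 = 4, and for short instruction sequences the fill time keeps it well below either limit.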

Pipelining MIPS

  • What makes it hard?
    • structural hazards: different instructions, at different stages in the pipeline, want to use the same hardware resource
    • control hazards: the next instruction to put into the pipeline depends on the outcome of a previous branch instruction that is already in the pipeline
    • data hazards: an instruction in the pipeline requires data that is still being computed by a previous instruction in the pipeline
  • Before actually building the pipelined datapath and control, we first briefly examine these potential hazards individually…

Structural Hazards

Structural hazard: inadequate hardware to simultaneously support all instructions in the pipeline in the same clock cycle

  • E.g., suppose the pipeline below has a single (not separate) instruction and data memory with one read port
  • Then there is a structural hazard between the first and fourth lw instructions: the first reads its data in the same cycle in which the fourth is fetched

MIPS was designed to be pipelined: structural hazards are easy to avoid!

[Figure: pipelined execution of lw $1, 100($0); lw $2, 200($0); lw $3, 300($0); lw $4, 400($0), each starting 2 ns after the previous one and passing through Instruction fetch, Reg, ALU, Data access, Reg. Annotation: hazard if single memory.]
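The conflict between the first and fourth lw can be found mechanically by listing which instruction needs the memory in each cycle. The sketch below is illustrative only: it assumes a five-stage pipeline with stage labels IF/ID/EX/MEM/WB, one instruction entering per cycle, no stalls, and that only lw and sw touch memory in the MEM stage. The function name `memory_conflicts` is hypothetical.

```python
from collections import defaultdict

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def memory_conflicts(ops):
    """Return {cycle: [instruction indices]} for every cycle in which more than
    one instruction needs a single shared memory.

    Instruction i (0-based) is in IF during cycle i + 1 and advances one stage
    per cycle. Every instruction uses memory in IF; only lw/sw use it in MEM.
    """
    users = defaultdict(list)
    for i, op in enumerate(ops):
        for offset, stage in enumerate(STAGES):
            uses_memory = stage == "IF" or (stage == "MEM" and op in ("lw", "sw"))
            if uses_memory:
                users[i + 1 + offset].append(i)
    return {cycle: idx for cycle, idx in users.items() if len(idx) > 1}


if __name__ == "__main__":
    print(memory_conflicts(["lw", "lw", "lw", "lw"]))
```

For four consecutive lw instructions the only conflict is in cycle 4, between instruction 0 (in its data access stage) and instruction 3 (being fetched). With separate instruction and data memories the two accesses go to different units and the conflict disappears.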

Control Hazards

  • Solution 2: Predict the branch outcome, e.g., predict branch-not-taken:

[Figure: two pipelined timelines (time axis 2 to 14 ns, program execution order in instructions).
Prediction success: beq $1, $2, 40; add $4, $5, $6; lw $3, 300($0) enter the pipeline 2 ns apart with no bubbles.
Prediction failure: beq $1, $2, 40; add $4, $5, $6; or $7, $8, $9, with bubbles and a 4 ns gap where the lw is undone (= flushed).]
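A rough way to quantify the cost of mispredictions is to add the expected flush penalty to the ideal one-cycle-per-instruction rate. The sketch below is not from the slides. The one-cycle penalty is an assumption read off the single bubble in the figure, and the branch fraction and misprediction rate in the example are made-up numbers chosen only for illustration.

```python
def effective_cpi(branch_fraction, mispredict_rate, penalty_cycles=1, base_cpi=1.0):
    """Average cycles per instruction when each mispredicted branch costs
    penalty_cycles bubbles and everything else completes at base_cpi.

    All parameters are assumptions of this sketch, not measured values.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Hypothetical workload: 20% branches, half of them taken (mispredicted
    # under predict-not-taken).
    print(effective_cpi(0.2, 0.5))
    # Stalling on every branch behaves like a misprediction rate of 1.0.
    print(effective_cpi(0.2, 1.0))
```

Under these assumed numbers, predicting not-taken halves the branch overhead compared with stalling on every branch, because the bubble is paid only when the prediction fails.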

Control Hazards

  • Solution 3: Delayed branch: always execute the sequentially next statement, with the branch taking effect after a one-instruction delay. It is the compiler's job to find a statement that is independent of the branch outcome to put in the slot
  • MIPS does this, but it is an option in SPIM (Simulator -> Settings)

[Figure: pipelined execution of beq $1, $2, 40; add $4, $5, $6 (delayed branch slot); lw $3, 300($0), each starting 2 ns after the previous one.]

Delayed branch: beq is followed by an add that is independent of the branch outcome
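The compiler's check for a delay-slot candidate can be sketched as a simple register test. This is a deliberately simplified illustration, not a real compiler pass: it only considers moving an instruction from before the branch into the slot, and it represents an instruction as a hypothetical `(op, dest, sources)` tuple.

```python
def can_fill_delay_slot(candidate, branch):
    """Return True if an instruction that precedes the branch can be moved into
    the delay slot, i.e. it does not write a register the branch compares.

    Both arguments are (op, dest, sources) tuples; dest is None when the
    instruction writes no register. This tuple format is an assumption of
    the sketch.
    """
    _, dest, _ = candidate
    _, _, branch_sources = branch
    return dest is None or dest not in branch_sources


if __name__ == "__main__":
    beq = ("beq", None, ("$1", "$2"))
    add = ("add", "$4", ("$5", "$6"))          # the add from the slide
    add_to_1 = ("add", "$1", ("$5", "$6"))     # hypothetical: writes a compared register
    print(can_fill_delay_slot(add, beq))
    print(can_fill_delay_slot(add_to_1, beq))
```

The add from the slide writes $4, which beq does not read, so it can occupy the slot. An add that wrote $1 could not, because moving it after the branch would change the comparison.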

Data Hazards

  • Forwarding may not be enough, e.g., if an R-type instruction that follows a load uses the result of the load. This is called a load-use data hazard

[Figure: pipeline stages (IF, ID, EX, MEM, WB) over time for lw $s0, 20($t1) followed by sub $t2, $s0, $t3, shown without and with a bubble.]

Without a stall it is impossible to provide input to the sub instruction in time.

With a one-stage stall, forwarding can get the data to the sub instruction in time.
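The need for exactly one stall follows from comparing the cycle in which the value becomes available with the cycle in which the consumer needs it. The sketch below assumes the usual five-stage timing (the producer is in IF during cycle 1), forwarding into the EX stage, a loaded value that exists only at the end of MEM, and an R-type result that exists at the end of EX. The function name is illustrative.

```python
STAGE_CYCLE = {"IF": 1, "ID": 2, "EX": 3, "MEM": 4, "WB": 5}


def stalls_with_forwarding(producer_op, distance):
    """Stall cycles needed when a consumer `distance` instructions after the
    producer reads the producer's result in its EX stage.

    Assumes forwarding hardware: a load's value is ready at the end of MEM,
    any other instruction's result at the end of EX.
    """
    ready_stage = "MEM" if producer_op == "lw" else "EX"
    ready_at_end_of = STAGE_CYCLE[ready_stage]
    consumer_ex_cycle = distance + STAGE_CYCLE["EX"]
    return max(0, ready_at_end_of + 1 - consumer_ex_cycle)


if __name__ == "__main__":
    print(stalls_with_forwarding("lw", 1))   # sub directly after lw
    print(stalls_with_forwarding("add", 1))  # R-type result forwarded, no stall
    print(stalls_with_forwarding("lw", 2))   # one unrelated instruction in between
```

For sub directly after lw the loaded value is ready at the end of cycle 4, but sub would enter EX in cycle 4, so one bubble is required. An instruction placed between the two removes the stall.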

Reordering Code to Avoid Pipeline Stall (Software Solution)

  • Example:

lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)    # data hazard: $t2 is stored right after it is loaded
sw $t0, 4($t1)

  • Reordered code (the two sw instructions are interchanged):

lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)
sw $t2, 0($t1)
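The effect of the reordering can be checked by counting load-use pairs in each sequence. This is a minimal sketch under a simple model: an instruction that reads a register loaded by the lw immediately before it costs one stall, and nothing else stalls. The helper names `parse` and `load_use_stalls` are hypothetical, and the parser only handles the instruction forms used in this lecture.

```python
import re

# Instructions whose registers are all read (none is a destination).
READS_ALL_REGISTERS = {"sw", "beq", "bne"}


def parse(line):
    """Split 'lw $t0, 0($t1)' into ('lw', ['$t0', '$t1'])."""
    op, rest = line.split(None, 1)
    return op, re.findall(r"\$\w+", rest)


def load_use_stalls(program):
    """Count instructions that read a register loaded by the lw just before them."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_regs = parse(previous)
        cur_op, cur_regs = parse(current)
        if prev_op != "lw":
            continue
        loaded = prev_regs[0]
        # The first register is a destination unless the instruction only reads.
        reads = cur_regs if cur_op in READS_ALL_REGISTERS else cur_regs[1:]
        if loaded in reads:
            stalls += 1
    return stalls


if __name__ == "__main__":
    original = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t2, 0($t1)", "sw $t0, 4($t1)"]
    reordered = ["lw $t0, 0($t1)", "lw $t2, 4($t1)", "sw $t0, 4($t1)", "sw $t2, 0($t1)"]
    print(load_use_stalls(original), load_use_stalls(reordered))
```

The original order has one load-use pair (lw $t2 followed by sw $t2) and the reordered code has none. The swap is safe here because the two stores write different addresses.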

Review - Single-Cycle Datapath “Steps”

[Figure: single-cycle datapath (PC, Instruction Memory, Register File, sign-extend unit, ALU with Zero output, Data Memory, adders, and multiplexers) divided into five steps:]

  • IF: Instruction Fetch
  • ID: Instruction Decode
  • EX: Execute / Address Calc.
  • MEM: Memory Access
  • WB: Write Back

Pipelined Datapath – Key Idea

  • What happens if we break the execution into multiple cycles, but keep the extra hardware?
  • Answer: We may be able to start executing a new instruction at each clock cycle: pipelining
  • …but we shall need extra registers to hold data between cycles
    • pipeline registers

Pipelined Datapath

[Figure: the single-cycle datapath divided by four pipeline registers: IF/ID (64 bits), ID/EX (128 bits), EX/MEM (97 bits), MEM/WB (64 bits).]

  • Pipeline registers are wide enough to hold the data coming in
  • Only data flowing right to left may cause a hazard… why?
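The pipeline register widths of 64, 128, 97, and 64 bits can be reproduced by adding up the values each register has to carry. The field breakdown below is an assumption based on the usual textbook version of this datapath, before control signals and the write register number are added. It is not spelled out on the slide.

```python
# Assumed contents of each pipeline register (data values only, in bits).
PIPELINE_REGISTER_FIELDS = {
    "IF/ID": {"PC + 4": 32, "instruction": 32},
    "ID/EX": {
        "PC + 4": 32,
        "read data 1": 32,
        "read data 2": 32,
        "sign-extended immediate": 32,
    },
    "EX/MEM": {
        "branch target": 32,
        "Zero flag": 1,
        "ALU result": 32,
        "write data": 32,
    },
    "MEM/WB": {"memory read data": 32, "ALU result": 32},
}


def register_width(name):
    """Total width in bits of the named pipeline register."""
    return sum(PIPELINE_REGISTER_FIELDS[name].values())


if __name__ == "__main__":
    for name in PIPELINE_REGISTER_FIELDS:
        print(name, register_width(name))
```

Under this accounting the only odd width, 97 bits for EX/MEM, comes from the single-bit Zero output of the ALU that travels with the branch target.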

Bug in the Datapath

[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB.]

Write register number comes from another, later instruction!