Computer Organization - PIPE-Hardware - Slides | CSCE 230, Study notes of Computer Architecture and Organization

Material Type: Notes; Class: Computer Organization; Subject: Computer Science and Engineering ; University: University of Nebraska - Lincoln; Term: Unknown 1989;


Uploaded on 08/30/2009


Processor Architecture V: Making the Pipelined Implementation Work

CSCE 230J

Computer Organization

Dr. Steve Goddard


http://cse.unl.edu/~goddard/Courses/CSCE230J

2

Giving credit where credit is due

• Most of the slides for this lecture are based on slides created by Dr. Bryant, Carnegie Mellon University.
• I have modified them and added new slides.

3

Overview

Make the pipelined processor work!

Data Hazards
• An instruction having register R as a source follows shortly after an instruction having register R as its destination
• Common condition, so we don't want to slow down the pipeline

Control Hazards
• Mispredicted conditional branch
  - Our design predicts all branches as being taken
  - Naïve pipeline executes two extra instructions
• Getting the return address for a ret instruction
  - Naïve pipeline executes three extra instructions

Making Sure It Really Works
• What if multiple special cases happen simultaneously?

4

Pipeline Stages

Fetch
• Select current PC
• Read instruction
• Compute incremented PC

Decode
• Read program registers

Execute
• Operate ALU

Memory
• Read or write data memory

Write Back
• Update register file

[Figure: PIPE processor organization. Fetch (instruction memory, PC increment, predPC), Decode (register file), Execute (ALU, CC), Memory (data memory), and Write back stages, separated by the F, D, E, M, and W pipeline registers.]
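The stage sequence above can be illustrated with a small Python sketch, which is not part of the original slides. It prints the ideal pipeline diagrams used throughout this lecture: instruction i is fetched in cycle i + 1 and then advances one stage per cycle. The sketch assumes there are no hazards, stalls, or bubbles. The function names `pipeline_diagram` and `render` are hypothetical.

```python
STAGES = ["F", "D", "E", "M", "W"]  # Fetch, Decode, Execute, Memory, Write back


def pipeline_diagram(instructions):
    """Return [(instruction, {cycle: stage})] for an ideal, hazard-free pipeline."""
    rows = []
    for i, instr in enumerate(instructions):
        # Instruction i enters Fetch in cycle i + 1 and advances one stage per cycle.
        rows.append((instr, {i + 1 + k: stage for k, stage in enumerate(STAGES)}))
    return rows


def render(rows):
    """Format rows as a text timing diagram with one 3-character column per cycle."""
    last = max(max(cycles) for _, cycles in rows)
    width = max(len(name) for name, _ in rows) + 2
    header = " " * width + "".join(f"{c:<3}" for c in range(1, last + 1))
    lines = [header.rstrip()]
    for name, cycles in rows:
        cells = "".join(f"{cycles.get(c, ''):<3}" for c in range(1, last + 1))
        lines.append(f"{name:<{width}}{cells}".rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    demo_h2 = ["irmovl $10,%edx", "irmovl $3,%eax", "nop", "nop",
               "addl %edx,%eax", "halt"]
    print(render(pipeline_diagram(demo_h2)))
```

Running it on the demo-h2.ys sequence reproduces the timing of the two-nop example. It models timing only, so it says nothing about whether addl reads the right register values.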

5

PIPE-Hardware

• Pipeline registers hold intermediate values from instruction execution

Forward (Upward) Paths
• Values passed from one stage to next
• Cannot jump past stages
  - e.g., valC passes through decode

[Figure: PIPE hardware. Pipeline register fields: F: predPC; D: icode, ifun, rA, rB, valC, valP; E: icode, ifun, valC, valA, valB, dstE, dstM, srcA, srcB; M: icode, Bch, valE, valA, dstE, dstM; W: icode, valE, valM, dstE, dstM. Other labeled signals: f_PC, d_srcA, d_srcB, d_rvalA, e_Bch, M_Bch, M_valA, W_valE, W_valM.]
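One way to picture the pipeline registers is as plain records. The Python sketch below is an illustrative assumption, not the course's implementation:

- Each register gets the fields shown in the diagram, keeping the slide's field names (hence camelCase).
- The string "none" stands in for the "no register" ID.
- A hypothetical helper, `decode_to_execute`, shows how valC is copied through decode into E, because values cannot skip a stage.

```python
from dataclasses import dataclass

RNONE = "none"  # hypothetical stand-in for the "no register operand" ID


@dataclass
class FReg:
    predPC: int = 0


@dataclass
class DReg:
    icode: str = "nop"
    ifun: int = 0
    rA: str = RNONE
    rB: str = RNONE
    valC: int = 0
    valP: int = 0


@dataclass
class EReg:
    icode: str = "nop"
    ifun: int = 0
    valC: int = 0
    valA: int = 0
    valB: int = 0
    dstE: str = RNONE
    dstM: str = RNONE
    srcA: str = RNONE
    srcB: str = RNONE


@dataclass
class MReg:
    icode: str = "nop"
    Bch: bool = False
    valE: int = 0
    valA: int = 0
    dstE: str = RNONE
    dstM: str = RNONE


@dataclass
class WReg:
    icode: str = "nop"
    valE: int = 0
    valM: int = 0
    dstE: str = RNONE
    dstM: str = RNONE


def decode_to_execute(d, valA, valB, srcA, srcB, dstE, dstM):
    """Build the next E register: decode supplies register values and IDs,
    while icode, ifun, and valC are copied forward unchanged from D."""
    return EReg(icode=d.icode, ifun=d.ifun, valC=d.valC, valA=valA, valB=valB,
                dstE=dstE, dstM=dstM, srcA=srcA, srcB=srcB)
```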

6

Data Dependencies: 2 Nops

# demo-h2.ys
                        1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: nop                    F  D  E  M  W
0x00d: nop                       F  D  E  M  W
0x00e: addl %edx,%eax               F  D  E  M  W
0x010: halt                            F  D  E  M  W

Cycle 6
W: R[%eax] ← 3
D: valA ← R[%edx] = 10
   valB ← R[%eax] = 0
Error

7

Data Dependencies: No Nop

# demo-h0.ys
                        1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: addl %edx,%eax         F  D  E  M  W
0x00e: halt                      F  D  E  M  W

Cycle 4
M: M_valE = 10, M_dstE = %edx
E: e_valE ← 0 + 3 = 3, E_dstE = %eax
D: valA ← R[%edx] = 0
   valB ← R[%eax] = 0
Error
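The two examples suggest a simple rule for the pipeline without forwarding. The Python sketch below rests on the same assumptions these slides use:

- One instruction is fetched per cycle.
- The register file is written at the end of the writer's write-back cycle.
- The register file is read during the reader's decode cycle.

Here `distance` is how many instructions after the writer the reader comes. All function names are hypothetical.

```python
def decode_cycle(fetch_cycle):
    """Decode happens one cycle after fetch."""
    return fetch_cycle + 1


def writeback_cycle(fetch_cycle):
    """Write-back happens four cycles after fetch (F, D, E, M, W)."""
    return fetch_cycle + 4


def reads_stale_value(distance):
    """True if an instruction `distance` slots after a register writer decodes too early.

    Without forwarding, the reader's decode must fall strictly after the
    writer's write-back cycle, because the write happens at the end of that cycle.
    """
    writer_fetch = 1
    reader_fetch = writer_fetch + distance
    return decode_cycle(reader_fetch) <= writeback_cycle(writer_fetch)


def nops_needed(distance):
    """Smallest number of extra nops that lets the reader see the written value."""
    extra = 0
    while reads_stale_value(distance + extra):
        extra += 1
    return extra
```

With no nops (distance 1), the read is stale. With two nops (distance 3), it is still stale, which matches the cycle 6 error. `nops_needed` returns 3 and 1 for these cases. These are the same numbers of bubbles the stalling pipeline inserts for demo-h0.ys and demo-h2.ys.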

8

Stalling for Data Dependencies

• If an instruction follows too closely after one that writes a register it reads, slow it down
• Hold instruction in decode
• Dynamically inject nop into execute stage

# demo-h2.ys
                        1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: nop                    F  D  E  M  W
0x00d: nop                       F  D  E  M  W
       bubble                             E  M  W
0x00e: addl %edx,%eax               F  D  D  E  M  W
0x010: halt                            F  F  D  E  M  W

9

Stall Condition

Source Registers
• srcA and srcB of current instruction in decode stage

Destination Registers
• dstE and dstM fields
• Instructions in execute, memory, and write-back stages

Special Case
• Don't stall for the special register ID that indicates absence of a register operand

[Figure: PIPE hardware diagram, showing srcA and srcB in decode and the dstE and dstM fields of the E, M, and W pipeline registers.]
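The stall condition can be written as a predicate. This Python sketch is hypothetical: the function name is invented, and the string "none" stands in for the no-register ID. Like the slide, it only compares register IDs. The bubble injection that follows is a separate step.

```python
RNONE = "none"  # hypothetical stand-in for the "no register operand" ID


def must_stall(srcA, srcB, *, E_dstE=RNONE, E_dstM=RNONE, M_dstE=RNONE,
               M_dstM=RNONE, W_dstE=RNONE, W_dstM=RNONE):
    """Stall decode if srcA or srcB will still be written by the instruction in E, M, or W."""
    # Drop RNONE from the pending destinations so an absent operand never matches.
    pending = {E_dstE, E_dstM, M_dstE, M_dstM, W_dstE, W_dstM} - {RNONE}
    return any(src in pending for src in (srcA, srcB))
```

For cycle 6 of demo-h2.ys, `must_stall("%edx", "%eax", W_dstE="%eax")` is true, so addl is held in decode.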

10

Detecting Stall Condition

# demo-h2.ys
                        1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
0x00c: nop                    F  D  E  M  W
0x00d: nop                       F  D  E  M  W
       bubble                             E  M  W
0x00e: addl %edx,%eax               F  D  D  E  M  W
0x010: halt                            F  F  D  E  M  W

Cycle 6
W: W_dstE = %eax, W_valE = 3
D: srcA = %edx, srcB = %eax

11

Stalling X

# demo-h0.ys
                        1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx  F  D  E  M  W
0x006: irmovl $3,%eax      F  D  E  M  W
       bubble                       E  M  W
       bubble                          E  M  W
       bubble                             E  M  W
0x00c: addl %edx,%eax         F  D  D  D  D  E  M  W
0x00e: halt                      F  F  F  F  D  E  M  W

Cycle 4: E: E_dstE = %eax; D: srcA = %edx, srcB = %eax
Cycle 5: M: M_dstE = %eax; D: srcA = %edx, srcB = %eax
Cycle 6: W: W_dstE = %eax; D: srcA = %edx, srcB = %eax

12

What Happens When Stalling?

• Stalling instruction held back in decode stage
• Following instruction stays in fetch stage
• Bubbles injected into execute stage
  - Like dynamically generated nops
  - Move through later stages

# demo-h0.ys
Cycle 4: Memory 0x000: irmovl $10,%edx | Execute 0x006: irmovl $3,%eax | Decode 0x00c: addl %edx,%eax | Fetch 0x00e: halt
Cycle 5: Write Back 0x000: irmovl $10,%edx | Memory 0x006: irmovl $3,%eax | Execute bubble | Decode 0x00c: addl %edx,%eax | Fetch 0x00e: halt
Cycle 6: Write Back 0x006: irmovl $3,%eax | Memory bubble | Execute bubble | Decode 0x00c: addl %edx,%eax | Fetch 0x00e: halt
Cycle 7: Write Back bubble | Memory bubble | Execute bubble | Decode 0x00c: addl %edx,%eax | Fetch 0x00e: halt
Cycle 8: Write Back bubble | Memory bubble | Execute 0x00c: addl %edx,%eax | Decode 0x00e: halt
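To see the stall mechanics cycle by cycle, here is a small Python simulation. It is a sketch under stated assumptions, not the course simulator:

- There is no forwarding.
- Stalls follow the rule from the Stall Condition slide.
- demo-h0.ys is encoded as hypothetical tuples of (text, source registers, destination registers).
- What halt actually does is ignored.

```python
BUBBLE = "bubble"

# demo-h0.ys as (text, source registers, destination registers); this tuple
# encoding is a hypothetical convenience for the sketch.
DEMO_H0 = [
    ("irmovl $10,%edx", (), ("%edx",)),
    ("irmovl $3,%eax", (), ("%eax",)),
    ("addl %edx,%eax", ("%edx", "%eax"), ("%eax",)),
    ("halt", (), ()),
]


def simulate(program, cycles):
    """Return one {stage: instruction text} snapshot per cycle.

    Models a pipeline without forwarding: while E, M, or W still has to write
    a source register of the instruction in decode, D and F hold their
    instructions and a bubble is injected into execute.
    """
    def dests(slot):
        # Bubbles and empty stages write no registers.
        return set(program[slot][2]) if isinstance(slot, int) else set()

    def label(slot):
        if slot is None:
            return ""
        return BUBBLE if slot == BUBBLE else program[slot][0]

    F, D, E, M, W = 0, None, None, None, None  # program index, BUBBLE, or None
    next_pc = 1
    history = []
    for _ in range(cycles):
        history.append({s: label(v) for s, v in zip("FDEMW", (F, D, E, M, W))})
        pending = dests(E) | dests(M) | dests(W)
        stall = isinstance(D, int) and bool(set(program[D][1]) & pending)
        W, M = M, E                      # later stages always advance
        if stall:
            E = BUBBLE                   # D and F keep their instructions
        else:
            E, D = D, F
            F = next_pc if next_pc < len(program) else None
            next_pc += 1
    return history
```

`simulate(DEMO_H0, 8)` yields the same stage contents listed above for cycles 4 through 8, including the three bubbles in execute.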

19

Implementing Forwarding

• Add additional feedback paths from E, M, and W pipeline registers into decode stage
• Create logic blocks to select from multiple sources for valA and valB in decode stage

[Figure: Decode, execute, memory, and write-back stages, with forwarding paths e_valE, M_valE, m_valM, W_valE, and W_valM feeding the Sel+Fwd A and Fwd B logic blocks in decode.]

20

Implementing Forwarding

[Figure: Close-up of the decode-stage forwarding logic (Sel+Fwd A, Fwd B) and its inputs.]

## What should be the A value?
int new_E_valA = [
    # Use incremented PC
    D_icode in { ICALL, IJXX } : D_valP;
    # Forward valE from execute
    d_srcA == E_dstE : e_valE;
    # Forward valM from memory
    d_srcA == M_dstM : m_valM;
    # Forward valE from memory
    d_srcA == M_dstE : M_valE;
    # Forward valM from write back
    d_srcA == W_dstM : W_valM;
    # Forward valE from write back
    d_srcA == W_dstE : W_valE;
    # Use value read from register file
    1 : d_rvalA;
];
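The HCL case expression picks the first matching case, so the order of the cases sets the forwarding priority. The Python sketch below mirrors that order. The icode strings, the "none" register placeholder, and the keyword defaults are assumptions for illustration. Like the HCL, it compares d_srcA against each destination without a special check for the no-register ID. That is harmless here, because valA is unused when an instruction has no srcA.

```python
RNONE = "none"  # hypothetical stand-in for the "no register operand" ID


def new_E_valA(D_icode, D_valP, d_srcA, d_rvalA, *,
               E_dstE=RNONE, e_valE=0,
               M_dstM=RNONE, m_valM=0, M_dstE=RNONE, M_valE=0,
               W_dstM=RNONE, W_valM=0, W_dstE=RNONE, W_valE=0):
    """Select valA for the E register in the same priority order as the HCL."""
    if D_icode in ("ICALL", "IJXX"):
        return D_valP                      # use incremented PC
    for dst, val in (
        (E_dstE, e_valE),                  # forward valE from execute
        (M_dstM, m_valM),                  # forward valM from memory
        (M_dstE, M_valE),                  # forward valE from memory
        (W_dstM, W_valM),                  # forward valM from write back
        (W_dstE, W_valE),                  # forward valE from write back
    ):
        if d_srcA == dst:
            return val
    return d_rvalA                         # use value read from register file
```

Cycle 8 of the load/use example on the following slides is a case where two different forwarding sources are active at once. There, `new_E_valA("IOPL", 0x20, "%ebx", 0, W_dstE="%ebx", W_valE=10, M_dstM="%eax", m_valM=3)` returns 10, the W_valE value shown for valA. Forwarding for valB would use the same destination checks, without the valP case.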

21

Limitation of Forwarding

Load-use dependency
• Value needed by end of decode stage in cycle 7
• Value read from memory in memory stage of cycle 8

# demo-luh.ys
                                          1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $128,%edx                   F  D  E  M  W
0x006: irmovl $3,%ecx                        F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                     F  D  E  M  W
0x012: irmovl $10,%ebx                             F  D  E  M  W
0x018: mrmovl 0(%edx), %eax  # Load %eax              F  D  E  M  W
0x01e: addl %ebx, %eax       # Use %eax                  F  D  E  M  W
0x020: halt                                                 F  D  E  M  W

Cycle 7
M: M_dstE = %ebx, M_valE = 10
D: valA ← M_valE = 10
   valB ← R[%eax] = 0

Cycle 8
M: M_dstM = %eax, m_valM ← M[128] = 3

Error

22

Avoiding Load/Use Hazard

• Stall using instruction for one cycle
• Can then pick up loaded value by forwarding from memory stage

# demo-luh.ys
                                          1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovl $128,%edx                   F  D  E  M  W
0x006: irmovl $3,%ecx                        F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                     F  D  E  M  W
0x012: irmovl $10,%ebx                             F  D  E  M  W
0x018: mrmovl 0(%edx), %eax  # Load %eax              F  D  E  M  W
       bubble                                                  E  M  W
0x01e: addl %ebx, %eax       # Use %eax                  F  D  D  E  M  W
0x020: halt                                                 F  F  D  E  M  W

Cycle 8
W: W_dstE = %ebx, W_valE = 10
M: M_dstM = %eax, m_valM ← M[128] = 3
D: valA ← W_valE = 10
   valB ← m_valM = 3

23

Detecting Load/Use Hazard

[Figure: Decode and execute stages, with E_icode and E_dstM from the E pipeline register checked against d_srcA and d_srcB.]

Condition            Trigger
Load/Use Hazard      E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }

24

Control for Load/Use Hazard

• Stall instructions in fetch and decode stages
• Inject bubble into execute stage

# demo-luh.ys
                                          1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovl $128,%edx                   F  D  E  M  W
0x006: irmovl $3,%ecx                        F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                     F  D  E  M  W
0x012: irmovl $10,%ebx                             F  D  E  M  W
0x018: mrmovl 0(%edx), %eax  # Load %eax              F  D  E  M  W
       bubble                                                  E  M  W
0x01e: addl %ebx, %eax       # Use %eax                  F  D  D  E  M  W
0x020: halt                                                 F  F  D  E  M  W

Condition            F       D       E       M       W
Load/Use Hazard      stall   stall   bubble  normal  normal
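The load/use trigger and the control actions from the table above fit in a short Python sketch. The icode strings and function names are hypothetical. The condition itself is transcribed from the Detecting Load/Use Hazard slide.

```python
def load_use_hazard(E_icode, E_dstM, d_srcA, d_srcB):
    """E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }"""
    return E_icode in ("IMRMOVL", "IPOPL") and E_dstM in (d_srcA, d_srcB)


def load_use_control(E_icode, E_dstM, d_srcA, d_srcB):
    """Pipeline-register actions for the next clock, following the control table."""
    if load_use_hazard(E_icode, E_dstM, d_srcA, d_srcB):
        return {"F": "stall", "D": "stall", "E": "bubble", "M": "normal", "W": "normal"}
    return dict.fromkeys("FDEMW", "normal")
```

In cycle 7 of demo-luh.ys, mrmovl is in execute with E_dstM = %eax, and addl in decode reads %ebx and %eax. The hazard fires, and addl stays in decode for one more cycle.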

25

Branch Misprediction Example

• Should only execute first 8 instructions

# demo-j.ys
0x000:    xorl %eax,%eax
0x002:    jne t             # Not taken
0x007:    irmovl $1, %eax   # Fall through
0x00d:    nop
0x00e:    nop
0x00f:    nop
0x010:    halt
0x011: t: irmovl $3, %edx   # Target (Should not execute)
0x017:    irmovl $4, %ecx   # Should not execute
0x01d:    irmovl $5, %edx   # Should not execute

26

Handling Misprediction

Predict branch as taken
• Fetch 2 instructions at target

Cancel when mispredicted
• Detect branch not taken in execute stage
• On following cycle, replace instructions in execute and decode by bubbles
• No side effects have occurred yet

# demo-j.ys
                                          1  2  3  4  5  6  7  8  9  10
0x000:    xorl %eax,%eax                  F  D  E  M  W
0x002:    jne target      # Not taken        F  D  E  M  W
0x011: t: irmovl $2,%edx  # Target              F  D
          bubble                                      E  M  W
0x017:    irmovl $3,%ebx  # Target+                F
          bubble                                      D  E  M  W
0x007:    irmovl $1,%eax  # Fall through              F  D  E  M  W
0x00d:    nop                                            F  D  E  M  W

27

Detecting Mispredicted Branch

Condition            Trigger
Mispredicted Branch  E_icode = IJXX & !e_Bch

[Figure: Execute stage (ALU, CC), with the E and M pipeline registers and the e_Bch and e_valE signals.]
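Detecting a misprediction is a one-line predicate, but recovering from it also requires the fall-through address. The Python sketch below assumes the fetch-stage PC selection used in Bryant and O'Hallaron's PIPE design. In that design, the Select PC block in the fetch diagram chooses among M_valA, W_valM, and the predicted PC. A jXX carries its fall-through address in valA, because new_E_valA selects D_valP for IJXX. The function names and icode strings are hypothetical.

```python
def mispredicted(E_icode, e_Bch):
    """Trigger from the slide: E_icode = IJXX & !e_Bch."""
    return E_icode == "IJXX" and not e_Bch


def select_pc(M_icode, M_Bch, M_valA, W_icode, W_valM, predPC):
    """Assumed fetch-stage PC selection, after the CS:APP PIPE design."""
    if M_icode == "IJXX" and not M_Bch:
        return M_valA   # mispredicted branch: fall-through address travels in valA
    if W_icode == "IRET":
        return W_valM   # ret has read the return address from the stack
    return predPC       # otherwise use the predicted PC
```

For demo-j.ys, once jne reaches the memory stage with Bch false, `select_pc` returns 0x007. The fall-through irmovl is therefore fetched in the same cycle that the two bubbles replace the target instructions.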

28

Control for Misprediction

# demo-j.ys
                                          1  2  3  4  5  6  7  8  9  10
0x000:    xorl %eax,%eax                  F  D  E  M  W
0x002:    jne target      # Not taken        F  D  E  M  W
0x011: t: irmovl $2,%edx  # Target              F  D
          bubble                                      E  M  W
0x017:    irmovl $3,%ebx  # Target+                F
          bubble                                      D  E  M  W
0x007:    irmovl $1,%eax  # Fall through              F  D  E  M  W
0x00d:    nop                                            F  D  E  M  W

Condition            F       D       E       M       W
Mispredicted Branch  normal  bubble  bubble  normal  normal

29

Return Example

• Previously executed three additional instructions

# demo-retb.ys
0x000:    irmovl Stack,%esp   # Initialize stack pointer
0x006:    call p              # Procedure call
0x00b:    irmovl $5,%esi      # Return point
0x011:    halt
0x020:    .pos 0x
0x020: p: irmovl $-1,%edi     # procedure
0x026:    ret
0x027:    irmovl $1,%eax      # Should not be executed
0x02d:    irmovl $2,%ecx      # Should not be executed
0x033:    irmovl $3,%edx      # Should not be executed
0x039:    irmovl $4,%ebx      # Should not be executed
0x100:    .pos 0x
0x100: Stack:                 # Stack: Stack pointer

30

Correct Return Example

• As ret passes through pipeline, stall at fetch stage
  - While in decode, execute, and memory stage
• Inject bubble into decode stage
• Release stall when reach write-back stage

# demo-retb
0x026: ret                       F  D  E  M  W
       bubble                       F  D  E  M  W
       bubble                          F  D  E  M  W
       bubble                             F  D  E  M  W
0x00b: irmovl $5,%esi  # Return              F  D  E  M  W

When ret reaches write-back:
W: valM = 0x0b
F: valC ← 5, rB ← %esi
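The three lost cycles follow from the ret condition used in the corrected control logic, IRET in { D_icode, E_icode, M_icode }. This Python sketch is illustrative only. It assumes the other stages hold non-ret instructions, modeled as the hypothetical icode "INOP". It then counts the cycles in which the condition holds as ret moves from decode to write-back.

```python
def ret_in_progress(D_icode, E_icode, M_icode):
    """IRET in { D_icode, E_icode, M_icode }: stall F and bubble D while true."""
    return "IRET" in (D_icode, E_icode, M_icode)


def cycles_lost_to_ret():
    """Count cycles in which fetch is held as a lone ret moves from D to W."""
    lost = 0
    for ret_stage in ("D", "E", "M", "W"):
        icode = {s: "IRET" if s == ret_stage else "INOP" for s in ("D", "E", "M")}
        if ret_in_progress(icode["D"], icode["E"], icode["M"]):
            lost += 1  # fetch stalls and a bubble enters D at the next clock
    return lost
```

Once ret reaches write-back, the condition is false. At that point W_valM supplies the return address and normal fetching resumes.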

37

Control Combination A

• Should handle as mispredicted branch
• Stalls F pipeline register
• But PC selection logic will be using M_valA anyhow

[Figure: Combination A: a mispredicted JXX in execute while ret is in decode. In the fetch stage, Select PC chooses among M_valA, W_valM, and predPC.]

Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal
Mispredicted Branch  normal  bubble  bubble  normal  normal
Combination          stall   bubble  bubble  normal  normal

38

Control Combination B

• Would attempt to bubble and stall pipeline register D
• Signaled by processor as pipeline error

[Figure: Combination B: a load in execute whose destination register is used by the ret instruction in decode.]

Condition            F       D               E       M       W
Processing ret       stall   bubble          normal  normal  normal
Load/Use Hazard      stall   stall           bubble  normal  normal
Combination          stall   bubble + stall  bubble  normal  normal

39

Handling Control Combination B

• Load/use hazard should get priority
• ret instruction should be held in decode stage for additional cycle

[Figure: Combination B: a load in execute whose destination register is used by the ret instruction in decode.]

Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal
Load/Use Hazard      stall   stall   bubble  normal  normal
Combination          stall   stall   bubble  normal  normal

40

Corrected Pipeline Control Logic

• Load/use hazard should get priority
• ret instruction should be held in decode stage for additional cycle

Condition            F       D       E       M       W
Processing ret       stall   bubble  normal  normal  normal
Load/Use Hazard      stall   stall   bubble  normal  normal
Combination          stall   stall   bubble  normal  normal

bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Stalling at fetch while ret passes through pipeline
    IRET in { D_icode, E_icode, M_icode }
    # but not condition for a load/use hazard
    && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });
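The corrected logic can be checked against both combinations with a small Python sketch. Only D_bubble is given as HCL on the slide. The F_stall, D_stall, and E_bubble expressions here are assumptions read off the condition tables above, and the icode strings are hypothetical. Setting `corrected=False` reproduces the first version's conflict, in which combination B asks pipeline register D to stall and bubble at once.

```python
def pipeline_control(D_icode, E_icode, M_icode, E_dstM, d_srcA, d_srcB, e_Bch,
                     corrected=True):
    """Return the stall/bubble signals for the F, D, and E pipeline registers."""
    load_use = E_icode in ("IMRMOVL", "IPOPL") and E_dstM in (d_srcA, d_srcB)
    mispredict = E_icode == "IJXX" and not e_Bch
    ret = "IRET" in (D_icode, E_icode, M_icode)
    if corrected:
        # Slide version: a load/use hazard takes priority over the ret bubble.
        d_bubble = mispredict or (ret and not load_use)
    else:
        # First version: ret alone requests a D bubble.
        d_bubble = mispredict or ret
    return {
        "F_stall": load_use or ret,
        "D_stall": load_use,
        "D_bubble": d_bubble,
        "E_bubble": mispredict or load_use,
    }
```

For combination B (a load into %esp in execute, ret in decode reading %esp), the corrected signals match the Combination row of the table: F stall, D stall, E bubble.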

41

Pipeline Summary

Data Hazards
• Most handled by forwarding
  - No performance penalty
• Load/use hazard requires one-cycle stall

Control Hazards
• Cancel instructions when mispredicted branch is detected
  - Two clock cycles wasted
• Stall fetch stage while ret passes through pipeline
  - Three clock cycles wasted

Control Combinations
• Must analyze carefully
• First version had a subtle bug
  - Only arises with an unusual instruction combination