



Material Type: Notes; Class: Computer Organization; Subject: Computer Science and Engineering ; University: University of Nebraska - Lincoln; Term: Unknown 1989;
Typology: Study notes




Our design predicts all branches as being taken.
On a mispredicted branch, the naïve pipeline executes two extra instructions.
On a ret, the naïve pipeline executes three extra instructions.
[Figure: pipelined datapath with the Fetch, Decode, Execute, Memory, and Write back stages and the pipeline registers F, D, E, M, W. Blocks: Instruction memory, PC increment, Register file, ALU, CC, Data memory. Labeled signals: predPC, f_PC, valP; icode, ifun, rA, rB, valC; d_srcA, d_srcB; valA, valB; aluA, aluB; Bch, valE; Addr, Data; valM; M_icode, M_Bch, M_valA; W_icode, W_valE, W_valM, W_dstE, W_dstM.]
e.g., valC passes through decode
[Figure: pipeline hardware with the pipeline registers F, D, E, M, W and the Fetch, Decode, Execute, Memory, and Write back stages. Blocks: Select PC, Predict PC, Instruction memory, PC increment, Register file, Select A, ALU A, ALU B, ALU fun., ALU, CC, Mem. control, Addr, Data memory. Register fields: F: predPC; D: icode, ifun, rA, rB, valC, valP; E: icode, ifun, valC, valA, valB, dstE, dstM, srcA, srcB; M: icode, Bch, valE, valA, dstE, dstM; W: icode, valE, valM, dstE, dstM. Other signals: f_PC, d_srcA, d_srcB, d_rvalA, e_Bch, M_Bch, M_valA, W_valE, W_valM.]
Data Dependencies: 2 Nops
# demo-h2.ys                  1  2  3  4  5  6  7  8  9  10
0x000: irmovl $10,%edx        F  D  E  M  W
0x006: irmovl $3,%eax            F  D  E  M  W
0x00c: nop                          F  D  E  M  W
0x00d: nop                             F  D  E  M  W
0x00e: addl %edx,%eax                     F  D  E  M  W
0x010: halt                                  F  D  E  M  W
Cycle 6
  W: R[%eax] <- 3
  D: valA <- R[%edx] = 10, valB <- R[%eax] = 0
  Error
Data Dependencies: No Nop
# demo-h0.ys                  1  2  3  4  5  6  7  8
0x000: irmovl $10,%edx        F  D  E  M  W
0x006: irmovl $3,%eax            F  D  E  M  W
0x00c: addl %edx,%eax               F  D  E  M  W
0x00e: halt                            F  D  E  M  W
Cycle 4
  M: M_valE = 10, M_dstE = %edx
  E: e_valE <- 0 + 3 = 3, E_dstE = %eax
  D: valA <- R[%edx] = 0, valB <- R[%eax] = 0
  Error
Stalling for Data Dependencies
# demo-h2.ys                  1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx        F  D  E  M  W
0x006: irmovl $3,%eax            F  D  E  M  W
0x00c: nop                          F  D  E  M  W
0x00d: nop                             F  D  E  M  W
bubble                                          E  M  W
0x00e: addl %edx,%eax                     F  D  D  E  M  W
0x010: halt                                  F  F  D  E  M  W
Stall Condition
Source Registers
Destination Registers
Special Case
Indicates absence of register operand
[Figure: pipeline hardware, as before, with the pipeline registers F, D, E, M, W. The stall condition compares d_srcA and d_srcB in decode with the dstE and dstM fields of the E, M, and W pipeline registers.]
Detecting Stall Condition
(Pipeline diagram for demo-h2.ys as under Stalling for Data Dependencies.)
Cycle 6
  W: W_dstE = %eax, W_valE = 3
  D: srcA = %edx, srcB = %eax
Stalling X3
# demo-h0.ys                  1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $10,%edx        F  D  E  M  W
0x006: irmovl $3,%eax            F  D  E  M  W
bubble                                    E  M  W
bubble                                       E  M  W
bubble                                          E  M  W
0x00c: addl %edx,%eax               F  D  D  D  D  E  M  W
0x00e: halt                            F  F  F  F  D  E  M  W
Cycle 4
  E: E_dstE = %eax
  D: srcA = %edx, srcB = %eax
Cycle 5
  M: M_dstE = %eax
  D: srcA = %edx, srcB = %eax
Cycle 6
  W: W_dstE = %eax
  D: srcA = %edx, srcB = %eax
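The stall condition illustrated above can be sketched in Python as a comparison between the decode-stage source registers and the destination registers still pending in the execute, memory, and write-back stages. This is only an illustrative sketch: the function name `needs_stall`, the use of register-name strings, and the `RNONE` marker for "no register operand" are assumptions made here, not part of the notes.

```python
# Hypothetical marker for "this instruction has no such register operand".
RNONE = "RNONE"

def needs_stall(d_srcA, d_srcB, pending_dsts):
    """Return True when a decode-stage source matches a pending destination.

    pending_dsts holds the dstE/dstM fields of the instructions currently in
    the execute, memory, and write-back stages.
    """
    # Absent operands never cause a stall.
    pending = {reg for reg in pending_dsts if reg != RNONE}
    return any(src != RNONE and src in pending for src in (d_srcA, d_srcB))

# Cycle 4 of demo-h0.ys: addl %edx,%eax is in decode while
# E_dstE = %eax and M_dstE = %edx are still pending.
cycle4 = needs_stall("%edx", "%eax", ["%eax", RNONE, "%edx", RNONE, RNONE, RNONE])
```

With the cycle 4 values, `cycle4` is `True`, so the addl is held in decode; once every pending destination is `RNONE` the function returns `False` and the instruction proceeds.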
What Happens When Stalling?
Bubbles: like dynamically generated nops; they move through the later stages
# demo-h0.ys
0x000: irmovl $10,%edx
0x006: irmovl $3,%eax
0x00c: addl %edx,%eax
0x00e: halt
Stage        Cycle 4   Cycle 5   Cycle 6   Cycle 7   Cycle 8
Write Back   -         0x000     0x006     bubble    bubble
Memory       0x000     0x006     bubble    bubble    bubble
Execute      0x006     bubble    bubble    bubble    0x00c
Decode       0x00c     0x00c     0x00c     0x00c     0x00e
Fetch        0x00e     0x00e     0x00e     0x00e     -
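The stage-by-stage behaviour above can be reproduced with a small simulation. The sketch below is an instruction-level model written for illustration only: the `PROGRAM` encoding, the helper names, and the rule "stall whenever a decode source is a pending destination in E, M, or W" are assumptions chosen to mirror the demo-h0.ys example, not the notes' own implementation.

```python
BUBBLE = "bubble"

# (text, source registers, destination register) for demo-h0.ys
PROGRAM = [
    ("irmovl $10,%edx", set(), "%edx"),
    ("irmovl $3,%eax", set(), "%eax"),
    ("addl %edx,%eax", {"%edx", "%eax"}, "%eax"),
    ("halt", set(), None),
]

def dest(slot):
    """Destination register of the instruction in a stage (None if empty)."""
    return None if slot in (None, BUBBLE) else PROGRAM[slot][2]

def srcs(slot):
    """Source registers of the instruction in a stage (empty if none)."""
    return set() if slot in (None, BUBBLE) else PROGRAM[slot][1]

def step(state, next_fetch):
    """Advance one clock cycle; state is (F, D, E, M, W)."""
    F, D, E, M, W = state
    hazard = bool(srcs(D) & {dest(E), dest(M), dest(W)})
    if hazard:
        # Hold F and D, inject a bubble into E, let M and W drain.
        return (F, D, BUBBLE, E, M), next_fetch
    new_F = next_fetch if next_fetch < len(PROGRAM) else None
    return (new_F, F, D, E, M), next_fetch + 1

def simulate(num_cycles):
    """Return the stage contents for cycles 1..num_cycles."""
    state, next_fetch = (0, None, None, None, None), 1
    rows = [state]
    for _ in range(num_cycles - 1):
        state, next_fetch = step(state, next_fetch)
        rows.append(state)
    return rows

rows = simulate(8)  # rows[c - 1] is the (F, D, E, M, W) tuple for cycle c
```

Entries are indices into `PROGRAM`, `BUBBLE`, or `None` for an empty stage. In this model the addl (index 2) stays in decode for cycles 4 through 7 while three bubbles enter execute, matching the table.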
Implementing Forwarding
[Figure: decode, execute, memory, and write-back stages with forwarding paths. The Sel+Fwd A and Fwd B blocks in decode receive the register file outputs together with e_valE, M_valE, m_valM, W_valM, and W_valE. Register fields: D: icode, ifun, rA, rB, valC, valP; E: icode, ifun, valC, valA, valB, dstE, dstM, srcA, srcB; M: icode, Bch, valE, valA, dstE, dstM; W: icode, valE, valM, dstE, dstM.]
## What should be the A value?
int new_E_valA = [
    D_icode in { ICALL, IJXX } : D_valP;
    d_srcA == E_dstE : e_valE;
    d_srcA == M_dstM : m_valM;
    d_srcA == M_dstE : M_valE;
    d_srcA == W_dstM : W_valM;
    d_srcA == W_dstE : W_valE;
    1 : d_rvalA;
];
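The HCL case expression above picks the first matching case, so the order of the cases sets the forwarding priority. A rough Python equivalent is sketched below; the dictionary-of-signals representation and the `"RNONE"` placeholder used in the example are assumptions for illustration, while the signal names and the case order are taken from the HCL.

```python
def new_E_valA(s):
    """Select the value latched into E_valA; s maps signal names to values."""
    if s["D_icode"] in ("ICALL", "IJXX"):
        return s["D_valP"]
    # Checked in the same order as the HCL cases: earliest pipeline stage first.
    forwarding = [
        ("E_dstE", "e_valE"),
        ("M_dstM", "m_valM"),
        ("M_dstE", "M_valE"),
        ("W_dstM", "W_valM"),
        ("W_dstE", "W_valE"),
    ]
    for dst, val in forwarding:
        if s["d_srcA"] == s[dst]:
            return s[val]
    # No pending write to the source register: use the register file output.
    return s["d_rvalA"]

# Cycle 4 of demo-h0.ys: addl %edx,%eax in decode, %edx still in the M register.
signals = {
    "D_icode": "IOPL", "D_valP": 0x00e, "d_srcA": "%edx", "d_rvalA": 0,
    "E_dstE": "%eax", "e_valE": 3,
    "M_dstM": "RNONE", "m_valM": 0, "M_dstE": "%edx", "M_valE": 10,
    "W_dstM": "RNONE", "W_valM": 0, "W_dstE": "RNONE", "W_valE": 0,
}
valA = new_E_valA(signals)
```

Here `valA` is 10, forwarded from `M_valE`, rather than the stale register file value 0 that caused the error in the no-nop example.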
Limitation of Forwarding
Load-use dependency: the value is needed in the decode stage in cycle 7, but it is not read from memory until the memory stage in cycle 8
# demo-luh.ys                             1  2  3  4  5  6  7  8  9  10 11
0x000: irmovl $128,%edx                   F  D  E  M  W
0x006: irmovl $3,%ecx                        F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                     F  D  E  M  W
0x012: irmovl $10,%ebx                             F  D  E  M  W
0x018: mrmovl 0(%edx), %eax  # Load %eax              F  D  E  M  W
0x01e: addl %ebx, %eax  # Use %eax                       F  D  E  M  W
0x020: halt                                                 F  D  E  M  W
Cycle 7
  M: M_dstE = %ebx, M_valE = 10
  D: valA <- M_valE = 10, valB <- R[%eax] = 0
  Error
Cycle 8
  M: M_dstM = %eax, m_valM <- M[128] = 3
Avoiding Load/Use Hazard
# demo-luh.ys                             1  2  3  4  5  6  7  8  9  10 11 12
0x000: irmovl $128,%edx                   F  D  E  M  W
0x006: irmovl $3,%ecx                        F  D  E  M  W
0x00c: rmmovl %ecx, 0(%edx)                     F  D  E  M  W
0x012: irmovl $10,%ebx                             F  D  E  M  W
0x018: mrmovl 0(%edx), %eax  # Load %eax              F  D  E  M  W
bubble                                                         E  M  W
0x01e: addl %ebx, %eax  # Use %eax                       F  D  D  E  M  W
0x020: halt                                                 F  F  D  E  M  W
Cycle 8
  W: W_dstE = %ebx, W_valE = 10
  M: M_dstM = %eax, m_valM <- M[128] = 3
  D: valA <- W_valE = 10, valB <- m_valM = 3
Detecting Load/Use Hazard
[Figure: decode and execute stages with the D and E pipeline registers, the register file, the Sel+Fwd A and Fwd B blocks, ALU A, ALU B, ALU fun., ALU, and CC. Labeled signals: d_srcA, d_srcB, e_Bch, e_valE, W_valM, W_valE.]
Condition          Trigger
Load/Use Hazard    E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB }
Control for Load/Use Hazard
(Pipeline diagram for demo-luh.ys as under Avoiding Load/Use Hazard.)
Condition          F       D       E       M       W
Load/Use Hazard    stall   stall   bubble  normal  normal
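The trigger and the control actions for a load/use hazard can be combined in a few lines of Python. This is a sketch under assumptions: the function names and the dictionary of per-register actions are invented here for illustration; the trigger expression and the stall/bubble pattern come from the tables above.

```python
def load_use_hazard(E_icode, E_dstM, d_srcA, d_srcB):
    """True when the instruction in execute loads a register read in decode."""
    return E_icode in ("IMRMOVL", "IPOPL") and E_dstM in (d_srcA, d_srcB)

def load_use_control(hazard):
    """Action for each pipeline register (F, D, E, M, W) in the next cycle."""
    if hazard:
        return {"F": "stall", "D": "stall", "E": "bubble",
                "M": "normal", "W": "normal"}
    return dict.fromkeys("FDEMW", "normal")

# Cycle 7 of demo-luh.ys: mrmovl 0(%edx),%eax in execute, addl %ebx,%eax in decode.
hazard = load_use_hazard("IMRMOVL", "%eax", "%ebx", "%eax")
actions = load_use_control(hazard)
```

For the cycle 7 values, `hazard` is `True`, so the addl is held in decode for one cycle and can then pick up the loaded value by forwarding `m_valM`.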
Branch Misprediction Example
demo-j.ys
0x000:    xorl %eax,%eax
0x002:    jne t              # Not taken
0x007:    irmovl $1, %eax    # Fall through
0x00d:    nop
0x00e:    nop
0x00f:    nop
0x010:    halt
0x011: t: irmovl $3, %edx    # Target (Should not execute)
0x017:    irmovl $4, %ecx    # Should not execute
0x01d:    irmovl $5, %edx    # Should not execute
Handling Misprediction
Predict branch as taken
Cancel when mispredicted
# demo-j.ys                             1  2  3  4  5  6  7  8  9  10
0x000: xorl %eax,%eax                   F  D  E  M  W
0x002: jne target  # Not taken             F  D  E  M  W
0x011: t: irmovl $2,%edx  # Target            F  D
bubble                                              E  M  W
0x017: irmovl $3,%ebx  # Target+1                F
bubble                                              D  E  M  W
0x007: irmovl $1,%eax  # Fall through               F  D  E  M  W
0x00d: nop                                             F  D  E  M  W
Detecting Mispredicted Branch
Condition             Trigger
Mispredicted Branch   E_icode == IJXX && !e_Bch
[Figure: execute stage between the E and M pipeline registers, with ALU A, ALU B, ALU fun., ALU, and CC. E fields: icode, ifun, valC, valA, valB, dstE, dstM, srcA, srcB; M fields: icode, Bch, valE, valA, dstE, dstM. Labeled signals: e_Bch, e_valE.]
Control for Misprediction
(Pipeline diagram for demo-j.ys as under Handling Misprediction.)
Condition             F       D       E       M       W
Mispredicted Branch   normal  bubble  bubble  normal  normal
Return Example
demo-retb.ys
0x000:    irmovl Stack,%esp  # Initialize stack pointer
0x006:    call p             # Procedure call
0x00b:    irmovl $5,%esi     # Return point
0x011:    halt
0x020: .pos 0x
0x020: p: irmovl $-1,%edi    # procedure
0x026:    ret
0x027:    irmovl $1,%eax     # Should not be executed
0x02d:    irmovl $2,%ecx     # Should not be executed
0x033:    irmovl $3,%edx     # Should not be executed
0x039:    irmovl $4,%ebx     # Should not be executed
0x100: .pos 0x
0x100: Stack:                # Stack: Stack pointer
Correct Return Example
Fetch is stalled while ret is in the decode, execute, and memory stages
# demo-retb
0x026: ret                         F  D  E  M  W
bubble                                F  D  E  M  W
bubble                                   F  D  E  M  W
bubble                                      F  D  E  M  W
0x00b: irmovl $5,%esi  # Return                F  D  E  M  W
W (ret): valM = 0x0b
F (irmovl $5,%esi): valC <- 5, rB <- %esi
Control Combination A
Combination A: Mispredict together with ret in decode (D)
Condition             F       D       E       M       W
Processing ret        stall   bubble  normal  normal  normal
Mispredicted Branch   normal  bubble  bubble  normal  normal
Combination           stall   bubble  bubble  normal  normal
[Figure: fetch stage with the F pipeline register. Blocks: Select PC, Instruction memory, PC increment, Predict PC. Labeled signals: M_valA, W_valM, f_PC, predPC.]
Control Combination B
Combination B: Load/use (load in E, use in D) together with ret in decode (D)
Condition             F       D               E       M       W
Processing ret        stall   bubble          normal  normal  normal
Load/Use Hazard       stall   stall           bubble  normal  normal
Combination           stall   bubble + stall  bubble  normal  normal
Handling Control Combination B
Combination B: Load/use (load in E, use in D) together with ret in decode (D)
Condition             F       D       E       M       W
Processing ret        stall   bubble  normal  normal  normal
Load/Use Hazard       stall   stall   bubble  normal  normal
Combination           stall   stall   bubble  normal  normal
Corrected Pipeline Control Logic
Condition             F       D       E       M       W
Processing ret        stall   bubble  normal  normal  normal
Load/Use Hazard       stall   stall   bubble  normal  normal
Combination           stall   stall   bubble  normal  normal
bool D_bubble =
    # Mispredicted branch
    (E_icode == IJXX && !e_Bch) ||
    # Processing ret
    IRET in { D_icode, E_icode, M_icode }
    # but not condition for a load/use hazard
    && !(E_icode in { IMRMOVL, IPOPL } && E_dstM in { d_srcA, d_srcB });
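To see how the correction changes Combination B, the control conditions can be modelled in Python. This is a hedged sketch: `D_bubble` follows the HCL above, while the expressions for `F_stall`, `D_stall`, and `E_bubble` are inferred from the condition tables rather than quoted from the notes, and the function names and the `corrected` flag are hypothetical.

```python
def describe(stall, bubble):
    """Name the action applied to one pipeline register."""
    if stall and bubble:
        return "bubble + stall"
    return "stall" if stall else "bubble" if bubble else "normal"

def control(D_icode, E_icode, M_icode, e_Bch, E_dstM, d_srcA, d_srcB,
            corrected=True):
    """Return the actions for the F, D, and E pipeline registers."""
    load_use = E_icode in ("IMRMOVL", "IPOPL") and E_dstM in (d_srcA, d_srcB)
    ret = "IRET" in (D_icode, E_icode, M_icode)
    mispredict = E_icode == "IJXX" and not e_Bch
    # corrected=False reproduces the first version, which bubbles D on any ret.
    D_bubble = mispredict or (ret and not (corrected and load_use))
    return {
        "F": describe(load_use or ret, False),
        "D": describe(load_use, D_bubble),
        "E": describe(False, mispredict or load_use),
    }

# Combination B: a load into %esp in execute, ret (which reads %esp) in decode.
combo_b = ("IRET", "IMRMOVL", "INOP", False, "%esp", "%esp", "%esp")
fixed = control(*combo_b)
naive = control(*combo_b, corrected=False)
```

With the correction, `fixed["D"]` is `"stall"`, so the ret is held in decode; without it, `naive["D"]` is `"bubble + stall"`, the conflicting pair of actions shown for Combination B.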
Pipeline Summary
Data Hazards
    Handled by forwarding: no performance penalty
    Load/use hazard: one-cycle stall
Control Hazards
    Mispredicted branch: two clock cycles wasted
    ret: three clock cycles wasted
Control Combinations
    Only arise with unusual instruction combinations