Pipeline Hazards and Tomasulo's Algorithm: Homework Solution - Prof. Prabhat Kumar Mishra, Assignments of Electrical and Electronics Engineering

The solution to a homework assignment on pipeline hazards and tomasulo's algorithm. The assignment involves identifying hazards in a given program and using forwarding techniques to avoid them. The solution also covers the use of tomasulo's algorithm with and without speculation, and the impact of the reorder buffer size on processor performance.

Typology: Assignments

Pre 2010

Uploaded on 03/10/2009

koofers-user-ije
koofers-user-ije 🇺🇸

10 documents

1 / 5

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Homework 2 Solution
CDA 5155: Fall 2008
Due Date: 10/16/2008 11:59 PM (UF EDGE Students: 10/23/2008 11:59 PM)
Brief Solution
You are not allowed to take or give help in completing this assignment. Submit the PDF version of
the submission in e-Learning website before the deadline. Please include the sentence in bold on
top of your submission: I have neither given nor received any unauthorized aid on this
assignment”.
1. [10 points] Identify all possible hazards in the following instructions and show clearly which
instructions are involved in what type of hazard (structural, WAW, WAR, RAW and control)
and why? Is it possible to avoid such hazard using data forwarding? If possible, what
forwarding should be used? Consider a simple 5-stage MIPS pipeline as shown in Figure A.3
for this exercise. I1 is the first instruction and I13 is the last instruction in the program.
Solution:
Since the 5-stage pipeline is in-order issue and in-order execution pipeline, there is no WAR or
WAW hazard.
Possible hazards Possible solution
I 1 LD R1,1024(R4)
I 2 SUBI R5,R5,#4
I 3 ADD R1,R2,R1 RAW: R1 is the result of I1 Forwarding from MEM/WB to ID/EX
I 4 LD R2,1000(R5) RAW: R5 is the result of I2 Forwarding from MEM/WB to ID/EX
I 5 SUB R1,R2,R1 RAW: R1 is the result of I3
RAW: R2 is the result of I4
Forwarding from MEM/WB to ID/EX
Forwarding from MEM/WB to ID/EX
I 6 BGTZ R1, #16 RAW: R1 is the result of I5
Control hazard
Forwarding from EX/MEM to ID/EX
I 7 SUB R1,R0,R1 RAW: R1 is the result of I5
Forwarding from MEM/WB to ID/EX
I 8 SD R1, 1024(R4) RAW: R1 is the result of I7 Forwarding from MEM/WB to EX/MEM
I 9 SUBI R4,R4,#4
I 10 BNE R4, R0, #64 RAW: R4 is the result of I9
Control hazard
Forwarding from EX/MEM to ID/EX
I 11 NOP
I 12 J #800 Control hazard
I 13 BREAK
pf3
pf4
pf5

Partial preview of the text

Download Pipeline Hazards and Tomasulo's Algorithm: Homework Solution - Prof. Prabhat Kumar Mishra and more Assignments Electrical and Electronics Engineering in PDF only on Docsity!

Homework 2 Solution

CDA 5155: Fall 2008

Due Date: 10/16/2008 11:59 PM (UF EDGE Students: 10/23/2008 11:59 PM)

Brief Solution

You are not allowed to take or give help in completing this assignment. Submit the PDF version of the submission in e-Learning website before the deadline. Please include the sentence in bold on top of your submission: “ I have neither given nor received any unauthorized aid on this assignment ”.

  1. [10 points] Identify all possible hazards in the following instructions and show clearly which instructions are involved in what type of hazard (structural, WAW, WAR, RAW and control) and why? Is it possible to avoid such hazard using data forwarding? If possible, what forwarding should be used? Consider a simple 5-stage MIPS pipeline as shown in Figure A. for this exercise. I1 is the first instruction and I13 is the last instruction in the program.

Solution: Since the 5-stage pipeline is in-order issue and in-order execution pipeline, there is no WAR or WAW hazard.

Possible hazards Possible solution I 1 LD R1,1024(R4)

I 2 SUBI R5,R5,#

I 3 ADD R1,R2,R1 RAW: R1 is the result of I1 Forwarding from MEM/WB to ID/EX

I 4 LD R2,1000(R5) RAW: R5 is the result of I2 Forwarding from MEM/WB to ID/EX

I 5 SUB R1,R2,R1 RAW: R1 is the result of I RAW: R2 is the result of I

Forwarding from MEM/WB to ID/EX Forwarding from MEM/WB to ID/EX I 6 BGTZ R1, #16 RAW: R1 is the result of I Control hazard

Forwarding from EX/MEM to ID/EX

I 7 SUB R1,R0,R1 RAW: R1 is the result of I5 Forwarding from MEM/WB to ID/EX

I 8 SD R1, 1024(R4) RAW: R1 is the result of I7 Forwarding from MEM/WB to EX/MEM

I 9 SUBI R4,R4,#

I 10 BNE R4, R0, #64 RAW: R4 is the result of I Control hazard

Forwarding from EX/MEM to ID/EX

I 11 NOP

I 12 J #800 Control hazard

I 13 BREAK

  1. [20 points] For the following instructions:

Loop: L.D F0, 1024(R1) MUL.D F4, F2, F S.D F4, 1024(R2) DIV.D F5, F3, F ADD.D F5, F5, F S.D F5, 1024(R1) DSUBIU R2, R2, # DSUBIU R1, R1, # BNEZ R1, Loop

The second column in the following table indicates the number of cycles spent by the instruction (first column) in their respective functional units. Only exception is the load-store instructions where it spends one clock cycle in “Integer ALU” and three clock cycles in load/store (Memory) unit. The third column indicates the number of functional units for different types of instructions.

Instruction Cycle Number of unit Integer ALU 1 2 Load/Store 3 2 ADD.D 3 1 MUL.D 3 1 DIV.D 10 1

Assume that the reservation station and the reorder buffer both have infinite size. The integer ALUs are used for effective address calculation, ALU operations and branch condition evaluation. Assume that you can make at most two writes to CDB in one clock cycle. Complete the following two tables showing when each instruction issues, begins execution, accesses memory and writes its result to the CDB for the first two iterations for the following two scenarios using Tomasulo’s algorithm.

a) Use a MIPS pipeline with two-issue and without speculation. Assume that branches are issued alone (single-issue for that time step) and branch prediction is perfect. To answer this question, use the MIPS pipeline structure in Figure 2.9 in page 94 of the book with modified information based on the abovementioned table.

b) Use a MIPS pipeline with two-issue and with speculation. You also need to specify when each instruction commits. Moreover, assume that up to two instructions of any type can commit per cycle. Furthermore, assume that branches are issued alone (single-issue for that time step) and branch prediction is perfect. Note that stores will spend 3 cycles in the commit stage, because its memory access occurs during commit. To answer this question, use the MIPS pipeline structure in Figure 2.14 in page 107 of the book with modified information based on the abovementioned table.

c) In part b, stores access the memory only during the commit stage. Why is this important for Tomasulo’s algorithm with speculation?

Each wrong cycle calculation violating any structural /data dependences, - Any instruction issued in a wrong cycle, -2. In part a), any instruction executed before the execution of BNEZ in the first loop, - In part b), any instruction committed in a wrong cycle, -

c) This ensures the memory is not updated until the store instruction is no longer speculative. (Note that answers like “to keep loads/stores in order” or “to keep in order completion of instructions” are not accurate enough. The point here is that if some instruction is still speculative, we must be able to roll it back. )

  1. [10 points]

a) Draw the state transition diagram of a 2-bit predictor that can “E-perfectly” predict the following branch sequence:

<NT, NT, NT, T>, <NT, NT, NT, T>, ….. (repeat the 4-element sequence indefinitely).

Recall that you have four possible states for any 2-bit predictor. The initial state of your predictor can be anyone of the four states. By “E-perfectly”, we mean that your predictor must be always correct after a finite number of wrong predictions, regardless of its initial state. (The correct solution is not unique.)

b) Is it possible to build a 2-bit predictor that can “E-perfectly” predict the following branch sequence?

<NT, NT, NT, NT, T>, <NT, NT, NT, NT, T>, ….. (repeat the 5-element sequence).

If possible, draw its state transition diagram. If not, why?

Solution: a)

b) No. It is not possible. If we have such a predictor, it cannot have more than 4 states. Assume after finite number of wrong predictions, it can predict the sequence of NT, NT, NT, NT, T correctly.

NT (00)

T (11)

T

T NT

NT (10)

NT (^) T

NT

T^ NT

NT (01)

Then after its correct prediction of the last NT, its state must go back to some previous state, within which it predicts that the branch will not be taken, because we only have 4 different states and all previous 4 predictions are correct. Therefore its next prediction will not be correct.

  1. [10 points] In this problem, you are going to investigate the impact of the reorder buffer size on the performance of a processor implementing Tomasulo’s algorithm. Assume we have infinite functions units and reservation stations. The branch predictor works perfectly. Infinite number of instructions can be issued and committed per cycle. There are no data dependences in the flow of instruction. a) If our fetch unit can fetch up 8 instructions per cycle. Every instruction takes 1 cycle to execute. What is the average IPC for such a processor? b) Suppose that now every 100th^ instruction is a float division instruction which takes 200 cycles to complete. What is the average IPC? c) Consider the system in part b) except that we only have 30 entries in ROB. What is the average IPC for this processor? If we still want to keep an IPC of 8, what is the minimum size of the ROB? Solution: a) IPC is 8 b) IPC is still 8 c) If we only have 30 entries in ROB, the IPC will be 100/(200+(100-30)/8)=0.48. (The exact IPC may be different if you make some different assumptions about the architecture. But the result should be close to 0.5. The reason is that we just need a little more than 200 cycles to run every 100 instructions.)

To keep an IPC of 8 we need 200*8=1600 entries in ROB.