



Solution to a homework assignment on pipeline hazards and Tomasulo's algorithm. The assignment involves identifying hazards in a given program and using forwarding to avoid them. The solution also covers Tomasulo's algorithm with and without speculation, and the impact of reorder buffer size on processor performance.




You are not allowed to give or receive help in completing this assignment. Submit a PDF version of your submission on the e-Learning website before the deadline. Please include the following sentence in bold at the top of your submission: "I have neither given nor received any unauthorized aid on this assignment".
Solution: Because the 5-stage pipeline issues and executes instructions in order, there are no WAR or WAW hazards.
Instruction | Possible hazards | Possible solution
I1 LD R1,1024(R4) | |
I2 SUBI R5,R5,# | |
I3 ADD R1,R2,R1 | RAW: R1 is the result of I1 | Forwarding from MEM/WB to ID/EX
I4 LD R2,1000(R5) | RAW: R5 is the result of I2 | Forwarding from MEM/WB to ID/EX
I5 SUB R1,R2,R1 | RAW: R1 is the result of I | Forwarding from MEM/WB to ID/EX
 | RAW: R2 is the result of I | Forwarding from MEM/WB to ID/EX
I6 BGTZ R1, #16 | RAW: R1 is the result of I | Forwarding from EX/MEM to ID/EX
 | Control hazard |
I7 SUB R1,R0,R1 | RAW: R1 is the result of I5 | Forwarding from MEM/WB to ID/EX
I8 SD R1, 1024(R4) | RAW: R1 is the result of I7 | Forwarding from MEM/WB to EX/MEM
I9 SUBI R4,R4,# | |
I10 BNE R4, R0, #64 | RAW: R4 is the result of I | Forwarding from EX/MEM to ID/EX
 | Control hazard |
I11 NOP | |
I12 J #800 | Control hazard |
I13 BREAK | |
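The RAW entries in the hazard table can be cross-checked mechanically. The following is a minimal sketch, not the grader's method: it walks the program in straight-line order and, for every source register, reports the most recent earlier instruction that wrote it. The `PROGRAM` encoding and the function name `raw_dependences` are illustrative assumptions; immediates are left out because they do not affect register dependences, and branches are treated as not taken.

```python
# Each entry: (label, opcode, destination register or None, source registers).
# Operands are taken from the listing; the tuple layout itself is an assumption.
PROGRAM = [
    ("I1", "LD", "R1", ("R4",)),
    ("I2", "SUBI", "R5", ("R5",)),
    ("I3", "ADD", "R1", ("R2", "R1")),
    ("I4", "LD", "R2", ("R5",)),
    ("I5", "SUB", "R1", ("R2", "R1")),
    ("I6", "BGTZ", None, ("R1",)),
    ("I7", "SUB", "R1", ("R0", "R1")),
    ("I8", "SD", None, ("R1", "R4")),
    ("I9", "SUBI", "R4", ("R4",)),
    ("I10", "BNE", None, ("R4", "R0")),
    ("I11", "NOP", None, ()),
    ("I12", "J", None, ()),
    ("I13", "BREAK", None, ()),
]


def raw_dependences(program):
    """Return (consumer, register, producer) triples in program order."""
    last_writer = {}  # register -> label of the latest instruction writing it
    deps = []
    for label, _opcode, dest, sources in program:
        # Read sources before recording the destination, so an instruction
        # such as SUBI R5,R5 does not depend on itself.
        for reg in sources:
            if reg in last_writer:
                deps.append((label, reg, last_writer[reg]))
        if dest is not None:
            last_writer[dest] = label
    return deps


for consumer, reg, producer in raw_dependences(PROGRAM):
    print(f"{consumer}: RAW on {reg}, produced by {producer}")
```

This only lists dependences. Whether a given dependence stalls the pipeline, and which forwarding path resolves it, still depends on the distance between the two instructions and on the pipeline's forwarding hardware.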
Loop: L.D F0, 1024(R1)
      MUL.D F4, F2, F
      S.D F4, 1024(R2)
      DIV.D F5, F3, F
      ADD.D F5, F5, F
      S.D F5, 1024(R1)
      DSUBIU R2, R2, #
      DSUBIU R1, R1, #
      BNEZ R1, Loop
The second column of the following table gives the number of cycles each instruction type (first column) spends in its functional unit. The only exception is load/store instructions, which spend one clock cycle in the integer ALU and three clock cycles in the load/store (memory) unit. The third column gives the number of functional units available for each instruction type.
Instruction | Cycles | Number of units
Integer ALU | 1 | 2
Load/Store | 3 | 2
ADD.D | 3 | 1
MUL.D | 3 | 1
DIV.D | 10 | 1
Assume that the reservation stations and the reorder buffer both have infinite size. The integer ALUs are used for effective address calculation, ALU operations, and branch condition evaluation. Assume that at most two results can be written to the CDB (common data bus) in one clock cycle. Using Tomasulo's algorithm, complete the following two tables for the first two iterations of the loop, showing when each instruction issues, begins execution, accesses memory, and writes its result to the CDB, for the following two scenarios.
a) Use a two-issue MIPS pipeline without speculation. Assume that branches are issued alone (single issue for that time step) and that branch prediction is perfect. To answer this question, use the MIPS pipeline structure in Figure 2.9 on page 94 of the book, modified according to the table above.
b) Use a two-issue MIPS pipeline with speculation. You also need to specify when each instruction commits. Assume that up to two instructions of any type can commit per cycle, that branches are issued alone (single issue for that time step), and that branch prediction is perfect. Note that stores spend 3 cycles in the commit stage, because their memory access occurs during commit. To answer this question, use the MIPS pipeline structure in Figure 2.14 on page 107 of the book, modified according to the table above.
c) In part b, stores access the memory only during the commit stage. Why is this important for Tomasulo’s algorithm with speculation?
Each wrong cycle calculation violating any structural/data dependences, -
Any instruction issued in a wrong cycle, -2.
In part a), any instruction executed before the execution of BNEZ in the first loop, -
In part b), any instruction committed in a wrong cycle, -
c) This ensures that memory is not updated until the store instruction is no longer speculative. (Note that answers like "to keep loads/stores in order" or "to keep in-order completion of instructions" are not accurate enough. The point is that as long as an instruction is still speculative, we must be able to roll it back.)
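The reasoning in c) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the textbook's hardware: the class `SpeculativeCore` and its method names are hypothetical, and only stores are modelled. A store that has executed is merely buffered; memory changes at commit, so squashing speculative work needs no undo of memory.

```python
class SpeculativeCore:
    """Toy model: stores wait in the reorder buffer and touch memory only at commit."""

    def __init__(self):
        self.memory = {}  # architectural memory: address -> value
        self.rob = []     # pending stores in program order: (address, value)

    def execute_store(self, address, value):
        # Executing a store only records it; memory is left unchanged.
        self.rob.append((address, value))

    def commit_oldest(self):
        # The oldest store is no longer speculative, so it may update memory.
        address, value = self.rob.pop(0)
        self.memory[address] = value

    def squash(self):
        # Misspeculation: drop every pending store. Memory was never
        # modified by them, so there is nothing to roll back.
        self.rob.clear()


core = SpeculativeCore()
core.execute_store(1024, 7)   # speculative store, buffered only
core.squash()                 # speculation turned out to be wrong
print(core.memory)            # memory is still empty

core.execute_store(1024, 9)
core.commit_oldest()          # store becomes architecturally visible
print(core.memory)
```

If stores wrote memory when they executed instead, `squash` would have to restore the old contents, which is the situation the commit-time write avoids.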
a) Draw the state transition diagram of a 2-bit predictor that can “E-perfectly” predict the following branch sequence:
<NT, NT, NT, T>, <NT, NT, NT, T>, ….. (repeat the 4-element sequence indefinitely).
Recall that any 2-bit predictor has four possible states. The initial state of your predictor can be any one of the four states. By "E-perfectly", we mean that your predictor must always be correct after a finite number of wrong predictions, regardless of its initial state. (The correct solution is not unique.)
b) Is it possible to build a 2-bit predictor that can “E-perfectly” predict the following branch sequence?
<NT, NT, NT, NT, T>, <NT, NT, NT, NT, T>, ….. (repeat the 5-element sequence).
If possible, draw its state transition diagram. If not, why?
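The "E-perfectly" requirement can be checked by simulation. The sketch below is one possible four-state design for the 4-element sequence and is an assumption made for illustration, not necessarily the diagram in the solution: states 0 to 2 predict NT, state 3 predicts T, an NT outcome advances towards state 3, and a T outcome always returns to state 0. The table names and `mispredictions` are hypothetical.

```python
# Branch outcomes: True = taken (T), False = not taken (NT).
PATTERN = [False, False, False, True]  # <NT, NT, NT, T>, repeated

# Hypothetical predictor tables, indexed by state 0..3.
PREDICTS_TAKEN = [False, False, False, True]
NEXT_ON_NT = [1, 2, 3, 3]  # an NT outcome advances towards state 3
NEXT_ON_T = [0, 0, 0, 0]   # a T outcome resynchronises to state 0


def mispredictions(start_state, periods=10):
    """Return the indices of mispredicted branches over `periods` repeats."""
    state = start_state
    misses = []
    for i in range(periods * len(PATTERN)):
        taken = PATTERN[i % len(PATTERN)]
        if PREDICTS_TAKEN[state] != taken:
            misses.append(i)
        state = NEXT_ON_T[state] if taken else NEXT_ON_NT[state]
    return misses


for start in range(4):
    print(start, mispredictions(start))
```

For this design, every starting state mispredicts only within the first period: the first T outcome forces the predictor into state 0, after which it stays in step with the sequence.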
Solution: a)
[Figure: state transition diagram of a 2-bit predictor with four states, NT (00), NT (01), NT (10), and T (11), connected by edges labeled T and NT.]
b) No, it is not possible. Such a predictor cannot have more than 4 states. Assume that after a finite number of wrong predictions it predicts the sequence NT, NT, NT, NT, T correctly. Then, after its correct prediction of the last NT, its state must go back to some previous state, in which it predicts that the branch will not be taken, because there are only 4 different states and all 4 previous predictions were correct. Therefore its next prediction will not be correct.
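The counting argument in b) can also be confirmed by brute force. The sketch below is an illustrative check, not part of the original solution: it enumerates every assignment of predictions to the four states and every NT-transition table, and counts the cases in which some starting state predicts a run of NT outcomes followed by one T without a miss. Transitions on T are irrelevant within a single period, so they are not enumerated. The function name `count_one_period_solutions` is hypothetical.

```python
from itertools import product

N_STATES = 4  # a 2-bit predictor has exactly four states


def count_one_period_solutions(n_not_taken):
    """Count (predictions, NT transitions, start state) combinations that
    predict n_not_taken NT outcomes followed by one T with no miss."""
    count = 0
    for predicts_taken in product([False, True], repeat=N_STATES):
        for next_on_nt in product(range(N_STATES), repeat=N_STATES):
            for start in range(N_STATES):
                state = start
                ok = True
                for _ in range(n_not_taken):
                    if predicts_taken[state]:  # would wrongly predict T
                        ok = False
                        break
                    state = next_on_nt[state]
                # After the NT run, the current state must predict T.
                if ok and predicts_taken[state]:
                    count += 1
    return count


print(count_one_period_solutions(3) > 0)   # 4-element sequence: designs exist
print(count_one_period_solutions(4) == 0)  # 5-element sequence: none exist
```

Predicting one full period correctly is a necessary condition for being "E-perfect", so finding no candidate for four NT outcomes followed by T rules out any four-state predictor for the 5-element sequence.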
To sustain an IPC of 8, the ROB needs 200 * 8 = 1600 entries.
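The arithmetic can be written as a small helper. This is a sketch under an assumption: the problem statement is not included here, so the factor 200 is taken to be the number of cycles an instruction may stay in the ROB before it commits. The function names and the `issue_width` parameter are hypothetical.

```python
def rob_entries_needed(cycles_in_flight, target_ipc):
    """Entries required so the ROB never fills while target_ipc instructions
    enter per cycle and each stays for cycles_in_flight cycles (assumed)."""
    return cycles_in_flight * target_ipc


def sustainable_ipc(rob_entries, cycles_in_flight, issue_width):
    """Upper bound on IPC for a given ROB size under the same assumption."""
    return min(issue_width, rob_entries / cycles_in_flight)


print(rob_entries_needed(200, 8))    # 1600
print(sustainable_ipc(800, 200, 8))  # 4.0
```

Under this model, halving the ROB to 800 entries halves the sustainable IPC, which is how the ROB size limits performance.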