






The CS433ug midterm exam for computer organization, with solutions to questions on the modified and original MIPS pipelines, Tomasulo's algorithm, and software ILP.
Typology: Exams







Name:
Alias:
Instructions:
Problem No.   Max Points   Points Scored
1             40
2             40
3             40
Total         120
I MIPS Pipeline [40 points]
A. Modified MIPS Pipeline [20 points] We change the MIPS pipeline to have the following structure.
IF      instruction fetch
ID      register read and instruction decode
ALU1    first stage of execution. Branch condition is completed in this cycle, as is the address of the branch target; ALU operands are needed at the beginning of this cycle, as is the base register for loads and stores.
ALU2    second stage of execution; ALU results available in this cycle; effective address for loads and stores is available
MEM1    first cycle of data memory access; only address is needed.
MEM2    second cycle of data memory access; store data needed at the beginning of the cycle, load data available in this cycle.
WB      write back of results
Assume the register file operates on split cycles as discussed in Appendix A, so as to minimize bypassing requirements.

a) How many branch delay slots are there? Why?

Solution: 2 delay slots. The branch target is not available until the end of the ALU1 stage (which is 2 cycles after the fetch).

b) Assuming all possible forwarding is supported, how many stall cycles are there in the following case? (Draw a pipeline picture.)
ADD R1 R2 R
ADD R7 R1 R
Solution: 1 stall.

IF    ID    ALU1  ALU2  MEM1  MEM2  WB
      IF    ID    Stall ALU1  ALU2  MEM1  MEM2  WB

c) Repeat b) for:
LOAD R1, 10(R5)
ADD R7 R1 R
Solution: 3 stalls.

IF    ID    ALU1  ALU2  MEM1  MEM2  WB
      IF    ID    Stall Stall Stall ALU1  ALU2  MEM1  MEM2  WB
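The answers to a), b) and c) all follow from the stage in which a value is produced and the stage that needs it. The Python sketch below is one way to check that arithmetic; it is not part of the exam. The function names are hypothetical, and it assumes full forwarding, meaning a value produced during one stage can feed any stage that begins in the next cycle.

```python
# Stage order of the modified pipeline, as listed in the problem statement.
STAGES = ["IF", "ID", "ALU1", "ALU2", "MEM1", "MEM2", "WB"]


def stalls_between(produced_in, needed_at_start_of, distance=1):
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    Assumption (not from the exam text): with full forwarding, a value produced
    during stage P is usable by a stage that starts in the following cycle.
    """
    p = STAGES.index(produced_in)          # cycle in which the producer creates the value
    c = STAGES.index(needed_at_start_of)   # stage of the consumer that needs the value
    # Counting the producer's IF as cycle 0, the unstalled consumer reaches
    # stage c in cycle c + distance; it must not get there before cycle p + 1.
    return max(0, (p + 1) - (c + distance))


def branch_delay_slots(resolved_in):
    """Instructions fetched after the branch before its target is known."""
    return STAGES.index(resolved_in)


print(stalls_between("ALU2", "ALU1"))   # ADD followed by a dependent ADD
print(stalls_between("MEM2", "ALU1"))   # LOAD followed by a dependent ADD
print(branch_delay_slots("ALU1"))       # branch resolved at the end of ALU1
```

Under these assumptions the three calls reproduce the 1 stall, 3 stalls and 2 delay slots given in the solutions.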
II Tomasulo’s Algorithm and Speculative Execution [40 points]
Consider the following code fragment

LOOP: L.D F0, 0(R1)
      L.D F2, 8(R1)
      MUL.D F4, F2, F
      ADD.D F4, F4, F
      S.D F4, 0(R2)
      DADDI R2, R2, #
      DSUBI R2, R1, #
      BNEZ R1, LOOP

running on a system with the following specifications, noting that the assumptions are the same as in the homework except for those in bold typeface.
Functional Unit   Cycles in EX   # Functional Units
Integer           1              1
FP Add            3              1
FP Multiply       8              1
Table 1: Functional Unit Specification
A. Tomasulo’s Algorithm [20 points]
Complete Table 2 using Tomasulo's algorithm for the given code fragment with no hardware speculation for branches. Include:
Instruction          Funct. Unit   IS   EX      WR   Comments (if appropriate)
L.D F0, 0(R1)        Integer       1    2       3
L.D F2, 8(R1)        Integer       2    3       4
MUL.D F4, F2, F0     FP Mul        3    5-12    13   RAW F
ADD.D F4, F4, F0     FP Add        4    14-16   17   RAW F
S.D F4, 0(R2)        Integer       5    18      19   (-) RAW F
DADDI R2, R2, #8     Integer       6    7       8
DSUBI R2, R1, #16    Integer       7    8       9
BNEZ R1, LOOP        Integer       8    9       10   (-)
L.D F0, 0(R1)        Integer       9    10      11
Table 2: Execution profile using Tomasulo’s Algorithm
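The EX and WR entries for the dependent chain in Table 2 can be re-derived from the issue cycles and the latencies in Table 1. The Python sketch below is a simplified model, not a full Tomasulo simulator. It assumes that an instruction starts EX in the cycle after both its issue and the last WR of its producers, and that WR is the cycle after EX ends. It ignores contention for functional units and the common data bus, which does not arise in these rows. The labels and the `schedule` helper are hypothetical names.

```python
# Cycles in EX per functional unit, taken from Table 1.
LATENCY = {"Integer": 1, "FP Add": 3, "FP Mul": 8}

# (label, functional unit, issue cycle, producers whose results it waits for).
# Issue cycles and operands follow the rows of Table 2.
PROGRAM = [
    ("L.D F0",   "Integer", 1, []),
    ("L.D F2",   "Integer", 2, []),
    ("MUL.D F4", "FP Mul",  3, ["L.D F0", "L.D F2"]),
    ("ADD.D F4", "FP Add",  4, ["MUL.D F4", "L.D F0"]),
    ("S.D F4",   "Integer", 5, ["ADD.D F4"]),
    ("DADDI R2", "Integer", 6, []),
]


def schedule(program):
    """Return {label: (issue, ex_start, ex_end, write)} under the stated assumptions."""
    write_cycle = {}
    rows = {}
    for label, unit, issue, producers in program:
        # Latest cycle in which any needed operand is written back (0 if none).
        ready = max([write_cycle[p] for p in producers], default=0)
        ex_start = max(issue, ready) + 1
        ex_end = ex_start + LATENCY[unit] - 1
        write_cycle[label] = ex_end + 1
        rows[label] = (issue, ex_start, ex_end, ex_end + 1)
    return rows


for label, row in schedule(PROGRAM).items():
    print(label, row)
```

For MUL.D this gives EX in cycles 5 to 12 and WR in 13, because the second load writes F2 in cycle 4; ADD.D then waits for that result and runs in cycles 14 to 16.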
III Software ILP [40 points]
Consider the following machine.
Now consider this code fragment:
loop: L.D F0, 0(R1)
      L.D F2, 8(R1)
      ADD.D F4, F2, F
      MUL.D F6, F0, F
      ADD.D F6, F6, F
      S.D F6, 0(R2)
      DADDUI R1, R1, #
      DADDUI R2, R2, #
      DSUBUI R3, R3, #
      BNEZ R3, loop
A. Loop Unrolling [30 points]
a. [15 points] Reschedule the code to minimize stalls. How many stalls are there? Please show the resulting code.

Solution:

loop: L.D F0, 0(R1)
      L.D F2, 8(R1)
      MUL.D F6, F0, F
      ADD.D F4, F2, F
      DADDUI R1, R1, #
      DADDUI R2, R2, #
      DSUBUI R3, R3, #
      3 STALLS
      ADD.D F6, F6, F
      BNEZ R3, loop
      S.D F6, -8(R2)

Answer: 3 stalls
B. Short Answer [10 points]
a. [5 points] How does loop unrolling improve performance? What are 2 disadvantages of loop unrolling?

Solution: It increases the ILP available in each iteration and leaves fewer overhead instructions. Disadvantages: code size increases, and register pressure increases.

b. [5 points] What are 2 differences between dynamically scheduled superscalar and VLIW processors?

Solution:
Superscalar: issues multiple arbitrary instructions; instructions are scheduled dynamically; if an instruction cannot be issued, it is not issued.
VLIW: issues a fixed number of instructions of different types; instructions are packaged together at compile time; if parallel instructions cannot be found, a NOP is put in the empty slot.
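To put rough numbers on the loop unrolling answer in part a, the Python sketch below counts instructions and floating-point registers for the loop in Section III at different unroll factors. It is an illustration, not part of the exam solution. It assumes the four overhead instructions (two DADDUI, one DSUBUI, one BNEZ) are merged into a single set per unrolled pass, and that each unrolled copy gets its own copies of F0, F2, F4 and F6. The `unrolled_cost` helper is a hypothetical name.

```python
WORK_PER_ITERATION = 6      # L.D, L.D, ADD.D, MUL.D, ADD.D, S.D
OVERHEAD_PER_LOOP = 4       # two DADDUI, one DSUBUI, one BNEZ
FP_REGS_PER_ITERATION = 4   # F0, F2, F4, F6 (assumed renamed in every copy)


def unrolled_cost(factor):
    """Instruction and register counts for one pass of the loop unrolled `factor` times."""
    total = WORK_PER_ITERATION * factor + OVERHEAD_PER_LOOP
    return {
        "instructions_per_pass": total,                          # code size grows
        "instructions_per_original_iteration": total / factor,   # overhead is amortized
        "fp_registers": FP_REGS_PER_ITERATION * factor,          # register pressure grows
    }


for factor in (1, 2, 4):
    print(factor, unrolled_cost(factor))
```

Under these assumptions, unrolling four times cuts the cost from 10 to 7 instructions per original iteration, while the loop body grows from 10 to 28 instructions and needs 16 floating-point registers instead of 4.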