






A midterm exam for CS433, a computer science class. The exam covers pipelining, control hazards, exceptions, Tomasulo's algorithm, and speculative execution. Students answer questions on instruction execution, functional-unit usage, and the cycles each instruction occupies in the IS, EX, WR, and CMT stages.







Name:
Alias:
Instructions:
Problem No.   Max. Points   Points Scored
1             50
2             60
3             40
Total         150
I Pipelining [50 points]
A. Control Hazards [25 points]
Suppose we have a MIPS processor with a single branch delay slot. Consider codes (a) through (c):
(a)
       ADD  R1,R2,R3
       NOP
       BEQZ R4, label
       [ delay slot ]
       ADD  R10,R10,R10
       JMP  end
       NOP
label: ADD  R6,R6,R6
end:

(b)
       ADD  R1,R2,R3
       NOP
       BEQZ R1, label
       [ delay slot ]
       ADD  R7,R9,R10
       JMP  end
       NOP
label: ADD  R7,R11,R12
end:

(c)
       ADD  R1,R2,R3
       NOP
       BEQZ R1, label
       [ delay slot ]
       ADD  R13,R9,R
       JMP  end
       NOP
label: ADD  R14,R9,R
end:
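a. [2 points] What is the best instruction to put in the delay slot in code (a)? Explain why.
ADD R1,R2,R3. It comes from before the branch and does not affect the branch condition (BEQZ tests R4, which the ADD does not write), so it does useful work whichever way the branch goes.
b. [2 points] What is the best instruction to put in the delay slot in code (b)? Explain why.
NOP. No suitable instruction can be found; in particular, ADD R1,R2,R3 cannot be moved into the slot because BEQZ reads R1.
c. [7 points] What is the best instruction to put in the delay slot in code (c) if R2+R3=0 60% of the time? Show the resulting code. In this case, what are the instructions executed when R2+R3=0, and what are the instructions executed when R2+R3!=0?
Since the branch is taken 60% of the time, fill the slot from the target: copy ADD R14,R9,R into the delay slot and retarget the branch to end:
       ADD  R1,R2,R3
       NOP
       BEQZ R1, end
       ADD  R14,R9,R      (delay slot)
       ADD  R13,R9,R
       JMP  end
       NOP
label: ADD  R14,R9,R
end:
Case 1 (R2+R3=0, branch taken): instructions 1, 2, 3, 4.
Case 2 (R2+R3!=0, branch not taken): instructions 1, 2, 3, 4, 5, 6, 7.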
d. [7 points] Repeat question c for the case where R2+R3=0 only 40% of the time.
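One way to compare the delay-slot choices in questions c and d is to count the instructions executed on each path of code (c) and weight them by the branch probability. The Python sketch below is a hedged illustration, not the exam's official method. It assumes one cycle per instruction with no stalls beyond the listed NOPs, that R13 and R14 are dead on the path that does not use them (so a hoisted copy is harmless), and that the branch is retargeted past the copied instruction, as in the answer to c. The path lengths are hand-counted from code (c); the names `PATHS`, `expected_length`, and `best_choice` are hypothetical.

```python
# Hand-counted path lengths for code (c), one entry per delay-slot choice.
# Assumption: one cycle per instruction, no stalls beyond the listed NOPs.
PATHS = {
    # choice: (instructions executed if taken, if not taken)
    "NOP": (5, 7),
    "from target (ADD R14)": (4, 7),
    "from fall-through (ADD R13)": (5, 6),
}


def expected_length(taken: int, not_taken: int, p_taken: float) -> float:
    """Average instructions executed, weighting each path by its probability."""
    return p_taken * taken + (1.0 - p_taken) * not_taken


def best_choice(p_taken: float) -> str:
    """Delay-slot choice with the smallest expected path length."""
    return min(PATHS, key=lambda choice: expected_length(*PATHS[choice], p_taken))


if __name__ == "__main__":
    for p in (0.6, 0.4):  # question c, then question d
        for choice, (taken, not_taken) in PATHS.items():
            print(f"P(taken)={p}: {choice:<28} {expected_length(taken, not_taken, p):.1f}")
        print(f"  best: {best_choice(p)}")
```

Under these assumptions, filling from the target averages 5.2 instructions at 60% taken versus 5.4 from the fall-through, and the ranking flips at 40% taken (5.8 versus 5.6).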
B. Exceptions [25 points]
a. [5 points] What does it mean for a pipeline to support precise exceptions?
b. [4 points] How does a pipeline support precise exceptions?
c. [4 points] List one good thing and one bad thing about precise exceptions.
d. [4 points] What are statically scheduled and dynamically scheduled machines?
e. [4 points] What are superscalar and VLIW processors? Are they statically or dynamically scheduled machines?
f. [4 points] How is the reorder buffer related to exception handling?
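To make question f concrete, here is a toy reorder buffer in Python. It is a hedged sketch, not any particular processor's design: results may complete in any order, but the register file is updated only when an entry reaches the head, and a faulting entry at the head is discarded together with every younger entry. The class and field names (`ReorderBuffer`, `RobEntry`, `fault`) and the instruction strings are hypothetical; real ROBs also handle stores, branch mispredictions, and the restart PC for the handler, all omitted here.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    name: str                    # instruction text, used only for tracing
    dest: str                    # architectural destination register
    value: Optional[int] = None  # result, filled in at write-back
    done: bool = False           # result (or fault) is known
    fault: bool = False          # instruction raised an exception


class ReorderBuffer:
    """Toy ROB: results complete out of order, commit is in order."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.entries = deque()  # oldest (head) on the left
        self.regs = {}          # architectural register file

    def issue(self, name: str, dest: str) -> Optional[RobEntry]:
        """Allocate an entry at the tail, or return None if the ROB is full."""
        if len(self.entries) == self.size:
            return None
        entry = RobEntry(name, dest)
        self.entries.append(entry)
        return entry

    def commit(self) -> Optional[str]:
        """Retire at most one instruction (the head) per call, i.e. per cycle."""
        if not self.entries or not self.entries[0].done:
            return None
        head = self.entries.popleft()
        if head.fault:
            # Precise exception: all older instructions have already committed,
            # while the faulting one and everything younger are discarded
            # before they touch self.regs.
            self.entries.clear()
            return f"exception at {head.name}"
        self.regs[head.dest] = head.value
        return f"commit {head.name}"


if __name__ == "__main__":
    rob = ReorderBuffer(4)
    add1 = rob.issue("ADD R1", "R1")
    ld2 = rob.issue("LD R2", "R2")
    add3 = rob.issue("ADD R3", "R3")
    add3.value, add3.done = 7, True   # youngest finishes first
    add1.value, add1.done = 5, True
    ld2.done, ld2.fault = True, True  # the load faults
    print(rob.commit())  # commit ADD R1
    print(rob.commit())  # exception at LD R2
    print(rob.regs)      # {'R1': 5}: R3 was never written
```

The key design choice is that only `commit` writes `regs`, so when the exception is reported the register file reflects exactly the instructions before the faulting one.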
II Tomasulo’s Algorithm and Speculative Execution [60 points]
Consider the following code fragment:

      DADDI  R1, R0, #2
LOOP: L.D    F0, 0(R2)
      MUL.D  F2, F0, F2
      DADDI  R2, R2, #32
      S.D    F2, 0(R2)
      ADD.D  F4, F2, F4
      DSUBI  R1, R1, #1
      BNEZ   R1, LOOP

running on a system with the specifications below. The assumptions are the same as in the homework, except for those in bold typeface.
Functional Unit   Cycles in EX   # Functional Units
Integer           1              1
FP add            3              1
FP multiply       8              1
Table 1: Functional Unit Specification
B. Tomasulo’s Algorithm with Speculative Execution [30 points]
Now assume the architecture above, except with hardware speculation. Assume that the reorder buffer has four entries, named 0, 1, 2, and 3. Only one ROB entry can commit per cycle. Complete Table 3, including:
Instruction          ROB  IS  EX     WR  CMT  Comments (if appropriate)
DADDI R1, R0, #2     0    1   2      3   4
L.D F0, 0(R2)        1    2   3      4   5
MUL.D F2, F0, F2     2    3   5-12   13  14   RAW (L.D F0)
DADDI R2, R2, #32    3    4   5      6   15   in-order CMT
S.D F2, 0(R2)        0    5   14     -   16   RAW (MUL.D F2), in-order CMT
ADD.D F4, F2, F4     1    6   14-16  17  18   RAW (MUL.D F2)
DSUBI R1, R1, #1     2    15  16-17  18  19   No ROB until 15, CDB conflict
BNEZ R1, LOOP        3    16  19     -   20   RAW (DSUBI R1)
L.D F0, 0(R2)        0    17  18     19  21   in-order CMT
MUL.D F2, F0, F2     1    19  20-27  28  29   No ROB until 19
DADDI R2, R2, #32    2    20  21     22  30   in-order CMT
Table 3: Tomasulo’s Algorithm with Speculative Execution
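The timing rules behind Table 3 can be checked mechanically. The Python sketch below encodes the table's rows and tests a set of rules that are assumed here rather than quoted from the exam (they are inferred from the table and its comments): ROB entries are handed out round-robin; an instruction issues only after the previous occupant of its entry has committed; issue is in order, one per cycle; EX starts after issue and lasts at least the Table 1 latency; a single CDB allows at most one write per cycle; and commits happen in order, one per cycle, after the result is available (after WR, or after EX for S.D and BNEZ, which write no register). `check_schedule` and the row layout are hypothetical.

```python
# Rows of Table 3: (instruction, ROB entry, IS, EX start, EX end, WR, CMT).
# WR is None for S.D and BNEZ, which broadcast no result on the CDB.
ROWS = [
    ("DADDI R1, R0, #2", 0, 1, 2, 2, 3, 4),
    ("L.D F0, 0(R2)", 1, 2, 3, 3, 4, 5),
    ("MUL.D F2, F0, F2", 2, 3, 5, 12, 13, 14),
    ("DADDI R2, R2, #32", 3, 4, 5, 5, 6, 15),
    ("S.D F2, 0(R2)", 0, 5, 14, 14, None, 16),
    ("ADD.D F4, F2, F4", 1, 6, 14, 16, 17, 18),
    ("DSUBI R1, R1, #1", 2, 15, 16, 17, 18, 19),
    ("BNEZ R1, LOOP", 3, 16, 19, 19, None, 20),
    ("L.D F0, 0(R2)", 0, 17, 18, 18, 19, 21),
    ("MUL.D F2, F0, F2", 1, 19, 20, 27, 28, 29),
    ("DADDI R2, R2, #32", 2, 20, 21, 21, 22, 30),
]
ROB_SIZE = 4
EX_LATENCY = {"MUL.D": 8, "ADD.D": 3}  # Table 1; everything else is 1 cycle


def check_schedule(rows, rob_size):
    """Return a list of rule violations (empty if the schedule is consistent)."""
    problems = []
    writes = set()  # cycles in which the single CDB is already in use
    for i, (name, rob, issue, ex_start, ex_end, wr, cmt) in enumerate(rows):
        latency = EX_LATENCY.get(name.split()[0], 1)
        if rob != i % rob_size:
            problems.append(f"{name}: ROB entry not allocated round-robin")
        if i >= rob_size and issue <= rows[i - rob_size][6]:
            problems.append(f"{name}: issued before ROB entry {rob} was freed")
        if i > 0 and issue <= rows[i - 1][2]:
            problems.append(f"{name}: issue not in order / more than one per cycle")
        if ex_start <= issue or ex_end - ex_start + 1 < latency:
            problems.append(f"{name}: EX starts too early or is too short")
        if wr is not None:
            if wr <= ex_end:
                problems.append(f"{name}: WR before EX finishes")
            if wr in writes:
                problems.append(f"{name}: CDB conflict in cycle {wr}")
            writes.add(wr)
        ready = wr if wr is not None else ex_end
        if cmt <= ready:
            problems.append(f"{name}: commits before its result is ready")
        if i > 0 and cmt <= rows[i - 1][6]:
            problems.append(f"{name}: commit not in order / more than one per cycle")
    return problems


if __name__ == "__main__":
    print(check_schedule(ROWS, ROB_SIZE) or "Table 3 satisfies all rules")
```

For example, moving the DSUBI issue to cycle 14 is flagged because ROB entry 2 is still held by the first MUL.D until it commits in cycle 14, which is the "No ROB until 15" comment in the table.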
III Software ILP [40 points]
Consider the following machine.
Now consider this code fragment:
loop: L.D    F2, 0(R1)
      L.D    F4, 0(R2)
      DADDUI R1, R1, #
      DADDUI R2, R2, #
      ADD.D  F6, F2, F2
      MUL.D  F8, F4, F4
      ADD.D  F10, F6, F8
      S.D    F10, 0(R3)
      DADDUI R3, R3, #
      DSUBUI R4, R4, #
      BNEZ   R4, loop
B. Software Pipelining [12 points]
Software pipeline the loop and reorder the instructions to reduce stalls. Don’t write the startup or cleanup code.
loop: S.D    F10, -24(R3)   //x-
      ADD.D  F10, F6, F8    //x-
      ADD.D  F6, F2, F2     //x-
      MUL.D  F8, F4, F4     //x-
      L.D    F2, 0(R1)      //x
      L.D    F4, 0(R2)      //x
      DADDUI R1, R1, #
      DADDUI R3, R3, #
      DADDUI R2, R2, #
      BNEZ   R4, loop
      DSUBUI R4, R4, #
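A quick way to check that the software-pipelined loop computes the same result as the original is to simulate both. The Python sketch below models each array as a list and each FP register as a variable; it is an illustration, not part of the exam answer. Because the stage comments in the answer are truncated, the stage offsets used here (the store handles element i-3, ADD.D F10 element i-2, ADD.D F6 and MUL.D F8 element i-1, the loads element i) are inferred from the register dependences; they agree with the -24(R3) displacement if R3 advances 8 bytes per element. The range checks stand in for the startup and cleanup code the question says not to write, and the function names are hypothetical.

```python
def original_loop(a, b):
    """c[i] = (a[i] + a[i]) + b[i] * b[i], one element per iteration."""
    c = [0.0] * len(a)
    for i in range(len(a)):
        f2, f4 = a[i], b[i]   # L.D F2, 0(R1) / L.D F4, 0(R2)
        f6 = f2 + f2          # ADD.D F6, F2, F2
        f8 = f4 * f4          # MUL.D F8, F4, F4
        f10 = f6 + f8         # ADD.D F10, F6, F8
        c[i] = f10            # S.D F10, 0(R3)
    return c


def software_pipelined(a, b):
    """Same computation, with the body ordered like the software-pipelined loop.

    Iteration i stores element i-3, finishes element i-2, starts element i-1,
    and loads element i. Each stage reads a register before the stage below it
    overwrites it, exactly as in the reordered assembly.
    """
    n = len(a)
    c = [0.0] * n
    f2 = f4 = f6 = f8 = f10 = 0.0
    for i in range(n + 3):
        if 0 <= i - 3 < n:
            c[i - 3] = f10        # S.D F10, -24(R3)
        if 0 <= i - 2 < n:
            f10 = f6 + f8         # ADD.D F10, F6, F8
        if 0 <= i - 1 < n:
            f6 = f2 + f2          # ADD.D F6, F2, F2
            f8 = f4 * f4          # MUL.D F8, F4, F4
        if i < n:
            f2, f4 = a[i], b[i]   # L.D F2, 0(R1) / L.D F4, 0(R2)
    return c


if __name__ == "__main__":
    a, b = [1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 1.5, 2.0]
    print(original_loop(a, b))       # [2.25, 5.0, 8.25, 12.0]
    print(software_pipelined(a, b))  # [2.25, 5.0, 8.25, 12.0]
```

Swapping any two stages in `software_pipelined` (for example, doing the loads before ADD.D F6) makes the lists differ, which shows why the reordered body must consume each register before it is refilled.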
C. Short Answer [8 points]
a. [4 points] Name one advantage and one disadvantage of loop unrolling.
Advantages: more ILP, fewer overhead instructions.
Disadvantages: code size increases, register pressure increases, and the problem becomes worse in multiple-issue processors.
b. [4 points] Name one advantage of using VLIW and one problem with the original VLIW model.
Advantages: keeps more FUs busy by issuing multiple instructions, simpler hardware.
Problems: code size increase, limitations of lockstep operation, binary compatibility, finding enough parallelism.
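The loop-unrolling trade-off from question C.a can be put into rough numbers for the Part III loop. The Python sketch below is a back-of-the-envelope model under stated assumptions, not a result from the exam: each element needs six useful instructions (two L.D, ADD.D, MUL.D, ADD.D, S.D); each loop iteration keeps five overhead instructions (three pointer DADDUIs, one DSUBUI, one BNEZ) regardless of the unroll factor, because extra copies use adjusted displacements; the trip count is a multiple of the unroll factor; and each copy gets its own five FP registers (F2, F4, F6, F8, F10 renamed) so copies can be interleaved. The name `unroll_stats` is hypothetical.

```python
USEFUL_PER_ELEMENT = 6  # L.D, L.D, ADD.D, MUL.D, ADD.D, S.D
OVERHEAD_PER_ITER = 5   # DADDUI x3 (pointers), DSUBUI (counter), BNEZ
FP_REGS_PER_COPY = 5    # F2, F4, F6, F8, F10, renamed in every copy


def unroll_stats(k: int) -> dict:
    """Static size, dynamic instructions per element, and FP registers for unroll factor k."""
    static = USEFUL_PER_ELEMENT * k + OVERHEAD_PER_ITER
    return {
        "static_instructions": static,           # grows with k: code size
        "instructions_per_element": static / k,  # falls toward 6: less overhead
        "fp_registers": FP_REGS_PER_COPY * k,    # grows with k: register pressure
    }


if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, unroll_stats(k))
```

In this model, unrolling by 4 cuts the cost from 11 to 7.25 instructions per element, while unrolling by 8 already asks for 40 FP registers, more than the 32 that MIPS64 provides.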