




Material Type: Exam; Professor: Torrellas; Class: Computer System Organization; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2004;
Typology: Exams





Name:
Instructions:
Problem No.   Max. Points   Points Scored
1             25
2             25
3             25
Total         100
Consider code that executes branches Br1 and Br2 with the following order and outcomes:
Outcome T T T NT T T T T NT NT
a) [7 points] Assume that 1-bit branch predictors are used. When the processor starts executing, both predictors hold the value NT (Not Taken). What is the number of correct predictions? Use the following table to record the predictions for each branch.
Step   Br1 Predict.   Br1 Action   Br2 Predict.   Br2 Action
1      NT             T            NT             T
2      T              T            T              NT
3      T              T
4      T              NT
5      NT             T
6      T              T
7      T              T
8      T              NT
Number of correct predictions (Br1): 4
Number of correct predictions (Br2): 0
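The table in part (a) can be checked with a short simulation. This is a minimal sketch in Python, not part of the original exam: the function name `simulate_one_bit` and the constant names are hypothetical, and the outcome lists are copied from the Br1 Action and Br2 Action columns above. It assumes each branch has its own predictor, so the interleaving of Br1 and Br2 does not matter.

```python
def simulate_one_bit(outcomes, initial="NT"):
    """Simulate a 1-bit predictor: it always predicts the last outcome seen."""
    state = initial
    predictions = []
    correct = 0
    for actual in outcomes:
        predictions.append(state)   # prediction made before the branch resolves
        if state == actual:
            correct += 1
        state = actual              # the single bit simply records the last outcome
    return predictions, correct


# Outcomes copied from the "Action" columns of the table (hypothetical names).
BR1_ACTIONS = ["T", "T", "T", "NT", "T", "T", "T", "NT"]
BR2_ACTIONS = ["T", "NT"]

br1_predictions, br1_correct = simulate_one_bit(BR1_ACTIONS)
br2_predictions, br2_correct = simulate_one_bit(BR2_ACTIONS)

print("Br1:", br1_predictions, "correct =", br1_correct)
print("Br2:", br2_predictions, "correct =", br2_correct)
```

Under these assumptions the simulation reproduces the counts in the table: 4 correct predictions for Br1 and 0 for Br2.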
b) [7 points] Now assume that 4-bit saturating counters are used. When the processor starts executing, both counters hold the value 7. What is the number of correct predictions? Use the following table to record the predictions for each branch.
Step   Br1 Predict.   Br1 Action   Br2 Predict.   Br2 Action
1      NT             T            NT             T
2      T              T            T              NT
3      T              T
4      T              NT
5      T              T
6      T              T
7      T              T
8      T              NT
Number of correct predictions (Br1): 5
Number of correct predictions (Br2): 0
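The counter-based table in part (b) can be checked the same way. The sketch below is an illustration, not part of the original exam. It assumes the usual convention that a 4-bit counter ranges over 0 to 15, predicts Taken when its value is 8 or more, increments on a taken branch, and decrements on a not-taken branch. The exam text does not state this threshold; it is inferred from the fact that the initial value 7 predicts NT in step 1. The function and constant names are hypothetical.

```python
def simulate_saturating_counter(outcomes, bits=4, initial=7):
    """Simulate an n-bit saturating counter predictor (assumed convention)."""
    max_value = (1 << bits) - 1      # 15 for a 4-bit counter
    threshold = 1 << (bits - 1)      # 8: values >= 8 predict Taken (assumption)
    counter = initial
    predictions = []
    correct = 0
    for actual in outcomes:
        prediction = "T" if counter >= threshold else "NT"
        predictions.append(prediction)
        if prediction == actual:
            correct += 1
        # Saturating update: never leave the range [0, max_value].
        if actual == "T":
            counter = min(counter + 1, max_value)
        else:
            counter = max(counter - 1, 0)
    return predictions, correct


# Outcomes copied from the "Action" columns of the table (hypothetical names).
BR1_ACTIONS = ["T", "T", "T", "NT", "T", "T", "T", "NT"]
BR2_ACTIONS = ["T", "NT"]

br1_predictions, br1_correct = simulate_saturating_counter(BR1_ACTIONS)
br2_predictions, br2_correct = simulate_saturating_counter(BR2_ACTIONS)

print("Br1:", br1_predictions, "correct =", br1_correct)
print("Br2:", br2_predictions, "correct =", br2_correct)
```

With these assumptions the counter for Br1 moves 7, 8, 9, 10, 9, 10, 11, 12, so the single not-taken outcome in step 4 no longer flips the prediction in step 5. That accounts for the extra correct prediction compared with the 1-bit scheme.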
Suppose we have a single issue processor using Tomasulo’s algorithm with hardware speculative execution.
FU type              Time in EX   Num of FUs   Num of Resv. Stations
FP Add/Sub           4            1            2
FP Mul               7            2            3
Load/Store           1            1            2
Integer/Branch ops   1            1            2
Make the following assumptions:
Fill in the following table. Enter in each of IS, EX, WR, and Commit the cycles in which the instruction occupies each stage. Leave stall cycles out of the table. The first line is filled in for you.
Instruction Issue Execute Writeback Commit Reason for stalling
LD F0, 0(R1) 1 2 3 4
LD F2, 8(R1) 2 3 4 5
LD F4, 16(R1) 4 5 6 7 struc hazard for load RS
MUL.D F6, F0, F4 5 7-13 14 15 RAW on F
ADD.D F8, F0, F2 6 7-10 11 16
ADD.D F10, F6, F8 7 15-18 19 20 RAW on F
ADD.D F12, F10, F4 12 20-23 24 25 struc hazard for ADD RS, RAW for F
SD F12, 0(R1) 13 25 26 RAW on F
BNEZ R2, loop 17 18 29 struc hazard for Int RS
a) [7 points] What is a multiple issue processor? Define the two types of multiple issue processors, and give two advantages of each.
Solution: A multiple issue processor can issue and commit more than one instruction per cycle, potentially obtaining a CPI of less than 1. The two types are VLIW and superscalar.
VLIW processors don’t require the complicated hazard-checking hardware that superscalars do. Not checking for dependencies in hardware can potentially result in shorter cycle times or shorter instruction decode stages.
Superscalar processors don’t require code to be recompiled for each different version of the hardware, and they can better handle dependencies that cannot be determined at compile time.
b) [6 points] What is dynamic and static scheduling? Give an example that shows when dynamic scheduling is useful. For the machines listed above, are they dynamic and/or static?
Solution: Dynamic scheduling is when the code is scheduled at run time by the processor, stalling as necessary to resolve hazards. Static scheduling is when the code is scheduled by the compiler and executed in order. VLIW: static. Superscalar: dynamic or static.
Assume a single-issue, five-stage MIPS like pipeline, with the following latencies.
Instruction     Execution Cycles
L.D, S.D        1
ADD.D, SUB.D    5
Consider the following MIPS code:
Loop: L.D    F0, 0(R1)
      ADD.D  F4, F0, F
      SUB.D  F8, F6, F
      S.D    F8, 0(R1)
      DADDUI R1, R1, #
      BNE    R1, R2, Loop
a) [10 pts] Show the steady state software-pipelined loop code. You can assume the loop will iterate many times. You don’t need to show startup and cleanup code. Ignore the branch delay slot.
Solution:
Loop: S.D    F8, -24(R1)
      SUB.D  F8, F6, F
      ADD.D  F4, F0, F
      L.D    F0, 0(R1)
      DADDUI R1, R1, #
      BNE    R1, R2, Loop
b) [10 pts] Show the startup and cleanup code for the software pipelined loop. Do not worry about scheduling the code optimally.
Solution:
Startup: L.D    F0, 0(R1)
         ADD.D  F4, F0, F
         SUB.D  F8, F6, F
         L.D    F0, 8(R1)
         ADD.D  F4, F0, F
         L.D    F0, 16(R1)
         DADDUI R1, R1, #
Cleanup: S.D    F8, -24(R1)
         SUB.D  F8, F6, F
         S.D    F8, -16(R1)
         ADD.D  F4, F0, F
         SUB.D  F8, F6, F
         S.D    F8, -8(R1)
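To see why the startup, steady-state, and cleanup code together compute the same result as the original loop, the schedule can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the exam's solution. The assembly operands are truncated in this copy, so the model substitutes hypothetical arithmetic: ADD.D is taken to add a constant to the loaded value, and SUB.D is taken to subtract the ADD.D result from a constant. The function names, `add_const`, and `sub_const` are all hypothetical. Only the ordering of load, add, subtract, and store mirrors the listings above.

```python
def reference_loop(x, add_const, sub_const):
    """Original loop: load, add, subtract, store for each element in turn."""
    mem = list(x)
    for i in range(len(mem)):
        f0 = mem[i]               # L.D
        f4 = f0 + add_const       # ADD.D (hypothetical operands)
        f8 = sub_const - f4       # SUB.D (hypothetical operands)
        mem[i] = f8               # S.D
    return mem


def software_pipelined_loop(x, add_const, sub_const):
    """Same computation, with each kernel pass working on four iterations."""
    mem = list(x)
    n = len(mem)
    assert n >= 3, "the startup code consumes the first three elements"

    # Startup: fill the pipeline.
    f0 = mem[0]
    f4 = f0 + add_const
    f8 = sub_const - f4           # element 0 is ready to store
    f0 = mem[1]
    f4 = f0 + add_const           # element 1 has been added
    f0 = mem[2]                   # element 2 has been loaded

    # Steady state: store k-3, subtract for k-2, add for k-1, load k.
    for k in range(3, n):
        mem[k - 3] = f8           # S.D  F8, -24(R1)
        f8 = sub_const - f4       # SUB.D
        f4 = f0 + add_const       # ADD.D
        f0 = mem[k]               # L.D  F0, 0(R1)

    # Cleanup: drain the pipeline.
    mem[n - 3] = f8               # S.D  F8, -24(R1)
    f8 = sub_const - f4
    mem[n - 2] = f8               # S.D  F8, -16(R1)
    f4 = f0 + add_const
    f8 = sub_const - f4
    mem[n - 1] = f8               # S.D  F8, -8(R1)
    return mem


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
expected = reference_loop(data, add_const=0.5, sub_const=10.0)
pipelined = software_pipelined_loop(data, add_const=0.5, sub_const=10.0)
print(expected == pipelined)
```

In the kernel, every instruction reads a register written during the previous pass, which is why the reversed order (store, subtract, add, load) is safe: each value is consumed before the instruction that overwrites it runs.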
c) [5 pts] Name two advantages of software pipelining over loop unrolling.
Solution: A software pipelined loop takes up less static code space than an unrolled loop. Software pipelining does not increase the register pressure of the code as much as loop unrolling. Software pipelining allows the loop to run with perfect scheduling for longer periods of time than loop unrolling.