Midterm Exam - Solved Problems - Computer System Organization | CS 433, Exams of Computer Architecture and Organization

Material Type: Exam; Class: Computer System Organization; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Spring 2008;

CS 433 Midterm Exam – March 4, 2008
Professor Sarita Adve
Time: 2 Hours
Please clearly print your name and NetID and circle the appropriate category in the space
provided below.
Name: Solutions
NetID:
Category: 3 Credit Hours    4 Credit Hours
Instructions
1. You may only use class handouts from this semester’s offering, the course text (Computer
Architecture: A Quantitative Approach, 4th Edition, by Hennessy and Patterson), your own
homework submissions for this course, slides presented in class for mini-projects, papers
indicated as reference material in class, and notes written or typed by yourself. You may also use
homework solutions and sample midterms provided on the course website. No other materials
are allowed, including other books, notes prepared by others, or materials from previous offerings
of this course (except as noted here) or from other universities.
2. Calculators are allowed. You may not use any other electronic devices for any purpose during
the exam.
3. Please do not turn in loose scrap paper. Limit your answers to the space provided if possible. If
this is not possible, please write on the back of the same sheet. You may use the back of each
sheet for scratch work.
4. In all cases, show your work. No credit will be given for numeric answers if there is no
indication of how the answer was derived. Partial credit will be given even if your final solution
is incorrect, provided you show the intermediate steps in reaching the final solution.
5. If you believe a problem is incorrectly or incompletely specified, make a reasonable assumption
and solve the problem. The assumption should not result in a trivial solution. In all cases, clearly
state any assumptions that you make in your answers.
6. This exam has 5 problems and 8 pages (including this one). Part C of problem 5 is only for
graduate students. All other problems are required for all students. Please budget your time
appropriately. Good luck!
Problem    Maximum Points                      Received Points
1          5
2          20
3          15
4          6
5          4 for undergrads, 10 for grads
Total      50 for undergrads, 56 for grads

Problem 1 [5 points]: Suppose we apply an enhancement, E1, that speeds up 20% of a program by a factor of 4. And we apply another enhancement, E2, that speeds up another 10% of the program by a factor of 10. Assume the two enhancements are independent and affect different parts of the program (there is no overlap between the 20% and 10% of the program). What is the overall speedup for the entire program using both E1 and E2 simultaneously?

Solution: This problem tests the understanding of Amdahl’s law. According to Amdahl’s law,

Speedup = (old latency) / (new latency)
new latency = (1 - f1 - f2) * old latency + (f1 / S1) * old latency + (f2 / S2) * old latency

With f1 = 20%, S1 = 4 and f2 = 10%, S2 = 10:

new latency = (0.70 + 0.05 + 0.01) * old latency = 0.76 * old latency
Final speedup = 1 / 0.76 ≈ 1.32

Grading: 2 points for listing Amdahl’s law and the standard equation; 1 point for the equation of this problem; 2 points for substituting the correct values in the equation.

Problem 2 [20 points]: This problem concerns Tomasulo’s algorithm (with reservation stations) with the reorder buffer scheme discussed in detail in the lecture notes. We have the following changes/additions/clarifications relative to the discussion in class.

Assume the following information about functional units.

Functional Unit Type    Cycles in EX
Integer Mul             2
Integer Div             10
Integer Add             1
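As a check on the Amdahl's-law arithmetic in Problem 1, here is a minimal Python sketch. The function name `amdahl_speedup` and its list-of-pairs interface are assumptions made for illustration and are not part of the exam; the sketch only handles enhancements that touch non-overlapping parts of the program, as the problem states.

```python
def amdahl_speedup(enhancements):
    """Overall speedup for non-overlapping (fraction, speedup) enhancements.

    Hypothetical helper for illustration: each pair gives the fraction of the
    original run time that is enhanced and the speedup applied to that part.
    """
    enhanced_fraction = sum(fraction for fraction, _ in enhancements)
    if not 0.0 <= enhanced_fraction <= 1.0:
        raise ValueError("enhanced fractions must sum to a value in [0, 1]")
    # New time, normalized to an old time of 1.0: the untouched part plus
    # each enhanced part divided by its own speedup.
    new_time = (1.0 - enhanced_fraction) + sum(
        fraction / speedup for fraction, speedup in enhancements
    )
    return 1.0 / new_time


# Problem 1: E1 speeds up 20% by 4x, E2 speeds up another 10% by 10x.
speedup = amdahl_speedup([(0.20, 4.0), (0.10, 10.0)])
print(f"{speedup:.3f}")  # prints 1.316
```

Passing a single pair reduces this to the usual one-enhancement form of Amdahl's law.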

2  DIV   R1, R1, R2     2   R1 RF    R2 CDB    5   15  16
3  ADD   R5, R1, R3     3   R1 CDB   R3 RF     16  17  18
4  ADDI  R7, R5, 4      4   R5 CDB             18  19  20
5  ADD   R5, R6, R8     5   R6 RF    R8 RF     6   7   21
6  ADDI  R8, R8, 2      6   R8 RF              7   8   22
7  ADD   R9, R6, R9     7   R6 RF    R9 RF     8   9   23
8  ADD   R5, R5, R10    8   R5 ROB   R10 RF    9   10  24
9  ADD   R6, R8, R5     9   R8 ROB   R5 CDB    11  12  25

Grading: 0.5 point per entry; 1 point if the majority is correct. Do not cascade errors: do not take off additional points if an earlier error causes later inaccuracies.

Problem 3 [15 points]: Consider the following code fragment:

Loop: LD.D   F2, 0(R1)
      MUL.D  F4, F6, F
      ADD.D  F4, F4, F
      SD.D   F4, 0(R1)
      DADDUI R1, R1, #
      BNE    R1, R3, Loop

Consider a pipeline with the following latencies: 1 cycle between a load and a dependent ALU instruction; 2 cycles between two dependent FP ALU instructions; 3 cycles between an FP ALU and a dependent store instruction; and 0 cycles between all other pairs. That is, there would need to be one stall cycle between the load and multiply above for correct operation.

Unroll the above loop 4 times and write the resulting code on the left of the table below. You have access to temporary registers T0…T63. Assume the total number of iterations for the original loop is a multiple of 4. Then schedule the unrolled loop for a VLIW machine where each VLIW instruction can contain one memory reference, one FP operation, and one integer operation. Write the scheduled instructions in the table below to minimize the number of stalls. You may use L for LD.D, M for MUL.D, etc.

Mem    FP ALU    Int ALU

Solution:

Loop: LD.D   F2, 0(R1)
      MUL.D  F4, F6, F
      ADD.D  F4, F4, F
      SD.D   F4, 0(R1)
      LD.D   T0, 8(R1)
      MUL.D  T2, F6, T
      ADD.D  T4, T2, F
      SD.D   T4, 8(R1)
      LD.D   T6, 16(R1)
      MUL.D  T8, F6, T
      ADD.D  T10, T8, F
      SD.D   T10, 16(R1)
      LD.D   T12, 24(R1)
      MUL.D  T14, F6, T
      ADD.D  T16, T14, F
      SD.D   T16, 24(R1)
      DADDUI R1, R1, #
      BNE    R1, R3, Loop

Note: For loop unrolling, as long as the unrolled loop works the same as the original, it should receive full credit. No need to schedule at this point.

Mem                  FP ALU               Int ALU
LD.D T0... 


Grading: 6 points for loop unrolling, 1/3 point for each instruction, rounded up. 9 points for filling in the table, 0.5 point for each instruction element, rounded up.

Problem 4 [6 points]: Consider a loop that is entered several times in a program. Each time it is entered, the loop performs 10 iterations. Each iteration executes four branches with the following outcomes (branch 1 occurs before branch 2, which occurs before branch 3, which occurs before branch 4 in each iteration):

Iteration   1  2  3  4  5  6  7  8  9  10
Branch 1    N  N  N  N  N  N  N  N  N  T
Branch 2    T  T  T  T  T  T  T  T  T
Branch 3    T  T  N  T  T  T  N  T  N
Branch 4    N  N  T  N  N  N  T  N  T

When Branch 1 is taken at iteration 10, the program counter leaves the loop, and branches 2, 3, and 4 are not reached. Of all the dynamic branch predictors studied in class, state the cheapest predictor that will give the best misprediction rate for each of the following branches. Explain why.

(A) Branch 1:
Solution: Branch 1 is almost always not taken. We need to use a 2-bit predictor. It will have one misprediction per loop invocation, when the branch leaves the loop at iteration 10. In contrast, a 1-bit predictor will have two mispredictions per invocation: one in the last iteration, and one in the first iteration of the next invocation.

(B) Branch 2:
Solution: Branch 2 is always taken. It will have the same result on all predictors. Therefore, use a 1-bit predictor.

(C) Branch 4:
Solution: Branch 4 is always the opposite of the most recent Branch 3. Therefore, use a (1,1) correlating predictor.

Grading: 2 points each: 1 for the predictor, 1 for the reason.
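The predictor choices in Problem 4 can be checked with a small simulation. The following Python sketch is illustrative only: the function names, the initial predictor states (everything starts at "not taken"), and the choice of 100 loop invocations are assumptions, not part of the exam. The (1,1) predictor is modeled directly as "the outcome of the preceding branch (Branch 3) selects one of two 1-bit predictors for Branch 4".

```python
def one_bit(outcomes, state=False):
    """1-bit predictor: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if taken != state:
            misses += 1
        state = taken
    return misses


def two_bit(outcomes, counter=0):
    """2-bit saturating counter (0..3); predict taken when counter >= 2."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


def correlating_1_1(history, outcomes):
    """(1,1) predictor: the previous branch outcome picks a 1-bit predictor."""
    table = {False: False, True: False}  # assumed initial state: not taken
    misses = 0
    for previous, taken in zip(history, outcomes):
        if table[previous] != taken:
            misses += 1
        table[previous] = taken
    return misses


def pattern(text):
    """Turn a string such as 'NNT' into a list of booleans (T = taken)."""
    return [char == "T" for char in text]


INVOCATIONS = 100  # assumed number of times the loop is entered
branch1 = pattern("NNNNNNNNNT") * INVOCATIONS
branch3 = pattern("TTNTTTNTN") * INVOCATIONS
branch4 = pattern("NNTNNNTNT") * INVOCATIONS

b1_one_bit = one_bit(branch1)
b1_two_bit = two_bit(branch1)
b4_one_bit = one_bit(branch4)
b4_correlating = correlating_1_1(branch3, branch4)

print("Branch 1:", b1_one_bit, "misses (1-bit) vs", b1_two_bit, "(2-bit)")
print("Branch 4:", b4_one_bit, "misses (1-bit) vs", b4_correlating, "((1,1))")
```

Under these assumptions the 2-bit counter mispredicts Branch 1 once per invocation while the 1-bit predictor approaches two per invocation, and the (1,1) predictor mispredicts Branch 4 only while warming up.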

Problem 5 [4 points for undergraduates, 10 points for graduates]

entire block has been evicted to make room for the next block. Adding the L2 cache removes all capacity misses, and moves the miss ratio down to 50%.

ii) Each cache line (for both L1 and L2 caches) is 1KB, and each element is 64 bytes. One line = 16 elements, and the array has 640K entries. The elements are accessed in this order: 0, 64, 128, … 640K, 1, 65, … After the entire array is accessed, it is accessed again in the same order. The first 1024 entries are compulsory misses. The rest are capacity misses. By the time the process accesses entry 1, the cache line with entries 0-15 will already have been evicted from the L1 cache. But it will still be in the L2 cache. When the process gets to entry 16, none of the desired lines will be in either cache; ultimately every line in L2 will be replaced. The same will happen starting at 32 and 48. Therefore, only multiples of 16 will be misses (compulsory ones), so the miss ratio drops to 1/16 = 6.25%.
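The 1/16 figure in part ii) can be reproduced with a short Python sketch of the first pass over the array. It rests on assumptions that are not in the exam text: "640K" is read as 640 × 1024 entries, and the cache is treated as large enough that a line is never evicted once loaded, so only compulsory misses are counted. The function name `cold_miss_ratio` is hypothetical.

```python
def cold_miss_ratio(num_elements, stride=64, line_bytes=1024, element_bytes=64):
    """Fraction of first-pass accesses that miss when only cold misses occur.

    Elements are visited in the order 0, 64, 128, ..., then 1, 65, ..., as in
    the problem. Assumption: a line stays cached after its first access.
    """
    elements_per_line = line_bytes // element_bytes  # 1KB / 64B = 16
    cached_lines = set()
    misses = 0
    for start in range(stride):
        for index in range(start, num_elements, stride):
            line = index // elements_per_line
            if line not in cached_lines:
                misses += 1
                cached_lines.add(line)
    return misses / num_elements


ratio = cold_miss_ratio(640 * 1024)  # assumed reading of "640K entries"
print(f"{ratio:.4%}")  # prints 6.2500%
```

In this model the misses occur only in the sweeps that start at entries 0, 16, 32, and 48, which matches the solution's observation that only multiples of 16 miss.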