CS/ECE 752 Spring 2014 Exam 1: Computer Architecture and Organization

University of Wisconsin - Madison. CS/ECE 752 Advanced Computer Architecture I. Midterm Exam 1. Monday, February 17, 2014.

CS/ECE 752 Spring 2014 Exam 1 -- Page 1
Last (family) name: _________________________
First (given) name: _________________________
Student I.D. #: _____________________________
Department of Computer Sciences
University of Wisconsin - Madison
CS/ECE 752 Advanced Computer Architecture I
Midterm Exam 1
Monday, February 17, 2014
Instructions:
1. Open book/open notes.
2. The exam is multiple choice and will be graded using a separate automatically read grading
sheet. Please write your name, ID number, and answers on the grading sheet. Be sure to fill
in the bubbles fully for each question.
3. Upon announcement of the end of the exam, stop writing on the exam paper immediately.
Pass the exam to aisles to be picked up by the proctors. The instructor will announce when
to leave the room.
4. Failure to follow instructions may result in forfeiture of your exam and will be handled
according to UWS 14 Academic misconduct procedures.
Problem   Type                                        Points   Score
1-15      Multiple Choice                             30
16-18     Hierarchical Branch Predictor Performance   15
19-23     Program Data Dependence Analysis            15
24-27     Instruction Scheduling                      20
28-37     From the Readings                           20
Total                                                 100


Problems 1-15 (30 points): Multiple choice; select the best answer for each question.

Note: Some of the answers below are “list” answers, such as “All of the above” and “Both (a) and (b).” In these answers, the word “above” refers only to “real” answers with specific content, not other list answers.

  1. A pipelined processor that does not have any WAR register hazards
    1. Must have an earlier register read stage and a later register write stage
    2. Must have an earlier register write stage and a later register read stage
    3. Must have two stages that write registers, one later than the other
    4. None of the above
  2. A VLIW instruction set processor
    1. Packs multiple operations into a single instruction
    2. Usually relies on software to resolve pipeline hazards
    3. Can operate at much higher frequency than other approaches
    4. Exposes a lot more instruction-level parallelism than other approaches
    5. Both (a) and (b)
    6. None of the above
  3. Local branch history in a dynamic branch predictor is used to:
    1. Predict a branch based on how neighboring branches were resolved recently
    2. Predict a branch based on how that same branch resolved recently
    3. Predict a branch based on the sign bit of its offset field
    4. Predict a branch based on a profiling run collected with a representative input set
  4. A branch that is mispredicted as not-taken requires the processor control logic to:
    1. Restart fetching instructions from the not-taken path
    2. Clear out all instructions that were not tagged with the mispredicted branch’s tag
    3. Fix up the rename table to correspond to the state following the branch
    4. None of the above
    5. Both (b) and (c)
  5. A two-level dynamic branch predictor:
    1. Uses a second-level branch history table to capture the branch working sets of very large programs that contain thousands of branches
    2. Uses two levels of branch confidence to identify easy-to-predict and hard-to-predict branches
    3. Learns more than one possible prediction for a static branch by using branch outcome history as part of the lookup index into the pattern history table
    4. Accurately predicts exits from loops that have iteration counts in the hundreds
    5. None of the above
  6. The return address stack (RAS):
    1. Pushes the return address whenever a call instruction is encountered
    2. Pops a return address for every return instruction
    3. Raises an exception when the stack overflows or underflows
    4. None of the above
    5. Both (a) and (b)
    6. All of the above
  1. Clustering an N-wide superscalar typically:
    1. Increases CPI and increases cycle time
    2. Increases CPI and decreases cycle time
    3. Decreases CPI and increases cycle time
    4. Decreases CPI and decreases cycle time
  2. Dynamic power has a
    1. Linear relationship with voltage
    2. Quadratic relationship with voltage
    3. Cubic relationship with voltage
    4. None of the above
  3. Reducing clock frequency
    1. Reduces dynamic power
    2. Reduces static power
    3. Reduces performance
    4. All of the above
    5. Both (a) and (b)
    6. Both (a) and (c)
    7. None of the above

Hierarchical Branch Predictors (15 points)

Some highly pipelined processors use a hierarchical branch prediction scheme, similar to how most modern processors now use hierarchical caches. These systems typically have a small, simple level-1 (L1) predictor (e.g., a branch target buffer) that can return a prediction within a single cycle. The level-2 (L2) predictor is typically a much larger multilevel predictor (such as the Alpha EV-8 direction predictor) that makes a much more accurate prediction, but requires two or more cycles to do so. Both predictors are accessed for each branch. The processor uses the L1 predictor to begin speculative execution of the branch, but checks this prediction using the L2 predictor. If the L2 predictor disagrees with the L1, the branch is aborted and restarted using the L2 prediction. Assuming the L2 prediction is correct, the L1 misprediction penalty is much smaller than a full misprediction penalty. These hierarchical predictors can be better than a large single-level predictor because the latter may have a larger penalty on a correctly predicted branch.

Consider a hierarchical predictor with the following performance:

  L1 Prediction   L2 Prediction   Stall Cycles
  Correct         Correct          0
  Correct         Incorrect       11
  Incorrect       Correct          3
  Incorrect       Incorrect        8

Assume that one in six instructions is a branch instruction and that the L1 and L2 predictions are independent. Assume that all branch prediction stalls directly impact performance (as they might in the MIPS 5-stage pipeline). Thus we want to know the contribution to the CPI caused by branch mispredictions (i.e., the stall cycles per instruction). Assume the L1 predictor is right 80% of the time and the L2 predictor is right 95% of the time.

  1. How many stall cycles per instruction are due to either L1 or L2 mispredictions?
      1. 65
    1. None of the above
  2. Which case contributes the most to stall cycles per instruction?
    1. L1 Correct, L2 Correct
    2. L1 Correct, L2 Incorrect
    3. L1 Incorrect, L2 Correct
    4. L1 Incorrect, L2 Incorrect
    5. None of the above
  3. Which is more important?
    1. Improving the L1 predictor from 80% to 90% accurate?
    2. Improving the L2 predictor from 95% to 97% accurate?
    3. Reducing the penalty on L1 mispredict, L2 correct predict from 3 to 2 cycles?
    4. Reducing the penalty on L1 correct predict, L2 mispredict from 11 to 8 cycles?
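
The expected-stall calculation set up in this section can be sketched as follows. This is a minimal sketch, assuming each of the four outcomes is weighted by the product of independent L1 and L2 accuracies, as the problem states. The names `STALL_CYCLES`, `case_contributions`, and `stall_cpi` are illustrative and do not come from the exam.

```python
# Stall cycles for each (L1 correct?, L2 correct?) outcome, taken from the
# table in the problem statement.
STALL_CYCLES = {
    (True, True): 0,
    (True, False): 11,
    (False, True): 3,
    (False, False): 8,
}


def case_contributions(p_l1, p_l2, branch_fraction=1 / 6, stalls=STALL_CYCLES):
    """Stall cycles per instruction contributed by each outcome.

    p_l1, p_l2: probability that the L1 / L2 prediction is correct.
    branch_fraction: fraction of instructions that are branches.
    Assumes the L1 and L2 outcomes are independent.
    """
    contributions = {}
    for (l1_ok, l2_ok), cycles in stalls.items():
        prob_l1 = p_l1 if l1_ok else 1 - p_l1
        prob_l2 = p_l2 if l2_ok else 1 - p_l2
        contributions[(l1_ok, l2_ok)] = branch_fraction * prob_l1 * prob_l2 * cycles
    return contributions


def stall_cpi(p_l1, p_l2, branch_fraction=1 / 6, stalls=STALL_CYCLES):
    """Total stall cycles per instruction across all four outcomes."""
    return sum(case_contributions(p_l1, p_l2, branch_fraction, stalls).values())


# Baseline from the problem: L1 is 80% accurate, L2 is 95% accurate.
baseline = stall_cpi(0.80, 0.95)
print(f"baseline stall cycles per instruction: {baseline:.4f}")

# Per-case breakdown, useful for seeing which outcome dominates.
for case, value in case_contributions(0.80, 0.95).items():
    print(case, f"{value:.4f}")
```

To compare design alternatives, call `stall_cpi` again with a changed accuracy, or pass a modified copy of `STALL_CYCLES` (for example `{**STALL_CYCLES, (False, True): 2}`) as the `stalls` argument.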

Instruction Scheduling (20 points)

Using the same pseudo-assembly language, but adding floating point (i.e., the “.d” suffix means double precision), the questions below involve instruction schedules.

  1 lw.d   F0 = mem[R2 + 0]
  2 mult.d F3 = F0 * F
  3 sw.d   F3 → mem[R2 + 0]
  4 lw.d   F4 = mem[R3 + 0]
  5 mult.d F5 = F4 * F
  6 sw.d   F5 → mem[R3 + 0]
  7 add.d  F6 = F3 + F
  8 sw.d   F6 → mem[R4 + 0]

Assume the standard MIPS 5-stage (single-issue) pipeline plus separate pipes for floating-point multiply and floating-point add. All loads and stores use the integer pipeline, and the floating-point pipelines do not have an “M” stage. Load-use delay is one cycle, regardless of which pipeline uses the result. Floating-point multiply has three execute cycles and floating-point add has two execute cycles; both are fully pipelined.

  1. On what cycle does instruction 8 store its value to memory?
    1. Cycle 13
    2. Cycle 14
    3. Cycle 15
    4. Cycle 16
    5. Cycle 17
    6. None of the above
  2. A compiler scheduler that is trying to reduce stalls might generate the following instruction schedule (using the instruction numbers):
    1. 1, 2, 3, 4, 5, 6, 7, 8
    2. 1, 2, 3, 4, 5, 7, 6, 8
    3. 1, 2, 3, 4, 7, 5, 6, 8
    4. 1, 4, 2, 5, 3, 6, 7, 8
    5. 1, 4, 7, 8, 2, 5, 3, 6
    6. None of the above
  3. The key source of stalls in this code sequence is:
    1. The depth of the floating point multiply pipeline
    2. The depth of the floating point add pipeline
    3. The load-use delay
    4. A maybe dependence
    5. None of the above
  4. If the instruction set were IA-64 instead of MIPS, we could reduce stalls using:
    1. An advanced load for instruction 1
    2. An advanced load for instruction 2
    3. A speculative load for instruction 1
    4. A speculative load for instruction 2
    5. None of the above

From the readings (20 points)

  1. In Moore’s classic paper on semiconductor scaling, he predicted which of the following technologies might be made possible:
    1. Personal computers
    2. Cell phones
    3. Self-driving cars
    4. None of the above
    5. All of the above
  2. Moore also predicted that the maximum number of transistors per chip would double:
    1. Every 12 months
    2. Every 18 months
    3. Every 24 months
    4. None of the above
  3. In Wulf’s paper on Compilers and Computer Architecture, he argues that “the failure of general-register machines to treat all their registers alike” violates the following property:
    1. Regularity
    2. Orthogonality
    3. Composability
    4. None of the above
    5. All of the above
  4. In Srinivasan, et al.’s paper on optimal pipeline depth, they argue that as the pipeline depth increases, the following changes also occur:
    1. FO4 per stage increases, average glitch factor increases
    2. FO4 per stage increases, average glitch factor decreases
    3. FO4 per stage decreases, average glitch factor increases
    4. FO4 per stage decreases, average glitch factor decreases
    5. None of the above
  5. In Seznec, et al.’s paper on the Alpha EV-8 branch predictor, the authors argue that partial update is superior to full update because:
    1. It limits the number of strengthened counters on a correct prediction.
    2. It doesn’t steal a table entry if it can be avoided
    3. It utilizes space better.
    4. None of the above
    5. All of the above
  6. The Intel IA-64 instruction set includes the following features:
    1. A register stack to pass arguments to and return values from subroutines
    2. Support for full predication
    3. Register renaming to support software pipelined loops
    4. None of the above
    5. All of the above

(blank page for additional work)