Dynamic Branch Prediction and Multiple-Issue Processors | CS 433, Exams of Computer Architecture and Organization

Material Type: Exam; Professor: Torrellas; Class: Computer System Organization; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2004;

CS433 Midterm

Prof Josep Torrellas

October 19, 2004

Time: 1 hour + 15 minutes

Name:

Instructions:

  1. This is a closed-book, closed-notes examination.
  2. The exam has 4 questions. Please budget your time.
  3. Calculators are allowed.
  4. Please write your answers neatly. Good luck!

Problem No.   Max. Points   Points Scored
1             25
2             25
3             25
4             25
Total         100

1 Dynamic Branch Prediction [25 Points]

Consider code that executes branches Br1 and Br2 with the following order and outcomes:

Branch:   Br1  Br1  Br1  Br1  Br2  Br1  Br1  Br1  Br1  Br
Outcome:  T    T    T    NT   T    T    T    T    NT   NT

a) [7 points] Assume that 1-bit branch predictors are used. When the processor starts to execute, both predictors contain the value NT (Not Taken). What is the number of correct predictions? Use the following table to record the prediction of each branch.

Step  Br1 Predict.  Br1 Action  Br2 Predict.  Br2 Action
1     NT            T           NT            T
2     T             T           T             NT
3     T             T
4     T             NT
5     NT            T
6     T             T
7     T             T
8     T             NT

Number of correct predictions: Br1: 4, Br2: 0
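The part (a) count can be cross-checked with a short simulation. This is a minimal sketch, assuming that the truncated last entry of the branch sequence ("Br") is Br2, which is consistent with the two Br2 rows in the answer table; the names TRACE and simulate_one_bit are illustrative, not part of the exam.

```python
# Branch trace from Problem 1 as (branch, outcome) pairs in program order.
# ASSUMPTION: the truncated last entry ("Br") is Br2, matching the two Br2
# rows in the answer table.
TRACE = [("Br1", "T"), ("Br1", "T"), ("Br1", "T"), ("Br1", "NT"), ("Br2", "T"),
         ("Br1", "T"), ("Br1", "T"), ("Br1", "T"), ("Br1", "NT"), ("Br2", "NT")]


def simulate_one_bit(trace, initial="NT"):
    """Per-branch 1-bit predictor: always predict the branch's last outcome."""
    last_outcome = {}   # branch -> most recent outcome (the single bit of state)
    correct = {}        # branch -> number of correct predictions
    for branch, outcome in trace:
        prediction = last_outcome.get(branch, initial)
        correct[branch] = correct.get(branch, 0) + int(prediction == outcome)
        last_outcome[branch] = outcome  # the bit flips on every misprediction
    return correct


print(simulate_one_bit(TRACE))  # {'Br1': 4, 'Br2': 0}
```

Br2 never predicts correctly because its outcome alternates, which is the worst case for a 1-bit predictor.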

b) [7 points] Now assume that 4-bit saturating counters are used (a counter value of 8 or more predicts Taken). When the processor starts to execute, both counters contain the value 7. What is the number of correct predictions? Use the following table to record the prediction of each branch.

Step  Br1 Predict.  Br1 Action  Br2 Predict.  Br2 Action
1     NT            T           NT            T
2     T             T           T             NT
3     T             T
4     T             NT
5     T             T
6     T             T
7     T             T
8     T             NT

Number of correct predictions: Br1: 5, Br2: 0
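The saturating-counter answer can be checked the same way. This sketch assumes the usual convention that an n-bit counter predicts Taken when its value is in the upper half of its range (8 to 15 for 4 bits), and again assumes the truncated last branch is Br2; simulate_saturating is a hypothetical helper name.

```python
# Branch trace from Problem 1. ASSUMPTION: the truncated last entry is Br2.
TRACE = [("Br1", "T"), ("Br1", "T"), ("Br1", "T"), ("Br1", "NT"), ("Br2", "T"),
         ("Br1", "T"), ("Br1", "T"), ("Br1", "T"), ("Br1", "NT"), ("Br2", "NT")]


def simulate_saturating(trace, bits=4, initial=7):
    """Per-branch n-bit saturating counter.

    Predicts Taken when value >= 2**(bits - 1), i.e. when the most
    significant bit of the counter is set.
    """
    max_value = (1 << bits) - 1
    threshold = 1 << (bits - 1)
    counters = {}   # branch -> current counter value
    correct = {}    # branch -> number of correct predictions
    for branch, outcome in trace:
        value = counters.get(branch, initial)
        prediction = "T" if value >= threshold else "NT"
        correct[branch] = correct.get(branch, 0) + int(prediction == outcome)
        # Count up on Taken, down on Not Taken, saturating at 0 and max_value.
        value = min(value + 1, max_value) if outcome == "T" else max(value - 1, 0)
        counters[branch] = value
    return correct


print(simulate_saturating(TRACE))  # {'Br1': 5, 'Br2': 0}
```

The gain over part (a) comes from step 5: the single NT at step 4 only moves Br1's counter from 10 to 9, so it still predicts Taken. With bits=1 and initial=0 the same function reduces to the 1-bit predictor and returns {'Br1': 4, 'Br2': 0}.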

2 Tomasulo’s Algorithm with Speculative Execution [25 points]

Suppose we have a single-issue processor using Tomasulo’s algorithm with hardware speculative execution.

FU type              Time in EX   Num of FUs   Num of Resv. Stations
FP Add/Sub           4            1            2
FP Mul               7            2            3
Load/Store           1            1            2
Integer/Branch ops   1            1            2

Make the following assumptions:

  • Single issue, single writeback, single commit.
  • Issue and writeback stages each take one cycle.
  • Functional units are not pipelined.
  • Results are communicated via the CDB.
  • Branches/Stores don’t write back.
  • The Load/Store functional unit takes care of both address calculation and memory access (one cycle to complete the combined address calculation and memory access).
  • Assume infinite instruction queue and ROB.
  • An instruction releases its functional unit at the end of the execution stage.
  • An instruction releases its reservation station at the end of the writeback stage.
  • Whenever there is a conflict for a functional unit, assume that the first (in program order) of the conflicting instructions gets access, while the others are stalled. This includes possible writeback conflicts.

Fill in the following table. For each of IS, EX, WR, and Commit, enter the cycles in which the instruction occupies that stage. Do not include stall cycles in the table. The first line is filled in for you.

Instruction          Issue  Execute  Writeback  Commit  Reason for stalling
LD F0, 0(R1)         1      2        3          4
LD F2, 8(R1)         2      3        4          5
LD F4, 16(R1)        4      5        6          7       Structural hazard on load RS
MUL.D F6, F0, F4     5      7-13     14         15      RAW on F
ADD.D F8, F0, F2     6      7-10     11         16
ADD.D F10, F6, F8    7      15-18    19         20      RAW on F
ADD.D F12, F10, F4   12     20-23    24         25      Structural hazard on ADD RS; RAW on F
SD F12, 0(R1)        13     25       -          26      RAW on F
DADDUI R1, R1, #24   14     15       16         27
DSUBUI R2, R2, #1    15     16       17         28
BNEZ R2, loop        17     18       -          29      Structural hazard on Int RS
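The table can be cross-checked with a small cycle-by-cycle model. This is a sketch under stated assumptions, not a reference Tomasulo implementation: the FU type labels ("add", "mul", "mem", "int") and all function names are hypothetical; a value broadcast on the CDB is assumed usable starting the next cycle; stores and branches, which never write back, are assumed to free their reservation station at the end of EX (the problem only specifies release at writeback); the FU parameters are taken as listed above; and no branch misprediction is modeled, so speculation shows up only as in-order commit from an unbounded ROB.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# FU type -> (cycles in EX, number of FUs, number of reservation stations).
FU_CONFIG = {"add": (4, 1, 2), "mul": (7, 2, 3), "mem": (1, 1, 2), "int": (1, 1, 2)}


@dataclass
class Instr:
    text: str
    fu: str
    dest: Optional[str]      # None for stores and branches (no writeback)
    srcs: Tuple[str, ...]
    issue: int = 0           # 0 means "has not happened yet"
    ex_start: int = 0
    ex_end: int = 0
    wb: int = 0
    commit: int = 0
    producers: tuple = ()


def rs_release(ins):
    """Cycle at whose end the RS is freed (0 = not known yet).
    ASSUMPTION: stores/branches free it at the end of EX."""
    return ins.wb if ins.dest else ins.ex_end


def simulate(program):
    for i, ins in enumerate(program):
        # Renaming: each source waits on the latest earlier writer (None = ready).
        ins.producers = tuple(
            next((p for p in reversed(program[:i]) if p.dest == s), None)
            for s in ins.srcs)
    busy_until = {fu: [0] * n for fu, (_, n, _) in FU_CONFIG.items()}
    next_issue = next_commit = 0
    cycle = 0
    while next_commit < len(program):
        cycle += 1
        # Commit: in order, one per cycle, after the result is complete.
        head = program[next_commit]
        done = head.wb if head.dest else head.ex_end
        if done and done < cycle:
            head.commit = cycle
            next_commit += 1
        # Writeback: one CDB; the oldest instruction that finished EX wins.
        for ins in program:
            if ins.dest and ins.ex_end and ins.ex_end < cycle and not ins.wb:
                ins.wb = cycle
                break
        # Execute: operands broadcast in an earlier cycle, a free
        # (non-pipelined) FU, and older instructions served first.
        for ins in program:
            if not ins.issue or ins.issue >= cycle or ins.ex_start:
                continue
            if any(p is not None and not (p.wb and p.wb < cycle)
                   for p in ins.producers):
                continue
            units = busy_until[ins.fu]
            free = next((k for k, b in enumerate(units) if b < cycle), None)
            if free is not None:
                ins.ex_start = cycle
                ins.ex_end = cycle + FU_CONFIG[ins.fu][0] - 1
                units[free] = ins.ex_end
        # Issue: in order, one per cycle, only if an RS of that type is free.
        if next_issue < len(program):
            ins = program[next_issue]
            in_use = sum(1 for p in program[:next_issue] if p.fu == ins.fu
                         and not (rs_release(p) and rs_release(p) < cycle))
            if in_use < FU_CONFIG[ins.fu][2]:
                ins.issue = cycle
                next_issue += 1
    return program


PROGRAM = [
    Instr("LD F0, 0(R1)", "mem", "F0", ("R1",)),
    Instr("LD F2, 8(R1)", "mem", "F2", ("R1",)),
    Instr("LD F4, 16(R1)", "mem", "F4", ("R1",)),
    Instr("MUL.D F6, F0, F4", "mul", "F6", ("F0", "F4")),
    Instr("ADD.D F8, F0, F2", "add", "F8", ("F0", "F2")),
    Instr("ADD.D F10, F6, F8", "add", "F10", ("F6", "F8")),
    Instr("ADD.D F12, F10, F4", "add", "F12", ("F10", "F4")),
    Instr("SD F12, 0(R1)", "mem", None, ("F12", "R1")),
    Instr("DADDUI R1, R1, #24", "int", "R1", ("R1",)),
    Instr("DSUBUI R2, R2, #1", "int", "R2", ("R2",)),
    Instr("BNEZ R2, loop", "int", None, ("R2",)),
]

for ins in simulate(PROGRAM):
    ex = (f"{ins.ex_start}-{ins.ex_end}" if ins.ex_end > ins.ex_start
          else str(ins.ex_start))
    print(f"{ins.text:<20}{ins.issue:>4}{ex:>7}{ins.wb or '-':>4}{ins.commit:>4}")
```

Under these assumptions the model reproduces every Issue, Execute, Writeback, and Commit entry in the table; in particular, the issue stalls of LD F4, ADD.D F12, and BNEZ fall out of the reservation-station counts alone.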

3 Multiple Issue Processors [25 points]

a) [7 points] What is a multiple-issue processor? Define the two types of multiple-issue processors, and give two advantages of each.

Solution: A multiple-issue processor can issue and commit more than one instruction per cycle, potentially achieving a CPI below 1. The two types are VLIW and superscalar. VLIW processors do not need the complex hazard-checking hardware that superscalars require, and skipping dependence checks can allow shorter cycle times or simpler instruction decode stages. Superscalar processors do not require code to be recompiled for each new hardware implementation, and they can better handle dependences that cannot be determined at compile time.

b) [6 points] What are dynamic and static scheduling? Give an example that shows when dynamic scheduling is useful. For the machines listed above, are they dynamic and/or static?

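Solution: Dynamic scheduling means the processor schedules the code at run time, stalling only as necessary to resolve hazards. Statically scheduled code is scheduled by the compiler and executed in order. Dynamic scheduling is useful when latencies are unknown at compile time: for example, if a load misses in the cache, a dynamically scheduled processor can keep executing later independent instructions, whereas a statically scheduled in-order processor stalls behind the load. VLIW: static. Superscalar: dynamic or static.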
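A common illustration of when dynamic scheduling pays off is a load that misses in the cache. The toy model below is a sketch, not a model of any particular machine: the instruction mix, the register names, the latencies (including the 10-cycle miss), and the helper completion_cycles are all hypothetical. It assumes one instruction issued per cycle, unlimited functional units, and register renaming, and contrasts strictly in-order execution (standing in for static scheduling) with dynamic scheduling.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Instr:
    name: str
    dest: str
    srcs: Tuple[str, ...]
    latency: int  # HYPOTHETICAL cycles until the result can be used


def completion_cycles(program, in_order):
    """Return {name: cycle its result is available} under a toy timing model:
    one instruction issued per cycle, unlimited functional units, and register
    renaming, so only true (RAW) dependences delay an instruction."""
    ready = {}        # register -> cycle its value becomes available
    done = {}
    prev_start = 0
    for issue_cycle, ins in enumerate(program, start=1):
        operands = max((ready.get(r, 0) for r in ins.srcs), default=0)
        start = max(issue_cycle, operands)
        if in_order:
            # In-order execution: nothing may start before the instruction
            # ahead of it, so one stall blocks everything behind it.
            start = max(start, prev_start + 1)
        prev_start = start
        ready[ins.dest] = done[ins.name] = start + ins.latency
    return done


PROGRAM = [
    Instr("L.D F0, 0(R1)", "F0", ("R1",), 10),             # assumed cache miss
    Instr("ADD.D F2, F0, F4", "F2", ("F0", "F4"), 2),      # needs the load
    Instr("MUL.D F6, F8, F10", "F6", ("F8", "F10"), 4),    # independent
    Instr("SUB.D F12, F6, F14", "F12", ("F6", "F14"), 2),  # needs MUL.D only
]

static = completion_cycles(PROGRAM, in_order=True)
dynamic = completion_cycles(PROGRAM, in_order=False)
print(max(static.values()), max(dynamic.values()))  # 18 13
```

With these numbers the independent MUL.D/SUB.D pair finishes underneath the miss when scheduling is dynamic, while in order it has to wait behind the stalled ADD.D.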

4 Software Pipelining [25 points]

Assume a single-issue, five-stage MIPS-like pipeline with the following latencies.

Instruction      Execution Cycles
L.D, S.D         1
ADD.D, SUB.D     5

Consider the following MIPS code:

Loop: L.D    F0, 0(R1)
      ADD.D  F4, F0, F
      SUB.D  F8, F6, F
      S.D    F8, 0(R1)
      DADDUI R1, R1, #
      BNE    R1, R2, Loop

a) [10 pts] Show the steady-state software-pipelined loop code. You can assume the loop will iterate many times. You don’t need to show startup and cleanup code. Ignore the branch delay slot.

Solution:

Loop: S.D    F8, -24(R1)
      SUB.D  F8, F6, F
      ADD.D  F4, F0, F
      L.D    F0, 0(R1)
      DADDUI R1, R1, #
      BNE    R1, R2, Loop

b) [10 pts] Show the startup and cleanup code for the software pipelined loop. Do not worry about scheduling the code optimally.

Solution:

Startup: L.D    F0, 0(R1)
         ADD.D  F4, F0, F
         SUB.D  F8, F6, F
         L.D    F0, 8(R1)
         ADD.D  F4, F0, F
         L.D    F0, 16(R1)
         DADDUI R1, R1, #

Cleanup: S.D    F8, -24(R1)
         SUB.D  F8, F6, F
         S.D    F8, -16(R1)
         ADD.D  F4, F0, F
         SUB.D  F8, F6, F
         S.D    F8, -8(R1)
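The startup, steady-state, and cleanup structure can be mirrored in Python to see why it computes the same results as the original loop. This is an illustrative sketch with loud assumptions: the second source operands of ADD.D and SUB.D are truncated in the source, so the sketch assumes SUB.D consumes the ADD.D result F4 (as the startup ordering suggests), uses a hypothetical constant A for ADD.D's missing operand, and a hypothetical loop-invariant value for F6. The variables f0, f4, f8 mirror the registers and hold one value at a time; the point is that the kernel order (store, subtract, add, load) lets each instruction read its register before the next one overwrites it with a later iteration's value.

```python
A = 2.0    # HYPOTHETICAL: stands in for ADD.D's truncated second operand
F6 = 10.0  # HYPOTHETICAL: loop-invariant value assumed to sit in F6


def reference(x):
    """Plain loop, one iteration at a time: L.D, ADD.D, SUB.D, S.D."""
    return [F6 - (v + A) for v in x]


def software_pipelined(x):
    """Software-pipelined version with one variable per FP register."""
    n = len(x)
    assert n >= 3, "this startup/cleanup code needs at least 3 iterations"
    y = [0.0] * n
    # Startup: iteration 0 gets load/add/sub, iteration 1 load/add, iteration 2 load.
    f0 = x[0]
    f4 = f0 + A
    f8 = F6 - f4
    f0 = x[1]
    f4 = f0 + A
    f0 = x[2]
    # Steady state: store i, subtract i+1, add i+2, load i+3.
    for i in range(n - 3):
        y[i] = f8          # S.D   (iteration i)
        f8 = F6 - f4       # SUB.D (iteration i+1)
        f4 = f0 + A        # ADD.D (iteration i+2)
        f0 = x[i + 3]      # L.D   (iteration i+3)
    # Cleanup: drain the last three iterations.
    y[n - 3] = f8
    f8 = F6 - f4
    y[n - 2] = f8
    f4 = f0 + A
    f8 = F6 - f4
    y[n - 1] = f8
    return y


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(software_pipelined(data) == reference(data))  # True
```

The three iterations in flight explain the -24(R1) offset in the steady-state store: it writes the element three 8-byte slots behind the one being loaded.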

c) [5 pts] Name two advantages of software pipelining over loop unrolling.

Solution: A software-pipelined loop takes up less static code space than an unrolled loop. Software pipelining does not increase register pressure as much as loop unrolling does. Software pipelining also keeps the loop running at its best schedule for longer: the startup and drain overhead is paid once per loop, whereas an unrolled loop pays it once per unrolled iteration.