CS433: Computer Systems Organization Spring 2007 Homework 2 - Prof. Josep Torrellas, Assignments of Computer Architecture and Organization

Instructions and exercises for Homework 2 of the CS433: Computer Systems Organization course offered in Spring 2007. The homework covers Tomasulo’s algorithm, dynamic branch prediction, and speculative execution. Students fill in execution profiles, record branch predictions, and answer short questions about the given code fragments.

Uploaded on 03/10/2009

CS433: Computer Systems Organization Spring 2007
Homework 2
Assigned: 2/6
Due in class 2/27
Instructions:
Please write your name, NetID and an alias on your homework submissions for posting grades (If you don’t
want your grades posted, then don’t write an alias). We will use this alias throughout the semester.
Homeworks are due in class on the date posted.
1. Tomasulo’s Algorithm [10 points]
This exercise examines Tomasulo’s algorithm on a simple loop operation. Consider the following code fragment:
LOOP: L.D    F2, 0(R1)
      L.D    F4, 8(R1)
      DIV.D  F6, F2, F4
      MUL.D  F8, F6, F6
      ADD.D  F6, F2, F4
      MUL.D  F10, F6, F6
      S.D    F8, 0(R1)
      S.D    F10, 8(R1)
      DADDI  R1, R1, #16
      BNEZ   R1, LOOP
running on a system with the following specifications:
  • The pipeline functional units are described by Table 1.

Functional Unit       Cycles in EX   # Functional Units   # Reservation Stations
Integer               1              1                    5
FP add/subtract       4              1                    4
FP multiply/divide    15             2                    4

Table 1: Functional Unit Specification

  • Functional units are not pipelined.
  • All stages except EX take one cycle to complete.
  • There is no forwarding between functional units. Both integer and floating point results are communicated through the CDB.
  • Memory accesses use the integer functional unit to perform effective address calculation. All loads and stores access memory during the EX stage.
  • There are unlimited load/store buffers and an infinite instruction queue.
  • Loads and stores take one cycle to execute.
  • If an instruction is in the WR stage in cycle x, then an instruction that is waiting on the same functional unit (due to a structural hazard) can begin execution in cycle x + 1.
  • Only one instruction can write to the CDB in a clock cycle.
  • Branches and stores do not need the CDB.
  • Whenever there is a conflict for a functional unit or the CDB, assume program order.
  • When an instruction is done executing in its functional unit and is waiting for the CDB, it is still occupying the functional unit and its reservation station (meaning no other instruction may enter).
  • Treat the BNEZ instruction as an Integer instruction. Assume the L.D instruction after the BNEZ can be issued the cycle after the BNEZ instruction is issued due to branch prediction.
  • Initially, R1 < -16.
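The timing rules above can be sketched as a small helper. This is only an illustration under stated assumptions, not the expected solution: the function name, parameter names, and dictionary keys are hypothetical, and it assumes that a consumer may begin EX in the cycle after its producer's WR cycle (the usual reading of "no forwarding").

```python
# EX latencies taken from Table 1; the keys are hypothetical labels.
EX_CYCLES = {"integer": 1, "fp_add": 4, "fp_mul_div": 15}


def stage_cycles(issue_cycle, unit, operand_wr_cycle=0, unit_free_cycle=0,
                 cdb_free_cycle=0, needs_cdb=True):
    """Return (first EX cycle, last EX cycle, WR cycle) for one instruction.

    operand_wr_cycle: cycle in which the last needed operand is written on
                      the CDB (0 if all operands are already available).
    unit_free_cycle:  first cycle in which the functional unit is available,
                      i.e. x + 1 if the previous occupant is in WR in cycle x.
    cdb_free_cycle:   first cycle in which the CDB is not claimed by an
                      earlier instruction in program order.
    """
    # Assumption: EX starts after IS, after operands were broadcast, and
    # only once the (non-pipelined) functional unit is free.
    ex_start = max(issue_cycle + 1, operand_wr_cycle + 1, unit_free_cycle)
    ex_end = ex_start + EX_CYCLES[unit] - 1
    # Stores and branches do not need the CDB, so no WR cycle is reported.
    wr = max(ex_end + 1, cdb_free_cycle) if needs_cdb else None
    return ex_start, ex_end, wr


# Hypothetical FP add issued in cycle 3 whose operand is broadcast in cycle 5.
print(stage_cycles(3, "fp_add", operand_wr_cycle=5))  # (6, 9, 10)
```

The caller still has to track when each functional unit and the CDB become free; the helper only applies the per-instruction rules.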

Fill in the execution profile for the first two iterations of the above code fragment in Table 2, including:

  • The reservation station used by each instruction. This should include both the functional unit type and the number of the reservation station. If multiple reservation stations of a particular type are available, associate early program order with lower cardinality.
  • The cycles that each instruction occupies in the IS, EX, and WR stages.
  • Comments to justify your answer such as type of hazards and the registers involved.

The first cycle is filled in for you.

Instruction            Reservation Station   IS   EX   WR   Comments (if appropriate)
L.D    F2, 0(R1)       Integer 1             1
L.D    F4, 8(R1)
DIV.D  F6, F2, F4
MUL.D  F8, F6, F6
ADD.D  F6, F2, F4
MUL.D  F10, F6, F6
S.D    F8, 0(R1)
S.D    F10, 8(R1)
DADDI  R1, R1, #16
BNEZ   R1, LOOP
L.D    F2, 0(R1)
L.D    F4, 8(R1)
DIV.D  F6, F2, F4
MUL.D  F8, F6, F6
ADD.D  F6, F2, F4
MUL.D  F10, F6, F6
S.D    F8, 0(R1)
S.D    F10, 8(R1)
DADDI  R1, R1, #16
BNEZ   R1, LOOP

Table 2: Execution profile using Tomasulo’s Algorithm

2. Dynamic Branch Prediction [10 points]

Consider the following MIPS code fragment.
       DADDI R1, R0, #
LOOP1: DADDI R2, R1, #
LOOP2: DSUBI R2, R2, #
       BNEZ  R3, LOOP2   ; Branch 1
       DSUBI R1, R1, #
       BNEZ  R1, LOOP1   ; Branch 2

(a) Assume that 1-bit branch predictors are used. When the processor starts to execute the aforementioned code, both predictors contain the value N (not taken). What is the number of correct predictions? Use the following tables to record the prediction and action of each branch. [ points]
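As a rough illustration of how a 1-bit predictor behaves, the sketch below counts correct predictions over a sequence of branch outcomes. The function name and the outcome string are hypothetical; the string is not the trace of the fragment above, since each branch has its own predictor and its own outcome sequence.

```python
def count_correct_1bit(outcomes, initial="N"):
    """Count correct predictions of one 1-bit predictor.

    outcomes: string of 'T' (taken) / 'N' (not taken), in execution order.
    initial:  starting predictor state, 'N' as stated in part (a).
    """
    state = initial
    correct = 0
    for actual in outcomes:
        if state == actual:
            correct += 1
        # A 1-bit predictor simply remembers the most recent outcome.
        state = actual
    return correct


# Hypothetical loop branch: taken three times, then falls through, twice over.
print(count_correct_1bit("TTTNTTTN"))  # 4
```

Each loop exit and each loop re-entry costs one misprediction, which is why the hypothetical trace above gets only 4 of 8 right.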

3. Short Answer for branches [10 points] Consider the following code fragment:

       ADDI  R1, R0, #
LOOP:  BNEZ  R1, END     ; Branch 1
       ANDI  R2, R1, #
       BNEZ  R2, ODD     ; Branch 2
EVEN:  ...
       J     DECR
ODD:   ...
DECR:  DSUBI R1, R1, #
       J     LOOP
END:   ...
(This is roughly equivalent to)

for (i = 100; i > 0; i--) {     // Branch 1
    if (i mod 2 == 0) {         // Branch 2
        ...
    } else {
        ...
    }
}
...

(a) For branch 1, would a (2,1) correlating predictor or a 2-bit saturating counter be more appropriate? Why? [3 points]
(b) For branch 2, would a (2,1) correlating predictor or a 2-bit saturating counter be more appropriate? Why? [3 points]
(c) What type of branch predictor would be able to handle both branches well? [2 points]
(d) What is the branch target buffer and why does it improve performance? [2 points]
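For readers who want to experiment with the two predictor types named in (a) and (b), here is a minimal sketch. It makes several assumptions: the function names are hypothetical, the (2,1) predictor is modeled as two bits of global history selecting one of four 1-bit entries per branch, all entries start at "not taken", and the test trace is a made-up alternating branch rather than the trace of the fragment above.

```python
def run_two_bit(outcomes, counter=0):
    """2-bit saturating counter: predict taken when the counter is 2 or 3."""
    correct = 0
    for taken in outcomes:
        if (counter >= 2) == taken:
            correct += 1
        # Move toward 3 on taken, toward 0 on not taken, saturating.
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return correct


def run_correlating_2_1(trace):
    """(2,1) predictor over a list of (branch_id, taken) pairs.

    Returns a dict mapping each branch_id to its number of correct predictions.
    """
    history = 0   # outcomes of the last two branches, as a 2-bit number
    table = {}    # (branch_id, history) -> last outcome seen in that context
    correct = {}
    for branch, taken in trace:
        prediction = table.get((branch, history), False)
        correct[branch] = correct.get(branch, 0) + (prediction == taken)
        table[(branch, history)] = taken
        history = ((history << 1) | int(taken)) & 0b11
    return correct


# Hypothetical branch that strictly alternates taken / not taken.
alternating = [True, False] * 4
print(run_two_bit(alternating))                                # 4
print(run_correlating_2_1([("b", t) for t in alternating]))    # {'b': 6}
```

On this made-up trace the history-indexed predictor mispredicts only while its entries warm up, whereas the counter never settles.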

4. Speculative Execution [10 points]

Consider the speculative Tomasulo processor shown in Figure 2.9 of the textbook. Assume the following:

  • The ROB has four buffer entries, named 0, 1, 2, and 3.
  • Integer operations including BNEZ require 1 cycle to execute, FP ADD.D requires 4 cycles and FP MUL.D requires 15 cycles.
  • Assume there are enough reservation stations and functional units to accommodate all instructions being issued (there might not be enough ROB entries, however).
  • Assume there is full forwarding between all functional units (i.e., when an instruction is waiting for the result of another instruction, it can begin execution after the instruction it depends on completes its EX stage).
  • The processor is single issue, single commit.
  • The branch is not taken and assume that it is correctly predicted.

For the following piece of code, completely fill in the execution table, but show the ROB contents and history as they would be at the end of the cycle in which DSUBI commits. Assume that F8, R1, and R are initialized and that the ROB is initially empty.

Because a ROB is implemented as a circular queue, the entry number labels repeat modulo 4 reading down the table. When ROB entries are reallocated during the simulated execution time, write the details of the new allocation in the next available correspondingly numbered table row. In the Value column, write the computed result in algebraic form (e.g., F5 - F3); if the result has not been computed yet, write a dash. In the Ready column, write Yes if the ready bit for this entry was set to Yes; otherwise write No. Use the Commit column to indicate the cycle number in which the instruction committed; if the instruction has not committed yet, write a dash. The first cycle is filled in for you.

LOOP: MUL.D  F0, F8, F8
      ADD.D  F2, F8, F
      ADD.D  F4, F0, F
      DSUBI  R1, R1,
      BNEZ   R1, LOOP
      ADD.D  F6, F6, F

Execution Table

ROB entry   Instruction          IS   EX   WR   CMT   Stall Reason
0           MUL.D F0, F8, F8     1
            ADD.D F2, F8, F
            ADD.D F4, F0, F
            DSUBI R1, R1, #
            BNEZ R1, LOOP
            ADD.D F6, F6, F

Reorder Buffer

Entry   Instruction   Destination   Value   Ready   Commit
        MUL.D         F0                    No
        ADD.D
        ADD.D
        DSUBI
        BNEZ
        ADD.D
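The circular-queue behavior of the ROB described in this question can be sketched as follows. This is a simplified model with hypothetical class and method names; it tracks only entry allocation and in-order commit, not values, ready bits, or timing, and the instruction labels in the usage lines are placeholders.

```python
class ReorderBuffer:
    """Four-entry ROB modeled as a circular queue (entries 0, 1, 2, 3)."""

    def __init__(self, size=4):
        self.size = size
        self.entries = [None] * size
        self.head = 0    # oldest entry, next to commit
        self.count = 0   # number of occupied entries

    def allocate(self, instruction):
        """Return the entry number used, or None if the ROB is full."""
        if self.count == self.size:
            return None  # issue must stall until an entry is freed
        tail = (self.head + self.count) % self.size
        self.entries[tail] = instruction
        self.count += 1
        return tail

    def commit(self):
        """Retire the oldest instruction and free its entry."""
        if self.count == 0:
            return None
        instruction = self.entries[self.head]
        self.entries[self.head] = None
        self.head = (self.head + 1) % self.size
        self.count -= 1
        return instruction


rob = ReorderBuffer()
print([rob.allocate(f"i{n}") for n in range(5)])  # [0, 1, 2, 3, None]
print(rob.commit())                               # i0
print(rob.allocate("i4"))                         # 0
```

The fifth allocation fails until the oldest instruction commits, after which entry 0 is reused; this is the modulo-4 wraparound that the entry labels in the table reflect.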

5. Short Answer [10 points]

(a) What technique does Tomasulo’s algorithm employ to eliminate WAR and WAW hazards? Why does it work? Why doesn’t it also eliminate RAW hazards? [4 points]
(b) What is the difference between reservation stations and reorder buffers? [3 points]
(c) Reservation stations and reorder buffers both have value fields to store the result of an instruction. Why do we still need this value field in the reservation station if it is available in the reorder buffer? [3 points]