Assignment 1 for Computer System Organization | CS 433, Assignments of Computer Architecture and Organization

Material Type: Assignment; Class: Computer System Organization; Subject: Computer Science; University: University of Illinois - Urbana-Champaign; Term: Fall 2006;

Uploaded on 03/10/2009

CS433: Computer Systems Organization Fall 2006
Homework 1
Assigned: 8/31
Due in class 9/14
Instructions: Please write an alias on your homework submissions for posting grades. We will use this
alias throughout the quarter. Homeworks are due in class on the date posted.
1. Assume that there is a new enhancement that adds a new execution mode called "enhanced mode"
to the hardware. Enhanced mode makes code that can run in this mode 7 times faster than normal
execution.
(a) What percentage of a program must be run in enhanced mode to achieve a speedup
of 2?
(b) For a given program, if enhanced mode is used for 30% of the execution time, what is the speedup of
the program with the enhancement over the program without the enhancement? (Note that the
30% is the percentage of time spent in enhanced mode, not the percentage of the program that
can use enhanced mode.)
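Answers to problem 1 can be sanity-checked with Amdahl's law. The following is a minimal Python sketch and not part of the original assignment; the function names are hypothetical. It assumes that the 30% in part (b) is measured on the enhanced run, as the note in the problem states, so that share is scaled back up by the enhancement factor instead of being plugged into the usual formula.

```python
def amdahl_speedup(fraction_enhanced, factor):
    """Overall speedup when `fraction_enhanced` of the ORIGINAL execution
    time can run `factor` times faster (Amdahl's law)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / factor)


def fraction_for_speedup(target, factor):
    """Fraction of the original execution time that must be enhanced to
    reach an overall speedup of `target`.
    Derived by solving 1/target = (1 - f) + f/factor for f."""
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / factor)


def speedup_from_enhanced_time_share(time_share, factor):
    """Overall speedup when `time_share` is the fraction of the ENHANCED
    run spent in enhanced mode. With the enhanced run normalized to 1,
    the original run takes (1 - time_share) + time_share * factor."""
    return (1.0 - time_share) + time_share * factor


if __name__ == "__main__":
    f = fraction_for_speedup(2, 7)
    print(f"part (a): fraction = {f:.4f}")
    print(f"check:    speedup  = {amdahl_speedup(f, 7):.4f}")
    print(f"part (b): speedup  = {speedup_from_enhanced_time_share(0.3, 7):.4f}")
```

The two speedup functions differ only in which run the fraction is measured against, which is the distinction the note in part (b) is drawing.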
2. Consider the following chart of instruction type frequencies.
Instruction Type   Frequency   Cycles
Loads              30%         2
Stores             10%         3
Branches           25%         1
ALU                35%         1
(a) Calculate the CPU time of a program assuming an 800 MHz processor and 10^8 instructions.
(b) Suppose a processor runs at 800 MHz and an average program follows the above chart. What is
the processor's MIPS rating?
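The quantities in problem 2 follow from the average cycles per instruction (CPI) of the instruction mix. The Python sketch below is an illustration only and not part of the original assignment; the names `INSTRUCTION_MIX`, `average_cpi`, `cpu_time_seconds`, and `mips_rating` are hypothetical. It assumes the standard relations CPU time = instruction count x CPI / clock rate and MIPS = clock rate / (CPI x 10^6).

```python
# Instruction mix from the chart: type -> (frequency, cycles per instruction)
INSTRUCTION_MIX = {
    "Loads": (0.30, 2),
    "Stores": (0.10, 3),
    "Branches": (0.25, 1),
    "ALU": (0.35, 1),
}


def average_cpi(mix):
    """Frequency-weighted average of cycles per instruction."""
    return sum(freq * cycles for freq, cycles in mix.values())


def cpu_time_seconds(instruction_count, cpi, clock_hz):
    """CPU time = instruction count * CPI / clock rate."""
    return instruction_count * cpi / clock_hz


def mips_rating(clock_hz, cpi):
    """Millions of instructions per second = clock rate / (CPI * 10^6)."""
    return clock_hz / (cpi * 1e6)


if __name__ == "__main__":
    cpi = average_cpi(INSTRUCTION_MIX)
    print(f"average CPI: {cpi:.3f}")
    print(f"CPU time:    {cpu_time_seconds(1e8, cpi, 800e6):.4f} s")
    print(f"MIPS rating: {mips_rating(800e6, cpi):.1f}")
```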
3. (10 Points) Consider the following code fragment.
loop: LW r1, 0(r2)
      DADDI r1, r1, 1
      LW r3, 0(r5)
      SW r1, 0(r2)
      DADD r2, r2, r3
      DADDI r4, r4, -8
      BNEZ r4, loop
Before the loop begins, the value of r4 is 184.
Assuming the system is the classic 5 stage integer pipeline RISC processor as discussed in class and
that memory accesses take 1 cycle, answer the following questions:
(a) How many times does this loop execute?
(b) Use a pipeline timing chart similar to Figure A.5 of the textbook to show the timing of the above
code fragment as it gets executed. Show only 1 iteration and the load for the following iteration.
Assume there is no bypassing/forwarding hardware, branches are resolved in the memory stage
and are handled by flushing the pipeline (i.e., let all the instructions currently in the pipeline
finish execution before loading the first instruction at the branch target), and that register writes
occur in the 1st half of the clock cycle and register reads occur in the 2nd half. Use IF, ID, EX,
MEM, WB to indicate which stage the instruction is in and use S to indicate stalls.
(c) Do the same thing as in part (b), but this time assume there is bypassing/forwarding hardware.
(d) How many cycles does it take to execute the entire fragment in part (b)? In part (c)?
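The trip count asked for in part (a) can be checked by simulating only the loop counter. The Python sketch below is an illustration, not part of the original assignment; `count_loop_iterations` is a hypothetical helper. It assumes that r4 is modified only by the DADDI instruction (adding -8 each iteration) and that BNEZ exits the loop once r4 reaches zero.

```python
def count_loop_iterations(r4, step=-8):
    """Count how many times the loop body runs, tracking only r4.

    Each iteration adds `step` to r4 (DADDI r4, r4, -8); the loop
    repeats while r4 is non-zero (BNEZ r4, loop).
    """
    # Guard against inputs for which r4 would never hit exactly zero.
    if step >= 0 or r4 <= 0 or r4 % -step != 0:
        raise ValueError("r4 must be a positive multiple of -step")

    iterations = 0
    while True:
        iterations += 1   # the loop body executes once
        r4 += step        # DADDI r4, r4, -8
        if r4 == 0:       # BNEZ falls through when r4 == 0
            break
    return iterations


if __name__ == "__main__":
    print(count_loop_iterations(184))
```

The same count is needed in part (d), where the per-iteration cycle count from the timing chart is multiplied across all iterations.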

4. **** This question is for Graduate Students ****

Consider the following code fragment.
loop: LW r1, 0(r2)
      DADDI r1, r1, 1
      LW r3, 0(r5)
      SW r1, 0(r2)
      DADDI r4, r4 -
      BNEZ r4, loop
      DADD r2, r2 r
Assume that before the loop begins, the value of r4 is 184 and the system is the classic 5 stage integer pipeline RISC processor as discussed in class and that memory accesses take 1 cycle. Assume there is full bypassing/forwarding hardware, branches are resolved in the memory stage, and that register writes occur in the 1st half of the clock cycle and register reads occur in the 2nd half. In addition, branches are predicted as taken and there is 1 branch delay slot.

(a) Use a pipeline timing chart similar to Figure A.5 of the textbook to show the timing of the above code fragment as it gets executed. Show the 1st iteration.
(b) In general, how does statically predicting the outcome of a branch improve performance?
(c) What is the drawback of statically predicting taken without any additional modifications? How does the branch delay slot improve performance?

5. Consider a pipeline with the following structure: IF ID EX MEM WB. Assume that the EX stage is 1 cycle long for all ALU operations, loads, and stores. Also, the EX stage is 3 cycles long for the FP add and 5 cycles long for the FP multiply. The pipeline supports full forwarding. All other stages in the pipeline take one cycle each. The branch is resolved in the ID stage. WAW hazards are resolved by stalling the later instruction. For the following code, list all the data hazards that cause stalls. Give a brief explanation of why each hazard occurs.
loop: L.D F0, 0(R1)
      L.D F2, 8(R1)
      L.D F4, 16(R1)
      MULT.D F8, F4, F
      ADD.D F6, F4, F
      ADD.D F8, F4, F
      S.D 24(R1), F
      S.D 24(R1), F
      ADD.D F4, F4, F
      SUBI R1, R1, 32
      BNEZ R1, loop
6. For these problems, we will explore a pipeline for a register-memory architecture. The architecture has two instruction formats: a register-register format and a register-memory format. In the register-memory format, one of the operands for an ALU instruction could come from memory. There is a single memory-addressing mode (offset + base register). The only non-branch register-memory instructions available have the format:
Op Rdest, Rsrc1, Rsrc2
or
Op Rdest, Rsrc1, MEM
where Op is one of the following: Add, Subtract, And, Or, Load (in which case Rsrc1 is ignored), or Store. Rsrc1, Rsrc2, and Rdest are registers. MEM is a (base register, offset) pair. Branches compare two registers and, depending on the outcome of the comparison, move to a target address. The target address can be specified as a PC-relative offset or in a register (with no offset). Assume that the pipeline structure of the machine is as follows:
IF RF ALU1 MEM ALU2 WB