Final Exam Practice - Advanced Computer Architecture | ECE 4100, Exams of Computer Architecture and Organization

Material Type: Exam; Class: Adv Computer Architecture; Subject: Electrical & Computer Engr; University: Georgia Institute of Technology-Main Campus; Term: Summer 2003;

Uploaded on 08/05/2009

SCORE:________ Name:__________________________________________

ECE 4100 Advanced Computer Architecture

Final Exam – Summer 2003

  1. (10 points) What is the significance of a “snooping protocol” and what is it used for in modern computer systems?
  2. (10 points) What limits the size and complexity of the L1 cache and why is an L2 common today?
  3. (5 points) What are the advantages and disadvantages of FLASH memory over Disk?
  4. (10 points) Adding 3 processors to a parallel computer system makes an application run 3.5 times faster than on one processor (ignore additional communication overhead between processors and any possibility of superlinear speedup effects). What percentage of the original program code would have to be able to run in parallel on four processors for this to be possible? (Assume the code runs either in parallel on four processors or sequentially on one.) Percentage of code that can run in parallel: _____%
  5. (10 points) Assume a network with a bandwidth of 100 Mbit/sec has a sending overhead of 50 µsec and a receiving overhead of 75 µsec. How long would it take to send a 1 Mbyte message? Assume the machines are 5000 km apart and use the book’s estimate for the speed of light in a conductor (2/3 of the speed in a vacuum). Compute the total latency for the message (to four decimal places). Total latency for message: _____________________
  6. (10 points) Compute (to four decimal places) the average time to read or write a 4096-byte sector on a disk with these features: the average seek time is 8 ms (use the book’s suggested 1/3 correction factor for a more realistic seek time), the transfer rate is 20 Mbytes/sec, the disk rotates at 15,000 RPM, and the controller overhead is 0.1 ms. Average time to read or write a sector: ___________ ms
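
The arithmetic behind the three numeric questions above (parallel speedup, network latency, disk access time) can be sketched in a few lines of Python. This is a hedged sketch, not an official answer key. The function names are hypothetical, and several unit conventions are assumptions not stated in the exam: the speed of light is taken as 3 x 10^8 m/s, "1 Mbyte" and "Mbytes/sec" are read as powers of ten, average rotational latency is half a revolution, and the "1/3 correction factor" is read as using one third of the quoted seek time.

```python
# Hedged sketch: hypothetical helper names, assumed unit conventions.

def parallel_fraction(speedup: float, processors: int) -> float:
    """Solve Amdahl's law, speedup = 1 / ((1 - p) + p / n), for p."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / processors)


def network_latency_s(message_bytes: float, bandwidth_bits_per_s: float,
                      send_overhead_s: float, receive_overhead_s: float,
                      distance_m: float, signal_speed_m_per_s: float) -> float:
    """Total latency = send overhead + time of flight + transmission + receive overhead."""
    time_of_flight = distance_m / signal_speed_m_per_s
    transmission = message_bytes * 8 / bandwidth_bits_per_s
    return send_overhead_s + time_of_flight + transmission + receive_overhead_s


def disk_access_ms(seek_ms: float, rpm: float, sector_bytes: float,
                   transfer_bytes_per_s: float, controller_ms: float) -> float:
    """Access time = seek + half a rotation + transfer + controller overhead."""
    rotation_ms = 0.5 * 60_000.0 / rpm          # half a revolution, in ms
    transfer_ms = sector_bytes / transfer_bytes_per_s * 1000.0
    return seek_ms + rotation_ms + transfer_ms + controller_ms


SPEED_OF_LIGHT = 3.0e8  # m/s, assumed round value

fraction = parallel_fraction(speedup=3.5, processors=4)
latency = network_latency_s(
    message_bytes=1e6,                  # assumes 1 Mbyte = 10^6 bytes
    bandwidth_bits_per_s=100e6,
    send_overhead_s=50e-6,
    receive_overhead_s=75e-6,
    distance_m=5_000e3,
    signal_speed_m_per_s=SPEED_OF_LIGHT * 2 / 3,
)
access = disk_access_ms(
    seek_ms=8.0 / 3,                    # assumes "1/3 correction" scales the seek time
    rpm=15_000,
    sector_bytes=4096,
    transfer_bytes_per_s=20e6,          # assumes 1 Mbyte/sec = 10^6 bytes/sec
    controller_ms=0.1,
)

print(f"parallel fraction: {fraction * 100:.2f}%")
print(f"network latency:   {latency:.4f} s")
print(f"disk access time:  {access:.4f} ms")
```

Under these assumptions the sketch gives roughly 95.24% for the parallel fraction, 0.1051 s for the network latency, and 4.9715 ms for the disk access. Reading a Mbyte as 2^20 bytes would change the last two results slightly.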
  7. (15 points) Part I (8 of 15 points): Unroll the loop shown below three times to reduce the number of stalls and control overhead. You can assume that the loop executes a multiple of three times. Use registers F10..F30, if needed. Indicate any stalls in your answer.

     Instruction producing result   Instruction using result   Latency in clock cycles
     FP ALU Op                      FP ALU Op                  3
     FP ALU Op                      Store Double               2
     Load Double                    FP ALU Op                  1
     Load Double                    Store Double               0

     LOOP: L.D    F8, 0(R1)
           L.D    F4, 0(R2)
           ADD.D  F6, F4, F
           SUB.D  F4, F6, F
           S.D    F4, 0(R1)
           DADDIU R1, R1, #
           BNE    R1, R3 LOOP

     Part II (7 of 15 points): Using the code example above (with the same number of loop unrollings as a basis), use software pipelining to minimize stalls. Startup and cleanup code is not required. Indicate any stalls in your answer. Note: Do not unroll more than 3 times for the pipelined code.
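
When checking a schedule against the latency table in the loop-unrolling question above, one way to count stalls is a small lookup helper. The sketch below is an illustration under assumptions, not part of the exam: the names `LATENCY` and `stalls_needed` are hypothetical, and it assumes the table's latency is the number of cycles that must separate a producer from its dependent consumer to avoid a stall.

```python
# Hedged sketch: hypothetical names; latency is read as "intervening cycles needed".

# (producer, consumer) -> cycles that must separate the two instructions
LATENCY = {
    ("FP ALU Op", "FP ALU Op"): 3,
    ("FP ALU Op", "Store Double"): 2,
    ("Load Double", "FP ALU Op"): 1,
    ("Load Double", "Store Double"): 0,
}


def stalls_needed(producer: str, consumer: str, intervening: int = 0) -> int:
    """Stall cycles required when `intervening` independent instructions
    already sit between the producer and its dependent consumer."""
    return max(0, LATENCY[(producer, consumer)] - intervening)


# A load followed immediately by a dependent FP operation costs one stall.
print(stalls_needed("Load Double", "FP ALU Op"))
# Two FP operations separated by three independent instructions need no stall.
print(stalls_needed("FP ALU Op", "FP ALU Op", intervening=3))
```

Unrolling and software pipelining both work by raising `intervening` for each dependent pair until the required stalls drop to zero.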
  8. (10 points) Consider the program segment below running on a single-issue machine using Tomasulo’s Algorithm. Fill in the clock cycle numbers in the table at the end of this question, assuming the latencies shown in the table below the program.

     L.D    F4, 0(R1)
     L.D    F6, 0(R2)
     MUL.D  F2, F6, F
     SUB.D  F4, F4, F
     DIV.D  F4, F6, F
     ADD.D  F8, F8, F
     DADDIU R1, R1, #-
     BNE    R1, R3 LOOP

     Details of Functional Units:

     Unit            Latency (in Execute)   Reservation Stations
     FP Add/Sub      5                      2
     FP Mult         8                      2
     FP Div          12                     2
     FP Load/Store   2                      2 each Load/Store buffers
     Integer Unit    1                      None

     Note: The FP arithmetic units are NOT pipelined, i.e. you must wait for the current operation to finish execution before using the unit again. You can do a new FP load/store every clock cycle. The WB stage takes only 1 clock cycle, and during that cycle the new data appears on the CDB and at all reservation stations.

     Instruction          Issue   Execute   WB
     L.D    F4, 0(R1)     0
     L.D    F6, 0(R2)
     MUL.D  F2, F6, F
     SUB.D  F4, F4, F
     DIV.D  F4, F6, F
     ADD.D  F8, F8, F
     DADDIU R1, R1, #-
     BNE    R1, R3 LOOP