Computer Architecture Exam - ECS 154B, Fall 2003, Exams of Computer Architecture and Organization

The final exam for a computer architecture course, ECS 154B, held in fall 2003. The exam covers topics such as pipeline hazards, data forwarding, and page replacement strategies. Students are required to show all their work and answer questions related to these topics.


Uploaded on 07/31/2009

Computer Architecture
ECS 154B
Fall 2003
Final Exam
Total: 100 points
Time: 120 minutes
Open book and open notes
Sat., Dec. 13, 2003
Name: ANSWERS
Student Id:
Please SHOW ALL YOUR WORK!

A. (12 points) Refer to Fig. 1. Assume in-order execution (no dynamic instruction scheduling). Which are the EARLIEST pipeline registers (1, 2, ..., 20) in the figure to be compared in order to detect each of the following hazards? In ALU operations like “add” and “sub”, assume:

  • Argument 1 is WriteRegister
  • Argument 2 is ReadRegister1 (goes to upper ALU input)
  • Argument 3 is ReadRegister2 (goes to lower ALU input)

There could be more than one hazard per code block below:

hazard 1:
  sub $2, $1, $
  sub $1, $3, $
  add $4, $3, $
  add $5, $2, $

Earliest Comparison: Check (4 == 15) or (10 == 20)

hazard 2:
  lw $5, 10($3)
  add $3, $5, $
  sub $2, $2, $

Earliest Comparison: Check (3 == 11) or (9 == 15) or (13 == 20)

hazard 3:
  sub $2, $2, $
  add $4, $3, $
  lw $1, 20($3)

Earliest Comparison: Check (4 == 11) or (10 == 15) or (14 == 20)

Consider two possible page-replacement strategies: LRU (the least recently used page is replaced) and FIFO (the page that has been in the memory longest is replaced). The merit of a page-replacement strategy is judged by its hit ratio. Assume that, after space has been reserved for the page table, the interrupt service routines, and the operating system kernel, there is only sufficient room left in the main memory for four user-program pages. Assume also that initially virtual pages 1, 2, 3, and 4 of the user program are brought into physical memory in that order.

  1. For each of the two strategies, what pages will be in the memory at the end of the following sequence of virtual page accesses? Read the sequence from left to right: (6, 3, 2, 8, 4).

     LRU (pages listed from least to most recently used):
       start:                      1 2 3 4
       access 6: replace 1      => 2 3 4 6
       access 3: reorder list   => 2 4 6 3
       access 2: reorder list   => 4 6 3 2
       access 8: replace 4      => 6 3 2 8
       access 4: replace 6      => 3 2 8 4

     FIFO (pages listed from oldest to newest):
       start:                      1 2 3 4
       access 6: replace 1      => 2 3 4 6
       access 3: no change      => 2 3 4 6
       access 2: no change      => 2 3 4 6
       access 8: replace 2      => 3 4 6 8
       access 4: no change      => 3 4 6 8

  2. Which (if either) replacement strategy will work best when the machine accesses pages in the following (stack) order: (3, 4, 5, 6, 7, 6, 5, 4, 3, 4, 5, 6, 7, 6, ...)? In the steady state, LRU misses only on pages 3 and 7 => 2/8 miss rate. FIFO doesn't work well on stack accesses => 5/8 miss rate.
  3. Which (if either) replacement strategy will work best when the machine accesses pages in the following (repeated sequence) order: (3, 4, 5, 6, 7, 3, 4, 5, 6, 7, ...)? Both strategies have a 100% miss rate in the steady state.
  4. Which (if either) replacement strategy will work best when the machine accesses pages in a randomly selected order, such as (3, 4, 2, 8, 7, 2, 5, 6, 3, 4, 8, ...)? Neither FIFO nor LRU is guaranteed to be the better strategy in dealing with random accesses, since there is no locality in the reference stream.
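The traces and miss rates above can be checked with a small simulation. The following is a minimal sketch in Python, not part of the original exam; the function names (`simulate_lru`, `simulate_fifo`, `steady_state_misses`) are hypothetical, and it assumes four page frames preloaded with pages 1, 2, 3, 4 as stated in the problem.

```python
from collections import OrderedDict, deque


def simulate_lru(initial, accesses, frames=4):
    """Return (pages ordered least to most recently used, miss count)."""
    pages = OrderedDict((p, None) for p in initial)
    misses = 0
    for page in accesses:
        if page in pages:
            pages.move_to_end(page)        # hit: now the most recently used
        else:
            misses += 1
            if len(pages) >= frames:
                pages.popitem(last=False)  # evict the least recently used
            pages[page] = None
    return list(pages), misses


def simulate_fifo(initial, accesses, frames=4):
    """Return (pages ordered oldest to newest arrival, miss count)."""
    queue = deque(initial)
    misses = 0
    for page in accesses:
        if page not in queue:              # hits do not change FIFO order
            misses += 1
            if len(queue) >= frames:
                queue.popleft()            # evict the oldest arrival
            queue.append(page)
    return list(queue), misses


def steady_state_misses(simulate, pattern, warmup_periods=3):
    """Misses during one period of `pattern` after a warm-up."""
    warm = pattern * warmup_periods
    _, before = simulate([1, 2, 3, 4], warm)
    _, after = simulate([1, 2, 3, 4], warm + pattern)
    return after - before


stack_pattern = [3, 4, 5, 6, 7, 6, 5, 4]   # one period of the stack order
cyclic_pattern = [3, 4, 5, 6, 7]           # one period of the repeated order

print(simulate_lru([1, 2, 3, 4], [6, 3, 2, 8, 4])[0])
print(simulate_fifo([1, 2, 3, 4], [6, 3, 2, 8, 4])[0])
for name, sim in (("LRU", simulate_lru), ("FIFO", simulate_fifo)):
    print(name,
          steady_state_misses(sim, stack_pattern), "of", len(stack_pattern),
          "|",
          steady_state_misses(sim, cyclic_pattern), "of", len(cyclic_pattern))
```

Counting misses over one period after a warm-up isolates the steady-state behavior, so the initial pages 1 and 2 (which are never referenced again) do not distort the rates.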

Program A consists of 1000 consecutive ADD instructions, while program B consists of a loop that executes a single ADD instruction 1000 times. You run both programs on a certain machine and find that program B consistently executes faster. Give two plausible explanations.

#1: One would expect the loop to achieve a higher hit rate in the cache because it involves many fewer instruction words.

#2: The loop, occupying many fewer instruction words, should all fit onto a single page. The 1000 instructions might span several pages and hence their execution may involve some page faults.
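Explanation #1 can be illustrated with a toy instruction-cache model. This is a rough sketch under stated assumptions, none of which come from the exam: a direct-mapped cache with 64 lines of 4 instruction words each, word-addressed fetches, and a loop body of 3 instructions (the ADD plus an assumed counter update and branch). The function name `icache_misses` is hypothetical.

```python
def icache_misses(fetch_addresses, words_per_line=4, num_lines=64):
    """Count misses in a direct-mapped instruction cache (assumed geometry)."""
    resident = [None] * num_lines          # memory line held by each cache line
    misses = 0
    for addr in fetch_addresses:
        line = addr // words_per_line      # which memory line holds this word
        index = line % num_lines           # direct-mapped placement
        if resident[index] != line:
            misses += 1
            resident[index] = line
    return misses


# Program A: 1000 consecutive instructions at word addresses 0..999.
program_a = list(range(1000))

# Program B: an assumed 3-instruction loop body at addresses 0..2, run 1000 times.
program_b = [pc for _ in range(1000) for pc in range(3)]

print("A:", icache_misses(program_a), "misses in", len(program_a), "fetches")
print("B:", icache_misses(program_b), "misses in", len(program_b), "fetches")
```

Under these assumptions the straight-line program misses once per cache line it touches, while the loop misses only on its first fetch, even though it fetches more instructions in total. Whether that makes program B faster on a real machine depends on the miss penalty relative to the loop overhead.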

In this problem, we will compare the following three networks for a 64-processor multiprocessor: i) an 8X8 2-dimensional torus; ii) a 6-cube; iii) a radix-2, 6-stage butterfly.

Assume that a link has bandwidth 100 Mbps in one direction. Do not count the links connecting the processor to the switch in the calculations below. SHOW ALL YOUR WORK! Calculate and write down each metric for each network. Draw the pictures.

  1. (6 points) Which network has the lowest diameter? Assume unidirectional links and dimension-order routing for the torus. diameter of (i) is 14, (ii) is 6, (iii) is 5.
  2. (6 points) Which network has the highest total bandwidth? (i) (2 links/switch * 64 switches) * 100 Mbps = 12800 Mbps. (ii) (6 links/switch * 64 switches) * 100 Mbps = 38400 Mbps. (iii) (64 links per stage * 6 stages * 100 Mbps) = 38400 Mbps.
  3. (6 points) Which network has the highest bisection bandwidth? (i) 2*8 links * 100 Mbps = 1600 Mbps. (ii) 64/2 links * 100 Mbps = 3200 Mbps. (iii) 64/2 links * 100 Mbps = 3200 Mbps.
  4. (6 points) Which network uses the least number of switches? (i) 64 switches (ii) 64 switches (iii) 32 switches/stage * 6 stages = 192 switches.
  5. (6 points) If a 2X2 crossbar switch costs $5 and cost is proportional to the square of the number of inputs, what is the cost of each network and which is most expensive? (i) (64 3X3 switches) * (3^2 / 2^2) * ($5) = $720. (ii) (64 7X7 switches) * (7^2 / 2^2) * ($5) = $3920. (iii) (192 2X2 switches) * ($5) = $960. The 6-cube (ii) is the most expensive.
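The arithmetic in parts 1 through 5 can be reproduced with a short script. This is a sketch only: the link, switch, and port counts are copied from the answers above rather than derived from the topologies, and the names (`NETWORKS`, `crossbar_cost`, `metrics`) are hypothetical.

```python
LINK_MBPS = 100   # one-direction link bandwidth given in the problem
NODES = 64

# Counts taken from the answers above (processor links excluded).
NETWORKS = {
    "torus":     {"diameter": 2 * (8 - 1), "links": 2 * NODES,
                  "bisection_links": 2 * 8, "switches": NODES, "ports": 3},
    "6-cube":    {"diameter": 6, "links": 6 * NODES,
                  "bisection_links": NODES // 2, "switches": NODES, "ports": 7},
    "butterfly": {"diameter": 5, "links": 64 * 6,
                  "bisection_links": NODES // 2, "switches": 32 * 6, "ports": 2},
}


def crossbar_cost(ports, base_ports=2, base_cost=5.0):
    """Cost of a ports-by-ports crossbar, scaling with the square of the inputs."""
    return base_cost * (ports / base_ports) ** 2


def metrics(net):
    """Derive bandwidth (Mbps) and cost (dollars) from the raw counts."""
    return {
        "diameter": net["diameter"],
        "total_bw": net["links"] * LINK_MBPS,
        "bisection_bw": net["bisection_links"] * LINK_MBPS,
        "switches": net["switches"],
        "cost": net["switches"] * crossbar_cost(net["ports"]),
    }


results = {name: metrics(net) for name, net in NETWORKS.items()}
for name, m in results.items():
    print(name, m)
print("most expensive:", max(results, key=lambda n: results[n]["cost"]))
```

Keeping the raw counts in one table makes it easy to see which assumption drives each metric, for example that the torus and 6-cube costs differ only through the switch port count.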