Practice Exam 2 - System Architecture | CMSC 411, Exams of Computer Science

Material Type: Exam; Class: SYSTM ARCHITECTURE; Subject: Computer Science; University: University of Maryland; Term: Spring 2009;

Uploaded on 07/29/2009

CMSC 411 Practice Exam 2
General instructions.
Be complete, yet concise. You may leave arithmetic expressions in any form that a calculator could evaluate.
1. Cache
(a) Suppose we have a byte-addressable memory of size 4 GB (2^32 bytes) with a cache of size 256 KB
(2^18 bytes), not including tag bits. Also suppose the cache block size is 32 bytes. For each of
the following cache organizations, compute the length in number of bits for the tag, index and
offset fields of the 32-bit memory address (show your calculations):
i. direct mapped
ii. 2-way set associative
iii. 8-way set associative
iv. fully associative
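
The field widths in part (a) follow from offset = log2(block size), index = log2(number of sets), and tag = address bits - index - offset. The sketch below is one way to check that arithmetic; the function name `address_fields` and its parameters are illustrative rather than part of the exam, and every size is assumed to be a power of two.

```python
def address_fields(addr_bits, cache_bytes, block_bytes, ways):
    """Return (tag, index, offset) bit widths for a cache.

    ways=1 models a direct-mapped cache; ways equal to the number of
    blocks models a fully associative cache. Sizes are assumed to be
    powers of two (an assumption of this sketch).
    """
    offset = block_bytes.bit_length() - 1      # log2(block size)
    num_blocks = cache_bytes // block_bytes
    num_sets = num_blocks // ways
    index = num_sets.bit_length() - 1          # log2(number of sets)
    tag = addr_bits - index - offset
    return tag, index, offset


if __name__ == "__main__":
    cache, block = 2**18, 32
    organizations = [("direct mapped", 1), ("2-way", 2), ("8-way", 8),
                     ("fully associative", cache // block)]
    for name, ways in organizations:
        print(name, address_fields(32, cache, block, ways))
```

Passing the associativity as a parameter keeps all four organizations in one formula: raising the associativity only moves bits from the index field to the tag field.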
(b) Given the following parameters, which performs better in terms of average memory access time,
the direct mapped or the 2-way set associative cache? Assume a cache hit takes 1 cycle for either
organization, and the cache miss penalty is 20 nanoseconds (both use the same memory system).
Justify your answer.
Direct mapped - miss rate 10%, 1GHz clock
2-way set associative - miss rate 5%, 800MHz clock
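
For part (b), the comparison rests on average memory access time = hit time + miss rate x miss penalty, with the hit time converted from cycles to nanoseconds using each clock rate. The sketch below assumes that the 20 ns penalty is added on top of the one-cycle hit time; the helper name `amat_ns` is hypothetical.

```python
def amat_ns(clock_hz, hit_cycles, miss_rate, miss_penalty_ns):
    """Average memory access time in nanoseconds.

    Assumes the miss penalty is a fixed time in ns (same memory system),
    while the hit time scales with the clock period.
    """
    cycle_ns = 1e9 / clock_hz
    return hit_cycles * cycle_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    direct_mapped = amat_ns(1e9, 1, 0.10, 20)
    two_way = amat_ns(800e6, 1, 0.05, 20)
    print(f"direct mapped: {direct_mapped:.2f} ns")
    print(f"2-way set associative: {two_way:.2f} ns")
```

Working in nanoseconds rather than cycles matters here because the two organizations run at different clock rates, so a cycle is not the same length of time in each.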
(c) Consider adding a fast second-level cache to a computer with only a single level cache. Suppose
data can be accessed from the second level cache 12 times faster than it can be accessed from
the original cache, and that the second level cache can be used 30% of the time. How much
speedup can we gain by adding the second-level cache? Assume all data accesses hit in the
original cache.
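
Part (c) is an application of Amdahl's law: speedup = 1 / ((1 - f) + f / s), where f is the fraction of time the enhancement can be used and s is the speedup of the enhanced portion. The sketch below assumes that "30% of the time" refers to the fraction of the original access time; the function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup from Amdahl's law.

    fraction_enhanced: share of the original time that can use the
    enhancement (assumed here to be 0.30 for the second-level cache).
    speedup_enhanced: how much faster the enhanced portion runs.
    """
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)


if __name__ == "__main__":
    print(f"overall speedup: {amdahl_speedup(0.30, 12):.3f}")
```

The result stays well below 12 because the remaining 70% of the accesses are not sped up at all.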

2. Branch prediction
Consider the following code fragment:

    if( d == 1 ) d = 2;
    if( d == 2 ) d = 3;
    if( d == 3 ) d = 4;

A typical code sequence generated for this fragment, assuming that d is assigned to R1, looks like the following:

        DSUBUI R2, R1, #1
        BNEZ   R2, L1        ; Branch B1
        DADDIU R1, R0, #2
    L1: DSUBUI R3, R1, #2
        BNEZ   R3, L2        ; Branch B2
        DADDIU R1, R0, #3
    L2: DSUBUI R4, R1, #3
        BNEZ   R4, L3        ; Branch B3
        DADDIU R1, R0, #4
        ...
    L3:

For the given program fragment, construct a (1, 1) predictor table, given that the code executes inside a loop, with the value of d alternating between 2 and 0. Fill in the table as given below. Circle the prediction bit used for each branch executed. Assume that all predictors are initialized to not taken, and that the correlation bit is initially set to not taken. Finally, calculate the total number of mispredicted branches.

d=? | B1 prediction | B1 action | New B1 prediction | B2 prediction | B2 action | New B2 prediction | B3 prediction | B3 action | New B3 prediction
2   |               |           |                   |               |           |                   |               |           |
0   |               |           |                   |               |           |                   |               |           |
2   |               |           |                   |               |           |                   |               |           |
0   |               |           |                   |               |           |                   |               |           |
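
A filled-in table can be checked with a small simulation. The sketch below makes two assumptions that the exam does not state outright: the single correlation bit holds the outcome of the most recently executed branch (whichever branch that was), and each branch keeps two 1-bit predictions, one per value of that bit. The function name `simulate_1_1` is hypothetical.

```python
def simulate_1_1(d_values):
    """Simulate a (1, 1) correlating predictor on branches B1, B2, B3.

    Returns (mispredictions, trace), where each trace entry is
    (d at start of iteration, branch name, predicted taken, actually taken).
    """
    # predictors[b][h]: 1-bit prediction for branch b when the last branch
    # outcome was h (0 = not taken, 1 = taken). Everything starts not taken.
    predictors = [[False, False] for _ in range(3)]
    history = 0                      # correlation bit, initially not taken
    mispredictions = 0
    trace = []
    for d_start in d_values:
        d = d_start
        for b in range(3):
            constant = b + 1         # B1 tests d == 1, B2 d == 2, B3 d == 3
            taken = d != constant    # BNEZ is taken when d differs
            predicted = predictors[b][history]
            if predicted != taken:
                mispredictions += 1
            trace.append((d_start, f"B{b + 1}", predicted, taken))
            predictors[b][history] = taken   # 1-bit entry keeps last outcome
            history = int(taken)             # update the correlation bit
            if not taken:
                d = constant + 1             # fall-through assignment runs
    return mispredictions, trace


if __name__ == "__main__":
    total, rows = simulate_1_1([2, 0, 2, 0])
    for row in rows:
        print(row)
    print("mispredicted branches:", total)
```

Each trace row corresponds to one prediction/action pair in the table; the "new prediction" column is the branch outcome itself, since a 1-bit entry simply records the last action.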

Iteration number | Instructions | Issues at cycle # | Executes at cycle | Write CDB at | Comment (stalls)

How many clock cycles does each iteration take?