Advanced Microprocessor Design Problem Set 2 for ECE 463 and ECE 521 - Prof. Eric Rotenber, Assignments of Electrical and Electronics Engineering

Problem Set 2 for the advanced microprocessor design courses ECE 463 and ECE 521. The problem set contains five problems on microprocessor design, cache systems, and the memory hierarchy. Students must attempt all problems, including the non-graded ones; skipping a problem costs half of its point value, deducted from the score on the graded problems. Topics include cache miss penalties, concurrent TLB lookup and cache access, instruction-set comparison, and memory-hierarchy design.

ECE 463: Advanced Microprocessor Design
ECE 521: Computer Design and Technology
Problem Set 2
Due Wednesday, October 9, 2002
Problems 2 and 4 will be graded. There are 55 points on these problems. Note: You must do all
the problems, even the non-graded ones. If you do not do some of them, half as many points as
they are worth will be subtracted from your score on the graded problems.
Problem 1. (15 points) You have designed a system with split instruction and data caches. The
data cache uses writeback, and 50% of all replaced blocks are dirty. The miss rate for instructions
is 1%, and the miss rate for data is 2%. Your measurements have shown that 15% of instructions
are loads and 5% are stores, and that the average CPI (of all instructions) is 1.3 assuming no
memory stalls (a "perfect" memory system). If the miss penalty for reads and writes to main
memory is 25 cycles, what is the overall CPI including memory stalls?
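
One way to set up the Problem 1 calculation is sketched below in Python. It is an illustration under stated assumptions, not the course's official solution: it assumes every instruction makes exactly one instruction-cache access, that loads and stores are the only data references, that the data cache is write-allocate, and that replacing a dirty block adds one full miss penalty for the writeback (no write buffer). The function and parameter names are hypothetical.

```python
def cpi_with_memory_stalls(base_cpi, inst_miss_rate, data_miss_rate,
                           data_refs_per_inst, miss_penalty, dirty_fraction):
    """Return overall CPI = base CPI + memory stall cycles per instruction."""
    # One instruction fetch per instruction; the instruction cache is never
    # dirty, so an instruction miss costs a single miss penalty.
    inst_stalls = inst_miss_rate * miss_penalty

    # Assumption: a data miss pays one penalty to fetch the new block, plus
    # one more penalty when the replaced block is dirty and must be written back.
    data_miss_cost = miss_penalty * (1 + dirty_fraction)
    data_stalls = data_refs_per_inst * data_miss_rate * data_miss_cost

    return base_cpi + inst_stalls + data_stalls


# Values taken from the problem statement: 15% loads + 5% stores.
overall = cpi_with_memory_stalls(
    base_cpi=1.3,
    inst_miss_rate=0.01,
    data_miss_rate=0.02,
    data_refs_per_inst=0.15 + 0.05,
    miss_penalty=25,
    dirty_fraction=0.5,
)
print(f"overall CPI = {overall:.2f}")
```

A different writeback model (for example, a write buffer that hides the writeback latency) would change only the `data_miss_cost` line.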
Problem 2. (20 points) As we have seen in class, in order to perform TLB lookup at the same
time a set-associative cache is beginning to be searched, some restrictions on the length of the
displacement field are necessary. Assume we desire to do TLB lookup concurrently with cache
search, and answer the following questions.
(a) If both the (main-memory) page size and the (cache) line size are held fixed, how does an
increase in cache size (the number of lines in the cache) affect the number of lines required in
each set (the “set size”)? Justify your answer.
(b) If the cache contains 16K words, pages are 1K words long, and lines contain 16 words, what
range of set sizes will allow simultaneous TLB and cache access?
(c) Does the required set size depend on the line size? (For example, if you redid part (b)
assuming a line size of 32 words, or 64 words, etc., would your answer change?)
(d) Suppose now that the set size and line size are held fixed. How does an increase in cache
size affect the required page size? Justify your answer.
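
The restriction in Problem 2 can be explored numerically with a short Python sketch. It assumes the usual reading of the constraint: the cache is indexed with untranslated address bits, so the set-index bits plus the line-offset bits must fit inside the page displacement, which is equivalent to (number of sets) x (line size) <= page size. It also assumes all sizes are powers of two and measured in words. The function name is hypothetical.

```python
def set_size_range(cache_words, page_words, line_words):
    """Return (min_set_size, max_set_size) in lines per set.

    Constraint assumed: num_sets * line_words <= page_words, where
    num_sets = cache_words / (line_words * set_size).
    Rearranged: set_size >= cache_words / page_words.
    """
    total_lines = cache_words // line_words
    # Ceiling division; at least one line per set (direct-mapped).
    min_set = max(1, -(-cache_words // page_words))
    # The largest possible set is the whole cache (fully associative).
    max_set = total_lines
    return min_set, max_set


K = 1024
for line_words in (16, 32, 64):
    lo, hi = set_size_range(16 * K, 1 * K, line_words)
    print(f"line = {line_words:2d} words: set size from {lo} to {hi} lines")
```

Under these assumptions the line size cancels out of the lower bound and affects only the upper bound, because it changes how many lines the cache holds in total.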
Problem 3. (15 points) You have measured the “ideal” CPI of a program to be 0.75
(superscalar issue allows CPIs to be below 1). However, this doesn’t include the memory system.
Assume that cache hits cost nothing and that cache misses cost 50 cycles. If instructions miss 2%
of the time and data references miss 5% of the time, what is the overall average CPI? Assume the
instruction frequencies shown below.
Instruction type   Frequency   Clock-cycle count
ALU ops            43%         1
Loads              21%         2
Stores             12%         2
Branches           24%         2
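
The memory-stall term for Problem 3 can be sketched in Python as follows. This is one possible reading, stated as an assumption: the measured 0.75 is taken as the base CPI, each instruction makes one instruction fetch, and loads and stores each make exactly one data reference. The clock-cycle column of the table is not used in this reading. The names `MIX` and `memory_stall_cpi` are hypothetical.

```python
# Instruction frequencies from the table in the problem statement.
MIX = {"ALU ops": 0.43, "Loads": 0.21, "Stores": 0.12, "Branches": 0.24}


def memory_stall_cpi(mix, inst_miss_rate, data_miss_rate, miss_penalty):
    """Stall cycles per instruction caused by cache misses."""
    # Assumption: only loads and stores reference data memory.
    data_refs_per_inst = mix["Loads"] + mix["Stores"]
    inst_stalls = inst_miss_rate * miss_penalty
    data_stalls = data_refs_per_inst * data_miss_rate * miss_penalty
    return inst_stalls + data_stalls


stalls = memory_stall_cpi(MIX, inst_miss_rate=0.02, data_miss_rate=0.05,
                          miss_penalty=50)
overall = 0.75 + stalls  # ideal CPI plus memory stalls
print(f"stall CPI = {stalls:.3f}, overall CPI = {overall:.3f}")
```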

Problem 4. (35 points) Compare 0-, 1-, 2-, and 3-address machines by writing programs to compute

g := (a + b × c) ÷ (d - e × f)

for each of the four machines. The instructions available for use are as follows:

0-address   1-address   2-address            3-address
PUSH M      LOAD M      MOV (X := Y)         MOV (X := Y)
POP M       STORE M     ADD (X := X + Y)     ADD (X := Y + Z)
ADD         ADD M       SUB (X := X - Y)     SUB (X := Y - Z)
SUB         SUB M       MUL (X := X * Y)     MUL (X := Y * Z)
MUL         MUL M       DIV (X := X / Y)     DIV (X := Y / Z)
DIV         DIV M

(a) M is a 16-bit memory address, and X, Y, and Z are either 16-bit addresses or 4-bit register numbers. The zero-address machine uses a stack, the 1-address machine uses an accumulator, and the other two have 16 registers and instructions operating on all combinations of memory locations and registers. SUB X, Y subtracts Y from X and SUB X, Y, Z subtracts Z from Y and puts the result in X. Assuming 8-bit opcodes and instruction lengths that are multiples of 4 bits, how many bits does each machine need to compute g? (All of the variables a, b, c, d, e, and f are initially stored in main memory, not registers, and g must be stored back to memory.)

(b) Show that for each machine, there is some program that it can represent more compactly than all the other three machines. That is, display one program for which the zero-address architecture “beats” all the others, one program on which the one-address architecture is best, and so forth. Show the code for these programs in all four machines. In the event that it is not possible to find different programs for which each of the machines is optimal, explain how you have arrived at this conclusion. Hint: One of the architectures cannot be optimal for a program P unless one of the other architectures “runs out of registers” in which to store temporary values generated during the execution of P. Which architectures are these? Why?
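
The bit counting asked for in part (a) can be mechanized. The Python sketch below handles only the zero-address (stack) machine and is an illustration under assumptions, not a reference solution: the program shown is one candidate encoding of g, and the stack semantics are assumed to be "second-from-top op top" for SUB and DIV. The helper names (`instruction_bits`, `run_stack`, `program_bits`) and the test values in `memory` are hypothetical.

```python
import operator

OPERAND_BITS = {"mem": 16, "reg": 4}  # widths given in the problem statement


def instruction_bits(operands, opcode_bits=8):
    """Bits for one instruction, rounded up to a multiple of 4."""
    total = opcode_bits + sum(OPERAND_BITS[kind] for kind in operands)
    return -(-total // 4) * 4


# Candidate zero-address program for g := (a + b * c) / (d - e * f).
STACK_PROGRAM = [
    ("PUSH", "a"), ("PUSH", "b"), ("PUSH", "c"), ("MUL",), ("ADD",),
    ("PUSH", "d"), ("PUSH", "e"), ("PUSH", "f"), ("MUL",), ("SUB",),
    ("DIV",), ("POP", "g"),
]


def run_stack(program, memory):
    """Execute a stack program; arithmetic pops two values and pushes one."""
    ops = {"ADD": operator.add, "SUB": operator.sub,
           "MUL": operator.mul, "DIV": operator.truediv}
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(memory[instr[1]])
        elif op == "POP":
            memory[instr[1]] = stack.pop()
        else:
            top = stack.pop()
            second = stack.pop()
            # Assumed operand order: second-from-top is the left operand.
            stack.append(ops[op](second, top))
    return memory


def program_bits(program):
    """Total size: PUSH/POP carry one memory address, arithmetic carries none."""
    return sum(instruction_bits(["mem"] * (len(instr) - 1)) for instr in program)


memory = run_stack(STACK_PROGRAM, {"a": 1, "b": 2, "c": 3, "d": 10, "e": 1, "f": 3})
print("g =", memory["g"], "| program size in bits =", program_bits(STACK_PROGRAM))
```

The same `instruction_bits` helper applies to the other three machines by passing the operand kinds of each instruction, for example `["reg", "mem"]` for a two-address instruction with one register and one memory operand.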

Problem 5. (15 points) [Hennessy & Patterson 5.13] McFarling [1989] found that the best memory-hierarchy performance occurred when it was possible to prevent some instructions from entering the cache.

(a) <5.5> Explain why McFarling’s result could be true.

(b) <5.2, 5.5> The four memory-hierarchy questions (Section 5.2) form a model for describing cache designs. Where does a cache that does not always read-allocate fit into this model?