Microprocessor Design: Pipeline Stalls, Cache Implementations, and Performance Analysis, Assignments of Electrical and Electronics Engineering

This problem set from advanced microprocessor design and computer architecture courses covers pipeline stalls, cache implementations, and performance analysis. Students calculate the impact of pipeline stalls on average instruction time, analyze the effect of branch prediction, and determine the overall speedup of enhancements. They also calculate the number of chips that can be produced from a wafer, determine the break-even point for selling chips, and analyze cache miss rates and access times.

Uploaded on 03/18/2009

ECE 463: Advanced Microprocessor Design
ECE 521: Computer Design and Technology
Problem Set 1
Monday, January 31, 2006

Problems 3 and 4 will be graded. There are 55 points on these problems. Note: You must do all the problems, even the non-graded ones. If you do not do some of them, half as many points as they are worth will be subtracted from your score on the graded problems.

Problem 1. A pipelined machine incurs average stalls (due to hazards) as given in the table below for each of the instructions:

Instruction type    Hazard type    Average stall (cycles)
Loads               Data           0.5
Branches            Control        2
FP mult             Data           3
FP add/subtract     Data           1
FP div              Data           10

Assume that in a particular benchmark program the frequencies of the instructions that cause pipeline stalls are as shown below:

Instruction type    Frequency
Loads               17%
Branches            14%
FP mult             2%
FP add/subtract     7%
FP div              3%

Assume that the ideal CPI without pipeline stalls is 1.5.

(a) (3 points) How can the average time for loads be less than one cycle?
(b) (3 points) Why might the average time per stall be different for different kinds of instructions?
(c) (8 points) How much faster is the ideal pipelined machine versus the machine with these stalls?
(d) (6 points) Suppose that branch prediction is added, with an 80% prediction accuracy. How much faster is the machine with branch prediction versus the original machine?
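
The arithmetic behind parts (c) and (d) can be sketched as follows. This is a minimal sketch, not an official solution. It assumes that stall cycles simply add to the ideal CPI, weighted by instruction frequency, and that clock rate and instruction count are unchanged, so speedup is a ratio of CPIs. For part (d) it further assumes that a correctly predicted branch incurs no stall while a mispredicted branch still pays the full 2 cycles; the problem statement does not say this. The names STALL_CYCLES, FREQUENCY, effective_cpi, and with_branch_prediction are illustrative only.

```python
# Average stall cycles per instruction type (first table of Problem 1).
STALL_CYCLES = {
    "load": 0.5,
    "branch": 2.0,
    "fp_mult": 3.0,
    "fp_addsub": 1.0,
    "fp_div": 10.0,
}

# Fraction of executed instructions of each stalling type (second table).
FREQUENCY = {
    "load": 0.17,
    "branch": 0.14,
    "fp_mult": 0.02,
    "fp_addsub": 0.07,
    "fp_div": 0.03,
}

IDEAL_CPI = 1.5  # given in the problem statement


def effective_cpi(stalls, freq, ideal_cpi=IDEAL_CPI):
    """Ideal CPI plus the frequency-weighted average stall cycles."""
    return ideal_cpi + sum(freq[kind] * stalls[kind] for kind in freq)


def with_branch_prediction(stalls, accuracy):
    """Return a copy of the stall table with the branch stall scaled.

    Assumption (not stated in the problem): only mispredicted branches
    stall, and they pay the full original branch penalty.
    """
    updated = dict(stalls)
    updated["branch"] = stalls["branch"] * (1.0 - accuracy)
    return updated


cpi_stalled = effective_cpi(STALL_CYCLES, FREQUENCY)
cpi_predicted = effective_cpi(with_branch_prediction(STALL_CYCLES, 0.80), FREQUENCY)

# Same clock and instruction count assumed, so speedup = CPI ratio.
print(f"CPI with stalls:            {cpi_stalled:.3f}")
print(f"Ideal vs. stalled speedup:  {cpi_stalled / IDEAL_CPI:.3f}")
print(f"CPI with branch prediction: {cpi_predicted:.3f}")
print(f"Prediction vs. original:    {cpi_stalled / cpi_predicted:.3f}")
```

Under these assumptions the effective CPI is about 2.295 with stalls and about 2.071 with branch prediction. A different misprediction model would change the second figure.
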

Problem 2. (20 points) Two enhancements with the following speedups are possible:

Speedup 1 = 25
Speedup 2 = 50

Only one enhancement is usable at a time.
(a) If Speedup 1 is used 25% of the time, what is the overall speedup?

(b) What percentage of the time must Speedup 2 be used in order to match this speedup?
(c) What is the overall speedup when both the enhancements of parts (a) and (b) are implemented, for the fraction of time determined in parts (a) and (b)?
(d) Now considering any speedup, if neither can be used for 20% of the time, what is the maximum speedup possible?

Problem 3. (25 points) Expensive Computers, Inc. just finished the design of their newest generation microprocessor for a cost of $300 million. The die size is 15x15 mm^2, and it will be fabricated in a new 8” IC fabrication plant that cost $1.2 billion to build.
(a) How many chips (dies) will fit on an 8” diameter wafer using the approximation that compensates for the number of dies along the edge? (Because this is only an approximation, rounding up is acceptable.)
(b) Labor and supplies cost $10,000 per wafer produced, and the yield is 60% (fraction of fully functioning chips). Assuming each chip sells for $500, how many (functioning) chips have to be sold to break even? Factor in design cost, fabrication cost, yield, and the cost for the new fabrication plant. Note that cost of manufacturing each chip is not included in the “startup cost.” However it includes the cost for designing the microprocessor and fabrication plant.
(c) How many wafers must be processed to get these chips?
(d) If the yield is improved from 60% to 70%, and the same number of wafers are fabricated as calculated in part (c), what is the profit for Expensive Computers, Inc.?
(e) How many chips (assuming 70% yield) must be produced to generate enough revenue to pay for the design cost of $500 million for the next generation microprocessor?

Problem 4. Two programs are run on two computers, which differ only in the processor cache implementation. Use the information in the table below to answer the following questions.
                            Computer A   Computer A   Computer B   Computer B
                            Program 1    Program 2    Program 1    Program 2
Memory References           1000         2000         1000         2000
L1 Access Time (cycles)
L1 Misses                   75           150          20           50
L2 Access Time (cycles)     8            8            8            8
L2 Misses                   11           18           14           23
L2 Miss Penalty (cycles)    200          200          200          200

(a) (8 points) What are the local miss rates for each cache level on each program? What about the global miss rates?
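
The miss-rate definitions used in part (a) can be sketched as follows. This is a minimal sketch, not an official solution. It uses the usual definitions: a local miss rate is the misses in a cache divided by the accesses that reach that cache, and a global miss rate is the misses in a cache divided by all memory references. It assumes every L1 miss produces exactly one L2 access, which the problem does not state explicitly. The class CacheRun and the dictionary RUNS are illustrative names; the counts are taken from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheRun:
    """Reference and miss counts for one program on one computer."""
    memory_refs: int
    l1_misses: int
    l2_misses: int

    def l1_miss_rate(self) -> float:
        # Every reference reaches L1, so local and global rates coincide.
        return self.l1_misses / self.memory_refs

    def l2_local_miss_rate(self) -> float:
        # Assumption: L2 is accessed once per L1 miss.
        return self.l2_misses / self.l1_misses

    def l2_global_miss_rate(self) -> float:
        return self.l2_misses / self.memory_refs


# Keys are (computer, program); values come from the Problem 4 table.
RUNS = {
    ("A", 1): CacheRun(memory_refs=1000, l1_misses=75, l2_misses=11),
    ("A", 2): CacheRun(memory_refs=2000, l1_misses=150, l2_misses=18),
    ("B", 1): CacheRun(memory_refs=1000, l1_misses=20, l2_misses=14),
    ("B", 2): CacheRun(memory_refs=2000, l1_misses=50, l2_misses=23),
}

for (computer, program), run in RUNS.items():
    print(
        f"Computer {computer}, Program {program}: "
        f"L1 {run.l1_miss_rate():.2%}, "
        f"L2 local {run.l2_local_miss_rate():.2%}, "
        f"L2 global {run.l2_global_miss_rate():.2%}"
    )
```

Note that the L2 global miss rate equals the L1 miss rate multiplied by the L2 local miss rate, which is a quick consistency check on hand calculations.
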