

Problem sets from advanced microprocessor design and computer architecture courses, focusing on pipeline stalls, cache implementations, and performance analysis. Students are required to calculate the impact of pipeline stalls on average instruction times, analyze the effect of branch prediction, and determine the overall speedup of enhancements. Additionally, they must calculate the number of chips that can be produced from a wafer, determine the break-even point for selling chips, and analyze cache miss rates and speeds.
Typology: Assignments


Problems 3 and 4 will be graded. There are 55 points on these problems.

Note: You must do all the problems, even the non-graded ones. If you do not do some of them, half as many points as they are worth will be subtracted from your score on the graded problems.

Problem 1. A pipelined machine incurs average stalls (due to hazards) as given in the table below for each type of instruction:

Instruction type    Hazard type    Average stall (cycles)
Loads               Data           0.
Branches            Control        2
FP mult             Data           3
FP add/subtract     Data           1
FP div              Data           10

Assume that in a particular benchmark program the frequencies of the instructions that cause pipeline stalls are as shown below:

Instruction type    Frequency
Loads               17%
Branches            14%
FP mult             2%
FP add/subtract     7%
FP div              3%

Assume that the ideal CPI without pipeline stalls is 1.5.

(a) (3 points) How can the average stall for loads be less than one cycle?
(b) (3 points) Why might the average time per stall be different for different kinds of instructions?
(c) (8 points) How much faster is the ideal pipelined machine versus the machine with these stalls?
(d) (6 points) Suppose that branch prediction is added, with an 80% prediction accuracy. How much faster is the machine with branch prediction versus the original machine?

Problem 2. (20 points) Two enhancements with the following speedups are possible:

Speedup 1 = 25
Speedup 2 = 50

Only one enhancement is usable at a time.

(a) If Speedup 1 is used 25% of the time, what is the overall speedup?
(b) What percentage of the time must Speedup 2 be used in order to match this speedup?
(c) What is the overall speedup when both the enhancements of parts (a) and (b) are implemented, for the fraction of time determined in parts (a) and (b)?
(d) Now considering any speedup, if neither can be used for 20% of the time, what is the maximum speedup possible?

Problem 3. (25 points) Expensive Computers, Inc. just finished the design of their newest generation microprocessor for a cost of $300 million. The die size is 15 mm x 15 mm, and it will be fabricated in a new 8-inch IC fabrication plant that cost $1.2 billion to build.

(a) How many chips (dies) will fit on an 8-inch diameter wafer using the approximation that compensates for the number of dies along the edge? (Because this is only an approximation, rounding up is acceptable.)
(b) Labor and supplies cost $10,000 per wafer produced, and the yield is 60% (fraction of fully functioning chips). Assuming each chip sells for $500, how many (functioning) chips have to be sold to break even? Factor in design cost, fabrication cost, yield, and the cost for the new fabrication plant. Note that the cost of manufacturing each chip is not included in the "startup cost." However, the startup cost does include the cost of designing the microprocessor and the cost of the fabrication plant.
(c) How many wafers must be processed to get these chips?
(d) If the yield is improved from 60% to 70%, and the same number of wafers are fabricated as calculated in part (c), what is the profit for Expensive Computers, Inc.?
(e) How many chips (assuming 70% yield) must be produced to generate enough revenue to pay for the design cost of $500 million for the next generation microprocessor?

Problem 4. Two programs are run on two computers, which differ only in the processor cache implementation. Use the information in the table below to answer the following questions.

                            Computer A,   Computer A,   Computer B,   Computer B,
                            Program 1     Program 2     Program 1     Program 2
Memory References           1000          2000          1000          2000
L1 Access Time (cycles)
L2 Access Time (cycles)     8             8             8             8
L2 Miss Penalty (cycles)    200           200           200           200

(a) (8 points) What are the local miss rates for each cache level on each program? What about the global miss rates?
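
The formulas these problems rely on can be sketched in Python as a way to check hand calculations. This is a minimal sketch under stated assumptions, not a solution key: the function names are hypothetical, the speedup formula is Amdahl's law with the usage fraction taken relative to the original execution time, the die count uses the common textbook approximation (wafer area over die area, minus a circumference term for partial dies at the edge), and the miss counts in the usage lines are made-up values, since the table above does not show them.

```python
import math


def effective_cpi(ideal_cpi, stalls):
    """Ideal CPI plus the average stall cycles added per instruction.

    `stalls` maps an instruction type to (frequency, average stall cycles).
    """
    return ideal_cpi + sum(freq * cycles for freq, cycles in stalls.values())


def amdahl_speedup(fraction, speedup):
    """Amdahl's law; `fraction` is the share of ORIGINAL time that is enhanced."""
    return 1.0 / ((1.0 - fraction) + fraction / speedup)


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Textbook approximation (assumed to be the one Problem 3 intends)."""
    wafer_area = math.pi * (wafer_diameter_cm / 2.0) ** 2
    # Correction for partial dies lost around the wafer's circumference.
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2.0 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def miss_rates(references, l1_misses, l2_misses):
    """Return (local L1, local L2, global L2) miss rates.

    The global L1 miss rate equals the local L1 rate, because every
    memory reference reaches L1.
    """
    local_l1 = l1_misses / references
    local_l2 = l2_misses / l1_misses   # only L1 misses reach L2
    global_l2 = l2_misses / references
    return local_l1, local_l2, global_l2


if __name__ == "__main__":
    # Branch stalls only, using the Problem 1 values for branches.
    print(effective_cpi(1.5, {"branches": (0.14, 2)}))
    # Problem 2(a) inputs, under the original-time interpretation.
    print(amdahl_speedup(0.25, 25))
    # 8-inch wafer (8 * 2.54 cm) and a 1.5 cm x 1.5 cm die.
    print(dies_per_wafer(8 * 2.54, 1.5 * 1.5))
    # Hypothetical miss counts, not taken from the table.
    print(miss_rates(1000, 40, 10))
```

If "used 25% of the time" is instead read as a fraction of the enhanced execution time, the speedup formula changes, so it is worth stating which interpretation an answer uses.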