

Last year's midterm problem set for a computer architecture course, focusing on the McKinley implementation of IA-64. The problem set covers data and control forwarding, pipeline diagrams, data stalls, branch costs, instruction issue rate, and cache misses. Students analyze how instructions execute in the pipeline, identify the remaining data stalls, and calculate average branch costs.
Typology: Assignments


Problem Set 2
Here’s part of last year’s midterm.
(a) (10 points) What data and control forwarding is needed? Use an IF1 IS ... diagram to show several instructions executing, and label and comment on each arrow. Assume single instruction issue.
(b) (10 points) Make a symbolic pipeline diagram by showing the IF... RF stages (with latches in between), and a split into two possible pipelines: EX... WB and FP1... FWB. Draw forwarding paths between the stages for data and control forwarding.
(c) (10 points) What data stalls remain? Provide a neat list of the form “Instructions X, Y, and Z with result R1 followed by W or X with source R1: 2 intervening clock cycles.” (Or more compactly, {X,Y,Z} followed by {W,X}: 2.)
(d) (10 points) Control hazards. Assume 10% of instructions are branches, 2% are jumps and calls, and 2% are returns (jr). The McKinley has a Branch Target Buffer (BTB) and a Branch Prediction Table (BPT). The BPT is 90% accurate. Assume that both correctly and incorrectly predicted branches are 60% taken. The Branch Target Buffer has a hit rate of 70%, regardless of the accuracy of the prediction. What are the costs in clocks for the various branch possibilities (i.e., presence or absence in the BTB, correct or incorrect BPT prediction, taken or not taken)? What is the average cost in clocks for a control statement?
(e) (10 points) McKinley tries to issue 6 instructions per clock cycle (ideal IPC = 6). What is the reduction in IPC considering only control hazards? Next, assume that for the remaining 86% of instructions, only 50% of the peak issue rate can be achieved (roughly based on Figure 4.57) due to structural and data hazard considerations. What is the overall IPC of the machine?
(f) (10 points) If a cache miss happens, an additional 8 cycles are required to access the on-chip 256K L2 cache. The L1 caches are 16K. If L2 fits all memory necessary for a program, the instruction cache miss rate is 0.4%, and the data cache miss rate is 6% (estimated from Table 5.7 for a 16K cache), what is the IPC considering cache misses? Assume that instruction cache misses fully stall the pipeline, but that there are sufficient reservation stations so that other instructions may execute during a data cache miss.
(g) (10 points) Consider the merging of DET and WB. What would the implications of this be? What issues would come up in deciding whether or not to merge the two stages?
(h) (10 points) Why is instruction issue rate insufficient for comparing different processors?
(i) (Extra credit) Design a VLSI layout for this architecture.
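For part (d), the three stated rates (BTB hit 70%, BPT correct 90%, taken 60%) are described as independent of one another, so the eight branch cases can be weighted by simple products. The sketch below only sets up that bookkeeping. The function names and the cycle penalties in `placeholder_penalty` are hypothetical stand-ins, not McKinley values; the actual penalties have to come from the pipeline analysis in parts (a) through (c).

```python
from itertools import product

# Rates stated in part (d); treated as independent, as the problem says.
P_BTB_HIT = 0.70
P_BPT_CORRECT = 0.90
P_TAKEN = 0.60


def branch_case_probabilities():
    """Map (btb_hit, predicted_correctly, taken) to its probability."""
    cases = {}
    for btb_hit, correct, taken in product((True, False), repeat=3):
        p = P_BTB_HIT if btb_hit else 1.0 - P_BTB_HIT
        p *= P_BPT_CORRECT if correct else 1.0 - P_BPT_CORRECT
        p *= P_TAKEN if taken else 1.0 - P_TAKEN
        cases[(btb_hit, correct, taken)] = p
    return cases


def average_branch_cost(penalty):
    """Probability-weighted average of penalty(btb_hit, correct, taken)."""
    return sum(p * penalty(*case)
               for case, p in branch_case_probabilities().items())


def placeholder_penalty(btb_hit, correct, taken):
    """HYPOTHETICAL stall cycles per case; replace with your own analysis."""
    if not correct:
        return 5  # assumed misprediction flush cost, not a McKinley figure
    if taken and not btb_hit:
        return 2  # assumed cost of computing the target without the BTB
    return 0


if __name__ == "__main__":
    for case, p in sorted(branch_case_probabilities().items(), reverse=True):
        print(case, round(p, 4))
    print("average cost:", average_branch_cost(placeholder_penalty))
```

Swapping in a different `penalty` function is enough to redo the average once the per-case costs are known. Jumps, calls, and returns would need their own cases and weights.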
Consider a Tomasulo pipeline (similar to the PowerPC 620 in the text) with stages IF, ID, IS (the process of moving instructions to reservation stations), EX (1 cycle for integer, 2 cycles for the LSU, 3 cycles for FP multiply or add), and WB (commit). There is one integer unit with two reservation stations (I1, I2), one FP unit with two reservation stations (F1, F2), one Load/Store unit with two reservation stations (LS1, LS2), and one Branch unit with two reservation stations (B1, B2) that takes 1 cycle. Indicate the clock cycle in which each of the following instructions would be in the various stages. Assume 1 instruction issue per cycle and separate FP and integer result buses. In the RS# column, indicate the reservation station slot (e.g., F1, I2, B1) that was used. For an instruction already in a reservation station, EX1 can commence during the same cycle as the common data bus write in WB. EX2 and EX3 are not used for all instructions; leave them blank for those that do not need them. A new instruction can be loaded into a reservation station when the old one is in WB; that is, WB and IS can overlap.
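As a starting point for filling in the table, the sketch below computes the stage cycles for one instruction in the best case only. It assumes no operand waits, a free reservation station, and no result-bus conflicts, so EX1 follows IS immediately. The unit labels and the `ideal_schedule` helper are hypothetical names for illustration. Real answers shift later wherever a dependence or a full reservation station delays an instruction.

```python
# EX latencies in cycles, taken from the problem statement.
EX_LATENCY = {"int": 1, "lsu": 2, "fp": 3, "branch": 1}


def ideal_schedule(unit, fetch_cycle):
    """Stage -> clock cycle for one instruction, assuming no stalls at all."""
    latency = EX_LATENCY[unit]
    schedule = {
        "IF": fetch_cycle,
        "ID": fetch_cycle + 1,
        "IS": fetch_cycle + 2,
    }
    # Only the EX stages the unit actually needs are listed; the rest of
    # the EX1..EX3 columns stay blank, as the problem asks.
    for k in range(1, latency + 1):
        schedule[f"EX{k}"] = fetch_cycle + 2 + k
    schedule["WB"] = fetch_cycle + 3 + latency
    return schedule


if __name__ == "__main__":
    # With one instruction issued per cycle, instruction i is fetched in cycle i.
    for i, unit in enumerate(["fp", "int", "lsu"], start=1):
        print(i, unit, ideal_schedule(unit, i))
```

Comparing each row of a worked answer against this best case makes the stall cycles easy to spot: any stage later than the ideal one needs a dependence or resource conflict to explain it.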