Computer Architecture Problem Set 2: McKinley IA64 Pipeline Analysis

A problem set, taken from the previous year's midterm, for a computer architecture course focusing on the McKinley implementation of IA64. It includes questions about data and control forwarding, pipeline diagrams, data stalls, branch costs, instruction issue rate, and cache misses. Students analyze various instructions and their execution in the pipeline, identify data stalls, and calculate average branch costs.

Uploaded on 09/17/2009

koofers-user-5df

CE202 Computer Architecture
Problem Set 2
Here’s part of last year’s midterm.
1. (80 points) The McKinley implementation of IA64 has an 8-stage core (integer)
pipeline. The phases can approximately be called Instruction Fetch (IF), Instruction
Issue 1 (IS1), IS2, IS3, Register Read (RF), Execute and L1 cache access (EXE),
Exception Detect and Branch Correction (DET), and Write Back (WB). IS1, IS2, and
IS3 are concerned with analyzing a packet of instructions for issue, and with register
renaming. Branch targets are calculated in IS1. Branches are resolved in DET. L1 cache
access, instruction or data, takes a single cycle. After the RF cycle, floating-point
instructions have stages FP1, FP2, FP3, FP4, and FWB, replacing EXE, DET, and
WB.
Consider the instructions Jump (J), Jump Register (JR), Branch, Integer, FP, Load,
and Store.
(a) (10 points) What data and control forwarding is needed? Use an IF IS1
... diagram to show several instructions executing, and label and comment on
each arrow. Assume single instruction issue.
(b) (10 points) Make a symbolic pipeline diagram by showing the IF ... RF stages
(with latches in between), and a split into two possible pipelines: EXE ... WB and
FP1 ... FWB. Draw forwarding paths between the stages for data and control
forwarding.
(c) (10 points) What data stalls remain? Provide a neat list of the form “Instructions
X, Y, and Z with result R1 followed by W or X with source R1: 2 intervening
clock cycles.” (Or more compactly, {X,Y,Z} followed by {W,X}: 2.)
(d) (10 points) Control hazards. Assume 10% of instructions are branches, 2% are
jumps and calls, and 2% are returns (jr). The McKinley has a Branch Target
Buffer (BTB) and a Branch Prediction Table (BPT). The BPT is 90% accurate.
Assume that both correctly and incorrectly predicted branches are 60% taken.
The Branch Target Buffer has a hit rate of 70%, regardless of the accuracy of
the prediction. What are the costs in clocks for the various branch possibilities
(i.e., presence or absence in BTB, correct or incorrect BPT, taken or not taken)?
What is the average cost in clocks for a control statement?
(e) (10 points) McKinley tries to issue 6 instructions per clock cycle (ideal IPC = 6).
What is the reduction in IPC considering only control hazards? Next, assume
that of the remaining 86% of instructions, only 50% of the peak issue rates can
be achieved (roughly based on Figure 4.57) due to structural and data hazard
considerations. What is the overall IPC of the machine?
(f) (10 points) If a cache miss happens, an additional 8 cycles are required to access
the on-chip 256K L2 cache. The L1 caches are 16K. If L2 fits all memory necessary
for a program, and the instruction cache miss rate is 0.4%, and the data cache miss
rate is 6% (estimated from Table 5.7 for a 16K cache), what is IPC considering cache
misses? Assume that instruction cache misses fully stall the pipeline, but that there
are sufficient reservation stations so that other instructions may execute during a
data cache miss.
(g) (10 points) Consider the merging of DET and WB. What would the implications
of this be? What issues would come up in deciding whether or not to merge the
two stages?
(h) (10 points) Why is instruction issue rate insufficient for comparing different
processors?
(i) (Extra credit) Design a VLSI layout for this architecture.
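
For parts (d) and (e), the bookkeeping can be separated from the pipeline reasoning. The Python sketch below is one possible way to organize it, not a solution: it weights each of the eight branch cases by the rates given in part (d) and converts stall clocks into an IPC figure. The per-case penalties are placeholders that you must derive from the pipeline yourself, and the model CPI = 1/(ideal IPC) + stall clocks per instruction is an assumption about how to read part (e). All function and variable names are made up for illustration.

```python
from itertools import product

# Rates stated in part (d).
P_BTB_HIT = 0.70   # branch is present in the Branch Target Buffer
P_CORRECT = 0.90   # Branch Prediction Table predicts correctly
P_TAKEN = 0.60     # branch is taken (same for correct and incorrect predictions)


def case_probability(btb_hit, correct, taken):
    """Probability of one (BTB, BPT, direction) case.

    The problem states that the three rates do not depend on each other,
    so the case probability is their product.
    """
    p = P_BTB_HIT if btb_hit else 1 - P_BTB_HIT
    p *= P_CORRECT if correct else 1 - P_CORRECT
    p *= P_TAKEN if taken else 1 - P_TAKEN
    return p


def expected_branch_cost(costs):
    """Weighted average cost in clocks; `costs` maps each case tuple to a penalty."""
    return sum(case_probability(*case) * costs[case]
               for case in product([True, False], repeat=3))


def ipc_with_stalls(ideal_ipc, stall_clocks_per_instruction):
    """IPC under the ASSUMED model CPI = 1/ideal_ipc + stall clocks per instruction."""
    return 1.0 / (1.0 / ideal_ipc + stall_clocks_per_instruction)


# Placeholder penalties (NOT derived from the pipeline): one clock for every case.
# Because the eight case probabilities sum to 1, the result is about 1.0.
placeholder_costs = {case: 1.0 for case in product([True, False], repeat=3)}
print(expected_branch_cost(placeholder_costs))
```

Once the real penalties are filled in, the stall clocks per instruction would be the fraction of instructions of each control type times its average cost, which can then be passed to `ipc_with_stalls` with an ideal IPC of 6.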

2. (10 points) Dynamic instruction scheduling

Consider a Tomasulo pipeline (similar to the PowerPC 620 in the text) with stages IF, ID, IS (the process of moving instructions to reservation stations), EX (1 cycle integer, 2-cycle LSU, 3-cycle FP mult or add), and WB (commit). There is one integer unit with two reservation stations (I1, I2), one FP unit with two reservation stations (F1, F2), one Load/Store unit with two reservation stations (LS1, LS2), and one Branch unit with two reservation stations (B1, B2) that takes 1 cycle. Indicate the clock cycle in which each of the following instructions would be in the various stages. Assume 1 instruction issue per cycle and separate FP and integer result buses. In the RS# column, indicate the reservation station slot (e.g., F1, I2, B1) that was used. For an instruction already in a reservation station, EX1 can commence during the same cycle as the common data bus write in WB. EX2 and EX3 are not used for all instructions; leave them blank for those that do not need them. A new instruction can be loaded into a reservation station when the old one is in WB, that is, WB and IS can overlap.
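
As a starting point for filling in the table, the following Python sketch lays out the stage cycles for an instruction that never waits. It uses only the latencies stated above. It assumes EX1 begins the cycle after IS, and it deliberately ignores operand dependences, busy reservation stations, and result bus conflicts, which are exactly the cases the problem asks you to work out by hand. The function name and the unit labels are hypothetical.

```python
# Execution latencies in cycles, as stated in the problem.
EX_LATENCY = {"int": 1, "load_store": 2, "fp": 3, "branch": 1}


def unstalled_schedule(kind, fetch_cycle):
    """Clock cycle of each stage for an instruction that never stalls.

    ASSUMPTION: EX1 starts the cycle after IS, and WB follows the last EX cycle.
    """
    row = {"IF": fetch_cycle, "ID": fetch_cycle + 1, "IS": fetch_cycle + 2}
    cycle = fetch_cycle + 3
    for i in range(1, EX_LATENCY[kind] + 1):
        row[f"EX{i}"] = cycle  # EX2 and EX3 exist only for multi-cycle units
        cycle += 1
    row["WB"] = cycle
    return row


print(unstalled_schedule("fp", 1))
# {'IF': 1, 'ID': 2, 'IS': 3, 'EX1': 4, 'EX2': 5, 'EX3': 6, 'WB': 7}
```

Any real entry in the table is this baseline pushed later by whatever the instruction has to wait for.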