

















Your Name: ______________________________________________________________
Your TA: Andrew Michael Conor Charles
Login: cs61c-___

This exam is worth 186 points, or about 20% of your total course grade. The exam contains 10 questions. This booklet contains 14 numbered pages including the cover page, plus 2 pages for the green card. Put all answers on these pages, please; don't hand in stray pieces of paper.
3. Mo’ Cache, Mo’ Problems

a) A naive hardware developer designs the cache protocol for a 2-CPU system. Each CPU has a 2-block direct-mapped cache with 1-byte blocks. Memory addresses are 32 bits (byte-addressed). The cache policies are write-allocate and write-through. Assume that both caches start with all blocks invalid. Which of the following access patterns, each executed independently of the others, will always yield correctly updated results? Assume that each read or write finishes completely within its time period.

      CPU   Time 1      Time 2      Time 3      Correct?
i.    1     Write 0x0   ---         Read 0x
      2     ---         Write 0x0   ---
ii.   1     Write 0x0   ---         Read 0x
      2     ---         Write 0x1   ---
iii.  1     Write 0x0   ---         Read 0x
      2     ---         Write 0x2   ---
iv.   1     Read 0x0    ---         Read 0x
      2     ---         Write 0x2   ---
v.    1     Read 0x0    ---         Read 0x
      2     ---         Write 0x0   ---

b) What feature could you add to this cache protocol to make it work in all cases? Be specific; your revision should describe a particular behavior.
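To experiment with access patterns like these, here is a minimal Python sketch of the two-CPU system as described: each CPU has a private 2-block direct-mapped cache with 1-byte blocks, the policies are write-allocate and write-through, and there is no coherence mechanism. The class names and the data values written are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

NUM_BLOCKS = 2  # 2-block direct-mapped cache with 1-byte blocks


@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: int = 0


class Cache:
    """Per-CPU direct-mapped cache; 1-byte blocks mean there are no offset bits."""

    def __init__(self):
        self.lines = [Line() for _ in range(NUM_BLOCKS)]

    def lookup(self, addr):
        # The low address bit selects the block; the remaining bits form the tag.
        line = self.lines[addr % NUM_BLOCKS]
        if line.valid and line.tag == addr // NUM_BLOCKS:
            return line
        return None

    def fill(self, addr, value):
        self.lines[addr % NUM_BLOCKS] = Line(True, addr // NUM_BLOCKS, value)


class TwoCpuSystem:
    """Two private caches over one shared memory, with no coherence mechanism."""

    def __init__(self):
        self.memory = {}
        self.caches = {1: Cache(), 2: Cache()}

    def write(self, cpu, addr, value):
        self.caches[cpu].fill(addr, value)  # write-allocate: update this CPU's cache
        self.memory[addr] = value           # write-through: update memory as well

    def read(self, cpu, addr):
        line = self.caches[cpu].lookup(addr)
        if line is not None:
            return line.data                # hit: the cached copy may be stale
        value = self.memory.get(addr, 0)    # miss: fetch the block from memory
        self.caches[cpu].fill(addr, value)
        return value


# Hypothetical pattern: both CPUs write 0x0, then CPU 1 reads it back.
system = TwoCpuSystem()
system.write(1, 0x0, 11)
system.write(2, 0x0, 22)
print(system.read(1, 0x0), system.memory[0x0])  # prints "11 22": CPU 1 sees a stale value
```

A read counts as correct when the value it returns equals the value most recently written to that address by either CPU.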
(vii) What is the AMAT for a one-level cache without the L2$ and L3$?
(viii) What is the average number of memory stall cycles per reference in the system of question (v)?
(ix) What is the average number of memory stall cycles per reference in the system of question (vi)?
(x) What is the average number of memory stall cycles per reference in the system of question (vii)?
(xi) What is the average number of memory stall cycles per instruction in the system of question (v)?
(xii) What is the average number of memory stall cycles per instruction in the system of question (vi)?
(xiii) What is the average number of memory stall cycles per instruction in the system of question (vii)?
(xiv) What is the performance of the system of question (vii) relative to that of question (v)? (The long-division challenged may give just the ratio.)
(xv) What is the performance of the system of question (vi) relative to that of question (v)? (The long-division challenged may give just the ratio.)
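The quantities in these questions follow from a few standard formulas, sketched here in Python under stated assumptions: AMAT = hit time + miss rate × miss penalty, applied recursively with local miss rates for multi-level caches; stall cycles per reference = AMAT minus the L1 hit time (assuming L1 hits are overlapped with the pipeline); stall cycles per instruction = stall cycles per reference × memory references per instruction; and relative performance = ratio of CPIs at equal clock rate and instruction count. The parameters from questions (i) to (vi) are not shown here, so the numbers in the usage lines are hypothetical.

```python
def amat(hit_times, miss_rates, memory_penalty):
    """AMAT in cycles; levels are ordered L1 first, miss_rates are local miss rates."""
    time = memory_penalty
    # Work outward from the last level: each level's miss penalty is the
    # access time of the level below it.
    for hit_time, miss_rate in zip(reversed(hit_times), reversed(miss_rates)):
        time = hit_time + miss_rate * time
    return time


def stalls_per_reference(hit_times, miss_rates, memory_penalty):
    # Assumes the L1 hit time is hidden in the pipeline, so only the time
    # beyond an L1 hit counts as stall cycles.
    return amat(hit_times, miss_rates, memory_penalty) - hit_times[0]


def stalls_per_instruction(stalls_per_ref, refs_per_instruction):
    return stalls_per_ref * refs_per_instruction


def relative_performance(base_cpi, stalls_slow, stalls_fast):
    """How many times faster the 'fast' system is, at equal clock rate and instruction count."""
    return (base_cpi + stalls_slow) / (base_cpi + stalls_fast)


# Hypothetical parameters (not the exam's): L1 hit 1 cycle, 10% miss rate;
# L2 hit 10 cycles, 50% local miss rate; memory 100 cycles; 1.3 refs/instruction.
one_level = stalls_per_instruction(stalls_per_reference([1], [0.1], 100), 1.3)
two_level = stalls_per_instruction(stalls_per_reference([1, 10], [0.1, 0.5], 100), 1.3)
print(relative_performance(1.0, one_level, two_level))
```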
5. Heaven’s Gate

What is the truth table for the following circuit? (It might be simpler to first write the equation.)

A B C | Out
0 0 0 |
0 0 1 |
0 1 0 |
0 1 1 |
1 0 0 |
1 0 1 |
1 1 0 |
1 1 1 |
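Once a circuit's Boolean equation is written down, its truth table can be generated mechanically by evaluating the equation on every input combination. The Python sketch below does this. Because the circuit diagram is not reproduced here, `example_circuit` (Out = (A AND NOT B) OR C) is a hypothetical stand-in, not the circuit in the question.

```python
from itertools import product


def truth_table(circuit, num_inputs=3):
    """Evaluate `circuit` on every input combination.

    Rows come out in binary counting order with the first input as the
    most significant bit, the usual A B C ordering.
    """
    return [(*bits, int(circuit(*bits))) for bits in product((0, 1), repeat=num_inputs)]


def example_circuit(a, b, c):
    # Hypothetical circuit: Out = (A AND NOT B) OR C
    return (a and not b) or c


for row in truth_table(example_circuit):
    print(*row)
```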
7. Pay It Forward

Consider the excerpt below of a 5-stage pipelined MIPS datapath.

a. Consider the following sequence of instructions:

[A] srl $zero, $zero, 0
[B] addu $t0, $t1, $t
[C] addu $t0, $t0, $t
[D] lw $s0, 0($t3)
[E] subu $t3, $s0, $t

During which of these instructions’ decode stages should ControlRS be 1 to avoid pipeline stalls? Use the labels [A], [B], and so on.

b. Which fields of which instructions from part a does the control logic need in order to compute the value of ControlRS?
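Control logic for a forwarding select such as ControlRS typically compares register fields of adjacent instructions. The Python sketch below assumes ControlRS = 1 means "take rs from the immediately preceding instruction's result instead of the register file"; since the datapath excerpt is not shown here, that meaning, the `Instr` record, and the register names in the example sequence are hypothetical. The comparison itself is the classic forwarding condition: the previous instruction writes a register, that register is not $zero, and it matches the current rs. The sketch does not model when each result becomes available (for example, a lw result exists only after the MEM stage).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instr:
    label: str
    op: str
    dest: Optional[str]  # register written (rd for R-type, rt for lw), or None
    rs: Optional[str]    # first source register field
    rt: Optional[str] = None


def rs_needs_forwarding(prev: Instr, cur: Instr) -> bool:
    # Forward when the previous instruction writes a real register
    # (writes to $zero are discarded) that the current instruction reads as rs.
    return prev.dest is not None and prev.dest != "$zero" and prev.dest == cur.rs


def forwarding_labels(seq):
    """Labels of instructions whose rs comes from the instruction just before them."""
    return [cur.label for prev, cur in zip(seq, seq[1:]) if rs_needs_forwarding(prev, cur)]


# Hypothetical sequence (not the one in the question):
seq = [
    Instr("P", "addu", dest="$s1", rs="$a0", rt="$a1"),
    Instr("Q", "subu", dest="$s2", rs="$s1", rt="$a2"),  # rs written by P
    Instr("R", "addu", dest="$s3", rs="$a3", rt="$s2"),  # depends on Q through rt only
]
print(forwarding_labels(seq))  # prints ['Q']
```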
8. Bigger, Stronger, Faster

Suppose that you are running an algorithm for various problem sizes and have obtained the data below. Sketch a weak-scaling plot of parallel code performance that shows speedup over the serial implementation. Be sure to label the Y-axis.

Problem Size   Gflop/s (serial)   Threads   Gflop/s (parallel)
100            5                  1         5
200            5                  2         10
400            5                  4         19
600            5                  6         25
800            5                  8         35
1000           5                  10        36
1200           5                  12        37
1400           5                  14        37
1600           5                  16        38

[Plot axes: "Weak Scaling of Speedup over Serial"; x-axis: Threads (1 to 22); reference line: Linear Speedup]
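The points for the plot follow directly from the table: speedup = parallel Gflop/s ÷ serial Gflop/s at the same problem size. This short Python sketch computes them from the data given, along with parallel efficiency (speedup ÷ threads), an extra quantity that helps show where scaling falls off.

```python
# (problem size, serial Gflop/s, threads, parallel Gflop/s), from the table
DATA = [
    (100, 5, 1, 5), (200, 5, 2, 10), (400, 5, 4, 19),
    (600, 5, 6, 25), (800, 5, 8, 35), (1000, 5, 10, 36),
    (1200, 5, 12, 37), (1400, 5, 14, 37), (1600, 5, 16, 38),
]


def speedups(data):
    """Return (threads, speedup over serial, parallel efficiency) rows."""
    return [(threads, par / ser, par / ser / threads) for _, ser, threads, par in data]


for threads, speedup, efficiency in speedups(DATA):
    print(f"{threads:2d} threads: speedup {speedup:.1f}x, efficiency {efficiency:.0%}")
```

Plotting speedup against threads next to the line speedup = threads gives the requested curve.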
10. Three’s Company

Consider the following datapath with an Arithmetic Logic Unit (ALU) and an eight-register register file organized around a single bus. The ALU applies operations such as add and subtract to its two input operands to generate an output result. The register file has an asynchronous read and a synchronous write. That is, as soon as Read Enable (RE) is asserted, the register file selects the indicated 32-bit register and presents its value on Data Out (DO). Write Enable (WE), on the other hand, is sampled only on the rising edge of the clock: the indicated register is written from the Data In (DI) lines only on a rising edge at which WE is asserted. The ALU and the register file share the bus via a 32-bit-wide 2:1 multiplexer. When SelALU is 1, the ALU path is connected to the bus; otherwise, the register file path is connected to the bus. The datapath must support three-address instructions of the form Rz Rx
Using as few of the A/B/C registers and as few clock cycles as possible, what is the minimum number of each needed to implement the register transfer for three-address instructions? (Circle one for each.)

Registers: 1 2 3
Clock Cycles: 1 2 3

For your answer, on the previous page cross out the registers you don’t need, and fill in the outline of the registers that you do. For each clock cycle that your answer above requires, write in the space below the control signals that must be asserted to implement the register transfers for three-address instructions:

Clock Cycle 1:
Clock Cycle 2:
Clock Cycle 3:
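The register-file timing described for this datapath (asynchronous read, synchronous write) and the SelALU multiplexer can be modeled in a few lines of Python. This is a sketch under assumptions: the class and function names are hypothetical, the ALU is reduced to a caller-supplied output value, and the A/B/C registers from the figure are not modeled.

```python
MASK32 = 0xFFFFFFFF


class RegisterFile:
    """Eight 32-bit registers with asynchronous read and synchronous write."""

    def __init__(self, num_regs=8):
        self.regs = [0] * num_regs

    def data_out(self, re, addr):
        # Asynchronous read: DO shows the selected register as soon as RE is
        # asserted, with no clock edge needed. None stands for "not driven".
        return self.regs[addr] if re else None

    def rising_edge(self, we, addr, di):
        # Synchronous write: DI is captured only on a rising edge with WE asserted.
        if we:
            self.regs[addr] = di & MASK32


def bus(sel_alu, alu_out, rf_out):
    """32-bit 2:1 multiplexer: SelALU = 1 drives the bus from the ALU."""
    return alu_out if sel_alu else rf_out


rf = RegisterFile()
rf.rising_edge(we=True, addr=3, di=42)  # R3 <- 42 on this clock edge
value = bus(sel_alu=0, alu_out=None, rf_out=rf.data_out(re=True, addr=3))
print(value)  # prints 42
```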