







Exam Final Practice Questions and their solutions
Typology: Exercises








Single-Cycle Datapath / Multi-Cycle Datapath: Adding Instructions
Modify the datapath and control signals to perform the new instructions in the corresponding datapath.
Use the minimal amount of additional hardware and clock cycles/control states.
Remember:
When adding new instructions, don't break the operation of the standard ones.
Avoid adding ALUs, adders, register files, or memories to the datapath.
You can add MUXes, logic gates, etc., but keep them to a minimum (they cost area, cycle time, etc.).

a. Load Word Register (uses R instruction format)
   lwr Rt, Rd(Rs)             # Reg[Rt] = Mem[Reg[Rd] + Reg[Rs]]
b. Add 3 operands (new instruction format: opcode(6), rs(5), rt(5), rd(5), rx(5), (6 bits not used))
   add3 Rd, Rs, Rt, Rx        # Reg[Rd] = Reg[Rs] + Reg[Rt] + Reg[Rx]
c. Add to Memory (new instruction format: opcode(6), rs(5), rt(5), rd(5), offset(11))
   addm Rd, Rt, Offset(Rs)    # Reg[Rd] = Reg[Rt] + Mem[sign-extended offset + Reg[Rs]]
d. Branch on Less Than or Equal to Zero (uses I instruction format)
   blez Rs, label             # if Reg[Rs] <= 0, PC = PC + 4 + (sign-extended offset << 2)
e. Branch Equal to Memory (new instruction format: opcode(6), rs(5), rt(5), rd(5), offset(11))
   beqm Rd, Rt, Offset(Rs)    # if Reg[Rt] = Mem[Offset + Reg[Rs]], PC = PC + 4 + Reg[Rd]
f. Branch Equal to 0 to Immediate (uses R instruction format)
   beqzi (Rs), Label          # if Mem[Reg[Rs]] = 0, then PC = PC + (sign-extended offset)
                              # (NOTE: this is not PC + 4, and the offset is not shifted by 2)
g. Store Word and Increment
   swinc Rt, offset(Rs)       # Mem[Reg[Rs] + sign-extended offset] = Reg[Rt], Reg[Rs] = Reg[Rs] + 4
h. Store Word and Decrement
   swdec Rt, offset(Rs)       # Mem[Reg[Rs] + sign-extended offset] = Reg[Rt], Reg[Rs] = Reg[Rs] - 4
What if you were to add (g) and (h) simultaneously to the datapaths?
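One way to approach adding (g) and (h) together is to notice that the two instructions differ only in the sign of the register update. The sketch below models that at the register-transfer level in Python. The function names and the dictionary-based register file and memory are illustrative assumptions, not part of the exercise, and the sketch says nothing about which datapath hardware performs the update.

```python
def store_and_step(regs, mem, rs, rt, offset, step):
    """Model Mem[Reg[rs] + offset] = Reg[rt]; Reg[rs] = Reg[rs] + step."""
    addr = regs[rs] + offset      # address uses the old value of Reg[rs]
    mem[addr] = regs[rt]          # store happens before the base register changes
    regs[rs] = regs[rs] + step    # post-update of the base register


def swinc(regs, mem, rs, rt, offset):
    store_and_step(regs, mem, rs, rt, offset, +4)


def swdec(regs, mem, rs, rt, offset):
    store_and_step(regs, mem, rs, rt, offset, -4)


# Hypothetical register numbers and values, chosen only for illustration.
regs = {1: 100, 2: 7}
mem = {}
swinc(regs, mem, rs=1, rt=2, offset=8)   # stores at 108, then Reg[1] becomes 104
swdec(regs, mem, rs=1, rt=2, offset=8)   # stores at 112, then Reg[1] returns to 100
```

Because both instructions share everything except the constant, a single extra selection between +4 and -4 is enough in this model.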
Datapath Timing
Memory: 200 ps
Register file access (read/write): 50 ps
ALU and adders: 100 ps
Logic gates and multiplexors: 1 ps
All other times are negligible.
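The delays above can be summed along each instruction's path to estimate a single-cycle clock period. The sketch below does this in Python. The path for each instruction class is an assumption based on the usual textbook single-cycle MIPS datapath (the actual figure is not reproduced here), and the 1 ps MUX delays are left out for simplicity.

```python
# Component delays in picoseconds, taken from the timing list above.
MEM, REG, ALU = 200, 50, 100

# Assumed critical paths (textbook single-cycle MIPS; MUX delays ignored).
PATHS = {
    "R-type": [MEM, REG, ALU, REG],       # fetch, register read, ALU, register write
    "lw":     [MEM, REG, ALU, MEM, REG],  # fetch, register read, ALU, data memory, register write
    "sw":     [MEM, REG, ALU, MEM],       # fetch, register read, ALU, data memory
    "beq":    [MEM, REG, ALU],            # fetch, register read, ALU compare
}

# Total delay per instruction class.
delays = {name: sum(stages) for name, stages in PATHS.items()}

# A single-cycle clock must accommodate the slowest instruction.
cycle_time = max(delays.values())

for name, d in delays.items():
    print(f"{name}: {d} ps")
print(f"single-cycle clock period: {cycle_time} ps")
```

Under these assumptions the load path dominates, so any new instruction that chains a memory access with a second ALU operation or a second memory access would lengthen the single-cycle clock.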
Other Datapath Questions
Given MIPS code, can you determine:
What is happening at clock cycle X in the single-cycle datapath? Or in what cycle is operation X happening?
What is happening at clock cycle X in the multi-cycle datapath? Or in what cycle is operation X happening?
How many cycles will it take to execute the code?
Can you identify the signals (control and values) in the datapath for a given clock cycle?
And other questions of this nature.
Short Answer Misc Questions
Throughput = 1/latency when a system executes one task at a time, e.g. in a single-cycle or multi-cycle datapath.
Average CPI (old) = 0.40 * 1 + 0.30 * 5 + 0.30 * 2 = 2.5
Average CPI (enhanced) = 0.40 * 1 + 0.30 * 3 + 0.30 * 2 = 1.9
Speedup (computation) = 2.5 / 1.9 = 1.315
Speedup = 1 / ((1 - 0.82) + 0.82 / 1.315) = 1.24
b. Which modification gave the best speedup?
Modification (i) provides the best speedup.
c. For the two modifications in part (i) that did not result in the best speedup, is it possible for them to achieve the speedup achieved by the modification in part (ii)? Show your work and explain your answer.
i. An infinitely fast disk: Speedup = 1 / (1 - 0.18) = 1.22, which is still slower than i.
ii. If FP takes only 1 clock cycle:
Average CPI (enhanced) = 0.40 * 1 + 0.30 * 1 + 0.30 * 2 = 1.3
Speedup (computation) = 2.5 / 1.3 = 1.92
Speedup = 1 / ((1 - 0.82) + 0.82 / 1.92) = 1.65
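The speedup figures in this problem all come from a weighted-average CPI followed by Amdahl's law. The Python sketch below reproduces them. The helper names `average_cpi` and `amdahl` are illustrative; the instruction mix, the cycle counts, and the 0.82 / 0.18 split are the values used in the solution above.

```python
def average_cpi(mix):
    """Weighted average CPI from (fraction, cycles) pairs."""
    return sum(fraction * cycles for fraction, cycles in mix)


def amdahl(fraction, local_speedup):
    """Overall speedup when `fraction` of the time is sped up by `local_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


cpi_old = average_cpi([(0.40, 1), (0.30, 5), (0.30, 2)])   # FP takes 5 cycles
cpi_fp3 = average_cpi([(0.40, 1), (0.30, 3), (0.30, 2)])   # FP takes 3 cycles
cpi_fp1 = average_cpi([(0.40, 1), (0.30, 1), (0.30, 2)])   # FP takes 1 cycle

speedup_fp3 = amdahl(0.82, cpi_old / cpi_fp3)
speedup_fp1 = amdahl(0.82, cpi_old / cpi_fp1)
speedup_disk = amdahl(0.18, float("inf"))                  # infinitely fast disk

print(round(speedup_fp3, 3), round(speedup_fp1, 3), round(speedup_disk, 3))
```

Small differences in the last digit relative to the hand calculation come from rounding the intermediate computation speedup before applying Amdahl's law.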
Assume the following distribution of instruction types (assume that LUI requires 3 cycles):
          Processor X   Processor Z
MULT          0%            5%
LUI           5%            0%
LW           20%           25%
SW           10%           10%
R-Type       45%           40%
BEQ          20%           20%
For example, suppose processor Z executes 5 MULT instructions out of every 100. For each of those MULT instructions, processor X executes 30 instructions in its place.
a. Compare the execution times of the two processors.
Execution time = T * I * C = Cycle time * Instruction count * Average CPI
Exec time(x) = Tx * Ix * Cx
Exec time(z) = Tz * Iz * Cz
Tx = 10 ns; Tz = 12 ns

If Iz = 100, then Ix = 95 + (5 * 30) = 245.
CPI for each instruction type: MULT = 80 cycles, LUI = 3, LW = 5, SW = 4, R-Type = 4, BEQ = 3

Therefore:
Average CPI for X = Cx = (0.05 * 3) + (0.2 * 5) + (0.1 * 4) + (0.45 * 4) + (0.2 * 3) = 3.95
Average CPI for Z = Cz = (0.05 * 80) + (0.25 * 5) + (0.1 * 4) + (0.4 * 4) + (0.2 * 3) = 7.85

Comparing execution times:
Exec time(x) = 10 ns/c * 3.95 c/i * 245 i = 9677.5 ns
Exec time(z) = 12 ns/c * 7.85 c/i * 100 i = 9420 ns
Processor Z with the multiply instruction is about 1.03 times faster than processor X for this instruction mix.
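The comparison in part (a) can be checked with a few lines of Python. This is a sketch under the solution's own numbers: the instruction mixes come from the table, the cycle times and instruction counts from the text, and BEQ = 3 cycles is an assumption inferred from the stated average CPIs of 3.95 and 7.85. The dictionary and function names are illustrative.

```python
# Cycles per instruction type (BEQ = 3 is inferred from the stated averages).
CPI = {"MULT": 80, "LUI": 3, "LW": 5, "SW": 4, "R": 4, "BEQ": 3}

# Instruction mix for each processor, as fractions of executed instructions.
MIX_X = {"MULT": 0.00, "LUI": 0.05, "LW": 0.20, "SW": 0.10, "R": 0.45, "BEQ": 0.20}
MIX_Z = {"MULT": 0.05, "LUI": 0.00, "LW": 0.25, "SW": 0.10, "R": 0.40, "BEQ": 0.20}


def average_cpi(mix):
    """Weighted average CPI for a given instruction mix."""
    return sum(mix[kind] * CPI[kind] for kind in mix)


def exec_time_ns(cycle_ns, instructions, cpi):
    """Execution time = cycle time * instruction count * average CPI."""
    return cycle_ns * instructions * cpi


cx, cz = average_cpi(MIX_X), average_cpi(MIX_Z)
time_x = exec_time_ns(10, 95 + 5 * 30, cx)   # X emulates each MULT with 30 instructions
time_z = exec_time_ns(12, 100, cz)

print(f"Cx={cx:.2f} Cz={cz:.2f} X={time_x:.1f} ns Z={time_z:.1f} ns ratio={time_x / time_z:.3f}")
```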
b. At what clock speed for processor Z are the two designs equal in performance?
Equating the two execution times and solving for Tz:
10 ns * 3.95 CPI * 245 instructions = Tz * 7.85 CPI * 100 instructions
Tz = (10 ns * 3.95 * 245) / (7.85 * 100)
Tz = 12.33 ns
For smaller Tz (faster clock), processor Z has better performance; for larger Tz (slower clock), processor X has better performance.
c. (more difficult) Assuming the original ALU latency for processor Z (12 ns), how fast would your software-emulated multiply have to be (on average) for processor X to be just as fast as processor Z? In other words, how many instructions would processor X execute in place of 1 MULT?
Remember that Ix was defined to be 95 + (# of multiplications) * (cost of each).
We first need to find the instruction count of processor X necessary for equal performance:
10 ns * 3.95 * Ix = 12 ns * 7.85 * 100, which implies Ix = 238.5.
The total number of multiply-emulation instructions is (238.5 - 95) = 143.5.
Therefore the number of instructions per multiply = 143.5 / 5 = 28.7, i.e. about 28 instructions.
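Parts (b) and (c) solve the same equality, Tx * Ix * Cx = Tz * Iz * Cz, for a different unknown each time. The Python sketch below shows both rearrangements using the average CPIs from the solution (3.95 and 7.85); the variable names are illustrative.

```python
TX_NS, CX = 10.0, 3.95      # processor X: cycle time (ns) and average CPI
TZ_NS, CZ = 12.0, 7.85      # processor Z: cycle time (ns) and average CPI
IZ = 100                    # instructions executed by Z
MULTS = 5                   # MULT instructions among those 100
OTHER = IZ - MULTS          # instructions X executes that are not emulation

# Part (b): keep Ix = 245 and solve for the break-even cycle time of Z.
ix_given = OTHER + MULTS * 30
tz_break_even = (TX_NS * CX * ix_given) / (CZ * IZ)

# Part (c): keep Tz = 12 ns and solve for the break-even instruction count of X.
ix_break_even = (TZ_NS * CZ * IZ) / (TX_NS * CX)
per_multiply = (ix_break_even - OTHER) / MULTS

print(f"Tz = {tz_break_even:.2f} ns")
print(f"Ix = {ix_break_even:.1f}, instructions per multiply = {per_multiply:.1f}")
```

Since an instruction count must be a whole number, the emulation routine has to average no more than 28 instructions for processor X to keep up.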
M1 can never be faster than M2: even if all the instructions are branch instructions, the CPI is 3 in all 3 cases, and the clock rate is faster on the other 2 processors.
M3 can be faster than M2 if all instructions are loads or all are stores.
Ex:
M2: Average CPI = 1 * 4 + 0 * 4 + 0 * 3 + 0 * 3 = 4
M3: Average CPI = 1 * 3 + 0 * 3 + 0 * 3 + 0 * 3 = 3
M2 execution time = (CPI * #instructions) / clock rate = 4 * I / 3.2 GHz = 1.25 * I * 10^-9 seconds
M3 execution time = (CPI * #instructions) / clock rate = 3 * I / 3.2 GHz = 1.07 * I * 10^-9 seconds
Review Adder and ALU Creation and Building larger ALUs from units
Consider the 4-bit ALU below, which can perform the following 5 operations: add, sub, AND, OR, and negate B.
Inputs are A = {A3, A2, A1, A0}, B = {B3, B2, B1, B0}, and Cin. Outputs are the result R = {R3, R2, R1, R0} and Cout. Numbers are in 2's complement form. Fill in the table below with the values the control signals should take for each operation. Indicate don't cares where appropriate.
Operation: M1 M0 Cin B_INV Az
Add: 0 0 0 0
Sub: 0 0 1 1 1
OR: 1 0 X 0 1
AND: 0 1 X 0
Negate B: 0 0 0 0 1
Digital Logic
a. bd' + c'd = ((b'd') + (cd))'
   bd' + c'd = (b'd')'(cd)'                        De Morgan's
   bd' + c'd = (b + d)(c' + d')                    De Morgan's
   bd' + c'd = bc' + c'd + bd' + dd'               Distributive
   bd' + c'd = bc' + c'd + bd' + 0                 Complementary
   bd' + c'd = bc'(d + d') + c'd + bd'             Null
   bd' + c'd = bc'd + bc'd' + c'd + bd'            Commutative
   bd' + c'd = c'd(1 + b) + bd'(c' + 1)            Distributive
   bd' + c'd = bd' + c'd                           Null
b. abc' + bc'd + a'bd = abc' + a'bd
   abc' + bc'd(a + a') + a'bd = abc' + a'bd        Null
   abc' + abc'd + a'bc'd + a'bd = abc' + a'bd      Distributive
   abc'(1 + d) + a'bd(c' + 1) = abc' + a'bd        Commutative
   abc' + a'bd = abc' + a'bd                       Null
c. a' + a(a'b + b'c)' = a' + b + c'
   a' + a((a'b)'(b'c)') = a' + b + c'              De Morgan's
   a' + a((a + b')(b + c')) = a' + b + c'          De Morgan's
   a' + a(ab + bb' + ac' + b'c') = a' + b + c'     Distributive
   a' + a(ab + 0 + ac' + b'c') = a' + b + c'       Complementary
   a' + a(ab + ac' + b'c') = a' + b + c'           Distributive
   a' + aab + aac' + ab'c' = a' + b + c'           Idempotence
   a' + ab + ac' + ab'c' = a' + b + c'             Idempotence
   a' + a(b + c' + b'c') = a' + b + c'
   a' + b + c' + b'c' = a' + b + c'                No Name
   a' + b + c'(1 + b') = a' + b + c'               Null
   a' + b + c' = a' + b + c'
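Algebraic proofs like the three above are easy to get wrong by one complement, so it can help to confirm each identity by exhaustive truth table before attempting the algebra. The Python sketch below does that. The helper name `equivalent` is illustrative, and the check only confirms that the two sides agree; it does not validate the individual steps or the law cited for each.

```python
from itertools import product


def equivalent(lhs, rhs, nvars):
    """True if lhs and rhs agree on every assignment of nvars Boolean inputs."""
    return all(
        bool(lhs(*values)) == bool(rhs(*values))
        for values in product([False, True], repeat=nvars)
    )


# a. bd' + c'd = ((b'd') + (cd))'
ok_a = equivalent(
    lambda b, c, d: (b and not d) or (not c and d),
    lambda b, c, d: not ((not b and not d) or (c and d)),
    3,
)

# b. abc' + bc'd + a'bd = abc' + a'bd
ok_b = equivalent(
    lambda a, b, c, d: (a and b and not c) or (b and not c and d) or (not a and b and d),
    lambda a, b, c, d: (a and b and not c) or (not a and b and d),
    4,
)

# c. a' + a(a'b + b'c)' = a' + b + c'
ok_c = equivalent(
    lambda a, b, c: (not a) or (a and not ((not a and b) or (not b and c))),
    lambda a, b, c: (not a) or b or (not c),
    3,
)

print(ok_a, ok_b, ok_c)
```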
a. How many literals does z contain? 11
b. Is z minimal? If not, find the minimal expression using Boolean algebra. No.
c. Find the equivalent sum of minterms (SOP) for z (using m notation)
The straightforward solution using the minterms/SOP expression:
A better solution using De Morgan's law: A' + B + C = ((A' + B + C)')' = (AB'C')'
a1 a0 b1 b0 | A+B (decimal)

Z3 = a1a0b1b
Z2 = a1a0'b1 + a1a0b1b0' = a1a0'b1 + a1b1b0'
Z1 = a1'b1b0 + a1'a0b0 + a1'a0b1 + a1a0b1' + a1b1'b
Z0 = (a0 + b0)(a1+a0'+b
Decoders take a binary number and map its value to a single output line. With 2 inputs there are 4 different values, so the decoder has 4 outputs.
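The mapping from an n-bit input value to 2^n one-hot output lines can be sketched in a few lines of Python. The function name `decode` and the list representation of the output lines are illustrative assumptions; a hardware decoder would typically also have an enable input, which is omitted here.

```python
def decode(value, n_inputs):
    """Return the 2**n_inputs output lines of a decoder as a list of 0/1.

    Exactly one output line (the one whose index equals `value`) is 1.
    """
    n_outputs = 2 ** n_inputs
    if not 0 <= value < n_outputs:
        raise ValueError(f"value must fit in {n_inputs} bits")
    return [1 if line == value else 0 for line in range(n_outputs)]


# A 2-input decoder has 4 outputs; input 0b10 selects output line 2.
outputs = decode(0b10, 2)
print(outputs)
```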