Exam Final Practice Questions, Exercises of Computer Science

Exam Final Practice Questions and their solutions

Typology: Exercises

2019/2020

Uploaded on 01/06/2022

salman-siddique-2



CSE320 Final Exam Practice Questions

Single-Cycle Datapath / Multi-Cycle Datapath: Adding Instructions

Modify the datapath and control signals to perform the new instructions in the corresponding datapath.

Use the minimal amount of additional hardware and clock cycles/control states.

Remember:

- When adding new instructions, don't break the operation of the standard ones.
- Avoid adding ALUs, adders, register files, or memories to the datapath.
- You can add MUXes, logic gates, etc., but try to do so minimally (these cost area, cycle time, etc.).

a. Load Word Register (uses R instruction format)
   lwr Rt, Rd(Rs)   # Reg[Rt] = Mem[Reg[Rd] + Reg[Rs]]

b. Add 3 operands (new instruction format: opcode(6), rs(5), rt(5), rd(5), rx(5), 6 bits not used)
   add3 Rd, Rs, Rt, Rx   # Reg[Rd] = Reg[Rs] + Reg[Rt] + Reg[Rx]

c. Add to Memory (new instruction format: opcode(6), rs(5), rt(5), rd(5), offset(11))
   addm Rd, Rt, Offset(Rs)   # Reg[Rd] = Reg[Rt] + Mem[sign-extended offset + Reg[Rs]]

d. Branch on less than or Equal (uses I instruction format)
   blez Rs, label   # if Reg[Rs] <= 0, PC = PC + 4 + (sign-extended offset << 2)

e. Branch Equal to Memory (new instruction format: opcode(6), rs(5), rt(5), rd(5), offset(11))
   beqm Rd, Rt, Offset(Rs)   # if Reg[Rt] = Mem[Offset + Reg[Rs]], PC = PC + 4 + Reg[Rd]

f. Branch Equal to 0 to Immediate (uses R instruction format)
   beqzi (Rs), Label   # if Mem[Reg[Rs]] = 0, then PC = PC + (sign-extended offset)
   (NOTE: This is not PC+4, and not shifted by 2)

g. Store Word and Increment
   swinc Rt, offset(Rs)   # Mem[Reg[Rs] + sign-extended offset] = Reg[Rt], Reg[Rs] = Reg[Rs] + 4

h. Store Word and Decrement
   swdec Rt, offset(Rs)   # Mem[Reg[Rs] + sign-extended offset] = Reg[Rt], Reg[Rs] = Reg[Rs] - 4

What if you were to add (g) and (h) simultaneously to the datapaths?
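When checking a modified datapath, it can help to have the intended register-transfer behavior written down in executable form. The sketch below is only a behavioral model of instructions (a), (b), (g), and (h), not a datapath design. The names `Machine`, `lwr`, `add3`, `swinc`, and `swdec` are hypothetical; memory is modeled as a dictionary keyed by byte address, and 32-bit wraparound is an assumption.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers and addresses


class Machine:
    """Behavioral model only: 32 registers plus a sparse word memory."""

    def __init__(self):
        self.reg = [0] * 32
        self.mem = {}  # byte address -> word (unwritten words read as 0)

    def lwr(self, rt, rd, rs):
        # (a) Reg[Rt] = Mem[Reg[Rd] + Reg[Rs]]
        addr = (self.reg[rd] + self.reg[rs]) & MASK32
        self.reg[rt] = self.mem.get(addr, 0)

    def add3(self, rd, rs, rt, rx):
        # (b) Reg[Rd] = Reg[Rs] + Reg[Rt] + Reg[Rx]
        self.reg[rd] = (self.reg[rs] + self.reg[rt] + self.reg[rx]) & MASK32

    def swinc(self, rt, offset, rs):
        # (g) Mem[Reg[Rs] + offset] = Reg[Rt], then Reg[Rs] = Reg[Rs] + 4
        self._store_and_step(rt, offset, rs, +4)

    def swdec(self, rt, offset, rs):
        # (h) same store, then Reg[Rs] = Reg[Rs] - 4
        self._store_and_step(rt, offset, rs, -4)

    def _store_and_step(self, rt, offset, rs, step):
        addr = (self.reg[rs] + offset) & MASK32
        self.mem[addr] = self.reg[rt]  # the store uses the old value of Reg[Rs]
        self.reg[rs] = (self.reg[rs] + step) & MASK32


m = Machine()
m.reg[1], m.reg[2], m.reg[3] = 100, 8, 7
m.swinc(3, 4, 1)    # Mem[104] = 7, Reg[1] becomes 104
m.lwr(4, 1, 0)      # Reg[4] = Mem[Reg[1] + Reg[0]] = Mem[104]
m.add3(5, 2, 3, 4)  # Reg[5] = 8 + 7 + 7
print(m.mem[104], m.reg[1], m.reg[4], m.reg[5])
```

In this model `swinc` and `swdec` share one helper and differ only in the sign of the constant, which may be a useful starting point when thinking about adding (g) and (h) together.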

Datapath Timing

  1. Calculate the delay in the modified datapaths when performing the instructions above. Assume the following delays:

     - Memory: 200 ps
     - Register file access (read/write): 50 ps
     - ALU and adders: 100 ps
     - Logic gates and multiplexers: 1 ps
     - All other times are negligible

  2. Calculate the minimal clock cycle time if all of the new instructions were added in the single-cycle and multi-cycle cases.
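The timing questions reduce to summing component delays along a path and taking a maximum. The sketch below shows that bookkeeping for the unmodified textbook datapaths only. The paths and step groupings are assumptions (one multiplexer in front of the ALU, the memory address, and the register write port), and the helper name `path_delay` is hypothetical. The modified datapaths will need their own paths.

```python
# Component delays in picoseconds, taken from the list above.
DELAY_PS = {"mem": 200, "reg": 50, "alu": 100, "mux": 1}


def path_delay(path):
    """Sum the delays of the components a signal passes through, in order."""
    return sum(DELAY_PS[component] for component in path)


# Assumed single-cycle paths: fetch, register read, ALUSrc mux, ALU,
# (data memory for lw), MemtoReg mux, register write.
SINGLE_CYCLE_PATHS = {
    "lw": ["mem", "reg", "mux", "alu", "mem", "mux", "reg"],
    "r-type": ["mem", "reg", "mux", "alu", "mux", "reg"],
}

# Assumed multi-cycle steps, each with one mux ahead of the main unit.
MULTI_CYCLE_STEPS = {
    "fetch": ["mux", "mem"],
    "decode": ["reg"],
    "execute": ["mux", "alu"],
    "memory": ["mux", "mem"],
    "writeback": ["mux", "reg"],
}

delays = {name: path_delay(p) for name, p in SINGLE_CYCLE_PATHS.items()}

# Single-cycle: the clock must fit the slowest whole instruction.
single_cycle_clock = max(delays.values())

# Multi-cycle: the clock must fit the slowest individual step.
multi_cycle_clock = max(path_delay(p) for p in MULTI_CYCLE_STEPS.values())

print(delays, single_cycle_clock, multi_cycle_clock)
```

Under these assumptions the load path sets the single-cycle clock and a memory access sets the multi-cycle clock. Each new instruction adds its own entry to the corresponding table.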

Other Datapath Questions

Given MIPS code, can you determine...

- What is happening at clock cycle X in the single-cycle datapath? Or in which cycle is operation X happening?
- What is happening at clock cycle X in the multi-cycle datapath? Or in which cycle is operation X happening?
- How many cycles will it take to execute the code?
- Can you identify the signals (control and values) in the datapath for a given clock cycle?
- And other questions of this nature...

Short Answer Misc Questions

  1. What is the primary advantage of fixed-sized opcodes?

     Instruction decode is faster and more efficient. Control does not need to determine the length or position of the opcode in the instruction.

  2. Will a speedup of 20 on 50% of a program result in an overall speedup of at least 2 times? Explain your answer.

     No. The overall speedup is calculated according to Amdahl's Law. For an overall speedup of 2, the new execution time must be 50% or less of the old execution time. Here the new execution time = 50% + 50%/20 = 52.5% of the old, an overall speedup of only about 1.9.

  3. What are the 5 components of a modern computer system? (Hint: two of them can be combined and called the processor.)

     Datapath, control, memory, input, and output (datapath + control = processor).

  4. What is a stored-program computer?

     A computer in which the instructions of the program are stored in memory. The CPU fetches the instructions from memory, decodes them, and executes them.

  5. True or False:

     - Program execution time increases when the instruction count (IC) increases. TRUE
     - In a load/store architecture, the only instructions that access memory are load and store types. TRUE
     - More powerful instructions lead to higher performance since the total number of instructions executed is smaller for a given task with more powerful instructions. FALSE
     - An add operation has 3 operands (2 inputs and 1 output), therefore add instructions must be 3-address instructions. FALSE

  6. In a system executing jobs, when is throughput = 1/latency?

     Throughput of a machine is the number of instructions executed per second. Latency is the length of time taken to execute one instruction. Throughput = 1/latency when a system is executing one task at a time, e.g. in a single-cycle or multi-cycle datapath.
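The Amdahl's Law answer above can be checked numerically. This is a minimal sketch; the function name `amdahl_speedup` and its parameter names are illustrative and not part of the exam.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the run time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# 50% of the program sped up by a factor of 20.
overall = amdahl_speedup(0.50, 20)
print(round(overall, 3))  # 1.905

# Even an unbounded speedup of that half only approaches 2x,
# because the untouched half still takes its full time.
limit = 1.0 / (1.0 - 0.50)
print(limit)  # 2.0
```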

Average CPI (old) = 0.40 * 1 + 0.30 * 5 + 0.30 * 2 = 2.5
Average CPI (enhanced) = 0.40 * 1 + 0.30 * 3 + 0.30 * 2 = 1.9
Speedup (computation) = 2.5 / 1.9 = 1.315
Speedup = 1 / ((1 - 0.82) + 0.82 / 1.315) = 1.24

b. Which modification gave the best speedup?

Modification (i) provides the best speedup.

c. For the two modifications in part (i) that did not result in the best speedup, is it possible for them to achieve the speedup achieved by the modification in part (ii)? Show your work and explain your answer.

i. An infinitely fast disk: Speedup = 1 / (1 - 0.18) = 1.22, which is still slower than i

ii. If FP takes only 1 clock cycle:
Average CPI (enhanced) = 0.40 * 1 + 0.30 * 1 + 0.30 * 2 = 1.3
Speedup (computation) = 2.5 / 1.3 = 1.92
Speedup = 1 / ((1 - 0.82) + 0.82 / 1.92) = 1.65
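The CPI-based speedups in this answer follow a fixed recipe: compute the weighted CPI before and after, take the ratio as the computation speedup, then apply Amdahl's Law with the 82% computation fraction used in the worked answer. The sketch below reproduces that recipe. The function names are illustrative, and the instruction classes are identified only by their fractions and cycle counts because the original question text is not shown here.

```python
def average_cpi(mix):
    """Weighted CPI for a list of (fraction of instructions, cycles) pairs."""
    return sum(fraction * cycles for fraction, cycles in mix)


def overall_speedup(fraction, factor):
    """Amdahl's Law: `fraction` of the run time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


COMPUTE_FRACTION = 0.82  # from the worked answer; the other 18% is not sped up

old_cpi = average_cpi([(0.40, 1), (0.30, 5), (0.30, 2)])
results = {}
for fp_cycles in (3, 1):  # the 5-cycle class reduced to 3 cycles, then to 1
    new_cpi = average_cpi([(0.40, 1), (0.30, fp_cycles), (0.30, 2)])
    compute_speedup = old_cpi / new_cpi
    results[fp_cycles] = overall_speedup(COMPUTE_FRACTION, compute_speedup)
    print(fp_cycles, round(new_cpi, 2), round(compute_speedup, 3), round(results[fp_cycles], 3))
```

Because the code carries full precision through each step, its last digit can differ slightly from a hand calculation that rounds the intermediate speedup.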

  1. You have two RiSC-16 processors, X and Z, with the following characteristics. Both are multi-cycle processors, in which an instruction executes in a variable number of processor cycles. X and Z execute variations on the same instruction set (RiSC) in the following way:

     a. Processor X implements the base instruction set, including LUI. Processor X implements multiplication in software, meaning there is no MULT instruction.
     b. Processor Z eliminates the LUI instruction in favor of a MULT instruction, getting LUI functionality from LW.
     c. Processor Z's MULT instruction uses the ALU over and over again in a loop, performing shifts and conditional adds, and requires 80 processor cycles per multiply.
     d. Executing one MULT instruction on Processor Z eliminates on average 30 instructions that would be executed on Processor X when multiplication is implemented in software. However, Processor Z then needs additional ALU functionality, which increases the ALU's critical path from 10 ns to 12 ns.

     Also, assume the following:

     - Cache read/write: 10 ns
     - Register file read/write: 8 ns
     - ALU operation: 10 ns for processor X, 12 ns for processor Z

Assume the following distribution of instruction types (assume that LUI requires 3 cycles):

          Processor X   Processor Z
MULT      0%            5%
LUI       5%            0%
LW        20%           25%
SW        10%           10%
R-Type    45%           40%
BEQ       20%           20%

For example, processor Z executes 5 MULT instructions out of every 100. For each MULT instruction, processor X executes an additional 30 instructions.

a. Compare the execution times of the two processors.

Execution time = T * I * C = cycle time * instruction count * average CPI
Exec time(X) = Tx * Ix * Cx
Exec time(Z) = Tz * Iz * Cz
Tx = 10 ns; Tz = 12 ns

If Iz = 100, then Ix = 95 + (5 * 30) = 245.
CPI for each instruction type: MULT = 80, LUI = 3, LW = 5, SW = 4, R-type = 4, BEQ = 3

Therefore:
Average CPI for X = Cx = (0.05 * 3) + (0.2 * 5) + (0.1 * 4) + (0.45 * 4) + (0.2 * 3) = 3.95
Average CPI for Z = Cz = (0.05 * 80) + (0.25 * 5) + (0.1 * 4) + (0.4 * 4) + (0.2 * 3) = 7.85

Comparing execution times:
Exec time(X) = 10 ns/c * 3.95 c/i * 245 i = 9677.5 ns
Exec time(Z) = 12 ns/c * 7.85 c/i * 100 i = 9420 ns

Processor Z with the multiply instruction is about 1.03 times faster than processor X for this instruction mix.
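The comparison in part (a) can be reproduced from the instruction mix table. This is a sketch under the cycle counts used in the worked answer (MULT 80, LUI 3, LW 5, SW 4, R-type 4, and BEQ 3, the last implied by the CPI totals of 3.95 and 7.85). The helper names are illustrative.

```python
# Cycles per instruction type, as used in the worked answer.
CYCLES = {"MULT": 80, "LUI": 3, "LW": 5, "SW": 4, "R": 4, "BEQ": 3}

# Instruction mix for each processor, from the table in the question.
MIX_X = {"MULT": 0.00, "LUI": 0.05, "LW": 0.20, "SW": 0.10, "R": 0.45, "BEQ": 0.20}
MIX_Z = {"MULT": 0.05, "LUI": 0.00, "LW": 0.25, "SW": 0.10, "R": 0.40, "BEQ": 0.20}


def average_cpi(mix):
    """Weighted average of cycles per instruction for one mix."""
    return sum(fraction * CYCLES[kind] for kind, fraction in mix.items())


def exec_time_ns(cycle_ns, cpi, instructions):
    """Execution time = cycle time * CPI * instruction count."""
    return cycle_ns * cpi * instructions


cpi_x, cpi_z = average_cpi(MIX_X), average_cpi(MIX_Z)
time_x = exec_time_ns(10, cpi_x, 95 + 5 * 30)  # X runs 245 instructions
time_z = exec_time_ns(12, cpi_z, 100)          # Z runs 100 instructions
print(round(cpi_x, 2), round(cpi_z, 2), round(time_x, 1), round(time_z, 1))
print(round(time_x / time_z, 2))
```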

b. At what clock speed for processor Z are the two designs equal in performance?

Equating the two execution times and solving for Tz:
10 ns * 3.95 CPI * 245 instructions = Tz * 7.85 CPI * 100 instructions
Tz = (10 ns * 3.95 * 245) / (7.85 * 100)
Tz = 12.33 ns

For smaller Tz (faster clock), processor Z has better performance; for larger Tz (slower clock), processor X has better performance.

c. (more difficult) Assuming the original ALU latency for processor Z (12 ns), how fast would your software-emulated multiply have to be (on average) for processor X to be just as fast as processor Z? In other words, how many instructions would processor X execute in place of 1 MULT?

Remember that Ix was defined to be 95 + (# of multiplications) * (cost of each). We first need to find the instruction count of processor X necessary for equal performance:
10 ns * 3.95 * Ix = 12 ns * 7.85 * 100, which implies Ix = 238.5
Total number of multiply-emulation instructions = 238.5 - 95 = 143.5
Therefore the number of instructions per multiply = 143.5 / 5 = 28.7, i.e. about 28 instructions
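Parts (b) and (c) both solve the same equality, Tx * Cx * Ix = Tz * Cz * Iz, for a different unknown. The sketch below does that arithmetic with the CPI values from part (a). The constant names are illustrative.

```python
T_X, CPI_X = 10.0, 3.95  # ns per cycle and average CPI for processor X
T_Z, CPI_Z = 12.0, 7.85  # same for processor Z
I_Z = 100                # instructions executed by Z
MULTS = 5                # MULT instructions among them
BASE_X = 95              # instructions X executes that are not multiply emulation

# Part (b): cycle time for Z at which both execution times match.
i_x = BASE_X + MULTS * 30
break_even_t_z = (T_X * CPI_X * i_x) / (CPI_Z * I_Z)

# Part (c): instruction count for X at which both execution times match,
# then the emulation cost per multiply that this count allows.
break_even_i_x = (T_Z * CPI_Z * I_Z) / (T_X * CPI_X)
per_multiply = (break_even_i_x - BASE_X) / MULTS

print(round(break_even_t_z, 2), round(break_even_i_x, 1), round(per_multiply, 1))
```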

M1 can never be faster than M2: even if all the instructions are branch instructions, the CPI will be 3 in all 3 cases, and the clock rate is faster on the other 2 processors.

M3 can be faster than M2 if all instructions are loads or all are stores. For example:

M2: Average CPI = 1 * 4 + 0 * 4 + 0 * 3 + 0 * 3 = 4
M3: Average CPI = 1 * 3 + 0 * 3 + 0 * 3 + 0 * 3 = 3

M2 execution time = (CPI * #instructions) / clock rate = 4I / 3.2 GHz = 1.25I * 10^-9 seconds
M3 execution time = (CPI * #instructions) / clock rate = 3I / 3.2 GHz = 1.07I * 10^-9 seconds

Review Adder and ALU Creation and Building larger ALUs from units

Consider the 4-bit ALU below which can perform the following 5 operations: add, sub, AND, OR and negate B.

Inputs are A = {A3, A2, A1, A0}, B = {B3, B2, B1, B0}, and Cin. Outputs are result R = {R3, R2, R1, R0} and Cout. Numbers are in 2's complement form. Fill in the table below with the values the control signals should have for each operation. Indicate don't cares where appropriate.

Operation   M1   M0   Cin   BINV   Az
Add         0    0    0     0
Sub         0    0    1     1      1
OR          1    0    X     0      1
AND         0    1    X     0
Negate B    0    0    0     0      1

Digital Logic

  1. Using Boolean algebra, prove the following:

a. bd' + c'd = ((b'd') + (cd))'

   bd' + c'd = (b'd')'(cd)'                        De Morgan's
   bd' + c'd = (b + d)(c' + d')                    De Morgan's
   bd' + c'd = bc' + c'd + bd' + dd'               Distributive
   bd' + c'd = bc' + c'd + bd' + 0                 Complementary
   bd' + c'd = bc'(d + d') + c'd + bd'             Null
   bd' + c'd = bc'd + bc'd' + c'd + bd'            Commutative
   bd' + c'd = c'd(1 + b) + bd'(c' + 1)            Distributive
   bd' + c'd = bd' + c'd                           Null

b. abc' + bc'd + a'bd = abc' + a'bd

   abc' + bc'd(a + a') + a'bd = abc' + a'bd        Null
   abc' + abc'd + a'bc'd + a'bd = abc' + a'bd      Distributive
   abc'(1 + d) + a'bd(c' + 1) = abc' + a'bd        Commutative
   abc' + a'bd = abc' + a'bd                       Null

c. a' + a(a'b + b'c)' = a' + b + c'

   a' + a((a'b)'(b'c)') = a' + b + c'              De Morgan's
   a' + a((a + b')(b + c')) = a' + b + c'          De Morgan's
   a' + a(ab + bb' + ac' + b'c') = a' + b + c'     Distributive
   a' + a(ab + 0 + ac' + b'c') = a' + b + c'       Complementary
   a' + a(ab + ac' + b'c') = a' + b + c'           Distributive
   a' + aab + aac' + ab'c' = a' + b + c'           Idempotence
   a' + ab + ac' + ab'c' = a' + b + c'             Idempotence
   a' + a(b + c' + b'c') = a' + b + c'
   a' + b + c' + b'c' = a' + b + c'                No Name
   a' + b + c'(1 + b') = a' + b + c'               Null
   a' + b + c' = a' + b + c'
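An algebraic proof can be sanity-checked by brute force, since each identity involves at most four variables. The sketch below compares both sides of identities (a), (b), and (c) on every input combination. The helper name `equivalent` is illustrative, and this exhaustive check complements the algebraic derivation rather than replacing it.

```python
from itertools import product


def equivalent(lhs, rhs, num_vars):
    """True if two Boolean functions agree on every input combination."""
    return all(
        bool(lhs(*bits)) == bool(rhs(*bits))
        for bits in product((False, True), repeat=num_vars)
    )


# (a) bd' + c'd = (b'd' + cd)'
a_ok = equivalent(
    lambda b, c, d: (b and not d) or (not c and d),
    lambda b, c, d: not ((not b and not d) or (c and d)),
    3,
)

# (b) abc' + bc'd + a'bd = abc' + a'bd
b_ok = equivalent(
    lambda a, b, c, d: (a and b and not c) or (b and not c and d) or (not a and b and d),
    lambda a, b, c, d: (a and b and not c) or (not a and b and d),
    4,
)

# (c) a' + a(a'b + b'c)' = a' + b + c'
c_ok = equivalent(
    lambda a, b, c: (not a) or (a and not ((not a and b) or (not b and c))),
    lambda a, b, c: (not a) or b or (not c),
    3,
)

print(a_ok, b_ok, c_ok)
```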

  2. Consider the following function: z(x3, x2, x1, x0) = x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0

     a. How many literals does z contain? 11

     b. Is z minimal? If not, find the minimal expression using Boolean algebra. No.

        z = x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0
          = x3'x2(1 + x0) + x3x1x0 + x3x2x0'
          = x3'x2(1 + x0') + x3x1x0 + x3x2x0'
          = x3'x2 + x3'x2x0' + x3x1x0 + x3x2x0'
          = x3'x2 + x3x1x0 + x2x0'

     c. Find the equivalent sum of minterms (SOP) for z (using m notation).

        z(x3, x2, x1, x0) = Σ m(4, 5, 6, 7, 11, 12, 14, 15)
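The minterm list and the simplification can both be checked by enumerating all 16 input combinations. The sketch below does so; the function names `z_original`, `z_minimal`, and `minterms` are illustrative, and x3 is taken as the most significant bit of the minterm index.

```python
from itertools import product


def z_original(x3, x2, x1, x0):
    # x3'x2 + x3x1x0 + x3x2x0' + x3'x2x0
    return ((not x3 and x2) or (x3 and x1 and x0)
            or (x3 and x2 and not x0) or (not x3 and x2 and x0))


def z_minimal(x3, x2, x1, x0):
    # x3'x2 + x3x1x0 + x2x0'
    return (not x3 and x2) or (x3 and x1 and x0) or (x2 and not x0)


def minterms(f):
    """Indices (x3 is the most significant bit) where f evaluates to 1."""
    return [
        8 * x3 + 4 * x2 + 2 * x1 + x0
        for x3, x2, x1, x0 in product((0, 1), repeat=4)
        if f(x3, x2, x1, x0)
    ]


print(minterms(z_original))
print(minterms(z_original) == minterms(z_minimal))
```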

The straightforward solution using minterms/SOP expression:

A better solution using De Morgan's law: A' + B + C = ((A' + B + C)')' = (AB'C')'

  1. Find the minimal 2-level implementation, using NOR-NOR gates, of a system with two 2-bit inputs (A = {a1, a0} and B = {b1, b0}) which outputs the following: if A+B is even, the output is their product; if A+B is odd, the output is their sum.

a1  a0  b1  b0  A+B (decimal)  Z (decimal)  Z3  Z2  Z1  Z0

Z3 = a1a0b1b
Z2 = a1a0'b1 + a1a0b1b0' = a1a0'b1 + a1b1b0'
Z1 = a1'b1b0 + a1'a0b0 + a1'a0b1 + a1a0b1' + a1b1'b
Z0 = (a0 + b0)(a1+a0'+b

  1. Implement the functionality of a 2-input decoder using minimal AND, OR and NOT gates.

Decoders take a binary number and map this value to an output line. 2 input bits means 4 different values (4 outputs).

S1  S0  F3  F2  F1  F0
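The decoder's behavior can be written out gate by gate. The sketch below assumes active-high outputs where output Fi is 1 exactly when the input S1 S0 encodes the value i; the function name `decoder_2to4` is illustrative. It uses two NOT gates and four 2-input AND gates, with no OR gates needed.

```python
def decoder_2to4(s1, s0):
    """Return (F3, F2, F1, F0); exactly one output is 1 for each input."""
    n1, n0 = 1 - s1, 1 - s0  # two NOT gates
    f0 = n1 & n0             # selected when S1 S0 = 00
    f1 = n1 & s0             # selected when S1 S0 = 01
    f2 = s1 & n0             # selected when S1 S0 = 10
    f3 = s1 & s0             # selected when S1 S0 = 11
    return f3, f2, f1, f0


# Print the truth table in the column order S1 S0 F3 F2 F1 F0.
for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, *decoder_2to4(s1, s0))
```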

  1. Implement the 7-segment using a 4-selector DEMUX and OR gates.