16 Bit ISA, Reg Reg Architecture, Datapath Implementation - Computer Architecture and Assembly Language - Assignment Solution

Instructions to Solve Assignments

The purpose of assignments is to give you hands-on practice. It is expected that students will solve the assignments themselves. The following rules will apply during the evaluation of assignments:

• Cheating from any source will result in zero marks in the assignment.
• Any student found cheating in any two of the submitted assignments will be awarded an "F" grade in the course.
• No assignment will be accepted after the due date.

docsity.com

Question 1: Total Points (30)

Design a 16-bit ISA for a processor containing the following components:

• 8 General Purpose Registers (GPR)
• ALU supporting ADD, SUB, INC, DEC, OR, XOR, AND, NAND, LSHFTR, LSHFTL, ASHFTR, and ASHFTL
• Assume the ALU contains saturation and overflow check logic, so support ADD and SUB with both options, i.e. overflow and saturate
• LOAD and STORE with two addressing modes, Direct and Indirect
• Two control instructions, Jump and Call
• NOP instruction

Design means a complete and detailed table containing bit information for every instruction. All 16-bit information must be provided to get full credit.

Solution:

1. Abstract:

In this 16-bit Instruction Set Architecture design, 9 bits are used for the addresses of the three operands (source 1, source 2 and destination), while the remaining 7 bits form a fixed-width operation code.

Bit 15 ............................ Bit 0
Field:    Opcode    Dest    Src1    Src2
Width:    7         3       3       3

Bit Usage:

Bits              Usage
15th              Addressing Mode Specification
14th and 13th     Types of Instructions
12th and 11th     Sub-Types of Instructions
10th and 9th      Operations
8th till 0th      Addresses of three Operands

2. Division of Op-Code:

The 7-bit op-code is further divided into four parts, each defined as follows.

2.1. Mode Indication:

Besides register-register operation, this design supports two addressing modes, direct and indirect. The 15th bit is dedicated to representing the mode: '0' indicates the direct mode, while '1' indicates the indirect mode of addressing.

3. Instruction Sheets:

Combining all the divisions and concepts discussed above, the following is the complete instruction sheet for the assigned 16-bit ISA.

Bit Usage:

Bits              Usage
15th              Addressing Mode Specification (Direct / Indirect)
14th and 13th     Types of Instructions
12th and 11th     Sub-Types of Instructions
10th and 9th      Operations
8th till 0th      Addresses of three Operands

a) Instruction Sheet (Direct Mode - 15th bit '0'):

15th 14th 13th 12th 11th 10th 9th    Instruction
 0    0    0    0    0    0    0     NOP
Control Instructions
 0    0    1    0    0    0    0     Jump
 0    0    1    1    1    1    1     Call
Load/Store Instructions
 0    1    0    0    0    0    0     Load
 0    1    0    1    1    1    1     Store
ALU - Arithmetic Instructions
 0    1    1    0    0    0    0     ADD Saturate
 0    1    1    0    0    0    1     ADD Overflow
 0    1    1    0    0    1    0     SUB Saturate
 0    1    1    0    0    1    1     SUB Overflow
ALU - Shift Instructions
 0    1    1    0    1    0    0     LSHFTR
 0    1    1    0    1    0    1     LSHFTL
 0    1    1    0    1    1    0     ASHFTR
 0    1    1    0    1    1    1     ASHFTL
ALU - Increment / Decrement Instructions
 0    1    1    1    0    0    0     Increment
 0    1    1    1    0    1    1     Decrement
ALU - Logical Instructions
 0    1    1    1    1    0    0     XOR
 0    1    1    1    1    0    1     OR
 0    1    1    1    1    1    0     AND
 0    1    1    1    1    1    1     NAND

b) Instruction Sheet (In-Direct Mode - 15th bit '1'):

15th 14th 13th 12th 11th 10th 9th    Instruction
 1    0    0    0    0    0    0     NOP
Control Instructions
 1    0    1    0    0    0    0     Jump
 1    0    1    1    1    1    1     Call
Load/Store Instructions
 1    1    0    0    0    0    0     Load
 1    1    0    1    1    1    1     Store
ALU - Arithmetic Instructions
 1    1    1    0    0    0    0     ADD Saturate
 1    1    1    0    0    0    1     ADD Overflow
 1    1    1    0    0    1    0     SUB Saturate
 1    1    1    0    0    1    1     SUB Overflow
ALU - Shift Instructions
 1    1    1    0    1    0    0     LSHFTR
 1    1    1    0    1    0    1     LSHFTL
 1    1    1    0    1    1    0     ASHFTR
 1    1    1    0    1    1    1     ASHFTL
ALU - Increment / Decrement Instructions
 1    1    1    1    0    0    0     Increment
 1    1    1    1    0    1    1     Decrement
ALU - Logical Instructions
 1    1    1    1    1    0    0     XOR
 1    1    1    1    1    0    1     OR
 1    1    1    1    1    1    0     AND
 1    1    1    1    1    1    1     NAND

4. Addressing Implementations:

The following are a few sample implementations with their respective addressing modes.

OR (Direct Mode)
15th 14th 13th 12th 11th 10th 9th | 8th 7th 6th | 5th 4th 3rd | 2nd 1st 0th
 0    1    1    1    1    0    1  | Destination | Source      | 0   0   0

ADD Saturate (Direct Mode)
15th 14th 13th 12th 11th 10th 9th | 8th 7th 6th | 5th 4th 3rd | 2nd 1st 0th
 0    1    1    0    0    0    0  | Destination | Source-1    | Source-2

LOAD (In-Direct Mode)
15th 14th 13th 12th 11th 10th 9th | 8th 7th 6th | 5th 4th 3rd | 2nd 1st 0th
 1    1    0    0    0    0    0  | Source      | Offset      | 0   0   0

STORE (In-Direct Mode)
15th 14th 13th 12th 11th 10th 9th | 8th 7th 6th | 5th 4th 3rd | 2nd 1st 0th
 1    1    0    1    1    1    1  | Destination | Offset      | 0   0   0

JUMP (Direct)
15th 14th 13th 12th 11th 10th 9th | 8th 7th 6th | 5th 4th 3rd | 2nd 1st 0th
 0    0    1    0    0    0    0  | Base        | 0   0   0   | 0   0   0

Question 2: Total Points (10)

In a reg-mem architecture, the clock cycle is 10 ns wide. It is proposed that a reg-reg architecture be used instead, which reduces the clock cycle by 2 ns. However, it requires an additional load instruction in some cases. Will the new processor be more efficient, and if so, under what circumstances? Quantify your answer.

Solution:

The efficiency of the reg-reg architecture depends on the resulting increase in instruction count. If the increase is below a certain threshold, the reg-reg architecture will be more efficient. Assume that we have x instructions originally in the reg-mem architecture.
Execution time of this architecture is given by (all calculations assume CPI = 1):

Execution time_old = IC * CPI * CCT = x * 1 * 10 = 10x ns

Now, for the proposed reg-reg architecture, we need additional load/store instructions. Let us assume that the increase in instruction count is y%. The clock cycle drops from 10 ns to 8 ns, so the new execution time will be:

Execution time_new = IC * CPI * CCT = (x + y% of x) * 1 * 8 = 8x(1 + y/100) ns

The overall speedup is the ratio used in Amdahl's law:

Speedup_overall = Execution time_old / Execution time_new

The value of y for which Execution time_old = Execution time_new is the limit below which the new reg-reg architecture is more efficient than the reg-mem architecture. Setting 10x = 8x(1 + y/100) gives 1 + y/100 = 1.25, i.e. y = 25%. So if the instruction count increases by exactly 25%, the new and old execution times are equal. If y is less than 25%, the new architecture will be more efficient. This is usually the case, because load/store instructions make up about 20 to 22% of the instructions in a typical reg-reg architecture.
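The break-even point above can be checked numerically. The following is a minimal sketch, assuming CPI = 1 for both designs as in the solution. The function names (exec_time_ns, break_even_increase_pct, speedup) and the default instruction count of 1000 are illustrative assumptions, not part of the assignment.

```python
def exec_time_ns(instruction_count, cycle_time_ns, cpi=1.0):
    """Execution time = IC * CPI * clock cycle time, in nanoseconds."""
    return instruction_count * cpi * cycle_time_ns


def break_even_increase_pct(old_cct_ns, new_cct_ns):
    """Percentage growth in instruction count at which both designs tie.

    Solves IC * old_cct = IC * (1 + y/100) * new_cct for y,
    assuming the same CPI in both designs.
    """
    return (old_cct_ns / new_cct_ns - 1.0) * 100.0


def speedup(increase_pct, old_cct_ns=10.0, new_cct_ns=8.0, ic=1000):
    """Speedup of reg-reg over reg-mem for a given instruction-count increase.

    `ic` is an arbitrary (assumed) original instruction count; it cancels out.
    """
    old = exec_time_ns(ic, old_cct_ns)
    new = exec_time_ns(ic * (1 + increase_pct / 100.0), new_cct_ns)
    return old / new


if __name__ == "__main__":
    print("break-even increase (%):", break_even_increase_pct(10.0, 8.0))
    for y in (20, 22, 25, 30):
        print("y =", y, "-> speedup =", round(speedup(y), 3))
```

A speedup above 1 means the reg-reg design is faster; for the 20 to 22% range quoted above the speedup stays above 1, and it falls below 1 once the increase exceeds 25%.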
X = 7B + B + C + D
LW    R1, 0(R0)      ;load B from memory to R1
MULTI R2, R1, 7      ;multiply immediate with 7
ADD   R3, R2, R1     ;R3 = 7B + B
LW    R4, 4(R0)      ;load C from memory to R4
ADD   R5, R3, R4     ;R5 = 7B + B + C
LW    R6, 8(R0)      ;load D from memory to R6
ADD   R7, R5, R6     ;R7 = 7B + B + C + D
SW    R7, 12(R0)     ;store R7 to memory

Y = X + V
LW    R1, 0(R0)      ;load X from memory to R1
LW    R2, 4(R0)      ;load V from memory to R2
ADD   R3, R1, R2     ;R3 = X + V
SW    R3, 8(R0)      ;store R3 to memory

(b) For Reg-Mem architecture

U = A + B + D
LW    R1, 0(R0)      ;load A from memory to R1
ADD   R3, R1, 4(R0)  ;R3 = A + B, B read directly from memory
ADD   R5, R3, 8(R0)  ;R5 = A + B + D, D read directly from memory
SW    R5, 12(R0)     ;store R5 to memory

V = C + D
LW    R1, 0(R0)      ;load C from memory to R1
ADD   R3, R1, 4(R0)  ;R3 = C + D, D read directly from memory
SW    R3, 8(R0)      ;store R3 to memory

W = B << 3
LW    R1, 0(R0)      ;load B from memory to R1
SLL   R1, R1, 3      ;shift logical left 3 bits
SW    R1, 4(R0)      ;store R1 to memory

X = 7B + B + C + D
LW    R1, 0(R0)      ;load B from memory to R1
MULTI R2, R1, 7      ;multiply immediate with 7
ADD   R3, R2, R1     ;R3 = 7B + B
ADD   R5, R3, 4(R0)  ;R5 = 7B + B + C, C read directly from memory
ADD   R7, R5, 8(R0)  ;R7 = 7B + B + C + D, D read directly from memory
SW    R7, 12(R0)     ;store R7 to memory

Y = X + V
LW    R1, 0(R0)      ;load X from memory to R1
ADD   R3, R1, 4(R0)  ;R3 = X + V, V read directly from memory
SW    R3, 8(R0)      ;store R3 to memory

Question 5: Total Points (20)

Identify the data hazards in the code below and show the execution of the code on a pipelined architecture on a per-cycle basis. You are required to highlight the data hazard(s) and the technique used to avoid them.

Opcode   Target   Source 1   Source 2
ADD      R1       R2         R3
SUB      R4       R1         R5
AND      R6       R1         R7
OR       R8       R1         R9
XOR      R10      R1         R11

Solution:

All the instructions after the ADD use the result of the ADD instruction. As shown in Figure 1.1, the ADD instruction writes the value of R1 in the WB pipe stage, but the SUB instruction reads the value during its ID stage. This is called a data hazard. Unless precautions are taken to prevent it, the SUB instruction will read the wrong value and try to use it. In fact, the value used by the SUB instruction is not even deterministic: though it is logical to assume that SUB would always use the value of R1 that was assigned by an instruction prior to ADD, this is not always the case.
If an interrupt should occur between the ADD and SUB instructions, the WB stage of the ADD will complete, and the value of R1 at that point will be the result of the ADD. This unpredictable behavior is obviously unacceptable.

The AND instruction is also affected by this hazard. As shown in Figure 1.1, the write of R1 does not complete until the end of clock cycle 5. Thus, the AND instruction, which reads the registers during clock cycle 4, will receive the wrong result. The XOR instruction operates properly because its register read occurs in clock cycle 6, after the register write. The OR instruction also operates without incurring a hazard, because we perform the register file reads in the second half of the cycle and the writes in the first half.

Figure 1.1: The use of the result of the ADD instruction in the next three instructions causes a hazard, since the register is not written until after those instructions read it.

We can solve this problem by using a simple hardware technique called forwarding (also called bypassing and sometimes short-circuiting), as shown in Figure 1.2. The key insight in forwarding is that the result is not really needed by the SUB until after the ADD actually produces it. If the result can be moved from the pipeline register where the ADD stores it to where the SUB needs it, then the need for a stall can be avoided. With forwarding, if the SUB is stalled, the ADD will be completed and the bypass will not be activated. This relationship is also true for the case of an interrupt between the two instructions.

We need to forward results not only from the immediately previous instruction, but possibly from an instruction that started 2 cycles earlier. Figure 1.2 shows the code sequence with the bypass paths in place, highlighting the timing of the register reads and writes. This code sequence can be executed without stalls.
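The per-cycle execution and the hazard analysis above can be reproduced with a short script. This is a minimal sketch, assuming a classic five-stage pipeline (IF, ID, EX, MEM, WB), one instruction issued per cycle with no stalls, and a register file that is written in the first half of a cycle and read in the second half. The names STAGES, PROGRAM, stage_cycle, raw_hazards and pipeline_diagram are illustrative and not part of the assignment.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

# (opcode, target, source 1, source 2), as listed in Question 5.
PROGRAM = [
    ("ADD", "R1", "R2", "R3"),
    ("SUB", "R4", "R1", "R5"),
    ("AND", "R6", "R1", "R7"),
    ("OR", "R8", "R1", "R9"),
    ("XOR", "R10", "R1", "R11"),
]


def stage_cycle(index, stage):
    """Clock cycle (1-based) in which instruction `index` occupies `stage`, with no stalls."""
    return index + 1 + STAGES.index(stage)


def raw_hazards(program):
    """Return (producer, consumer, register) index triples that need forwarding.

    A consumer is at risk when its ID (register read) cycle is earlier than
    the producer's WB (register write) cycle. A read in the same cycle as the
    write is safe under the assumed split-cycle register file.
    """
    hazards = []
    for p, (_, target, _, _) in enumerate(program):
        for c in range(p + 1, len(program)):
            _, c_target, src1, src2 = program[c]
            if target in (src1, src2) and stage_cycle(c, "ID") < stage_cycle(p, "WB"):
                hazards.append((p, c, target))
            if c_target == target:
                break  # later readers depend on this newer write instead
    return hazards


def pipeline_diagram(program):
    """Render a per-cycle stage table for a stall-free (forwarding) pipeline."""
    total_cycles = len(program) + len(STAGES) - 1
    header = "Instr".ljust(6) + "".join(str(c).rjust(5) for c in range(1, total_cycles + 1))
    rows = [header]
    for i, (op, *_operands) in enumerate(program):
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append(op.ljust(6) + "".join(cell.rjust(5) for cell in cells))
    return "\n".join(rows)


if __name__ == "__main__":
    print(pipeline_diagram(PROGRAM))
    for producer, consumer, reg in raw_hazards(PROGRAM):
        print(PROGRAM[consumer][0], "needs", reg, "forwarded from", PROGRAM[producer][0])
```

Under these assumptions the script flags SUB and AND as the consumers that need the forwarded value of R1, while OR (same-cycle write then read) and XOR (read after the write) are not flagged, which matches the discussion above.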