MIPS documents for Computer architecture, Exercises of Computer Science

Includes solution to textbook by Patterson

Typology: Exercises

2018/2019

Uploaded on 11/29/2019

l-srihari

Solutions

Chapter 1 Solutions

1.1
Personal computer (includes workstation and laptop): Personal computers emphasize delivery of good performance to single users at low cost and usually execute third-party software.
Personal mobile device (PMD, includes tablets): PMDs are battery operated with wireless connectivity to the Internet and typically cost hundreds of dollars, and, like PCs, users can download software (“apps”) to run on them. Unlike PCs, they no longer have a keyboard and mouse, and are more likely to rely on a touch-sensitive screen or even speech input.
Server: Computer used to run large problems and usually accessed via a network.
Warehouse scale computer: Thousands of processors forming a large cluster.
Supercomputer: Computer composed of hundreds to thousands of processors and terabytes of memory.
Embedded computer: Computer designed to run one application or one set of related applications and integrated into a single system.

1.2
a. Performance via Pipelining
b. Dependability via Redundancy
c. Performance via Prediction
d. Make the Common Case Fast
e. Hierarchy of Memories
f. Performance via Parallelism
g. Design for Moore’s Law
h. Use Abstraction to Simplify Design

1.3 The program is compiled into an assembly language program, which is then assembled into a machine language program.

1.4
a. 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/frame.
b. 3,932,160 bytes × (8 bits/byte) / 100E6 bits/second = 0.31 seconds
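The frame-size and transfer-time arithmetic can be checked with a short sketch. The function and variable names are illustrative only; the numbers are the ones used in the solution.

```python
# Frame size and transfer time, using the figures from the solution text.
WIDTH, HEIGHT = 1280, 1024       # pixels
BYTES_PER_PIXEL = 3              # the factor of 3 used in part a
LINK_BITS_PER_SECOND = 100e6     # 100E6 bits/second from part b


def frame_bytes(width, height, bytes_per_pixel):
    """Bytes needed to store one frame."""
    return width * height * bytes_per_pixel


def transfer_seconds(num_bytes, bits_per_second):
    """Time to send num_bytes over a link of the given bit rate."""
    return num_bytes * 8 / bits_per_second


size = frame_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
seconds = transfer_seconds(size, LINK_BITS_PER_SECOND)
print(size, round(seconds, 2))
```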

1.5
a. performance of P1 (instructions/sec) = 3 × 10^9 / 1.5 = 2 × 10^9
   performance of P2 (instructions/sec) = 2.5 × 10^9 / 1.0 = 2.5 × 10^9
   performance of P3 (instructions/sec) = 4 × 10^9 / 2.2 = 1.8 × 10^9

1.8.1 C = 2 × DP/(V^2 × F)

Pentium 4: C = 3.2E-8 F
Core i5 Ivy Bridge: C = 2.9E-8 F

1.8.2
Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%

1.8.3
(Snew + Dnew)/(Sold + Dold) = 0.90
Dnew = C × Vnew^2 × F
Sold = Vold × I
Snew = Vnew × I
Therefore:
Vnew = [Dnew/(C × F)]^(1/2)
Dnew = 0.90 × (Sold + Dold) - Snew
Snew = Vnew × (Sold/Vold)

Pentium 4:
Snew = Vnew × (10/1.25) = Vnew × 8
Dnew = 0.90 × 100 - Vnew × 8 = 90 - Vnew × 8
Vnew = [(90 - Vnew × 8)/(3.2E-8 × 3.6E9)]^(1/2)
Vnew = 0.85 V

Core i5:
Snew = Vnew × (30/0.9) = Vnew × 33.3
Dnew = 0.90 × 70 - Vnew × 33.3 = 63 - Vnew × 33.3
Vnew = [(63 - Vnew × 33.3)/(2.9E-8 × 3.4E9)]^(1/2)
Vnew = 0.64 V
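Because Vnew appears on both sides of the last equation in 1.8.3, it may help to see it solved directly. Squaring gives the quadratic C × F × V^2 + (Sold/Vold) × V - 0.90 × (Sold + Dold) = 0. The sketch below solves that quadratic; the function name and parameter names are illustrative, and the results should land close to the rounded voltages quoted in the solution.

```python
import math


def solve_vnew(c, f, total_old, s_old, v_old, target=0.90):
    """Positive root of c*f*V^2 + (s_old/v_old)*V - target*total_old = 0."""
    a = c * f                 # coefficient of V^2 (dynamic power term)
    b = s_old / v_old         # coefficient of V (static power term)
    k = target * total_old    # power budget after the reduction
    return (-b + math.sqrt(b * b + 4 * a * k)) / (2 * a)


# Inputs are the values that appear in the solution text.
pentium4 = solve_vnew(c=3.2e-8, f=3.6e9, total_old=100, s_old=10, v_old=1.25)
core_i5 = solve_vnew(c=2.9e-8, f=3.4e9, total_old=70, s_old=30, v_old=0.9)
print(round(pentium4, 3), round(core_i5, 3))
```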

1.9.1
p   # arith inst.   # L/S inst.   # branch inst.   cycles    ex. time   speedup
1   2.56E9          1.28E9        2.56E8           7.94E10   39.7       1
2   1.83E9          9.14E8        2.56E8           5.67E10   28.3       1.
4   9.12E8          4.57E8        2.56E8           2.83E10   14.2       2.
8   4.57E8          2.29E8        2.56E8           1.42E10   7.10       5.
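The speedup column is the single-processor execution time divided by the p-processor time. A minimal sketch, using only the execution times listed in the table (the helper name is illustrative):

```python
# Execution times (seconds) per processor count, from the table above.
exec_time = {1: 39.7, 2: 28.3, 4: 14.2, 8: 7.10}


def speedups(times):
    """Speedup relative to the single-processor run."""
    base = times[1]
    return {p: base / t for p, t in times.items()}


result = speedups(exec_time)
for p, s in sorted(result.items()):
    print(p, round(s, 2))
```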

1.9.2
p   ex. time
1   41.
2   29.
4   14.
8   7.

1.9.3 3

1.10.1
die area(15cm) = wafer area/dies per wafer = pi × 7.5^2 / 84 = 2.10 cm^2
yield(15cm) = 1/(1 + (0.020 × 2.10/2))^2 = 0.9593
die area(20cm) = wafer area/dies per wafer = pi × 10^2 / 100 = 3.14 cm^2
yield(20cm) = 1/(1 + (0.031 × 3.14/2))^2 = 0.9093

1.10.2
cost/die(15cm) = 12/(84 × 0.9593) = 0.
cost/die(20cm) = 15/(100 × 0.9093) = 0.

1.10.3
die area(15cm) = wafer area/dies per wafer = pi × 7.5^2 / (84 × 1.1) = 1.91 cm^2
yield(15cm) = 1/(1 + (0.020 × 1.15 × 1.91/2))^2 = 0.
die area(20cm) = wafer area/dies per wafer = pi × 10^2 / (100 × 1.1) = 2.86 cm^2
yield(20cm) = 1/(1 + (0.03 × 1.15 × 2.86/2))^2 = 0.

1.10.4
defects per area(0.92) = (1 - y^.5)/(y^.5 × die_area/2) = (1 - 0.92^.5)/(0.92^.5 × 2/2) = 0.043 defects/cm^2
defects per area(0.95) = (1 - y^.5)/(y^.5 × die_area/2) = (1 - 0.95^.5)/(0.95^.5 × 2/2) = 0.026 defects/cm^2
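The die-area, yield, and defect-density formulas used in 1.10 can be written as three small functions. This is only a sketch: the function names are illustrative, and the yield calls use the rounded die areas (2.10 and 3.14 cm^2) that the solution itself plugs in.

```python
import math


def die_area(wafer_diameter_cm, dies_per_wafer):
    """Wafer area divided evenly among the dies."""
    return math.pi * (wafer_diameter_cm / 2) ** 2 / dies_per_wafer


def die_yield(defects_per_cm2, area_cm2):
    """Yield model used in the solution: 1 / (1 + defects * area / 2)^2."""
    return 1 / (1 + defects_per_cm2 * area_cm2 / 2) ** 2


def defect_density(target_yield, area_cm2):
    """Invert the yield model to get defects per cm^2."""
    root = math.sqrt(target_yield)
    return (1 - root) / (root * area_cm2 / 2)


area_15 = die_area(15, 84)
area_20 = die_area(20, 100)
yield_15 = die_yield(0.020, 2.10)
yield_20 = die_yield(0.031, 3.14)
print(round(area_15, 2), round(area_20, 2))
print(round(yield_15, 4), round(yield_20, 4))
print(round(defect_density(0.92, 2), 3), round(defect_density(0.95, 2), 3))
```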

1.11.1
CPI = clock rate × CPU time/instr. count
clock rate = 1/cycle time = 3 GHz
CPI(bzip2) = 3 × 10^9 × 750/(2389 × 10^9) = 0.

1.11.2
SPEC ratio = ref. time/execution time
SPEC ratio(bzip2) = 9650/750 = 12.

1.11.3
CPU time = No. instr. × CPI/clock rate
If CPI and clock rate do not change, the CPU time increase is equal to the increase in the number of instructions, that is 10%.
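The two bzip2 figures in 1.11.1 and 1.11.2 follow directly from the formulas. A minimal sketch with the inputs taken from the solution (variable names are illustrative):

```python
# Inputs as they appear in the solution text.
CLOCK_RATE_HZ = 3e9
CPU_TIME_S = 750
INSTRUCTION_COUNT = 2389e9
REFERENCE_TIME_S = 9650

# CPI = clock rate * CPU time / instruction count
cpi = CLOCK_RATE_HZ * CPU_TIME_S / INSTRUCTION_COUNT

# SPEC ratio = reference time / execution time
spec_ratio = REFERENCE_TIME_S / CPU_TIME_S

print(round(cpi, 2), round(spec_ratio, 2))
```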

MIPS(P2) = 3 × 10^9 × 10^-6/0.75 = 4.0 × 10^3

MIPS(P1) > MIPS(P2), performance(P1) < performance(P2) (from 11a)

1.12.4
MFLOPS = No. FP operations × 10^-6/T
MFLOPS(P1) = .4 × 5E9 × 1E-6/1.125 = 1.78E3
MFLOPS(P2) = .4 × 1E9 × 1E-6/.25 = 1.60E3
MFLOPS(P1) > MFLOPS(P2), performance(P1) < performance(P2) (from 11a)
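The MFLOPS comparison in 1.12.4 shows a higher rate on the machine with the longer execution time. The sketch below reproduces the two MFLOPS values from the numbers in the solution; the function name is illustrative.

```python
def mflops(fp_fraction, instruction_count, exec_time_s):
    """Millions of floating-point operations per second."""
    return fp_fraction * instruction_count * 1e-6 / exec_time_s


# Execution times 1.125 s and 0.25 s are the T values used in the solution.
p1 = mflops(0.4, 5e9, 1.125)
p2 = mflops(0.4, 1e9, 0.25)
print(round(p1), round(p2))

# P1 has the higher MFLOPS rating even though P2 finishes sooner.
p1_rated_higher = p1 > p2
p2_finishes_first = 0.25 < 1.125
```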

1.13.1 Tfp = 70 × 0.8 = 56 s. Tnew = 56 + 85 + 55 + 40 = 236 s. Reduction: 5.6%
1.13.2 Tnew = 250 × 0.8 = 200 s, Tfp + Tl/s + Tbranch = 165 s, Tint = 35 s. Reduction time INT: 58.8%
1.13.3 Tnew = 250 × 0.8 = 200 s, Tfp + Tint + Tl/s = 210 s. NO
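The three parts of 1.13 reuse the same per-class time breakdown. A short sketch of the arithmetic follows; the dictionary keys are illustrative labels, and the mapping of 85 s to INT, 55 s to L/S, and 40 s to branch is inferred from the sums quoted in the solution.

```python
# Per-class execution times in seconds (sum is 250 s).
times = {"fp": 70.0, "int": 85.0, "ls": 55.0, "branch": 40.0}
total = sum(times.values())

# 1.13.1: FP time reduced by 20%.
new_total = total - 0.2 * times["fp"]
overall_reduction = 1 - new_total / total

# 1.13.2: total time reduced by 20% by shrinking INT time only.
target = 0.8 * total
int_needed = target - (total - times["int"])
int_reduction = 1 - int_needed / times["int"]

# 1.13.3: same target by shrinking branch time only.
branch_needed = target - (total - times["branch"])
branch_only_possible = branch_needed >= 0

print(new_total, round(overall_reduction * 100, 1))
print(int_needed, round(int_reduction * 100, 1))
print(branch_needed, branch_only_possible)
```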

1.14.1
Clock cycles = CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr.
TCPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
clock cycles = 512 × 10^6; TCPU = 0.256 s
To halve the number of clock cycles by improving the CPI of FP instructions:
CPIimproved fp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr. = clock cycles/2
CPIimproved fp = (clock cycles/2 - (CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr.)) / No. FP instr.
CPIimproved fp = (256 - 462)/50 < 0 ==> not possible

1.14.2
Using the clock cycle data from a.
To halve the number of clock cycles by improving the CPI of L/S instructions:
CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIimproved l/s × No. L/S instr. + CPIbranch × No. branch instr. = clock cycles/2
CPIimproved l/s = (clock cycles/2 - (CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIbranch × No. branch instr.)) / No. L/S instr.
CPIimproved l/s = (256 - 198)/80 = 0.

1.14.3
Clock cycles = CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr.

TCPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
CPIint = 0.6 × 1 = 0.6; CPIfp = 0.6 × 1 = 0.6; CPIl/s = 0.7 × 4 = 2.8; CPIbranch = 0.7 × 2 = 1.4
TCPU (before improv.) = 0.256 s; TCPU (after improv.) = 0.171 s
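The before and after CPU times can be reproduced with a weighted CPI sum. Note the assumption: the instruction counts below (50, 110, 80, and 16 million for FP, INT, L/S, and branch) are not all printed in this excerpt; the INT and branch counts are assumed values chosen to be consistent with the 512 × 10^6 cycle total and the 462 figure in 1.14.1. Names are illustrative.

```python
CLOCK_RATE_HZ = 2e9

# Instruction counts; "int" and "branch" are assumptions (see lead-in).
counts = {"fp": 50e6, "int": 110e6, "ls": 80e6, "branch": 16e6}

cpi_before = {"fp": 1.0, "int": 1.0, "ls": 4.0, "branch": 2.0}
cpi_after = {"fp": 0.6 * 1, "int": 0.6 * 1, "ls": 0.7 * 4, "branch": 0.7 * 2}


def cpu_time(instr_counts, cpi, clock_rate_hz):
    """Total clock cycles divided by the clock rate."""
    cycles = sum(instr_counts[k] * cpi[k] for k in instr_counts)
    return cycles / clock_rate_hz


before = cpu_time(counts, cpi_before, CLOCK_RATE_HZ)
after = cpu_time(counts, cpi_after, CLOCK_RATE_HZ)
print(round(before, 3), round(after, 3))
```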

1.15
processors   exec. time/processor   time w/overhead   speedup            actual speedup/ideal speedup
1            100
2            50                     54                100/54 = 1.85      1.85/2 = .
4            25                     29                100/29 = 3.44      3.44/4 = 0.
8            12.5                   16.5              100/16.5 = 6.06    6.06/8 = 0.
16           6.25                   10.25             100/10.25 = 9.76   9.76/16 = 0.
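Each row of the table can be regenerated from the single-processor time and a fixed overhead. In this sketch the 4-second overhead is inferred from the table itself (54 - 50, 29 - 25, and so on), and the names are illustrative.

```python
BASE_TIME_S = 100.0
OVERHEAD_S = 4.0   # inferred from the "time w/overhead" column


def row(p):
    """Return (time per processor, time with overhead, speedup, efficiency)."""
    per_proc = BASE_TIME_S / p
    with_overhead = per_proc + (OVERHEAD_S if p > 1 else 0.0)
    speedup = BASE_TIME_S / with_overhead
    return per_proc, with_overhead, speedup, speedup / p


table = {p: row(p) for p in (1, 2, 4, 8, 16)}
for p, (per_proc, t, s, eff) in table.items():
    print(p, per_proc, t, round(s, 2), round(eff, 2))
```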

Chapter 2 Solutions

2.1
    addi f, h, -5    (note, no subi)
    add  f, f, g

2.2 f = g + h + i

2.3
    sub $t0, $s3, $s
    add $t0, $s6, $t
    lw  $t1, 16($t0)
    sw  $t1, 32($s7)

2.4 B[g] = A[f] + A[1+f];

2.5
    add $t0, $s6, $s
    add $t1, $s7, $s
    lw  $s0, 0($t0)
    lw  $t0, 4($t0)
    add $t0, $t0, $s
    sw  $t0, 0($t1)

2.6.1
    temp = Array[0];
    temp2 = Array[1];
    Array[0] = Array[4];
    Array[1] = temp;
    Array[4] = Array[3];
    Array[3] = temp2;

2.6.2
    lw $t0, 0($s6)
    lw $t1, 4($s6)
    lw $t2, 16($s6)
    sw $t2, 0($s6)
    sw $t0, 4($s6)
    lw $t0, 12($s6)
    sw $t0, 16($s6)
    sw $t1, 12($s6)
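One way to convince yourself that the C statements in 2.6.1 and the load/store sequence in 2.6.2 do the same thing is to simulate both on a list. The sample array below is an assumption chosen for illustration (it is not given in this excerpt); the function names are likewise hypothetical.

```python
def reorder_c(array):
    """Mirror of the C statements in 2.6.1."""
    a = list(array)
    temp = a[0]
    temp2 = a[1]
    a[0] = a[4]
    a[1] = temp
    a[4] = a[3]
    a[3] = temp2
    return a


def reorder_mips(array):
    """Mirror of the MIPS sequence in 2.6.2; byte offset / 4 = index."""
    a = list(array)
    t0 = a[0]      # lw $t0, 0($s6)
    t1 = a[1]      # lw $t1, 4($s6)
    t2 = a[4]      # lw $t2, 16($s6)
    a[0] = t2      # sw $t2, 0($s6)
    a[1] = t0      # sw $t0, 4($s6)
    t0 = a[3]      # lw $t0, 12($s6)
    a[4] = t0      # sw $t0, 16($s6)
    a[3] = t1      # sw $t1, 12($s6)
    return a


sample = [2, 4, 3, 6, 1]   # illustrative input, assumed for this sketch
print(reorder_c(sample), reorder_mips(sample))
```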

2.7
Little-Endian        Big-Endian
Address  Data        Address  Data
12       ab          12       12
8        cd          8        ef
4        ef          4        cd
0        12          0        ab

2.8 2882400018

2.9
    sll  $t0, $s1, 2      # $t0 <-- 4*g
    add  $t0, $t0, $s7    # $t0 <-- Addr(B[g])
    lw   $t0, 0($t0)      # $t0 <-- B[g]
    addi $t0, $t0, 1      # $t0 <-- B[g]+1
    sll  $t0, $t0, 2      # $t0 <-- 4*(B[g]+1)
    add  $t0, $t0, $s6    # $t0 <-- Addr(A[B[g]+1]), with the base of A in $s6
    lw   $s0, 0($t0)      # f <-- A[B[g]+1]

2.10 f = 2*(&A);
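The byte orders in the endianness table and the decimal value in 2.8 can both be checked with Python's built-in integer conversions. This sketch only shows the order of the four bytes from lowest to highest address; it does not model the address column of the table.

```python
value = 0xABCDEF12

# Bytes listed from the lowest address to the highest address.
little = value.to_bytes(4, "little")
big = value.to_bytes(4, "big")

print(little.hex(), big.hex())
print(value)   # decimal interpretation of the same bit pattern
```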

2.11
instruction           type    opcode  rs  rt  rd  immed
addi $t0, $s6, 4      I-type  8       22  8       4
add  $t1, $s6, $0     R-type  0       22  0   9
sw   $t1, 0($t0)      I-type  43      8   9       0
lw   $t0, 0($t0)      I-type  35      8   8       0
add  $s0, $t1, $t0    R-type  0       9   8   16
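The field values in the table can be packed into 32-bit machine words with a few shifts. This is a sketch of the standard MIPS I-type and R-type layouts; the helper names are illustrative, and the funct value 0x20 for `add` comes from the MIPS instruction set rather than from the table.

```python
def encode_i(opcode, rs, rt, imm):
    """I-type: opcode(6) rs(5) rt(5) immediate(16)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_r(rs, rt, rd, shamt=0, funct=0x20, opcode=0):
    """R-type: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


words = [
    encode_i(8, 22, 8, 4),     # addi $t0, $s6, 4
    encode_r(22, 0, 9),        # add  $t1, $s6, $0
    encode_i(43, 8, 9, 0),     # sw   $t1, 0($t0)
    encode_i(35, 8, 8, 0),     # lw   $t0, 0($t0)
    encode_r(9, 8, 16),        # add  $s0, $t1, $t0
]
for w in words:
    print(f"0x{w:08X}")
```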

2.12.1 50000000 2.12.2 overflow 2.12.3 B 2.12.4 no overflow 2.12.5 D 2.12.6 overflow

2.13.1 128 + x > 2^31 - 1, x > 2^31 - 129 and 128 + x < -2^31, x < -2^31 - 128 (impossible)
2.13.2 128 - x > 2^31 - 1, x < -2^31 + 129 and 128 - x < -2^31, x > 2^31 + 128 (impossible)
2.13.3 x - 128 < -2^31, x < -2^31 + 128 and x - 128 > 2^31 - 1, x > 2^31 + 127 (impossible)
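The boundary values in 2.13 can be spot-checked by testing whether the exact result fits in a signed 32-bit word. A minimal sketch with an illustrative helper name:

```python
INT_MIN = -2**31
INT_MAX = 2**31 - 1


def overflows(exact_result):
    """True if the exact result does not fit in a signed 32-bit word."""
    return not (INT_MIN <= exact_result <= INT_MAX)


# 2.13.1: 128 + x overflows once x exceeds 2^31 - 129.
add_edge_ok = not overflows(128 + (2**31 - 129))
add_edge_bad = overflows(128 + (2**31 - 128))

# 2.13.2: 128 - x overflows once x drops below -2^31 + 129.
sub_edge_ok = not overflows(128 - (-2**31 + 129))
sub_edge_bad = overflows(128 - (-2**31 + 128))

# 2.13.3: x - 128 overflows once x drops below -2^31 + 128.
rsub_edge_ok = not overflows((-2**31 + 128) - 128)
rsub_edge_bad = overflows((-2**31 + 127) - 128)

print(add_edge_ok, add_edge_bad, sub_edge_ok, sub_edge_bad,
      rsub_edge_ok, rsub_edge_bad)
```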

2.25.1 i-type
2.25.2
    addi $t2, $t2, -1
    beq  $t2, $0, loop

2.26.1 20

2.26.2
    i = 10;
    do {
        B += 2;
        i = i - 1;
    } while (i > 0);

2.26.3 5*N

2.27
           addi $t0, $0, 0
           beq  $0, $0, TEST1
    LOOP1: addi $t1, $0, 0
           beq  $0, $0, TEST2
    LOOP2: add  $t3, $t0, $t
           sll  $t2, $t1, 4
           add  $t2, $t2, $s
           sw   $t3, ($t2)
           addi $t1, $t1, 1
    TEST2: slt  $t2, $t1, $s
           bne  $t2, $0, LOOP2
           addi $t0, $t0, 1
    TEST1: slt  $t2, $t0, $s
           bne  $t2, $0, LOOP1

2.28 14 instructions to implement and 158 instructions executed

2.29
    for (i = 0; i < 100; i++) {
        result += MemArray[s0];
        s0 = s0 + 4;
    }
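The value 20 in 2.26.1 follows from running the do-while loop of 2.26.2 ten times. A small sketch, assuming B starts at zero (that starting value is an assumption, not stated in this excerpt):

```python
def run_loop(iterations=10, b=0):
    """Python rendering of: do { B += 2; i = i - 1; } while (i > 0)."""
    i = iterations
    while True:
        b += 2
        i -= 1
        if not i > 0:
            break
    return b


print(run_loop())
```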

2.30
          addi $t1, $s0, 400
    LOOP: lw   $s1, 0($t1)
          add  $s2, $s2, $s
          addi $t1, $t1, -
          bne  $t1, $s0, LOOP

2.31
    fib:   addi $sp, $sp, -12     # make room on stack
           sw   $ra, 8($sp)       # push $ra
           sw   $s0, 4($sp)       # push $s0
           sw   $a0, 0($sp)       # push $a0 (N)
           bgt  $a0, $0, test2    # if n>0, test if n=1
           add  $v0, $0, $0       # else fib(0) = 0
           j    rtn               #
    test2: addi $t0, $0, 1        #
           bne  $t0, $a0, gen     # if n>1, gen
           add  $v0, $0, $t0      # else fib(1) = 1
           j    rtn
    gen:   addi $a0, $a0, -1      # n-1 (MIPS has no subi)
           jal  fib               # call fib(n-1)
           add  $s0, $v0, $0      # copy fib(n-1)
           addi $a0, $a0, -1      # n-2
           jal  fib               # call fib(n-2)
           add  $v0, $v0, $s0     # fib(n-1)+fib(n-2)
    rtn:   lw   $a0, 0($sp)       # pop $a0
           lw   $s0, 4($sp)       # pop $s0
           lw   $ra, 8($sp)       # pop $ra
           addi $sp, $sp, 12      # restore sp
           jr   $ra

fib(0) = 12 instructions, fib(1) = 14 instructions,

fib(N) = 26 + 18N instructions for N >=
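For comparison, the control flow of the assembly routine maps onto a short recursive function. This is a sketch of the same branching structure, not a translation of the stack handling; it does not attempt to reproduce the instruction counts above.

```python
def fib(n):
    """Recursive Fibonacci with the same three cases as the MIPS routine."""
    if not n > 0:        # bgt $a0, $0, test2 falls through: fib(0) = 0
        return 0
    if n == 1:           # test2: fib(1) = 1
        return 1
    # gen: fib(n-1) is computed first, then fib(n-2), then they are added.
    return fib(n - 1) + fib(n - 2)


print([fib(n) for n in range(10)])
```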

2.32 Due to the recursive nature of the code, it is not possible for the compiler to in-line the function call.

2.33 after calling function fib:
    old $sp -> 0x7ffffffc   ???
               -4           contents of register $ra for fib(N)
               -8           contents of register $s0 for fib(N)
    $sp ->     -12          contents of register $a0 for fib(N)
there will be N-1 copies of $ra, $s0 and $a0

    DONE: add  $v0, $s0, $
          lw   $ra, ($sp)
          addi $sp, $sp, 4
          jr   $ra

2.38 0x

2.39 Generally, all solutions are similar:

    lui $t1, top_16_bits
    ori $t1, $t1, bottom_16_bits
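The lui/ori pair works because any 32-bit constant splits cleanly into two 16-bit halves. The sketch below models that split and recombination; the function names and the sample constant are hypothetical, chosen only for illustration.

```python
def split_constant(value):
    """Return (top_16_bits, bottom_16_bits) of a 32-bit constant."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def lui_ori(top_16_bits, bottom_16_bits):
    """Model of: lui $t1, top ; ori $t1, $t1, bottom."""
    t1 = (top_16_bits << 16) & 0xFFFFFFFF   # lui clears the low half
    t1 |= bottom_16_bits                    # ori fills the low half
    return t1


top, bottom = split_constant(0x12345678)    # sample value, assumed
print(hex(top), hex(bottom), hex(lui_ori(top, bottom)))
```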

2.40 No, jump can go up to 0x0FFFFFFC.

2.41 No, range is 0x604 + 0x1FFFC = 0x0002 0600 to 0x604 - 0x20000 = 0xFFFE 0604.

2.42 Yes, range is 0x1FFFF004 + 0x1FFFC = 0x2001F000 to 0x1FFFF004 - 0x20000 = 0x1FFDF004.
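The offsets 0x1FFFC and 0x20000 used in 2.41 and 2.42 are the extremes of a signed 16-bit word offset multiplied by 4. A minimal sketch of that range calculation, with results wrapped to 32 bits; the function name is illustrative, and `base` is the address the solutions add the offsets to (0x604 and 0x1FFFF004).

```python
MAX_FORWARD = (2**15 - 1) * 4    # largest positive 16-bit offset, in bytes
MAX_BACKWARD = 2**15 * 4         # largest negative 16-bit offset, in bytes


def branch_range(base):
    """Return (lowest, highest) reachable address as 32-bit values."""
    lowest = (base - MAX_BACKWARD) & 0xFFFFFFFF
    highest = (base + MAX_FORWARD) & 0xFFFFFFFF
    return lowest, highest


for base in (0x604, 0x1FFFF004):
    low, high = branch_range(base)
    print(f"0x{low:08X} .. 0x{high:08X}")
```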

2.43
    trylk: li   $t1,
           ll   $t0,0($a0)
           bnez $t0,trylk
           sc   $t1,0($a0)
           beqz $t1,trylk
           lw   $t2,0($a1)
           slt  $t3,$t2,$a
           bnez $t3,skip
           sw   $a2,0($a1)
    skip:  sw   $0,0($a0)

2.44
    try:   ll   $t0,0($a1)
           slt  $t1,$t0,$a
           bnez $t1,skip
           move $t0,$a
           sc   $t0,0($a1)
           beqz $t0,try
    skip:

2.45 It is possible for one or both processors to complete this code without ever reaching the SC instruction. If only one executes SC, it completes successfully. If both reach SC, they do so in the same cycle, but one SC completes first and then the other detects this and fails.

2.46.1 Answer is no in all cases. Slows down the computer.
CCT = clock cycle time
ICa = instruction count (arithmetic)
ICls = instruction count (load/store)
ICb = instruction count (branch)
new CPU time = 0.75 × old ICa × CPIa × 1.1 × oldCCT + oldICls × CPIls × 1.1 × oldCCT + oldICb × CPIb × 1.1 × oldCCT
The extra clock cycle time adds sufficiently to the new CPU time such that it is not quicker than the old execution time in all cases.

2.46.2 107.04%, 113.43%

2.47.1 2. 2.47.2 0. 2.47.3 0.

Chapter 3 Solutions

The attraction is that each hex digit contains one of 16 different characters (0–9, A–F). Since with 4 binary bits you can represent 16 different patterns, in hex each digit requires exactly 4 binary bits. And bytes are by definition 8 bits long, so two hex digits are all that are required to represent the contents of 1 byte.

3.4 753
3.5 7777 (-3777)
3.6 Neither (63)
3.7 Neither (65)
3.8 Overflow (result = -179, which does not fit into an SM 8-bit format)
3.9 -105 - 42 = -128 (-147)
3.10 -105 + 42 = -63
3.11 151 + 214 = 255 (365)

3.12 62 × 12
Step  Action            Multiplier  Multiplicand      Product
0     Initial Vals      001 010     000 000 110 010   000 000 000 000
      lsb=0, no op      001 010     000 000 110 010   000 000 000 000
1     Lshift Mcand      001 010     000 001 100 100   000 000 000 000
      Rshift Mplier     000 101     000 001 100 100   000 000 000 000
      Prod=Prod+Mcand   000 101     000 001 100 100   000 001 100 100
2     Lshift Mcand      000 101     000 011 001 000   000 001 100 100
      Rshift Mplier     000 010     000 011 001 000   000 001 100 100
      lsb=0, no op      000 010     000 011 001 000   000 001 100 100
3     Lshift Mcand      000 010     000 110 010 000   000 001 100 100
      Rshift Mplier     000 001     000 110 010 000   000 001 100 100
      Prod=Prod+Mcand   000 001     000 110 010 000   000 111 110 100
4     Lshift Mcand      000 001     001 100 100 000   000 111 110 100
      Rshift Mplier     000 000     001 100 100 000   000 111 110 100
      lsb=0, no op      000 000     001 100 100 000   000 111 110 100
5     Lshift Mcand      000 000     011 001 000 000   000 111 110 100
      Rshift Mplier     000 000     011 001 000 000   000 111 110 100
      lsb=0, no op      000 000     110 010 000 000   000 111 110 100
6     Lshift Mcand      000 000     110 010 000 000   000 111 110 100
      Rshift Mplier     000 000     110 010 000 000   000 111 110 100
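The table in 3.12 traces a shift-and-add multiplier on the octal operands 62 and 12. The sketch below follows the same idea (add the multiplicand when the multiplier's low bit is 1, then shift the multiplicand left and the multiplier right). It is a simplified model with an illustrative function name, and it does not reproduce the exact row ordering of the table.

```python
def shift_add_multiply(multiplicand, multiplier, bits=6):
    """Unsigned shift-and-add multiplication over a fixed number of steps."""
    product = 0
    for _ in range(bits):
        if multiplier & 1:           # lsb = 1: Prod = Prod + Mcand
            product += multiplicand
        multiplicand <<= 1           # Lshift Mcand
        multiplier >>= 1             # Rshift Mplier
    return product


result = shift_add_multiply(0o62, 0o12)
print(oct(result), format(result, "012b"))
```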