MIPS documents for Computer architecture, Exercises of Computer Science

Birla Institute of Technology and Science Computer Science

Includes solution to textbook by Patterson

Typology: Exercises

2018/2019

Uploaded on 11/29/2019

l-srihari 🇮🇳

Solutions

Chapter 1 Solutions

1.1 Personal computer (includes workstation and laptop): Personal computers emphasize delivery of good performance to single users at low cost and usually execute third-party software.
Personal mobile device (PMD, includes tablets): PMDs are battery operated with wireless connectivity to the Internet and typically cost hundreds of dollars, and, as with PCs, users can download software (“apps”) to run on them. Unlike PCs, they no longer have a keyboard and mouse and are more likely to rely on a touch-sensitive screen or even speech input.
Server: Computer used to run large problems and usually accessed via a network.
Warehouse-scale computer: Thousands of processors forming a large cluster.
Supercomputer: Computer composed of hundreds to thousands of processors and terabytes of memory.
Embedded computer: Computer designed to run one application or one set of related applications and integrated into a single system.

1.2
a. Performance via Pipelining
b. Dependability via Redundancy
c. Performance via Prediction
d. Make the Common Case Fast
e. Hierarchy of Memories
f. Performance via Parallelism
g. Design for Moore’s Law
h. Use Abstraction to Simplify Design

1.3 The program is compiled into an assembly language program, which is then assembled into a machine language program.

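1.4
a. $1280 \times 1024 = 1{,}310{,}720$ pixels $\Rightarrow 1{,}310{,}720 \times 3 = 3{,}932{,}160$ bytes/frame.
b. $3{,}932{,}160 \text{ bytes} \times 8 \text{ bits/byte} \,/\, (100 \times 10^6 \text{ bits/second}) = 0.31$ seconds.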
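The 1.4 arithmetic can be checked with a short Python sketch. It assumes 3 bytes per pixel and a 100 Mbit/s link, the figures used above, and ignores any protocol overhead; the function names are illustrative only.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Size of one uncompressed frame in bytes (3 bytes/pixel assumed, as above)."""
    return width * height * bytes_per_pixel

def transfer_seconds(num_bytes: int, bits_per_second: float) -> float:
    """Time to send num_bytes over a link with the given raw bit rate."""
    return num_bytes * 8 / bits_per_second

size = frame_bytes(1280, 1024)
print(size)                                      # 3932160
print(round(transfer_seconds(size, 100e6), 2))   # 0.31
```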

1.5
a. performance of P1 (instructions/sec) $= 3 \times 10^9 / 1.5 = 2 \times 10^9$
performance of P2 (instructions/sec) $= 2.5 \times 10^9 / 1.0 = 2.5 \times 10^9$
performance of P3 (instructions/sec) $= 4 \times 10^9 / 2.2 = 1.8 \times 10^9$

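1.8.1 $C = 2 \times DP/(V^2 \times F)$, where DP is the dynamic power.
Pentium 4: $C = 3.2 \times 10^{-8}$ F
Core i5 Ivy Bridge: $C = 2.9 \times 10^{-8}$ F
1.8.2 Pentium 4: $10/100 = 10\%$
Core i5 Ivy Bridge: $30/70 = 42.9\%$
1.8.3 $(S_{new} + D_{new})/(S_{old} + D_{old}) = 0.90$
$D_{new} = C \times V_{new}^2 \times F$
$S_{old} = V_{old} \times I$
$S_{new} = V_{new} \times I$
Therefore:
$V_{new} = [D_{new}/(C \times F)]^{1/2}$
$D_{new} = 0.90 \times (S_{old} + D_{old}) - S_{new}$
$S_{new} = V_{new} \times (S_{old}/V_{old})$
Pentium 4:
$S_{new} = V_{new} \times (10/1.25) = V_{new} \times 8$
$D_{new} = 0.90 \times 100 - V_{new} \times 8 = 90 - V_{new} \times 8$
$V_{new} = [(90 - V_{new} \times 8)/(3.2 \times 10^{-8} \times 3.6 \times 10^9)]^{1/2}$
$V_{new} = 0.85$ V
Core i5:
$S_{new} = V_{new} \times (30/0.9) = V_{new} \times 33.3$
$D_{new} = 0.90 \times 70 - V_{new} \times 33.3 = 63 - V_{new} \times 33.3$
$V_{new} = [(63 - V_{new} \times 33.3)/(2.9 \times 10^{-8} \times 3.4 \times 10^9)]^{1/2}$
$V_{new} = 0.64$ V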
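A Python sketch of 1.8. The chip parameters are the ones appearing in the solution (static power, total power, supply voltage, clock rate). The 1.8.3 function follows the solution's own relations, $D_{new} = C \times V_{new}^2 \times F$ and $S_{new} = V_{new} \times (S_{old}/V_{old})$, but rearranges them into a quadratic in $V_{new}$ and takes the positive root instead of substituting by hand; that rearrangement is my choice, not the manual's.

```python
import math

def capacitive_load(dynamic_w: float, volts: float, freq_hz: float) -> float:
    """1.8.1: C = 2 * DP / (V^2 * F)."""
    return 2 * dynamic_w / (volts ** 2 * freq_hz)

def voltage_for_10pct_power_cut(static_w, dynamic_w, volts, freq_hz, cap):
    """1.8.3 as set up in the solution:
    S_new + D_new = 0.90 * (S_old + D_old), D_new = C * V^2 * F,
    S_new = V * (S_old / V_old).  Rearranged:
    C*F*V^2 + (S_old/V_old)*V - 0.90*(S_old + D_old) = 0; return the positive root."""
    a = cap * freq_hz
    b = static_w / volts
    c = -0.90 * (static_w + dynamic_w)
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

# (static W, dynamic W, supply V, clock Hz), taken from the solution above
chips = {
    "Pentium 4": (10.0, 90.0, 1.25, 3.6e9),
    "Core i5 Ivy Bridge": (30.0, 40.0, 0.9, 3.4e9),
}
for name, (s, d, v, f) in chips.items():
    cap = capacitive_load(d, v, f)
    v_new = voltage_for_10pct_power_cut(s, d, v, f, cap)
    print(f"{name}: C={cap:.2e} F, static share={s / (s + d):.1%}, V_new={v_new:.3f} V")
```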

1.9.1
p    # arith. inst.   # L/S inst.   # branch inst.   cycles     ex. time (s)   speedup
1    2.56E9           1.28E9        2.56E8           7.94E10    39.7           1
2    1.83E9           9.14E8        2.56E8           5.67E10    28.3           1.4
4    9.12E8           4.57E8        2.56E8           2.83E10    14.2           2.8
8    4.57E8           2.29E8        2.56E8           1.42E10    7.10           5.6

1.9.2
p    ex. time (s)
1    41.
2    29.
4    14.
8    7.
1.9.3 3

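1.10.1 die area$_{15cm}$ = wafer area/dies per wafer $= \pi \times 7.5^2/84 = 2.10\ \text{cm}^2$
yield$_{15cm}$ $= 1/(1 + (0.020 \times 2.10/2))^2 = 0.9593$
die area$_{20cm}$ = wafer area/dies per wafer $= \pi \times 10^2/100 = 3.14\ \text{cm}^2$
yield$_{20cm}$ $= 1/(1 + (0.031 \times 3.14/2))^2 = 0.9093$
1.10.2 cost/die$_{15cm}$ $= 12/(84 \times 0.9593) = 0.1489$
cost/die$_{20cm}$ $= 15/(100 \times 0.9093) = 0.1650$
1.10.3 die area$_{15cm}$ = wafer area/dies per wafer $= \pi \times 7.5^2/(84 \times 1.1) = 1.91\ \text{cm}^2$
yield$_{15cm}$ $= 1/(1 + (0.020 \times 1.15 \times 1.91/2))^2 = 0.9575$
die area$_{20cm}$ = wafer area/dies per wafer $= \pi \times 10^2/(100 \times 1.1) = 2.86\ \text{cm}^2$
yield$_{20cm}$ $= 1/(1 + (0.03 \times 1.15 \times 2.86/2))^2 = 0.9082$
1.10.4 defects per area$_{0.92}$ $= (1 - y^{0.5})/(y^{0.5} \times \text{die area}/2) = (1 - 0.92^{0.5})/(0.92^{0.5} \times 2/2) = 0.043$ defects/cm$^2$
defects per area$_{0.95}$ $= (1 - y^{0.5})/(y^{0.5} \times \text{die area}/2) = (1 - 0.95^{0.5})/(0.95^{0.5} \times 2/2) = 0.026$ defects/cm$^2$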
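The 1.10 formulas translate directly into a few Python helpers. This is a sketch that, like the solution, ignores partial dies at the wafer edge (die area = wafer area / dies per wafer); the wafer costs, die counts, and defect densities are the ones used above, and the helper names are hypothetical.

```python
import math

def die_area(wafer_diameter_cm: float, dies_per_wafer: int) -> float:
    """Die area = wafer area / dies per wafer (edge loss ignored, as in 1.10)."""
    return math.pi * (wafer_diameter_cm / 2) ** 2 / dies_per_wafer

def die_yield(defects_per_cm2: float, area_cm2: float) -> float:
    """Yield = 1 / (1 + defects per area * die area / 2)^2."""
    return 1 / (1 + defects_per_cm2 * area_cm2 / 2) ** 2

def cost_per_die(wafer_cost: float, dies_per_wafer: int, yld: float) -> float:
    return wafer_cost / (dies_per_wafer * yld)

def defect_density_for_yield(yld: float, area_cm2: float) -> float:
    """1.10.4: invert die_yield to find the defect density giving yield yld."""
    root = math.sqrt(yld)
    return (1 - root) / (root * area_cm2 / 2)

# (wafer diameter cm, dies per wafer, defects/cm^2, wafer cost) from 1.10.1-1.10.2
for diam, dies, defects, wafer_cost in [(15, 84, 0.020, 12), (20, 100, 0.031, 15)]:
    a = die_area(diam, dies)
    y = die_yield(defects, a)
    print(diam, round(a, 2), round(y, 4), round(cost_per_die(wafer_cost, dies, y), 4))

print(round(defect_density_for_yield(0.92, 2.0), 3),
      round(defect_density_for_yield(0.95, 2.0), 3))
```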

1.11.1 $\text{CPI} = \text{clock rate} \times \text{CPU time}/\text{instr. count}$
$\text{clock rate} = 1/\text{cycle time} = 3$ GHz
$\text{CPI(bzip2)} = 3 \times 10^9 \times 750/(2389 \times 10^9) = 0.94$
1.11.2 $\text{SPEC ratio} = \text{ref. time}/\text{execution time}$
$\text{SPEC ratio(bzip2)} = 9650/750 = 12.87$
1.11.3 $\text{CPU time} = \text{No. instr.} \times \text{CPI}/\text{clock rate}$
If CPI and clock rate do not change, the CPU time increase is equal to the increase in the number of instructions, that is, 10%.
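A small Python sketch of the three 1.11 relations, using the bzip2 figures above (3 GHz clock, 750 s execution time, 2389 × 10^9 instructions, 9650 s reference time). The function names are illustrative.

```python
def cpi(clock_hz: float, cpu_time_s: float, instr_count: float) -> float:
    """1.11.1: CPI = clock rate * CPU time / instruction count."""
    return clock_hz * cpu_time_s / instr_count

def spec_ratio(reference_time_s: float, exec_time_s: float) -> float:
    """1.11.2: SPEC ratio = reference time / execution time."""
    return reference_time_s / exec_time_s

def cpu_time(instr_count: float, cpi_value: float, clock_hz: float) -> float:
    """1.11.3: CPU time = instruction count * CPI / clock rate."""
    return instr_count * cpi_value / clock_hz

bzip2_cpi = cpi(3e9, 750, 2389e9)
print(round(bzip2_cpi, 2), round(spec_ratio(9650, 750), 2))

# 1.11.3: 10% more instructions with CPI and clock rate unchanged
before = cpu_time(2389e9, bzip2_cpi, 3e9)
after = cpu_time(1.1 * 2389e9, bzip2_cpi, 3e9)
print(f"{after / before - 1:.0%}")
```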

$\text{MIPS(P2)} = 3 \times 10^9 \times 10^{-6}/0.75 = 4.0 \times 10^3$

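MIPS(P1) > MIPS(P2), but performance(P1) < performance(P2) (from 11a)
1.12.4 $\text{MFLOPS} = \text{No. FP operations} \times 10^{-6}/T$
$\text{MFLOPS(P1)} = 0.4 \times 5 \times 10^9 \times 10^{-6}/1.125 = 1.78 \times 10^3$
$\text{MFLOPS(P2)} = 0.4 \times 1 \times 10^9 \times 10^{-6}/0.25 = 1.60 \times 10^3$
MFLOPS(P1) > MFLOPS(P2), but performance(P1) < performance(P2) (from 11a)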
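The point of 1.12 is that MIPS and MFLOPS can rank processors the opposite way from execution time. A Python sketch using the instruction counts and times that appear above (P1: 5 × 10^9 instructions in 1.125 s; P2: 10^9 instructions in 0.25 s; 40% FP). It computes native MIPS as instruction count / (time × 10^6), which is equivalent to clock rate / (CPI × 10^6).

```python
def mips(instr_count: float, exec_time_s: float) -> float:
    """Native MIPS = instruction count / (execution time * 10^6)."""
    return instr_count / (exec_time_s * 1e6)

def mflops(fp_ops: float, exec_time_s: float) -> float:
    """MFLOPS = FP operations / (execution time * 10^6)."""
    return fp_ops / (exec_time_s * 1e6)

# Figures from the solution: (instruction count, execution time in s); 40% are FP
machines = {"P1": (5e9, 1.125), "P2": (1e9, 0.25)}
for name, (ic, t) in machines.items():
    print(name, round(mips(ic, t)), round(mflops(0.4 * ic, t)), f"{t} s")
# P1 wins on MIPS and MFLOPS, yet P2 finishes the program in less time.
```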

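1.13.1 $T_{fp} = 70 \times 0.8 = 56$ s. $T_{new} = 56 + 85 + 55 + 40 = 236$ s. Reduction: 5.6%.
1.13.2 $T_{new} = 250 \times 0.8 = 200$ s, $T_{fp} + T_{l/s} + T_{branch} = 165$ s, $T_{int} = 35$ s. Reduction in INT time: 58.8%.
1.13.3 $T_{new} = 250 \times 0.8 = 200$ s, $T_{fp} + T_{int} + T_{l/s} = 210$ s. No: even with the branch time reduced to zero, the remaining 210 s exceeds the 200 s target.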
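The 1.13 reasoning as a Python sketch. The per-class times (FP 70 s, INT 85 s, L/S 55 s, branch 40 s, 250 s total) are the ones implied by the solution's sums.

```python
# Per-class execution times (s) of the 1.13 program, as implied above
times = {"fp": 70.0, "int": 85.0, "ls": 55.0, "branch": 40.0}
total = sum(times.values())          # 250 s
target = 0.8 * total                 # a 20% reduction means 200 s

# 1.13.1: FP time reduced by 20%
new_total = total - times["fp"] + 0.8 * times["fp"]
print(new_total, f"{1 - new_total / total:.1%}")

# 1.13.2: INT time needed so that the whole program meets the target
int_needed = target - (total - times["int"])
print(int_needed, f"{1 - int_needed / times['int']:.1%}")

# 1.13.3: even with zero branch time, the other classes must fit in the target
print(total - times["branch"] <= target)
```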

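1.14.1 $\text{Clock cycles} = \text{CPI}_{fp} \times N_{fp} + \text{CPI}_{int} \times N_{int} + \text{CPI}_{l/s} \times N_{l/s} + \text{CPI}_{branch} \times N_{branch}$, where $N_x$ is the number of instructions of class $x$.
$T_{CPU} = \text{clock cycles}/\text{clock rate} = \text{clock cycles}/(2 \times 10^9)$
clock cycles $= 512 \times 10^6$; $T_{CPU} = 0.256$ s
To halve the number of clock cycles by improving the CPI of FP instructions:
$\text{CPI}_{improved\,fp} \times N_{fp} + \text{CPI}_{int} \times N_{int} + \text{CPI}_{l/s} \times N_{l/s} + \text{CPI}_{branch} \times N_{branch} = \text{clock cycles}/2$
$\text{CPI}_{improved\,fp} = (\text{clock cycles}/2 - (\text{CPI}_{int} \times N_{int} + \text{CPI}_{l/s} \times N_{l/s} + \text{CPI}_{branch} \times N_{branch}))/N_{fp}$
$\text{CPI}_{improved\,fp} = (256 - 462)/50 < 0 \Rightarrow$ not possible
1.14.2 Using the clock cycle data from 1.14.1, to halve the number of clock cycles by improving the CPI of L/S instructions:
$\text{CPI}_{fp} \times N_{fp} + \text{CPI}_{int} \times N_{int} + \text{CPI}_{improved\,l/s} \times N_{l/s} + \text{CPI}_{branch} \times N_{branch} = \text{clock cycles}/2$
$\text{CPI}_{improved\,l/s} = (\text{clock cycles}/2 - (\text{CPI}_{fp} \times N_{fp} + \text{CPI}_{int} \times N_{int} + \text{CPI}_{branch} \times N_{branch}))/N_{l/s}$
The non-L/S classes account for $512 - 4 \times 80 = 192$ million cycles, so
$\text{CPI}_{improved\,l/s} = (256 - 192)/80 = 0.8$
1.14.3 $\text{Clock cycles} = \text{CPI}_{fp} \times N_{fp} + \text{CPI}_{int} \times N_{int} + \text{CPI}_{l/s} \times N_{l/s} + \text{CPI}_{branch} \times N_{branch}$
$T_{CPU} = \text{clock cycles}/\text{clock rate} = \text{clock cycles}/(2 \times 10^9)$
$\text{CPI}_{int} = 0.6 \times 1 = 0.6$; $\text{CPI}_{fp} = 0.6 \times 1 = 0.6$; $\text{CPI}_{l/s} = 0.7 \times 4 = 2.8$; $\text{CPI}_{branch} = 0.7 \times 2 = 1.4$
$T_{CPU}$ (before improv.) $= 0.256$ s; $T_{CPU}$ (after improv.) $= 0.171$ s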
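The 1.14 calculations are sketched below in Python. The per-class instruction counts (FP 50, INT 110, L/S 80, branch 16 million) come from the exercise statement rather than from the solution text, so treat them as an assumption; they are consistent with the 512 × 10^6 cycles and the 0.171 s figure above. The helper names are hypothetical.

```python
# ASSUMPTION: instruction counts from the exercise statement (not shown above)
counts = {"fp": 50e6, "int": 110e6, "ls": 80e6, "branch": 16e6}
cpi = {"fp": 1.0, "int": 1.0, "ls": 4.0, "branch": 2.0}
CLOCK_HZ = 2e9

def cycles(cpi_table: dict) -> float:
    """Total clock cycles = sum over classes of CPI * instruction count."""
    return sum(cpi_table[k] * counts[k] for k in counts)

def cpi_needed_to_halve(cls: str) -> float:
    """CPI that class `cls` alone would need for total cycles to halve."""
    others = cycles(cpi) - cpi[cls] * counts[cls]
    return (cycles(cpi) / 2 - others) / counts[cls]

print(cycles(cpi), cycles(cpi) / CLOCK_HZ)      # 1.14.1 baseline
print(cpi_needed_to_halve("fp"))                # negative, so not possible
print(cpi_needed_to_halve("ls"))                # 1.14.2

# 1.14.3: FP and INT CPI cut by 40%, L/S and branch CPI cut by 30%
improved = {"fp": 0.6, "int": 0.6, "ls": 2.8, "branch": 1.4}
print(cycles(improved) / CLOCK_HZ)
```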

1.15
processors   exec. time/processor   time w/overhead   speedup             actual speedup/ideal speedup
1            100
2            50                     54                100/54 = 1.85       1.85/2 = 0.93
4            25                     29                100/29 = 3.44       3.44/4 = 0.86
8            12.5                   16.5              100/16.5 = 6.06     6.06/8 = 0.76
16           6.25                   10.25             100/10.25 = 9.76    9.76/16 = 0.61
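The table follows from one formula: each of p processors does 100/p units of work and then pays a fixed overhead of 4, the figures visible in the table. A Python sketch (function name illustrative):

```python
def speedup_with_overhead(serial_time: float, processors: int, overhead: float):
    """Return (time with overhead, speedup, speedup / ideal speedup)."""
    parallel_time = serial_time / processors + overhead
    speedup = serial_time / parallel_time
    return parallel_time, speedup, speedup / processors

for p in (2, 4, 8, 16):
    t, s, eff = speedup_with_overhead(100.0, p, 4.0)
    print(p, t, round(s, 2), round(eff, 2))
```

Because the overhead does not shrink as p grows, the ratio of actual to ideal speedup falls steadily.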

Chapter 2 Solutions

2.1 addi f, h, -5    (note: there is no subi)
    add f, f, g

2.2 f = g + h + i

2.3 sub $t0, $s3, $s4
    add $t0, $s6, $t0
    lw $t1, 16($t0)
    sw $t1, 32($s7)

2.4 B[g] = A[f] + A[1+f];

2.5 add $t0, $s6, $s0
    add $t1, $s7, $s1
    lw $s0, 0($t0)
    lw $t0, 4($t0)
    add $t0, $t0, $s0
    sw $t0, 0($t1)

2.6.1 temp = Array[0];
      temp2 = Array[1];
      Array[0] = Array[4];
      Array[1] = temp;
      Array[4] = Array[3];
      Array[3] = temp2;

2.6.2 lw $t0, 0($s6)
      lw $t1, 4($s6)
      lw $t2, 16($s6)
      sw $t2, 0($s6)
      sw $t0, 4($s6)
      lw $t0, 12($s6)
      sw $t0, 16($s6)
      sw $t1, 12($s6)
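To check that the 2.6.2 load/store sequence performs the same permutation as the C code in 2.6.1, both can be modelled in Python, with memory as a list of words and the byte offsets divided by 4. The sample array contents are arbitrary.

```python
def c_version(a):
    """2.6.1: the C statements applied to a copy of the list."""
    a = list(a)
    temp = a[0]
    temp2 = a[1]
    a[0] = a[4]
    a[1] = temp
    a[4] = a[3]
    a[3] = temp2
    return a

def mips_version(a):
    """2.6.2: the lw/sw sequence; byte offset // 4 gives the word index."""
    mem = list(a)
    t0 = mem[0 // 4]    # lw $t0, 0($s6)
    t1 = mem[4 // 4]    # lw $t1, 4($s6)
    t2 = mem[16 // 4]   # lw $t2, 16($s6)
    mem[0 // 4] = t2    # sw $t2, 0($s6)
    mem[4 // 4] = t0    # sw $t0, 4($s6)
    t0 = mem[12 // 4]   # lw $t0, 12($s6)
    mem[16 // 4] = t0   # sw $t0, 16($s6)
    mem[12 // 4] = t1   # sw $t1, 12($s6)
    return mem

sample = [10, 11, 12, 13, 14]
print(c_version(sample), mips_version(sample))   # both [14, 10, 12, 11, 13]
```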

2.7
Little-Endian          Big-Endian
Address   Data         Address   Data
12        ab           12        12
8         cd           8         ef
4         ef           4         cd
0         12           0         ab
2.8 2882400018
2.9 sll $t0, $s1, 2      # $t0 <-- 4*g
    add $t0, $t0, $s7    # $t0 <-- Addr(B[g])
    lw $t0, 0($t0)       # $t0 <-- B[g]
    addi $t0, $t0, 1     # $t0 <-- B[g]+1
    sll $t0, $t0, 2      # $t0 <-- 4*(B[g]+1)
    add $t0, $t0, $s6    # $t0 <-- Addr(A[B[g]+1])
    lw $s0, 0($t0)       # f <-- A[B[g]+1]
2.10 f = 2*(&A);
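Python's standard struct module can confirm the byte orders in the 2.7 table and the decimal value in 2.8: the format code "<I" packs a 32-bit unsigned integer little-endian and ">I" packs it big-endian, so the first byte of each result is the one stored at the lowest address.

```python
import struct

value = 0xABCDEF12
little = struct.pack("<I", value)   # lowest address holds the least significant byte
big = struct.pack(">I", value)      # lowest address holds the most significant byte

print([f"{b:02x}" for b in little])  # ['12', 'ef', 'cd', 'ab']
print([f"{b:02x}" for b in big])     # ['ab', 'cd', 'ef', '12']
print(value)                         # 2882400018
```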

2.11
                    type     opcode   rs   rt   rd   immed
addi $t0, $s6, 4    I-type   8        22   8         4
add $t1, $s6, $0    R-type   0        22   0    9
sw $t1, 0($t0)      I-type   43       8    9         0
lw $t0, 0($t0)      I-type   35       8    8         0
add $s0, $t1, $t0   R-type   0        9    8    16
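The fields in the 2.11 table can be packed into 32-bit machine words with shifts and ORs. This Python sketch assumes the standard MIPS layouts (R-format: opcode 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits; I-format: opcode 6, rs 5, rt 5, immediate 16 bits) and funct = 0x20 for add; the helper names are hypothetical.

```python
def encode_r(rs: int, rt: int, rd: int, shamt: int = 0, funct: int = 0x20, opcode: int = 0) -> int:
    """R-format word; funct 0x20 selects add."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """I-format word; the immediate is stored as 16-bit two's complement."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

program = [
    ("addi $t0, $s6, 4", encode_i(8, 22, 8, 4)),
    ("add $t1, $s6, $0", encode_r(22, 0, 9)),
    ("sw $t1, 0($t0)", encode_i(43, 8, 9, 0)),
    ("lw $t0, 0($t0)", encode_i(35, 8, 8, 0)),
    ("add $s0, $t1, $t0", encode_r(9, 8, 16)),
]
for text, word in program:
    print(f"{word:08x}  {text}")
```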

2.12.1 50000000
2.12.2 overflow
2.12.3 B
2.12.4 no overflow
2.12.5 D
2.12.6 overflow

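2.13.1 $128 + x > 2^{31} - 1 \Rightarrow x > 2^{31} - 129$, and $128 + x < -2^{31} \Rightarrow x < -2^{31} - 128$ (impossible)
2.13.2 $128 - x > 2^{31} - 1 \Rightarrow x < -2^{31} + 129$, and $128 - x < -2^{31} \Rightarrow x > 2^{31} + 128$ (impossible)
2.13.3 $x - 128 < -2^{31} \Rightarrow x < -2^{31} + 128$, and $x - 128 > 2^{31} - 1 \Rightarrow x > 2^{31} + 127$ (impossible)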
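A Python sketch of the signed 32-bit overflow rule behind 2.12 and 2.13: compute the exact result, then check whether it fits in $[-2^{31}, 2^{31}-1]$. The 2.12 register values $s0 = 0x80000000 and $s1 = 0xD0000000 are an assumption taken from the exercise statement, which is not reproduced here; the 2.13 checks probe the boundary values derived above with $s0 = 128. The helper names are hypothetical.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a two's-complement integer."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x & 0x80000000 else x

def add32(a: int, b: int):
    """Signed add: (32-bit result pattern, overflowed?)."""
    exact = to_signed32(a) + to_signed32(b)
    return exact & 0xFFFFFFFF, not (INT32_MIN <= exact <= INT32_MAX)

def sub32(a: int, b: int):
    """Signed subtract: (32-bit result pattern, overflowed?)."""
    exact = to_signed32(a) - to_signed32(b)
    return exact & 0xFFFFFFFF, not (INT32_MIN <= exact <= INT32_MAX)

# 2.12 (ASSUMED operands from the exercise): add and sub of $s0 and $s1
for op in (add32, sub32):
    result, overflow = op(0x80000000, 0xD0000000)
    print(op.__name__, hex(result), overflow)

# 2.13.1 boundary with $s0 = 128: the first x that overflows is 2^31 - 128
print(add32(128, 2**31 - 129)[1], add32(128, 2**31 - 128)[1])   # False True
```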

2.25.1 I-type
2.25.2 addi $t2, $t2, -1
       beq $t2, $0, loop

2.26.1 20
2.26.2 i = 10;
       do {
           B += 2;
           i = i - 1;
       } while (i > 0);
2.26.3 5*N
2.27          addi $t0, $0, 0
              beq $0, $0, TEST1
LOOP1:        addi $t1, $0, 0
              beq $0, $0, TEST2
LOOP2:        add $t3, $t0, $t1
              sll $t2, $t1, 4
              add $t2, $t2, $s2
              sw $t3, ($t2)
              addi $t1, $t1, 1
TEST2:        slt $t2, $t1, $s1
              bne $t2, $0, LOOP2
              addi $t0, $t0, 1
TEST1:        slt $t2, $t0, $s0
              bne $t2, $0, LOOP1
2.28 14 instructions to implement and 158 instructions executed
2.29 for (i = 0; i < 100; i++) {
         result += MemArray[s0];
         s0 = s0 + 4;
     }
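The 2.26.1 value can be confirmed by running the 2.26.2 do-while loop in Python. Python has no do-while, so the body runs once before the test via `while True` plus a `break`. B is assumed to start at 0, which is what the answer of 20 implies.

```python
def run_loop(b: int = 0, i: int = 10):
    """2.26.2 do-while loop; B is ASSUMED to start at 0. Returns (B, iterations)."""
    iterations = 0
    while True:          # do {
        b += 2           #     B += 2;
        i -= 1           #     i = i - 1;
        iterations += 1
        if not i > 0:    # } while (i > 0);
            break
    return b, iterations

print(run_loop())  # (20, 10)
```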

2.30          addi $t1, $s0, 400
LOOP:         addi $t1, $t1, -4
              lw $s1, 0($t1)
              add $s2, $s2, $s1
              bne $t1, $s0, LOOP

2.31
fib:    addi $sp, $sp, -12      # make room on stack
        sw $ra, 8($sp)          # push $ra
        sw $s0, 4($sp)          # push $s0
        sw $a0, 0($sp)          # push $a0 (N)
        bgt $a0, $0, test2      # if n>0, test if n=1
        add $v0, $0, $0         # else fib(0) = 0
        j rtn
test2:  addi $t0, $0, 1
        bne $t0, $a0, gen       # if n>1, gen
        add $v0, $0, $t0        # else fib(1) = 1
        j rtn
gen:    addi $a0, $a0, -1       # n-1
        jal fib                 # call fib(n-1)
        add $s0, $v0, $0        # copy fib(n-1)
        addi $a0, $a0, -1       # n-2
        jal fib                 # call fib(n-2)
        add $v0, $v0, $s0       # fib(n-1)+fib(n-2)
rtn:    lw $a0, 0($sp)          # pop $a0
        lw $s0, 4($sp)          # pop $s0
        lw $ra, 8($sp)          # pop $ra
        addi $sp, $sp, 12       # restore sp
        jr $ra

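fib(0) = 12 instructions, fib(1) = 14 instructions.
For N ≥ 2, fib(N) executes 18 instructions in its own activation plus those executed by its two recursive calls: $I(N) = I(N-1) + I(N-2) + 18$, where $I(N)$ is the number of instructions executed by fib(N).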
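A Python model of the 2.31 routine that returns both fib(n) and the number of MIPS instructions executed, path by path. bgt is a pseudo-instruction that assemblers typically expand into two real instructions (slt and bne), but it is counted once here, as in the 12- and 14-instruction figures above; jal, j, and jr each count as one instruction and branch delay slots are ignored.

```python
def fib_mips_model(n: int):
    """Return (fib(n), instructions executed) for the 2.31 MIPS routine."""
    if n <= 0:
        # prologue (4) + bgt + add + j + epilogue (5)
        return 0, 12
    if n == 1:
        # prologue (4) + bgt + addi + bne + add + j + epilogue (5)
        return 1, 14
    # prologue (4) + bgt + addi + bne
    # + gen block (addi, jal, add, addi, jal, add) + epilogue (5) = 18,
    # plus everything executed inside the two recursive calls
    f1, c1 = fib_mips_model(n - 1)
    f2, c2 = fib_mips_model(n - 2)
    return f1 + f2, 18 + c1 + c2

print([fib_mips_model(n) for n in range(6)])
```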

2.32 Due to the recursive nature of the code, it is not possible for the compiler to in-line the function call.

2.33 After calling function fib:
old $sp -> 0x7ffffffc   ???
           -4           contents of register $ra for fib(N)
           -8           contents of register $s0 for fib(N)
$sp ->     -12          contents of register $a0 for fib(N)
There will be N-1 copies of $ra, $s0, and $a0.

DONE:   add $v0, $s0, $
        lw $ra, ($sp)
        addi $sp, $sp, 4
        jr $ra

2.38 0x

2.39 Generally, all solutions are similar:

lui $t1, top_16_bits ori $t1, $t1, bottom_16_bits
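The lui/ori pair can be modelled in Python to show why it rebuilds any 32-bit constant: lui places the upper half and clears the lower half, and ori zero-extends its 16-bit immediate (unlike addi, which sign-extends), so the lower half is filled in without disturbing the upper half. The constants used here are arbitrary examples and the helper names are hypothetical.

```python
def lui_ori_fields(constant: int):
    """Split a 32-bit constant into the lui and ori immediates."""
    constant &= 0xFFFFFFFF
    top_16_bits = constant >> 16
    bottom_16_bits = constant & 0xFFFF
    return top_16_bits, bottom_16_bits

def execute(top_16_bits: int, bottom_16_bits: int) -> int:
    """Model of: lui $t1, top_16_bits ; ori $t1, $t1, bottom_16_bits."""
    t1 = (top_16_bits << 16) & 0xFFFFFFFF   # lui: upper half set, lower half zero
    t1 |= bottom_16_bits & 0xFFFF           # ori: zero-extended immediate
    return t1

hi, lo = lui_ori_fields(0x12345678)
print(hex(hi), hex(lo), hex(execute(hi, lo)))   # 0x1234 0x5678 0x12345678
```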

2.40 No, jump can go up to 0x0FFFFFFC.

2.41 No, range is 0x604 + 0x1FFFC = 0x0002 0600 to 0x604 - 0x20000 = 0xFFFE 0604.

2.42 Yes, range is 0x1FFFF004 + 0x1FFFC = 0x2001F000 to 0x1FFFF004 - 0x20000 = 0x1FFDF004.
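The ranges in 2.40 to 2.42 follow from how MIPS forms targets: a branch adds a sign-extended 16-bit word offset (from -0x8000 to 0x7FFF words, so -0x20000 to +0x1FFFC bytes) to PC + 4, and j keeps the top 4 bits of PC + 4 and supplies the remaining 28 bits. A Python sketch; the PC values 0x600 and 0x1FFFF000 are inferred from the PC + 4 values in the answers, and PC = 0 for 2.40 is an assumption.

```python
def branch_range(pc: int):
    """(highest, lowest) beq/bne target reachable from the branch at pc."""
    base = pc + 4
    hi = (base + 0x7FFF * 4) & 0xFFFFFFFF    # largest positive offset, +0x1FFFC bytes
    lo = (base - 0x8000 * 4) & 0xFFFFFFFF    # most negative offset, -0x20000 bytes
    return hi, lo

def jump_region(pc: int):
    """(lowest, highest) j target: top 4 bits of PC+4, word-aligned low 28 bits."""
    top = (pc + 4) & 0xF0000000
    return top, top | 0x0FFFFFFC

for pc in (0x600, 0x1FFFF000):               # PCs inferred for 2.41 and 2.42
    hi, lo = branch_range(pc)
    print(hex(pc), hex(hi), hex(lo))
print([hex(x) for x in jump_region(0x0)])    # 2.40, ASSUMING PC = 0
```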

2.43
trylk:  li $t1,1
        ll $t0,0($a0)
        bnez $t0,trylk
        sc $t1,0($a0)
        beqz $t1,trylk
        lw $t2,0($a1)
        slt $t3,$t2,$a2
        bnez $t3,skip
        sw $a2,0($a1)
skip:   sw $0,0($a0)
2.44
try:    ll $t0,0($a1)
        slt $t1,$t0,$a2
        bnez $t1,skip
        move $t0,$a2
        sc $t0,0($a1)
        beqz $t0,try
skip:

2.45 It is possible for one or both processors to complete this code without ever reaching the SC instruction. If only one executes SC, it completes successfully. If both reach SC, they do so in the same cycle, but one SC completes first and then the other detects this and fails.

2.46.1 The answer is no in all cases: the change slows down the computer.
CCT = clock cycle time
ICa = instruction count (arithmetic)
ICls = instruction count (load/store)
ICb = instruction count (branch)
$\text{new CPU time} = 0.75 \times \text{old ICa} \times \text{CPIa} \times 1.1 \times \text{old CCT} + \text{old ICls} \times \text{CPIls} \times 1.1 \times \text{old CCT} + \text{old ICb} \times \text{CPIb} \times 1.1 \times \text{old CCT}$
The longer clock cycle adds enough to the new CPU time that it is never shorter than the old execution time.
2.46.2 107.04%, 113.43%
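A Python sketch of 2.46. The instruction counts (500, 300, and 100 million arithmetic, load/store, and branch instructions) and CPIs (1, 10, 3) are assumptions taken from the exercise statement, which is not reproduced above; with them, the sketch reproduces the 107.04% and 113.43% figures.

```python
# ASSUMPTION: counts and CPIs from the exercise statement (not shown above)
IC = {"arith": 500e6, "ls": 300e6, "branch": 100e6}
CPI = {"arith": 1.0, "ls": 10.0, "branch": 3.0}

def cpu_time(ic: dict, cpi: dict, cct: float = 1.0) -> float:
    """CPU time measured in units of the old clock cycle time."""
    return sum(ic[k] * cpi[k] for k in ic) * cct

old = cpu_time(IC, CPI)

# 2.46.1: 25% fewer arithmetic instructions, but a 10% longer clock cycle
fewer = dict(IC, arith=0.75 * IC["arith"])
print(cpu_time(fewer, CPI, 1.1) > old)   # True: the machine gets slower

# 2.46.2: arithmetic CPI halved, then cut to one tenth
for factor in (2, 10):
    faster = dict(CPI, arith=CPI["arith"] / factor)
    print(f"{old / cpu_time(IC, faster):.2%}")
```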

2.47.1 2. 2.47.2 0. 2.47.3 0.

Chapter 3 Solutions

The attraction is that each hex digit contains one of 16 different characters (0–9, A–F). Since with 4 binary bits you can represent 16 different patterns, in hex each digit requires exactly 4 binary bits. And bytes are by definition 8 bits long, so two hex digits are all that are required to represent the contents of 1 byte.
3.4 753
3.5 7777 (-3777)
3.6 Neither (63)
3.7 Neither (65)
3.8 Overflow (result -179, which does not fit into an SM 8-bit format)
3.9 -105 - 42 = -128 (-147)
3.10 -105 + 42 = -63
3.11 151 + 214 = 255 (365)
3.12 62 × 12 (octal)
Step  Action            Multiplier   Multiplicand        Product
0     Initial Vals      001 010      000 000 110 010     000 000 000 000
      lsb=0, no op      001 010      000 000 110 010     000 000 000 000
1     Lshift Mcand      001 010      000 001 100 100     000 000 000 000
      Rshift Mplier     000 101      000 001 100 100     000 000 000 000
      Prod=Prod+Mcand   000 101      000 001 100 100     000 001 100 100
2     Lshift Mcand      000 101      000 011 001 000     000 001 100 100
      Rshift Mplier     000 010      000 011 001 000     000 001 100 100
      lsb=0, no op      000 010      000 011 001 000     000 001 100 100
3     Lshift Mcand      000 010      000 110 010 000     000 001 100 100
      Rshift Mplier     000 001      000 110 010 000     000 001 100 100
      Prod=Prod+Mcand   000 001      000 110 010 000     000 111 110 100
4     Lshift Mcand      000 001      001 100 100 000     000 111 110 100
      Rshift Mplier     000 000      001 100 100 000     000 111 110 100
      lsb=0, no op      000 000      001 100 100 000     000 111 110 100
5     Lshift Mcand      000 000      011 001 000 000     000 111 110 100
      Rshift Mplier     000 000      011 001 000 000     000 111 110 100
      lsb=0, no op      000 000      110 010 000 000     000 111 110 100
6     Lshift Mcand      000 000      110 010 000 000     000 111 110 100
      Rshift Mplier     000 000      110 010 000 000     000 111 110 100
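Several of the Chapter 3 answers can be reproduced in Python. The 8-bit operands 185 and 122 for 3.7 and 3.8 are an assumption taken from the exercise statement (only the results appear above); 151 and 214 appear in 3.11, and 0o62 × 0o12 is the 3.12 product. The multiplier is a plain shift-and-add loop (test the multiplier lsb, conditionally add, shift both operands), which reaches the same final product 000 111 110 100 as the table; the helper names are hypothetical.

```python
def to_signed8(x: int) -> int:
    """Interpret an 8-bit pattern as two's complement."""
    return x - 256 if x & 0x80 else x

def sign_magnitude8(x: int) -> int:
    """Decode an 8-bit sign-magnitude pattern."""
    return -(x & 0x7F) if x & 0x80 else x & 0x7F

def saturate(value: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, value))

def shift_add_multiply(multiplicand: int, multiplier: int, bits: int = 6) -> int:
    """Unsigned shift-and-add multiply over `bits` multiplier bits."""
    product = 0
    for _ in range(bits):
        if multiplier & 1:
            product += multiplicand
        multiplicand <<= 1
        multiplier >>= 1
    return product

print(oct(0o4365 - 0o3412))                                     # 3.4: 0o753
print(sign_magnitude8(185) + sign_magnitude8(122))              # 3.7 (ASSUMED operands)
print(sign_magnitude8(185) - sign_magnitude8(122))              # 3.8: below -127
print(saturate(to_signed8(151) + to_signed8(214), -128, 127))   # 3.9
print(saturate(to_signed8(151) - to_signed8(214), -128, 127))   # 3.10
print(saturate(151 + 214, 0, 255))                              # 3.11
print(oct(shift_add_multiply(0o62, 0o12)))                      # 3.12: 0o764
```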
