2005 CS61C Final Exam Answers and Analysis, Exams of Data Structures and Algorithms

Answers and explanations for the 2005 CS61C final exam, covering topics such as number representations, memory management, MIPS assembly, datapath, cache and virtual memory, and potpourri. It includes detailed calculations, code snippets, and diagrams to help understand the concepts and problem-solving strategies.

Typology: Exams

2012/2013

Uploaded on 04/02/2013

shaje_69kinky
2005Fa CS61C Final Exam Answers
[not to leave 385 Soda]
M1:Numbers
a) Overall bit patterns? 2^32 = 4,294,967,296 (the exact # is not required; roughly a bit more than
4,000,000,000)
How many encode a valid BCD? 8 decimal digits, so 10^8 = 100,000,000
Ratio is 2^32/10^8 = 42.94967296 ≈ 40 (to one significant figure).
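The BCD count can be checked with a small sketch. The helper names `is_valid_bcd` and `count_valid_bcd` are made up for illustration; the only assumption is that a 32-bit word holds eight 4-bit BCD digits.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if every 4-bit nibble of the word is a decimal digit (0-9). */
int is_valid_bcd(uint32_t word) {
    for (int nibble = 0; nibble < 8; nibble++) {
        if (((word >> (4 * nibble)) & 0xFu) > 9u) return 0;
    }
    return 1;
}

/* Number of valid BCD encodings with the given digit count: 10^digits. */
uint64_t count_valid_bcd(int digits) {
    assert(digits >= 0 && digits <= 19);  /* 10^19 still fits in 64 bits */
    uint64_t count = 1;
    for (int i = 0; i < digits; i++) count *= 10u;
    return count;
}
```

Dividing 2^32 by `count_valid_bcd(8)` with integer division gives 42, which rounds to 40 at one significant figure.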
b) Each pixel is independent, and there are 4x8=32=2^5 of them, so it’s 4^32 = (2^2)^32 = 2^64
= 16 exbi images.
c) Comparing floats using signed int compare, huh? The relative ordering of all positive
numbers is the same (increasing from 0 to max_positive) for both encodings, so comparing two
positive floats with signed compare works. Also, for both encodings the bit patterns for
negative numbers all start with a leading 1 (0x80000000 through 0xFFFFFFFF) so comparing a
negative float with a positive float using signed int compare will also yield the correct
answer. However, when comparing two negative floats, the sign-magnitude nature of floats
means that as we increase the bit patterns from (0x80000000 through 0xFFFFFFFF) floats
move from 0 toward -∞, but signed ints move the other way from -∞ (-2^31, really) toward 0.
Thus, we will get an incorrect answer when comparing two different negative numbers.
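A quick way to see this is to reinterpret the float bits as a signed int and compare. This is a minimal sketch assuming `float` is a 32-bit IEEE 754 single; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the raw bit pattern of a float into a signed 32-bit integer. */
int32_t float_bits(float f) {
    int32_t bits;
    assert(sizeof bits == sizeof f);  /* assumes a 32-bit float */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* "a < b" decided purely by signed integer comparison of the bit patterns. */
int less_by_signed_int_compare(float a, float b) {
    return float_bits(a) < float_bits(b);
}
```

With this, `less_by_signed_int_compare(1.0f, 2.0f)` and `less_by_signed_int_compare(-1.0f, 2.0f)` agree with the real ordering, but `less_by_signed_int_compare(-2.0f, -1.0f)` returns 0 even though -2 < -1.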
d) Put the corresponding letters for each 32-bit value in order from least to greatest:
A. 0xF0000000 (IEEE float) = - huge
B. 0xF0000000 (2's complement) = -2^31 + 2^30 + 2^29 + 2^28
C. 0xF0000000 (sign-magnitude) = -(2^31 – 2^28) = -2^31 + 2^28
D. 0xFFFFFFFF (2's complement) = -1
E. 0xFFFFFFFF (1's complement) = -0
F. 0xF1000000 (IEEE float) = - huger
G. 0x70000000 (IEEE float) = + huge
H. 0x7FFFFFFF (2's complement) = 2^31 - 1
I. 0x80000010 (IEEE float) = - small denorm (value doesn’t matter)
f, a, c, b, d, i, e, h, g
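The ordering can be double-checked by decoding each pattern into a double. This is a sketch with hypothetical decoder names, assuming `float` is a 32-bit IEEE 754 single; negative zero comes out as -0.0, which still sorts above the negative denorm.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two's complement: negate by inverting and adding one. */
double as_twos_complement(uint32_t bits) {
    if (bits & 0x80000000u) return -(double)(uint32_t)(~bits + 1u);
    return (double)bits;
}

/* Sign-magnitude: top bit is the sign, low 31 bits are the magnitude. */
double as_sign_magnitude(uint32_t bits) {
    double magnitude = (double)(bits & 0x7FFFFFFFu);
    return (bits & 0x80000000u) ? -magnitude : magnitude;
}

/* One's complement: negate by inverting every bit. */
double as_ones_complement(uint32_t bits) {
    if (bits & 0x80000000u) return -(double)(uint32_t)(~bits);
    return (double)bits;
}

/* IEEE float: reinterpret the bits, then widen to double (exact). */
double as_ieee_float(uint32_t bits) {
    float f;
    assert(sizeof f == sizeof bits);  /* assumes a 32-bit float */
    memcpy(&f, &bits, sizeof f);
    return (double)f;
}
```

Decoding A through I with the matching function and comparing neighbours reproduces f, a, c, b, d, i, e, h, g.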

M2:C

a)    static            stack                 heap

1:    4+16+4+4=28 B     0                     0

3:    0                 28*2 + 2*4 = 64 B     0

4:    0                 0                     280 B

b) Two solutions…Longest (with the best style)

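    #include <stdio.h>   /* printf */
    #include <stdlib.h>  /* free, exit */

    int Delete(slicenode_t *plan) {
        if (plan->type == RECTANGLE) {            /* leaf */
            free(plan);
            return 1;
        } else if (plan->type == CUT) {           /* inner node */
            slicenode_t *L, *R;
            L = plan->L; R = plan->R;             /* save children before freeing the parent */
            free(plan);
            return 1 + Delete(L) + Delete(R);
        } else {                                  /* neither kind of node: report and bail out */
            printf("Delete(): plan->type was %d, expected %d or %d",
                   plan->type, RECTANGLE, CUT);
            exit(1);
        }
    }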

… and longest!

    int Delete(slicenode_t *plan) {
        if (plan->type == RECTANGLE) {            /* leaf */
            free(plan);
            return 1;
        } else {                                  /* inner node */
            slicenode_t *L, *R;
            L = plan->L; R = plan->R;             /* save children before freeing the parent */
            free(plan);
            return 1 + Delete(L) + Delete(R);
        }
    }

… and shortest!

    int Delete(slicenode_t *plan) {
        /* free() returns void, so it cannot sit inside the sum; count the subtrees first, then free */
        int n = (plan->type == RECTANGLE) ? 1 : 1 + Delete(plan->L) + Delete(plan->R);
        free(plan);
        return n;
    }
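To try any of these out, here is a self-contained sketch. The `slicenode_t` layout, the `nodetype_t` enum and the `make_node` helper are assumptions made for illustration (the exam's own declarations are not reproduced in these answers); only the fields `type`, `L` and `R` come from the solutions above.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { RECTANGLE, CUT } nodetype_t;   /* assumed encoding */

typedef struct slicenode {
    nodetype_t type;
    struct slicenode *L, *R;                  /* children, used only by CUT nodes */
} slicenode_t;

/* Hypothetical helper: allocates one node on the heap. */
slicenode_t *make_node(nodetype_t type, slicenode_t *L, slicenode_t *R) {
    slicenode_t *n = malloc(sizeof *n);
    assert(n != NULL);
    n->type = type;
    n->L = L;
    n->R = R;
    return n;
}

/* Frees the whole slicing tree and returns the number of nodes freed. */
int Delete(slicenode_t *plan) {
    if (plan->type == RECTANGLE) {            /* leaf */
        free(plan);
        return 1;
    }
    slicenode_t *L = plan->L, *R = plan->R;   /* save children before freeing */
    free(plan);
    return 1 + Delete(L) + Delete(R);
}
```

A plan with two cuts and three rectangles has five nodes, so `Delete` on its root returns 5.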

F1:Datapath

srjr $ra, $sp, 16

a) R[rt] = R[rt] + (ZeroExt(Imm) << 2); PC = R[rs]

b) 256 kibi (16 unsigned bits, max 0xFFFF, of words = 18 unsigned bits of bytes)
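The RTL in (a) and the bound in (b) can be sketched as a tiny simulator. The `cpu_t` struct and function names are hypothetical, and `imm` here is the raw 16-bit field from the instruction word; the register numbers 31 ($ra) and 29 ($sp) follow the usual MIPS convention.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t R[32];   /* register file */
    uint32_t PC;      /* program counter */
} cpu_t;

/* srjr: R[rt] = R[rt] + (ZeroExt(Imm) << 2); PC = R[rs] */
void srjr(cpu_t *cpu, unsigned rs, unsigned rt, uint16_t imm) {
    assert(rs < 32 && rt < 32);
    /* Read both sources first so the two updates behave as if simultaneous. */
    uint32_t target = cpu->R[rs];
    uint32_t adjusted = cpu->R[rt] + ((uint32_t)imm << 2);
    cpu->R[rt] = adjusted;
    cpu->PC = target;
}

/* Largest byte adjustment a single srjr can apply. */
uint32_t srjr_max_adjust_bytes(void) {
    return (uint32_t)0xFFFFu << 2;
}
```

The maximum comes out to 262,140 bytes, i.e. just under 2^18 B = 256 KiB.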

c)

i. Add mux so Ra input is sometimes Rs, sometimes Rt, call the control signal RegSrc

ii. Modify Extender so that it can do a “ZeroShiftExtend”, widen ExtOp control line

d)

  • RegDst=rt (0)
  • RegWr= 1
  • nPC_sel=Jump
  • ExtOp=ZeroShiftExtend
  • ALUSrc=Extender (1)
  • ALUctr=ADD
  • MemWr= 0
  • MemtoReg=ALU (0)
  • [NEW]RegSrc=Rt

F2:Cache/VM

a) With 8-byte blocks (3 bits for offset) and a fully associative cache (0 bits for index), and a MIPS

machine (32-bit addresses), we have T:I:O = 29:0:3.

b) The cache size, or “area” is the “height” (128 = 2^7 blocks) times the “width” (8 B/block = 2^3 B/block), which is 2^10 bytes i.e., 1 KibiByte.
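The tag/index/offset split and the cache size follow mechanically from the parameters. A minimal sketch, with hypothetical names, where a fully associative cache is modelled as having a single set:

```c
#include <assert.h>
#include <stdint.h>

/* lg of an exact power of two. */
unsigned lg_exact(uint32_t x) {
    unsigned n = 0;
    assert(x != 0 && (x & (x - 1u)) == 0);
    while (x > 1u) { x >>= 1; n++; }
    return n;
}

typedef struct { unsigned tag, index, offset; } tio_t;

/* Splits an address into tag:index:offset bit counts. */
tio_t tio_bits(unsigned address_bits, uint32_t block_bytes, uint32_t num_sets) {
    tio_t t;
    t.offset = lg_exact(block_bytes);
    t.index = lg_exact(num_sets);      /* 1 set (fully associative) gives 0 bits */
    assert(t.offset + t.index <= address_bits);
    t.tag = address_bits - t.index - t.offset;
    return t;
}

/* "Height" times "width". */
uint32_t cache_bytes(uint32_t num_blocks, uint32_t block_bytes) {
    return num_blocks * block_bytes;
}
```

`tio_bits(32, 8, 1)` gives 29:0:3 and `cache_bytes(128, 8)` gives 1024.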

c) To minimize cache misses, we should never stride so far that the initial sum loop couldn’t fit

entirely into the 128-entry cache. So the farthest we could stride is the entire size of the cache,

or 1 KiB. Any stride smaller than that will also have the property that the sum loop has a miss

for each block but the product loop has all hits.

d) Each block loaded by sum will be a miss, but product’s requests will be all hits. If the stride is 1 KibiB, that’s 2^7 misses for each of the outer loop iterations, and there are 4 MiB / 1 KiB (= 4 Ki) = 2^12 iterations. So that’s 2^7 misses/iteration * 2^12 iterations = 2^19 misses = 512 KiMisses.

e) If we are not page-aligned, what will happen is that the last page request for sum will kick the

first page out. As a result, the first page request for product won’t be there, and we’ll be

charged a page miss! The problem propagates, unfortunately. That is, the first product page we

just loaded will kick the second sum page out (since it’s the last to be accessed) so when the

second product page comes by it will also have a miss! So basically you have no cache savings

at all! However, there’s a small detail. When we shift our machinery down another stride and

increment i (which we do a total of 2^12 times), the last block loaded by the product loop will

be the first one requested by the sum loop! (But that’s a really small detail) So the answer is

simple – with block alignment every sum is a miss and every product is a hit. Without block

alignment every sum is a miss AND every product is a miss. Thus we double our misses.
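Parts (d) and (e) can be sanity-checked with a small cache simulator. Everything about the access pattern here is an assumption, since the exam's loop is not reproduced in these answers: each outer iteration makes a "sum" pass and then a "product" pass over the same stride-sized chunk, accesses are 4-byte words, and replacement is LRU. The cache itself is the 128-block, 8 B/block fully associative one from (a) and (b).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 128u   /* blocks in the fully associative cache */
#define BLOCK_BYTES 8u    /* bytes per block */
#define WORD_BYTES 4u     /* assumed access granularity */

typedef struct {
    uint32_t block_addr[NUM_BLOCKS];
    uint64_t last_used[NUM_BLOCKS];
    int valid[NUM_BLOCKS];
    uint64_t now;      /* logical time, bumped on every access */
    uint64_t misses;
} cache_t;

/* One access with LRU replacement (assumed policy). */
static void cache_access(cache_t *c, uint32_t addr) {
    uint32_t block = addr / BLOCK_BYTES;
    unsigned victim = 0;
    c->now++;
    for (unsigned i = 0; i < NUM_BLOCKS; i++) {
        if (c->valid[i] && c->block_addr[i] == block) {
            c->last_used[i] = c->now;   /* hit: refresh recency */
            return;
        }
    }
    c->misses++;
    for (unsigned i = 0; i < NUM_BLOCKS; i++) {
        if (!c->valid[i]) { victim = i; break; }            /* prefer an empty slot */
        if (c->last_used[i] < c->last_used[victim]) victim = i;
    }
    c->valid[victim] = 1;
    c->block_addr[victim] = block;
    c->last_used[victim] = c->now;
}

/* Misses for the assumed sum-then-product pattern over the whole array. */
uint64_t count_misses(uint32_t base, uint32_t array_bytes, uint32_t stride_bytes) {
    cache_t c;
    assert(stride_bytes != 0 && stride_bytes % WORD_BYTES == 0);
    memset(&c, 0, sizeof c);
    for (uint32_t start = 0; start < array_bytes; start += stride_bytes) {
        for (int pass = 0; pass < 2; pass++) {   /* pass 0: sum, pass 1: product */
            for (uint32_t off = 0; off < stride_bytes; off += WORD_BYTES) {
                cache_access(&c, base + start + off);
            }
        }
    }
    return c.misses;
}
```

On a 64 KiB array with a 1 KiB stride, the block-aligned run (`base` = 0) takes 2^7 misses per iteration, and shifting `base` by 4 bytes at least doubles the count. Scaling `array_bytes` to 4 MiB gives the 2^19 figure from (d), at the cost of a longer run.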

f) Sure, Virtual and Physical address widths are independent. The VA width controls how much (virtual)

memory our program thinks we have (here 32-bits of it) and PA controls how much resident

memory we have, a completely independent quantity.

g) Sure, because we could have more resident pages (for the OS, other processes [either ours or

another user's]) and there’d be less thrashing.

h) The and instruction was the last instruction in the page and the or is the first instruction in the

next page and we just experienced a page fault! Since pages are 4 KiB, the last instruction must

have the LS bits be page size – 4B, so we know the last three nibbles (offset) are 0xFFC. (i.e.,

one more instruction and the offset becomes 0x000 – a new page!)
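The offset arithmetic is a one-line mask. A sketch with hypothetical names, assuming 4-byte instructions and a power-of-two page size:

```c
#include <assert.h>
#include <stdint.h>

/* Low bits of an address that select a byte within its page. */
uint32_t page_offset(uint32_t addr, uint32_t page_bytes) {
    assert(page_bytes != 0 && (page_bytes & (page_bytes - 1u)) == 0);
    return addr & (page_bytes - 1u);
}

/* True if the instruction after pc is the first one of a new page. */
int next_instruction_starts_page(uint32_t pc, uint32_t page_bytes) {
    return page_offset(pc + 4u, page_bytes) == 0;
}
```

For 4 KiB pages, any address ending in 0xFFC is the last instruction slot of its page.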

F4:SDS

F4a) From S00 we have two transition possibilities, I=0 and I=1. I’ve felt it useful to think about

the past values I(t-2), I(t-1) and I(t) to figure out where to go. This is a simple box (a shift

register) that keeps the last two values in the state variables Sx and Sy. Every step we output ~Sx +

~Sy + ~I = ~P1 + ~P0 + ~I. Also every step N1=P0, N0=I. We don’t even need a truth table to

know this – it’s part of the definition of Sxy.

P1P0 I → O1O0 N1N0 (Input/Output label for edge) [#ZI(ABC) = NumberOfZerosIn(P1,P0,I)]

S00 0 → 11 S00 (0/3) # Had two 0s, another one means we stay here and output #ZI(000)=3

S00 1 → 10 S01 (1/2) # This is our first 1 in a while, register we’ve seen a 1 by setting I(t-1) to 1 (i.e., S01) and output #ZI(001)=2

S01 0 → 10 S10 (0/2) # Saw a 01 before but this 0 means we goto S10 and output #ZI(010)=2

S01 1 → 01 S11 (1/1) # This is the 2nd 1 in a row, go to S11 and output #ZI(011)=1

S10 0 → 10 S00 (0/2) # Saw a 1 2 timesteps ago, nothing since. Goto S00, output #ZI(100)=2

S10 1 → 01 S01 (1/1) # Saw a 1 2 timesteps ago, a 1 now. Goto S01, output #ZI(101)=1

S11 0 → 01 S10 (0/1) # Saw 2 straight 1s, now a 0. Goto S10, output #ZI(110)=1

S11 1 → 00 S11 (1/0) # Everything is coming up 1s! Stay here (in S11), output #ZI(111)=0
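The whole machine is just a 2-bit shift register plus a zero counter, which a few lines of C can mimic. The `state_t` and `fsm_step` names are hypothetical.

```c
#include <assert.h>

typedef struct { unsigned p1, p0; } state_t;   /* state Sxy = (P1, P0) */

/* Consumes one input bit, returns the output (number of zeros among
   P1, P0 and I), and shifts the input into the state: N1 = P0, N0 = I. */
unsigned fsm_step(state_t *s, unsigned in) {
    assert(in <= 1u && s->p1 <= 1u && s->p0 <= 1u);
    unsigned zeros = (1u - s->p1) + (1u - s->p0) + (1u - in);
    s->p1 = s->p0;
    s->p0 = in;
    return zeros;
}
```

Starting in S00 and feeding 0, 1, 1, 1, 0, 0 produces outputs 3, 2, 1, 0, 1, 2 and ends back in S00.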

F4b)

Fully reduced expressions for O1,O0 and N1,N0, huh? Well, some are easier than others. We’ll do the

easier ones first. Looking at the truth table (not doing the mindless sum-of-products calculation), we

see:

N0=I

N1=P0

Which we already knew from part (a)! There are no names for these circuits. Let’s now look at O1

and O0. If we’re extremely clever, we remember the two bit patterns for an adder’s two output bits:

O1 is a minority circuit and O0 is a 3-input xnor. Let’s see if we can figure that out even if we don’t

remember these facts. Let’s study the truth table and look at the negative spaces (the times when the

output is zero). We see when P1 is 0 O0 looks like xnor(P0,I) = ~(P0 ⊕ I). When P1 is 1 O0 looks like

xor(P0,I) = (P0 ⊕ I). That is, P0⊕I is being conditionally inverted by P1, which is what an xor

does! From this, we see that

O0 = ~[P1⊕(P0⊕I)], i.e. the post-negation of two cascaded xors, which is the same as a 3-input

xnor!

O1 is a little harder. We can still study the table and see some patterns. That is, when P1 = 0, O1

looks like nand(P0,I) = ~(P0*I). When P1 = 1, O1 is like a nor(P0,I) = ~(P0+I). This yields

[Figure: state transition diagram over S00, S01, S10, S11 with Input/Output edge labels]

O1 = ~P1*~(P0*I) + P1*~(P0+I)

   = ~P1*(~P0+~I) + P1*(~P0*~I) # DeMorgan’s law

   = ~P1~P0 + ~P1~I + P1~P0~I # distribution

Now it might look like this is minimal, but we can check two ways that it’s not. First, there’s

symmetry to the bit patterns (the expression is true whenever at least two of the three components

P1,P0 or I are false) BUT there’s not symmetry to the expression. Also, we can see that ~P0~I

yields a 1 in O1 independent of P1 from the truth table. We can also do some funky Boolean

algebra…

Recall the following distributive + law-of-1s + identity simplification?

A+AB = A(1+B) = A(1) = A

Well, we can run it backwards. That is, we can start with A and generate A+AB.

We do that here with ~P1~P0:

~P1~P0 = ~P1~P0(1) = ~P1~P0(1+~I) = ~P1~P0 + ~P1~P0~I

So that means our three terms for O1 are now four:

O1 = ~P1~P0 + ~P1~I + P1~P0~I # from above

O1 = ~P1~P0 + ~P1~I + P1~P0~I + ~P1~P0~I # distributive+law-of-1s+identity

O1 = ~P1~P0 + ~P1~I + (P1+~P1)~P0~I # distribution

O1 = ~P1~P0 + ~P1~I + (1)~P0~I # complementarity

O1 = ~P1~P0 + ~P1~I + ~P0~I # identity

O1 = ~(P1P0 + P1I + P0I) # lots more Boolean algebra!

…a NotMajority, or AntiMajority, or Minority circuit!

We could also do this the standard plug-and-chug SoP (sum-of-products) way:

O1 = ~P1~P0~I + ~P1~P0I + ~P1P0~I + P1~P0~I # sum-of-products

O1 = ~P1~P0~I + ~P1~P0I + ~P1~P0~I + ~P1P0~I + ~P1~P0~I + P1~P0~I # rev idempotent, commutativity

O1 = ~P1~P0(~I+I) + ~P1~I(~P0+P0) + ~P0~I(~P1+P1) # commutativity, rev distrib

O1 = ~P1~P0(1) + ~P1~I(1) + ~P0~I(1) # complementarity

O1 = ~P1~P0 + ~P1~I + ~P0~I # identity

O1 = ~(P1P0 + P1I + P0I) # lots more Boolean algebra!

…a NotMajority, or AntiMajority, or Minority circuit!
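If the algebra feels slippery, eight rows are few enough to brute-force. This sketch (function names are hypothetical) checks that the minority form, the unsimplified three-term form, and the 3-input xnor all match the "number of zeros" truth table.

```c
#include <assert.h>

/* Reference: number of zeros among the three bits (0 to 3). */
unsigned zeros_in(unsigned p1, unsigned p0, unsigned i) {
    assert(p1 <= 1u && p0 <= 1u && i <= 1u);
    return (unsigned)(!p1 + !p0 + !i);
}

/* O1 as a minority circuit: ~(P1P0 + P1I + P0I). */
unsigned o1_minority(unsigned p1, unsigned p0, unsigned i) {
    return (unsigned)!((p1 & p0) | (p1 & i) | (p0 & i));
}

/* O1 before the last simplification: ~P1~P0 + ~P1~I + P1~P0~I. */
unsigned o1_three_term(unsigned p1, unsigned p0, unsigned i) {
    return (unsigned)((!p1 & !p0) | (!p1 & !i) | ((int)p1 & !p0 & !i));
}

/* O0 as a 3-input xnor: ~[P1 xor (P0 xor I)]. */
unsigned o0_xnor3(unsigned p1, unsigned p0, unsigned i) {
    return (unsigned)!(p1 ^ (p0 ^ i));
}

/* Returns 1 if every expression agrees with the truth table on all 8 rows. */
int expressions_match_truth_table(void) {
    for (unsigned v = 0; v < 8u; v++) {
        unsigned p1 = (v >> 2) & 1u, p0 = (v >> 1) & 1u, i = v & 1u;
        unsigned zeros = zeros_in(p1, p0, i);
        if (o1_minority(p1, p0, i) != ((zeros >> 1) & 1u)) return 0;
        if (o1_three_term(p1, p0, i) != ((zeros >> 1) & 1u)) return 0;
        if (o0_xnor3(p1, p0, i) != (zeros & 1u)) return 0;
    }
    return 1;
}
```

`expressions_match_truth_table()` returning 1 confirms that O1O0 is the 2-bit count of zeros for every input.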

F5:Potpourri

a) One of 9. The lesson was [debug & test rigorously as if lives depend, expect the unexpected.

Design with failure as a possibility. Add redundancy]

1. Mariner I space probe

2. Soviet gas pipeline

3. Buffer overflow in Unix finger daemon

4. Kerberos Random # generator

5. AT&T network outage

6. Intel Pentium floating pt

7. Ping of death

8. Ariane 5 Flight 501

9. National Cancer Institute

b) SPUR: Security, Privacy, Usability, Reliability

c) What are the constraints on the timing?

To maintain t_setup time constraints (and starting from the rising edge of a clock), we have the usual equation that helps us determine how fast we can run the clock. This is that the signal, from when it leaves the FF, goes through all the gates, until it comes around again, has to arrive on the inputs earlier than t_setup before the next clock rising edge. Thus, we have:

t_clk-to-q + t_inverter + t_or + t_setup < t_clock

... and to maintain t_hold time constraints, which state that the signal cannot get back around and change before (less time) the hold time t_hold has passed, yields the following constraint:

t_clk-to-q + t_inverter + t_or > t_hold

So, isolating for t_inverter in both of these inequalities yields the constraints:

t_hold – (t_clk-to-q + t_or) < t_inverter < t_clock – (t_clk-to-q + t_or + t_setup)
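The two-sided bound is easy to evaluate for concrete delays. In this sketch the struct and function names are hypothetical, and any numbers plugged in are made-up illustrations, not values from the exam.

```c
#include <assert.h>

typedef struct {
    double min_exclusive;   /* t_inverter must be strictly greater than this */
    double max_exclusive;   /* t_inverter must be strictly less than this */
} window_t;

/* Allowed inverter delay window from the hold and setup constraints. */
window_t inverter_delay_window(double t_clk_to_q, double t_or, double t_setup,
                               double t_hold, double t_clock) {
    window_t w;
    assert(t_clock > 0.0);
    w.min_exclusive = t_hold - (t_clk_to_q + t_or);
    w.max_exclusive = t_clock - (t_clk_to_q + t_or + t_setup);
    return w;
}

/* True if an inverter with this delay meets both constraints. */
int inverter_delay_ok(window_t w, double t_inverter) {
    return t_inverter > w.min_exclusive && t_inverter < w.max_exclusive;
}
```

For example, with t_clk-to-q = 2, t_or = 3, t_setup = 1, t_hold = 7 and t_clock = 20 (all in the same time unit), the inverter delay must lie strictly between 2 and 14.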

d) How fast do branches for B need to be? Well, let’s figure out the equations:

CPUtime_A = CPUtime_B [1]

But we know the equation for CPUtime as:

CPUtime = InstructionCount * CPI * ClockTime [2]

So substituting that into [1] gives us

InstructionCount_A * CPI_A * ClockTime_A = InstructionCount_B * CPI_B * ClockTime_B [3]

But since it’s the same program,

InstructionCount_A = InstructionCount_B [4]

Equation [3] now simplifies to:

CPI_A * ClockTime_A = CPI_B * ClockTime_B [5]

And substituting

ClockTime_i = 1/ClockFreq_i [6]

into [5] gives

CPI_A / ClockFreq_A = CPI_B / ClockFreq_B [7]

So solving for CPI_B:

CPI_B = (ClockFreq_B / ClockFreq_A) * CPI_A [8]

CPI_B = (4/2) * CPI_A = 2 * CPI_A

So now we only have to solve for CPI_A and CPI_B from the table:

CPI_A = 2(2/10) + 2(3/10) + 2(5/10) = (4+6+10)/10 = 20/10 = 2 cycles/instruction

CPI_B = 1(2/10) + 1(3/10) + X(5/10) = (2+3+5X)/10 = (5+5X)/10 = 4 cycles/instruction

Solving for X yields

5+5X=40 → 5X=35 → X = 7
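The same solve can be done in integer arithmetic by carrying everything in tenths. This is a sketch with hypothetical names; the class frequencies (2/10, 3/10, 5/10) and cycle counts are the ones used in the equations above, with the branch class last.

```c
#include <assert.h>

#define NUM_CLASSES 3

/* Returns 10 * CPI, given cycles per class and class frequencies in tenths. */
int cpi_times_ten(const int cycles[NUM_CLASSES], const int freq_tenths[NUM_CLASSES]) {
    int total = 0;
    for (int i = 0; i < NUM_CLASSES; i++) total += cycles[i] * freq_tenths[i];
    return total;
}

/* Cycles a branch may take on machine B so that B exactly ties machine A. */
int required_branch_cycles(int clock_freq_a, int clock_freq_b) {
    const int freq_tenths[NUM_CLASSES] = {2, 3, 5};
    const int cycles_a[NUM_CLASSES] = {2, 2, 2};
    const int cycles_b_known[NUM_CLASSES] = {1, 1, 0};   /* branch cost X left out */
    assert(clock_freq_a > 0 && clock_freq_b > 0);
    int cpi_a_x10 = cpi_times_ten(cycles_a, freq_tenths);
    int target_b_x10 = cpi_a_x10 * clock_freq_b / clock_freq_a;   /* equation [8] */
    int known_b_x10 = cpi_times_ten(cycles_b_known, freq_tenths);
    int remaining = target_b_x10 - known_b_x10;
    assert(remaining % freq_tenths[2] == 0);   /* X comes out as a whole number */
    return remaining / freq_tenths[2];
}
```

With clock frequencies in the ratio 2 to 4, `required_branch_cycles(2, 4)` returns 7.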

e) lg(16 exbi) = lg(2^64) = 64 bits total. lg(12 10 x 10) = lg(2^14) = 14 MSBs. lg(200,000,000) = lg(2^28) = 28 LSBs. Therefore we have 64-14-28=50-28=22 bits left, which can encode 4 mebithings.
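The bit budgeting is a ceiling-of-lg calculation. A minimal sketch with hypothetical names; the 14-bit field width is taken as given from the answer above.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bits that can distinguish n different things. */
unsigned ceil_lg(uint64_t n) {
    unsigned bits = 0;
    uint64_t capacity = 1;
    assert(n >= 1 && n <= (1ULL << 63));
    while (capacity < n) { capacity <<= 1; bits++; }
    return bits;
}

/* Bits left over after carving out the MSB and LSB fields. */
unsigned leftover_bits(unsigned total, unsigned msbs, unsigned lsbs) {
    assert(msbs + lsbs <= total);
    return total - msbs - lsbs;
}
```

`ceil_lg(200000000)` is 28 because 2^27 < 200,000,000 <= 2^28, and 22 leftover bits encode 2^22 = 4 Mebi values.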

f) How much could we store? Well, here is the standard equation:

Capacity(B) = Density(B/in^2) * Area/Surface (in^2/Surf) * SurfacesPerPlatter (Surf/Plat) * #Platters (Plat)

And we’re given

Density = ? B/in^2

#Platters = 4 Plat

SurfacesPerPlatter = 2 Surf/Plat (if we want to maximize capacity, we use BOTH sides!)

Area/Surface = area of the disk = π r_outer^2 – π r_inner^2 = π (30/π – 22/π) = 8 in^2/Surf

Thus, ? Gibi (B/in^2) * 8 (in^2/Surf) * 2 (Surf/Plat) * 4 (Plat) = ? Gibi * 2^3 * 2^1 * 2^2 B = 2^40 B = 1 TebiByte, so ? = 2^4, i.e. a density of 2^34 B/in^2 = 16 GiB/in^2
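Solving the capacity equation for density is a single division. A sketch with a hypothetical function name, using whole-number areas as in the answer:

```c
#include <assert.h>
#include <stdint.h>

/* Density (B/in^2) needed to reach a capacity with the given geometry. */
uint64_t required_density(uint64_t capacity_bytes, uint64_t area_per_surface_in2,
                          uint64_t surfaces_per_platter, uint64_t platters) {
    uint64_t total_area = area_per_surface_in2 * surfaces_per_platter * platters;
    assert(total_area != 0 && capacity_bytes % total_area == 0);
    return capacity_bytes / total_area;
}
```

For 1 TebiByte over 8 * 2 * 4 = 64 in^2 of recording surface, this gives 2^34 B/in^2, which is 16 GiB/in^2.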

g) 0: 32 TebiB, 1: 16 TebiB, 3: 31 TebiB, 5: 31 TebiB
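These numbers are consistent with the standard usable-capacity rules for each RAID level applied to 32 disks of 1 TebiB each. That disk count and size are an assumption inferred from the answers, not something stated here; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Usable capacity, in units of one disk, for the RAID levels in the answer. */
uint64_t raid_usable_disks(int level, uint64_t num_disks) {
    assert(num_disks >= 2);
    switch (level) {
    case 0: return num_disks;        /* striping, no redundancy */
    case 1: return num_disks / 2;    /* mirroring: everything stored twice */
    case 3:                          /* one dedicated parity disk */
    case 5: return num_disks - 1;    /* one disk's worth of distributed parity */
    default:
        assert(!"unsupported RAID level");
        return 0;
    }
}
```

With 32 one-TebiB disks, levels 0, 1, 3 and 5 give 32, 16, 31 and 31 TebiB.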