









The last page is a reference sheet. Please detach it from the rest of the exam. • This exam is closed book and closed notes (no laptops, ...










Winter 2018 Instructor: Mark Wyse March 14, 2018
Last Name: Solutions
First Name:
UW Student ID Number:
UW NetID (username):
Academic Integrity Statement: All work on this exam is my own. I had no prior knowledge of the exam contents, nor will I share the contents with others in CSE 351 who haven’t taken it yet. Violation of these terms may result in a failing grade. ( please sign )
Do not turn the page until 2:30 pm.
Instructions
Advice
Question 1 2 3 4 5 6 Total
Points Possible 25 6 41 10 9 24 115
Points Earned 25 6 41 10 9 24 115
Question 1: Warm-up [ 25 pts.]
True or False [10 pts.] Answer the following questions by circling True or False. No explanation is needed.
(1) The ISA specifies the names of all registers but does not specify the size of registers.
True / False
(2) Two’s Complement is the only valid signed integer representation that could be used when implementing a processor.
True / False
(3) Assume procedure P calls procedure Q and P stores a value in register %rbp prior to calling Q. After Q returns control to P, P can safely use the register %rbp.
True / False
(4) When programming in C, the compiler determines whether local variables are allocated on the stack or stored in registers.
True / False
(5) Your friend is designing a new processor that implements the x86-64 ISA for a new startup. Unfortunately, they aren’t very good at their job, and their processor has a cycle time (clock cycle) of 24 hours. That is, it takes 24 hours to execute a single instruction (assume that all instructions take only a single cycle to execute). Your friend’s slow processor may still be a valid implementation of x86-64.
True / False
(6) In C, it is safe (there is no possibility of losing data) when casting from type int (4-byte integer) to type float (4-byte IEEE floating point).
True / False
(7) In C, it is safe (there is no possibility of losing data) when casting from type int (4-byte integer) to type double (8-byte IEEE floating point).
True / False
(8) In Java, all non-primitive variables are references to objects.
True / False
(9) In the context of memory allocators, and compared to an implicit free list, an allocator using an explicit free list is much faster at allocating blocks when most of the memory is free (unused).
True / False -- faster when memory is full (explicit free list size shrinks as memory is allocated)
(10) You learned at least one new thing in 351 this quarter.
True / False
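Statements (6) and (7) can be checked empirically. The following is a minimal sketch, assuming a platform with a 32-bit int and IEEE 754 float and double; the helper names survives_float and survives_double are illustrative and not part of the exam.

```c
#include <stdbool.h>

/* True if x comes back unchanged after being stored in a float.
   A float has a 24-bit significand, so some 32-bit ints must be rounded. */
bool survives_float(int x) {
    volatile float f = (float)x;      /* volatile forces rounding to float precision */
    return (long long)f == (long long)x;
}

/* True if x comes back unchanged after being stored in a double.
   A double has a 53-bit significand, enough for every 32-bit int. */
bool survives_double(int x) {
    volatile double d = (double)x;
    return (long long)d == (long long)x;
}
```

The value 16777217 (2^24 + 1) is the smallest positive int that does not survive the trip through float, while it is stored exactly in a double.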
(15) What are the three types of Cache Misses? [3 pts.]
(16) Assume we are executing code on a machine that uses k-bit virtual addresses, and each addressable memory location stores b bytes. What is the total size of the addressable virtual memory space on this machine? [2 pts.]
Question 2: Number Representation [6 pts.] In computer graphics, a pixel’s color is determined by a combination of the colors red (R), green (G), and blue (B). Let’s assume that we have a new display where each pixel will display either red, green, or blue. A pixel will only display a single color at a time, and it must be one of R, G, or B, so we can use pixels to encode values in base 3! Assume we use the encoding 0↔R, 1↔G, 2↔B. For example, 6 = 2(3^1) + 0(3^0) would be encoded as BR.
(A) What is the unsigned decimal value of the set of pixels displaying GGRB? [2 pts.]
(B) If we have 7 bits of binary data that we want to store as a set of pixels, how many pixels would it take to store that same data? [2 pts.]
7 bits can represent 128 things. Powers of 3: 1, 3, 9, 27, 81, 243. Four pixels cover only 81 things, so we need 5 pixels, which can represent 243 things (values 0 through 242).
(C) Assume we can perform the left shift operation on a set of pixels. Similar to in binary, we will shift in the 0 value, or equivalently a Red pixel. For example, our pixels from part (A) left-shifted by 1 would become GGRBR , or GGRB << 1 = GGRBR. What arithmetic operation occurs when left shifting by 1? [2 pts.]
Answers:
Question 1 (15): Compulsory, Capacity, Conflict
Question 1 (16): (2^k) * b bytes
Question 2 (B): 5 pixels
Question 2 (C): multiply by 3
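The base-3 pixel encoding from Question 2 can be sketched in C. This is an illustration only: the function names decode_pixels and pixels_for_bits are hypothetical, and the code assumes the encoding R = 0, G = 1, B = 2 with the most significant pixel written first.

```c
#include <stddef.h>

/* Map a pixel color to its base-3 digit: R=0, G=1, B=2; -1 for anything else. */
static int digit_of(char color) {
    switch (color) {
    case 'R': return 0;
    case 'G': return 1;
    case 'B': return 2;
    default:  return -1;
    }
}

/* Decode a pixel string (most significant pixel first); -1 on invalid input. */
long decode_pixels(const char *pixels) {
    long value = 0;
    for (size_t i = 0; pixels[i] != '\0'; i++) {
        int d = digit_of(pixels[i]);
        if (d < 0) return -1;
        value = value * 3 + d;   /* shift left one base-3 digit, then add */
    }
    return value;
}

/* Smallest number of pixels whose 3^n combinations cover 2^bits values. */
int pixels_for_bits(int bits) {
    unsigned long long needed = 1ULL << bits;
    unsigned long long capacity = 1;
    int pixels = 0;
    while (capacity < needed) {
        capacity *= 3;
        pixels++;
    }
    return pixels;
}
```

With these helpers, decode_pixels("GGRB") evaluates to 38 and decode_pixels("GGRBR") to 114, which is 3 times 38, matching the left-shift rule; pixels_for_bits(7) evaluates to 5.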
Question 3 : Caching, Address Translation, and Virtual Memory [ 41 pts.] This question is a multi-part question that deals with several memory related topics we have studied. Each part is designed to be independent from the others.
Part I. Caching [14 pts.] Assume we are using a computer system with a physically addressed data cache. Physical addresses are 7 bits in length. The data cache has a total capacity of 64 bytes, is 2-way set associative, and the cache block size is 4 bytes. The cache uses LRU replacement, write-back, and write-allocate policies. Assume the state of the cache is as shown below (dirty bit omitted due to space). Assume our system is Little Endian and stores multi-byte data in little endian form (like x86-64 systems).
Index   Tag  Valid  B0  B1  B2  B3      Tag  Valid  B0  B1  B2  B3
0       -    0      -   -   -   -       0    1      FB  17  47  E
1       1    1      D9  C6  07  01      2    1      D9  C6  07  01
2       1    1      FA  34  8C  14      -    0      -   -   -   -
3       1    1      8D  76  26  5F      0    1      1B  BB  CB  34
4       2    0      2F  F1  6A  E8      3    1      2F  F1  6A  E
5       -    0      -   -   -   -       3    1      76  C6  B2  D
6       -    0      -   -   -   -       -    0      -   -   -   -
7       3    0      F5  CA  08  16      2    1      35  89  85  4B
(A) How many bits are used for the cache index, offset, and tag fields? [3 pts.]
Tag bits: 2    Index bits: 3    Offset bits: 2
(B) How many management bits (bits other than the block data) are there in every line in the data cache? [1 pt.]
Management bits 4 (Tag bits + valid + dirty)
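The field widths in part (A) follow from the cache parameters: the number of sets is capacity / (block size × associativity), the offset needs log2(block size) bits, the index needs log2(sets) bits, and the tag takes whatever address bits remain. A small sketch, with hypothetical names (CacheFields, cache_fields) and assuming all sizes are powers of two:

```c
/* Number of bits needed to select among n items (n assumed a power of two). */
static int log2_exact(unsigned n) {
    int bits = 0;
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}

typedef struct {
    int tag_bits;
    int index_bits;
    int offset_bits;
} CacheFields;

/* Derive the address field widths from the cache geometry (sizes in bytes). */
CacheFields cache_fields(unsigned addr_bits, unsigned capacity,
                         unsigned ways, unsigned block_size) {
    CacheFields f;
    unsigned sets = capacity / (block_size * ways);
    f.offset_bits = log2_exact(block_size);
    f.index_bits = log2_exact(sets);
    f.tag_bits = (int)addr_bits - f.index_bits - f.offset_bits;
    return f;
}
```

For this cache, cache_fields(7, 64, 2, 4) gives 8 sets and therefore 2 tag bits, 3 index bits, and 2 offset bits.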
(C) For each of the following cache accesses, provide the cache tag, index, and offset (in hex). Then, determine if the request results in a cache hit (mark “Y” or “N”). If a cache hit occurs, provide the data returned from the cache for the access size requested (given in bytes) in hex. [10 pts.]
Physical Address   Request Size (Bytes)   Tag   Index   Offset   Hit (Y/N)   Data Returned
0x24               2                      0x1   0x1     0x0      Y           0xC6D9
0x51               3                      0x2   0x4     0x1      N           n/a
0x51 = 0b 10 100 01, 0x24 = 0b 01 001 00
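The same decomposition can be done with shifts and masks. This sketch assumes the 2/3/2 split of tag, index, and offset bits for the 7-bit physical address; AddrParts and split_address are illustrative names.

```c
typedef struct {
    unsigned tag;
    unsigned index;
    unsigned offset;
} AddrParts;

/* Split a 7-bit physical address into 2 tag bits, 3 index bits, 2 offset bits. */
AddrParts split_address(unsigned addr) {
    AddrParts p;
    p.offset = addr & 0x3;          /* low 2 bits */
    p.index  = (addr >> 2) & 0x7;   /* next 3 bits */
    p.tag    = (addr >> 5) & 0x3;   /* top 2 bits */
    return p;
}
```

Here split_address(0x24) yields tag 1, index 1, offset 0, and split_address(0x51) yields tag 2, index 4, offset 1.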
Part III. Cache and TLB Performance [ 5 pts.] Assume we execute the C code given below on the computer system we have described in Parts I and II. However, assume that both the Cache and TLB start cold (empty). Assume that the array of structs named structArr is allocated at virtual address 16, or 0x10, and virtual pages are mapped to physical pages in a linear manner, starting with physical page 0 (PPN = 0).
typedef struct {
    byte b;
    short s;
} myStruct;

#define N 16

int main(int argc, char **argv) {
    // Assume structArr is located at address 16 (0x10)
    myStruct *structArr = (myStruct *) malloc(N * sizeof(myStruct));
    // code to initialize array elements omitted
    short sum = 0;
    for (int i = 0; i < N; i++) {
        sum += structArr[i].s;
    }
}
(A) Compute both the Cache and TLB miss rates for the accesses to structArr in the code above. Remember, the cache and TLB start cold. Express your answers as a percentage. [2 pts.]
Cache Miss Rate: 100% (one struct per cache line)
TLB Miss Rate: 25% (a page holds 4 structs)
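These miss rates can be reproduced by counting how many accesses land in a new block or page. The sketch below makes several assumptions that are not stated verbatim in the exam: byte is treated as a 1-byte char, the compiler pads myStruct to 4 bytes with s at offset 2, and the page size is 16 bytes (inferred from "a page holds 4 structs"). The function name cold_miss_rate is hypothetical.

```c
#include <stddef.h>

/* Assumption: `byte` is a 1-byte type, modeled here as char. */
typedef struct {
    char b;
    short s;
} myStruct;

/* Fraction of the n accesses to arr[i].s (array starting at address base)
   that touch a new chunk of chunk_size bytes. For a sequential scan that
   starts cold, each new chunk (cache block or page) costs exactly one miss. */
double cold_miss_rate(unsigned long base, int n, unsigned long chunk_size) {
    int misses = 0;
    unsigned long last_chunk = (unsigned long)-1;
    for (int i = 0; i < n; i++) {
        unsigned long addr = base + i * sizeof(myStruct) + offsetof(myStruct, s);
        unsigned long chunk = addr / chunk_size;
        if (chunk != last_chunk) {
            misses++;
            last_chunk = chunk;
        }
    }
    return (double)misses / n;
}
```

With base 16 and 16 elements, a 4-byte chunk (the cache block) gives a rate of 1.0 and a 16-byte chunk (the assumed page) gives 0.25. Passing a larger chunk size, such as a doubled page, lowers the rate.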
(B) What would happen to the TLB miss rate if the page size was doubled? Circle your answer. [1 pt.] a. Increase b. Decrease c. No Change
(C) What would happen to the Cache miss rate if the capacity of the cache was doubled while keeping the associativity and block size fixed? Circle your answer. [1 pt.] a. Increase b. Decrease c. No Change
(D) This question is independent from the previous questions in Part III. Compute the Average Memory Access Time (AMAT) assuming it takes 100 ns to get a block of data from main memory, the data cache has a hit time of 2 ns, and the data cache miss rate is 2%. Don’t forget to use the correct units for your answer! [1 pt.]
AMAT = HT + MR * MP = 2 + 0.02 * 100 = 4 ns
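The AMAT formula is hit time plus miss rate times miss penalty. As a one-function sketch (the name amat is illustrative, and all times are assumed to be in the same unit):

```c
/* Average memory access time = hit time + miss rate * miss penalty. */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}
```

Calling amat(2.0, 0.02, 100.0) gives approximately 4.0, that is, 4 ns for the values in this question.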
Question 4. Pointers and Memory [10 pts.] For this problem we are using a 64-bit x86-64 machine ( little endian ). Below is the recursive function rfun from the midterm exam, showing where the code is stored in memory.
00000000004005e6
(A) What are the values (in hex) stored in each register shown after the following x86-64 instructions are executed? Remember to use the appropriate bit widths. [6 pts.]
Register   Value (in hex)
%rax       0x0000 0000 0040 05e
%rsi       0x0000 0000 0000 0005

Instruction                      Register   Value (in hex)
movb (%rax), %cl                 %cl        0x0f
leaq 8(%rax, %rsi, 2), %rcx      %rcx       0x0000 0000 0040 05f
movswl (%rax, %rsi), %r8d        %r8d       0x0000 1374
(B) Complete the C code below to fulfill the behaviors described in the inline comments using pointer arithmetic. Let char *cp = 0x4005ee. [4 pts.]
char v1 = *(cp + __2__); // set v1 = 0xbe
int *v2 = (int *)((__short__ *)cp - 8); // set v2 = 0x4005de
The only 0xbe byte in rfun is found at address 0x4005f0, 2 bytes beyond cp.
The difference between v2 and cp is 16 bytes. Since by pointer arithmetic we are moving 8 “things” away, cp must be cast to a pointer to a data type of size 2 bytes, such as short.
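The scaling rule behind this answer is that subtracting 8 from a pointer to T moves it back 8 * sizeof(T) bytes. The sketch below demonstrates this on local, properly aligned buffers rather than on the real address 0x4005ee; the function names are illustrative, and the byte counts assume 2-byte short and 4-byte int as on x86-64.

```c
#include <stddef.h>

/* Bytes moved by `(short *)cp - 8`: expected 8 * sizeof(short). */
ptrdiff_t bytes_back_as_short(void) {
    static short buf[16];               /* aligned storage standing in for code bytes */
    char *cp = (char *)&buf[8];
    char *moved = (char *)((short *)cp - 8);
    return cp - moved;
}

/* Bytes moved by `(int *)cp - 8`: expected 8 * sizeof(int). */
ptrdiff_t bytes_back_as_int(void) {
    static int buf[16];
    char *cp = (char *)&buf[8];
    char *moved = (char *)((int *)cp - 8);
    return cp - moved;
}
```

On x86-64 the first function returns 16, the distance required here, while the second returns 32, which is why a cast to a pointer to int would overshoot.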
(C) List three possible outputs of the following block of code. Write your answers into the right hand column of the table below. Note: there are more than three possible outputs; any three will suffice. [ 3 pts.]
int main() {
    int x = 1;
    printf("%d ", x);
    if (fork() != 0) {
        x = x << 2;
        printf("%d ", x);
        fork();
        x = x << 1;
        printf("%d ", x);
    } else {
        x = x << 4;
        printf("%d ", x);
    }
    exit(0);
}
Write your three answers in this box:
The original process prints 1, 4, 8, and the second child process prints 8 either before or after the original process prints 8. So those two processes always produce the output sequence 1 4 8 8.
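The ordering constraints can be written down as a checker. This is a sketch under one stated assumption: each printf reaches the terminal immediately, so no buffered text is duplicated by fork. Under that assumption, 1 is always first, 4 precedes both 8s, and the 16 printed by the first child may appear anywhere after the 1. The function name is_possible_output is hypothetical.

```c
#include <stdbool.h>

/* Check whether seq (n values) is an output the fork program can produce,
   assuming every printf is visible immediately. */
bool is_possible_output(const int *seq, int n) {
    if (n != 5 || seq[0] != 1) return false;
    int pos4 = -1, first8 = -1, count8 = 0, count16 = 0;
    for (int i = 1; i < n; i++) {
        if (seq[i] == 4 && pos4 < 0) {
            pos4 = i;
        } else if (seq[i] == 8) {
            if (first8 < 0) first8 = i;
            count8++;
        } else if (seq[i] == 16) {
            count16++;
        } else {
            return false;               /* unexpected or repeated value */
        }
    }
    /* One 4, two 8s, one 16, and the 4 printed before either 8. */
    return pos4 > 0 && count8 == 2 && count16 == 1 && pos4 < first8;
}
```

The checker accepts exactly four sequences: 1 16 4 8 8, 1 4 16 8 8, 1 4 8 16 8, and 1 4 8 8 16.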
(D) In the following blanks, write “Y” for yes or “N” for no if the following need to be updated when execv is run on a process. [2 pts.]
Page Table: Y    PTBR: N    Stack: Y    Code: Y
The process already has a page table, so the PTBR does not need to be updated, but the old PTEs need to be invalidated. We replace/update the old process image’s virtual address space, including Stack and Code.
Question 6 : Procedures & The Stack [24 pts.] Consider the following x86-64 assembly and C code for the recursive function rfun.
// Recursive function rfun
long rfun(char *s) {
    if (*s) {
        long temp = (long)*s;
        s++;
        return temp + rfun(s);
    }
    return 0;
}

// Main Function - program entry
int main(int argc, char **argv) {
    char *s = "Yay!";
    long r = rfun(s);
    printf("r: %ld\n", r);
}
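To see what this program computes, the sketch below restates rfun under the assumption that it takes a char * and tests *s (as the (long)*s dereference and s++ suggest), and adds a hypothetical helper rfun_calls that counts how many invocations the recursion makes.

```c
/* Sums the character codes of a NUL-terminated string, one call per character. */
long rfun(char *s) {
    if (*s) {
        long temp = (long)*s;
        s++;
        return temp + rfun(s);
    }
    return 0;
}

/* Number of rfun invocations for a string: one per character plus the
   final call that sees the terminating NUL. */
long rfun_calls(const char *s) {
    return *s ? 1 + rfun_calls(s + 1) : 1;
}
```

For "Yay!" the ASCII codes are 89 + 97 + 121 + 33, so rfun returns 340, and the recursion makes 5 calls in total. The fourth call is the one that examines '!'.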
00000000004005e6
(F) Assume main calls rfun with char *s = "Yay!", as shown in the C code. After main calls rfun, we find that the return address to main is stored on the stack at address 0x7fffffffdb38. On the first call to rfun, the register %rdi holds the address 0x4006d0, which is the address of the input string "Yay!" (i.e. char *s == 0x4006d0). Assume we stop execution prior to executing the movsbq instruction (address 0x4005ee) during the fourth call to rfun. [14 pts.]
For each address in the stack diagram below, fill in both the value and a description of the entry.
The value field should be a hex value, an expression involving the C code listed above (e.g., a variable name such as s or r , or an expression involving one of these), a literal value (integer constant, a string, a character, etc.), “unknown” if the value cannot be determined, or “unused” if the location is unused.
The description field should be one of the following: “Return address”, “Saved %reg” (where reg is the name of a register), a short and descriptive comment, “unused” if the location is unused, or “unknown” if the value is unknown.
Memory Address Value Description
0x7fffffffdb48 unknown %rsp when main is entered
0x7fffffffdb38 0x400616 Return address to main
0x7fffffffdb30 unknown original %rbx
0x7fffffffdb28 0x4005fb Return address
0x7fffffffdb20 *s, “Y” Saved %rbx
0x7fffffffdb18 0x4005fb Return address
0x7fffffffdb10 *s, *(s+1), “a” Saved %rbx
0x7fffffffdb08 0x4005fb Return address
0x7fffffffdb00 *s, *(s+2), “y” Saved %rbx
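The addresses in the diagram follow a regular pattern: each rfun invocation occupies 16 bytes, 8 for the return address pushed by the call and 8 for the saved %rbx. A sketch of that arithmetic, where the frame size is read off the table above and the names return_addr_slot and saved_rbx_slot are illustrative:

```c
#include <stdint.h>

#define MAIN_RET_SLOT 0x7fffffffdb38ULL  /* given in the problem statement */
#define FRAME_BYTES   16ULL              /* return address + saved %rbx, per the table */

/* Stack address holding the return address for rfun invocation n
   (n = 1 is the call from main). */
uint64_t return_addr_slot(unsigned n) {
    return MAIN_RET_SLOT - FRAME_BYTES * (n - 1);
}

/* Stack address where rfun invocation n saves %rbx. */
uint64_t saved_rbx_slot(unsigned n) {
    return return_addr_slot(n) - 8;
}
```

For the fourth invocation this gives 0x7fffffffdb08 for the return address and 0x7fffffffdb00 for the saved %rbx, matching the last two rows.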
Instruction                     Condition Codes     (op) s, d         test a, b       cmp a, b
je   “Equal”                    ZF                  d (op) s == 0     b & a == 0      b == a
jne  “Not equal”                ~ZF                 d (op) s != 0     b & a != 0      b != a
js   “Sign” (negative)          SF                  d (op) s < 0      b & a < 0       b-a < 0
jns  (non-negative)             ~SF                 d (op) s >= 0     b & a >= 0      b-a >= 0
jg   “Greater”                  ~(SF^OF) & ~ZF      d (op) s > 0      b & a > 0       b > a
jge  “Greater or equal”         ~(SF^OF)            d (op) s >= 0     b & a >= 0      b >= a
jl   “Less”                     (SF^OF)             d (op) s < 0      b & a < 0       b < a
jle  “Less or equal”            (SF^OF) | ZF        d (op) s <= 0     b & a <= 0      b <= a
ja   “Above” (unsigned >)       ~CF & ~ZF           d (op) s > 0U     b & a > 0U      b > a
jb   “Below” (unsigned <)       CF                  d (op) s < 0U     b & a < 0U      b < a
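The condition-code column can be exercised with a small model of cmp. This sketch computes the four flags for cmp a, b (which evaluates b - a on 64 bits) and then applies the formulas from the table; the Flags type and the function names are hypothetical and only model the flag logic, not a real processor.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool zf, sf, of, cf;
} Flags;

/* Flags produced by `cmp a, b` (AT&T operand order), which computes b - a. */
Flags cmp_flags(int64_t a, int64_t b) {
    Flags f;
    uint64_t ua = (uint64_t)a, ub = (uint64_t)b;
    uint64_t diff = ub - ua;
    f.zf = (diff == 0);
    f.sf = (diff >> 63) != 0;
    f.cf = ub < ua;                                  /* unsigned borrow */
    f.of = (((ub ^ ua) & (ub ^ diff)) >> 63) != 0;   /* signed overflow of b - a */
    return f;
}

bool takes_jl(Flags f) { return f.sf != f.of; }              /* (SF^OF) */
bool takes_jg(Flags f) { return !(f.sf != f.of) && !f.zf; }  /* ~(SF^OF) & ~ZF */
bool takes_ja(Flags f) { return !f.cf && !f.zf; }            /* ~CF & ~ZF */
bool takes_jb(Flags f) { return f.cf; }                      /* CF */
```

For example, with a = 1 and b = -1 the signed test jl is taken (b < a), while the unsigned test ja is also taken because -1 reinterpreted as unsigned is the largest 64-bit value.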
Registers
                                           Name of “virtual” register
Name   Convention                     Lowest 4 bytes   Lowest 2 bytes   Lowest byte
%rax   Return value – Caller saved    %eax             %ax              %al
%rbx   Callee saved                   %ebx             %bx              %bl
%rcx   Argument #4 – Caller saved     %ecx             %cx              %cl
%rdx   Argument #3 – Caller saved     %edx             %dx              %dl
%rsi   Argument #2 – Caller saved     %esi             %si              %sil
%rdi   Argument #1 – Caller saved     %edi             %di              %dil
%rsp   Stack Pointer                  %esp             %sp              %spl
%rbp   Callee saved                   %ebp             %bp              %bpl
%r8    Argument #5 – Caller saved     %r8d             %r8w             %r8b
%r9    Argument #6 – Caller saved     %r9d             %r9w             %r9b
%r10   Caller saved                   %r10d            %r10w            %r10b
%r11   Caller saved                   %r11d            %r11w            %r11b
%r12   Callee saved                   %r12d            %r12w            %r12b
%r13   Callee saved                   %r13d            %r13w            %r13b
%r14   Callee saved                   %r14d            %r14w            %r14b
%r15   Callee saved                   %r15d            %r15w            %r15b

Virtual Memory Acronyms
MMU    Memory Management Unit
VA     Virtual Address
PA     Physical Address
VPN    Virtual Page Number
PPN    Physical Page Number
VPO    Virtual Page Offset
PPO    Physical Page Offset
PT     Page Table
PTE    Page Table Entry
PTBR   Page Table Base Register
TLBT   TLB Tag
TLBI   TLB Index
CT     Cache Tag
CI     Cache Index
CO     Cache Offset

C Functions
void* malloc(size_t size): Allocate size bytes from the heap.
void* calloc(size_t n, size_t size): Allocate n*size bytes and initialize to 0.
void free(void* ptr): Free the memory space pointed to by ptr.
size_t sizeof(type): Returns the size of a given type (in bytes).
char* gets(char* s): Reads a line from stdin into the buffer.
pid_t fork(): Create a new child process (duplicates parent).
pid_t wait(int* status): Blocks calling process until any child process exits.
int execv(char* path, char* argv[]): Replace current process image with new image.