
Sample Midterm Problems by Topic - Computer Systems | CS 3214, Exams of Computer Science

Virginia Polytechnic Institute and State University (Virginia Tech), Computer Science

Prof. Godmar Volker Back

Material Type: Exam; Professor: Back; Class: Computer Systems; Subject: Computer Science; University: Virginia Polytechnic Institute And State University; Term: Fall 2009;


Uploaded on 10/26/2010

chadski 🇺🇸


CS 3214 Sample Midterm (Fall 2009)


Solutions are shown in this style. This exam was given in Fall 2009.

1. Executing Programs on IA32 (20 pts)

The following questions relate to how programs are compiled for IA32.

a) (8 pts) In lecture, we had discussed how each function obtains its own activation record, or stack frame, every time it is called. The stack frame is used for several purposes, including to hold the values of arguments passed to a function or to hold the values of local variables that cannot be kept in registers. Typically, accesses to these arguments and variables involve loads or stores that use relative addressing with the %ebp register as a base.

Recent versions of gcc support an optimization option ‘-fomit-frame-pointer’ that organizes accesses to local variables differently. Instead of using the base/frame pointer register %ebp, the stack pointer register %esp is used to access local variables and arguments passed to a function. As a result, %ebp is available for other uses.

i. (6 pts) Explain why and how this would work! Why is the base pointer, apparently, redundant?

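The base pointer is always at a known offset from the stack pointer: at every instruction in the function body, the compiler knows how far %esp has moved since the function was entered. Therefore any access that uses addressing relative to %ebp can be replaced with an access that uses addressing relative to %esp.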

ii. (2 pts) Consider the example of accessing the first argument, which is traditionally accessed using 8(%ebp). How would code compiled with -fomit-frame-pointer access this argument?

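Let SFSIZE = |%ebp - %esp| be the current stack frame size; then any access to disp(%ebp) can be replaced with disp+SFSIZE(%esp). For example, 8(%ebp) would become 8+SFSIZE(%esp). Here %ebp means the value the frame pointer would have had, i.e., the slot just below the return address where the old %ebp would have been saved. In the following example, %esp is lowered by 16 bytes and no %ebp is pushed, so SFSIZE = 12: x moves from 8(%ebp) to 20(%esp), and y from 12(%ebp) to 24(%esp).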

int sum(int x, int y)
{
    char localarray[16];
    return x + y;
}

When compiled with -fomit-frame-pointer (but without other optimizations), the code is:


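sum:
    subl $16, %esp          # reserve 16 bytes (localarray); %ebp is not pushed
    movl 24(%esp), %eax     # %eax = y, traditionally 12(%ebp)
    addl 20(%esp), %eax     # %eax += x, traditionally 8(%ebp)
    addl $16, %esp          # release the frame
    ret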

b) (12 pts) Consider the following assembly code, which was produced by gcc for a function ‘g()’. The first listing shows the result when compiling at the first level of optimization (-O1); the second listing shows the result of compiling at the second optimization level (-O2).

IA32 code, compiled with -O1:

g:
    pushl %ebp
    movl %esp, %ebp
    subl $8, %esp
    movl 8(%ebp), %eax
    movl 12(%ebp), %edx
    cmpl %edx, %eax
    je .L
    cmpl $1, %eax
    je .L
    cmpl $1, %edx
    jne .L
.L6:
    movl $1, %eax
    jmp .L
.L4:
    cmpl %edx, %eax
    jge .L
    movl %eax, 4(%esp)
    subl %eax, %edx
    movl %edx, (%esp)
    call g
    jmp .L
.L7:
    movl %edx, 4(%esp)
    subl %edx, %eax
    movl %eax, (%esp)
    call g
.L2:
    leave
    ret

IA32 code, compiled with -O2:

g:
    pushl %ebp
    movl %esp, %ebp
    movl 8(%ebp), %edx
    movl 12(%ebp), %ecx
    cmpl %ecx, %edx
    je .L
.L15:
    cmpl $1, %edx
    je .L
    cmpl $1, %ecx
    je .L
    cmpl %ecx, %edx
    jge .L
    movl %ecx, %eax
    movl %edx, %ecx
    subl %edx, %eax
    movl %eax, %edx
.L7:
    cmpl %edx, %ecx
    jne .L
.L3:
    popl %ebp
    movl %ecx, %eax
    ret
.L5:
    popl %ebp
    movl $1, %eax
    ret
.L9:
    subl %ecx, %edx
    jmp .L

i. (9 pts) Provide a C version of function g()! Hint: ‘g’ implements a well-known, classic mathematical algorithm!

‘g’ implements Euclid’s algorithm for finding the greatest common divisor:

int g(int m, int n) {

You repeat the compilation and, in fact, the error goes away:

$ gcc -c a.c b.c main.c
$ gcc a.o b.o main.o
$

(2 pts) Why did the linker not report an error this time?

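As an uninitialized global variable, global_shared_variable becomes a weak (common) symbol in each object file, so the linker does not report an error for multiple definitions. (This relies on gcc's historical -fcommon behavior; since gcc 10 the default is -fno-common, under which this link fails with a multiple-definition error.)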

Your teammate proposes to fix the error in a different way, by making the variable static, i.e., by changing shared.h to read:

static int global_shared_variable = -1;

You repeat the compilation and, in fact, the error is gone:

$ gcc -c a.c b.c main.c
$ gcc a.o b.o main.o
$

(2 pts) Why did the linker not report an error this time?

global_shared_variable has become 2 distinct local symbols in a.o and b.o that happen to have the same name, hence there is no conflict for the linker to report.

(2 pts) Explain why this solution is not a good one!

It would create two copies of this variable at distinct memory locations, potentially holding different values; updates to one would not affect the other. This is likely not what the programmer intended when placing the definition into shared.h.

(4 pts) Complete the following table to show the correct way to address the issue in a way that avoids linker errors and allows the variable to be initialized!

shared.h:
    extern int global_shared_variable;

a.c:
    #include "shared.h"
    int global_shared_variable = -1;

b.c:
    #include "shared.h"

main.c:
    #include "shared.h"
    int main() { }

Alternatively, the definition could be contained in b.c or main.c. It’s also possible to omit the ‘extern’, in which case the linker rule applies that a single strong definition in a.o overrides the weak definition in b.o. However, this is not good practice (-Wl,--warn-common would flag it). Some suggested placing ‘extern int global_shared_variable’ in b.c: this would compile and link, but is generally not considered sound programming practice.

b) (4 pts) A “fence” is a technique that is sometimes used to detect out-of-bounds memory accesses. The idea is to place some ‘fence’ values that rarely occur during normal execution before and after each array. Then, out-of-bounds accesses can be detected by checking whether the fence values were changed. Complete the program below to implement this idea to protect array ‘a’ which is passed to a buggy update routine that contains out-of-bounds accesses.

void buggy_update_array(int *array, int n, int delta)
{
    int i;
    for (i = 0; i <= n; i++) {
        array[i-1] = array[i] + delta;
    }
}

int a[10];

int main() { buggy_update_array(a, 10, 1); }

Assuming that the linker allocates variables of the same storage class consecutively in memory (an assumption the C standard does not guarantee), the program can be completed as follows:

void buggy_update_array(int *array, int n, int delta)
{
    int i;
    for (i = 0; i <= n; i++) {
        array[i-1] = array[i] + delta;
    }
}

int leftfence;
int a[10];
int rightfence;

int main() {
#define MAGIC 0xdeadbeef
    leftfence = rightfence = MAGIC;

i. (2 pts) With respect to code

Yes, it uses loops whose instructions are executed many times.

ii. (3 pts) With respect to data

No, each matrix element is accessed once and only once. (Though less relevant, I also accepted ‘yes’ if you pointed out that there is reuse of ‘i’ and ‘j’, but not ‘tmp’.)

b) (5 pts) Does this algorithm exhibit spatial locality? Briefly say why or why not!

i. (2 pts) With respect to code

Yes – the executed code is contained in a contiguous section of instructions. (The fact that each backward branch in the loop causes a non-contiguous control transfer notwithstanding.)

ii. (3 pts) With respect to data

If the matrix is stored in row-major order, as in C, the accesses to matrix[i][j] exhibit spatial locality, but the accesses to matrix[j][i] do not.

c) (5 pts) Assume a memory hierarchy with just one level of caching and a cache line size of 64 bytes, which can hold 16 ints. How many cache misses would you expect per inner loop iteration?

Each loop iteration accesses both matrix[i][j] and matrix[j][i]. If the matrix is large enough (so that the distance between &matrix[k][x] and &matrix[k+1][x] is large), matrix[j][i] would miss every time, and matrix[i][j] every 16th time, once per cache line; thus we would expect 1 + 1/16 = 1.0625 cache misses per iteration.

d) (5 pts) In lecture we had discussed blocking as a method to speed up dense matrix multiplication. Could blocking be applied to speed up in-place matrix transposition? Briefly justify your answer!

Yes. Divide the matrix into small squares that fit in the cache, transpose the elements in each square block using a temporary buffer. If the temporary buffer, the source block, and the destination block fit into the cache, there will be no penalty for the lack of spatial locality because the cache block fetched when accessing the b[k][] will still be in the cache when b[k+1][] is accessed. This description is simplified: in practice, one needs to worry about conflict misses as well. This blocking avoids the cache misses due to lack of spatial locality when accessing neighboring columns; it does not introduce temporal locality.

4. Optimizations (18 pts)

a) (4 pts) Consider the following C code

void matrix_vector_multiply(int * y, int M[2][2], int * x)
{
    y[0] = M[0][0] * x[0] + M[0][1] * x[1];
    y[1] = M[1][0] * x[0] + M[1][1] * x[1];
}

Suppose you have an infinitely sophisticated compiler and you are using a machine with plenty of registers such as x86_64. How many memory load instructions and how many memory store instructions would the body of this function contain? (Not counting any accesses needed for stack frame management or saving callee-saved registers.)

Because x and y could refer to the same vector, we need 8 loads and 2 stores. Loads are for M[0][0], M[0][1], x[0], x[1], M[1][0], M[1][1], x[0], and x[1], stores for y[0] and y[1]. For example, here is the x86_64 code:

matrix_vector_multiply:
    movl 4(%rdx), %ecx      # load x[1]
    movl (%rdx), %eax       # load x[0]
    imull 4(%rsi), %ecx     # load M[0][1]
    imull (%rsi), %eax      # load M[0][0]
    addl %eax, %ecx
    movl %ecx, (%rdi)       # store y[0]
    movl 4(%rdx), %ecx      # load x[1]
    movl (%rdx), %eax       # load x[0]
    imull 12(%rsi), %ecx    # load M[1][1]
    imull 8(%rsi), %eax     # load M[1][0]
    addl %eax, %ecx
    movl %ecx, 4(%rdi)      # store y[1]
    ret

b) (4 pts) Now consider this C function, which is almost identical to the one above, except that the matrix M is no longer a nested array:

void matrix_vector_multiply2(int * y, int *M[], int * x)
{
    y[0] = M[0][0] * x[0] + M[0][1] * x[1];
    y[1] = M[1][0] * x[0] + M[1][1] * x[1];
}

How many memory load and store instructions would a compiler emit for this function?

    imull 4(%rsi), %eax     # load M[0][1]
    imull (%rsi), %edx      # load M[0][0]
    addl %edx, %eax
    movl %eax, (%rdi)       # store y[0]
    imull 12(%rsi), %r8d    # load M[1][1]
    imull 8(%rsi), %ecx     # load M[1][0]
    addl %ecx, %r8d
    movl %r8d, 4(%rdi)      # store y[1]
    ret

d) (6 pts) Assuming an optimizing compiler, is there always a performance cost for declaring many local variables within one function? If yes, say why. If not, explain precisely when there is a cost and when there isn’t!

No, not always. Optimizing compilers perform register allocation, and the number of declared local variables does not matter unless the lifetimes of these variables overlap. Each register can hold only one local variable at a time; if there are more local variables alive at any point in a function than there are registers, spilling occurs and a performance penalty is paid.

5. Unix Process Management (20 pts)

a) (14 pts) Consider the following example programs. For each program, list all legal outputs it may produce when executed on a Unix system. The output consists of strings made up of multiple letters.

// included in both programs
#include <unistd.h>
#include <sys/wait.h>
// W(A) means write(1, "A", sizeof "A")
#define W(x) write(1, #x, sizeof #x)

i) (6 pts)

int main() { W(A); fork(); W(B); fork(); W(C); }

Possible outputs are: ABBCCCC ABCBCCC ABCCBCC

ii) (8 pts)

int main() { W(A); int child = fork(); W(B); if (child) wait(NULL); W(C); }

Possible outputs are: ABCBC ABBCC

There is a bug in the program: it should be write(1, #x, sizeof #x - 1). As written, each W() call also outputs a ‘\0’ character, which, however, does not appear on the terminal.

b) (6 pts) Consider the following two programs. Below each program is shown the output sent to the terminal when the program is run:

Left program:

int main() { if (fork()) *(int *)0 = 42; }

Output:
$ ./crash
Segmentation fault
$

Right program:

int main() { if (!fork()) *(int *)0 = 42; }

Output:
$ ./crash
$

Why is the message “Segmentation fault” displayed for the program on the left, but not for the program on the right?

The “Segmentation fault” message is displayed by the shell if a child process is terminated with signal 11, SIGSEGV. On the left, the faulting process is the one for which fork() returns nonzero, i.e., the parent, which is the shell’s child, so the shell learns how it terminated. On the right, the process that is terminated is the child created by fork(), which is a grandchild of the shell. ‘wait()’ does not allow the shell to wait for grandchildren, so the shell cannot learn that the process terminated with a fault and prints no message. Note that this behavior occurs independent of whether the scheduler runs the parent or the child first after the fork (on a single processor system).
