

Material Type: Assignment; Professor: Gehringer; Class: Architecture Of Parallel Computers; Subject: Electrical and Computer Engineering; University: North Carolina State University; Term: Unknown 1989;


Problems 1, 3, and 4 will be graded. There are 60 points on these problems. Note: You must do all the problems, even the non-graded ones. If you do not do some of them, half as many points as they are worth will be subtracted from your score on the graded problems.
Problem 1. (20 points) Assume a two-processor system executing the following code on CPUs P1 and P2, respectively.

    P1             P2
    1a  i = 1      2a  k = 1
    1b  j = 1      2b  Y = i
    1c  X = k      2c  Z = j

Suppose that the initial value of every variable is 0.
(a) Assuming that the memory is sequentially consistent (see Lecture 13), list all possible (X, Y, Z) triples at the end of execution of the above code by the respective processors on the two-processor machine.
(b) Total store ordering (TSO) allows reads to bypass writes. What output triples (X, Y, Z) are possible with TSO but not with sequential consistency? Explain briefly why these triples are possible with TSO but not with sequential consistency for the above code.
(c) What is the minimum number of membar instructions (memory barrier, or fence; see VBEE Lecture 21, on-campus Lecture 24) that must be inserted into the above code under the TSO model in order to guarantee sequential consistency? Show the resulting code.
(d) Partial store ordering (PSO; see Culler, Singh, and Gupta, p. 689) not only allows reads to bypass writes but also allows writes to bypass writes. For the original code (without the membar instructions), what output triples (X, Y, Z) are possible with PSO but not with sequential consistency? Explain why.
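For part (a), one way to sanity-check a hand-derived list is to enumerate every interleaving of the two instruction streams that preserves each CPU's program order and run it against a single shared memory. The sketch below does that in Python; the encoding of each statement as a (destination, source) pair and the names `run` and `sc_outcomes` are assumptions made for illustration, not part of the assignment. It models sequential consistency only and says nothing about TSO or PSO.

```python
from itertools import combinations

# Program order for each CPU as (destination, source) pairs.
# An int source is a constant store; a str source is a read of that variable.
P1 = [("i", 1), ("j", 1), ("X", "k")]
P2 = [("k", 1), ("Y", "i"), ("Z", "j")]


def run(schedule):
    """Execute one interleaving against a single shared memory."""
    mem = dict.fromkeys("ijkXYZ", 0)  # every variable starts at 0
    for dest, src in schedule:
        mem[dest] = mem[src] if isinstance(src, str) else src
    return mem["X"], mem["Y"], mem["Z"]


def sc_outcomes(p1, p2):
    """Collect (X, Y, Z) over all interleavings that keep each program order."""
    total = len(p1) + len(p2)
    outcomes = set()
    # Choose which time slots belong to P1; the rest go to P2 in order.
    for slots in combinations(range(total), len(p1)):
        it1, it2 = iter(p1), iter(p2)
        schedule = [next(it1) if t in slots else next(it2) for t in range(total)]
        outcomes.add(run(schedule))
    return outcomes


if __name__ == "__main__":
    for triple in sorted(sc_outcomes(P1, P2)):
        print(triple)
```

With three statements per CPU there are only 20 interleavings, so brute force is cheap; any triple missing from the printed set is one that no sequentially consistent execution of this code can produce.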
Problem 2. (15 points) Consider a 4-node CC-NUMA DSM machine using a memory-based directory protocol with an average network transaction time between nodes of 20 μsec. Compute the remote memory-access time and draw a network transaction diagram for a write miss on node 1 to a remote memory block on node 2, which is dirty on node 3, for the following directory protocol optimizations:
(a) Strict request-response
(b) Intervention forwarding
(c) Reply forwarding
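The access-time part of this problem reduces to counting the network transactions that lie on the critical path and multiplying by the per-transaction time. The Python sketch below encodes one common reading of the three schemes, with requester R = node 1, home H = node 2, and owner O = node 3. The message sequences are an assumption based on the usual textbook description, and the sketch ignores directory lookup and memory access time as well as messages that overlap with the critical path (such as the owner's revision message to the home), so check it against the course notes before relying on it.

```python
# Per-transaction cost from the problem statement, in microseconds.
HOP_US = 20

# Assumed critical-path message sequences (R = requester, H = home, O = owner).
# Messages that proceed in parallel with the critical path are left out.
CRITICAL_PATH = {
    "strict request-response": ["R->H", "H->R", "R->O", "O->R"],
    "intervention forwarding": ["R->H", "H->O", "O->H", "H->R"],
    "reply forwarding": ["R->H", "H->O", "O->R"],
}


def remote_latency_us(scheme, hop_us=HOP_US):
    """Critical-path latency: number of serialized transactions times hop cost."""
    return len(CRITICAL_PATH[scheme]) * hop_us


if __name__ == "__main__":
    for scheme, path in CRITICAL_PATH.items():
        print(f"{scheme}: {' , '.join(path)} = {remote_latency_us(scheme)} usec")
```

Listing the messages explicitly, rather than hard-coding a count, keeps the sketch close to the transaction diagram the problem asks for.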
Problem 3. (20 points) Calculate the number of physical “wires” needed between nodes for a 4096-node system of each of the following network types. For each network,
(i) Give a general expression for the number of wires needed to connect that type of network. Assume one “wire” for a connection between two nodes.
(ii) Give the diameter of the network.
(iii) Give the average distance between two nodes in this network.
(a) A hypercube network.
(b) A barrel-shifter network.
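Closed-form answers for both networks can be checked numerically on a small instance before being applied to 4096 nodes. The Python sketch below builds each network from a neighbor function and measures it with a breadth-first search. Two assumptions are made here: the barrel shifter connects node i to nodes i ± 2^r (mod 2^n) for r = 0 .. n-1, and the average distance is taken over the other 2^n - 1 nodes, excluding a node's distance to itself. The function names are hypothetical.

```python
from collections import deque


def hypercube_neighbors(node, n):
    """Neighbors differ from `node` in exactly one of the n address bits."""
    return {node ^ (1 << b) for b in range(n)}


def barrel_shifter_neighbors(node, n):
    """Neighbors sit at offset +/- 2^r (mod 2^n), r = 0 .. n-1 (assumed definition)."""
    size = 1 << n
    return {(node + s * (1 << r)) % size for r in range(n) for s in (1, -1)}


def measure(neighbors, n):
    """Return (links, diameter, average distance) using one BFS from node 0.

    Both networks look the same from every node, so a single BFS suffices.
    """
    size = 1 << n
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in neighbors(u, n):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    links = size * len(neighbors(0, n)) // 2  # each link is shared by two nodes
    diameter = max(dist.values())
    average = sum(dist.values()) / (size - 1)  # averaged over the other nodes
    return links, diameter, average


if __name__ == "__main__":
    for name, fn in [("hypercube", hypercube_neighbors),
                     ("barrel shifter", barrel_shifter_neighbors)]:
        print(name, measure(fn, 12))  # 2^12 = 4096 nodes
```

In the barrel shifter the offsets +2^(n-1) and -2^(n-1) reach the same node, which is why the neighbor set is built as a set rather than a list.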
Problem 4. (20 points) (a) [CS&G 7.1] A radix-2 FFT over n complex numbers is implemented as a sequence of log n completely parallel steps, requiring 5n log n floating-point operations while reading and writing each element of data log n times. Calculate the communication-to-computation ratio on a dance-hall design where all processors access memory through the network, as in Figure 7.3 from CS&G. What communication bandwidth (in terms of number of complex numbers per second per processor) would the network need to sustain for the machine to deliver 250 MFLOPS per processor on a p-processor machine?
(b) [CS&G 7.6] Consider the above machine, where the number of links occupied by each transfer is log n. In the absence of contention for individual links, how many transfers can occur simultaneously?
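The arithmetic behind part (a) can be laid out as a short Python sketch. It assumes that every read and every write of an element counts as one complex-number transfer across the network (which is what the dance-hall organization implies), that n is a power of two, and that log means log base 2. The function names are hypothetical. Exact fractions are used so that the result does not depend on floating-point rounding.

```python
from fractions import Fraction


def fft_comm_to_comp(n):
    """Complex numbers transferred per floating-point operation for an n-point FFT."""
    log_n = n.bit_length() - 1       # log2(n), valid for n a power of two
    flops = 5 * n * log_n            # from the problem statement
    transfers = 2 * n * log_n        # one read plus one write per element per step
    return Fraction(transfers, flops)


def required_bandwidth(flops_per_processor, n=1024):
    """Complex numbers per second per processor the network must sustain."""
    return fft_comm_to_comp(n) * flops_per_processor


if __name__ == "__main__":
    print("ratio:", fft_comm_to_comp(1024))
    print("bandwidth at 250 MFLOPS:", required_bandwidth(250_000_000))
```

The n log n factor cancels, so the ratio is independent of the problem size and of the processor count p under these assumptions.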
Problem 5. (25 points) Prove that the formula for a q-shuffle of qc items is correct:
$$S_{q \times c}(i) = \left(qi + \lfloor i/c \rfloor\right) \bmod qc$$
That is, show that when qc “cards” are evenly divided into q “piles” of c items each, the item that begins in the ith position will move to the $S_{q \times c}(i)$th position.
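Before writing the proof, it can help to confirm the formula on small cases by simulating the shuffle directly. The Python sketch below deals the cards into q piles of c, interleaves them (first card of every pile, then the second of every pile, and so on), and compares the resulting positions with the formula. The interleaving order, the 0-based positions, and the function names are assumptions for illustration. A numerical check of this kind is not a proof.

```python
def q_shuffle(cards, q):
    """Split cards into q equal piles, then take one card from each pile in turn."""
    c = len(cards) // q
    piles = [cards[p * c:(p + 1) * c] for p in range(q)]
    return [pile[r] for r in range(c) for pile in piles]


def formula(i, q, c):
    """S_{q x c}(i) = (q*i + floor(i/c)) mod (q*c), with 0-based positions."""
    return (q * i + i // c) % (q * c)


def formula_matches(q, c):
    """True if every card i ends up at position formula(i, q, c)."""
    shuffled = q_shuffle(list(range(q * c)), q)
    return all(shuffled[formula(i, q, c)] == i for i in range(q * c))


if __name__ == "__main__":
    print(q_shuffle(list(range(8)), 2))
    print(all(formula_matches(q, c) for q in range(1, 8) for c in range(1, 8)))
```

Writing i = c·p + r with pile index p and offset r inside the pile mirrors what the simulation does, and is a natural starting point for the proof.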