

Material Type: Assignment; Professor: Gehringer; Class: Architecture Of Parallel Computers; Subject: Electrical and Computer Engineering; University: North Carolina State University; Term: Unknown 1989;


Problems 1 and 4 will be graded. There are 40 points on these problems. Note: You must do all the problems, even the non-graded ones. If you do not do some of them, half as many points as they are worth will be subtracted from your score on the graded problems.
The following question is about sequential consistency, total store ordering, and partial store ordering.
Problem 1. (20 points) Consider the following code running on a two-CPU system:

P1              P2
1a. A = 1;      2a. C = 1;
1b. B = 1;      2b. Y = A;
1c. X = C;      2c. Z = B;
(a) (10 points) Assuming that the initial values of A, B, and C are 0, and that the machine is sequentially consistent, list all the possible outcomes for the final values of X, Y, and Z in the form of triplets (X, Y, Z).
Note that the dependencies 1a → 1b, 1b → 1c, 2a → 2b, and 2b → 2c are implicit in the code. That is, 1a must precede 1b, etc. Each of the possible outcomes occurs iff certain additional orderings (e.g., 1c → 2a, 1b → 2c) hold. For each possible outcome, list the additional orderings that are needed to produce it, and also give at least one of the total orderings of statements 1a, 1b, 1c, 2a, 2b, and 2c that would produce it.
(b) (3 points) Total store ordering (TSO) allows reads to bypass writes. Give an example of a value triplet that is possible with TSO but not with sequential consistency. Describe the ordering of events that would lead to these results.
(c) (4 points) Partial store ordering (PSO) allows writes to bypass previous writes. What is the minimum number of memory fences (synchronization operations, such as membar instructions) that would have to be inserted in order to achieve the same behavior as a sequentially consistent machine? Show the modified code.
(d) (3 points) In what situation would weak ordering lead to improved performance compared with PSO? Mention also any other assumptions you have to make about the machine.
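
As a way to check an answer to part (a), the sequentially consistent executions can be enumerated by brute force. The sketch below is only an illustration, not part of the assignment: it assumes each statement executes atomically, treats X, Y, and Z as the values captured by the three reads, and uses hypothetical names (`P1`, `P2`, `run`, `sc_outcomes`). It interleaves the two instruction streams in every way that preserves program order and records one total ordering per (X, Y, Z) triplet.

```python
from itertools import combinations

# Each statement is (kind, memory location, value-to-write or result name).
P1 = [("write", "A", 1), ("write", "B", 1), ("read", "C", "X")]
P2 = [("write", "C", 1), ("read", "A", "Y"), ("read", "B", "Z")]
LABELS = (["1a", "1b", "1c"], ["2a", "2b", "2c"])


def run(order):
    """Execute one interleaving; `order` lists which CPU (0 or 1) steps next."""
    memory = {"A": 0, "B": 0, "C": 0}
    results = {}
    next_op = [0, 0]
    trace = []
    for cpu in order:
        kind, location, arg = (P1, P2)[cpu][next_op[cpu]]
        trace.append(LABELS[cpu][next_op[cpu]])
        next_op[cpu] += 1
        if kind == "write":
            memory[location] = arg
        else:
            results[arg] = memory[location]
    return (results["X"], results["Y"], results["Z"]), trace


def sc_outcomes():
    """Map each reachable (X, Y, Z) triplet to one total ordering that yields it."""
    outcomes = {}
    # Choose which 3 of the 6 time slots belong to P1; program order is kept.
    for p1_slots in combinations(range(6), 3):
        order = [0 if slot in p1_slots else 1 for slot in range(6)]
        triplet, trace = run(order)
        outcomes.setdefault(triplet, trace)
    return outcomes


if __name__ == "__main__":
    for triplet, trace in sorted(sc_outcomes().items()):
        print(triplet, " ".join(trace))
```

The enumeration covers only sequential consistency. Modeling TSO or PSO for parts (b) and (c) would require adding store buffers to this sketch, which it does not attempt.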
Problem 2. (15 points) Consider a 4-node CC-NUMA DSM machine using a memory-based directory protocol, with an average network transaction time between nodes of 20 μsec. Compute the remote memory-access time and draw a network transaction diagram for a write miss on node 1 to a remote memory block on node 2 that is dirty on node 3, for each of the following directory protocol optimizations:
(a) Strict request-response
(b) Intervention forwarding
(c) Reply forwarding
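
The latency comparison can be sketched as a count of network transactions on the critical path. The message sequences below are an assumption based on the usual textbook description of these three schemes, not something this assignment states; the sketch also assumes that memory and directory lookup times are negligible and that every transaction costs the stated 20 μsec. The names `CRITICAL_PATHS` and `remote_access_time_us` are hypothetical.

```python
HOP_US = 20  # average network transaction time given in the problem

# requester = node 1, home = node 2, owner = node 3 (dirty copy).
# Only messages on the requester's critical path are listed. Assumption:
# the owner's revision message to the home in the strict request-response
# and reply-forwarding schemes overlaps with the data reply, so it is omitted.
CRITICAL_PATHS = {
    "strict request-response": [
        ("requester", "home"),   # write-miss request
        ("home", "requester"),   # response naming the owner
        ("requester", "owner"),  # intervention sent by the requester
        ("owner", "requester"),  # data reply
    ],
    "intervention forwarding": [
        ("requester", "home"),   # write-miss request
        ("home", "owner"),       # home forwards the intervention
        ("owner", "home"),       # owner returns the block to the home
        ("home", "requester"),   # home replies with the data
    ],
    "reply forwarding": [
        ("requester", "home"),   # write-miss request
        ("home", "owner"),       # home forwards the intervention
        ("owner", "requester"),  # owner replies directly to the requester
    ],
}


def remote_access_time_us(protocol, hop_us=HOP_US):
    """Latency seen by the requester, counting only network transactions."""
    path = CRITICAL_PATHS[protocol]
    # Each message must start where the previous one ended.
    for (_, dst), (src, _) in zip(path, path[1:]):
        assert dst == src, "critical path is not a connected chain"
    return len(path) * hop_us


if __name__ == "__main__":
    for name in CRITICAL_PATHS:
        print(f"{name}: {remote_access_time_us(name)} usec")
```

If the intended solution also charges for directory or memory access, those costs would have to be added to each step.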
Problem 3. (20 points) Calculate the number of physical “wires” needed between nodes for a 4096-node system of each of the following network types. For each network,
(i) Give a general expression for the number of wires needed to connect that type of network. Assume one “wire” for a connection between two nodes.
(ii) Give the diameter of the network.
(iii) Give the average distance between two nodes in this network.
(a) A hypercube network.
(b) A barrel-shifter network.
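
Closed-form answers to Problem 3 can be sanity-checked by building each network and measuring it. The sketch below is a minimal illustration under stated assumptions: nodes are numbered 0 to 2^n - 1; the hypercube links nodes whose addresses differ in one bit; the barrel shifter is taken to link node i to nodes (i ± 2^j) mod 2^n for j = 0, ..., n-1 (other texts may define it without the wraparound); and the average distance is taken over the N - 1 other nodes. Because both networks look the same from every node, a breadth-first search from node 0 is used. All function names are hypothetical.

```python
from collections import deque


def hypercube_neighbors(node, n):
    """Nodes whose n-bit address differs from `node` in exactly one bit."""
    return [node ^ (1 << j) for j in range(n)]


def barrel_shifter_neighbors(node, n):
    """Nodes at address offset +/- 2^j (mod 2^n); assumed definition."""
    size = 1 << n
    nbrs = set()
    for j in range(n):
        nbrs.add((node + (1 << j)) % size)
        nbrs.add((node - (1 << j)) % size)
    nbrs.discard(node)
    return sorted(nbrs)


def network_stats(n, neighbors):
    """Return (wire count, diameter, average distance) for 2^n nodes."""
    size = 1 << n
    # Every wire is seen from both of its endpoints, hence the division by 2.
    wires = sum(len(neighbors(v, n)) for v in range(size)) // 2
    dist = {0: 0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in neighbors(v, n):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    diameter = max(dist.values())
    average = sum(dist.values()) / (size - 1)
    return wires, diameter, average


if __name__ == "__main__":
    n = 12  # 2^12 = 4096 nodes
    print("hypercube:", network_stats(n, hypercube_neighbors))
    print("barrel shifter:", network_stats(n, barrel_shifter_neighbors))
```

Running the same functions for small n (for example n = 3 or n = 4) gives values that are easy to confirm by hand before trusting the 4096-node figures.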
Problem 4. (20 points) An omega network is constructed out of a number of 2 × 2 switching cells. Assume that both inputs and outputs are eight bits wide, and that the bits are available in both normal and complement form.
(a) Diagram the gate-level implementation of a switching cell, assuming that only two-input NAND gates are available. Use as few NAND gates as possible. Indicate input, output, and control lines.
(b) What is the minimum number of 2-input NAND gates that are needed to build an omega network with N = 2^n eight-bit inputs and outputs?
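
One way to experiment with Problem 4 is to simulate a candidate cell built only from two-input NAND gates. The sketch below is one possible realization, not a claimed minimum: it assumes the cell needs only the straight and exchange settings, that the control signal is available in true and complement form, and that outputs are needed only in true form. Under those assumptions each output bit is a 2-to-1 multiplexer made of three NAND gates. The switch count uses the standard omega-network structure of n stages with N/2 cells each, for N = 2^n. All names are hypothetical.

```python
def nand(a, b):
    """Two-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)


# Assumed design: 3 NAND gates per output bit, 2 outputs per bit position.
GATES_PER_BIT = 6


def switch_bit(a, b, c, c_bar):
    """One bit slice of a 2 x 2 cell: c = 0 is straight, c = 1 is exchange."""
    out0 = nand(nand(a, c_bar), nand(b, c))  # a when c = 0, b when c = 1
    out1 = nand(nand(a, c), nand(b, c_bar))  # b when c = 0, a when c = 1
    return out0, out1


def switch_cell(word_a, word_b, c, width=8):
    """Apply the bit slice to every bit of two `width`-bit words."""
    out_a = out_b = 0
    for k in range(width):
        bit_a = (word_a >> k) & 1
        bit_b = (word_b >> k) & 1
        o0, o1 = switch_bit(bit_a, bit_b, c, 1 - c)
        out_a |= o0 << k
        out_b |= o1 << k
    return out_a, out_b


def omega_switch_count(n):
    """Number of 2 x 2 cells in an omega network with 2^n inputs."""
    return n * 2 ** (n - 1)


def omega_gate_count(n, width=8):
    """NAND gates for the whole network under the assumed cell design."""
    return omega_switch_count(n) * GATES_PER_BIT * width


if __name__ == "__main__":
    print(switch_cell(0xA5, 0x3C, 0))
    print(switch_cell(0xA5, 0x3C, 1))
    print(omega_switch_count(3), omega_gate_count(3))
```

If the cell must also support broadcast settings, or must produce complemented outputs for the next stage, the per-bit gate count would change and `GATES_PER_BIT` would need to be revised.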
Problem 5. (25 points) Prove that the formula for a q-shuffle of qc items is correct:
$S_{q \times c}(i) = (qi + \lfloor i/c \rfloor) \bmod qc$.
That is, show that when qc “cards” are evenly divided into q “piles” of c items each, the item that begins in the ith position will move to the $S_{q \times c}(i)$th position.
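
Before attempting the proof in Problem 5, the formula can be checked numerically. The sketch below is not a proof and rests on assumptions: positions are numbered from 0; the formula is read as (q·i + floor(i/c)) mod qc; and the shuffle takes one card from each pile in turn, starting with the first pile. The function names are hypothetical.

```python
def q_shuffle_formula(i, q, c):
    """New position of the item that starts in position i, per the formula."""
    return (q * i + i // c) % (q * c)


def q_shuffle_by_dealing(q, c):
    """Simulate the shuffle; return a dict mapping old position -> new position."""
    deck = list(range(q * c))
    # Cut the deck into q piles of c consecutive cards each.
    piles = [deck[p * c:(p + 1) * c] for p in range(q)]
    # Take the r-th card of every pile in turn, for r = 0 .. c-1.
    shuffled = [piles[p][r] for r in range(c) for p in range(q)]
    return {card: position for position, card in enumerate(shuffled)}


def formula_matches(q, c):
    """True if the formula agrees with the simulation for every position."""
    moved = q_shuffle_by_dealing(q, c)
    return all(moved[i] == q_shuffle_formula(i, q, c) for i in range(q * c))


if __name__ == "__main__":
    print(all(formula_matches(q, c) for q in range(1, 7) for c in range(1, 7)))
```

The simulation suggests the shape of the argument: writing i = p·c + r, with pile number p and offset r inside the pile, the card lands at position q·r + p.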