Problem Set#4 - Architecture of Parallel Computers | ECE 506, Assignments of Electrical and Electronics Engineering

Material Type: Assignment; Professor: Gehringer; Class: Architecture Of Parallel Computers; Subject: Electrical and Computer Engineering; University: North Carolina State University; Term: Unknown 1989;


CSC/ECE 506: Architecture of Parallel Computers

Problem Set 4

Due Friday, August 2, 2002

Problems 1 and 4 will be graded. There are 40 points on these problems. Note: You must do all the problems, even the non-graded ones. If you do not do some of them, half as many points as they are worth will be subtracted from your score on the graded problems.

The following question is about sequential consistency, total store ordering, and partial store ordering.

Problem 1. (20 points) Consider the following code running on a two-CPU system:

P1                P2

1a. A = 1;        2a. C = 1;
1b. B = 1;        2b. Y = A;
1c. X = C;        2c. Z = B;

(a) (10 points) Assuming that the initial values of A, B, and C are 0, and that the machine is sequentially consistent, list all the possible outcomes for the final values of X, Y, and Z in the form of triplets (X, Y, Z).

Note that dependencies 1a → 1b, 1b → 1c, 2a → 2b, and 2b → 2c are implicit in the code. That is, 1a must precede 1b, etc. Each of the possible outcomes occurs iff certain additional orderings (e.g., 1c → 2a, 1b → 2c) hold. For each possible outcome, list the additional orderings that are needed to produce it and also give at least one of the total orderings of statements 1a, 1b, 1c, 2a, 2b, and 2c that would produce it.
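A hand enumeration for part (a) can be cross-checked by brute force over every interleaving that respects program order. The following is a minimal Python sketch, not part of the assignment. It assumes that each statement executes atomically and that X, Y, and Z also start at 0 (the problem gives initial values only for A, B, and C). The names interleavings, run, and sc_outcomes are hypothetical helpers.

```python
from itertools import combinations

# Each statement is (label, destination, source); an int source is a constant.
P1 = [("1a", "A", 1), ("1b", "B", 1), ("1c", "X", "C")]
P2 = [("2a", "C", 1), ("2b", "Y", "A"), ("2c", "Z", "B")]


def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that preserves each program's order."""
    total = len(p1) + len(p2)
    for slots in combinations(range(total), len(p1)):
        chosen = set(slots)
        it1, it2 = iter(p1), iter(p2)
        yield [next(it1) if k in chosen else next(it2) for k in range(total)]


def run(order):
    """Execute one total order atomically and return the (X, Y, Z) triplet."""
    mem = dict.fromkeys("ABCXYZ", 0)  # assumption: X, Y, Z also start at 0
    for _label, dst, src in order:
        mem[dst] = mem[src] if isinstance(src, str) else src
    return mem["X"], mem["Y"], mem["Z"]


def sc_outcomes():
    """Map each reachable triplet to one total order that produces it."""
    witnesses = {}
    for order in interleavings(P1, P2):
        witnesses.setdefault(run(order), [label for label, _, _ in order])
    return witnesses


if __name__ == "__main__":
    for triplet, order in sorted(sc_outcomes().items()):
        print(triplet, " -> ".join(order))
```

The script prints one witness ordering per reachable triplet. It does not derive the additional orderings the problem asks for; those still have to be argued from the program-order dependencies.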

(b) (3 points) Total store ordering (TSO) allows reads to bypass writes. Give an example of a value triplet that is possible with TSO but not with sequential consistency. Describe the ordering of events that would lead to these results.

(c) (4 points) Partial store ordering (PSO) allows writes to bypass previous writes. What is the minimum number of memory fences (synchronization operations, such as membar instructions) that would have to be inserted in order to achieve the same behavior as a sequentially consistent machine? Show the modified code.

(d) (3 points) In what situation would weak ordering lead to improved performance compared with PSO? Mention also any other assumptions you have to make about the machine.

Problem 2. (15 points) Consider a 4-node CC-NUMA DSM machine using a memory-based directory protocol with an average network transaction time between nodes of 20 μsec. Compute the remote memory-access time and draw a network transaction diagram for a write miss on node 1 to a remote memory block on node 2, which is dirty on node 3, for each of the following directory protocol optimizations:

(a) Strict request-response

(b) Intervention forwarding

(c) Reply forwarding

Problem 3. (20 points) Calculate the number of physical “wires” needed between nodes for a 4096-node system of each of the following network types. For each network,

(i) Give a general expression for the number of wires needed to connect that type of network. Assume one “wire” for a connection between two nodes.

(ii) Give the diameter of the network.

(iii) Give the average distance between two nodes in this network.

(a) A hypercube network.

(b) A barrel-shifter network.
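Closed-form answers to Problem 3 can be sanity-checked numerically. The Python sketch below builds each network for N = 2^n nodes and measures link count, diameter, and average distance by breadth-first search from node 0, which suffices because both topologies look the same from every node. It assumes the usual definitions: hypercube node i is linked to i XOR 2^j, and barrel-shifter node i is linked to (i ± 2^j) mod N, for j = 0, ..., n−1. The function names are illustrative only, and the average is taken over the N − 1 other nodes; a different convention (for example, including the node itself) changes the value.

```python
from collections import deque


def hypercube_neighbors(i, n):
    """Neighbors of node i in an n-dimensional hypercube (flip one bit)."""
    return {i ^ (1 << j) for j in range(n)}


def barrel_shifter_neighbors(i, n):
    """Neighbors of node i in a barrel shifter: i +/- 2**j modulo 2**n."""
    size = 1 << n
    return {(i + s * (1 << j)) % size for j in range(n) for s in (1, -1)}


def measure(neighbors, n):
    """Return (links, diameter, average distance) for an N = 2**n network."""
    size = 1 << n
    dist = {0: 0}
    queue = deque([0])
    while queue:  # breadth-first search from node 0
        u = queue.popleft()
        for v in neighbors(u, n):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    links = size * len(neighbors(0, n)) // 2  # each wire joins two nodes
    diameter = max(dist.values())
    average = sum(dist.values()) / (size - 1)  # excludes the node itself
    return links, diameter, average


if __name__ == "__main__":
    n = 12  # 2**12 = 4096 nodes
    print("hypercube:     ", measure(hypercube_neighbors, n))
    print("barrel shifter:", measure(barrel_shifter_neighbors, n))
```

Running measure for several small values of n and comparing against a candidate general expression is a quick way to catch an off-by-one in the formula before evaluating it at 4096 nodes.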

Problem 4. (20 points) An omega network is constructed out of a number of 2 × 2 switching cells. Assume that both inputs and outputs are eight bits wide, and that the bits are available in both normal and complement form.

(a) Diagram the gate-level implementation of a switching cell, assuming that only two-input NAND gates are available. Use as few NAND gates as possible. Indicate input, output, and control lines.

(b) What is the minimum number of 2-input NAND gates that are needed to build an omega network with N = 2^n eight-bit inputs and outputs?

Problem 5. (25 points) Prove that the formula for a q-shuffle of qc items is correct:

$$S_{q \times c}(i) = \left(qi + \lfloor i/c \rfloor\right) \bmod qc.$$

That is, show that when qc “cards” are evenly divided into q “piles” of c items each, the item that begins in the ith position will move to position S_{q × c}(i).
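Before attempting the proof, it can help to see the formula agree with a direct simulation. The Python sketch below assumes positions are numbered from 0 and that the q-shuffle cuts the deck into q piles of c consecutive cards and then deals one card from each pile in turn. That reading of the shuffle, and the helper names, are assumptions rather than something the problem statement spells out. Agreement on small cases is only an empirical check, not the proof the problem asks for.

```python
def q_shuffle(deck, q):
    """Cut deck into q equal piles, then interleave them one card at a time."""
    c, remainder = divmod(len(deck), q)
    if remainder:
        raise ValueError("deck size must be a multiple of q")
    piles = [deck[p * c:(p + 1) * c] for p in range(q)]
    # Round r takes the r-th card of pile 0, then pile 1, ..., then pile q - 1.
    return [piles[p][r] for r in range(c) for p in range(q)]


def formula(i, q, c):
    """Destination predicted by S_{q x c}(i) = (q*i + floor(i/c)) mod (q*c)."""
    return (q * i + i // c) % (q * c)


def formula_holds(q, c):
    """Check the formula against the simulated shuffle for every position."""
    shuffled = q_shuffle(list(range(q * c)), q)
    return all(shuffled[formula(i, q, c)] == i for i in range(q * c))


if __name__ == "__main__":
    print(q_shuffle(list(range(8)), 2))
    print(all(formula_holds(q, c) for q in range(1, 7) for c in range(1, 7)))
```

The simulation places the card from pile p, offset r, at position q·r + p, which suggests writing i = p·c + r as a starting point for the proof.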