Architecture Of Parallel Computers - Problem Set #4 | ECE 506, Assignments of Electrical and Electronics Engineering

Material Type: Assignment; Professor: Gehringer; Class: Architecture Of Parallel Computers; Subject: Electrical and Computer Engineering; University: North Carolina State University; Term: Unknown 1989;

Uploaded on 03/11/2009



CSC/ECE 506: Architecture of Parallel Computers

Problem Set 4

Due Thursday, May 2, 2002

Problems 1, 3, and 4 will be graded. There are 60 points on these problems. Note: You must do all the problems, even the non-graded ones. If you do not do some of them, half as many points as they are worth will be subtracted from your score on the graded problems.

Problem 1. (20 points) Assume a two-processor system executing the following code on CPUs P1 and P2, respectively.

      P1                P2
1a    i = 1       2a    k = 1
1b    j = 1       2b    Y = i
1c    X = k       2c    Z = j

Suppose that the initial value of every variable is 0.

(a) Assuming that the memory is sequentially consistent (see Lecture 13), list all possible (X, Y, Z) triples at the end of execution of the above code by the respective processors on the two-processor machine.

(b) Total store ordering (TSO) allows reads to bypass writes. What output triples (X, Y, Z) are possible with TSO but not with sequential consistency? Explain briefly why these triples are possible with TSO but not with sequential consistency for the above code.

(c) What is the minimum number of membar instructions (memory barrier, or fence; see VBEE Lecture 21, on-campus Lecture 24) that need to be inserted into the TSO model for the above code in order to guarantee sequential consistency? Show the resulting code.

(d) Partial store ordering (PSO; see Culler, Singh, and Gupta, p. 689) not only allows reads to bypass writes, but also allows writes to bypass writes. For the original code (without the membar instructions), what output triples (X, Y, Z) are possible with PSO but not with sequential consistency? Explain why.
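The reasoning asked for in parts (a) and (b) can be cross-checked mechanically. The sketch below is only a toy model, not the course's method: it assumes each processor has a FIFO store buffer, that a load checks its own buffer before memory, and that buffered stores drain to memory at arbitrary times. With store_buffers=False every store reaches memory immediately, which gives sequential consistency. The names PROGRAMS and outcomes are hypothetical, and membar and PSO are not modeled.

```python
# Hypothetical encoding of Problem 1: each instruction is (destination, source).
# An int source is a constant store; a str source means "load it, then store".
PROGRAMS = (
    (("i", 1), ("j", 1), ("X", "k")),    # P1: 1a, 1b, 1c
    (("k", 1), ("Y", "i"), ("Z", "j")),  # P2: 2a, 2b, 2c
)


def outcomes(store_buffers, programs=PROGRAMS):
    """Return every final (X, Y, Z) reachable in the toy model."""
    memory0 = tuple(sorted((name, 0) for name in "ijkXYZ"))
    start = ((0, 0), ((), ()), memory0)  # (program counters, buffers, memory)
    seen, stack, finals = {start}, [start], set()
    while stack:
        pcs, bufs, mem_t = stack.pop()
        mem = dict(mem_t)
        successors = []
        for cpu in (0, 1):
            if bufs[cpu]:
                # Drain the oldest buffered store to memory (FIFO order).
                (dst, val), rest = bufs[cpu][0], bufs[cpu][1:]
                new_mem = tuple(sorted({**mem, dst: val}.items()))
                new_bufs = tuple(rest if c == cpu else bufs[c] for c in (0, 1))
                successors.append((pcs, new_bufs, new_mem))
            if pcs[cpu] < len(programs[cpu]):
                dst, src = programs[cpu][pcs[cpu]]
                if isinstance(src, str):
                    # A load sees the newest matching store in its own
                    # buffer, otherwise the value currently in memory.
                    val = next((v for d, v in reversed(bufs[cpu]) if d == src),
                               mem[src])
                else:
                    val = src
                new_pcs = tuple(pc + (c == cpu) for c, pc in enumerate(pcs))
                if store_buffers:
                    new_bufs = tuple(bufs[c] + ((dst, val),) if c == cpu
                                     else bufs[c] for c in (0, 1))
                    successors.append((new_pcs, new_bufs, mem_t))
                else:
                    new_mem = tuple(sorted({**mem, dst: val}.items()))
                    successors.append((new_pcs, bufs, new_mem))
        if not successors:  # both programs finished and both buffers empty
            finals.add((mem["X"], mem["Y"], mem["Z"]))
        for state in successors:
            if state not in seen:
                seen.add(state)
                stack.append(state)
    return finals


sc = outcomes(store_buffers=False)
tso = outcomes(store_buffers=True)
print("SC triples:      ", sorted(sc))
print("TSO-only triples:", sorted(tso - sc))
```

Exhaustive search is practical here only because the program has six instructions; the written explanation the problem asks for still has to come from reasoning about which loads may bypass which stores.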

Problem 2. (15 points) Consider a 4-node CC-NUMA DSM machine using a memory-based directory protocol with an average network transaction time between nodes of 20 μsec. Compute the remote memory-access time and draw a network transaction diagram for a write miss on node 1 to a remote memory block on node 2 that is dirty on node 3, for each of the following directory protocol optimizations:

(a) Strict request-response

(b) Intervention forwarding

(c) Reply forwarding
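As a rough way to tabulate the three cases, the sketch below counts network transactions on the critical path and multiplies by the 20 μsec figure. The message sequences in FLOWS are an assumption based on the usual textbook description of these optimizations (requester = node 1, home = node 2, owner = node 3). Memory and cache access times, contention, and the revision message the owner sends to the home in parallel with the final reply are all ignored.

```python
HOP_US = 20  # average network transaction time given in the problem

# Assumed critical-path messages as (source node, destination node, label).
FLOWS = {
    "strict request-response": [
        (1, 2, "write-miss request"),
        (2, 1, "response naming the owner"),
        (1, 3, "intervention"),
        (3, 1, "data response"),
    ],
    "intervention forwarding": [
        (1, 2, "write-miss request"),
        (2, 3, "intervention"),
        (3, 2, "data response"),
        (2, 1, "data response"),
    ],
    "reply forwarding": [
        (1, 2, "write-miss request"),
        (2, 3, "intervention"),
        (3, 1, "data response"),
    ],
}


def critical_path_us(flow, hop_us=HOP_US):
    """Latency of a flow if every transaction costs hop_us and nothing else."""
    return len(flow) * hop_us


for name, flow in FLOWS.items():
    print(f"{name}: {critical_path_us(flow)} usec")
    for src, dst, label in flow:
        print(f"  node {src} -> node {dst}: {label}")
```

The printed arrows are only a textual stand-in for the transaction diagram the problem asks for.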

Problem 3. (20 points) Calculate the number of physical “wires” needed between nodes for a 4096-node system of each of the following network types. For each network,

(i) Give a general expression for the number of wires needed to connect that type of network. Assume one “wire” for a connection between two nodes.

(ii) Give the diameter of the network.

(iii) Give the average distance between two nodes in this network.

(a) A hypercube network.

(b) A barrel-shifter network.
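Closed-form answers for (i) through (iii) can be sanity-checked by building each network and measuring it. The sketch below is one way to do that, under stated assumptions: the node count is a power of two, a barrel shifter links node v to (v ± 2^b) mod n for every b below log2(n), and the average distance is taken over the n - 1 other nodes (some texts average over all n nodes, including the source). Because both networks look the same from every node, a breadth-first search from node 0 is enough. All function names are hypothetical.

```python
from collections import deque


def hypercube_neighbors(node, n):
    """Neighbors differ from node in exactly one address bit."""
    d = n.bit_length() - 1  # log2(n) for a power of two
    return {node ^ (1 << b) for b in range(d)}


def barrel_shifter_neighbors(node, n):
    """Neighbors lie at distance +/- 2**b around the ring (assumed definition)."""
    d = n.bit_length() - 1
    return {(node + sign * (1 << b)) % n for b in range(d) for sign in (1, -1)}


def measure(neighbors, n):
    """Return (wire count, diameter, average distance from node 0)."""
    wires = sum(len(neighbors(v, n)) for v in range(n)) // 2  # each wire seen twice
    dist = {0: 0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for w in neighbors(v, n):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    diameter = max(dist.values())
    average = sum(dist.values()) / (n - 1)  # excludes the source node itself
    return wires, diameter, average


for name, fn in (("hypercube", hypercube_neighbors),
                 ("barrel shifter", barrel_shifter_neighbors)):
    print(name, measure(fn, 4096))
```

Running the same functions with a small n such as 16 makes it easy to compare the measurements against a hand-drawn network before trusting a general expression.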

Problem 4. (20 points) (a) [CS&G 7.1] A radix-2 FFT over n complex numbers is implemented as a sequence of log n completely parallel steps, requiring 5n log n floating-point operations while reading and writing each element of data log n times. Calculate the communication-to-computation ratio on a dance-hall design where all processors access memory through the network, as in Figure 7.3 from CS&G. What communication bandwidth (in terms of number of complex numbers per second per processor) would the network need to sustain for the machine to deliver 250 MFLOPS per processor on a p-processor machine?

(b) [CS&G 7.6] Consider the above machine, where the number of links occupied by each transfer is log n. In the absence of contention for individual links, how many transfers can occur simultaneously?
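The arithmetic behind part (a) can be laid out as a short calculation. The sketch below assumes that, in a dance-hall design with no caching, every read and every write of a complex number is one network transfer, so each element contributes two transfers per step. The function names are hypothetical, and n is assumed to be a power of two.

```python
from fractions import Fraction


def fft_comm_to_comp(n):
    """Complex-number transfers per floating-point operation for an n-point FFT."""
    steps = n.bit_length() - 1   # log2(n) for a power of two
    flops = 5 * n * steps        # given in the problem statement
    transfers = 2 * n * steps    # one read plus one write per element per step
    return Fraction(transfers, flops)


def required_bandwidth(flops_per_sec, ratio):
    """Complex numbers per second per processor needed to sustain flops_per_sec."""
    return flops_per_sec * ratio


ratio = fft_comm_to_comp(1024)
print("transfers per flop:", ratio)
print("complex numbers/sec/processor:", required_bandwidth(250_000_000, ratio))
```

Because n and log n cancel, the ratio does not depend on the problem size or on p under these assumptions.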

Problem 5. (25 points) Prove that the formula for a q-shuffle of qc items is correct:

$$S_{q \times c}(i) = (qi + \lfloor i/c \rfloor) \bmod qc.$$

That is, show that when qc “cards” are evenly divided into q “piles” of c items each, the item that begins in the i-th position will move to the $S_{q \times c}(i)$-th position.
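Before attempting the proof, it can help to check the formula against a direct simulation. The sketch below assumes the shuffle is performed by cutting the deck into q consecutive piles of c cards and then taking one card from each pile in turn; positions are numbered from 0. It is an empirical check for small q and c, not a proof, and the function names are hypothetical.

```python
def q_shuffle_formula(i, q, c):
    """Destination of the item starting at position i, per the stated formula."""
    return (q * i + i // c) % (q * c)


def q_shuffle_by_dealing(q, c):
    """Map each item to its final position after interleaving q piles of c."""
    deck = list(range(q * c))
    piles = [deck[p * c:(p + 1) * c] for p in range(q)]
    # Take the top card of every pile, then the second card of every pile, ...
    shuffled = [piles[p][offset] for offset in range(c) for p in range(q)]
    return {item: position for position, item in enumerate(shuffled)}


def formula_matches(q, c):
    final = q_shuffle_by_dealing(q, c)
    return all(final[i] == q_shuffle_formula(i, q, c) for i in range(q * c))


print(all(formula_matches(q, c) for q in range(1, 7) for c in range(1, 7)))
```

Writing i as pc + r, with pile number p and offset r inside the pile, mirrors what the simulation does and is one natural starting point for the proof.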