
Spring 2007
CS 8803 Multicore Computing
Homework 4
1. We have seen how to schedule the DAG corresponding to the standard algorithm for
multiplying two n × n matrices in O(log n) time using n^3 processors. What is the
optimal schedule for an arbitrary number p of processors, where 1 ≤ p ≤ n^3? What is
the corresponding parallel complexity?
2. An item X is stored in a specific location of the global memory of an EREW PRAM.
Show how to broadcast X to all the local memories of the p processors of the EREW
PRAM in O(log p) time. Determine how much time it takes to perform the same
operation on the CREW PRAM.
3. Develop an optimal nonrecursive prefix-sums algorithm that is similar to the
nonrecursive prefix-sums algorithm presented in class but that does not use the auxiliary
variables B and C. The input array A should hold the prefix sums when the algorithm
terminates.
4. Suppose we are given a set of n elements stored in array A together with an array
L such that L(i) ∈ {1, 2, ..., k} represents the label of element A(i), where k is a
constant. Develop an optimal O(log n) time EREW PRAM algorithm that stores all
the elements of A with label 1 into the upper part of A while preserving their initial
ordering, followed by the elements labeled 2 with the same initial ordering, and so on.
5. (Segmented Prefix Sums) We are given a sequence A = (a_1, a_2, ..., a_n) of elements
from a set S with an associative operation, and a Boolean array B of length n such
that b_1 = b_n = 1. For each i_1 < i_2 such that b_{i_1} = b_{i_2} = 1 and b_j = 0 for all
i_1 < j < i_2, we wish to compute the prefix sums of the subarray
(a_{i_1+1}, ..., a_{i_2}) of A. Develop an
O(log n) time algorithm to compute all the corresponding prefix sums. Your algorithm
should use O(n) operations and should run on the EREW PRAM.
6. Consider a cycle C = (v_1, v_2, ..., v_n) with an additional set E of edges between the
vertices of C such that, for each vertex v_i, there exists at most one edge in E incident
on v_i. Consider the problem of determining whether or not it is possible to draw all the
edges in E inside the cycle C without any two of them crossing. Develop an O(log n)
time algorithm to solve this problem. The total number of operations used must be
O(n). Your algorithm should run on the EREW PRAM model.
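
For Problems 4, 5, and 6, it may help to pin down the expected output with a plain sequential reference before designing the parallel algorithm. The sketch below is only that: it is not the requested EREW PRAM algorithm and makes no claim about parallel time or work. The function names (group_by_label, segmented_prefix_sums, chords_fit_inside) are hypothetical, Python lists are 0-based while the problem text is 1-based, "upper part of A" is read here as the front of the array, and the entry for a_1 in Problem 5 is left as None because it belongs to no segment under the definition given. Each of these readings is an assumption.

```python
import operator


def group_by_label(A, L, k):
    """Problem 4 reference: elements with label 1 first, then label 2, ...,
    keeping the original relative order inside each label group."""
    out = []
    for label in range(1, k + 1):
        out.extend(a for a, lab in zip(A, L) if lab == label)
    return out


def segmented_prefix_sums(A, B, op=operator.add):
    """Problem 5 reference: a segment starts right after a position whose
    flag is 1 and ends at the next position whose flag is 1 (inclusive)."""
    n = len(A)
    if n == 0 or len(B) != n or B[0] != 1 or B[-1] != 1:
        raise ValueError("need len(A) == len(B) >= 1 and B[0] == B[-1] == 1")
    out = [None] * n  # out[0] stays None: a_1 lies in no segment (assumption)
    running = None
    for j in range(1, n):
        # B[j - 1] == 1 means position j opens a new segment.
        running = A[j] if B[j - 1] == 1 else op(running, A[j])
        out[j] = running
    return out


def chords_fit_inside(n, E):
    """Problem 6 reference: E is a list of pairs (u, v) of 1-based vertex
    numbers on the cycle v_1..v_n, with at most one pair per vertex.
    Returns True if no two chords cross, using a bracket-matching scan."""
    partner = {}
    for u, v in E:
        if not (1 <= u <= n and 1 <= v <= n) or u == v:
            raise ValueError("bad edge: %r" % ((u, v),))
        if u in partner or v in partner:
            raise ValueError("a vertex has more than one edge in E")
        partner[u] = v
        partner[v] = u
    stack = []
    for v in range(1, n + 1):
        if v not in partner:
            continue
        if partner[v] > v:
            stack.append(v)          # chord opens here
        elif stack.pop() != partner[v]:
            return False             # closes a chord that is not innermost
    return True
```

With addition as the operation, A = (5, 1, 2, 3, 4, 6) and B = (1, 0, 0, 1, 0, 1) give the segments (1, 2, 3) and (4, 6), so the reference yields 1, 3, 6 and 4, 10 for positions 2 through 6. For Problem 6, the chords {(1, 4), (2, 3), (5, 6)} on a 6-cycle can all be drawn inside, while {(1, 3), (2, 4)} on a 4-cycle cannot.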