





These notes cover the PRAM model of parallel computation, focusing on the OR and MIN problems. They present various algorithms for EREW and CRCW machines, together with their efficiencies. They also touch on the parallel prefix problem and the all-pairs shortest path problem.






Parallel computation: Tightly coupled processors that can communicate almost as quickly as they can perform a computation.
Distributed computation: Loosely coupled processors for which communication is much slower than computation.
A PRAM machine consists of m synchronous processors with shared memory. This model ignores synchronization problems and communication issues, and concentrates on the task of parallelizing the problem. Variants of the model differ in whether several processors may access the same memory location at the same time: EREW (exclusive read, exclusive write), CREW (concurrent read, exclusive write), and CRCW (concurrent read, concurrent write).
We define $T(n, m)$ to be the parallel time for the algorithm under consideration with $m$ processors on input of size $n$. Let $S(n)$ be the time complexity of the best known sequential algorithm for a particular problem. Then the efficiency of a parallel algorithm is defined by
$$E(n, m) = \frac{S(n)}{m \, T(n, m)}$$
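Efficiencies range from 0 to 1, and the best possible efficiency is 1. Generally we prefer algorithms whose efficiency is $\Omega(1)$. The folding principle states that you can always use fewer processors and get the same efficiency: one processor can simulate $k$ processors with a slowdown of a factor of $k$, so the product $m \, T(n, m)$ does not grow by more than a constant factor.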
INPUT: Bits $b_1, \ldots, b_n$
OUTPUT: The logical OR of the bits
One can obtain an EREW algorithm with $T(n, n) = \log n$ using a divide and conquer algorithm that is perhaps best understood as a binary tree. The leaves of the binary tree are the bits. Each internal node is a processor that ORs the outputs of its children. Since $S(n) = n$, the efficiency of the EREW algorithm is
$$E(n, n) = \frac{n}{n \log n} = \frac{1}{\log n}$$
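The tree algorithm can be illustrated with a small sequential simulation. This is a minimal sketch, not a PRAM implementation: the function name `erew_tree_or` and the round counter are assumptions introduced here, and each pass of the `while` loop stands in for one synchronous parallel step.

```python
def erew_tree_or(bits):
    """Simulate the binary-tree OR; return (result, number of parallel rounds).

    Assumes bits is a non-empty list of 0/1 values.
    """
    level = list(bits)
    rounds = 0
    while len(level) > 1:
        # One parallel round: a processor combines cells 2k and 2k+1.
        # Every cell is read by exactly one processor and every output cell
        # is written by exactly one processor, so exclusive access suffices.
        nxt = [level[k] | level[k + 1] for k in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # an unpaired cell passes through unchanged
        level = nxt
        rounds += 1
    return level[0], rounds
```

For an input of 8 bits the simulation uses 3 rounds, matching $\log n$.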
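One can obtain an EREW algorithm for the OR problem with $E(n, m = n/\log n) = \Theta(1)$ by partitioning the input into $n/\log n$ partitions of $\log n$ bits each and precomputing the OR of the bits in each partition. Each partition is handled sequentially by one processor in $\log n$ steps, and the tree algorithm then combines the $n/\log n$ partial results in at most $\log n$ further steps. One can also obtain a CRCW algorithm with $T(n, n) = \Theta(1)$. In this algorithm a shared variable answer is first set to 0; then each processor $P_i$ with $b_i = 1$ sets answer to 1. The efficiency of this algorithm is $E(n, n) = \Theta(1)$.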
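The partitioning idea can be checked by counting steps in a sequential simulation. The sketch below is illustrative only: `blocked_or` is a hypothetical name, the block size is taken as $\lceil \log_2 n \rceil$, and the step count treats scanning one block as `block` steps.

```python
import math


def blocked_or(bits):
    """Return (or_of_bits, processors_used, parallel_steps).

    Assumes bits is a non-empty list of 0/1 values.
    """
    n = len(bits)
    block = math.ceil(math.log2(n)) if n > 1 else 1
    # Phase 1: each processor ORs its own block sequentially.
    partial = []
    for start in range(0, n, block):
        acc = 0
        for b in bits[start:start + block]:
            acc |= b
        partial.append(acc)
    processors = len(partial)
    steps = block
    # Phase 2: binary-tree OR over the partial results, one round per level.
    while len(partial) > 1:
        nxt = [partial[k] | partial[k + 1] for k in range(0, len(partial) - 1, 2)]
        if len(partial) % 2 == 1:
            nxt.append(partial[-1])
        partial = nxt
        steps += 1
    return partial[0], processors, steps
```

With $n = 16$ this uses 4 processors and 6 steps, so $S(n) / (m \, T(n, m)) = 16 / 24$, a constant fraction rather than $1/\log n$.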
See section 10.2.1, and section 10.2.2.
INPUT: Integers $x_1, \ldots, x_n$
OUTPUT: The smallest $x_i$
The results are essentially the same as for the OR problem. There is an EREW divide and conquer algorithm with $E(n, n/\log n) = \Theta(1)$. Note that this technique works for any associative operator (both OR and MIN are associative). There is a CRCW algorithm with $T(n, m = n^2) = \Theta(1)$ and $E(n, m = n^2) = \Theta(1/n)$. Here's code for processor $P_{i,j}$, $1 \le i, j \le n$, for the CRCW algorithm to compute the location of the minimum number:
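As an illustration, here is a minimal sequential simulation of one standard constant-time CRCW approach. It is a sketch under assumptions, not necessarily the listing these notes intended: the name `crcw_min_location`, the array `is_min`, the 0-based indexing, and the tie-breaking rule are all introduced here. Each nested loop stands in for a single parallel step of the $n^2$ processors.

```python
def crcw_min_location(x):
    """Return the (0-based) index of the minimum of a non-empty list x."""
    n = len(x)
    # Parallel step 1: every candidate starts out marked as a possible minimum.
    is_min = [1] * n
    # Parallel step 2: processor P(i, j) compares x[i] with x[j] and knocks
    # out i if x[j] beats it. Several processors may write to is_min[i] at
    # once, but they all write the same value 0 (a "common" CRCW write).
    for i in range(n):
        for j in range(n):
            if x[j] < x[i] or (x[j] == x[i] and j < i):  # ties go to the lower index
                is_min[i] = 0
    # Parallel step 3: exactly one index survives and reports itself.
    location = None
    for i in range(n):
        if is_min[i]:
            location = i
    return location
```

The tie-breaking on index guarantees a single survivor, so the final write is never contested.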
For i = 1 to n do
    For j = 1 to n do
        D[i,j] = weight of edge (i, j)

Repeat log n times
    For i = 1 to n do
        For j = 1 to n do
            For m = 1 to n do
                D[i,j] = min{D[i,j], D[i,m] + D[m,j]}
The correctness of this procedure can be seen using the following loop invariant: after $t$ times through the repeat loop, for all $i$ and $j$, D[i, j] is at most the length of the shortest path between vertices $i$ and $j$ that has $2^t$ or fewer edges, and D[i, j] is always the length of some path from $i$ to $j$. Since a shortest path has at most $n - 1$ edges, $\log n$ repetitions suffice. Note that the outer loop is definitely not parallelizable. Let's try to parallelize the inner 3 loops to see what happens.
Repeat log n times
    For all i,j,m in parallel
        D[i,j] = min{D[i,j], D[i,m] + D[m,j]}
But note that this would require a concurrent write machine that always writes the smallest value. So we try:
Repeat log n times
    For all i,j,m in parallel
        T[i,m,j] = min{D[i,j], D[i,m] + D[m,j]}
    For all i,j in parallel
        D[i,j] = min{T[i,1,j] ... T[i,n,j]}
This runs in time T (n, n^3 ) = log^2 n on an CREW. This gives an efficiency something like E(n, m = n^3 ) = n^3 /(n^3 log^2 n) = 1/ log^2 n.
See section 10.2.1.
INPUT: Sorted lists $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$
OUTPUT: The two lists merged into one sorted list $z_1, \ldots, z_{2n}$
We give the following divide and conquer algorithm:
Merge(x_1, ..., x_n, y_1, ..., y_n)
    Merge(x_1, x_3, x_5, ..., y_2, y_4, y_6, ...) to get a_1 ... a_n
    Merge(x_2, x_4, x_6, ..., y_1, y_3, y_5, ...) to get b_1 ... b_n
    for i = 1 to n do
        z_{2i-1} = min(a_i, b_i)
        z_{2i}   = max(a_i, b_i)
This can be implemented on an EREW PRAM to run in time $T(n, n) = \log n$, thus giving efficiency $\Theta(1/\log n)$. The following argument establishes the correctness of the algorithm. Each $a_i$ is greater than or equal to $a_1, \ldots, a_i$. Each $a_i$, $i > 1$, is at least $b_{i-1}$, and hence at least $b_1, \ldots, b_{i-1}$. So $a_i$ is at least as large as $2i - 1$ of the elements, and hence $a_i \ge z_{2i-1}$. Symmetrically, each $b_i$ is greater than or equal to $b_1, \ldots, b_i$, and each $b_i$, $i > 1$, is at least $a_{i-1}$. Hence $b_i \ge z_{2i-1}$. This same argument shows $a_{i+1} \ge z_{2i+1}$ and $b_{i+1} \ge z_{2i+1}$. So $z_{2i-1}$ and $z_{2i}$ must be $a_i$ and $b_i$.
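The merge can be written out directly as a recursive function. This is a sequential sketch, assuming both lists have the same length $n$ and that $n$ is a power of two so that every recursive call again receives two lists of equal length; the name `odd_even_merge` is introduced here. The final loop is the part that runs as one parallel step on a PRAM.

```python
def odd_even_merge(x, y):
    """Merge two sorted lists of equal power-of-two length n into one sorted list."""
    n = len(x)
    assert len(y) == n and n >= 1 and n & (n - 1) == 0
    if n == 1:
        return [min(x[0], y[0]), max(x[0], y[0])]
    # Slices are 0-based: x[0::2] is x_1, x_3, ... and y[1::2] is y_2, y_4, ...
    a = odd_even_merge(x[0::2], y[1::2])
    b = odd_even_merge(x[1::2], y[0::2])
    z = []
    for a_i, b_i in zip(a, b):  # one compare-exchange per i, all independent
        z.append(min(a_i, b_i))
        z.append(max(a_i, b_i))
    return z
```

For example, merging `[1, 5, 6, 7]` with `[2, 3, 4, 8]` gives `a = [1, 3, 6, 8]` and `b = [2, 4, 5, 7]`, and the pairwise min/max step interleaves them into sorted order.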
See section 10.2.1.
We give the following divide and conquer algorithm:
Sort(x_1, ..., x_n)
    Merge(Sort(x_1, ..., x_{n/2}), Sort(x_{n/2+1}, ..., x_n))
This can be implemented on an EREW PRAM to run in time $T(n, n) = \log^2 n$, thus giving efficiency $\frac{n \log n}{n \log^2 n} = \Theta(1/\log n)$.
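The running time follows from a short recurrence. As a sketch, assume the two recursive sorts run at the same time on disjoint halves of the processors, that the merge at each level costs at most $\log n$, and that $n$ is a power of two:

```latex
\begin{align*}
T(n, n) &\le T(n/2,\, n/2) + \log n \\
        &\le \sum_{k=0}^{\log n - 1} \log\frac{n}{2^k} + O(1)
         = \sum_{k=0}^{\log n - 1} (\log n - k) + O(1) \\
        &= \frac{\log n \,(\log n + 1)}{2} + O(1)
         = \Theta(\log^2 n)
\end{align*}
```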
depth of each node in the tree. We show by example how to reduce this to pointer doubling. From the following tree
We create the list (the first line)
  1,  1, -1,  1, -1, -1,  1, -1
A   B   D   B   E   B   A   C   A
and call pointer doubling with d[i] initialized to either 1 or -1 appropriately. The depth of a node is then computed by looking at the sum up to the point shown in the second line.
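The example can be reproduced with a sequential simulation of pointer doubling. This is a minimal sketch under assumptions: the function name and the arrays `total` and `pred` are introduced here, the list is stored in an array so that the predecessor of position `i` is `i - 1`, and each pass of the `while` loop stands in for one synchronous parallel round.

```python
def pointer_doubling_prefix_sums(d):
    """Return the prefix sums of d, computed by simulated pointer doubling."""
    n = len(d)
    total = list(d)                    # total[i]: sum of the values from pred[i]+1 to i
    pred = [i - 1 for i in range(n)]   # -1 means "no predecessor left"
    while any(p >= 0 for p in pred):
        new_total = total[:]
        new_pred = pred[:]
        for i in range(n):             # conceptually all i in parallel
            p = pred[i]
            if p >= 0:
                new_total[i] = total[i] + total[p]  # absorb the predecessor's segment
                new_pred[i] = pred[p]               # jump twice as far back
        total, pred = new_total, new_pred
    return total


tour_steps = [1, 1, -1, 1, -1, -1, 1, -1]
depths = pointer_doubling_prefix_sums(tour_steps)
# depths[k] is the depth of the node reached after step k of the tour
# A B D B E B A C A, i.e. of the nodes B, D, B, E, B, A, C, A in that order.
```

Each round doubles the distance covered by every pointer, so the loop runs about $\log n$ times.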
The input is an algebraic expression in the form of a binary tree, with the leaves being the elements and the internal nodes being the algebraic operations. The goal is to compute the value of the expression. Some obvious approaches that won't work are: 1) evaluate nodes when the values of both children are known, and 2) parallel prefix. The first approach won't give you a speedup if the tree is unbalanced. The second approach won't work if the operators are not associative. First assume that the only operation is subtraction. We label edges by functions. We now define the cut operation. If we have a subtree that looks like
           | h(x)
           -
     f(x) / \ g(x)
         A   constant c

and cut on the root of this subtree we get

           | h(f(x) - g(c))
           A

If we have a subtree that looks like

           | h(x)
           -
     f(x) / \ g(x)
constant c   B

and cut on the root of this subtree we get

           | h(f(c) - g(x))
           B
Thus we are left with finding a class of functions, with the base elements being constants, that are closed under composition, subtraction of constants, and subtraction from constants. This class is the functions of the form ax + b, for real/rational a and b.
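The closure claim can be checked concretely by representing each edge function $ax + b$ as a pair `(a, b)`. The sketch below is illustrative: the helper names `apply_linear`, `cut_right_constant`, and `cut_left_constant` are assumptions introduced here, corresponding to the two cut cases $h(f(x) - g(c))$ and $h(f(c) - g(x))$.

```python
def apply_linear(fn, x):
    """Evaluate the function a*x + b, given as the pair (a, b)."""
    a, b = fn
    return a * x + b


def cut_right_constant(h, f, g, c):
    """Return the pair (a, b) representing x -> h(f(x) - g(c))."""
    (ha, hb), (fa, fb) = h, f
    k = apply_linear(g, c)               # g(c) is just a constant
    return (ha * fa, ha * (fb - k) + hb)


def cut_left_constant(h, f, g, c):
    """Return the pair (a, b) representing x -> h(f(c) - g(x))."""
    (ha, hb), (ga, gb) = h, g
    k = apply_linear(f, c)               # f(c) is just a constant
    return (-ha * ga, ha * (k - gb) + hb)
```

In both cases the result is again a pair `(a, b)`, so a cut never leaves the class of functions $ax + b$, and each cut costs a constant number of arithmetic operations.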