Parallel Computation: PRAM Model and Efficiency Analysis of OR and MIN Problems, Study notes of Computer Science

These notes cover the PRAM model of parallel computation, focusing on the OR and MIN problems. They present several algorithms for the EREW and CRCW variants and analyze their efficiencies. They also touch on the parallel prefix problem and the all-pairs shortest path problem.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-15a

1 Parallel and Distributed Computation

Parallel computation: Tightly coupled processors that can communicate almost as quickly as they can perform a computation.

Distributed computation: Loosely coupled processors for which communication is much slower than computation.

2 PRAM Model

A PRAM machine consists of m synchronous processors with shared memory. This model ignores synchronization problems and communication issues, and concentrates on the task of parallelizing the problem. Different variants of the model arise depending on how processors are permitted to access the same memory location at the same time.

  • ER = Exclusive Read: only one processor can read a location in any one step
  • CR = Concurrent Read: any number of processors can read a location in a step
  • EW = Exclusive Write: only one processor can write a location in any one step
  • CW = Concurrent Write: any number of processors can write a location in a step. What if two processors try to write different values?
      - Common: All processors must be trying to write the same value
      - Arbitrary: An arbitrary processor succeeds in the case of a write conflict
      - Priority: The lowest-numbered processor succeeds
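The three conflict rules can be made concrete with a small sequential simulation of one write step to a single memory cell. This is only a sketch: the function name `resolve_concurrent_write` and the encoding of write requests as `(processor_id, value)` pairs are assumptions made for illustration, not part of the PRAM definition.

```python
import random

def resolve_concurrent_write(requests, rule):
    """Return the value stored after one step of concurrent writes to one cell.

    requests: non-empty list of (processor_id, value) pairs (hypothetical encoding).
    rule: "common", "arbitrary", or "priority".
    """
    if rule == "common":
        # Common: legal only if every writer writes the same value.
        values = {value for _, value in requests}
        if len(values) != 1:
            raise ValueError("Common CRCW: all writers must agree on the value")
        return values.pop()
    if rule == "arbitrary":
        # Arbitrary: any one of the writers may win.
        return random.choice(requests)[1]
    if rule == "priority":
        # Priority: the lowest-numbered processor wins.
        return min(requests, key=lambda r: r[0])[1]
    raise ValueError(f"unknown rule: {rule}")

writes = [(3, "c"), (1, "a"), (2, "b")]
print(resolve_concurrent_write(writes, "priority"))  # a
```

An algorithm that is correct under the Common rule is also correct under the other two, since when all writers agree every rule stores the same value.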

We define T(n, m) to be the parallel time of the algorithm under consideration with m processors on an input of size n. Let S(n) be the time complexity of the best known sequential algorithm for the problem. Then the efficiency of a parallel algorithm is defined by

E(n, m) = S(n) / (m T(n, m))

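Efficiencies range from 0 to 1, and the best possible efficiency is 1. Generally we prefer algorithms whose efficiency is Ω(1). The folding principle states that you can always use fewer processors and get asymptotically the same efficiency: with m' < m processors, each processor simulates about m/m' of the original processors, so the time grows by the same factor by which the processor count shrinks.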
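As a worked illustration of the definition and of folding, the following sketch computes efficiencies for made-up numbers. The function names `efficiency` and `folded_time`, and the sample values, are assumptions chosen for the example; the folding cost model (each remaining processor simulates ⌈m/m'⌉ original processors per step) is the usual simulation argument.

```python
import math

def efficiency(seq_time, processors, par_time):
    """E(n, m) = S(n) / (m * T(n, m))."""
    return seq_time / (processors * par_time)

def folded_time(processors, par_time, fewer):
    """Time after folding onto `fewer` processors: each one simulates
    ceil(processors / fewer) of the original processors in every step."""
    return par_time * math.ceil(processors / fewer)

n = 1024
# A tree-style algorithm: m = n processors, T = log2(n) = 10 steps, S(n) = n.
print(efficiency(n, n, 10))        # 0.1
# A constant-time algorithm with m = n processors, folded onto 256 processors.
t = folded_time(n, 1, 256)
print(t)                           # 4
print(efficiency(n, 256, t))       # 1.0
```

In the last case the efficiency before folding was also 1.0, so folding cost nothing here because 256 divides 1024 evenly.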

3 OR Problem

INPUT: Bits b_1, ..., b_n
OUTPUT: The logical OR of the bits

One can obtain an EREW algorithm with T(n, n) = log n using a divide and conquer algorithm that is perhaps best understood as a binary tree. The leaves of the binary tree are the bits. Each internal node is a processor that ORs the outputs of its children. The efficiency of the EREW algorithm is

E(n, n) = n / (n log n) = 1 / log n

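One can obtain an EREW algorithm for the OR problem with E(n, m = n / log n) = Θ(1) by partitioning the input into n / log n partitions of log n bits each and precomputing the OR of the bits in each partition. Each processor ORs its own partition sequentially in log n time, and the tree algorithm then combines the n / log n partial results in at most log n further steps, so T(n, n / log n) = Θ(log n).

One can also obtain a CRCW algorithm with T(n, n) = Θ(1). In this algorithm each processor P_i sets a variable answer to 0; then, if b_i = 1, P_i sets answer to 1. All processors that write in a step write the same value, so the Common rule suffices. The efficiency of this algorithm is E(n, n) = Θ(1).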
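The two EREW algorithms can be compared by simulating them sequentially and counting parallel steps. This is a sketch under assumptions: the names `erew_tree_or` and `blocked_or` are made up here, each loop iteration over a level stands for one synchronous parallel step, and scanning a block of log n bits is charged log n steps.

```python
import math

def erew_tree_or(bits):
    """Binary-tree OR. Returns (result, parallel_steps) for a non-empty list."""
    level = list(bits)
    steps = 0
    while len(level) > 1:
        # One parallel step: processor k combines entries 2k and 2k+1.
        nxt = [level[k] | level[k + 1] for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an unpaired entry is carried up unchanged
            nxt.append(level[-1])
        level = nxt
        steps += 1
    return level[0], steps

def blocked_or(bits):
    """n / log n processors: each first ORs a block of about log n bits
    sequentially, then the tree runs on the partial results.
    Returns (result, parallel_steps, processors_used)."""
    n = len(bits)
    block = max(1, int(math.log2(n)))
    partial = [int(any(bits[s:s + block])) for s in range(0, n, block)]
    result, tree_steps = erew_tree_or(partial)
    return result, block + tree_steps, len(partial)

bits = [0] * 15 + [1]
print(erew_tree_or(bits))   # (1, 4)
print(blocked_or(bits))     # (1, 6, 4)
```

For n = 16 the tree version uses 16 processors for 4 steps, while the blocked version uses 4 processors for 6 steps, which is the trade that raises the efficiency from 1 / log n to Θ(1).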

4 MIN Problem

See section 10.2.1, and section 10.2.2.

INPUT: Integers x_1, ..., x_n
OUTPUT: The smallest x_i

The results are essentially the same as for the OR problem. There is an EREW divide and conquer algorithm with E(n, n / log n) = Θ(1). Note that this technique works for any associative operator (both OR and MIN are associative). There is a CRCW algorithm with T(n, m = n^2) = Θ(1) and E(n, m = n^2) = Θ(1/n). Here's code for processor P_{i,j}, 1 ≤ i, j ≤ n, for the CRCW algorithm to compute the location of the minimum number:
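A minimal sketch of one standard constant-time approach, simulated sequentially in Python, is given here for orientation; it is not necessarily the code the notes refer to. It assumes a Common CRCW PRAM, and the names `crcw_min_index` and `loser`, as well as the tie-breaking rule (the lower index wins among equal values), are assumptions made for illustration.

```python
def crcw_min_index(xs):
    """Return the index of the minimum of xs using the all-pairs comparison idea.

    Each (i, j) iteration of the double loop stands for processor P(i, j);
    on a PRAM all n^2 of them would run in the same step.
    """
    n = len(xs)
    # Step 1: clear one flag per input element.
    loser = [0] * n
    # Step 2: P(i, j) marks i as a loser if x_j beats x_i.
    for i in range(n):
        for j in range(n):
            if xs[j] < xs[i] or (xs[j] == xs[i] and j < i):
                # Every concurrent write to loser[i] writes the value 1,
                # so the Common rule is enough.
                loser[i] = 1
    # Step 3: the single element never marked is the minimum.
    winners = [i for i in range(n) if loser[i] == 0]
    return winners[0]

print(crcw_min_index([5, 3, 8, 3]))  # 1
```

Each of the three steps takes constant parallel time, which matches T(n, n^2) = Θ(1); with S(n) = n the efficiency is n / n^2 = 1/n.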

For i = 1 to n do
    For j = 1 to n do
        D[i,j] = weight of edge (i, j)

Repeat log n times
    For i = 1 to n do
        For j = 1 to n do
            For m = 1 to n do
                D[i,j] = min{D[i,j], D[i,m] + D[m,j]}

The correctness of this procedure can be seen using the following loop invariant: after t times through the repeat loop, for all i and j, the length of the shortest path between vertices i and j that has 2^t or fewer edges is equal to D[i, j]. Note that the outer loop is definitely not parallelizable. Let's try to parallelize the inner three loops to see what happens.

Repeat log n times
    For all i, j, m in parallel
        D[i,j] = min{D[i,j], D[i,m] + D[m,j]}

But note that this would require a concurrent write machine that always writes the smallest value. So we try:

Repeat log n times
    For all i, j, m in parallel
        T[i,m,j] = min{D[i,j], D[i,m] + D[m,j]}
    For all i, j in parallel
        D[i,j] = min{T[i,1,j] ... T[i,n,j]}

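This runs in time T(n, n^3) = log^2 n on a CREW PRAM: each of the log n iterations spends log n time computing the minimum of n values with a tree. Taking S(n) = n^3, this gives an efficiency of roughly E(n, m = n^3) = n^3 / (n^3 log^2 n) = 1 / log^2 n.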
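The parallel procedure can be checked with a sequential simulation in which every "for all ... in parallel" step reads only the old values of D. This is a sketch: the function name `apsp_by_doubling`, the use of `math.inf` for missing edges, and the 0-based indexing are assumptions of the example.

```python
import math

INF = math.inf

def apsp_by_doubling(weight):
    """All-pairs shortest paths by repeated min-plus doubling.

    weight[i][j] is the edge weight (INF if absent, 0 on the diagonal).
    """
    n = len(weight)
    D = [row[:] for row in weight]
    rounds = max(1, math.ceil(math.log2(n)))
    for _ in range(rounds):
        # Step 1: all n^3 processors (i, m, j) read the old D, write their own T cell.
        T = [[[min(D[i][j], D[i][m] + D[m][j]) for j in range(n)]
              for m in range(n)]
             for i in range(n)]
        # Step 2: for each (i, j), take the minimum over m (a log n tree on a PRAM).
        D = [[min(T[i][m][j] for m in range(n)) for j in range(n)]
             for i in range(n)]
    return D

# A directed path 0 -> 1 -> 2 -> 3 with unit weights.
w = [[0, 1, INF, INF],
     [INF, 0, 1, INF],
     [INF, INF, 0, 1],
     [INF, INF, INF, 0]]
print(apsp_by_doubling(w)[0])  # [0, 1, 2, 3]
```

Writing into the separate array T is what removes the need for a concurrent write that keeps the smallest value: every processor owns exactly one cell of T.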

7 Odd-Even Merging

See section 10.2.1.

INPUT: Sorted lists x_1, ..., x_n and y_1, ..., y_n
OUTPUT: The two lists merged into one sorted list z_1, ..., z_{2n}

We give the following divide and conquer algorithm

Merge(x_1, ..., x_n, y_1, ..., y_n)
    Merge(x_1, x_3, x_5, ..., y_2, y_4, y_6, ...) to get a_1 ... a_n
    Merge(x_2, x_4, x_6, ..., y_1, y_3, y_5, ...) to get b_1 ... b_n
    for i = 1 to n do
        z_{2i-1} = min(a_i, b_i)
        z_{2i} = max(a_i, b_i)

This can be implemented on an EREW PRAM to run in time T(n, n) = log n, thus giving efficiency Θ(1 / log n).

The following argument establishes the correctness of the algorithm. Each a_i is greater than or equal to a_1, ..., a_i. Each a_i, i > 1, is at least b_{i-1}, and hence at least b_1, ..., b_{i-1}. So a_i is at least as large as 2i − 1 of the elements, that is, a_i ≥ z_{2i-1}. Each b_i is greater than or equal to b_1, ..., b_i. Each b_i, i > 1, is at least a_{i-1}. Hence, b_i ≥ z_{2i-1}. This same argument shows a_{i+1} ≥ z_{2i+1} and b_{i+1} ≥ z_{2i+1}. So only a_1, ..., a_i and b_1, ..., b_i can occupy the first 2i positions, and z_{2i-1} and z_{2i} must be a_i and b_i.
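The merge procedure translates directly into a recursive function. The sketch below runs sequentially; it assumes both lists have the same length n, that n is a power of two, and that a pair of single elements is merged by one comparison (a base case the pseudocode leaves implicit). The name `odd_even_merge` is chosen for illustration.

```python
def odd_even_merge(x, y):
    """Merge two sorted lists of equal power-of-two length n into one sorted list."""
    n = len(x)
    assert len(y) == n and n > 0 and n & (n - 1) == 0
    if n == 1:
        return [min(x[0], y[0]), max(x[0], y[0])]
    # 0-based slices: x[0::2] is x_1, x_3, ... and y[1::2] is y_2, y_4, ...
    a = odd_even_merge(x[0::2], y[1::2])
    b = odd_even_merge(x[1::2], y[0::2])
    z = []
    # On a PRAM these n compare-exchange operations run in one parallel step.
    for a_i, b_i in zip(a, b):
        z.append(min(a_i, b_i))
        z.append(max(a_i, b_i))
    return z

print(odd_even_merge([1, 5, 6, 7], [2, 3, 4, 8]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

The two recursive calls are independent, so in parallel the depth is the recursion depth, log n, plus one comparison step per level.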

8 Odd-Even Merge Sorting

See section 10.2.1. We give the following divide and conquer algorithm

Sort(x_1, ..., x_n)
    Merge(Sort(x_1, ..., x_{n/2}), Sort(x_{n/2+1}, ..., x_n))

This can be implemented on an EREW PRAM to run in time T(n, n) = log^2 n, thus giving efficiency (n log n) / (n log^2 n) = Θ(1 / log n).
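The log^2 n bound comes from a recurrence on parallel depth, which the following sketch evaluates for small n. It assumes n is a power of two, that merging two single elements costs one step, and that each level of the merge adds one compare-exchange step; the function names `merge_depth` and `sort_depth` are made up for the example.

```python
import math

def merge_depth(n):
    """Parallel steps to merge two sorted lists of length n each."""
    if n == 1:
        return 1                       # a single comparison
    return merge_depth(n // 2) + 1     # two sub-merges in parallel, then one step

def sort_depth(n):
    """Parallel steps to sort n elements with odd-even merge sort."""
    if n == 1:
        return 0
    # The two half-size sorts run in parallel, then their results are merged.
    return sort_depth(n // 2) + merge_depth(n // 2)

for n in (8, 1024):
    k = int(math.log2(n))
    print(n, sort_depth(n), k * (k + 1) // 2)
# 8 6 6
# 1024 55 55
```

Under these assumptions the depth is log n (log n + 1) / 2, which is Θ(log^2 n).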

depth of each node in the tree. We show by example how to reduce this to pointer doubling. From the following tree

        A
       / \
      B   C
     / \
    D   E

We create the list (the first line)

       1   1  -1   1  -1  -1   1  -1
     A   B   D   B   E   B   A   C   A

and call pointer doubling with d[i] initialized to either 1 or -1 appropriately. The depth of a node is then computed by looking at the sum up to the point shown in the second line.
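The example can be traced with a sequential simulation of pointer doubling on the list of +1 and -1 values. This is a sketch under assumptions: the function name `pointer_doubling_prefix_sums` is hypothetical, predecessor pointers are stored as list indices with -1 meaning "none", and each pass of the while loop stands for one synchronous parallel round.

```python
def pointer_doubling_prefix_sums(values):
    """Prefix sums of a linked list by pointer doubling.

    Invariant: d[i] is the sum of the values from just after pred[i] up to i.
    """
    n = len(values)
    d = list(values)
    pred = [i - 1 for i in range(n)]       # pred[0] == -1: no predecessor
    while any(p != -1 for p in pred):
        new_d, new_pred = d[:], pred[:]
        for i in range(n):                 # "for all i in parallel"
            p = pred[i]
            if p != -1:
                new_d[i] = d[i] + d[p]     # absorb the predecessor's partial sum
                new_pred[i] = pred[p]      # jump twice as far back
        d, pred = new_d, new_pred
    return d

tour = "ABDBEBACA"                          # nodes visited, from the example
steps = [1, 1, -1, 1, -1, -1, 1, -1]        # +1 going down an edge, -1 going up
sums = pointer_doubling_prefix_sums(steps)
print(sums)                                 # [1, 2, 1, 2, 1, 0, 1, 0]

depth = {tour[0]: 0}
for i, s in enumerate(sums):
    depth[tour[i + 1]] = s                  # depth of the node reached by step i
print(depth)                                # {'A': 0, 'B': 1, 'D': 2, 'E': 2, 'C': 1}
```

Each round doubles the distance covered by every pointer, so a list of length n finishes after about log n rounds.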

11 Expression Evaluation

The input is an algebraic expression in the form of a binary tree, with the leaves being the elements and the internal nodes being the algebraic operations. The goal is to compute the value of the expression. Some obvious approaches that won't work are: 1) evaluate nodes when the values of both children are known, and 2) parallel prefix. The first approach won't give you a speedup if the tree is unbalanced. The second approach won't work if the operators are not associative.

First assume that the only operation is subtraction. We label edges by functions. We now define the cut operation. If we have a subtree that looks like

               | h(x)
               -
        f(x) /   \ g(x)
            -     constant c
           / \
          A   B

and cut on the root of this subtree we get

            | h(f(x) - g(c))
            -
           / \
          A   B

If we have a subtree that looks like

               | h(x)
               -
        f(x) /   \ g(x)
    constant c    -
                 / \
                A   B

and cut on the root of this subtree we get

            | h(f(c) - g(x))
            -
           / \
          A   B

Thus we are left with finding a class of functions, with the base elements being constants, that are closed under composition, subtraction of constants, and subtraction from constants. This class is the functions of the form ax + b, for real/rational a and b.
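The closure claim can be checked concretely by representing each edge label ax + b as a pair of coefficients and carrying out both cut operations symbolically. This is a sketch: the class `Linear` and the functions `cut_constant_right` and `cut_constant_left` are hypothetical names, and exact rationals from `fractions.Fraction` stand in for the "real/rational" coefficients.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Linear:
    """The function x -> a*x + b."""
    a: Fraction
    b: Fraction

    def __call__(self, x):
        return self.a * x + self.b

    def compose(self, inner):
        """Return self(inner(x)), again of the form a*x + b."""
        return Linear(self.a * inner.a, self.a * inner.b + self.b)

def cut_constant_right(h, f, g, c):
    """New edge label h(f(x) - g(c)): the constant c is the right child."""
    return h.compose(Linear(f.a, f.b - g(c)))

def cut_constant_left(h, f, g, c):
    """New edge label h(f(c) - g(x)): the constant c is the left child."""
    return h.compose(Linear(-g.a, f(c) - g.b))

h = Linear(Fraction(2), Fraction(1))   # 2x + 1
f = Linear(Fraction(3), Fraction(0))   # 3x
g = Linear(Fraction(1), Fraction(4))   # x + 4
c = Fraction(5)

print(cut_constant_right(h, f, g, c))  # Linear(a=Fraction(6, 1), b=Fraction(-17, 1))
print(cut_constant_left(h, f, g, c))   # Linear(a=Fraction(-2, 1), b=Fraction(23, 1))
```

Both results are again of the form ax + b, so a cut never leaves the class, and each edge label needs only two numbers of storage.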