Parallel Computation: PRAM Model and Efficiency Analysis of OR and MIN Problems, Study notes of Computer Science

University of Pittsburgh (Pitt) - Medical Center-Health System Computer Science

Notes on the PRAM model of parallel computation, focusing on the OR and MIN problems. They cover various algorithms, including EREW and CRCW variants, and their efficiencies. The document also touches upon the parallel prefix problem and the all pairs shortest path problem.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-15a 🇺🇸


1 Parallel and Distributed Computation

Parallel computation: Tightly coupled processors that can communicate almost as quickly as they can perform a computation.

Distributed computation: Loosely coupled processors for which communication is much slower than computation.

2 PRAM Model

A PRAM machine consists of m synchronous processors with shared memory. This model ignores synchronization problems and communication issues, and concentrates on the task of parallelizing the problem. One gets different variations of this model depending on how processors are permitted to access the same memory location at the same time.

• ER = Exclusive Read: only one processor can read a location in any one step
• CR = Concurrent Read: any number of processors can read a location in a step
• EW = Exclusive Write: only one processor can write a location in any one step
• CW = Concurrent Write: any number of processors can write a location in a step. What if two processors try to write different values?
  – Common: All processors must be trying to write the same value
  – Arbitrary: An arbitrary processor succeeds in the case of a write conflict
  – Priority: The lowest-numbered processor succeeds
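The three concurrent-write rules can be made concrete with a small sequential simulation of a single write step. This is only an illustrative sketch: the function name `resolve_concurrent_write` and the `(processor_id, value)` request format are assumptions made for the example, not part of the notes.

```python
import random


def resolve_concurrent_write(requests, policy, rng=None):
    """Return the value stored in one memory cell after one CW step.

    requests: list of (processor_id, value) pairs aimed at the same cell
              (a hypothetical format chosen for this sketch).
    policy:   "common", "arbitrary", or "priority".
    """
    if not requests:
        raise ValueError("no processor is writing")
    if policy == "common":
        # Common: legal only if every writer stores the same value.
        values = {value for _, value in requests}
        if len(values) != 1:
            raise ValueError("Common CW requires all writers to agree")
        return values.pop()
    if policy == "arbitrary":
        # Arbitrary: any one of the writers may win.
        rng = rng or random.Random()
        return rng.choice(requests)[1]
    if policy == "priority":
        # Priority: the lowest-numbered processor wins.
        return min(requests, key=lambda r: r[0])[1]
    raise ValueError(f"unknown policy: {policy}")
```

An algorithm that is correct under the Common rule never depends on which writer wins, which is why it is the weakest of the three assumptions.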

We define T(n, m) to be the parallel time for the algorithm under consideration with m processors on an input of size n. Let S(n) be the time complexity of the best known sequential algorithm for a particular problem. Then the efficiency of a parallel algorithm is defined by

$$E(n, m) = \frac{S(n)}{m \, T(n, m)}$$


Efficiencies can range from 0 to 1; the best possible efficiency is 1. Generally we prefer algorithms whose efficiency is Ω(1). The folding principle states that you can always use fewer processors and get the same efficiency, because one processor can simulate k processors with at most a k-fold slowdown.
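As a quick numeric illustration of the definition and of the folding principle, the sketch below evaluates E(n, m) for a tree-style algorithm with S(n) = n and T(n, n) = log n, and then again after folding onto half as many processors. The helper name `efficiency` and the choice n = 1024 are assumptions made for the example.

```python
import math


def efficiency(seq_time, processors, par_time):
    """E(n, m) = S(n) / (m * T(n, m))."""
    return seq_time / (processors * par_time)


n = 1024
log_n = int(math.log2(n))  # 10

# n processors and parallel time log n: E = n / (n log n) = 1 / log n.
e_full = efficiency(seq_time=n, processors=n, par_time=log_n)

# Fold onto n / 2 processors: each one simulates two of the original
# processors, so the parallel time at most doubles.
e_folded = efficiency(seq_time=n, processors=n // 2, par_time=2 * log_n)
```

Halving the processors and doubling the time leaves the product m T(n, m) unchanged, so the efficiency does not drop.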

3 OR Problem

INPUT: bits b_1, ..., b_n
OUTPUT: The logical OR of the bits

One can obtain an EREW algorithm with T(n, n) = log n using a divide and conquer algorithm that is perhaps best understood as a binary tree. The leaves of the binary tree are the bits. Each internal node is a processor that ORs the outputs of its children. Since S(n) = n, the efficiency of this EREW algorithm is

$$E(n, n) = \frac{n}{n \log n} = \frac{1}{\log n}$$

One can obtain an EREW algorithm for the OR problem with E(n, m = n/log n) = Θ(1) by partitioning the input into n/log n partitions of log n bits each, precomputing the OR of the bits in each partition with one processor per partition, and then running the tree algorithm on the n/log n partial results.

One can also obtain a CRCW algorithm with T(n, n) = Θ(1). In this algorithm each processor P_i sets a variable answer to 0; then, if b_i = 1, P_i sets answer to 1. All writers store the same value, so the Common rule suffices. The efficiency of this algorithm is E(n, n) = Θ(1).
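The blocked EREW scheme can be sketched as a sequential simulation that counts parallel steps. The function name `blocked_or`, the step accounting (one step per bit scanned, one step per tree level), and the assumption of a non-empty input are choices made for this illustration only.

```python
import math


def blocked_or(bits):
    """Simulate the blocked EREW OR; return (result, parallel_steps, processors).

    Assumes a non-empty list of 0/1 values.
    """
    n = len(bits)
    block = max(1, int(math.log2(n))) if n > 1 else 1
    # Phase 1: each processor scans its own block of about log n bits.
    partial = [any(bits[s:s + block]) for s in range(0, n, block)]
    processors = len(partial)
    steps = block
    # Phase 2: binary-tree OR over the partial results, one level per step.
    while len(partial) > 1:
        partial = [any(partial[i:i + 2]) for i in range(0, len(partial), 2)]
        steps += 1
    return partial[0], steps, processors
```

For n = 16 this uses 4 processors and 6 steps, so m T(n, m) = 24 stays within a constant factor of S(n) = 16, in line with E = Θ(1).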

4 MIN Problem

See section 10.2.1 and section 10.2.2.

INPUT: Integers x_1, ..., x_n
OUTPUT: The smallest x_i

The results are essentially the same as for the OR problem. There is an EREW divide and conquer algorithm with E(n, n/log n) = Θ(1). Note that this technique works for any associative operator (both OR and MIN are associative). There is a CRCW algorithm with T(n, m = n^2) = 1 and E(n, m = n^2) = 1/n. Here's code for processor P_{i,j}, 1 ≤ i, j ≤ n, for the CRCW algorithm to compute the location of the minimum number:
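The processor code for the CRCW MIN algorithm does not appear in this text. As a stand-in, here is a sequential Python simulation of one common formulation of the n^2-processor idea; it is not necessarily the version the notes had in mind. The name `crcw_min_index`, the `loser` array, and the index tie-break are assumptions of this sketch.

```python
def crcw_min_index(xs):
    """Return the (0-based) location of the minimum of a non-empty list."""
    n = len(xs)
    loser = [0] * n  # shared memory: loser[i] == 1 means x_i is not the minimum

    # One parallel step: the body runs once per pair (i, j), which on a
    # PRAM would be the work of processor P_{i,j}.
    for i in range(n):
        for j in range(n):
            # x_i loses to x_j if x_j is smaller, or equal with a smaller index.
            if (xs[j], j) < (xs[i], i):
                loser[i] = 1  # concurrent write; every writer stores the value 1

    # Exactly one index never loses. On a PRAM the processor that sees
    # loser[i] == 0 would write i to the output cell.
    return loser.index(0)
```

Every conflicting write stores the same value, so the Common rule is enough, and the whole computation is a constant number of parallel steps on n^2 processors.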

For i = 1 to n do
    For j = 1 to n do
        D[i,j] = weight of edge (i, j)

Repeat log n times
    For i = 1 to n do
        For j = 1 to n do
            For m = 1 to n do
                D[i,j] = min{D[i,j], D[i,m] + D[m,j]}

The correctness of this procedure can be seen using the following loop invariant: after t times through the repeat loop, for all i and j, D[i, j] is equal to the length of the shortest path between vertices i and j that has 2^t or fewer edges. Note that the outer loop is definitely not parallelizable. Let's try to parallelize the inner three loops to see what happens.

Repeat log n times
    For all i, j, m in parallel
        D[i,j] = min{D[i,j], D[i,m] + D[m,j]}

But note that this would require a concurrent write machine that always writes the smallest value. So we try:

Repeat log n times
    For all i, j, m in parallel
        T[i,m,j] = min{D[i,j], D[i,m] + D[m,j]}
    For all i, j in parallel
        D[i,j] = min{T[i,1,j] ... T[i,n,j]}

This runs in time T(n, n^3) = log^2 n on a CREW PRAM, since each of the log n iterations spends log n time computing the minimum of n values. This gives an efficiency of roughly E(n, m = n^3) = n^3/(n^3 log^2 n) = 1/log^2 n.
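The two-phase version with the auxiliary array T can be sketched as a sequential Python simulation. It assumes a non-empty n x n weight matrix with zeros on the diagonal and `float("inf")` for missing edges; the name `apsp_by_doubling` is hypothetical.

```python
import math

INF = float("inf")


def apsp_by_doubling(weights):
    """All pairs shortest paths by repeated min-plus squaring."""
    n = len(weights)
    D = [row[:] for row in weights]
    for _ in range(max(1, math.ceil(math.log2(n)))):
        # Phase 1 (one processor per triple): candidate paths through m.
        T = [[[min(D[i][j], D[i][m] + D[m][j]) for j in range(n)]
              for m in range(n)]
             for i in range(n)]
        # Phase 2: D[i][j] = min over m of T[i][m][j]. On a PRAM this is a
        # log n time tree minimum; here it is a plain sequential min.
        D = [[min(T[i][m][j] for m in range(n)) for j in range(n)]
             for i in range(n)]
    return D
```

Building T from a snapshot of D mirrors the synchronous PRAM step: no processor sees a value written in the same round.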

7 Odd-Even Merging

See section 10.2.1.

INPUT: Sorted lists x_1, ..., x_n and y_1, ..., y_n.
OUTPUT: The two lists merged into one sorted list z_1, ..., z_2n

We give the following divide and conquer algorithm:

Merge(x_1, ..., x_n, y_1, ..., y_n)
    Merge(x_1, x_3, x_5, ..., y_2, y_4, y_6, ...) to get a_1 ... a_n
    Merge(x_2, x_4, x_6, ..., y_1, y_3, y_5, ...) to get b_1 ... b_n
    for i = 1 to n do
        z_{2i-1} = min(a_i, b_i)
        z_{2i} = max(a_i, b_i)

This can be implemented on an EREW PRAM to run in time T(n, n) = log n, thus giving efficiency Θ(1/log n).

The following argument establishes the correctness of the algorithm. Each a_i is greater than or equal to a_1, ..., a_i. Each a_i, i > 1, is at least b_{i-1}. So a_i is at least as large as 2i - 2 other elements, and hence a_i ≥ z_{2i-1}. Each b_i is greater than or equal to b_1, ..., b_i. Each b_i, i > 1, is at least a_{i-1}. Hence b_i ≥ z_{2i-1}. This same argument shows a_{i+1} ≥ z_{2i+1} and b_{i+1} ≥ z_{2i+1}. So z_{2i-1} and z_{2i} must be a_i and b_i.
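The merge procedure translates almost line for line into a recursive Python function. This is a sequential sketch that assumes both lists are sorted, have the same length n, and that n is a power of two; the base case n = 1 is an addition the pseudocode leaves implicit.

```python
def odd_even_merge(xs, ys):
    """Merge two sorted lists of equal length n (n a power of two)."""
    n = len(xs)
    if n == 1:
        return [min(xs[0], ys[0]), max(xs[0], ys[0])]
    # 0-based slices: xs[0::2] is x_1, x_3, ... and ys[1::2] is y_2, y_4, ...
    a = odd_even_merge(xs[0::2], ys[1::2])
    b = odd_even_merge(xs[1::2], ys[0::2])
    z = []
    # On a PRAM the n compare-exchange steps below run in parallel.
    for a_i, b_i in zip(a, b):
        z.append(min(a_i, b_i))
        z.append(max(a_i, b_i))
    return z
```

The two recursive calls are independent, so they can run at the same time on n/2 processors each, which is where the log n depth comes from.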

8 Odd-Even Merge Sorting

See section 10.2.1. We give the following divide and conquer algorithm

Sort(x_1, ..., x_n)
    Merge(Sort(x_1, ..., x_{n/2}), Sort(x_{n/2+1}, ..., x_n))

This can be implemented on an EREW PRAM to run in time T(n, n) = log^2 n, thus giving efficiency

$$E(n, n) = \frac{n \log n}{n \log^2 n} = \Theta(1/\log n)$$
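The log^2 n running time can be checked with a short recurrence, assuming the two recursive sorts run in parallel on n/2 processors each and the final merge takes Θ(log n) time as in the previous section:

```latex
\begin{align*}
T(n, n) &= T(n/2,\, n/2) + \Theta(\log n) \\
        &= \Theta\!\Big(\sum_{k=0}^{\log n} \log \frac{n}{2^k}\Big)
         = \Theta\!\Big(\sum_{j=0}^{\log n} j\Big)
         = \Theta(\log^2 n).
\end{align*}
```

With S(n) = Θ(n log n) for sequential sorting, this gives the efficiency stated above.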

depth of each node in the tree. We show by example how to reduce this to pointer doubling. From the following tree

        A
       / \
      B   C
     / \
    D   E

We create the list (the first line)

   1   1  -1   1  -1  -1   1  -1
 A   B   D   B   E   B   A   C   A

and call pointer doubling with d[i] initialized to either 1 or -1 appropriately. The depth of a node is then computed by looking at the sum up to the point shown in the second line.
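A sequential simulation of pointer doubling on this example list might look as follows. The names `prefix_sums_by_pointer_doubling`, `total`, and `pred` are assumptions of this sketch; each round updates every element from the previous round's values, as the synchronous processors of a PRAM would.

```python
def prefix_sums_by_pointer_doubling(d):
    """Return (prefix_sums, rounds) for the list d."""
    n = len(d)
    total = list(d)                   # total[i] covers d[pred[i]+1 .. i]
    pred = [i - 1 for i in range(n)]  # -1 means "no predecessor left"
    rounds = 0
    while any(p >= 0 for p in pred):
        # Every element adds its predecessor's total, then jumps its pointer
        # twice as far back.
        new_total = [total[i] + (total[pred[i]] if pred[i] >= 0 else 0)
                     for i in range(n)]
        new_pred = [pred[pred[i]] if pred[i] >= 0 else -1 for i in range(n)]
        total, pred = new_total, new_pred
        rounds += 1
    return total, rounds


# Edge values from the tour A B D B E B A C A.
depths, rounds = prefix_sums_by_pointer_doubling([1, 1, -1, 1, -1, -1, 1, -1])
```

The sums are the depths of the nodes reached after each edge of the tour (B, D, B, E, B, A, C, A), and the number of rounds is logarithmic in the list length.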

11 Expression Evaluation

The input is an algebraic expression in the form of a binary tree, with the leaves being the elements and the internal nodes being the algebraic operations. The goal is to compute the value of the expression. Some obvious approaches that won't work are: 1) evaluating nodes when the values of both children are known, and 2) parallel prefix. The first approach won't give you a speedup if the tree is unbalanced. The second approach won't work if the operators are not associative.

First assume that the only operation is subtraction. We label edges by functions. We now define the cut operation. If we have a subtree that looks like

            | h(x)
            -
           / \
     f(x) /   \ g(x)
         -     constant c
        / \
       A   B

and cut on the root of this subtree we get

        | h(f(x) - g(c))
        -
       / \
      A   B

If we have a subtree that looks like

            | h(x)
            -
           / \
     f(x) /   \ g(x)
 constant c    -
              / \
             A   B

and cut on the root of this subtree we get

        | h(f(c) - g(x))
        -
       / \
      A   B

Thus we are left with finding a class of functions, with the base elements being constants, that are closed under composition, subtraction of constants, and subtraction from constants. This class is the functions of the form ax + b, for real/rational a and b.
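The closure claim can be checked with a small sketch that stores a function ax + b as the pair (a, b) and implements the two cut operations. The class `Linear` and the helper names are hypothetical; exact rational arithmetic via `fractions.Fraction` is a choice made for the example.

```python
from fractions import Fraction


class Linear:
    """The function x -> a*x + b, stored as the pair (a, b)."""

    def __init__(self, a, b):
        self.a, self.b = Fraction(a), Fraction(b)

    def __call__(self, x):
        return self.a * x + self.b

    def compose(self, inner):
        """Return x -> self(inner(x))."""
        return Linear(self.a * inner.a, self.a * inner.b + self.b)

    def minus_const(self, k):
        """Return x -> self(x) - k (subtraction of a constant)."""
        return Linear(self.a, self.b - k)

    def const_minus(self, k):
        """Return x -> k - self(x) (subtraction from a constant)."""
        return Linear(-self.a, k - self.b)


def cut_constant_right(h, f, g, c):
    """New edge label h(f(x) - g(c)) when the constant is the right child."""
    return h.compose(f.minus_const(g(c)))


def cut_constant_left(h, f, g, c):
    """New edge label h(f(c) - g(x)) when the constant is the left child."""
    return h.compose(g.const_minus(f(c)))
```

Each cut returns another `Linear`, so an edge label never needs more than two numbers no matter how many cuts have been applied.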
