Parallel Prefix Algorithms: Coarse-Grained vs. Fine-Grained Approaches for Prefix Sums, Study notes of Computer Science

University of Notre Dame Computer Science

These notes cover parallel prefix algorithms, contrasting coarse-grained (n/p >> 1) and fine-grained (p = n) approaches to the prefix sum problem. Coarse-grained algorithms are the norm on parallel systems because computation must outweigh communication to achieve adequate speedups, while a fine-grained design with p = n guarantees efficiency for all p < n. The notes also cover the recursive and non-recursive algorithms for parallel prefix sums, their analysis, and their advantages.

Typology: Study notes

Pre 2010

Uploaded on 02/24/2010

koofers-user-y31 🇺🇸

10 documents


Parallel Prefix

9/10/09

Coarse vs. fine grained

• Unlike the problem on the first homework, most parallel algorithms are coarse-grained.
– i.e., n/p >> 1
• This is required to achieve adequate speedups on most parallel systems.
– computation > communication


Algorithm design

• Many parallel algorithms have sequential and parallel modules.
• The best-case approach is to design a fine-grained algorithm where p = n.
– This would guarantee efficiency for all p < n, which is the more typical situation, because efficiency always scales down (within a constant).
• We will discuss a specific example of this today.

Prefix sums

• We are given n elements x_0, x_1, …, x_{n-1} and a binary associative operator ⊗.
• Computing the partial sums s_0, s_1, …, s_{n-1}, where s_i = x_0 ⊗ x_1 ⊗ … ⊗ x_i, is called the prefix sum problem.
• The serial algorithm takes Θ(n) time (n - 1 applications of ⊗); Ω(n) is a lower bound because every element must be read.
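
The serial algorithm can be sketched as a single pass that carries a running value. This is a minimal illustration, not code from the notes; the function name `serial_prefix` and the parameter `op` (standing in for ⊗) are assumptions made for the example.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def serial_prefix(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Return [x_0, x_0 op x_1, ...] using n - 1 applications of op.

    `op` is assumed to be associative; it plays the role of the
    operator written as ⊗ in the notes.
    """
    result: List[T] = []
    for x in xs:
        # The first element is copied; every later element extends
        # the previous partial result by one application of op.
        result.append(x if not result else op(result[-1], x))
    return result
```

Because only associativity is assumed, the same loop works for non-numeric operators such as string concatenation.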

Parallel Prefix

• We would like to develop a parallel algorithm to compute prefix sums.
• In addition, it turns out that many different problems can be solved efficiently using this parallel algorithm.
– e.g., max or min operation
• Often called a scan or sweep operation.

Example

• Parallel prefix ([1, 2, 3, 4, 5, 6, 7, 8], sum)
– Returns [1, 3, 6, 10, 15, 21, 28, 36]
• Although we will focus on sums today, many other operators can be used to solve an assortment of problems.
• Examples will be in the next homework problem set, handed out Tues.
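
To see what a scan looks like with operators other than sum, here is a small serial sketch using Python's standard `itertools.accumulate`. The input list `data` and the variable names are made up for illustration; the notes do not prescribe any particular library.

```python
from itertools import accumulate
import operator

# Hypothetical input; any sequence works with an associative operator.
data = [3, 1, 4, 1, 5, 9, 2, 6]

running_sum = list(accumulate(data))                 # addition is the default
running_max = list(accumulate(data, max))            # largest value seen so far
running_min = list(accumulate(data, min))            # smallest value seen so far
running_prod = list(accumulate(data, operator.mul))  # product of the prefix
```

Each result has the same length as the input, and entry i depends only on elements 0 through i, which is exactly the prefix structure defined above.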

Return to parallel sums

• In class, we discussed an algorithm for sums where one processor received the answer.
• For your homework, you were asked to develop an alternative where all processors get the result.
– This can be achieved by exchanging values instead of using unidirectional communication.

One answer

For i = 0 to d - 1 do (d = log p, the number of bits in a processor id)
– Send sum to the processor obtained by inverting the i-th bit of this processor's id
– Receive sum from the processor obtained by inverting the i-th bit
– Add received sum to local sum
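
The exchange loop above can be simulated serially to check that every processor ends up with the full sum. This is only a sketch: the function name `hypercube_allreduce_sum` is hypothetical, list indices stand in for processor ids, and a list comprehension stands in for the simultaneous send and receive.

```python
def hypercube_allreduce_sum(values):
    """Simulate the exchange on p = 2**d processors.

    values[r] is the number initially held by the processor with id r.
    Returns the sum held by each processor after d rounds.
    """
    p = len(values)
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    d = p.bit_length() - 1  # number of bits in a processor id

    sums = list(values)
    for i in range(d):
        # Every processor swaps its current sum with the processor whose
        # id differs in bit i; reading from the old list models the fact
        # that all messages are sent before any sum is updated.
        received = [sums[rank ^ (1 << i)] for rank in range(p)]
        sums = [s + r for s, r in zip(sums, received)]
    return sums
```

After round i, each processor holds the sum over the sub-cube spanned by bits 0 through i, so after d rounds all p processors hold the same total.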

Illustration

[Figure: input vector; prefix sums of 1st half; prefix sums of 2nd half (requires log p time)]
Analysis from Aluru, Chapter 1.

Improving this further

• Suppose that we calculate both the prefix sums and the total sums on each of the two partitions.
• This will add slightly more memory but result in a substantial improvement, as we will see.

Illustration

[Figure: input vector; prefix sums of 1st half; prefix sums of 2nd half (requires one hypercubic permutation)]

Non-recursive algorithm

• Set prefix sum to be the element on this processor.
• Set total sum to be prefix sum.
• For i = 0 to d - 1
– Send total sum to the processor obtained by inverting the i-th bit of own id, and receive that processor's total sum back
– Add received sum to total sum
– If the exchange was with a processor that has a smaller id, also add received sum to prefix sum
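
The non-recursive algorithm can be checked with a serial simulation of the p = n case. This is a sketch under stated assumptions: the name `hypercube_prefix_sums` is hypothetical, list indices play the role of processor ids, the operator is fixed to addition, and a snapshot of the totals models the messages in flight during one round.

```python
def hypercube_prefix_sums(values):
    """Simulate the non-recursive parallel prefix on p = 2**d processors.

    values[r] is the single element held by processor r (the p = n case).
    Returns the prefix sum held by each processor after d rounds.
    """
    p = len(values)
    if p < 1 or p & (p - 1):
        raise ValueError("p must be a power of two")
    d = p.bit_length() - 1

    prefix = list(values)  # prefix sum starts as the local element
    total = list(values)   # total sum starts equal to the prefix sum
    for i in range(d):
        in_flight = list(total)  # totals as sent at the start of the round
        for rank in range(p):
            partner = rank ^ (1 << i)
            received = in_flight[partner]
            total[rank] += received
            if partner < rank:
                # The partner's sub-cube precedes ours, so its total
                # belongs in our prefix as well.
                prefix[rank] += received
    return prefix
```

With addition the order of the two operands does not matter. For a non-commutative operator, the received value would need to be combined on the left of the local prefix, which this sketch does not handle.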

Efficiency

• This algorithm is as efficient as adding n numbers.
– Only a constant amount of additional work is required during the parallel algorithm.
• Further, we can compute all prefix sums in the time it takes to compute the last prefix sum s_{n-1}, which is also the total sum.

Details

• In the non-recursive algorithm, we maintain two variables:
– Prefix sum
– Total sum
• The algorithm contains log p phases, each requiring O(1) computation and one communication.
– The worst case is two additions, which happens when the communication is with a processor of lower rank.

Analysis of algorithm

• Where p = n
– Computation time = O(log p)
– Communication time = O((τ + μ) log p)
• Where p < n
– Computation time = O(n/p + log p)
– Communication time = O((τ + μ) log p)
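
The n/p term in the coarse-grained case comes from local work on each processor's block. The sketch below shows one common way to organize this, assuming n is divisible by p and blocks are assigned contiguously; the name `coarse_prefix_sums` is hypothetical, and the scan over block totals is done serially here as a stand-in for the log p hypercube phases.

```python
from itertools import accumulate


def coarse_prefix_sums(xs, p):
    """Simulate prefix sums with n/p elements per processor.

    Assumes len(xs) is divisible by p and that processor r holds the
    r-th contiguous block of xs.
    """
    n = len(xs)
    if p < 1 or n % p:
        raise ValueError("p must be positive and divide n")
    size = n // p
    blocks = [xs[r * size:(r + 1) * size] for r in range(p)]

    # Step 1, O(n/p) per processor: prefix sums within each block.
    local = [list(accumulate(block)) for block in blocks]
    totals = [sum(block) for block in blocks]

    # Step 2: each processor needs the sum of all preceding block totals.
    # This serial scan stands in for the O(log p) parallel prefix phases.
    offsets = [0] + list(accumulate(totals))[:-1]

    # Step 3, O(n/p) per processor: shift the local prefixes by the offset.
    return [v + offsets[r] for r in range(p) for v in local[r]]
```

Only step 2 involves communication in a real implementation, which is why the communication time stays at O((τ + μ) log p) regardless of n.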

Example

• Suppose we want to calculate the rank of a processor given “marked” and “unmarked” processors.
• Rank = number of preceding marked processors, defined if and only if this processor is marked.
• How would you solve this problem?
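
One possible approach, offered as a sketch rather than the intended answer, is to give each marked processor the value 1 and each unmarked processor the value 0, then run a prefix sum and subtract the processor's own value. The function name `marked_ranks` and the use of `None` for unmarked processors are assumptions made for this example; the scan is done serially for brevity.

```python
from itertools import accumulate


def marked_ranks(marked):
    """Return the rank of each marked processor, or None if unmarked.

    marked[r] is True when processor r carries a mark. The rank counts
    the marked processors with a smaller id.
    """
    flags = [1 if m else 0 for m in marked]
    inclusive = list(accumulate(flags))  # prefix sums of the 0/1 flags
    # Subtracting the processor's own flag turns the inclusive count
    # into a count of strictly preceding marked processors.
    return [inc - f if f else None for inc, f in zip(inclusive, flags)]
```

In a parallel setting the `accumulate` call would be replaced by the parallel prefix algorithm, so the ranking costs the same as one prefix sum.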


Butterfly networks

hypercube of p processors using p (log

p +1).

columns of p processors each.

Details

are called row links; others are

appropriately called hypercube links.

networks achieve similar results with

fewer processors.

Dynamic butterfly networks

crossbar, crossbars to crossbars.

stages.