Parallel Reduction Algorithms - Parallel Processing - Lecture Slides, Slides of Parallel Computing and Programming

Aliah University Parallel Computing and Programming

Some concepts of Parallel Processing are Anatomy, Cache Access Time, Instruction Formats, Multidimensional Meshes, Network Processors, Snooping Protocol. Main points of this lecture are: Parallel Reduction Algorithms, Different Interconnection Topologies, Example of a Message-Passing Parallel Program, Reduction Computations and Their Parallelization, Recursive Reduction Approach

Typology: Slides

2012/2013

Uploaded on 04/30/2013

devank 🇮🇳


Lecture 2: Parallel Reduction Algorithms & Their Analysis on Different Interconnection Topologies


An example of an SPMD message-passing parallel program

Reduction Computations & Their Parallelization

The prior max computation is a reduction computation, defined as x = f(D), where D is a data set (e.g., a vector) and x is a scalar quantity. In the max computation, f is the max function and D is the set/vector of numbers for which the max is to be computed.
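As a point of reference for the parallel versions, a sequential reduction can be sketched as a simple fold. This is a minimal illustration, not code from the lecture; the function name `sequential_reduce` and the sample data are assumptions. It also counts the applications of f, which is N-1 for N items.

```python
def sequential_reduce(f, data):
    """Fold the binary operator f over data; return (x, number of f applications)."""
    result = data[0]
    ops = 0
    for item in data[1:]:
        result = f(result, item)  # one reduction operation per remaining item
        ops += 1
    return result, ops


if __name__ == "__main__":
    D = [7, 3, 19, 4, 11, 8]          # hypothetical data set
    print(sequential_reduce(max, D))  # x = max(D), using len(D) - 1 operations
```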

Reduction computations that are associative [defined as f(a,b,c) = f(f(a,b), c) = f(a, f(b,c))] can be easily parallelized using a recursive reduction approach (the final value of f(D) needs to be at some processor at the end of the parallel computation):

The data set D is evenly distributed among P processors: each processor Pi has a disjoint subset Di of |D|/P data elements.
Each processor Pi performs the computation f on its data set Di.
Each processor then engages in (log P) rounds of message passing with some other processors. In the k’th round, Pi communicates with a unique partner processor Pj = partner(Pi, k): it either sends its current partial result of f to Pj or receives Pj's (depending, say, on whether its id is greater than or less than that of Pj, respectively).
If Pi receives a computation result b from Pj in the k’th round, it computes a = f(a,b), where a is its current result, and participates in the (k+1)’th round of communication. If Pi has sent its data to Pj, then it does not participate in any further rounds of communication and computation; it is done with its task.
At the end of the (log P) rounds of communication, the processor with the least ID (= 0) will hold f(D).
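The steps above can be simulated in a single process. The sketch below is illustrative only: the lecture does not define partner(Pi, k), so the rule used here (flip bit k of the id) is an assumption, as are the function names and the requirement that P is a power of two dividing |D|.

```python
from functools import reduce


def partner(i, k):
    """Assumed partner rule: in round k (0-based), flip bit k of the processor id."""
    return i ^ (1 << k)


def recursive_halving(f, data, P):
    """Simulate the reduction on P processors; return (f(D), messages sent)."""
    if P < 1 or P & (P - 1) or len(data) % P or not data:
        raise ValueError("this sketch needs P to be a power of two dividing len(data)")
    chunk = len(data) // P
    # Distribute D evenly, then reduce each local subset Di.
    partial = {i: reduce(f, data[i * chunk:(i + 1) * chunk]) for i in range(P)}
    messages = 0
    for k in range(P.bit_length() - 1):  # log2(P) rounds
        # The higher id of each pair sends its result and then drops out.
        senders = [i for i in partial if i > partner(i, k)]
        for i in senders:
            j = partner(i, k)
            partial[j] = f(partial[j], partial.pop(i))  # receiver computes a = f(a, b)
            messages += 1
    return partial[0], messages  # processor 0 holds f(D)


if __name__ == "__main__":
    print(recursive_halving(max, [5, 1, 9, 3, 7, 2, 8, 6], 4))
```

Because the receiver always has the lower id and combines as f(own, received), the original ordering of D is preserved, so the sketch also works for associative operators that are not commutative, such as string concatenation.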

Reduction Computations & Their Parallelization (contd.)

Assuming (Pi, Pj), where Pj = partner(Pi, k), is a unique send-recv pair in round k, the # of processors holding the required partial computation results halves after each round; hence the term recursive halving for such parallel computations.
In general, there are other variations of parallel reduction computations (generally dictated by the interconnection topology) in which the # of processors reduces by some factor other than 2 in each round of communication. The general term for such parallel computations is recursive reduction.
A topology-independent recursive-halving communication pattern is shown below. Note also that as the # of processors involved halves, the # of initial data sets that each “active” partial result represents/covers doubles (a recursive doubling of coverage of data sets by each active partial result). The total # of msgs sent is P-1.

[Figure: recursive-halving communication pattern, with four messages labeled time step 1, two labeled time step 2, and one labeled time step 3.]
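The halving of active processors, the doubling of coverage, and the P-1 message total can be checked with a small bookkeeping sketch. The function name `halving_schedule` is hypothetical, and P is assumed to be a power of two.

```python
def halving_schedule(P):
    """Return one tuple per round:
    (round #, msgs sent, active processors afterwards, data sets covered per active result)."""
    schedule = []
    active, coverage, rnd = P, 1, 0
    while active > 1:
        rnd += 1
        sent = active // 2   # half of the active processors send and drop out
        active -= sent
        coverage *= 2        # each surviving result now covers twice as many data sets
        schedule.append((rnd, sent, active, coverage))
    return schedule


if __name__ == "__main__":
    rows = halving_schedule(8)
    for row in rows:
        print(row)
    print("total msgs:", sum(sent for _, sent, _, _ in rows))
```

For P = 8 the per-round message counts are 4, 2 and 1, which sum to 7 = P-1.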

Analysis of Parallel Reduction on Different Topologies

Recursive halving based reduction on a hypercube:
- Initial computation time = Theta(N/P); N = # of data items, P = # of processors.
- Communication time = Theta(log P), as there are (log P) msg passing rounds; in each round all msgs are sent in parallel, each msg is a 1-hop msg, and there is no conflict among msgs.
- Computation time during commun. rounds = Theta(log P) [1 red. oper. in each processor in each round].
Same comput. and commun. time for exchange commun. on a hypercube.
Speedup = S(P) = Seq_time/Parallel_time(P) = Theta(N)/Theta((N/P) + 2*log P) ~ Theta(P) if N >> P (more precisely, when N/P dominates log P, i.e., N >> P*log P).

[Figure: (a) Hypercubes of dimensions 0 to 4. (b) Msg pattern for a reduction comput. using recursive halving, with msgs labeled by time step; processor 000 will hold the final result.]
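The 1-hop claim can be illustrated by generating the send-recv pairs on a hypercube, where two nodes are neighbors exactly when their ids differ in one bit. This is a sketch under assumptions: the order in which dimensions are used (bit 0 first) and the function names are not taken from the slides, and the figure may use a different order.

```python
def hops(i, j):
    """Hop distance between hypercube nodes = Hamming distance of their ids."""
    return bin(i ^ j).count("1")


def hypercube_reduction_rounds(dim):
    """Per round, list the (sender, receiver) pairs for recursive halving on a dim-cube."""
    rounds = []
    active = list(range(1 << dim))
    for k in range(dim):
        bit = 1 << k
        # Active nodes with bit k set send across dimension k to the node with bit k clear.
        rounds.append([(i, i ^ bit) for i in active if i & bit])
        active = [i for i in active if not i & bit]
    return rounds


if __name__ == "__main__":
    for step, pairs in enumerate(hypercube_reduction_rounds(3), start=1):
        print(step, [(format(s, "03b"), format(r, "03b")) for s, r in pairs])
```

Every pair differs in exactly one bit, so each message crosses a single link, and no two messages in the same round share a link.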

Analysis of Parallel Reduction on Different Topologies (contd.)

Recursive reduction on a direct tree:
- Initial computation time = Theta(N/P); N = # of data items, P = # of processors.
- Communication time = Theta(log (P/2)), as there are (log ((P+1)/2)) msg passing rounds; in each round all msgs are sent in parallel, each msg is a 1-hop msg, and there is no conflict among msgs.
- Computation time during commun. rounds = Theta(2*log (P/2)) [2 red. opers. in the “parent” processor in each round] = Theta(2*log P).
Speedup = S(P) = Seq_time/Parallel_time(P) = Theta(N)/Theta((N/P) + 3*log P) ~ Theta(P) if N >> P (more precisely, N >> P*log P).
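A direct-tree reduction can be sketched as a bottom-up pass in which each parent combines its own partial result with those of its two children, giving the 2 reduction operations per round noted above. The layout is an assumption made for illustration: a complete binary tree with P = 2^h - 1 processors stored in heap order (children of node i at 2i+1 and 2i+2), one partial result per processor. The function name is hypothetical.

```python
def direct_tree_reduce(f, values):
    """Reduce per-processor partial results up a complete binary tree.
    Returns (result at the root, number of message-passing rounds)."""
    P = len(values)
    levels = (P + 1).bit_length() - 1
    if P != 2 ** levels - 1:
        raise ValueError("this sketch needs P = 2**h - 1 processors")
    result = list(values)
    rounds = 0
    # One round per level of parents, starting just above the leaves.
    for depth in range(levels - 2, -1, -1):
        for i in range(2 ** depth - 1, 2 ** (depth + 1) - 1):
            # Two reduction operations in each parent per round.
            result[i] = f(f(result[i], result[2 * i + 1]), result[2 * i + 2])
        rounds += 1
    return result[0], rounds


if __name__ == "__main__":
    print(direct_tree_reduce(max, [3, 9, 2, 7, 1, 8, 5]))
```

With P = 7 the sketch takes 2 rounds, matching log((P+1)/2) = log 4 = 2.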

[Figure: Recursive reduction in (a) a direct tree network; and (b) an indirect tree network. Labels in the figure: time steps; round #, hops.]
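To get a feel for when the speedup approaches P, the two speedup expressions can be evaluated numerically. This is a rough model only: all hidden constants in the Theta terms are set to 1, logs are base 2, and the function name and sample values of N and P are assumptions. The coefficient is 2 for the hypercube and 3 for the direct tree, as in the analysis above.

```python
import math


def modeled_speedup(N, P, log_coeff):
    """S(P) = N / (N/P + log_coeff * log2(P)), with every constant taken as 1."""
    return N / (N / P + log_coeff * math.log2(P))


if __name__ == "__main__":
    P = 64
    for N in (1 << 10, 1 << 14, 1 << 20):
        print(N,
              round(modeled_speedup(N, P, 2), 2),   # hypercube model
              round(modeled_speedup(N, P, 3), 2))   # direct-tree model
```

In this model the speedup stays well below P while N/P is comparable to log P and approaches P only once N/P is much larger than log P.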
