Design Patterns for Parallel Programming, Lecture Slides - Assembly Programming 2, Slides of Assembly Language Programming

Design Patterns for Parallel Programming, Amdahl's Law, Orchestration and Mapping, Task Decomposition, Reengineering for Parallelism, Sequential Molecular Dynamics Simulator, Finding Concurrency, Design Space

Typology: Slides

2010/2011

Uploaded on 10/11/2011


6.189 IAP 2007, MIT
Lecture 7
Design Patterns for Parallel Programming II
Dr. Rodric Rabbah, IBM


Recap: Common Steps to Parallelization

[Figure: a sequential computation is partitioned into tasks (decomposition), the tasks are assigned to execution units p0–p3 (assignment), the units are coordinated into a parallel program (orchestration), and the program is mapped onto processors P0–P3 (mapping).]

Dependence Analysis
● Given two tasks, how to determine if they can safely run in parallel?

Bernstein’s Condition
● R_i: set of memory locations read (input) by task T_i
● W_j: set of memory locations written (output) by task T_j
● Two tasks T_1 and T_2 are parallel if
  ■ input to T_1 is not part of output from T_2
  ■ input to T_2 is not part of output from T_1
  ■ outputs from T_1 and T_2 do not overlap
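The three conditions above amount to requiring that R_1 ∩ W_2, R_2 ∩ W_1, and W_1 ∩ W_2 are all empty. A minimal sketch of that check follows. The `Task` struct, the use of variable names as stand-ins for memory locations, and the function names are assumptions made for illustration; they do not come from the slides.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical task descriptor: the locations a task reads and writes,
// identified here by variable name for simplicity.
struct Task {
    std::set<std::string> reads;   // R: input set
    std::set<std::string> writes;  // W: output set
};

// True if the two sets have no element in common.
bool disjoint(const std::set<std::string>& a, const std::set<std::string>& b) {
    for (const auto& x : a) {
        if (b.count(x) != 0) return false;
    }
    return true;
}

// Bernstein's condition: R1 and W2, R2 and W1, and W1 and W2
// must each be disjoint for the tasks to run safely in parallel.
bool can_run_in_parallel(const Task& t1, const Task& t2) {
    return disjoint(t1.reads, t2.writes) &&
           disjoint(t2.reads, t1.writes) &&
           disjoint(t1.writes, t2.writes);
}
```

For example, a task computing `a = x + y` and a task computing `b = x + z` share only the input `x`, so the check passes; a third task that reads `a` fails the check against the first.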

Patterns for Parallelizing Programs: 4 Design Spaces

Algorithm Expression
● Finding Concurrency
  ■ Expose concurrent tasks
● Algorithm Structure
  ■ Map tasks to units of execution to exploit parallel architecture

Software Construction
● Supporting Structures
  ■ Code and data structuring patterns
● Implementation Mechanisms
  ■ Low level mechanisms used to write parallel programs

Patterns for Parallel Programming. Mattson, Sanders, and Massingill (2005).

Algorithm Structure Design Space
● Given a collection of concurrent tasks, what’s the next step?
● Map tasks to units of execution (e.g., threads)
● Important considerations
  ■ Magnitude of number of execution units platform will support
  ■ Cost of sharing information among execution units
  ■ Avoid tendency to over constrain the implementation
    – Work well on the intended platform
    – Flexible enough to easily adapt to different architectures

Organize by Tasks?
● Recursive?
  ■ yes: Divide and Conquer
  ■ no: Task Parallelism

Task Parallelism
● Ray tracing
  ■ Computation for each ray is separate and independent
● Molecular dynamics
  ■ Non-bonded force calculations, some dependencies
● Common factors
  ■ Tasks are associated with iterations of a loop
  ■ Tasks largely known at the start of the computation
  ■ All tasks may not need to complete to arrive at a solution
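When tasks correspond to loop iterations and are known up front, one simple way to map them to execution units is a static block distribution. The sketch below only computes which iterations each worker would own; it does not start any threads. The names `Range` and `block_for_worker` are hypothetical, and the scheme is one possible choice, not something the slides prescribe.

```cpp
#include <cassert>
#include <cstddef>

// Half-open range [begin, end) of loop iterations owned by one worker.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Split n independent iterations into `workers` contiguous blocks whose
// sizes differ by at most one. Assumes workers > 0 and id < workers.
Range block_for_worker(std::size_t n, std::size_t workers, std::size_t id) {
    std::size_t base = n / workers;    // minimum block size
    std::size_t extra = n % workers;   // the first `extra` workers get one more
    std::size_t begin = id * base + (id < extra ? id : extra);
    std::size_t size = base + (id < extra ? 1 : 0);
    return Range{begin, begin + size};
}
```

A static split like this suits tasks of roughly equal cost, such as one ray per iteration. When task costs vary widely, a dynamic scheme (for example a shared work queue) may balance load better.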

Organize by Data?
● Operations on a central data structure
  ■ Arrays and linear data structures
  ■ Recursive data structures
● Recursive?
  ■ yes: Recursive Data
  ■ no: Geometric Decomposition

Geometric Decomposition
● Gravitational body simulator
  ■ Calculate force between pairs of objects and update accelerations

    VEC3D acc[NUM_BODIES] = 0;
    for (i = 0; i < NUM_BODIES - 1; i++) {
        for (j = i + 1; j < NUM_BODIES; j++) {
            // Displacement vector
            VEC3D d = pos[j] - pos[i];
            // Force
            t = 1 / sqr(length(d));
            // Components of force along displacement
            d = t * (d / length(d));
            acc[i] += d * mass[j];
            acc[j] += -d * mass[i];
        }
    }

[Figure: the pos and vel arrays.]
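The slide's loop relies on a `VEC3D` type with arithmetic operators that is not shown. A self-contained sketch of the same pairwise computation is given below, assuming a small hypothetical `Vec3` struct and free operator functions in its place. As on the slide, the gravitational constant is omitted, and the sketch assumes no two bodies share a position.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the slide's VEC3D type.
struct Vec3 {
    double x, y, z;
};

Vec3 operator-(const Vec3& a, const Vec3& b) {
    return Vec3{a.x - b.x, a.y - b.y, a.z - b.z};
}
Vec3 operator*(double s, const Vec3& v) {
    return Vec3{s * v.x, s * v.y, s * v.z};
}
Vec3& operator+=(Vec3& a, const Vec3& b) {
    a.x += b.x; a.y += b.y; a.z += b.z;
    return a;
}
double length(const Vec3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Accumulate the acceleration on every body from every other body.
// Each unordered pair (i, j) is visited once and updates both bodies.
std::vector<Vec3> accelerations(const std::vector<Vec3>& pos,
                                const std::vector<double>& mass) {
    std::vector<Vec3> acc(pos.size(), Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i + 1 < pos.size(); ++i) {
        for (std::size_t j = i + 1; j < pos.size(); ++j) {
            Vec3 d = pos[j] - pos[i];        // displacement from i to j
            double len = length(d);          // assumed non-zero
            double t = 1.0 / (len * len);    // inverse-square magnitude
            Vec3 f = (t / len) * d;          // magnitude times unit direction
            acc[i] += mass[j] * f;           // i is pulled toward j
            acc[j] += (-mass[i]) * f;        // j is pulled toward i
        }
    }
    return acc;
}
```

Because each pair writes to both `acc[i]` and `acc[j]`, splitting the outer loop across execution units would need either per-unit partial accumulators or synchronization on `acc`.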

Recursive Data Example: Find the Root

[Figure: a forest over nodes 1–7, shown after Step 1, Step 2, and Step 3.]

● Given a forest of rooted directed trees, for each node, find the root of the tree containing the node
  ■ Parallel approach: for each node, find its successor’s successor, repeat until no changes
    – O(log n) vs. O(n)
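The successor's-successor approach (often called pointer jumping) can be sketched as follows. This is a sequential simulation: the inner loop stands in for the per-node updates that a parallel version would run concurrently, and a second buffer keeps each round reading only the previous round's values. The encoding, where a root is its own successor, and the name `find_roots` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// succ[i] is the successor (parent) of node i; a root points to itself.
// On return, succ[i] is the root of the tree containing node i.
// Returns the number of rounds run, including the final round that
// detects no change.
int find_roots(std::vector<std::size_t>& succ) {
    int rounds = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        std::vector<std::size_t> next(succ.size());
        // Every iteration reads only `succ` and writes only next[i],
        // so the iterations of this loop are independent of each other.
        for (std::size_t i = 0; i < succ.size(); ++i) {
            next[i] = succ[succ[i]];
            if (next[i] != succ[i]) changed = true;
        }
        succ.swap(next);
        ++rounds;
    }
    return rounds;
}
```

Each round roughly halves the remaining distance from a node to its root, which is where the O(log n) number of rounds comes from, while every round still touches all n nodes.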

Work vs. Concurrency Tradeoff
● Parallel restructuring of the find-the-root algorithm leads to O(n log n) work vs. O(n) with the sequential approach
● Most strategies based on this pattern similarly trade off an increase in total work for a decrease in execution time due to concurrency

Pipeline Throughput vs. Latency
● Amount of concurrency in a pipeline is limited by the number of stages
● Works best if the time to fill and drain the pipeline is small compared to overall running time
● Performance metric is usually the throughput
  ■ Rate at which data appear at the end of the pipeline per time unit (e.g., frames per second)
● Pipeline latency is important for real-time applications
  ■ Time interval from data input to pipeline, to data output
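The relationship between fill and drain time, throughput, and latency can be made concrete with a simple timing model. The sketch below assumes an idealized pipeline in which every stage takes the same time and nothing stalls; the `PipelineModel` struct and function names are hypothetical. Under those assumptions, n items pass through s stages in (s + n - 1) stage times.

```cpp
#include <cassert>

// Idealized pipeline: all stages take the same time, no stalls (assumption).
struct PipelineModel {
    int stages;         // number of pipeline stages
    double stage_time;  // time one item spends in one stage
};

// Time from one item entering the pipeline to that item leaving it.
double latency(const PipelineModel& p) {
    return p.stages * p.stage_time;
}

// Time to push `items` through: the first item needs `stages` steps,
// and each later item completes one step after the previous one.
double total_time(const PipelineModel& p, long items) {
    return (p.stages + items - 1) * p.stage_time;
}

// Average number of items completed per time unit over the whole run.
double throughput(const PipelineModel& p, long items) {
    return items / total_time(p, items);
}
```

With 4 stages of one time unit each, a single item yields a throughput of 0.25 items per unit, while 100 items yield about 0.97, approaching the steady-state limit of one item per stage time as fill and drain time become negligible. Latency stays at 4 units in both cases.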

Event-Based Coordination
● In this pattern, interaction of tasks to process data can vary over unpredictable intervals
● Deadlocks are likely for applications that use this pattern