6.189 IAP 2007
Lecture 7
Design Patterns for Parallel Programming II
Dr. Rodric Rabbah, IBM.
6.189 IAP 2007 MIT

Recap: Common Steps to Parallelization
[Figure: sequential computation → decomposition → tasks → assignment → execution units (p0 to p3) → orchestration → parallel program → mapping → processors (P0 to P3); decomposition and assignment together form partitioning]

Dependence Analysis
● Given two tasks, how to determine if they can safely run in parallel?

Bernstein’s Condition
● $R_i$: set of memory locations read (input) by task $T_i$
● $W_j$: set of memory locations written (output) by task $T_j$
● Two tasks $T_1$ and $T_2$ are parallel if
  – input to $T_1$ is not part of output from $T_2$
  – input to $T_2$ is not part of output from $T_1$
  – outputs from $T_1$ and $T_2$ do not overlap
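
The three conditions above can be checked mechanically once each task's read and write sets are known. The following is a minimal sketch, assuming memory locations are modeled as integer addresses; the `Task` struct and the `disjoint` and `can_run_in_parallel` helpers are hypothetical names chosen for illustration, not part of the lecture.

```cpp
#include <algorithm>
#include <set>

// Hypothetical task description: the memory locations (modeled here as
// integer addresses) that a task reads and writes.
struct Task {
    std::set<int> reads;   // R: input locations
    std::set<int> writes;  // W: output locations
};

// True when the two sets share no element.
static bool disjoint(const std::set<int>& a, const std::set<int>& b) {
    return std::none_of(a.begin(), a.end(),
                        [&b](int x) { return b.count(x) != 0; });
}

// Bernstein's condition: T1 and T2 can run in parallel when
// R1 and W2 are disjoint, R2 and W1 are disjoint, and W1 and W2 are disjoint.
bool can_run_in_parallel(const Task& t1, const Task& t2) {
    return disjoint(t1.reads, t2.writes) &&
           disjoint(t2.reads, t1.writes) &&
           disjoint(t1.writes, t2.writes);
}
```

Two tasks that only share read locations pass the check; a task that reads or writes a location the other one writes does not.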

Patterns for Parallelizing Programs: 4 Design Spaces
Algorithm Expression
● Finding Concurrency
  – Expose concurrent tasks
● Algorithm Structure
  – Map tasks to units of execution to exploit parallel architecture
Software Construction
● Supporting Structures
  – Code and data structuring patterns
● Implementation Mechanisms
  – Low level mechanisms used to write parallel programs
Patterns for Parallel Programming. Mattson, Sanders, and Massingill (2005).

Algorithm Structure Design Space
● Given a collection of concurrent tasks, what’s the next step?
● Map tasks to units of execution (e.g., threads)
● Important considerations
  – Magnitude of number of execution units platform will support
  – Cost of sharing information among execution units
  – Avoid tendency to over-constrain the implementation
    – Work well on the intended platform
    – Flexible enough to easily adapt to different architectures

Organize by Tasks?
● Recursive?
  – yes: Divide and Conquer
  – no: Task Parallelism

Task Parallelism
● Ray tracing
  – Computation for each ray is separate and independent
● Molecular dynamics
  – Non-bonded force calculations, some dependencies
● Common factors
  – Tasks are associated with iterations of a loop
  – Tasks largely known at the start of the computation
  – Not all tasks may need to complete to arrive at a solution
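
When tasks correspond to independent loop iterations, as in the ray tracing case, one simple mapping deals the iterations out to a fixed set of threads. The sketch below assumes a placeholder per-ray function; `shade` and `render` are hypothetical names, and the cyclic distribution is only one of several possible assignments. On some toolchains, linking code that uses `std::thread` needs the `-pthread` flag.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for an independent per-ray computation.
double shade(std::size_t ray) {
    return static_cast<double>(ray % 7) * 0.5;
}

// Run one independent task per loop iteration, dealing iterations out to
// num_threads worker threads in round-robin (cyclic) order.
std::vector<double> render(std::size_t num_rays, unsigned num_threads) {
    std::vector<double> image(num_rays);
    if (num_threads == 0) num_threads = 1;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&image, num_rays, num_threads, t] {
            // Each thread writes only its own elements, so no locking is needed.
            for (std::size_t i = t; i < num_rays; i += num_threads) {
                image[i] = shade(i);
            }
        });
    }
    for (std::thread& w : workers) w.join();
    return image;
}
```

Because no iteration reads another iteration's result, the output is the same for any thread count.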

Organize by Data?
● Operations on a central data structure
  – Arrays and linear data structures
  – Recursive data structures
● Recursive?
  – yes: Recursive Data
  – no: Geometric Decomposition

Geometric Decomposition
● Gravitational body simulator
  – Calculate force between pairs of objects and update accelerations

    VEC3D acc[NUM_BODIES] = 0;
    for (i = 0; i < NUM_BODIES - 1; i++) {
      for (j = i + 1; j < NUM_BODIES; j++) {
        // Displacement vector
        VEC3D d = pos[j] - pos[i];
        // Force
        t = 1 / sqr(length(d));
        // Components of force along displacement
        d = t * (d / length(d));
        acc[i] += d * mass[j];
        acc[j] += -d * mass[i];
      }
    }
[Figure: the pos and vel arrays]
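
One way to decompose this computation geometrically is to split the body array into contiguous chunks and let each chunk compute accelerations only for the bodies it owns. The sketch below is a self-contained illustration under stated assumptions: `Vec3`, `accelerate_chunk`, and `accelerations` are hypothetical names, the gravitational constant is taken as 1 as in the slide's loop, and no two bodies share a position. Unlike the slide's loop it does not exploit pairwise symmetry, so each pair is evaluated twice; in exchange, every chunk writes only its own slice of `acc`.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal 3D vector type standing in for the slide's VEC3D.
struct Vec3 {
    double x = 0.0, y = 0.0, z = 0.0;
};

// Accelerations for bodies in [begin, end). The chunk reads every position
// but writes only its own slice of acc, so chunks do not conflict.
void accelerate_chunk(const std::vector<Vec3>& pos, const std::vector<double>& mass,
                      std::vector<Vec3>& acc, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        Vec3 a;
        for (std::size_t j = 0; j < pos.size(); ++j) {
            if (j == i) continue;
            const double dx = pos[j].x - pos[i].x;
            const double dy = pos[j].y - pos[i].y;
            const double dz = pos[j].z - pos[i].z;
            const double r2 = dx * dx + dy * dy + dz * dz;
            const double s = mass[j] / (r2 * std::sqrt(r2));  // mass / r^3
            a.x += s * dx;
            a.y += s * dy;
            a.z += s * dz;
        }
        acc[i] = a;
    }
}

// Split the body array into num_chunks contiguous pieces. The chunks are
// processed one after another here; each call could instead be handed to
// its own unit of execution.
std::vector<Vec3> accelerations(const std::vector<Vec3>& pos,
                                const std::vector<double>& mass,
                                std::size_t num_chunks) {
    std::vector<Vec3> acc(pos.size());
    if (num_chunks == 0) num_chunks = 1;
    const std::size_t chunk = (pos.size() + num_chunks - 1) / num_chunks;
    for (std::size_t begin = 0; begin < pos.size(); begin += chunk) {
        accelerate_chunk(pos, mass, acc, begin, std::min(begin + chunk, pos.size()));
    }
    return acc;
}
```

The result does not depend on `num_chunks`, which is what makes the chunk size a tuning parameter for the target platform.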

Recursive Data Example: Find the Root
[Figure: a forest of seven nodes (1 to 7) shown at Step 1, Step 2, and Step 3]
● Given a forest of rooted directed trees, for each node, find the root of the tree containing the node
  – Parallel approach: for each node, find its successor’s successor, repeat until no changes
    – O(log n) steps vs. O(n) for a sequential traversal
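
The successor's-successor idea can be sketched as follows. This is an illustration under an assumed representation, not the lecture's code: nodes are indices into an array, `succ[i]` holds the successor of node `i`, and a root is its own successor. The hypothetical `find_roots` function runs the rounds one after another, but within a round every node reads only the old array and writes only its own new entry, so the inner loop could be executed concurrently.

```cpp
#include <cstddef>
#include <vector>

// succ[i] is the successor (parent) of node i; a root points to itself.
// Each round replaces every node's successor with its successor's successor.
// The inner loop reads succ and writes next, so its iterations are
// independent. Returns the number of rounds that changed at least one entry.
int find_roots(std::vector<std::size_t>& succ) {
    int rounds = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        std::vector<std::size_t> next(succ.size());
        for (std::size_t i = 0; i < succ.size(); ++i) {
            next[i] = succ[succ[i]];
            if (next[i] != succ[i]) changed = true;
        }
        succ.swap(next);
        if (changed) ++rounds;
    }
    return rounds;
}
```

On return, `succ[i]` is the root of the tree containing node `i`. For a chain of five nodes the longest path has four edges, and the distances to the root halve each round, so two rounds suffice.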

Work vs. Concurrency Tradeoff
● Parallel restructuring of the find-the-root algorithm leads to O(n log n) work vs. O(n) with the sequential approach
● Most strategies based on this pattern similarly trade off an increase in total work for a decrease in execution time due to concurrency
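
As a rough worked example of these bounds, ignoring constant factors and assuming a single chain of n = 1024 nodes (the deepest possible tree for that size):

```latex
\begin{align*}
\text{parallel rounds} &\approx \log_2 n = \log_2 1024 = 10 \\
\text{parallel work} &\approx n \log_2 n = 1024 \times 10 = 10240 \text{ successor updates} \\
\text{sequential work} &\approx n = 1024 \text{ node visits}
\end{align*}
```

With one execution unit per node, the restructured algorithm finishes in about 10 rounds instead of about 1024 sequential steps, at the cost of roughly 10 times the total work.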

Pipeline Throughput vs. Latency
● Amount of concurrency in a pipeline is limited by the number of stages
● Works best if the time to fill and drain the pipeline is small compared to overall running time
● Performance metric is usually the throughput
  – Rate at which data appear at the end of the pipeline per time unit (e.g., frames per second)
● Pipeline latency is important for real-time applications
  – Time interval from data input to pipeline, to data output
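
The relationship between fill time, throughput, and latency can be made concrete with an idealized model. The sketch below assumes that every stage takes the same time and handles one item at a time; these assumptions, the `PipelineTiming` struct, and the `pipeline_timing` function are hypothetical and not taken from the lecture.

```cpp
// Idealized pipeline model (an assumption, not from the slides): every stage
// takes the same time and a stage handles one item at a time.
struct PipelineTiming {
    double latency;     // time for one item to pass through all stages
    double total_time;  // time until the last item leaves the pipeline
    double throughput;  // items completed per time unit over the whole run
};

// Expects stages >= 1, items >= 1, and stage_time > 0.
PipelineTiming pipeline_timing(int stages, int items, double stage_time) {
    PipelineTiming t;
    t.latency = stages * stage_time;
    // The first item needs `stages` steps (fill); each later item finishes
    // one step after the previous one.
    t.total_time = (stages + items - 1) * stage_time;
    t.throughput = items / t.total_time;
    return t;
}
```

In this model a 4-stage pipeline processing 100 items with unit stage time needs 103 time units, so the fill cost is small and throughput is close to one item per time unit; with a single item, the total time equals the latency and the pipeline offers no benefit.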

Event-Based Coordination
● In this pattern, interaction of tasks to process data can vary over unpredictable intervals
● Deadlocks are likely for applications that use this pattern