




A chapter from the book 'Parallel Program Design' by Sanjay Rajopadhye, focusing on the task/channel model for parallel computing and efficient broadcast algorithms. The chapter covers task partitioning, communication, and the analysis of broadcast algorithms in a parallel computing context. It also includes case studies on adding numbers on a grid, the n-body problem, and matrix multiplication.
Typology: Study notes





Computation/Accumulation
Number of steps: 2 + 2*
Total execution time: 197
Tradeoff
Local data = 4
Communication distance = 45
Total time = 139
10x20 grid: 95
10x10 grid: 70
5x10 grid: 65
5x5 grid: 70
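The grid figures above suggest that the best time is reached at an intermediate grid size. A minimal sketch of that tradeoff, assuming the cost model T(P) = N/P + 3*sqrt(P) (N numbers, P processors in a square grid, no overlap of communication and computation). The function names and the brute-force search limit are hypothetical, chosen only for illustration.

```python
import math


def total_time(n: int, p: int) -> float:
    """Assumed cost model: N/P local additions plus 3*sqrt(P) communication."""
    return n / p + 3 * math.sqrt(p)


def optimal_p(n: int) -> float:
    """Real-valued P at which dT/dP = -N/P**2 + 3/(2*sqrt(P)) is zero."""
    return (2 * n / 3) ** (2 / 3)


def best_square_grid(n: int, max_side: int = 100) -> int:
    """Brute-force search over square grids 1x1 ... max_side x max_side."""
    candidates = (side * side for side in range(1, max_side + 1))
    return min(candidates, key=lambda p: total_time(n, p))
```

For a hypothetical N = 1000, `optimal_p(1000)` is roughly 76, and the closest square grid found by `best_square_grid(1000)` is 9x9. The brute-force search is only there to cross-check the closed form.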
Tradeoff (analytical)
Assumptions:
  P processors, arranged in a square grid (to minimize perimeter)
  N numbers to add, no overlap of communication and computation
Total execution time: $T(P) = N/P + 3\sqrt{P}$
One term increases with P (as a square root, i.e., a polynomial of degree 0.5) and the other decreases (hyperbolically).
Note that we are not interested in asymptotic behavior here, but in a tradeoff: what value of P minimizes the time?
Set the derivative to zero and solve: $-N/P^2 + 3/(2\sqrt{P}) = 0$, so $\hat{P} = (2N/3)^{2/3} = \sqrt[3]{4N^2/9}$
Task-Channel Model
Task: program with local memory plus “I/O ports”
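A minimal sketch of the task/channel idea, assuming that a task can be modeled as a thread with its own local data and a channel as a `queue.Queue` connecting an output port to an input port. Sends here do not block and receives do, which is an assumption of this sketch rather than something stated in the notes. The names `worker`, `root`, and `run` are hypothetical.

```python
import queue
import threading


def worker(local_data, out_port):
    """A task: compute on local memory, then send through the output port."""
    out_port.put(sum(local_data))


def root(local_data, in_ports, result):
    """A task with several input ports: add its own sum to each received value."""
    total = sum(local_data)
    for port in in_ports:
        total += port.get()  # blocking receive on one channel
    result.append(total)


def run(chunks):
    """Task 0 is the root; every other task gets one channel to the root."""
    channels = [queue.Queue() for _ in chunks[1:]]
    result = []
    threads = [
        threading.Thread(target=worker, args=(chunk, channel))
        for chunk, channel in zip(chunks[1:], channels)
    ]
    threads.append(threading.Thread(target=root, args=(chunks[0], channels, result)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return result[0]
```

Calling `run([[1, 2], [3, 4], [5]])` creates three tasks and two channels and returns the overall sum.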
Communication
Clutters the graph (avoid, but keep track)
Drawback of the methodology: “collective communications” are a powerful design paradigm (e.g., matrix multiplication later).
Communication issues
tasks should perform communications independently
tasks should be able to perform computations concurrently
Agglomeration
Agglomeration issues
take less time than avoided communication
allow scaling
Clusters have similar size (computation + communication)
increasing function of problem size
as small as possible (but no less than number of processors)
Mapping
Static
  Structured communication
    constant computation per task: minimize communication, one task per processor
    variable computation per task: cyclic mapping for load balance
  Unstructured: static load balancing algorithm
Dynamic
  Frequent communication: dynamic load balancing algorithm
  Many short-lived tasks (no inter-task communications): run-time schedule
Case studies: broadcast
“Octopus accountant” grid
Ideal (fully connected) machine, where any pair of processors can communicate (but a processor cannot communicate with multiple other processors simultaneously)
Broadcast (grid lower bound)
Broadcast (ideal lower bound)
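On the ideal machine a processor talks to at most one partner at a time, so the number of processors holding the value can at most double per step, which gives a lower bound of ceil(lg p) steps. A minimal sketch of a schedule that meets this bound, assuming processors are numbered 0 to p-1 and processor 0 is the source. The function name and the numbering are hypothetical.

```python
def broadcast_schedule(p: int):
    """Return the steps of a doubling broadcast from processor 0.

    Each step is a list of (sender, receiver) pairs; within a step every
    processor appears in at most one pair.
    """
    steps = []
    informed = 1  # processors 0 .. informed-1 already hold the value
    while informed < p:
        step = [(s, s + informed) for s in range(informed) if s + informed < p]
        steps.append(step)
        informed = min(p, 2 * informed)
    return steps
```

For p = 8 the schedule has three steps: 0 sends to 1, then 0 and 1 send to 2 and 3, then 0 through 3 send to 4 through 7.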
n-Body Problem
Partitioning: one task per body
Communication: complete graph
  The main “collective communication event”: every task must send its single, unique datum to every other task, i.e., a simultaneous broadcast, also called “all-gather”
Brute force: n steps; each processor receives data from every other
  coordination is a detail that needs to be resolved
Better way: divide and conquer
  If each half of the nodes has already completed an all-gather among its own members, one more step completes the all-gather for all n nodes.
All-gather Analysis
Naïve strategy: $T = n\lambda$
Divide-and-conquer: lg n steps, $T = \lambda \lg n$
  Message size grows at each step
Realistic communication model (affine cost function)
  Time to transmit a message of volume v: $t = \lambda + v/\beta$
  All-gather under the affine communication model (geometric series): $T_{gather} = \lambda \lg n + (n-1)/\beta \approx \lambda \lg n + n/\beta$
n-Body (concl.)
One task per processor and agglomerate n/p particles into each cluster. Modify the all-gather
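A minimal sketch of the modified all-gather, assuming p is a power of two, each task starts with n/p particles, and a message of volume v costs lam + v/beta (an affine model with latency lam and bandwidth beta, both hypothetical parameter names). In step k, task i exchanges everything it holds with task i XOR 2^k; the simulation is sequential and only counts steps and cost.

```python
def all_gather(blocks, lam=1.0, beta=1.0):
    """Simulate a divide-and-conquer all-gather.

    blocks[i] is the list initially held by task i; len(blocks) must be a
    power of two. Returns (final_blocks, steps, time).
    """
    p = len(blocks)
    assert p > 0 and p & (p - 1) == 0, "p must be a power of two"
    held = [list(b) for b in blocks]
    steps, time, distance = 0, 0.0, 1
    while distance < p:
        volume = max(len(h) for h in held)  # size of the messages in this step
        # Every task swaps its current data with its partner at this distance.
        held = [held[i] + held[i ^ distance] for i in range(p)]
        time += lam + volume / beta
        steps += 1
        distance *= 2
    return held, steps, time
```

With n = 16 particles on p = 4 tasks, the message volumes are 4 and 8, so the communication time is 2*lam + 12/beta, which agrees with lam*lg(p) + n(p-1)/(beta*p).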
Tasks for each step
Agglomeration first projects along the time iteration
$T_{iter} = \lambda \lg p + \frac{n(p-1)}{\beta p} + \chi \frac{n}{p}$
Input/Output
Adding accountants example
Divide & conquer scatter (in affine model)
$T_{iter} = \lambda \lg p + \frac{n(p-1)}{\beta p}$
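A minimal sketch of a divide-and-conquer scatter, assuming p is a power of two that divides n, task 0 starts with all n items, and a message of volume v costs lam + v/beta (hypothetical parameter names). In each step, every task that holds data passes the upper half of it to a partner; the simulation is sequential and only accumulates the cost.

```python
def scatter(items, p, lam=1.0, beta=1.0):
    """Scatter items from task 0 over p tasks; returns (blocks, time)."""
    n = len(items)
    assert p > 0 and p & (p - 1) == 0 and n % p == 0
    held = {0: list(items)}  # task id -> data it currently holds
    time, span = 0.0, p
    while span > 1:
        half = span // 2
        volume = 0
        for task in list(held):  # all current holders send in parallel
            data = held[task]
            mid = len(data) // 2
            held[task], held[task + half] = data[:mid], data[mid:]
            volume = len(data) - mid  # every holder sends the same amount
        time += lam + volume / beta
        span = half
    return [held[i] for i in range(p)], time
```

For n = 8 and p = 4, the messages have volumes 4 and 2, so the time is 2*lam + 6/beta, which matches lam*lg(p) + n(p-1)/(beta*p).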
PRAM Model
PRAM Model (variants)
Conflict resolution (for both reads and writes):
  Disallow (weaker machine), but probably more realistic
  Allow, but provide a resolution mechanism
EREW: Exclusive Read, Exclusive Write (no conflicts allowed)
CREW
ERCW (rarely considered)
CRCW: write conflicts resolved by
  arbitrary: any of the writers succeeds (non-deterministically)
  priority: the writer with the highest priority succeeds
  common: the algorithm must ensure that all writers write the same value
  combining (most powerful): the values are combined using an associative operator (e.g., sum, max, etc.)
PRAM Matrix Multiplication
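A minimal sketch of how the CRCW write policies could be simulated, followed by one common illustration of the combining policy: n^3 virtual processors each compute a single product and write it to C[i][j], with conflicts resolved by summation. This is a sequential simulation under stated assumptions, not necessarily the version developed in the slides. The names `crcw_write` and `pram_matmul` are hypothetical, square matrices are assumed, and "priority" is taken to mean that the lowest processor id wins.

```python
import operator
from functools import reduce


def crcw_write(requests, policy="combining", combine=operator.add):
    """Resolve one round of concurrent writes.

    requests: list of (processor_id, address, value).
    Returns a dict mapping each written address to its resulting value.
    """
    by_address = {}
    for pid, address, value in requests:
        by_address.setdefault(address, []).append((pid, value))
    memory = {}
    for address, writers in by_address.items():
        values = [value for _, value in writers]
        if policy == "common":
            if any(v != values[0] for v in values):
                raise ValueError("common CRCW: writers disagree")
            memory[address] = values[0]
        elif policy == "priority":
            # Assumption: the lowest processor id has the highest priority.
            memory[address] = min(writers, key=lambda w: w[0])[1]
        elif policy == "arbitrary":
            # Any writer may win; this simulation simply takes the first.
            memory[address] = values[0]
        elif policy == "combining":
            memory[address] = reduce(combine, values)
        else:
            raise ValueError(f"unknown policy: {policy}")
    return memory


def pram_matmul(a, b):
    """Multiply square matrices with one virtual processor per (i, j, k)."""
    n = len(a)
    requests = [
        (i * n * n + j * n + k, (i, j), a[i][k] * b[k][j])
        for i in range(n) for j in range(n) for k in range(n)
    ]
    memory = crcw_write(requests, policy="combining")
    return [[memory[(i, j)] for j in range(n)] for i in range(n)]
```

On a real combining CRCW PRAM all n^3 writes would happen in a single step; the loop in `crcw_write` only stands in for that hardware resolution.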
Why PRAM
Evolution of the matrix multiplication example