Chapter 3
Parallel Algorithm Design
Outline
- Task/channel model
- Algorithm design methodology
- Case studies
Task/Channel Model
[Figure: Task and Channel]
Foster’s Design Methodology
- Partitioning
- Dividing the Problem into Tasks
- Communication
- Determine what needs to be communicated between the Tasks over Channels
- Agglomeration
- Group or Consolidate Tasks to improve efficiency or simplify the programming solution
- Mapping
- Assign tasks to the Computer Processors
Step 1: Partitioning
Divide Computation & Data into Pieces
- Domain Decomposition – Data Centric Approach
- Divide up most frequently used data
- Associate the computations with the divided data
- Functional Decomposition – Computation Centric Approach
- Divide up the computation
- Associate the data with the divided computations
- Primitive Tasks: Resulting Pieces from either Decomposition
- The goal is to have as many of these as possible
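As a minimal sketch of domain decomposition, the following divides a list into contiguous blocks and attaches the same computation to each block. The function names, the block formula, and the sum-of-squares workload are illustrative assumptions, not taken from the slides.

```python
def block_range(task_id: int, num_tasks: int, n: int) -> range:
    """Indices owned by one task when n items are split into contiguous blocks."""
    lo = task_id * n // num_tasks
    hi = (task_id + 1) * n // num_tasks
    return range(lo, hi)


def domain_decompose(data, num_tasks):
    """Divide the data first; each resulting piece belongs to one primitive task."""
    n = len(data)
    return [[data[i] for i in block_range(t, num_tasks, n)]
            for t in range(num_tasks)]


# Hypothetical workload: sum of squares over ten values, split across four tasks.
data = list(range(10))
pieces = domain_decompose(data, 4)

# Associate the computation with each piece of data, then combine the results.
partial_sums = [sum(x * x for x in piece) for piece in pieces]
total = sum(partial_sums)
print(pieces, total)
```

Block sizes differ by at most one item here, which keeps the primitive tasks roughly the same size.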
Example Domain Decompositions
Partitioning Checklist
- Lots of Tasks
- e.g., at least 10x more primitive tasks than processors in target computer
- Minimize redundant computations and data
- Load Balancing
- Primitive tasks roughly the same size
- Scalable
- Number of tasks an increasing function of problem size
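The checklist items above can be turned into rough numeric checks. This is only a sketch: the 10x ratio comes from the slide, while the `max_imbalance` threshold and the function names are assumptions chosen for illustration.

```python
def check_partitioning(task_sizes, num_processors, min_ratio=10, max_imbalance=1.5):
    """Rough checks for 'lots of tasks' and 'load balancing'.

    max_imbalance is a hypothetical threshold: the largest task may be at most
    this many times the average task size.
    """
    average = sum(task_sizes) / len(task_sizes)
    return {
        "lots_of_tasks": len(task_sizes) >= min_ratio * num_processors,
        "balanced": max(task_sizes) <= max_imbalance * average,
    }


def is_scalable(task_count, problem_sizes):
    """True if the task count strictly increases over the sampled problem sizes."""
    counts = [task_count(n) for n in problem_sizes]
    return all(a < b for a, b in zip(counts, counts[1:]))


# Hypothetical design: one primitive task per grid point of an n x n grid.
good = check_partitioning([1] * 80, num_processors=8)
too_few = check_partitioning([1] * 8, num_processors=8)
scalable = is_scalable(lambda n: n * n, [10, 100, 1000])
print(good, too_few, scalable)
```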
Step 2: Communication
Determine Communication Patterns between Primitive Tasks
- Local Communication
- When Tasks need data from a small number of other Tasks
- Channel from Producing Task to Consuming Task Created
- Global Communication
- When a Task needs data from many or all other Tasks
- Channels for this type of communication are not created during this step
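To illustrate local communication, the sketch below builds the channels for a hypothetical one-dimensional chain of tasks in which each task needs a value from its immediate left and right neighbours. The chain layout and the function name are assumptions; the slides do not prescribe a particular pattern.

```python
def local_channels(num_tasks):
    """Return (producer, consumer) pairs for a 1-D chain of tasks.

    Each task consumes data from at most two neighbours, so the number of
    channels per task stays constant as the number of tasks grows.
    """
    channels = []
    for consumer in range(num_tasks):
        for producer in (consumer - 1, consumer + 1):
            if 0 <= producer < num_tasks:
                channels.append((producer, consumer))
    return channels


channels = local_channels(4)
print(channels)
```

A global operation such as a sum over every task has no such small, fixed neighbour set, which is one way to see why its channels are deferred to a later step.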
Step 3: Agglomeration
Group Tasks to Improve Efficiency or Simplify Programming
- Increase Locality
- Remove communication by agglomerating Tasks that communicate with one another
- Combine groups of sending & receiving Tasks
- Send fewer, larger messages rather than many short messages, since every message incurs latency
- Maintain Scalability of the Parallel Design
- Be careful not to agglomerate Tasks so much that moving to a machine with more processors will not be possible
- Reduce Software Engineering costs
- Leveraging existing sequential code can reduce the ...
Agglomeration Can Improve Performance
- Eliminate communication between primitive tasks agglomerated into a consolidated task
- Combine groups of sending and receiving tasks
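The benefit of combining messages can be estimated with a simple cost model. The model below (a fixed latency per message plus size divided by bandwidth) is a common simplification and an assumption here, as are the latency and bandwidth figures.

```python
import math


def message_time(num_bytes, latency_s, bandwidth_bytes_per_s):
    """Assumed cost model: fixed start-up latency plus transfer time."""
    return latency_s + num_bytes / bandwidth_bytes_per_s


# Hypothetical network: 50 microseconds latency, 1 GB/s bandwidth.
LATENCY = 50e-6
BANDWIDTH = 1e9

# 1000 separate 8-byte messages versus one combined 8000-byte message.
many_small = 1000 * message_time(8, LATENCY, BANDWIDTH)
one_large = message_time(8000, LATENCY, BANDWIDTH)

# The same bytes are transferred either way; the saving is 999 latencies.
saving = many_small - one_large
print(many_small, one_large, saving)
```

Under this model the data volume is unchanged, so the whole saving comes from paying the latency once instead of once per message.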
Step 4: Mapping
Assigning Tasks to Processors
- Maximize Processor Utilization
- Ensure computation is evenly balanced across all processors
Optimal Mapping
- Finding an optimal mapping is NP-hard
- Must rely on heuristics
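One example of such a heuristic is greedy "largest task first" assignment: sort the tasks by decreasing cost and give each one to the processor that currently has the least work. The slides do not name a specific heuristic, so this choice, the function name, and the task costs are assumptions. The sketch also considers load balance only and ignores communication cost.

```python
import heapq


def greedy_map(task_costs, num_processors):
    """Map tasks to processors, largest task first, least-loaded processor first."""
    # Heap of (current load, processor id); the least-loaded processor is on top.
    heap = [(0, p) for p in range(num_processors)]
    heapq.heapify(heap)
    assignment = {}
    by_cost = sorted(range(len(task_costs)),
                     key=lambda t: task_costs[t], reverse=True)
    for task in by_cost:
        load, proc = heapq.heappop(heap)
        assignment[task] = proc
        heapq.heappush(heap, (load + task_costs[task], proc))
    loads = [0] * num_processors
    for task, proc in assignment.items():
        loads[proc] += task_costs[task]
    return assignment, loads


# Hypothetical task costs mapped onto two processors.
costs = [7, 5, 4, 3, 3, 2]
assignment, loads = greedy_map(costs, 2)
print(assignment, loads)
```

The result is not guaranteed to be optimal in general; it is a quick approximation of the kind the slide refers to.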
Mapping Goals
- Mapping based on one task per processor and multiple tasks per processor have been considered
- Both static and dynamic allocation of tasks to processors have been evaluated
- If dynamic allocation of tasks to processors is chosen, the Task allocator is not a bottleneck
CASE STUDIES
- Boundary value problem
- Finding the maximum
- The n-body problem
- Adding data input
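The slides only name these case studies. For finding the maximum, one common approach is a tree-shaped reduction, in which pairs of values are combined in rounds so that n values need about log2(n) rounds instead of n - 1 sequential comparisons. The sketch below simulates those rounds sequentially; it is an assumption about how the case study might proceed, not the slides' own algorithm.

```python
def tree_max(values):
    """Find the maximum by pairwise combination; return (maximum, rounds)."""
    current = list(values)
    if not current:
        raise ValueError("tree_max requires at least one value")
    rounds = 0
    while len(current) > 1:
        # Each pair could be handled by a different task in the same round.
        combined = [max(current[i], current[i + 1])
                    for i in range(0, len(current) - 1, 2)]
        if len(current) % 2:
            combined.append(current[-1])  # odd element passes through unchanged
        current = combined
        rounds += 1
    return current[0], rounds


maximum, rounds = tree_max([3, 9, 2, 7, 5, 8, 1, 6])
print(maximum, rounds)
```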