Parallel Algorithm Design: Foster's Methodology and Task/Channel Model, Slides of Parallel Computing and Programming

An in-depth exploration of Foster's methodology and the task/channel model in parallel algorithm design. Covers partitioning, communication, agglomeration, and mapping, along with case studies and communication patterns between primitive tasks, and explains the importance of maximizing processor utilization, minimizing inter-processor communication, and scaling up in parallel computation.

Typology: Slides

2012/2013

Uploaded on 04/24/2013

banamala

Chapter 3
Parallel Algorithm Design

Outline

  • Task/channel model
  • Algorithm design methodology
  • Case studies

Task/Channel Model

[Figure labels: Task, Channel]

Foster’s Design Methodology

  • Partitioning
    • Dividing the Problem into Tasks
  • Communication
    • Determine what needs to be communicated between the Tasks over Channels
  • Agglomeration
    • Group or Consolidate Tasks to improve efficiency or simplify the programming solution
  • Mapping
    • Assign tasks to the Computer Processors

Step 1: Partitioning

Divide Computation & Data into Pieces

  • Domain Decomposition – Data Centric Approach
    • Divide up most frequently used data
    • Associate the computations with the divided data
  • Functional Decomposition – Computation Centric Approach
    • Divide up the computation
    • Associate the data with the divided computations
  • Primitive Tasks: Resulting Pieces from either Decomposition
    • The goal is to have as many of these as possible

Example Domain Decompositions
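
As one illustration of a domain decomposition, the sketch below splits the indices of a one-dimensional array into contiguous blocks, one block per primitive task. The function names `block_range` and `decompose_1d` are hypothetical and chosen only for this example; the slides do not prescribe a particular formula.

```python
def block_range(task_id, num_tasks, n):
    """Return the half-open index range [lo, hi) owned by one task.

    Assumption: a plain block decomposition of n elements over num_tasks
    tasks, using integer arithmetic so block sizes differ by at most one.
    """
    lo = task_id * n // num_tasks
    hi = (task_id + 1) * n // num_tasks
    return lo, hi


def decompose_1d(n, num_tasks):
    """List the index range of every task, in task order."""
    return [block_range(t, num_tasks, n) for t in range(num_tasks)]


# Example: 10 data elements divided among 4 tasks.
ranges = decompose_1d(10, 4)
sizes = [hi - lo for lo, hi in ranges]
```

The same idea extends to two or three dimensions by applying the block formula independently along each axis.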

Partitioning Checklist

  • Lots of Tasks
    • e.g., at least 10 times more primitive tasks than processors in the target computer
  • Minimize redundant computations and data
  • Load Balancing
    • Primitive tasks roughly the same size
  • Scalable
    • Number of tasks an increasing function of problem size
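
Two of the checklist items above can be tested numerically. The following is a minimal sketch, assuming task sizes are known as numbers; the function name `check_partition` and the `max_imbalance` threshold are hypothetical choices for illustration, while the factor of 10 is the rule of thumb from the checklist.

```python
def check_partition(task_sizes, num_processors, min_ratio=10, max_imbalance=0.25):
    """Check the 'lots of tasks' and 'load balancing' items of the checklist.

    Assumptions: min_ratio=10 follows the rule of thumb in the checklist;
    max_imbalance (allowed size spread relative to the mean) is an
    arbitrary illustrative threshold.
    """
    mean_size = sum(task_sizes) / len(task_sizes)
    spread = max(task_sizes) - min(task_sizes)
    return {
        "enough_tasks": len(task_sizes) >= min_ratio * num_processors,
        "balanced": spread <= max_imbalance * mean_size,
    }


# 40 equal tasks on 4 processors pass both checks.
good = check_partition([10] * 40, num_processors=4)
# 4 very uneven tasks on 4 processors fail both checks.
bad = check_partition([1, 1, 1, 50], num_processors=4)
```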

Step 2: Communication

Determine Communication Patterns between Primitive Tasks

  • Local Communication
    • When Tasks need data from a small number of other Tasks
    • Channel from Producing Task to Consuming Task Created
  • Global Communication
    • When a Task needs data from many or all other Tasks
    • Channels for this type of communication are not created during this step
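
To make the local pattern concrete, the sketch below lists the channels for a hypothetical one-dimensional chain of tasks in which each task needs data only from its immediate neighbours. The function names are illustrative assumptions. For comparison, it also counts the directed channels a naive all-to-all pattern would need.

```python
def local_channels(num_tasks):
    """Return (producer, consumer) pairs for a 1-D nearest-neighbour pattern.

    Assumption: task t consumes data from tasks t-1 and t+1 when they exist.
    """
    channels = []
    for t in range(num_tasks):
        if t > 0:
            channels.append((t - 1, t))  # left neighbour produces for t
        if t < num_tasks - 1:
            channels.append((t + 1, t))  # right neighbour produces for t
    return channels


def all_to_all_channel_count(num_tasks):
    """Number of directed channels if every task sent to every other task."""
    return num_tasks * (num_tasks - 1)


chain = local_channels(4)
```

For a chain, the channel count grows linearly with the number of tasks, whereas the all-to-all count grows quadratically.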

Step 3: Agglomeration

Group Tasks to Improve Efficiency or Simplify Programming

  • Increase Locality
    • remove communication by agglomerating Tasks that Communicate with one another
    • Combine groups of sending & receiving tasks
      • Send fewer, larger messages rather than many short messages, since each message incurs latency.
  • Maintain Scalability of the Parallel Design
    • Be careful not to agglomerate Tasks so much that moving to a machine with more processors will not be possible
  • Reduce Software Engineering costs
    • Leveraging existing sequential code can reduce these costs
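
The advice to send fewer, larger messages can be illustrated with a simple cost model in which each message costs a fixed latency plus its size divided by the bandwidth. This model and the numbers below are assumptions for illustration only; they do not come from the slides or describe any particular machine.

```python
def message_time(num_bytes, latency, bandwidth):
    """Time for one message under a latency-plus-transfer model (assumed)."""
    return latency + num_bytes / bandwidth


def total_time(num_messages, bytes_per_message, latency, bandwidth):
    """Time to send num_messages equal-sized messages one after another."""
    return num_messages * message_time(bytes_per_message, latency, bandwidth)


# Hypothetical machine parameters: 100 microseconds latency, 100 MB/s bandwidth.
LATENCY = 1e-4
BANDWIDTH = 1e8

# The same 100,000 bytes sent as 100 small messages or as 1 large message.
many_small = total_time(100, 1_000, LATENCY, BANDWIDTH)
one_large = total_time(1, 100_000, LATENCY, BANDWIDTH)
```

Under these assumed parameters the single large message is cheaper because the latency is paid once instead of 100 times.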

Agglomeration Can Improve Performance

  • Eliminate communication between primitive tasks agglomerated into a consolidated task
  • Combine groups of sending and receiving tasks

Step 4: Mapping

Assigning Tasks to Processors

  • Maximize Processor Utilization
    • Ensure computation is evenly balanced across all processors

Optimal Mapping

  • Finding optimal mapping is NP-hard
  • Must rely on heuristics
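
One common heuristic, shown here only as an example and not as the method the slides have in mind, is a greedy rule: take tasks in order of decreasing cost and give each one to the processor with the smallest load so far. The sketch assumes task costs are known in advance and ignores communication between tasks; `greedy_map` is a hypothetical name.

```python
import heapq


def greedy_map(task_costs, num_processors):
    """Greedy static mapping: largest task first, least-loaded processor next.

    Assumptions: costs are known up front and communication is ignored.
    Returns (assignment, loads), where assignment maps task index to processor.
    """
    # Min-heap of (current load, processor id).
    heap = [(0, p) for p in range(num_processors)]
    heapq.heapify(heap)

    assignment = {}
    by_cost = sorted(enumerate(task_costs), key=lambda tc: -tc[1])
    for task, cost in by_cost:
        load, proc = heapq.heappop(heap)   # least-loaded processor
        assignment[task] = proc
        heapq.heappush(heap, (load + cost, proc))

    loads = [0] * num_processors
    for task, proc in assignment.items():
        loads[proc] += task_costs[task]
    return assignment, loads


assignment, loads = greedy_map([7, 5, 4, 3, 1], num_processors=2)
```

A greedy rule like this gives no guarantee of an optimal mapping; it is simply cheap to compute.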

Mapping Goals

  • Mappings based on one task per processor and on multiple tasks per processor have been considered
  • Both static and dynamic allocation of tasks to processors have been evaluated
  • If a dynamic allocation of tasks to processors is chosen, the Task allocator is not a bottleneck
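
Dynamic allocation can be sketched with a shared queue from which workers pull the next task whenever they become idle. This is a minimal illustration using Python threads, assuming independent tasks; the name `run_dynamic` is hypothetical. It shows the allocation pattern only, since threads in standard CPython do not by themselves speed up CPU-bound work.

```python
import queue
import threading


def run_dynamic(tasks, num_workers, work):
    """Run work(task) for every task, handing tasks out on demand.

    Assumption: tasks are independent and hashable. The shared queue plays
    the role of the task allocator.
    """
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                task = pending.get_nowait()  # ask the allocator for work
            except queue.Empty:
                return                       # nothing left to do
            value = work(task)
            with results_lock:
                results[task] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


squares = run_dynamic(range(20), num_workers=4, work=lambda x: x * x)
```

Because every worker contacts the same queue, very short tasks would make that queue a point of contention, which is the bottleneck the last goal above warns about.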

CASE STUDIES

  • Boundary value problem
  • Finding the maximum
  • The n-body problem
  • Adding data input