Lecture 7: Task Partitioning and
Mapping to Processes
Overview: Algorithms and
Concurrency
- Introduction to Parallel Algorithms
- Tasks and Decomposition
- Processes and Mapping
- Processes Versus Processors
- Decomposition Techniques
- Recursive Decomposition
- Exploratory Decomposition
- Hybrid Decomposition
- Characteristics of Tasks and Interactions
- Task Generation, Granularity, and Context
- Characteristics of Task Interactions
Preliminaries: Decomposition, Tasks, and Dependency Graphs
- The first step in developing a parallel algorithm is to
decompose the problem into tasks that can be executed concurrently.
- A given problem may be decomposed into tasks in
many different ways.
- Tasks may be of the same, different, or even indeterminate
sizes.
- A decomposition can be illustrated in the form of a
directed graph with nodes corresponding to tasks and edges indicating that the result of one task is required for processing the next. Such a graph is called a task dependency graph.
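A task dependency graph can be written down directly as a mapping from each task to the tasks whose results it needs. The sketch below is only an illustration: the task names t1 to t4 are hypothetical, and it assumes Python 3.9 or later for the standard-library graphlib module. It returns one execution order that respects every edge.

```python
from graphlib import TopologicalSorter

# Hypothetical task dependency graph: each key is a task, each value is the
# set of tasks whose results it needs before it can start.
deps = {
    "t1": set(),
    "t2": set(),
    "t3": {"t1", "t2"},
    "t4": {"t3"},
}

def valid_order(deps):
    """Return one execution order that respects every dependency edge."""
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(deps).static_order())

order = valid_order(deps)
print(order)
```

Tasks t1 and t2 have no edge between them, so they could run concurrently; t3 and t4 must follow in sequence.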
Example: Multiplying a Dense Matrix with a Vector
Computation of each element of output vector y is independent of other elements. Based on this, a dense matrix-vector product can be decomposed into n tasks. The figure highlights the portion of the matrix and vector accessed by Task 1.
Observations: While tasks share data (namely, the vector b), they do not have any control dependencies, i.e., no task needs to wait for the (partial) completion of any other. All tasks are of the same size in terms of number of operations. Is this the maximum number of tasks we could decompose this problem into?
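The decomposition into n tasks can be sketched as follows, with one task per element of y. The names row_task and matvec are hypothetical, and a thread pool is used only to make the independence of the tasks visible; in CPython, threads running pure-Python arithmetic would not be expected to give a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def row_task(row, b):
    """One task: the dot product of a single matrix row with the vector b."""
    return sum(a_ij * b_j for a_ij, b_j in zip(row, b))

def matvec(A, b, workers=4):
    """Compute y = A b with one task per output element."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Every task reads the shared vector b, but no task waits on another.
        return list(pool.map(lambda row: row_task(row, b), A))

A = [[1, 2],
     [3, 4]]
b = [1, 1]
y = matvec(A, b)
print(y)
```

Each task performs the same number of multiplications and additions, which matches the observation that all tasks are of the same size.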
Example: Database Query Processing
The execution of the query can be divided into subtasks in various
ways. Each task can be thought of as generating an intermediate
table of entries that satisfy a particular clause.
Decomposing the given query into a number of tasks.
Edges in this graph denote that the output of one task
is needed to accomplish the next.
Example: Database Query Processing
Note that the same problem can be decomposed into subtasks in other
ways as well.
An alternate decomposition of the given problem into
subtasks, along with their data dependencies.
Different task decompositions may lead to significant differences with
respect to their eventual parallel performance.
Degree of Concurrency
• The number of tasks that can be executed in parallel is the
degree of concurrency of a decomposition.
• Since the number of tasks that can be executed in parallel
may change over program execution, the maximum degree
of concurrency is the maximum number of such tasks at any
point during execution. What is the maximum degree of
concurrency of the database query examples?
• The average degree of concurrency is the average number
of tasks that can be processed in parallel over the execution
of the program. Assuming that each task in the database
example takes identical processing time, what is the
average degree of concurrency in each decomposition?
• The degree of concurrency increases as the decomposition
becomes finer in granularity and vice versa.
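One way to put numbers on these definitions is sketched below, under two stated assumptions: every task takes one time unit, and every task starts as soon as its inputs are ready. The graph is hypothetical (it is not the database query figure), and the helper names are made up for illustration. Under a different schedule the maximum could be larger.

```python
from collections import Counter

# Hypothetical dependency graph: task -> tasks it depends on.
deps = {
    "a": [], "b": [], "c": [], "d": [],
    "e": ["a", "b"],
    "f": ["c", "d"],
    "g": ["e", "f"],
}

def levels(deps):
    """Earliest time step of each unit-time task (0 for tasks with no inputs)."""
    memo = {}
    def level(t):
        if t not in memo:
            memo[t] = 1 + max((level(d) for d in deps[t]), default=-1)
        return memo[t]
    return {t: level(t) for t in deps}

def concurrency(deps):
    """Return (maximum, average) number of tasks running per time step."""
    width = Counter(levels(deps).values())   # tasks running at each step
    return max(width.values()), len(deps) / len(width)

max_deg, avg_deg = concurrency(deps)
print(max_deg, avg_deg)
```

Here four tasks run in the first step, two in the second, and one in the third, so the average is seven tasks over three steps.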
Critical Path Length
• A directed path in the task dependency graph
represents a sequence of tasks that must be
processed one after the other.
• The longest such path determines the shortest
time in which the program can be executed in
parallel.
• The length of the longest path in a task
dependency graph is called the critical path
length.
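The critical path length can be computed with a longest-path pass over the dependency graph. The sketch below assumes each task carries a cost and that the length of a path is the sum of the costs on it; the graph, the costs, and the function name are hypothetical.

```python
from functools import lru_cache

# Hypothetical graph (task -> tasks it depends on) and per-task costs.
deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["b"]}
cost = {"a": 2, "b": 3, "c": 4, "d": 1, "e": 2}

def critical_path_length(deps, cost):
    """Length of the longest directed path, measured as summed task cost."""
    @lru_cache(maxsize=None)
    def finish(t):
        # Earliest finish time of t: its own cost after its slowest input.
        return cost[t] + max((finish(d) for d in deps[t]), default=0)
    return max(finish(t) for t in deps)

cpl = critical_path_length(deps, cost)
print(cpl)
```

In this example the longest path is b, c, d, so no schedule can finish sooner than the sum of those three costs, however many processes are available.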
Limits on Parallel Performance
• It would appear that the parallel time can be
made arbitrarily small by making the
decomposition finer in granularity.
• There is, however, an inherent bound on how fine the
granularity of a computation can be. For example,
in the case of multiplying a dense matrix with a
vector, there can be no more than n^2 concurrent
tasks.
• Concurrent tasks may also have to exchange data
with other tasks. This results in communication
overhead. The tradeoff between the granularity
of a decomposition and associated overheads
often determines performance bounds.
Task Interaction Graphs
• Subtasks generally exchange data with others in a
decomposition. For example, even in the trivial
decomposition of the dense matrix-vector
product, if the vector is not replicated across all
tasks, they will have to communicate elements of
the vector.
• The graph of tasks (nodes) and their
interactions/data exchange (edges) is referred to
as a task interaction graph.
• Note that task interaction graphs represent data
dependencies, whereas task dependency graphs
represent control dependencies.
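A task interaction graph can be derived from the data each task needs. The sketch below assumes a matrix-vector product in which task i owns row i of the matrix and element b[i] of the vector, so task i must obtain b[j] from task j whenever A[i][j] is nonzero. This ownership scheme and the function name are assumptions made for illustration.

```python
def interaction_graph(A):
    """Undirected edges (i, j), i < j, between tasks that exchange data."""
    edges = set()
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):
            # Task i needs b[j], owned by task j, for every nonzero A[i][j].
            if a_ij != 0 and i != j:
                edges.add((min(i, j), max(i, j)))
    return edges

A = [[2, 1, 0],
     [0, 3, 0],
     [0, 4, 5]]
edges = interaction_graph(A)
print(sorted(edges))
```

For a dense matrix every pair of tasks interacts; for a sparse matrix the graph follows the nonzero pattern.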
Task Interaction Graphs, Granularity,
and Communication
In general, if the granularity of a decomposition is finer, the associated overhead (as a ratio of the useful work associated with a task) increases.
Example: Consider the sparse matrix-vector product example from the previous slide. Assume that each node takes unit time to process and each interaction (edge) causes an overhead of a unit time. Viewing node 0 as an independent task involves a useful computation of one time unit and overhead (communication) of three time units. Now, if we consider nodes 0, 4, and 8 as one task, then the task has useful computation totaling three time units and communication corresponding to four time units (four edges). The ratio of overhead to useful work drops from 3 to 4/3, so this is clearly a more favorable ratio than the former case.
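The arithmetic in this example can be checked with a small sketch. The edge set below is hypothetical: it is not the graph from the slide, and was chosen only so that node 0 has three edges and the group {0, 4, 8} has four edges leaving it, matching the counts quoted above.

```python
# Hypothetical interaction graph chosen to reproduce the counts in the text.
edges = {(0, 1), (0, 4), (0, 8), (4, 5), (4, 6), (8, 9)}

def overhead_ratio(task_nodes, edges):
    """Communication time divided by useful computation time for one task.

    Each node costs one unit of work. Each edge that leaves the task costs
    one unit of communication. Edges internal to the task cost nothing.
    """
    nodes = set(task_nodes)
    crossing = sum(1 for u, v in edges if (u in nodes) != (v in nodes))
    return crossing / len(nodes)

fine = overhead_ratio({0}, edges)          # node 0 as its own task
coarse = overhead_ratio({0, 4, 8}, edges)  # nodes 0, 4, and 8 as one task
print(fine, coarse)
```

Grouping the nodes turns the edges 0-4 and 0-8 into internal edges, which is why the coarser task pays less communication per unit of work.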
Processes and Mapping
• In general, the number of tasks in a
decomposition exceeds the number of
processing elements available.
• For this reason, a parallel algorithm must also
provide a mapping of tasks to processes.
Note: We refer to the mapping as being from tasks to processes, as opposed to processors. This is because typical programming APIs, as we shall see, do not allow easy binding of tasks to physical processors. Rather, we aggregate tasks into processes and rely on the system to map these processes to physical processors. We use the term process not in the UNIX sense, but simply to mean a collection of tasks and associated data.
Processes and Mapping
An appropriate mapping must minimize parallel execution
time by:
• Mapping independent tasks to different processes.
• Assigning tasks on the critical path to processes as soon as they
become available.
• Minimizing interaction between processes by mapping
tasks with dense interactions to the same process.
Note: These criteria often conflict with each other. For
example, a decomposition into one task (or no
decomposition at all) minimizes interaction but does not
result in a speedup at all! Can you think of other such
conflicting cases?
Processes and Mapping: Example
Mapping tasks in the database query
decomposition to processes. These mappings
were arrived at by viewing the dependency graph
in terms of levels (no two nodes in a level have
dependencies). Tasks within a single level are
then assigned to different processes.
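The level-based mapping described above can be sketched as follows. The graph is hypothetical (it is not the database query figure), the function name is made up, and assigning tasks round-robin within a level is one simple choice among many. The sketch ignores interaction costs between processes.

```python
from collections import defaultdict

# Hypothetical dependency graph: task -> tasks it depends on.
deps = {
    "a": [], "b": [], "c": [], "d": [],
    "e": ["a", "b"],
    "f": ["c", "d"],
    "g": ["e", "f"],
}

def map_by_levels(deps, num_processes):
    """Assign the tasks of each level to distinct processes, round-robin."""
    memo = {}
    def level(t):
        # Level 0 holds tasks with no inputs; others sit one past their inputs.
        if t not in memo:
            memo[t] = 1 + max((level(d) for d in deps[t]), default=-1)
        return memo[t]

    by_level = defaultdict(list)
    for t in sorted(deps):
        by_level[level(t)].append(t)

    mapping = {}
    for tasks in by_level.values():
        for i, t in enumerate(tasks):
            # Distinct processes as long as the level is no wider than
            # num_processes; otherwise processes are reused.
            mapping[t] = i % num_processes
    return mapping

mapping = map_by_levels(deps, 4)
print(mapping)
```

With four processes, the four tasks of the first level land on four different processes, and later levels reuse the lowest-numbered ones.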