Task Partitioning - Parallel Processing - Lecture Slides, Slides of Parallel Computing and Programming

Some concepts of Parallel Processing are Anatomy, Cache Access Time, Instruction Formats, Multidimensional Meshes, Network Processors, and Snooping Protocol. The main points of this lecture are: Task Partitioning, Mapping to Processes, Parallel Algorithms, Tasks and Decomposition, Processes and Mapping, Processes Versus Processors, Decomposition Techniques, Recursive Decomposition, Exploratory Decomposition, Hybrid Decomposition

Typology: Slides

2012/2013

Uploaded on 04/30/2013

devank

Lecture 7: Task Partitioning and
Mapping to Processes


Overview: Algorithms and Concurrency

  • Introduction to Parallel Algorithms
    • Tasks and Decomposition
    • Processes and Mapping
    • Processes Versus Processors
  • Decomposition Techniques
    • Recursive Decomposition
    • Exploratory Decomposition
    • Hybrid Decomposition
  • Characteristics of Tasks and Interactions
    • Task Generation, Granularity, and Context
    • Characteristics of Task Interactions.

Preliminaries: Decomposition, Tasks, and Dependency Graphs

  • The first step in developing a parallel algorithm is to decompose the problem into tasks that can be executed concurrently.

  • A given problem may be decomposed into tasks in many different ways.

  • Tasks may be of the same, different, or even indeterminate sizes.

  • A decomposition can be illustrated in the form of a directed graph with nodes corresponding to tasks and edges indicating that the result of one task is required for processing the next. Such a graph is called a task dependency graph.

Example: Multiplying a Dense Matrix with a Vector

Computation of each element of output vector y is independent of other elements. Based on this, a dense matrix-vector product can be decomposed into n tasks. The figure highlights the portion of the matrix and vector accessed by Task 1.

Observations: While tasks share data (namely, the vector b), they do not have any control dependencies, i.e., no task needs to wait for the (partial) completion of any other. All tasks are of the same size in terms of number of operations. Is this the maximum number of tasks we could decompose this problem into?
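The row-wise decomposition described above can be sketched in Python. This is a minimal illustration and not code from the lecture: the names `row_task` and `matvec`, the thread pool, and the sample data are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def row_task(row, b):
    """One task: the dot product of a single matrix row with the vector b."""
    return sum(a_ij * b_j for a_ij, b_j in zip(row, b))


def matvec(A, b, max_workers=4):
    """Compute y = A b by submitting one independent task per output element."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The n tasks all read the shared vector b, but none waits on another.
        return list(pool.map(lambda row: row_task(row, b), A))


# Hypothetical 3 x 3 input, chosen only for illustration.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
b = [1, 0, 2]
print(matvec(A, b))  # [7, 16, 25]
```

The thread pool is used here only to make the task structure visible: each row is a separate unit of work with no ordering constraint. In standard CPython, threads will typically not speed up pure-Python arithmetic like this.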

Example: Database Query Processing

The execution of the query can be divided into subtasks in various ways. Each task can be thought of as generating an intermediate table of entries that satisfy a particular clause.

Decomposing the given query into a number of tasks. Edges in this graph denote that the output of one task is needed to accomplish the next.

Example: Database Query Processing

Note that the same problem can be decomposed into subtasks in other ways as well.

An alternate decomposition of the given problem into subtasks, along with their data dependencies.

Different task decompositions may lead to significant differences with respect to their eventual parallel performance.

Degree of Concurrency

• The number of tasks that can be executed in parallel is the degree of concurrency of a decomposition.

• Since the number of tasks that can be executed in parallel may change over program execution, the maximum degree of concurrency is the maximum number of such tasks at any point during execution. What is the maximum degree of concurrency of the database query examples?

• The average degree of concurrency is the average number of tasks that can be processed in parallel over the execution of the program. Assuming that each task in the database example takes identical processing time, what is the average degree of concurrency in each decomposition?

• The degree of concurrency increases as the decomposition becomes finer in granularity and vice versa.

Critical Path Length

• A directed path in the task dependency graph represents a sequence of tasks that must be processed one after the other.

• The longest such path determines the shortest time in which the program can be executed in parallel.

• The length of the longest path in a task dependency graph is called the critical path length.
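As a rough sketch of these definitions, the critical path length of a task dependency graph can be computed by a longest-path traversal. The average degree of concurrency is commonly taken as total work divided by critical path length. The graph `deps`, the unit task times in `work`, and the function names below are hypothetical and do not reproduce the figures from the slides.

```python
from functools import lru_cache

# Hypothetical task dependency graph: task -> tasks whose results it needs.
deps = {
    "T1": [], "T2": [], "T3": [], "T4": [],
    "T5": ["T1", "T2"],
    "T6": ["T3", "T4"],
    "T7": ["T5", "T6"],
}
# Assumed processing time of each task (one unit each).
work = {task: 1 for task in deps}


def critical_path_length(deps, work):
    """Return the largest total work along any directed path in the graph."""
    @lru_cache(maxsize=None)
    def finish(task):
        # Earliest finish time: own work plus the slowest prerequisite.
        return work[task] + max((finish(d) for d in deps[task]), default=0)

    return max(finish(task) for task in deps)


def average_degree_of_concurrency(deps, work):
    """Total work divided by the critical path length."""
    return sum(work.values()) / critical_path_length(deps, work)


print(critical_path_length(deps, work))           # 3
print(average_degree_of_concurrency(deps, work))  # 7 / 3, about 2.33
```

With unit task times, this graph runs four tasks, then two, then one, so seven units of work are spread over three time steps.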

Limits on Parallel Performance

• It would appear that the parallel time can be made arbitrarily small by making the decomposition finer in granularity.

• There is an inherent bound on how fine the granularity of a computation can be. For example, in the case of multiplying a dense matrix with a vector, there can be no more than n^2 concurrent tasks.

• Concurrent tasks may also have to exchange data with other tasks. This results in communication overhead. The tradeoff between the granularity of a decomposition and associated overheads often determines performance bounds.

Task Interaction Graphs

• Subtasks generally exchange data with others in a decomposition. For example, even in the trivial decomposition of the dense matrix-vector product, if the vector is not replicated across all tasks, they will have to communicate elements of the vector.

• The graph of tasks (nodes) and their interactions/data exchange (edges) is referred to as a task interaction graph.

• Note that task interaction graphs represent data dependencies, whereas task dependency graphs represent control dependencies.

Task Interaction Graphs, Granularity, and Communication

In general, if the granularity of a decomposition is finer, the associated overhead (as a ratio of the useful work associated with a task) increases.

Example: Consider the sparse matrix-vector product example from the previous foil. Assume that each node takes unit time to process and each interaction (edge) causes an overhead of a unit time. Viewing node 0 as an independent task involves a useful computation of one time unit and overhead (communication) of three time units. Now, if we consider nodes 0, 4, and 8 as one task, then the task has useful computation totaling three time units and communication corresponding to four time units (four edges). Clearly, this is a more favorable ratio than the former case.
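The arithmetic in this example can be checked with a short sketch. The edge list below is a hypothetical interaction graph, built only so that node 0 has three neighbours and the group {0, 4, 8} has four boundary edges. It is not the sparse matrix from the original figure, and `task_cost` is an illustrative name.

```python
# Hypothetical task interaction graph (undirected edges between nodes).
edges = [(0, 1), (0, 4), (0, 8), (4, 5), (4, 8), (8, 9), (8, 11)]


def task_cost(group, edges, work_per_node=1, cost_per_edge=1):
    """Return (useful work, communication overhead) for one aggregated task."""
    group = set(group)
    useful = work_per_node * len(group)
    # Only edges crossing the group boundary cost communication;
    # edges between two nodes of the same task are internal and free.
    crossing = sum((u in group) != (v in group) for u, v in edges)
    return useful, cost_per_edge * crossing


print(task_cost({0}, edges))        # (1, 3)
print(task_cost({0, 4, 8}, edges))  # (3, 4)
```

Aggregating the three nodes turns the edges among them into internal data accesses, so overhead per unit of useful work drops from 3 to 4/3 under these assumptions.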

Processes and Mapping

• In general, the number of tasks in a decomposition exceeds the number of processing elements available.

• For this reason, a parallel algorithm must also provide a mapping of tasks to processes.

Note: We refer to the mapping as being from tasks to processes, as opposed to processors. This is because typical programming APIs, as we shall see, do not allow easy binding of tasks to physical processors. Rather, we aggregate tasks into processes and rely on the system to map these processes to physical processors. We use the term process not in the UNIX sense, but simply to mean a collection of tasks and associated data.

Processes and Mapping

An appropriate mapping must minimize parallel execution time by:

• Mapping independent tasks to different processes.

• Assigning tasks on critical path to processes as soon as they become available.

• Minimizing interaction between processes by mapping tasks with dense interactions to the same process.

Note: These criteria often conflict with each other. For example, a decomposition into one task (or no decomposition at all) minimizes interaction but does not result in a speedup at all! Can you think of other such conflicting cases?

Processes and Mapping: Example

Mapping tasks in the database query decomposition to processes. These mappings were arrived at by viewing the dependency graph in terms of levels (no two nodes in a level have dependencies). Tasks within a single level are then assigned to different processes.
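A level-based mapping of this kind can be sketched as follows. The seven-task graph, the round-robin assignment within a level, and the names `levels` and `map_by_level` are assumptions made for illustration. The mappings in the original figure may have been chosen differently.

```python
# Hypothetical task dependency graph: task -> tasks whose results it needs.
deps = {
    "T1": [], "T2": [], "T3": [], "T4": [],
    "T5": ["T1", "T2"],
    "T6": ["T3", "T4"],
    "T7": ["T5", "T6"],
}


def levels(deps):
    """Group tasks into levels; a task sits one level above its deepest dependency."""
    level = {}

    def level_of(task):
        if task not in level:
            level[task] = 1 + max((level_of(d) for d in deps[task]), default=-1)
        return level[task]

    by_level = {}
    for task in deps:
        by_level.setdefault(level_of(task), []).append(task)
    return [by_level[k] for k in sorted(by_level)]


def map_by_level(deps, num_processes):
    """Assign the tasks of each level to processes in round-robin order."""
    mapping = {}
    for tasks in levels(deps):
        for i, task in enumerate(tasks):
            # Tasks in one level are mutually independent, so spread them out.
            mapping[task] = i % num_processes
    return mapping


print(levels(deps))
# [['T1', 'T2', 'T3', 'T4'], ['T5', 'T6'], ['T7']]
print(map_by_level(deps, 4))
# {'T1': 0, 'T2': 1, 'T3': 2, 'T4': 3, 'T5': 0, 'T6': 1, 'T7': 0}
```

Round-robin placement only addresses the first criterion, spreading independent tasks across processes. It ignores interaction: here T6 lands on process 1 even though its inputs were produced on processes 2 and 3, so a mapping that also weighs data exchange could do better.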