Task Partitioning - Parallel Processing - Lecture Slides, Slides of Parallel Computing and Programming

Aliah University Parallel Computing and Programming

Some concepts of Parallel Processing are Anatomy, Cache Access Time, Instruction Formats, Multidimensional Meshes, Network Processors, and Snooping Protocol. The main points of this lecture are: Task Partitioning, Mapping to Processes, Parallel Algorithms, Tasks and Decomposition, Processes and Mapping, Processes Versus Processors, Decomposition Techniques, Recursive Decomposition, Exploratory Decomposition, Hybrid Decomposition.

Typology: Slides

2012/2013

Uploaded on 04/30/2013

devank 🇮🇳

Lecture 7: Task Partitioning and Mapping to Processes

Overview: Algorithms and Concurrency

Introduction to Parallel Algorithms
- Tasks and Decomposition
- Processes and Mapping
- Processes Versus Processors
Decomposition Techniques
- Recursive Decomposition
- Exploratory Decomposition
- Hybrid Decomposition
Characteristics of Tasks and Interactions
- Task Generation, Granularity, and Context
- Characteristics of Task Interactions.

Preliminaries: Decomposition, Tasks, and Dependency Graphs

The first step in developing a parallel algorithm is to decompose the problem into tasks that can be executed concurrently.

A given problem may be decomposed into tasks in many different ways.

Tasks may be of the same, different, or even indeterminate sizes.

A decomposition can be illustrated in the form of a directed graph with nodes corresponding to tasks and edges indicating that the result of one task is required for processing the next. Such a graph is called a task dependency graph.

Example: Multiplying a Dense Matrix with a Vector

Computation of each element of output vector y is independent of other elements. Based on this, a dense matrix-vector product can be decomposed into n tasks. The figure highlights the portion of the matrix and vector accessed by Task 1.

Observations: While tasks share data (namely, the vector b), they do not have any control dependencies, i.e., no task needs to wait for the (partial) completion of any other. All tasks are of the same size in terms of number of operations. Is this the maximum number of tasks we could decompose this problem into?
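The row-wise decomposition described above can be sketched in a few lines. This is only an illustration under assumptions: the names row_task and matvec_by_rows, the thread pool, and the 3 x 3 sample data are hypothetical and do not come from the slides.

```python
from concurrent.futures import ThreadPoolExecutor


def row_task(row, b):
    """One task: the dot product of a single matrix row with the vector b."""
    return sum(a_ij * b_j for a_ij, b_j in zip(row, b))


def matvec_by_rows(A, b, max_workers=4):
    """Compute y = A b with one independent task per element of y."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Every task reads the shared vector b but produces only its own y[i],
        # so no task has to wait for another.
        return list(pool.map(lambda row: row_task(row, b), A))


# Hypothetical sample data
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
b = [1, 0, 2]
y = matvec_by_rows(A, b)
print(y)
```

Each of the n tasks performs the same number of multiplications and additions, which matches the observation that all tasks are of the same size.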

Example: Database Query Processing

The execution of the query can be divided into subtasks in various ways. Each task can be thought of as generating an intermediate table of entries that satisfy a particular clause.

Decomposing the given query into a number of tasks. Edges in this graph denote that the output of one task is needed to accomplish the next.

Example: Database Query Processing

Note that the same problem can be decomposed into subtasks in other ways as well.

An alternate decomposition of the given problem into subtasks, along with their data dependencies.

Different task decompositions may lead to significant differences with respect to their eventual parallel performance.

Degree of Concurrency

• The number of tasks that can be executed in parallel is the degree of concurrency of a decomposition.

• Since the number of tasks that can be executed in parallel may change over program execution, the maximum degree of concurrency is the maximum number of such tasks at any point during execution. What is the maximum degree of concurrency of the database query examples?

• The average degree of concurrency is the average number of tasks that can be processed in parallel over the execution of the program. Assuming that each task in the database example takes identical processing time, what is the average degree of concurrency in each decomposition?

• The degree of concurrency increases as the decomposition becomes finer in granularity and vice versa.
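To make the two measures concrete, the sketch below computes them for a small task dependency graph. The graph is hypothetical (it is not the database query from the slides), every task is assumed to take one time unit, and each task is assumed to start as soon as its predecessors finish.

```python
from collections import Counter

# Hypothetical task dependency graph: task -> tasks whose results it needs.
deps = {
    "t1": [], "t2": [], "t3": [], "t4": [],
    "t5": ["t1", "t2"],
    "t6": ["t3", "t4"],
    "t7": ["t5", "t6"],
}


def earliest_steps(deps):
    """Earliest time step at which each unit-time task can run."""
    step = {}

    def visit(task):
        if task not in step:
            step[task] = 1 + max((visit(d) for d in deps[task]), default=0)
        return step[task]

    for task in deps:
        visit(task)
    return step


step = earliest_steps(deps)
profile = Counter(step.values())             # time step -> tasks running in it
max_degree = max(profile.values())           # maximum degree of concurrency
avg_degree = len(deps) / max(step.values())  # tasks per step, on average

print(dict(profile), max_degree, avg_degree)
```

For this graph, four tasks run in the first step, two in the second, and one in the third, so the maximum degree is 4 while the average is 7/3.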

Critical Path Length

• A directed path in the task dependency graph represents a sequence of tasks that must be processed one after the other.

• The longest such path determines the shortest time in which the program can be executed in parallel.

• The length of the longest path in a task dependency graph is called the critical path length.
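A minimal sketch of the critical path computation follows. The task names, work values, and dependencies are hypothetical and chosen only for illustration. The last line divides total work by the critical path length, which is one common way to obtain the average degree of concurrency when tasks differ in size.

```python
from functools import lru_cache

# Hypothetical graph: task -> (work in time units, tasks it depends on)
tasks = {
    "a": (10, []),
    "b": (10, []),
    "c": (10, []),
    "d": (10, []),
    "e": (6, ["a", "b"]),
    "f": (9, ["c", "d"]),
    "g": (8, ["e", "f"]),
}


@lru_cache(maxsize=None)
def finish_time(task):
    """Length of the longest dependency chain that ends with this task."""
    work, deps = tasks[task]
    return work + max((finish_time(d) for d in deps), default=0)


critical_path_length = max(finish_time(t) for t in tasks)
total_work = sum(work for work, _ in tasks.values())
average_concurrency = total_work / critical_path_length

print(critical_path_length, total_work, average_concurrency)
```

Here the longest chain is c (or d), then f, then g, so no schedule can finish in fewer than 27 time units regardless of how many processes are available.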

Limits on Parallel Performance

• It would appear that the parallel time can be made arbitrarily small by making the decomposition finer in granularity.

• There is an inherent bound on how fine the granularity of a computation can be. For example, in the case of multiplying a dense matrix with a vector, there can be no more than n^2 concurrent tasks.

• Concurrent tasks may also have to exchange data with other tasks. This results in communication overhead. The tradeoff between the granularity of a decomposition and associated overheads often determines performance bounds.
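The n^2 bound for the dense matrix-vector product can be seen by treating every scalar multiplication as its own task. The sketch below is an illustration only: the function names and the 2 x 2 sample data are hypothetical, and the tasks are evaluated sequentially to keep the example short.

```python
def fine_grained_tasks(A, b):
    """One task per scalar product A[i][j] * b[j]: n * n tasks in total."""
    n = len(A)
    return {(i, j): A[i][j] * b[j] for i in range(n) for j in range(n)}


def combine(products, n):
    """Sum the partial products of each row to form y.

    This step is where the fine-grained tasks must exchange data.
    """
    return [sum(products[(i, j)] for j in range(n)) for i in range(n)]


# Hypothetical sample data
A = [[1, 2],
     [3, 4]]
b = [5, 6]
products = fine_grained_tasks(A, b)
y = combine(products, len(A))

print(len(products), y)
```

The decomposition cannot be made finer than one multiplication per task, and the extra combine step shows how finer granularity brings additional data exchange with it.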

Task Interaction Graphs

• Subtasks generally exchange data with others in a decomposition. For example, even in the trivial decomposition of the dense matrix-vector product, if the vector is not replicated across all tasks, they will have to communicate elements of the vector.

• The graph of tasks (nodes) and their interactions/data exchange (edges) is referred to as a task interaction graph.

• Note that task interaction graphs represent data dependencies, whereas task dependency graphs represent control dependencies.

Task Interaction Graphs, Granularity, and Communication

In general, if the granularity of a decomposition is finer, the associated overhead (as a ratio of the useful work associated with a task) increases.

Example: Consider the sparse matrix-vector product example from the previous slide. Assume that each node takes unit time to process and each interaction (edge) causes an overhead of a unit time. Viewing node 0 as an independent task involves a useful computation of one time unit and an overhead (communication) of three time units. Now, if we consider nodes 0, 4, and 8 as one task, then the task has useful computation totaling three time units and communication corresponding to four time units (four edges). Clearly, this is a more favorable ratio than the former case.
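The arithmetic in this example can be checked with a short sketch. The edge set below is hypothetical: the actual interaction graph is not reproduced in this text, so the edges were chosen only so that node 0 has three neighbors and the group {0, 4, 8} has four edges leaving it.

```python
# Hypothetical task interaction graph (undirected edges between nodes).
edges = {(0, 1), (0, 4), (0, 8), (4, 5), (4, 8), (8, 9), (8, 10)}


def work_and_overhead(task_nodes, edges):
    """Unit work per node; unit overhead per edge that leaves the task."""
    nodes = set(task_nodes)
    work = len(nodes)
    overhead = sum(1 for u, v in edges if (u in nodes) != (v in nodes))
    return work, overhead


single = work_and_overhead({0}, edges)
grouped = work_and_overhead({0, 4, 8}, edges)

print(single, grouped)
```

Edges between nodes 0, 4, and 8 become internal once the nodes are merged into one task, which is why the overhead grows more slowly than the useful work.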

Processes and Mapping

• In general, the number of tasks in a decomposition exceeds the number of processing elements available.

• For this reason, a parallel algorithm must also provide a mapping of tasks to processes.

Note: We refer to the mapping as being from tasks to processes, as opposed to processors. This is because typical programming APIs, as we shall see, do not allow easy binding of tasks to physical processors. Rather, we aggregate tasks into processes and rely on the system to map these processes to physical processors. We use the term process not in the UNIX sense, but simply to mean a collection of tasks and associated data.
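As an illustration of what a mapping looks like when tasks outnumber processes, the sketch below assigns numbered tasks to processes in two simple ways. Both schemes and their names (block_mapping, cyclic_mapping) are assumptions introduced here for illustration; the slides do not prescribe a particular mapping at this point.

```python
def block_mapping(num_tasks, num_processes):
    """Give each process a contiguous run of task ids, sizes differing by at most one."""
    base, extra = divmod(num_tasks, num_processes)
    mapping, start = {}, 0
    for p in range(num_processes):
        size = base + (1 if p < extra else 0)
        mapping[p] = list(range(start, start + size))
        start += size
    return mapping


def cyclic_mapping(num_tasks, num_processes):
    """Deal task ids out to processes in round-robin order."""
    return {p: list(range(p, num_tasks, num_processes))
            for p in range(num_processes)}


block = block_mapping(10, 3)
cyclic = cyclic_mapping(10, 3)

print(block)
print(cyclic)
```

In both cases each process is simply a collection of tasks, in line with the note above; which physical processor runs a given process is left to the system.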
