Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Performance Metrics - Parallel Processing - Lecture Slides, Slides of Parallel Computing and Programming

Aliah University Parallel Computing and Programming

Some concept of Parallel Processing are Anatomy, Cache Access Time, Instruction Formats, Instruction Formats, Instruction Formats, Multidimensional Meshes, Network Processors, Snooping Protocol. Main points of this lecture are: Performance Metrics, Sources of Overhead in Parallel Programs, Performance Metrics For Parallel Systems, Effect of Granularity on Performance, Scalability of Parallel Systems, Minimum Execution, Time and Minimum, Execution Time, Asymptotic Analysis of Parallel Programs, S

Typology: Slides

2012/2013

Uploaded on 04/30/2013

devank 🇮🇳

4.3

(12)

152 documents

1 / 56

This page cannot be seen from the preview

Don't miss anything!

Lecture 10: Performance Metrics

Docsity.com

Discover Slides of Parallel Computing and Programming Aliah University

Partial preview of the text

Download Performance Metrics - Parallel Processing - Lecture Slides and more Slides Parallel Computing and Programming in PDF only on Docsity!

Lecture 10: Performance Metrics

Topic Overview

Sources of Overhead in Parallel Programs
Performance Metrics for Parallel Systems
Effect of Granularity on Performance
Scalability of Parallel Systems
Minimum Execution Time and Minimum Cost-Optimal Execution Time
Asymptotic Analysis of Parallel Programs
Other Scalability Metrics

Analytical Modeling - Basics

A number of performance measures are intuitive.
Wall clock time - the time from the start of the first processor to the stopping time of the last processor in a parallel ensemble. But how does this scale when the number of processors is changed of the program is ported to another machine altogether?
How much faster is the parallel version? This begs the obvious followup question - whats the baseline serial version with which we compare? Can we use a suboptimal serial program to make our parallel program look
Raw FLOP count - What good are FLOP counts when they dont solve a problem?

Sources of Overhead in Parallel Programs

If I use two processors, shouldnt my program run twice as fast?
No - a number of overheads, including wasted computation, communication, idling, and contention cause degradation in performance.

The execution profile of a hypothetical parallel program executing on eight processing elements. Profile indicates times spent performing computation (both essential and excess), communication, and idling.

Performance Metrics for Parallel

Systems: Execution

Time

Serial runtime of a program is the time

elapsed between the beginning and the end

of its execution on a sequential computer.

The parallel runtime is the time that elapses

from the moment the first processor starts to

the moment the last processor finishes

execution.

We denote the serial runtime by and the

parallel runtime by T P.

Performance Metrics for Parallel

Systems: Total Parallel Overhead

Let Tall be the total time collectively spent by all the processing elements.
TS is the serial time.
Observe that Tall - TS is then the total time spend by all processors combined in non-useful work. This is called the total overhead.
The total time collectively spent by all the processing elements Tall = p TP ( p is the number of processors).
The overhead function ( To ) is therefore given by

To = p TP - TS (1)

Performance Metrics: Example

Consider the problem of adding n numbers by

using n processing elements.

If n is a power of two, we can perform this

operation in log n steps by propagating partial

sums up a logical binary tree of processors.

Performance Metrics: Example

Computing the globalsum of 16 partial sums using

16 processing elements. Σji denotes the sum of

numbers with consecutive labels from i to j.

Performance Metrics: Speedup

For a given problem, there might be many serial

algorithms available. These algorithms may have

different asymptotic runtimes and may be

parallelizable to different degrees

For the purpose of computing speedup, we

always consider the best sequential program as

the baseline

For the purpose of determining how effective our

parallelization technique is, we can determine a

pseudo-speedup w.r.t. the sequential version of

the parallel algorithm Docsity.com

Performance Metrics: Speedup

Consider the problem of parallel bubble sort.Example
The serial time for bubblesort is 150 seconds.
The parallel time for odd-even sort (efficient parallelization of bubble sort) is 40 seconds.
The speedup would appear to be 150/40 = 3.75. This is actually a pseudo-speedup
But is this really a fair assessment of the system?
What if serial quicksort only took 30 seconds? In this case, the speedup is 30/40 = 0.75. This is a more realistic assessment of the system.

Performance Metrics: Superlinear Speedups

One reason for superlinearity is that the parallel

version does less work than corresponding serial algorithm.

Searching an unstructured tree for a node with a given label, `S', on two processing elements using depth-first traversal. The two-processor version with processor 0 searching the left subtree and processor 1 searching the right subtree expands only the shaded nodes before the solution is found. The corresponding serial formulation expands the entire tree. It is clear that the serial algorithm does more work than the parallel algorithm.

Performance Metrics - Parallel Processing - Lecture Slides, Slides of Parallel Computing and Programming

Related documents

Partial preview of the text

Download Performance Metrics - Parallel Processing - Lecture Slides and more Slides Parallel Computing and Programming in PDF only on Docsity!

Lecture 10: Performance Metrics

Topic Overview

Analytical Modeling - Basics

Sources of Overhead in Parallel Programs

elapsed between the beginning and the end

of its execution on a sequential computer.

from the moment the first processor starts to

the moment the last processor finishes

execution.

parallel runtime by T P.

Performance Metrics: Example

using n processing elements.

operation in log n steps by propagating partial

sums up a logical binary tree of processors.

Performance Metrics: Example

Performance Metrics: Speedup

Performance Metrics: Superlinear Speedups

Performance Metrics: Superlinear Speedups

Example: A processor with 64KB of cache

yields an 80% hit ratio. If two processors are

used, since the problem size/processor is

smaller, the hit ratio goes up to 90%. Of the

remaining 10% access, 8% come from local

memory and 2% from remote memory.

Performance Metrics: Efficiency

for which a processing element is usefully

employed

can be as low as 0 and as high as 1. Docsity.com