Performance Metrics - Parallel Processing - Lecture Slides, Slides of Parallel Computing and Programming

Some concepts of Parallel Processing are Anatomy, Cache Access Time, Instruction Formats, Multidimensional Meshes, Network Processors, Snooping Protocol. Main points of this lecture are: Performance Metrics, Sources of Overhead in Parallel Programs, Performance Metrics for Parallel Systems, Effect of Granularity on Performance, Scalability of Parallel Systems, Minimum Execution Time and Minimum Cost-Optimal Execution Time, Asymptotic Analysis of Parallel Programs, S

Typology: Slides

2012/2013

Uploaded on 04/30/2013

devank
Lecture 10: Performance Metrics

Topic Overview

  • Sources of Overhead in Parallel Programs
  • Performance Metrics for Parallel Systems
  • Effect of Granularity on Performance
  • Scalability of Parallel Systems
  • Minimum Execution Time and Minimum Cost-Optimal Execution Time
  • Asymptotic Analysis of Parallel Programs
  • Other Scalability Metrics

Analytical Modeling - Basics

  • A number of performance measures are intuitive.
  • Wall clock time - the time from the start of the first processor to the stopping time of the last processor in a parallel ensemble. But how does this scale when the number of processors is changed or the program is ported to another machine altogether?
  • How much faster is the parallel version? This raises the obvious follow-up question - what is the baseline serial version with which we compare? Can we use a suboptimal serial program to make our parallel program look good?
  • Raw FLOP count - What good are FLOP counts when they don't solve a problem?

Sources of Overhead in Parallel Programs

  • If I use two processors, shouldn't my program run twice as fast?
  • No - a number of overheads, including wasted computation, communication, idling, and contention cause degradation in performance.

The execution profile of a hypothetical parallel program executing on eight processing elements. Profile indicates times spent performing computation (both essential and excess), communication, and idling.

Performance Metrics for Parallel Systems: Execution Time

  • The serial runtime of a program is the time elapsed between the beginning and the end of its execution on a sequential computer.
  • The parallel runtime is the time that elapses from the moment the first processor starts to the moment the last processor finishes execution.
  • We denote the serial runtime by TS and the parallel runtime by TP.

Performance Metrics for Parallel Systems: Total Parallel Overhead

  • Let Tall be the total time collectively spent by all the processing elements.
  • TS is the serial time.
  • Observe that Tall - TS is then the total time spent by all processors combined in non-useful work. This is called the total overhead.
  • The total time collectively spent by all the processing elements is Tall = p TP (p is the number of processors).
  • The overhead function (To) is therefore given by

To = p TP - TS (1)
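
Equation (1) can be checked numerically. The following is a minimal sketch; the function name `total_overhead` and the sample timings are illustrative assumptions, not values from the lecture.

```python
def total_overhead(p: int, t_parallel: float, t_serial: float) -> float:
    """Return To = p * TP - TS, the total overhead of equation (1)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return p * t_parallel - t_serial


# Hypothetical timings: a 100 s serial job that takes 30 s on 4 processors.
# The 4 processors spend 120 s in total, so 20 s is overhead.
print(total_overhead(p=4, t_parallel=30.0, t_serial=100.0))  # 20.0
```

An overhead of zero would mean that every processor spent all of its time on useful work.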

Performance Metrics: Example

  • Consider the problem of adding n numbers by using n processing elements.
  • If n is a power of two, we can perform this operation in log n steps by propagating partial sums up a logical binary tree of processors.

Performance Metrics: Example

Computing the global sum of 16 partial sums using 16 processing elements. Σ_i^j denotes the sum of numbers with consecutive labels from i to j.
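
The tree-based summation can be simulated serially to confirm the step count. This is a sketch under simplifying assumptions: the function `tree_sum` is hypothetical, each pass of the outer loop stands for one parallel step, and communication cost is ignored.

```python
def tree_sum(values):
    """Sum `values` by pairwise combination, returning (total, steps).

    Each pass models one parallel step: the partial sum at index
    i + stride is added into index i for every i that is a multiple
    of 2 * stride.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    partial = list(values)
    steps = 0
    stride = 1
    while stride < n:
        # The additions in this inner loop are independent of each other,
        # so separate processing elements could perform them at once.
        for i in range(0, n, 2 * stride):
            partial[i] += partial[i + stride]
        stride *= 2
        steps += 1
    return partial[0], steps


total, steps = tree_sum(range(16))
print(total, steps)  # 120 4
```

For 16 inputs the simulation takes 4 steps, which matches log n for n = 16 (logarithm base 2).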

Performance Metrics: Speedup

  • For a given problem, there might be many serial algorithms available. These algorithms may have different asymptotic runtimes and may be parallelizable to different degrees.
  • For the purpose of computing speedup, we always consider the best sequential program as the baseline.
  • For the purpose of determining how effective our parallelization technique is, we can determine a pseudo-speedup w.r.t. the sequential version of the parallel algorithm.

Performance Metrics: Speedup

  • Example: consider the problem of parallel bubble sort.
  • The serial time for bubble sort is 150 seconds.
  • The parallel time for odd-even sort (an efficient parallelization of bubble sort) is 40 seconds.
  • The speedup would appear to be 150/40 = 3.75. This is actually a pseudo-speedup.
  • But is this really a fair assessment of the system?
  • What if serial quicksort only took 30 seconds? In this case, the speedup is 30/40 = 0.75. This is a more realistic assessment of the system.
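
The two ratios in this example can be reproduced directly. The sketch below uses the timings quoted above; the function and constant names are illustrative assumptions.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Ratio of a serial runtime to a parallel runtime."""
    return t_serial / t_parallel


T_BUBBLE_SERIAL = 150.0      # serial bubble sort, seconds
T_ODD_EVEN_PARALLEL = 40.0   # parallel odd-even sort, seconds
T_QUICKSORT_SERIAL = 30.0    # best serial baseline in the example, seconds

# Baseline is the serial version of the same algorithm: pseudo-speedup.
pseudo = speedup(T_BUBBLE_SERIAL, T_ODD_EVEN_PARALLEL)
# Baseline is the best sequential program: the speedup proper.
actual = speedup(T_QUICKSORT_SERIAL, T_ODD_EVEN_PARALLEL)
print(pseudo, actual)  # 3.75 0.75
```

A value below 1 means the parallel program is slower than the best serial program.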

Performance Metrics: Superlinear Speedups

One reason for superlinearity is that the parallel version does less work than the corresponding serial algorithm.

Searching an unstructured tree for a node with a given label, 'S', on two processing elements using depth-first traversal. The two-processor version with processor 0 searching the left subtree and processor 1 searching the right subtree expands only the shaded nodes before the solution is found. The corresponding serial formulation expands the entire tree. It is clear that the serial algorithm does more work than the parallel algorithm.
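
The effect can be illustrated by counting node expansions on a small tree. This is a sketch only: the tree below is hypothetical (it is not the tree from the figure), one node expansion is assumed to cost one time unit, and the parallel count ignores the cost of splitting the work at the root.

```python
def dfs_order(tree, root):
    """Return the nodes in the order a depth-first traversal expands them."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the leftmost child is expanded first.
        stack.extend(reversed(tree.get(node, [])))
    return order


def expansions_until(order, goal):
    """Number of nodes expanded up to and including `goal`, or None."""
    return order.index(goal) + 1 if goal in order else None


# Hypothetical tree: the goal 'S' is the first child of the right subtree.
tree = {
    "root": ["L", "R"],
    "L": ["L1", "L2"], "L1": ["a", "b"], "L2": ["c", "d"],
    "R": ["S", "R2"],
}

# Serial search walks the whole left subtree before it reaches 'S'.
serial = expansions_until(dfs_order(tree, "root"), "S")
# With two processing elements, the one assigned to subtree 'R' finds 'S'.
parallel = expansions_until(dfs_order(tree, "R"), "S")
print(serial, parallel, serial / parallel)  # 10 2 5.0
```

Under these assumptions two processing elements give a ratio of 5, more than the number of processing elements, because the parallel search never visits most of the left subtree.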

Performance Metrics: Superlinear Speedups

Example: A processor with 64 KB of cache yields an 80% hit ratio. If two processors are used, since the problem size/processor is smaller, the hit ratio goes up to 90%. Of the remaining 10% of accesses, 8% come from local memory and 2% from remote memory.

Performance Metrics: Efficiency

  • Efficiency is a measure of the fraction of time for which a processing element is usefully employed.
  • Mathematically, it is given by

E = S / p = TS / (p TP) (2)

where S is the speedup and p is the number of processing elements.

  • Following the bounds on speedup, efficiency can be as low as 0 and as high as 1.
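
As a worked instance, efficiency can be computed for the earlier example of adding n numbers on n processing elements. This sketch assumes unit-cost additions, n - 1 additions for the serial sum, log n parallel steps, and no communication cost; the function name `efficiency` is illustrative.

```python
import math


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Return E = S / p, where S = TS / TP is the speedup."""
    return (t_serial / t_parallel) / p


n = 16
t_serial = n - 1             # assumed: n - 1 additions on one processor
t_parallel = math.log2(n)    # assumed: log n steps on n processors
print(efficiency(t_serial, t_parallel, p=n))  # 0.234375
```

Under these assumptions most processing elements sit idle after the first step, so the efficiency is well below 1 even though the speedup is 3.75.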