Performance Metrics for Parallel Programs-Parallel Computing-Lecture Slides, Slides of Parallel Computing and Programming

This lecture was delivered by Dr. Hanif Durad at the Pakistan Institute of Engineering and Applied Sciences, Islamabad (PIEAS), for the Parallel Computing course. It includes: Performance, Metrics, Parallel, Programs, Timing, Wall, User, CPU, Runtime, MPI, Platform, Independent

Typology: Slides

2011/2012

Uploaded on 07/19/2012

Dr. Hanif Durad
Lecture Outline - Part 1
- Timing
  - wall time
  - user time
  - system time
- Measuring time
- Using gprof program
PC-2.pdf


Timing

- In order to parallelize a program/algorithm, we need to know which parts of the program need the most computation time.
- Three different time spans are to be considered:
  - wall time
  - user time
  - system time

User Time

- The actual runtime used by the program itself.
- If user time << wall time, the program has to wait a lot, for example for the allocation of computation time, or for data from RAM or from the hard disk.
  - These are indications that optimization is necessary.
- When using more than one CPU, the user time (which is summed over all CPUs) should be higher than the wall time, indicating that the CPUs work in parallel.

System Time

- Time used not by the program itself but by the operating system, e.g. for allocating memory or hard disk access.
- System time should stay low.
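The three time spans can be observed from inside a program. The following is a minimal Python sketch, not part of the original slides: it assumes `time.perf_counter()` as the wall clock and `os.times()` as the source of user and system CPU time. The helper names `measure`, `busy_sum`, and `mostly_waiting` are hypothetical and chosen only for illustration.

```python
import os
import time


def measure(func, *args):
    """Run func(*args) and return (result, wall, user, system) in seconds."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    result = func(*args)
    cpu_end = os.times()
    wall_end = time.perf_counter()
    wall = wall_end - wall_start
    user = cpu_end.user - cpu_start.user        # CPU time in the program's own code
    system = cpu_end.system - cpu_start.system  # CPU time spent in the operating system
    return result, wall, user, system


def busy_sum(n):
    """CPU-bound work: user time should be close to wall time."""
    return sum(i * i for i in range(n))


def mostly_waiting(seconds):
    """Sleeping consumes wall time but almost no CPU time."""
    time.sleep(seconds)
    return seconds


if __name__ == "__main__":
    for label, func, arg in [("compute", busy_sum, 200_000),
                             ("wait", mostly_waiting, 0.2)]:
        _, wall, user, system = measure(func, arg)
        print(f"{label:8s} wall={wall:.3f}s user={user:.3f}s system={system:.3f}s")
```

In the "wait" case the user time stays far below the wall time, which is the situation the User Time slide flags as a hint that the program spends most of its time waiting.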

Measuring time (2/3)

- For the performance analysis, we want to know the runtime required by individual parts of a program.
- There are several methods for measuring time inside a program; they depend on the programming language and the operating system.
- MPI and OpenMP have their own platform-independent functions for time measurement.
- MPI_Wtime() and omp_get_wtime() return the wall time in seconds; the difference between the results of two such function calls yields the time elapsed between the two calls.
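The "difference of two wall-clock readings" pattern looks the same in any language. The sketch below uses Python's `time.perf_counter()` as a stand-in for `MPI_Wtime()` or `omp_get_wtime()`; this substitution is an assumption made to keep the example self-contained, and the function name `timed_sections` is hypothetical.

```python
import time


def timed_sections():
    """Time two parts of a toy program separately via wall-clock differences."""
    timings = {}

    t0 = time.perf_counter()              # first reading: section starts
    data = [i * 0.5 for i in range(100_000)]
    t1 = time.perf_counter()              # second reading: section ends
    timings["setup"] = t1 - t0            # elapsed wall time in seconds

    t0 = time.perf_counter()
    total = sum(x * x for x in data)
    t1 = time.perf_counter()
    timings["compute"] = t1 - t0

    return total, timings


if __name__ == "__main__":
    _, timings = timed_sections()
    for name, seconds in timings.items():
        print(f"{name:8s} {seconds:.6f} s")
```

Comparing the per-section timings shows which part of the program is worth parallelizing first.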

Measuring time (3/3)

- An advanced method of performance analysis: profiling.
- The program has to be built with information for the profiler.
- Example:
  - This is done with the switch -p for Intel Fortran.
  - When run, the program creates the file gmon.out required by the profiler gprof.
  - gprof program > prof.txt creates a text file with the profiling information.
  - The flat profile lists all function/subroutine calls, the time used for them, the percentage of the total time, the number of calls, etc.
  - The call tree is a listing of all routines called by the subroutines of the program.
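To illustrate what a flat profile and a call listing contain, here is a small Python sketch that uses the standard-library profiler `cProfile` with `pstats`. This is a swapped-in tool, not gprof, and its output format differs from gprof's; the functions `inner`, `outer`, and `profile_report` are hypothetical.

```python
import cProfile
import io
import pstats


def inner(n):
    return sum(i * i for i in range(n))


def outer(repeats, n):
    results = []
    for _ in range(repeats):
        results.append(inner(n))
    return results


def profile_report(repeats=20, n=5_000):
    """Profile outer() and return (flat_profile_text, callees_text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    outer(repeats, n)
    profiler.disable()

    # Flat profile: one row per function with call count and time used.
    flat = io.StringIO()
    pstats.Stats(profiler, stream=flat).sort_stats("cumulative").print_stats(10)

    # Call listing: which functions were called by outer().
    callees = io.StringIO()
    pstats.Stats(profiler, stream=callees).print_callees("outer")

    return flat.getvalue(), callees.getvalue()


if __name__ == "__main__":
    flat_text, callees_text = profile_report()
    print(flat_text)
    print(callees_text)
```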

Analytical modeling of parallel programs


Lecture Outline - Part 2: Modeling of Parallel Programs

- Parallel Execution Time
- Parallel Cost
- Overheads, Sources of Overhead
- Speedup, Efficiency
- Amdahl’s Law, Scalability
- Granularity, Coupling
Analysis.ppt

Overhead ($T_o$)

- Overhead: $T_o = C - T_s$, where $C = p\,T_p$ is the parallel cost and $T_s$ is the sequential time
- Where does it come from?
  - idling
  - not enough parallelism
  - load imbalance
  - communication
  - additional and/or repeated calculations
Gramma, P-

Other Measures

- Speedup: $S = T_s / T_p$, where $T_s$ is the best sequential time
- Efficiency: $E = S/p = T_s / (p\,T_p) = T_s / C = T_s / (T_o + T_s)$
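The relations between cost, overhead, speedup, and efficiency can be checked numerically. The sketch below is illustrative only: the function name `parallel_metrics` and the sample timings (100 s sequential, 30 s on 4 processors) are made-up assumptions, not measurements from the lecture.

```python
def parallel_metrics(t_serial, t_parallel, p):
    """Compute cost C, overhead T_o, speedup S and efficiency E.

    t_serial   -- best sequential time T_s
    t_parallel -- parallel execution time T_p on p processors
    """
    cost = p * t_parallel            # C = p * T_p
    overhead = cost - t_serial       # T_o = C - T_s
    speedup = t_serial / t_parallel  # S = T_s / T_p
    efficiency = speedup / p         # E = S / p = T_s / C
    return {
        "cost": cost,
        "overhead": overhead,
        "speedup": speedup,
        "efficiency": efficiency,
    }


if __name__ == "__main__":
    # Hypothetical timings chosen only for illustration.
    metrics = parallel_metrics(t_serial=100.0, t_parallel=30.0, p=4)
    for name, value in metrics.items():
        print(f"{name:10s} {value:.3f}")
```

With these sample values the cost is 120 s and the overhead 20 s, so the efficiency can be read either as S/p or as T_s / (T_o + T_s); both give 100/120.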

Scalability of Parallel Systems(2/2)

- Consequence of Amdahl’s law:
  - for a given instance, adding additional processors gives diminishing returns
  - only relatively few processors can be used efficiently
- Way around:
  - increase the problem size
  - the sequential part tends to grow slower than the parallel part
- A system is scalable if efficiency can be maintained by increasing the problem size
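Both effects can be made concrete with Amdahl's law in its usual form, S(p) = 1 / (f + (1 - f)/p), where f is the sequential fraction of the work. The sketch below is a hedged illustration: the growth model in `serial_fraction_for_size` (constant sequential work, parallel work proportional to the problem size n) is an assumption chosen for simplicity, and all function names are hypothetical.

```python
def amdahl_speedup(serial_fraction, p):
    """Speedup predicted by Amdahl's law for p processors."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def efficiency(serial_fraction, p):
    """Efficiency E = S / p."""
    return amdahl_speedup(serial_fraction, p) / p


def serial_fraction_for_size(n, serial_work=10.0):
    """Assumed model: sequential work stays constant, parallel work equals n."""
    return serial_work / (serial_work + n)


if __name__ == "__main__":
    # Fixed problem instance (f = 0.1): diminishing returns as p grows.
    for p in (1, 10, 100, 1000):
        print(f"p={p:5d} speedup={amdahl_speedup(0.1, p):6.3f}")

    # Fixed p = 16: efficiency recovers as the problem size grows.
    for n in (90, 990, 9990):
        f = serial_fraction_for_size(n)
        print(f"n={n:5d} f={f:.3f} efficiency={efficiency(f, 16):.3f}")
```

For f = 0.1 the speedup can never exceed 10, however many processors are added, which is why only relatively few processors are used efficiently on a fixed instance.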

Granularity

The size of the computation segments between communication.
[Figure: granularity spectrum from fine grained to coarse grained: ILP, loop parallelism, task parallelism]

Coarse Grain Parallelism

- Typified by long computations consisting of large numbers of instructions between communication/synchronization points
- High computation-to-communication ratio
- Lower communication overhead
- Harder to load balance efficiently
[Figure: timeline of processes (P0, P2, P3, ...) with computation and communication phases]

Granularity

- The most efficient granularity depends on the algorithm and the hardware environment in which it runs.
- In most cases, the overhead associated with communication and synchronization is high relative to execution speed, so it is advantageous to have coarse granularity.
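A toy cost model can show why coarse granularity often pays off. The sketch below rests on stated assumptions that are not from the slides: the work is divided evenly among p processors, every chunk of `grain` operations is followed by one communication step of fixed cost, and load imbalance is ignored. The function name `run_time` and all parameter values are hypothetical.

```python
import math


def run_time(work, p, grain, t_op, t_comm):
    """Estimated run time per processor under a simple cost model.

    work   -- total number of operations
    p      -- number of processors (work is assumed to divide evenly)
    grain  -- operations computed between two communication steps
    t_op   -- time per operation in seconds
    t_comm -- time per communication step in seconds
    """
    ops_per_proc = work / p
    comm_steps = math.ceil(ops_per_proc / grain)
    return ops_per_proc * t_op + comm_steps * t_comm


if __name__ == "__main__":
    # Hypothetical machine: 1 microsecond per operation, 100 microseconds per message.
    for grain in (10, 100, 1_000, 10_000):
        t = run_time(work=1_000_000, p=4, grain=grain, t_op=1e-6, t_comm=1e-4)
        print(f"grain={grain:6d} time={t:.4f} s")
```

In this model the computation time is the same for every grain size; only the number of communication steps changes. A fuller model would add a load-imbalance term that grows with the grain size, which is what makes the best granularity depend on the algorithm and the hardware.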