


























Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
This lecture was delivered by Dr. Hanif Durad at Pakistan Institute of Engineering and Applied Sciences, Islamabad (PIEAS) for Parallel Computing course. it includes: Performance, Metrics, Parallel, Programs, Timing, Wall, User, CPU, Runtime, MPI, Platform, Independent
Typology: Slides
1 / 34
This page cannot be seen from the preview
Don't miss anything!



























Dr. Hanif Durad
Timing wall time user time system time Measuring time Using gprof program PC-2.pdf
In order to parallelize a program/algorithm, weneed to know which parts of a program need themost computation time. Three different time spans to be considered: wall time user time system time Dr. Hanif Durad
The actual runtime used by the program. User time << the wall time the program has to wait a lot, for example forcomputation time allocation or data from the RAMor from the hard-disk. These are indications for necessary optimizations. When using more than one CPU, the user timeshould be higher than the wall time, indicating thatthe CPUs work in parallel.
Time used not by the program itself, but by theoperating system, e.g. for allocating memory orhard disk access. System time should stay low. Dr. Hanif Durad
For the performance analysis, we want to know the runtime required by individual parts of a program. There are several programming language and operatingsystem dependent methods for measuring time inside aprogram. MPI & OpenMP have their own, platform independentfunctions for time measurement. MPI_Wtime() & omp_get_wtime() return the wall time in secs, the difference between the results of two such function callsyields the runtime elapsed between the two function calls.
advanced method of performance analysis: profiling the program has to be built with information for theprofiler. Example: done with the switch -p for Intel Fortran at run, the program creates the file gmon.out required by the profiler gprof gprof program > prof.txt creates a text file with the profilinginformation. flat profile lists all function/subroutine calls, time used for them,percentage of the total time, no. of calls etc call tree, a listing of all routines call by the subroutines of the program
Dr. Hanif Durad
Dr. Hanif Durad
Parallel Execution Time Parallel Cost Overheads, Sources Of Overhead Speedup, Efficiency Amdahl’s Law, Scalability Granularity, Coupling Analysis.ppt
o
Overhead: T o
s Where does it come from? idling^ not enough parallelism load imbalance communication additional and/or repeated calculations Dr. Hanif Durad Gramma, P-
Speedup:
s
p , where T s is the best sequential time Efficiency: E = S/p = T s /pT p
s
s
o
s
Dr. Hanif Durad
Consequence of Amdahl’s law: for a given instance, adding additional processors gives diminishing returns only relatively few processors can be efficiently used Way around: increase the problem size sequential part tends to grow slower then the parallel part A system is scalable if efficiency can be maintained byincreasing problem size Dr. Hanif Durad
Dr. Hanif Durad The size of the computation segments between communication. fine grained coarse grained ILP loop parallelism task parallelism
Typified by long computations consisting of large numbers ofinstructions between communication synchronization points High computation to communication ratio Lower communication overhead Harder to load balance efficiently P0 P computation P 2 commmunication P 3 P
The most efficient granularity is dependent on thealgorithm and the hardware environment inwhich it runs In most cases overhead associated withcommunications and synchronization is highrelative to execution speed so it is advantageousto have coarse granularity. Dr. Hanif Durad