Performance Analysis of Parallel Systems: Metrics, Overheads, and Isoefficiency, Slides of Parallel Computing and Programming

Pakistan Institute of Engineering and Applied Sciences, Islamabad (PIEAS), Parallel Computing and Programming

An overview of performance metrics for parallel systems, focusing on the tradeoff between granularity and performance, scalability, and the concept of asymptotic isoefficiency. It covers topics such as measuring program performance, asymptotic execution time, speedup, and parallel efficiency. The document also discusses the impact of non-cost optimality and the effect of granularity on performance.

Typology: Slides

2011/2012

Uploaded on 07/19/2012

adnaan 🇵🇰


Topic Overview

•Performance metrics for parallel systems

•Tradeoff: granularity vs. performance

•Scalability of parallel systems

•Introduction to asymptotic isoefficiency

Measuring Program Performance

• Wall clock time
  - measured from the start of the first process to the end of the last process
  - how does this scale?
    - when the number of processors is changed
    - when the program is ported to another machine
• Operation counts, e.g. FLOPs
  - are these useful?
• How much faster is the parallel version?
  - what do we compare with?

Performance Metrics: Execution Time

• Serial time $T_S$: time elapsed between the start and end of serial execution
• Parallel time $T_P$: time elapsed between the start of the first process and the end of the last process

Performance Metrics: Total Parallel Overhead

• $T_{all}$: sum of the time spent collectively by all processors
  - $T_{all} = p T_P$, where $p$ is the number of processors
• Total parallel overhead $T_o$: time wasted by all processors combined
  - $T_o = T_{all} - T_S$
  - $T_o = p T_P - T_S$

Performance Metrics: Speedup

$S = T_S / T_P$
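The definitions of total overhead and speedup above can be checked numerically. The sketch below is only an illustration: the function names and the timings (100 s serial, 20 s parallel on 8 processors) are hypothetical and do not come from the slides.

```python
def total_overhead(p: int, t_parallel: float, t_serial: float) -> float:
    """Total parallel overhead T_o = p * T_P - T_S."""
    return p * t_parallel - t_serial


def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S = T_S / T_P."""
    return t_serial / t_parallel


# Hypothetical timings, chosen only for illustration.
t_serial, t_parallel, p = 100.0, 20.0, 8
print(total_overhead(p, t_parallel, t_serial))  # 60.0
print(speedup(t_serial, t_parallel))            # 5.0
```

With these made-up numbers, the 8 processors together spend 160 s, of which 60 s is overhead, for a speedup of 5 rather than 8.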

Example

• Add $n$ numbers using $n$ processing elements
• If $n$ is a power of two
  - the operation can be performed in $\log n$ steps on $n$ processors
  - partial sums propagate up a logical binary tree

Performance analysis

• $T_S = \Theta(n)$
• Assumptions
  - an addition takes constant time $t_c$
  - communication of a single word takes time $t_s + t_w$
• $T_P = \Theta(\log n)$
• Speedup $S = \Theta(n / \log n)$
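The tree-based sum can be simulated sequentially to confirm the step count. This is a minimal sketch, not the slides' own code: the function name `tree_sum` is hypothetical, each list slot stands in for one processing element, and one pass of the outer loop represents one parallel step (one communication plus one addition per active PE).

```python
def tree_sum(values):
    """Simulate summing n values on n processing elements (n a power of two).

    Returns the total and the number of parallel steps taken.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("n must be a positive power of two")
    partial = list(values)  # partial[i] is the value held by PE i
    steps = 0
    stride = 1
    while stride < n:
        # One parallel step: every PE at a multiple of 2*stride receives
        # the partial sum of the PE `stride` positions away and adds it.
        for i in range(0, n, 2 * stride):
            partial[i] += partial[i + stride]
        stride *= 2
        steps += 1
    return partial[0], steps


total, steps = tree_sum(list(range(16)))
print(total, steps)  # 120 4
```

For 16 values the simulation takes 4 = log2(16) steps, matching the Θ(log n) parallel time.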

A Note About $T_S$

• There may be many serial algorithms for a problem
• Different algorithms may have different asymptotic runtimes
• They may be parallelizable to different degrees

Speedup of Odd-Even Parallel Sort

• Serial time for bubblesort: 150 seconds
• Odd-even parallel sort: 40 seconds
• Apparent speedup = 150/40 = 3.75
  - is this a fair assessment?

Should consider the best serial program as the baseline for $T_S$

• What if serial quicksort only took 30 seconds?
• Speedup of odd-even sort over quicksort = 30/40 = 0.75
  - a fairer assessment

Performance Metrics: Speedup Bounds

• Parallel program never terminates: speedup = 0
• Speedup $S > p$?
  - theoretically: only if each processor spends less than time $T_S / p$
  - but then a single processor could be time-sliced to finish in less than $T_S$, which contradicts the assumption that $T_S$ is minimal
  - in practice: yes!

Parallel Efficiency

• Fraction of time a processor performs useful work: $E = S / p = T_S / (p T_P)$
• Bounds
  - theoretically: $0 \le E \le 1$
  - in practice: efficiency can exceed 1 if speedup is superlinear
• Previous example: add $n$ numbers using $n$ PEs
  - speedup $S = \Theta(n / \log n)$
  - efficiency $E = S / n = \Theta(n / \log n) / n = \Theta(1 / \log n)$
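To see how quickly efficiency drops in the add-n example, the sketch below evaluates E under an assumed unit-cost model in which the serial time is exactly n and the parallel time is exactly log2(n). These constants are an assumption made for illustration; the slides only give the asymptotic orders. The function names are hypothetical.

```python
import math


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Parallel efficiency E = T_S / (p * T_P)."""
    return t_serial / (p * t_parallel)


def add_n_efficiency(n: int) -> float:
    """Efficiency of adding n numbers on p = n PEs, for n >= 2.

    Assumed unit-cost model: T_S = n and T_P = log2(n).
    """
    return efficiency(n, math.log2(n), n)


for n in (16, 1024, 2**20):
    print(n, add_n_efficiency(n))
```

Under this model the efficiency is 1 / log2(n), so it falls from 0.25 at n = 16 to about 0.05 at n = 2^20.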

Example: Edge Detection

• The operation uses a 3 x 3 template to compute each pixel value
• Serial time for an $n \times n$ image: $T_S = 9 t_c n^2$
• Possible parallelization
  - partition the image equally into vertical slabs, each with $n^2 / p$ pixels
  - the boundary of each slab is $2n$ pixels, which is the number of pixel values that have to be communicated
  - communication time $= 2(t_s + t_w n)$
• Applying the template to all $n^2 / p$ pixels of a slab takes time $9 t_c n^2 / p$
• Parallel time is the sum of the two: $T_P = 9 t_c n^2 / p + 2(t_s + t_w n)$
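Adding the per-slab computation time to the communication time gives a simple parallel-time model for the edge detection example. The sketch below evaluates that model; the function name and the default values for t_c, t_s and t_w are hypothetical placeholders, not measurements from the slides.

```python
def edge_detection_model(n: int, p: int,
                         t_c: float = 1.0, t_s: float = 10.0, t_w: float = 1.0):
    """Evaluate the edge detection cost model for an n x n image on p slabs.

    t_c, t_s, t_w are assumed illustrative constants (time per operation,
    message startup time, per-word transfer time).
    """
    t_serial = 9 * t_c * n**2                # T_S = 9 t_c n^2
    t_comm = 2 * (t_s + t_w * n)             # exchange both slab boundaries
    t_parallel = 9 * t_c * n**2 / p + t_comm
    speedup = t_serial / t_parallel
    efficiency = speedup / p
    return t_serial, t_parallel, speedup, efficiency


t_serial, t_parallel, s, e = edge_detection_model(n=100, p=4)
print(t_serial, t_parallel)  # 90000.0 22720.0
print(round(s, 3), round(e, 3))
```

Because the communication term does not shrink with p, the model's efficiency falls as more slabs are used on a fixed-size image.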

Cost Optimality

• Cost of a parallel system $= p T_P$
  - sum of the work time of each processor
  - also known as work or processor-time product
• A parallel system is cost-optimal if the cost of solving a problem on a parallel computer grows asymptotically at the same rate as the serial runtime: $p T_P = \Theta(T_S)$
• Since $E = T_S / (p T_P)$, cost-optimal systems have $E = \Theta(1)$

Considering Cost Optimality

Problem revisited: add $n$ numbers

• Is it cost-optimal on a parallel system using $n$ PEs?
• As before, $T_P = \Theta(\log n)$ for $p = n$
• Cost of this system $= p T_P = \Theta(n \log n)$
• Serial runtime $= \Theta(n)$
• The algorithm is not cost-optimal
  - $E = \Theta(n / (n \log n)) = \Theta(1 / \log n)$
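A quick numeric check makes the lack of cost optimality concrete. The sketch below compares the cost p * T_P with the serial runtime, again assuming unit constants (T_S = n, T_P = log2(n)), which is an illustrative assumption rather than something stated in the slides. The function name is hypothetical.

```python
import math


def cost_ratio(n: int) -> float:
    """Ratio (p * T_P) / T_S for adding n numbers on p = n PEs.

    Assumed unit-cost model: T_S = n and T_P = log2(n).
    """
    cost = n * math.log2(n)   # p * T_P with p = n
    serial = n                # T_S
    return cost / serial


for n in (16, 1024, 2**20):
    print(n, cost_ratio(n))
```

The ratio equals log2(n) and keeps growing with n, so the cost is not within a constant factor of the serial runtime.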

Effect of Granularity on Performance

• Scaling down a parallel system
  - using fewer processors than the maximum possible
  - usually improves parallel system efficiency
  - naïve scaling down
    - consider each original processor as a virtual processor
    - map virtual processors onto the scaled-down number of processors
• Impact
  - number of PEs decreases by a factor of $n / p$
  - computation on each PE increases by a factor of $n / p$
  - communication cost should decrease as well, since VPs assigned to the same physical processor might communicate with each other

Building Granularity: Sum Example

• Add $n$ numbers on $p$ processing elements
  - $p < n$
  - $n$ and $p$ are powers of 2
• Use the parallel algorithm for $n$ (virtual) processors
  - assign each physical processor $n / p$ virtual processors
