Measuring Machine Performance in Parallel Computing: Focus on CSE 160 Chien's Lecture, Assignments of Introduction to Sociology

University of California - San Diego Introduction to Sociology

A set of lecture slides from CSE 160 (Chien, Spring 2005), a course on parallel computing. The slides cover benchmarks used to measure machine performance, specifically for parallel machines. They discuss benchmarks such as the Livermore Loops, Linpack, and the NAS Parallel Benchmarks, and explain how benchmarks integrate many aspects of a machine, including its clock rate, CPU structure, ALUs, caches, memory, networks, storage, and compilers. The slides also cover the use of Linpack in the Top 500 supercomputer sites list.

Typology: Assignments

Pre 2010

Uploaded on 03/28/2010


Lecture #11, Slide 1

CSE 160 Chien, Spring 2005

Understanding and Measuring Speedup

• Last Time
  » Midterm Summary
  » Definitions of Speedup
  » Amdahl’s Law
  » Gustafson’s Scaled and Fixed Time Speedup
• Today
  » Benchmarks
  » Measuring Parallelism
  » Machine and Application examples
• Reminders/Announcements
  » Homework #3 will be out tomorrow; due date is Monday, May 16th, in Section
  » Homework #2 grading is still in progress


Using Programs to Measure Machine Performance

• Speedup measures the performance of an individual program on a particular machine
  » Speedup cannot be used to
    – Compare different algorithms on the same computer
    – Compare the same algorithm on different computers
• Benchmarks are representative programs that can be used to compare the performance of machines
  » Integrate many aspects of the machines
    – Clock rate, CPU structure, ALUs, caches, memory, networks, storage, compilers, etc.



Benchmarks Used for Parallel Machines

• The Livermore Loops
• The “PACKS” (Linpack, LAPACK, ScaLAPACK, etc.)
• The NAS Parallel Benchmarks
• The Perfect Club
• ParkBENCH
• SLALOM, HINT

The Livermore Loops

• Set of 24 Fortran DO loops extracted from operational codes at LLNL
  » Each one is literally a single loop nest (multiple loops inside of each other)
  » Different structures of iteration (1D, 2D, 3D arrays)
  » Different structures of dependences (all parallel, forward with distance 1, forward with distance 2, forward with distance 8, etc.)
  » All different types of data reference (and therefore locality) structures
=> Originated the use of MFLOP/s for performance
  » Performance statistics reported: arithmetic, harmonic, geometric means, …
• http://www.netlib.org/benchmark/livermore
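The three means reported for the loops can give quite different summaries of the same rates. The sketch below is illustrative only: the per-loop MFLOP/s values are made-up numbers, not measurements, and the function names are hypothetical. When every loop performs the same amount of work, the harmonic mean equals total work divided by total time, which is why it is the most conservative of the three.

```python
import math

def arithmetic_mean(rates):
    """Plain average of the per-loop rates."""
    return sum(rates) / len(rates)

def harmonic_mean(rates):
    """Reciprocal of the average reciprocal rate.

    Equals total work / total time when each loop does equal work.
    """
    return len(rates) / sum(1.0 / r for r in rates)

def geometric_mean(rates):
    """n-th root of the product, computed through logarithms."""
    return math.exp(sum(math.log(r) for r in rates) / len(rates))

# Hypothetical MFLOP/s rates for four loops (invented for illustration).
rates = [10.0, 20.0, 40.0, 80.0]

summary = {
    "arithmetic": arithmetic_mean(rates),
    "harmonic": harmonic_mean(rates),
    "geometric": geometric_mean(rates),
}
```

For positive rates the ordering harmonic <= geometric <= arithmetic always holds, so a report that quotes only the arithmetic mean shows the most flattering figure.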


Linpack

• Linear algebra routines available in both C and Fortran
• Benchmarks solve a system of linear equations using Gaussian elimination and report Mega-, Giga-, or TeraFlops
• Core of Linpack is a subroutine ("saxpy" in the single-precision version, "daxpy" in the double-precision version) that performs the inner loop of frequent matrix operations:

      y(i) = y(i) + a * x(i)

• Standard version operates on 100x100 matrices; there are also versions for sizes 300x300 and 1000x1000, with different optimization rules. Largest runs are 933,887 x 933,887 for recent record holder => Huge problems!
• Originator: Jack Dongarra, Univ. of Tennessee
• http://www.netlib.org/benchmark/hpl/
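To make the role of the daxpy inner loop concrete, here is a minimal pure-Python sketch of Gaussian elimination with partial pivoting in which every row update is a daxpy call. It is not the Linpack source: the real routines are column-oriented Fortran, and the function names `solve` and `linpack_mflops` are assumptions made for this example. The rate helper uses the nominal operation count 2/3 n^3 + 2 n^2 that the Linpack benchmark uses for an n x n system.

```python
def daxpy(a, x, y, start=0):
    """In place: y[i] = y[i] + a * x[i] for every i >= start."""
    for i in range(start, len(y)):
        y[i] += a * x[i]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    A is a list of rows; both A and b are overwritten.
    """
    n = len(A)
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda r: abs(A[r][k]))
        if A[p][k] == 0.0:
            raise ValueError("matrix is singular")
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        # Eliminate column k below the diagonal, one daxpy per row.
        for r in range(k + 1, n):
            m = -A[r][k] / A[k][k]
            daxpy(m, A[k], A[r], start=k)  # row_r = row_r + m * row_k
            b[r] += m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(A[k][j] * x[j] for j in range(k + 1, n))
        x[k] = (b[k] - s) / A[k][k]
    return x

def linpack_mflops(n, seconds):
    """Nominal flop count (2/3 n^3 + 2 n^2) divided by run time, in MFLOP/s."""
    return (2.0 * n ** 3 / 3.0 + 2.0 * n ** 2) / seconds / 1e6
```

Timing `solve` on a random 100x100 system (for example with `time.perf_counter`) and passing the elapsed seconds to `linpack_mflops(100, seconds)` gives a rough rate for the interpreter, which will be far below what an optimized Fortran or C build reports.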

Optimizing Linpack

• Linpack is easily vectorizable on many systems
• Easy to exploit a multiply-add operation
• Some compilers have "daxpy recognizers" to substitute hand-optimized code!
• Lots of special optimization for parallel machines


Top 500 List

• Linpack is the basis of the “Top 500 Supercomputer Sites” list
  » http://www.top500.org/
• Overview of the market of high-performance systems
• Twice-yearly list of the 500 most powerful installed systems
  » June and November each year
• Linpack benchmark used to rank systems
• Annual report analyzes developments in HW, SW and the market

Want to see where your laptop rates?


NAS Parallel Benchmarks (NPB)

• Benchmarks from CFD (computational fluid dynamics) codes
  » Fortran and C versions available
  » NPB are kernels and compact pseudo-applications, not full applications
• Algorithmic definition of each program and sequential implementation of each algorithm
• Users write a set of tuned parallel applications
• NPB are widely used
• http://www.nas.nasa.gov/Software/NPB/

NAS Parallel Benchmarks (cont.)

• EP – Embarrassingly Parallel
• MG – MultiGrid Kernel
• CG – Conjugate Gradient (CFD applications)
• FT – PDE solver, many FFTs
• IS – Integer Sort
• LU – Sparse Linear Solver
• SP – Scalar Pentadiagonal Linear Systems
• BT – Block Tridiagonal Linear Systems
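As an illustration of what "embarrassingly parallel" means, the sketch below splits a Monte Carlo count into tasks that share no state and communicate only through one final sum. It is not the NPB EP kernel or its specification; the task (counting random points inside a quarter circle) and the names `count_hits` and `run_tasks` are assumptions chosen for brevity.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def count_hits(seed, samples):
    """One independent task: count random points inside the unit quarter circle."""
    rng = random.Random(seed)  # private generator, so tasks share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def run_tasks(num_tasks, samples_per_task, workers=4):
    """Run the tasks concurrently and combine their partial counts."""
    seeds = range(num_tasks)
    sizes = [samples_per_task] * num_tasks
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(count_hits, seeds, sizes))
    return sum(partial)  # the only communication: one final reduction

total_hits = run_tasks(num_tasks=4, samples_per_task=1000)
pi_estimate = 4.0 * total_hits / (4 * 1000)
```

Because each task is seeded independently, the result does not depend on how the tasks are scheduled. In CPython, threads will not speed up this CPU-bound loop; swapping in `ProcessPoolExecutor` (with the call placed under an `if __name__ == "__main__":` guard) is the usual way to use several cores.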

The Perfect Club

• Developed at the University of Illinois around 1987
  » Center for Supercomputing Research and Development (CSRD)
  » Founded by David Kuck, a pioneer and leading researcher in the area of optimizing compilers
  » See Structure of Computers and Computations, David Kuck, John Wiley and Sons, 1978
  » See also Kuck and Associates (KAI)
• Goal: standard set of benchmarks to measure the progress of parallel computer advances
  » Applications characterized by their algorithmic behavior
  » Allow users to get meaningful predictions for their own applications


Large Sorting Benchmarks

• Look again, the 2005 results are out! http://research.microsoft.com/barc/SortBenchmark/
• Minute Sort (all the records you can sort in a minute!)
  » 125M, 116GB, 2005
  » 340M, 32GB, 2004
  » ~120M, 12GB, 2000
• Terabyte Sort
  » 7.25 minutes, 2005
  » ~30 minutes, 2000-2003…
• Penny Sort: as much as you can for a penny!
  » 163M, 15GB, 2005
  » 105M, 10GB, 200-
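The sort results are easier to compare once they are converted to sustained throughput. The arithmetic below is a sketch that uses the figures quoted above and assumes decimal units (1 TB taken as 1000 GB); the helper name `throughput_gb_per_s` is hypothetical.

```python
def throughput_gb_per_s(gigabytes, seconds):
    """Sustained sort throughput in GB/s."""
    return gigabytes / seconds

# Terabyte Sort: 1 TB (assumed to be 1000 GB) in 7.25 minutes (2005)
# versus roughly 30 minutes (2000-2003).
terabyte_2005 = throughput_gb_per_s(1000.0, 7.25 * 60)
terabyte_2000 = throughput_gb_per_s(1000.0, 30 * 60)

# Minute Sort, 2005: 116 GB in 60 seconds.
minute_2005 = throughput_gb_per_s(116.0, 60)

# Improvement factor between the two Terabyte Sort results.
terabyte_speedup = terabyte_2005 / terabyte_2000
```

Under these assumptions the 2005 Terabyte Sort sustains about 2.3 GB/s, roughly four times the rate implied by a 30-minute run, and the 2005 Minute Sort sustains about 1.9 GB/s.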

Limitations and Pitfalls of Benchmarks

• Benchmarks do not address questions that you did not ask… you only see what you measure
• Specific application benchmarks will not tell you about the performance of other applications without proper analysis => extrapolation is not straightforward
• General benchmarks will not tell you about the details of your specific application
• You must understand the benchmark itself to understand what it tells you


Benefits of Benchmarks

• Popular benchmarks keep vendors attuned to applications => vendors approximate applications
• Benchmarks can give useful information about the performance of systems on particular kinds of programs => users compare systems
• Benchmarks help in exposing performance bottlenecks of systems at the technical and applications level

Summary

• Benchmark Types
  » Fixed Work
  » Scalable
  » Rate Based
  » Top500 List
• Purposes and Uses of Benchmarks
• Download and try some on FWGrid!
