Understanding Performance Evaluation in Computer Science - Prof. Jiang Li, Study notes of Computer Architecture and Organization

This document, presented by Dr. Jiang Li from Howard University's Department of Systems & Computer Science, discusses the importance of evaluating performance in computer systems. The slides cover topics such as throughput, response time, performance calculation, effective CPI, and determinants of performance. The document also introduces concepts such as Amdahl's law and benchmarks.


Uploaded on 08/19/2009


Evaluating Performance
Dr. Jiang Li
Dept. of Systems & Computer Science, Howard Univ.
Slides adapted from various sources (e.g., VT, RPI, UCSB, etc.)

Evaluating Performance

- Why do we study performance?
  - To evaluate a design while it is being developed
  - To evaluate machines before purchasing them
  - It is key to understanding the motivation behind a machine's organization
- How can we (meaningfully) compare two machines?
  - By performance, cost, value, etc.
- Main issue:
  - We need to understand which factors in the architecture contribute to overall system performance, and the relative importance of these factors
  - Effects of the ISA on performance
  - How a hardware change will affect performance

Throughput vs. Response Time

- Response time:
  - a.k.a. execution time (e.g., seconds or clock ticks)
  - How long does the program take to execute?
  - How long do I have to wait for a result?
- Throughput:
  - Rate of completion (e.g., results per second or per tick)
  - What is the average execution rate?
  - Measure of total work done

- Upgrading to a newer (faster) processor improves response time, and therefore throughput.
- Adding processors to the system improves throughput, and may also improve response time (for example, by reducing the time jobs spend waiting).
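The two claims above can be checked with a toy model. The sketch below assumes a hypothetical batch in which all jobs arrive at time 0, are served first-come first-served on identical processors, and each needs a fixed amount of CPU time; `run_batch` and its numbers are illustrative only, not from the slides.

```python
def run_batch(num_jobs, job_time_s, num_processors):
    """Toy model (hypothetical): all jobs arrive at t=0 and are served FIFO
    on identical processors; each job needs job_time_s seconds of CPU.
    Returns (average response time in s, throughput in jobs/s)."""
    # Job i runs in "wave" i // num_processors and finishes one job_time later
    finish_times = [(i // num_processors + 1) * job_time_s for i in range(num_jobs)]
    avg_response = sum(finish_times) / num_jobs
    makespan = max(finish_times)        # time until the whole batch is done
    throughput = num_jobs / makespan    # completed jobs per second
    return avg_response, throughput

print(run_batch(4, 2.0, 1))  # baseline: (5.0, 0.5)
print(run_batch(4, 1.0, 1))  # faster processor: (2.5, 1.0)
print(run_batch(4, 2.0, 2))  # second processor: (3.0, 1.0)
```

With a faster processor both metrics improve. With a second processor throughput doubles, while the average response time drops only because jobs wait less; each job still needs 2 s of execution.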

Throughput vs. Response Time (cont’d)

- Decreasing response time almost always improves throughput.
- In reality, changing either one affects the other.
- We will be primarily concerned with response time.

Example: Performance Calculation

- A particular multiprocessor server machine has 4 times the performance of a given uniprocessor desktop system. If the desktop system runs an application in 28 seconds, how long will it take on the server?

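$$\frac{\text{Performance}_{\text{server}}}{\text{Performance}_{\text{desktop}}} = \frac{\text{time}_{\text{desktop}}}{\text{time}_{\text{server}}} = \frac{28\ \text{seconds}}{\text{time}_{\text{server}}} = 4$$

$$\text{time}_{\text{server}} = 28 / 4 = 7\ \text{seconds}$$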
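Because performance is the reciprocal of execution time, a performance ratio converts directly into a time ratio. A minimal sketch; the function name `time_on_other_machine` is hypothetical.

```python
def time_on_other_machine(known_time_s, performance_ratio):
    """performance_ratio = Performance_other / Performance_known.
    Since performance = 1 / time, time_other = known_time / performance_ratio."""
    return known_time_s / performance_ratio

print(time_on_other_machine(28.0, 4.0))  # 7.0 (the server time from the example)
```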

Example: Relative Performance

- If a particular desktop runs a program in 60 seconds and a laptop runs the same program in 75 seconds, how much faster is the desktop than the laptop?

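$$\text{Performance}_{\text{desktop}} = \frac{1}{\text{execution\_time}_{\text{desktop}}} = \frac{1}{60}$$

$$\text{Performance}_{\text{laptop}} = \frac{1}{\text{execution\_time}_{\text{laptop}}} = \frac{1}{75}$$

$$\frac{\text{Performance}_{\text{desktop}}}{\text{Performance}_{\text{laptop}}} = \frac{1/60}{1/75} = 1.25$$

Or simply:

$$\frac{\text{execution\_time}_{\text{laptop}}}{\text{execution\_time}_{\text{desktop}}} = \frac{75}{60} = 1.25$$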

So, the desktop is 1.25 times faster than the laptop
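The same calculation as a small Python sketch, computing the ratio both ways; the helper names `performance` and `times_faster` are illustrative, not from the slides.

```python
def performance(execution_time_s):
    # Performance is the reciprocal of execution time
    return 1.0 / execution_time_s

def times_faster(time_x_s, time_y_s):
    """How many times faster X is than Y: Perf_X / Perf_Y = time_Y / time_X."""
    return time_y_s / time_x_s

desktop_s, laptop_s = 60.0, 75.0
print(times_faster(desktop_s, laptop_s))               # 1.25
print(performance(desktop_s) / performance(laptop_s))  # approximately 1.25 (floating-point rounding)
```

Dividing the times directly avoids the two reciprocals and the rounding they introduce.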

Clock Cycles

- Clock cycles are a direct measure of time
  - They measure how fast the computer can perform basic functions
  - A clock cycle is a discrete time interval in the CPU
- Clock period is the time for one clock cycle (seconds)
- Clock rate is the inverse of the clock period (cycles/second):
  - 5 ns clock cycle => 200 MHz clock rate
  - 500 ps clock cycle => 2 GHz clock rate
  - 200 ps clock cycle => 5 GHz clock rate
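A small sketch of the period/rate conversion, checked against the three examples above (the helper names are hypothetical):

```python
def clock_rate_hz(period_s):
    # Clock rate (cycles/second) is the inverse of the clock period (seconds)
    return 1.0 / period_s

def clock_period_s(rate_hz):
    # Clock period (seconds) is the inverse of the clock rate
    return 1.0 / rate_hz

for period in (5e-9, 500e-12, 200e-12):
    print(f"{period * 1e12:.0f} ps -> {clock_rate_hz(period) / 1e6:.0f} MHz")
```

This prints 200, 2000 and 5000 MHz, i.e. 200 MHz, 2 GHz and 5 GHz.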

Execution Time Formula

- Relating cycles to seconds:

CPU_time = CPU_cycles × cycle_time, or equivalently CPU_time = CPU_cycles / clock_rate

- So, to improve performance we have two options:
  - Decrease the number of cycles needed to execute a program
  - Increase the clock rate (i.e., decrease the cycle time)
- However, these two options are often at odds with each other
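A minimal sketch of the formula and of the two improvement options, using made-up cycle counts and clock rates for a hypothetical program:

```python
def cpu_time_s(cpu_cycles, clock_rate_hz):
    # CPU_time = CPU_cycles / clock_rate  (equivalently CPU_cycles * cycle_time)
    return cpu_cycles / clock_rate_hz

# Hypothetical program needing 4 billion cycles on a 2 GHz CPU
baseline = cpu_time_s(4e9, 2e9)       # 2.0 s
fewer_cycles = cpu_time_s(3e9, 2e9)   # option 1, fewer cycles: 1.5 s
faster_clock = cpu_time_s(4e9, 4e9)   # option 2, higher clock rate: 1.0 s
# The options can conflict: if the faster design needed 5 billion cycles, the gain shrinks
faster_clock_more_cycles = cpu_time_s(5e9, 4e9)  # 1.25 s
print(baseline, fewer_cycles, faster_clock, faster_clock_more_cycles)
```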

Determining Clock Cycles

- So what determines the number of cycles required to execute an application?
  - One possibility: #Cycles = #Instructions (i.e., one instruction is executed in each cycle)
- However, this is generally NOT true, because different instructions take different amounts of time

[Figure: a program's instructions (Inst 1, Inst 2, Inst 3, Inst 4, …) mapped one-to-one onto clock cycles (Cycle 1, Cycle 2, Cycle 3, Cycle 4, …) along a time axis]

Determining Clock Cycles (cont’d)

- A more realistic picture of what's happening:
  - Floating-point operations can take longer than integer operations
  - Multiplication takes longer than addition
  - Memory accesses can take many cycles to complete

Clock cycles = Instructions × Avg. cycles per instruction (CPI)

[Figure: a program's instructions (Inst 1, Inst 2, Inst 3, …) occupying varying numbers of clock cycles (Cycle 1, Cycle 2, Cycle 3, Cycle 4, …) along a time axis]
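To illustrate why cycles and instructions differ, the sketch below totals cycles over a short instruction trace. The instruction kinds and their cycle costs are hypothetical, chosen only to follow the ordering above (floating point slower than integer, multiply slower than add, memory slowest).

```python
# Hypothetical cycle costs per instruction kind (illustrative only, not a real CPU)
CYCLES_PER_OP = {"add": 1, "mul": 3, "fadd": 4, "load": 5}

def total_cycles(trace):
    # Every instruction contributes its own cycle count, so #cycles != #instructions
    return sum(CYCLES_PER_OP[op] for op in trace)

trace = ["load", "add", "mul", "add"]
cycles = total_cycles(trace)      # 5 + 1 + 3 + 1 = 10
avg_cpi = cycles / len(trace)     # 10 / 4 = 2.5
print(cycles, avg_cpi, len(trace) * avg_cpi)  # 10 2.5 10.0
```

Multiplying the instruction count by the average CPI recovers the total cycle count, which is exactly the formula above.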

Effective CPI

- Calculating clock cycles:

$$\text{CPU Clock Cycles} = \sum_{i=1}^{n} \left(\text{CPI}_i \times \text{IC}_i\right)$$

  - IC_i is the total number of instructions in class i
  - CPI_i is the average CPI for instruction class i
  - n is the number of instruction classes
  - The sum accounts for both the weight (frequency) and the CPI of each instruction type
- Effective CPI:
  - CPI = Clock cycles / Number of instructions
- Minimum CPI:
  - The CPI of an instruction mix consisting exclusively of the shortest (lowest-CPI) instruction
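The weighted sum and the two CPI definitions can be sketched as follows; the instruction classes and counts in the usage example are hypothetical.

```python
def cpu_clock_cycles(mix):
    """mix maps an instruction class to (IC_i, CPI_i); returns sum of CPI_i * IC_i."""
    return sum(cpi * ic for ic, cpi in mix.values())

def effective_cpi(mix):
    # Effective CPI = clock cycles / number of instructions
    total_instructions = sum(ic for ic, _ in mix.values())
    return cpu_clock_cycles(mix) / total_instructions

def minimum_cpi(mix):
    # CPI of a mix made up exclusively of the shortest (lowest-CPI) class
    return min(cpi for _, cpi in mix.values())

# Hypothetical mix: class -> (instruction count, CPI)
mix = {"alu": (50, 1), "load_store": (30, 2), "branch": (20, 3)}
print(cpu_clock_cycles(mix))  # 170
print(effective_cpi(mix))     # 1.7
print(minimum_cpi(mix))       # 1
```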

Example: Calculating CPI

- Given the following CPIs for each instruction class and instruction mixes, which code sequence executes fewer instructions? Which is faster? Which has the lower CPI?

Class:    A   B   C
CPI:      1   2   3

Instruction counts by class:
Sequence  A   B   C
1         2   1   2
2         4   1   1

- Instructions: Seq 1: 2 + 1 + 2 = 5 instructions; Seq 2: 4 + 1 + 1 = 6 instructions
- Cycles: Seq 1: (2×1) + (1×2) + (2×3) = 10 cycles; Seq 2: (4×1) + (1×2) + (1×3) = 9 cycles
- CPI: Seq 1: 10/5 = 2.0; Seq 2: 9/6 = 1.5

Sequence 1 has fewer instructions.
Sequence 2 is faster (it needs fewer cycles on the same machine, i.e., at the same clock rate).
Sequence 2 has the lower CPI.
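The calculation above can be reproduced with a short script using the CPIs and instruction counts from the tables; `summarize` is a hypothetical helper name.

```python
# CPI per instruction class and instruction counts per sequence, from the tables above
CPI = {"A": 1, "B": 2, "C": 3}
SEQUENCES = {
    "Seq 1": {"A": 2, "B": 1, "C": 2},
    "Seq 2": {"A": 4, "B": 1, "C": 1},
}

def summarize(counts):
    """Return (instructions, clock cycles, CPI) for one code sequence."""
    instructions = sum(counts.values())
    cycles = sum(n * CPI[cls] for cls, n in counts.items())
    return instructions, cycles, cycles / instructions

for name, counts in SEQUENCES.items():
    ins, cyc, cpi = summarize(counts)
    print(f"{name}: {ins} instructions, {cyc} cycles, CPI = {cpi:.1f}")
```

This prints 5 instructions, 10 cycles and CPI 2.0 for Seq 1, and 6 instructions, 9 cycles and CPI 1.5 for Seq 2.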

“THE” Performance Equation

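- Separates the three key performance factors:
  - Instructions, CPI, and clock rate

$$\text{CPU time} = \text{Instructions} \times \text{CPI} \times \text{cycle time} = \frac{\text{Instructions} \times \text{CPI}}{\text{clock rate}}$$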

- Can help evaluate design decisions
  - Known effects on these terms can be translated into the overall effect on performance
- How can the values of these terms be found?
  - Time: by running the program
  - Clock rate: published by the computer manufacturer
  - Instructions and CPI:
    - Hardware performance counters (CPU logic that records events)
    - Simulation of the system
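A minimal sketch of the performance equation used to weigh a design decision. The scenario (a compiler change that removes 20% of the instructions but raises CPI from 1.5 to 1.8 on a 2 GHz machine) is hypothetical.

```python
def cpu_time_s(instruction_count, cpi, clock_rate_hz):
    # CPU time = Instructions x CPI x cycle time = Instructions x CPI / clock rate
    return instruction_count * cpi / clock_rate_hz

# Hypothetical scenario on a 2 GHz machine
before = cpu_time_s(1.0e9, 1.5, 2e9)  # 1e9 instructions at CPI 1.5: 0.75 s
after = cpu_time_s(0.8e9, 1.8, 2e9)   # 20% fewer instructions at CPI 1.8: about 0.72 s
print(f"before = {before:.2f} s, after = {after:.2f} s, speedup = {before / after:.3f}")
```

Even though CPI gets worse, the net effect here is a small speedup (about 1.042), the kind of trade-off that is hard to judge from any single term alone.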

Performance Equation Example 1

- Suppose we have two implementations of the same ISA. Computer A has a cycle time of 250 ps and a CPI (cycles per instruction) of 2.0 for some program, and computer B has a cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program?
  - Note: both computers execute the same number of instructions, I.

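$$\text{clock\_cycles}_A = I \times 2.0, \quad \text{time}_A = I \times 2.0 \times 250\ \text{ps} = 500 \times I\ \text{ps}$$

$$\text{clock\_cycles}_B = I \times 1.2, \quad \text{time}_B = I \times 1.2 \times 500\ \text{ps} = 600 \times I\ \text{ps}$$

$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{time}_B}{\text{time}_A} = \frac{600 \times I\ \text{ps}}{500 \times I\ \text{ps}} = 1.2$$

Computer A is 1.2 times faster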
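Because both computers execute the same I instructions, I cancels out and comparing time per instruction (CPI × cycle time) is enough. A small sketch reproducing the numbers above; `time_per_instruction_ps` is a hypothetical helper name.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    # time = I * CPI * cycle_time; with the same I on both machines,
    # CPI * cycle_time alone determines which machine is faster
    return cpi * cycle_time_ps

time_a = time_per_instruction_ps(2.0, 250)  # 500 ps per instruction
time_b = time_per_instruction_ps(1.2, 500)  # 600 ps per instruction
print(f"A is {time_b / time_a:.1f} times faster than B")
```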