







































This document, presented by Dr. Jiang Li of Howard University's Department of Systems & Computer Science, discusses the importance of evaluating performance in computer systems. The slides cover topics such as throughput, response time, performance calculation, effective CPI, and determinants of performance. The document also introduces concepts like Amdahl's law and benchmarks.
Typology: Study notes








































Dept. of Systems & Computer Science, Howard Univ. Jiang Li
Slides adapted from various sources (e.g. VT, RPI, UCSB)
Why do we study performance?
  Evaluate during design
  Evaluate before purchasing
  Key to understanding underlying organizational motivation
How can we (meaningfully) compare two machines?
  Performance, cost, value, etc.
Main issue: we need to understand which factors in the architecture contribute to overall system performance, and the relative importance of these factors
  Effects of the ISA on performance
  How a hardware change will affect performance
Response time, a.k.a. execution time (e.g. seconds or clock ticks)
  How long does the program take to execute?
  How long do I have to wait for a result?
Throughput: rate of completion (e.g. results per second/tick)
  What is the average execution time of the program?
  Measure of total work done

Upgrading to a newer processor will improve response time, and therefore throughput.
Adding processors to the system will improve throughput, and maybe response time.
A particular multiprocessor server machine’s performance is 4 times better than a given uniprocessor desktop system. If the desktop system runs an application in 28 seconds, how long will it take on the server?
time_server = time_desktop / 4 = 28 / 4 = 7 seconds
Performance_desktop = 1 / execution_time_desktop = 1/60
Performance_laptop = 1 / execution_time_laptop = 1/75
Performance_desktop / Performance_laptop = (1/60) / (1/75) = 1.25
Or simply: execution_time_laptop / execution_time_desktop = 75/60 = 1.25
So, the desktop is 1.25 times faster than the laptop
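The two comparisons above can be checked with a few lines of Python. This is a minimal sketch rather than anything from the original slides: the function names `performance` and `relative_performance` are illustrative, and the values 60 and 75 are the execution times used in the desktop/laptop example.

```python
def performance(execution_time):
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time


def relative_performance(time_x, time_y):
    """How many times faster X is than Y: perf_X / perf_Y = time_Y / time_X."""
    return time_y / time_x


# Server example: the server performs 4 times better than the desktop (28 s),
# so its execution time is the desktop time divided by 4.
server_time = 28.0 / 4.0
print(server_time)

# Desktop (60) versus laptop (75): ratio of performances, computed both ways.
print(performance(60.0) / performance(75.0))
print(relative_performance(time_x=60.0, time_y=75.0))
```

Both ways of computing the ratio agree because the reciprocals cancel, which is why the slides can use the shortcut of dividing execution times directly.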
Clock cycles are a direct measure of time
  Measures how fast the computer can perform basic functions
  Discrete time interval in the CPU

Clock period is the time for one clock cycle (seconds)
Clock rate is the inverse of clock period (cycles/second)
  5 ns clock cycle => 200 MHz clock rate
  500 ps clock cycle => 2 GHz clock rate
  200 ps clock cycle => 5 GHz clock rate
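The period-to-rate conversions listed above follow from taking a reciprocal. The snippet below is a small illustrative sketch; the helper name `clock_rate_hz` is an assumption made for this example.

```python
def clock_rate_hz(clock_period_s):
    """Clock rate (cycles/second) is the inverse of the clock period (seconds)."""
    return 1.0 / clock_period_s


# The three clock periods from the list above, expressed in seconds.
for period_s in (5e-9, 500e-12, 200e-12):
    rate_mhz = clock_rate_hz(period_s) / 1e6
    print(f"{period_s * 1e12:.0f} ps -> {rate_mhz:.0f} MHz")
```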
CPU_time = CPU_cycles * cycle_time
or
CPU_time = CPU_cycles / clock_rate

To reduce CPU time:
  Decrease the number of cycles needed to execute a program
  Increase the clock rate (decrease the cycle time)
However, these are often at odds with each other
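As a rough illustration of the two equivalent CPU time formulas, the sketch below evaluates both for a hypothetical workload. The cycle count and clock rate are made-up numbers chosen for this example, and the function names are not from the slides.

```python
def cpu_time_from_period(cpu_cycles, cycle_time_s):
    """CPU_time = CPU_cycles * cycle_time."""
    return cpu_cycles * cycle_time_s


def cpu_time_from_rate(cpu_cycles, clock_rate_hz):
    """CPU_time = CPU_cycles / clock_rate."""
    return cpu_cycles / clock_rate_hz


# Hypothetical workload: 10 billion cycles on a 2 GHz clock (500 ps period).
cycles = 10e9
print(cpu_time_from_rate(cycles, 2e9))
print(cpu_time_from_period(cycles, 500e-12))

# Either lever reduces CPU time: fewer cycles, or a faster clock.
print(cpu_time_from_rate(cycles * 0.8, 2e9))
print(cpu_time_from_rate(cycles, 2.5e9))
```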
So what determines the number of cycles required to execute an application? One possibility: #Cycles = #Instructions (i.e. one instruction is executed at each cycle)
However, this is NOT true because different instructions take different amounts of time
[Figure: program instructions Inst 1, Inst 2, Inst 3, Inst 4, … each aligned with one clock cycle (Cycle 1, Cycle 2, Cycle 3, Cycle 4, …) along the time axis]
Floating point operations can take longer than integer operations
Multiplication takes longer than addition
Memory accesses can take many cycles to complete

[Figure: program instructions Inst 1, Inst 2, Inst 3, … against a time axis of Cycle 1, Cycle 2, Cycle 3, Cycle 4, …]
\[ \text{CPU clock cycles} = \sum_{i=1}^{n} \text{CPI}_i \times \text{IC}_i \]

  IC_i is the total number of instructions of class i
  CPI_i is the average CPI for instruction class i
  n is the number of instruction classes
Accounts for the weight and CPI of each instruction type

CPI = Clock cycles / Number of instructions

CPI with instruction mix of exclusively the shortest instruction
Given the following CPIs for each instruction class and instruction mixes, which code sequence executes fewer instructions? Which is faster? Which has the lower CPI?

Class   A   B   C
CPI     1   2   3

Instruction counts per class:
Sequence   A   B   C
1          2   1   2
2          4   1   1

Instructions: Seq 1: 2 + 1 + 2 = 5 instructions; Seq 2: 4 + 1 + 1 = 6 instructions
Cycles: Seq 1: (2×1) + (1×2) + (2×3) = 10 cycles; Seq 2: (4×1) + (1×2) + (1×3) = 9 cycles
CPI: Seq 1: 10/5 = 2.0; Seq 2: 9/6 = 1.5

Sequence 1 has fewer instructions
Sequence 2 is faster
Sequence 2 has the lower CPI
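The comparison of the two code sequences can be reproduced programmatically. The following is a minimal sketch that assumes the per-class CPIs (A = 1, B = 2, C = 3) and the instruction counts given in the example; the dictionary layout and function names are illustrative choices.

```python
# CPI for each instruction class, as given in the example.
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}


def total_cycles(mix, cpi_by_class):
    """Sum of CPI_i * IC_i over all instruction classes."""
    return sum(cpi_by_class[cls] * count for cls, count in mix.items())


def effective_cpi(mix, cpi_by_class):
    """Clock cycles divided by the total number of instructions."""
    return total_cycles(mix, cpi_by_class) / sum(mix.values())


# Instruction counts per class for the two code sequences.
seq1 = {"A": 2, "B": 1, "C": 2}
seq2 = {"A": 4, "B": 1, "C": 1}

for name, mix in (("Seq 1", seq1), ("Seq 2", seq2)):
    print(name, sum(mix.values()), total_cycles(mix, CPI_BY_CLASS),
          effective_cpi(mix, CPI_BY_CLASS))
```

The sketch makes the point of the example visible: the sequence with more instructions can still need fewer cycles when its mix favors the cheaper instruction classes.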
Instructions, CPI, Clock rate
Known effects on these terms can be translated into the overall effect on performance
Time: by running the program
Clock rate: published by the computer manufacturer
Instructions and CPI:
  Hardware performance counters (CPU logic to record events)
  Simulation of the system
Suppose we have two implementations of the same ISA. Computer A has a cycle time of 250 ps and a CPI (cycles per instruction) of 2.0 for some program, and computer B has a cycle time of 500 ps and a CPI of 1.2 for the same program. Which computer is faster for this program? Note: A constant number of instructions will be executed: I
clock_cycles_A = I × 2.0, time_A = I × 2.0 × 250 ps = 500 × I ps
clock_cycles_B = I × 1.2, time_B = I × 1.2 × 500 ps = 600 × I ps

\[ \frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Time}_B}{\text{Time}_A} = \frac{600 \times I \text{ ps}}{500 \times I \text{ ps}} = 1.2 \]

Computer A is 1.2 times faster
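The comparison of computers A and B can be sketched in code as well. Because both machines execute the same instruction count I, it cancels in the ratio, so the sketch below works with time per instruction only; the function name `time_per_instruction_ps` is an assumption made for this illustration.

```python
def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average time per instruction = CPI * cycle time (in picoseconds)."""
    return cpi * cycle_time_ps


# Values from the example: A has CPI 2.0 at 250 ps, B has CPI 1.2 at 500 ps.
time_a = time_per_instruction_ps(cpi=2.0, cycle_time_ps=250.0)
time_b = time_per_instruction_ps(cpi=1.2, cycle_time_ps=500.0)

# Performance_A / Performance_B = Time_B / Time_A; the instruction count I cancels.
speedup_a_over_b = time_b / time_a
print(time_a, time_b, round(speedup_a_over_b, 3))
```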