Performance - Computer Systems Architecture | CS 365, Study notes of Computer Architecture and Organization

Material Type: Notes; Class: Computer Systems Architecture; Subject: Computer Science; University: George Mason University; Term: Unknown 1989;

Typology: Study notes

Pre 2010

Uploaded on 02/10/2009




Chapter 2

  • Measure, Report, and Summarize
  • Make intelligent choices
  • See through the marketing hype
  • Key to understanding underlying organizational motivation

Why is some hardware better than others for different programs?

What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?)

How does the machine's instruction set affect performance?

Performance

Which of these airplanes has the best performance?

Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100             101          630           598
Boeing 747                 470         4150           610
BAC/Sud Concorde           132         4000          1350
Douglas DC-8-50            146         8720           544

  • How much faster is the Concorde compared to the 747?
  • How much bigger is the 747 than the Douglas DC-8?

Two notions of “performance”

° Time to do the task (Execution Time)

  • execution time, response time, latency

° Tasks per day, hour, week, sec, ns, ... (Performance)

  • throughput, bandwidth

Response time and throughput often are in opposition

Plane              Speed      DC to Paris   Passengers   Throughput (pmph)
Boeing 747         610 mph    6.5 hours            470             286,700
BAC/Sud Concorde   1350 mph   3 hours              132             178,200

Which has higher performance?
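The two notions of performance can be checked with a few lines of Python. This is only a sketch: it assumes "pmph" means passenger-miles per hour (passengers × speed), and the dictionary layout and function name are illustrative rather than part of the notes.

```python
# Passenger and speed figures are taken from the airplane table in these notes.
planes = {
    "Boeing 747": {"passengers": 470, "speed_mph": 610},
    "BAC/Sud Concorde": {"passengers": 132, "speed_mph": 1350},
}


def throughput_pmph(plane):
    """Passenger-miles per hour: passengers carried times cruising speed (assumed definition)."""
    return plane["passengers"] * plane["speed_mph"]


# Latency view: how many times faster is the Concorde than the 747?
speed_ratio = planes["BAC/Sud Concorde"]["speed_mph"] / planes["Boeing 747"]["speed_mph"]

# Throughput view: passenger-mph for each plane.
throughput = {name: throughput_pmph(p) for name, p in planes.items()}

print(f"Concorde speed / 747 speed = {speed_ratio:.2f}")
for name, value in throughput.items():
    print(f"{name}: {value} passenger-mph")
```

The Concorde wins on response time while the 747 wins on throughput, which is why the answer depends on which notion of performance is meant.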

Book's Definition of Performance

  • For some program running on machine X,

$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$

  • "X is n times faster than Y"

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = n$$

  • Problem:
    • machine A runs a program in 20 seconds
    • machine B runs the same program in 25 seconds
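A short sketch of the problem with machines A and B, using the definition Performance = 1 / Execution time. The function and variable names are illustrative.

```python
def performance(execution_time_s):
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


time_a_s = 20.0  # machine A runs the program in 20 seconds
time_b_s = 25.0  # machine B runs the same program in 25 seconds

# "A is n times faster than B" means Performance_A / Performance_B = n,
# which is the same as Execution time_B / Execution time_A.
n = performance(time_a_s) / performance(time_b_s)
print(f"A is {n:.2f} times faster than B")
```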

Clock Cycles

  • Instead of reporting execution time in seconds, we often use cycles
  • Clock “ticks” indicate when to start activities (one abstraction):
  • cycle time = time between ticks = seconds per cycle
  • clock rate (frequency) = cycles per second (1 Hz. = 1 cycle/sec)

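$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

A 200 MHz clock has a cycle time of

$$\frac{1}{200 \times 10^{6}} \times 10^{9} = 5 \text{ nanoseconds}$$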
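The relation between clock rate, cycle time, and execution time can be sketched as follows. The helper names are made up for illustration, and the 1,000,000-cycle program is a hypothetical input.

```python
def cycle_time_ns(clock_rate_hz):
    """Cycle time is the reciprocal of clock rate, converted here to nanoseconds."""
    return 1.0 / clock_rate_hz * 1e9


def execution_time_s(cycles, clock_rate_hz):
    """seconds/program = cycles/program x seconds/cycle."""
    return cycles * (1.0 / clock_rate_hz)


clock_rate_hz = 200e6  # the 200 MHz clock from the notes
print(f"cycle time: {cycle_time_ns(clock_rate_hz):.1f} ns")

# Hypothetical program needing one million cycles on that clock.
print(f"execution time: {execution_time_s(1_000_000, clock_rate_hz):.4f} s")
```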

How to Improve Performance

$$\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$$

So, to improve performance (everything else being equal) you can either

reduce the # of required cycles for a program, or

decrease the clock cycle time or, said another way,

increase the clock rate.

How many cycles are required for a program?

  • Could assume that # of cycles = # of instructions

[Figure: time axis showing the 1st, 2nd, 3rd, 4th, 5th, 6th, ... instructions in sequence]

This assumption is incorrect: different instructions take different amounts of time on different machines.

Why? Hint: remember that these are machine instructions, not lines of C code.

Example

Let C = number of cycles.

Execution time = C × clock cycle time = C / clock rate

On computer A, C / 400 MHz = C / (400 × 10^6) = 10 seconds => C = 400 × 10^7

On computer B, number of cycles = 1.2 × C. What should B's clock rate be so that our favorite program has a smaller execution time?

1.2 × C / clock rate < 10 => 1.2 × 400 × 10^7 / 10 < clock rate, i.e., clock rate > 480 MHz
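The same arithmetic, written out as a small sketch. Variable names are illustrative; the numbers are the ones given in the example.

```python
clock_rate_a_hz = 400e6   # computer A runs at 400 MHz
time_a_s = 10.0           # and takes 10 seconds on the program

# Execution time = C / clock rate, so C = execution time x clock rate.
cycles_a = time_a_s * clock_rate_a_hz

# Computer B needs 1.2 times as many cycles for the same program.
cycles_b = 1.2 * cycles_a

# B matches A's 10 seconds at this clock rate; anything above it is faster.
break_even_clock_rate_b_hz = cycles_b / time_a_s

print(f"cycles on A: {cycles_a:.3g}")
print(f"B must run faster than {break_even_clock_rate_b_hz / 1e6:.0f} MHz")
```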

Now that we understand cycles

  • A given program will require
    • some number of instructions (machine instructions)
    • some number of cycles
    • some number of seconds
  • We have a vocabulary that relates these quantities:
    • cycle time (seconds per cycle)
    • clock rate (cycles per second)
    • CPI (cycles per instruction)

A floating point intensive application might have a higher CPI.

CPI = Average cycles per instruction for the program

Consider a program with 5 instructions

Instruction   #cycles
1             2
2             2
3             4
4             2
5             1
Total         11

CPI = 11/5 = 2.2

Another way of saying it is 11 = 5 × 2.2

OR CPU cycles = #instructions × CPI
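A minimal sketch of the five-instruction example: the per-instruction cycle counts come from the table in the notes, and the variable names are illustrative.

```python
# Cycles taken by instructions 1 through 5 of the example program.
cycles_per_instruction = [2, 2, 4, 2, 1]

instruction_count = len(cycles_per_instruction)
total_cycles = sum(cycles_per_instruction)

# CPI is the average number of cycles per instruction for this program.
cpi = total_cycles / instruction_count

print(f"total cycles = {total_cycles}, CPI = {cpi}")

# The identity CPU cycles = #instructions x CPI recovers the total.
recovered_cycles = instruction_count * cpi
```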

Aspects of CPU Performance

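$$\text{CPU time} = \frac{\text{seconds}}{\text{program}} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{seconds}}{\text{cycle}}$$

                  Instruction Count   CPI   Clock cycle time
Program                   X            X
Compiler                  X            X
Instruction Set           X            X
Organization                           X           X
Technology                                         X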

Performance

  • Performance is determined by execution time
  • Do any of the other variables equal performance?
    • # of cycles to execute program?
    • # of instructions in program?
    • # of cycles per second?
    • average # of cycles per instruction?
    • average # of instructions per second?
  • Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.

CPI

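“Average cycles per instruction”

$$\text{CPI} = \frac{\text{CPU Time} \times \text{Clock Rate}}{\text{Instruction Count}} = \frac{\text{Clock Cycles}}{\text{Instruction Count}}$$

$$\text{CPU time} = \text{Clock cycle time} \times \sum_{j=1}^{n} \text{CPI}_j \times I_j$$

“Instruction frequency”

$$\text{CPI} = \sum_{j=1}^{n} \text{CPI}_j \times F_j \quad \text{where} \quad F_j = \frac{I_j}{\text{Instruction Count}}$$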
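The per-class formulas can be sketched in Python as below. The function names are illustrative, and the instruction mix is borrowed from the first compiler in the MIPS example later in these notes (5, 1, and 1 million instructions costing 1, 2, and 3 cycles on a 100 MHz machine).

```python
def average_cpi(classes):
    """classes: list of (cpi_j, instruction_count_j) pairs, one per instruction class.

    Computes CPI = sum_j CPI_j * F_j with F_j = I_j / Instruction Count.
    """
    instruction_count = sum(count for _, count in classes)
    return sum(cpi * (count / instruction_count) for cpi, count in classes)


def cpu_time_s(classes, clock_cycle_time_s):
    """CPU time = clock cycle time * sum_j CPI_j * I_j."""
    total_cycles = sum(cpi * count for cpi, count in classes)
    return clock_cycle_time_s * total_cycles


mix = [(1, 5_000_000), (2, 1_000_000), (3, 1_000_000)]
clock_cycle_time_s = 1 / 100e6  # 100 MHz clock

cpi = average_cpi(mix)
time_s = cpu_time_s(mix, clock_cycle_time_s)
print(f"CPI = {cpi:.3f}, CPU time = {time_s:.2f} s")
```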

Suppose we have two implementations of the same instruction set architecture (ISA).

For some program,

Machine A has a clock cycle time of 10 ns and a CPI of 2.0. Machine B has a clock cycle time of 20 ns and a CPI of 1.2.

What machine is faster for this program, and by how much?

If two machines have the same ISA which of our quantities (e.g., clock rate, CPI, execution time, # of instructions) will always be identical?

CPI Example

For machine A

CPU time = IC × CPI × Clock cycle time

CPU time = IC × 2.0 × 10 ns = 20 IC ns

For machine B

CPU time = IC × 1.2 × 20 ns = 24 IC ns

Machine A is faster, by a factor of 24 / 20 = 1.2.
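A sketch of the comparison, using the CPI values from the worked solution (2.0 for A, 1.2 for B). Because both machines implement the same ISA, the instruction count is the same on both and cancels out of the ratio; the value of `instruction_count` below is an arbitrary placeholder.

```python
def cpu_time_ns(instruction_count, cpi, clock_cycle_time_ns):
    """CPU time = IC x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_ns


instruction_count = 1_000_000  # arbitrary; identical on both machines

time_a_ns = cpu_time_ns(instruction_count, cpi=2.0, clock_cycle_time_ns=10)
time_b_ns = cpu_time_ns(instruction_count, cpi=1.2, clock_cycle_time_ns=20)

# Ratio of execution times tells us how many times faster A is than B.
speedup_a_over_b = time_b_ns / time_a_ns
print(f"A is {speedup_a_over_b:.1f} times faster than B")
```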

  • Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software.

The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.

The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.

  • Which sequence will be faster according to MIPS?
  • Which sequence will be faster according to execution time?

MIPS example


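$$\text{MIPS} = \text{Millions of instructions per second} = \frac{\text{Number of instructions}}{\text{Execution time} \times 10^{6}} = \frac{\text{IC}}{\text{IC} \times \text{CPI} \times \text{clock cycle time} \times 10^{6}} = \frac{\text{IC} \times \text{clock rate}}{\text{IC} \times \text{CPI} \times 10^{6}} = \frac{\text{clock rate}}{\text{CPI} \times 10^{6}}$$

For sequence A,

$$\text{CPI} = \frac{(5 \times 1 + 1 \times 2 + 1 \times 3) \times 10^{6}}{(5 + 1 + 1) \times 10^{6}} = \frac{10}{7}$$

$$\text{Execution time} = (5 + 1 + 1) \times 10^{6} \times \frac{10}{7} \times \frac{1}{100 \times 10^{6}} = 0.1 \text{ seconds}$$

$$\text{MIPS} = \frac{100 \times 10^{6}}{\frac{10}{7} \times 10^{6}} = 70$$

For sequence B, execution time = 0.15 seconds but MIPS = 80!!!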
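The example can be reproduced with a short sketch. The function and dictionary names are illustrative; the clock rate, cycle costs, and instruction counts are the ones stated in the problem.

```python
CLOCK_RATE_HZ = 100e6                       # 100 MHz machine
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}  # cycles for each instruction class


def summarize(instruction_counts):
    """Return (execution time in seconds, MIPS rating) for one compiler's code."""
    instructions = sum(instruction_counts.values())
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in instruction_counts.items())
    time_s = cycles / CLOCK_RATE_HZ
    mips = instructions / (time_s * 1e6)
    return time_s, mips


compiler_1 = {"A": 5e6, "B": 1e6, "C": 1e6}
compiler_2 = {"A": 10e6, "B": 1e6, "C": 1e6}

time_1, mips_1 = summarize(compiler_1)
time_2, mips_2 = summarize(compiler_2)

print(f"compiler 1: {time_1:.2f} s, {mips_1:.0f} MIPS")
print(f"compiler 2: {time_2:.2f} s, {mips_2:.0f} MIPS")
```

The second compiler's code earns the higher MIPS rating yet takes longer to run, so MIPS and execution time rank the two sequences in opposite order.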

Profiling a program: Identifying the basic blocks

program
    while P do
        if Q then A else B fi
        if R then break fi
        C
    od
end

[Figure: flow graph of the program with basic blocks P, Q, A, B, R, and C, and an Exit node]

Profiling a program

  1. Find “basic blocks” of program
  2. Create a counter variable for each basic block
  3. Insert code that increments the counter for a basic block at the beginning of that block
  4. Print out counters at the end of the program
  5. Count instructions in each basic block
  6. From steps 4 and 5, you have info about the instructions executed by the program
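The steps above can be sketched on the example loop with basic blocks P, Q, A, B, R, and C. Everything concrete here is an assumption made for illustration: the per-block instruction counts are invented, and the predicates P, Q, and R are replaced by simple deterministic tests so the run is repeatable.

```python
# Hypothetical static instruction counts for each basic block.
INSTRUCTIONS_PER_BLOCK = {"P": 2, "Q": 1, "A": 4, "B": 3, "R": 1, "C": 5}


def run_profiled(iterations, break_at=None):
    """Run the instrumented loop and return how often each basic block executed."""
    counters = {block: 0 for block in INSTRUCTIONS_PER_BLOCK}  # one counter per block
    i = 0
    while True:
        counters["P"] += 1            # increment at the start of block P
        if not i < iterations:        # stand-in for predicate P
            break
        counters["Q"] += 1
        if i % 2 == 0:                # stand-in for predicate Q
            counters["A"] += 1
        else:
            counters["B"] += 1
        counters["R"] += 1
        if i == break_at:             # stand-in for predicate R
            break
        counters["C"] += 1
        i += 1
    return counters


def dynamic_instruction_count(counters):
    """Combine execution counts with per-block instruction counts."""
    return sum(INSTRUCTIONS_PER_BLOCK[block] * n for block, n in counters.items())


counters = run_profiled(iterations=10)
print(counters)                        # printed at the end of the program
total_instructions = dynamic_instruction_count(counters)
print("instructions executed:", total_instructions)
```

A real profiler would insert the counter increments automatically, but the bookkeeping is the same: execution counts per block times instructions per block.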

  • Performance best determined by running a real application
    • Use programs typical of expected workload
    • Or, typical of expected class of applications e.g., compilers/editors, scientific applications, graphics, etc.
  • Small benchmarks
    • nice for architects and designers
    • easy to standardize
    • can be abused
  • SPEC (System Performance Evaluation Cooperative)
    • companies have agreed on a set of real programs and inputs
    • can still be abused (Intel’s “other” bug)
    • valuable indicator of performance (and compiler technology)

Benchmarks

SPEC ‘

  • Compiler “enhancements” and performance

[Figure: bar chart of SPEC performance ratio (0 to 800) by benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), comparing the compiler with the enhanced compiler]

SPEC ‘

Benchmark   Description
go          Artificial intelligence; plays the game of Go
m88ksim     Motorola 88k chip simulator; runs test program
gcc         The GNU C compiler generating SPARC code
compress    Compresses and decompresses file in memory
li          Lisp interpreter
ijpeg       Graphic compression and decompression
perl        Manipulates strings and prime numbers in the special-purpose programming language Perl
vortex      A database program
tomcatv     A mesh generation program
swim        Shallow water model with 513 x 513 grid
su2cor      Quantum physics; Monte Carlo simulation
hydro2d     Astrophysics; hydrodynamic Navier Stokes equations
mgrid       Multigrid solver in 3-D potential field
applu       Parabolic/elliptic partial differential equations
turb3d      Simulates isotropic, homogeneous turbulence in a cube
apsi        Solves problems regarding temperature, wind velocity, and distribution of pollutant
fpppp       Quantum chemistry
wave5       Plasma physics; electromagnetic particle simulation

SPEC ‘

Does doubling the clock rate double the performance? Can a machine with a slower clock rate have better performance?

[Figure: SPECint (0 to 10) versus clock rate (50 to 250 MHz) for the Pentium and Pentium Pro]

[Figure: SPECfp (0 to 10) versus clock rate (50 to 250 MHz) for the Pentium and Pentium Pro]

  • Performance is specific to a particular program or set of programs
    • Total execution time is a consistent summary of performance
  • For a given architecture performance increases come from:
    • increases in clock rate (without adverse CPI effects)
    • improvements in processor organization that lower CPI
    • compiler enhancements that lower CPI and/or instruction count
  • Pitfall: expecting improvement in one aspect of a machine’s performance to affect the total performance
  • You should not always believe everything you read! Read carefully! (see newspaper articles, e.g., Exercise 2.37)

Remember