CPU Performance - Computer Organization and Design - Lecture Slides, Slides of Computer Aided Design (CAD)

These lecture slides cover the following topics: CPU Performance, How to Best Measure Performance, Focus on Response Time, Performance Equation, Clock Cycle Time, Total Clock Cycles for Program, Cycles Per Instruction, Profiling Code, Types of Instructions

2012/2013

Uploaded on 04/24/2013

baijayanthi

CPU Performance

How to Best Measure Performance?

  • Time!
  • What kind of “time”?
    • Response time: time between start and finish of a program, aka wall clock time. Good measure from the “user” perspective
    • Throughput: total amount of work done in a fixed amount of time. Good from the “computer center” perspective
  • Note: an improvement in one measure usually results in an improvement in the other

Example

  • Machine A: program takes 10 sec
  • Machine B: same program takes 15 sec
  • Hence: perf(A) / perf(B) = exec(B) / exec(A) = 15 / 10 = 1.5
  • Machine A is 1.5 times faster than machine B
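
The ratio above can be checked with a few lines of Python. This is a minimal sketch; the function name `relative_performance` is a hypothetical helper, not something from the slides.

```python
def relative_performance(exec_a: float, exec_b: float) -> float:
    """Return perf(A) / perf(B), given execution times in seconds.

    Performance is the inverse of execution time, so the ratio
    of performances is exec(B) / exec(A).
    """
    return exec_b / exec_a


# Values from the example: A takes 10 sec, B takes 15 sec.
ratio = relative_performance(exec_a=10.0, exec_b=15.0)
print(f"Machine A is {ratio} times faster than machine B")
```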

How to Measure Time

  • CPU time via the “CPU performance equation”
  • CPU time = CPU clock cycles * clock cycle time
  • Abbrev: exec = cycles * tclk
  • Equivalently: exec = cycles / frequency
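
As a quick illustration that the two forms agree, the sketch below computes CPU time both ways. The cycle count and clock frequency are made-up values chosen for illustration, and the function names are hypothetical.

```python
def cpu_time_from_period(cycles: float, tclk: float) -> float:
    """exec = cycles * tclk (tclk in seconds)."""
    return cycles * tclk


def cpu_time_from_frequency(cycles: float, freq_hz: float) -> float:
    """exec = cycles / frequency (frequency in Hz)."""
    return cycles / freq_hz


# Hypothetical program: 4e9 clock cycles on a 2 GHz machine.
cycles = 4e9
freq_hz = 2e9
tclk = 1 / freq_hz  # clock cycle time is the inverse of frequency (0.5 ns)

t1 = cpu_time_from_period(cycles, tclk)
t2 = cpu_time_from_frequency(cycles, freq_hz)
print(t1, t2)  # both are 2 seconds, up to floating-point rounding
```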

Example (2)

  • cycles(B) = 1.2 * cycles(A) = 1.2 * 2e10 = 2.4e10
  • Also: exec(B) = cycles(B) / freq(B)
  • So: freq(B) = cycles(B) / exec(B) = 2.4e10 / 6 sec = 4 GHz
  • Need to double clock rate to meet the 6 sec requirement!
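
The arithmetic in this example can be reproduced as follows. The slide that states the problem is not part of this preview, so the sketch only uses the numbers that appear above; the 2 GHz clock rate for machine A is an assumption inferred from the "double clock rate" remark.

```python
cycles_a = 2e10                 # cycles(A), from the example
cycles_b = 1.2 * cycles_a       # B needs 1.2 times as many cycles
target_exec_b = 6.0             # required execution time on B, in seconds

# exec = cycles / freq, so freq = cycles / exec
freq_b = cycles_b / target_exec_b
print(f"Required clock rate for B: {freq_b / 1e9:.1f} GHz")

# ASSUMPTION: machine A runs at 2 GHz (not stated in the visible slides).
freq_a_assumed = 2e9
print(f"Ratio to A's assumed clock rate: {freq_b / freq_a_assumed:.1f}")
```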

CPU Performance Equation (1)

  • We rarely have access to the total clock cycles for a program P.
  • Usually can get total number of instructions executed (by profiling the code)
  • Usually can get the number of cycles per instruction (look it up in a datasheet for that processor)
  • Hence, a better CPU performance equation is:

exec = IC * CPI * tclk
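
A minimal sketch of this equation in Python is shown below. The function name `exec_time` and the sample instruction count are hypothetical; in practice IC would come from a profiler and CPI from a datasheet, as noted above.

```python
def exec_time(ic: int, cpi: float, tclk: float) -> float:
    """CPU performance equation: exec = IC * CPI * tclk.

    ic   -- number of instructions executed
    cpi  -- average cycles per instruction
    tclk -- clock cycle time in seconds
    """
    return ic * cpi * tclk


# Hypothetical program: 1e9 instructions, CPI of 2.0, 1 ns clock cycle.
t = exec_time(ic=1_000_000_000, cpi=2.0, tclk=1e-9)
print(f"{t:.3f} s")
```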

Ex: CPU Performance Equation (1)

  • Suppose machine A has a tclk of 1 ns (1e-9 sec) and a CPI of 2.0 for a program P
  • Suppose machine B has the same ISA as machine A, but has a tclk of 2 ns and a CPI of 1.2 for program P
  • Which machine is “faster”?
  • exec = IC * CPI * tclk
    • exec(A) = IC(A) * 2.0 * 1 ns = 2.0 ns * IC(A)
    • exec(B) = IC(B) * 1.2 * 2 ns = 2.4 ns * IC(B)
  • Note that P is fixed, and the ISA is the same on both machines

Ex: CPU Performance Equation (2)

  • Perf(A) / Perf(B) = exec(B) / exec(A)

= (2.4 * IC(B)) / (2.0 * IC(A)) = 1.2 IC(B) / IC(A)

  • How to get IC(B) / IC(A)?
  • IC(B) = IC(A) since P is fixed and the ISA is the same
  • Hence, machine A is 1.2 times faster than machine B for P
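
Because IC is the same on both machines, it cancels out of the ratio. The sketch below illustrates this by trying several instruction counts, which are arbitrary values picked for illustration; `exec_time_ns` is a hypothetical helper.

```python
def exec_time_ns(ic: int, cpi: float, tclk_ns: float) -> float:
    """exec = IC * CPI * tclk, with tclk and the result in nanoseconds."""
    return ic * cpi * tclk_ns


ratios = []
for ic in (1_000, 1_000_000, 1_000_000_000):  # arbitrary instruction counts
    exec_a = exec_time_ns(ic, cpi=2.0, tclk_ns=1.0)  # machine A
    exec_b = exec_time_ns(ic, cpi=1.2, tclk_ns=2.0)  # machine B
    ratios.append(exec_b / exec_a)                   # perf(A) / perf(B)

# Every ratio is 1.2 (up to floating-point rounding), whatever IC is.
print([round(r, 6) for r in ratios])
```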

Measuring Performance

Reconsidering the CPI

  • Computing the CPI requires careful analysis of how each instruction behaves on the hardware
  • Must understand how each assembly language instruction is implemented to know how many cycles it takes
  • Note: many times you don’t have total IC and average CPI
  • Instead you might have the IC for each separate instruction, or for each of a family of instructions
  • And you might have the CPI for each of those separate types of instructions
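
When only per-type counts and per-type CPIs are available, the totals can be recovered with a weighted sum. The formula below is a sketch of that step; the index i (ranging over instruction types) and the subscripted symbols are notation introduced here, not taken from the slides.

```latex
\text{cycles} = \sum_{i} \text{CPI}_i \cdot \text{IC}_i,
\qquad
\text{IC} = \sum_{i} \text{IC}_i,
\qquad
\text{CPI}_{\text{avg}} = \frac{\text{cycles}}{\text{IC}}
  = \frac{\sum_{i} \text{CPI}_i \cdot \text{IC}_i}{\sum_{i} \text{IC}_i}
```

Substituting the average CPI into exec = IC * CPI * tclk gives exec = cycles * tclk again, so the two forms of the performance equation stay consistent.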

Example (1)

  • Suppose instruction A has a CPI = 1 (e.g. jmp, bne)
  • Suppose instruction B has a CPI = 2 (e.g. add/sub)
  • Suppose instruction C has a CPI = 3 (e.g. mult)
  • Suppose a compiler writer has a choice between 2 approaches to implementing a single high-level language (HLL) statement:
  • Sequence 1 requires 2 A, 1 B, 2 C
  • Sequence 2 requires 4 A, 1 B, 1 C

Example (2)

  • Question1: which code sequence is longer?
  • Question2: which code sequence is faster?
  • Question3: what is the CPI of each sequence?
  • Sequence 1 executes 2 + 1 + 2 instructions = 5
  • Sequence 2 executes 4 + 1 + 1 instructions = 6
  • Question1 answer: Sequence 2 is longer. If code size is our performance measure, then Sequence 1 is best.

Example (4)

  • Average CPI(seq1) = (1 * 2 + 2 * 1 + 3 * 2) / 5 = 10/5 = 2
  • Average CPI(seq2) = (1 * 4 + 2 * 1 + 3 * 1) / 6 = 9/6 = 1.5
  • Question3 answer: The average CPI of sequence 2 is lower than the average CPI of sequence 1
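
The numbers for both sequences can be tabulated with a short script. The slide answering Question 2 is not part of this preview; the cycle totals computed here (10 versus 9) suggest that sequence 2 is the faster one at the same clock rate, even though it executes more instructions. The dictionary layout and the `summarize` helper are illustrative choices.

```python
# Cycles per instruction for each instruction type, from Example (1).
CPI = {"A": 1, "B": 2, "C": 3}

# Instruction counts per type for each candidate code sequence.
sequences = {
    "seq1": {"A": 2, "B": 1, "C": 2},
    "seq2": {"A": 4, "B": 1, "C": 1},
}


def summarize(counts: dict) -> tuple:
    """Return (instruction count, total cycles, average CPI)."""
    ic = sum(counts.values())
    cycles = sum(CPI[kind] * n for kind, n in counts.items())
    return ic, cycles, cycles / ic


for name, counts in sequences.items():
    ic, cycles, avg_cpi = summarize(counts)
    print(f"{name}: IC={ic}, cycles={cycles}, average CPI={avg_cpi}")
```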

Another Example (1)

  • Suppose a computer designer has two alternatives for implementing branches:
  • CPU A: set a condition code based on the result of a “compare” instruction and then follow that compare immediately by a “branch” instruction that uses that code
  • CPU B: the compare is “folded” into the branch in a single all-purpose branch instruction that does both in one step