Advanced Computer Architecture Performance Analysis - Prof. Hsien-Hsin Lee, Assignments of Computer Architecture and Organization

Lecture notes from Georgia Institute of Technology's ECE 4100/6100 course on advanced computer architecture, focusing on performance analysis, including concepts such as execution time, throughput, CPU performance, architecture comparison, and performance evaluation using benchmarks. They also cover Amdahl's Law and the principle of locality.

Typology: Assignments

Pre 2010

Uploaded on 08/05/2009

koofers-user-z60-1
ECE 4100/6100
Advanced Computer Architecture
Lecture 1 Performance
Prof. Hsien-Hsin Sean Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
Reading Assignment
  • Chapter 1
    • Sections 1.8, 1.9, 1.10

Performance

  • Execution/Response time (Latency)
    • Elapsed time between the start and completion of an event
    • How long does my job take?
  • Throughput (Bandwidth)
    • Total amount of work done within a given period of time
    • How many jobs are completed per unit time on a system?
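To make the latency/throughput distinction concrete, here is a minimal Python sketch. The `Job` class and the timings are hypothetical illustrations, not from the lecture: three 2-second jobs run either back to back or overlapped (each started 1 second after the previous one).

```python
from dataclasses import dataclass


@dataclass
class Job:
    start: float  # seconds
    end: float    # seconds


def latencies(jobs):
    """Execution/response time of each job: completion time minus start time."""
    return [job.end - job.start for job in jobs]


def throughput(jobs):
    """Jobs completed per second over the window from the first start to the last completion."""
    window = max(job.end for job in jobs) - min(job.start for job in jobs)
    return len(jobs) / window


# Hypothetical traces: the same three 2-second jobs, serial vs. overlapped
serial = [Job(0, 2), Job(2, 4), Job(4, 6)]
overlapped = [Job(0, 2), Job(1, 3), Job(2, 4)]

for name, jobs in [("serial", serial), ("overlapped", overlapped)]:
    print(f"{name}: latencies = {latencies(jobs)}, throughput = {throughput(jobs):.2f} jobs/s")
```

Both traces have the same per-job latency (2 s), but overlapping raises throughput from 0.5 to 0.75 jobs/s. This is why the two metrics are reported separately.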

CPU Performance

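  • Execution Time = Seconds / Program

$$\text{Execution time} = \frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}$$

  • What each factor depends on:
    • Instructions/Program: Programmer, Algorithms, ISA, Compilers
    • Cycles/Instruction: Microarchitecture, System architecture
    • Seconds/Cycle: Microarchitecture (pipeline depth), Circuit design, Technology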
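The three-factor equation translates directly into code. This is a minimal Python sketch. The program size (1 billion instructions) is a hypothetical value; the CPI of 2.0 and the 1.4 GHz clock are borrowed from the later examples for illustration.

```python
def execution_time(instruction_count, cpi, clock_hz):
    """Seconds/program = instructions/program * cycles/instruction * seconds/cycle."""
    seconds_per_cycle = 1.0 / clock_hz
    return instruction_count * cpi * seconds_per_cycle


# Hypothetical program: 1e9 instructions, CPI = 2.0, 1.4 GHz clock
t = execution_time(1e9, 2.0, 1.4e9)
print(f"{t:.3f} s")

# Halving any one factor (here CPI) halves the execution time
t_half_cpi = execution_time(1e9, 1.0, 1.4e9)
```

Because the factors multiply, an improvement to one of them helps only if it does not inflate another. A lower CPI bought with a slower clock is one example, and it is exactly the trade-off in the later performance-evaluation examples.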


Performance Comparison

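  • For some program running on machine X,

$$\text{Performance}_X = \frac{1}{\text{Execution time}_X}$$

  • "X is n times faster than Y" means

$$\frac{\text{Performance}_X}{\text{Performance}_Y} = \frac{\text{Execution time}_Y}{\text{Execution time}_X} = n = \text{speedup of X over Y}$$

  • Problem:
    • Machine A runs a program in 20 seconds
    • Machine B runs the same program in 25 seconds
    • How many times faster is A than B?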
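The definition can be checked numerically for the A/B problem. This is a minimal Python sketch; the function names are illustrative.

```python
def performance(execution_time_s):
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def speedup(time_x, time_y):
    """How many times faster X is than Y: Perf_X / Perf_Y, which equals Time_Y / Time_X."""
    return time_y / time_x


time_a, time_b = 20.0, 25.0  # seconds, from the problem above
print(f"A is {speedup(time_a, time_b):.2f} times faster than B")
```

The ratio of performances and the inverted ratio of execution times give the same answer: A is 25/20 = 1.25 times faster than B.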

Performance Evaluation: Benchmark

  • (Real) Programs
    • In the form of a collection of programs
    • E.g. SPEC, Winstone, SYSMARK, 3D Winbench, EEMBC
  • Kernels:
    • Small key pieces of real programs
    • E.g. Livermore Loops Kernels (LLK), Linpack
  • Modified (or scripted)
    • To focus on some particular aspects (e.g. remove I/O, focus on CPU)
  • (Toy) Benchmarks
    • Produce expected results
  • Synthetic Benchmarks:
    • Representative instruction mix
    • E.g. Dhrystone, Whetstone
  • Important for
    • Architectural and microarchitectural design trade-offs
    • Competitive analysis of real products


Performance Summary Measurement

  • Average of total execution time
  • This is the Arithmetic Mean (or the Weighted Arithmetic Mean)

$$\text{AM} = \frac{1}{n}\sum_{i=1}^{n}\text{Time}_i \qquad\text{or}\qquad \text{Weighted AM} = \sum_{i=1}^{n}\text{Weight}_i \times \text{Time}_i$$
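Both forms of the mean are easy to compute. This is a minimal Python sketch. The execution times and weights are hypothetical, and the weights are assumed to sum to 1, as is usual for a weighted arithmetic mean.

```python
def arithmetic_mean(times):
    """Unweighted average execution time: (1/n) * sum(Time_i)."""
    return sum(times) / len(times)


def weighted_arithmetic_mean(times, weights):
    """sum(Weight_i * Time_i); assumes one weight per program and weights summing to 1."""
    if len(times) != len(weights):
        raise ValueError("need one weight per program")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * t for w, t in zip(weights, times))


times = [10.0, 20.0, 30.0]   # hypothetical per-program execution times, seconds
weights = [0.5, 0.25, 0.25]  # hypothetical share of the workload each program represents
print(arithmetic_mean(times), weighted_arithmetic_mean(times, weights))
```

The weights let a frequently run program dominate the summary. Here, giving the 10-second program half the weight pulls the mean down from 20 s to 17.5 s.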

Performance Summary Measurement

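  • Rate_i is a function of 1/Time_i
  • Used to represent the average "rate", such as instructions per cycle (IPC)
  • This is the Harmonic Mean (or the Weighted Harmonic Mean)

$$\text{HM} = \frac{n}{\sum_{i=1}^{n} \frac{1}{\text{Rate}_i}} \qquad\text{or}\qquad \text{Weighted HM} = \frac{1}{\sum_{i=1}^{n} \frac{\text{Weight}_i}{\text{Rate}_i}}$$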
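The harmonic mean is the right average for rates because it matches what the machine actually achieves over the whole workload. The minimal Python sketch below assumes two hypothetical programs with equal instruction counts (1e9 each) and IPCs of 2.0 and 0.5. It compares the harmonic mean with the true aggregate IPC (total instructions / total cycles) and with the arithmetic mean.

```python
def harmonic_mean(rates):
    """n / sum(1 / Rate_i)."""
    return len(rates) / sum(1.0 / r for r in rates)


def weighted_harmonic_mean(rates, weights):
    """1 / sum(Weight_i / Rate_i); weights assumed to sum to 1."""
    return 1.0 / sum(w / r for w, r in zip(weights, rates))


# Hypothetical workload: two programs with the same instruction count
instructions = [1e9, 1e9]
ipcs = [2.0, 0.5]

total_cycles = sum(n / ipc for n, ipc in zip(instructions, ipcs))
aggregate_ipc = sum(instructions) / total_cycles  # what the machine actually delivered
naive_am = sum(ipcs) / len(ipcs)                  # arithmetic mean of the rates

print(harmonic_mean(ipcs), aggregate_ipc, naive_am)
```

The harmonic mean (0.8) equals the aggregate IPC, while the arithmetic mean of the rates (1.25) overstates it. With unequal instruction counts, weighting each rate by its program's share of the instructions gives the same agreement.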


Amdahl’s Law Analogy

  • Driving from Orlando to Atlanta
    • 60 miles/hr from Orlando to Macon
    • 120 miles/hr from Macon to Atlanta
    • How much time can you save compared with driving all the way from Orlando to Atlanta at 60 miles/hr?
  • 6 hr 45 min vs. 7 hr 30 min = ~11% speedup
  • The key is to speed up the biggest portion, i.e., the frequently executed blocks
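The ~11% figure follows from Amdahl's Law, Speedup = 1 / ((1 - f) + f/s), where f is the fraction of the original time that is sped up and s is the speedup of that portion. The slide does not give the distances. The derivation below back-solves them from the two travel times it does give (7 hr 30 min and 6 hr 45 min), so the 450 and 90 miles are derived values, not quoted ones.

$$
\begin{aligned}
T_{\text{old}} &= 7.5\ \text{hr} \;\Rightarrow\; D = 60 \times 7.5 = 450\ \text{miles}\\
T_{\text{new}} &= \frac{450 - d}{60} + \frac{d}{120} = 7.5 - \frac{d}{120} = 6.75 \;\Rightarrow\; d = 90\ \text{miles (Macon to Atlanta)}\\
f &= \frac{90/60}{7.5} = \frac{1.5}{7.5} = 0.2, \qquad s = \frac{120}{60} = 2\\
\text{Speedup} &= \frac{1}{(1-f) + f/s} = \frac{1}{0.8 + 0.1} = \frac{1}{0.9} \approx 1.11
\end{aligned}
$$

Doubling the speed of only 20% of the trip buys about 11%, which is why the effort should go to the portion that dominates the time.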

Parallelism vs. Speedup

[Figure: Amdahl's Law speed-up as a function of parallelism. Speed-up (1, 10, 100) vs. code portion in faster mode, f (0 to 1), one curve per degree of parallelism P (P = 1, 4, 16, ...); marked values 1.11x, 1.33x, 1.97x.]
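The curves in the figure follow Amdahl's Law, Speedup = 1 / ((1 - f) + f/P). This minimal Python sketch tabulates the formula for a few values of f and P. The f and P values are illustrative choices, not read off the plot. The sketch also prints the ceiling 1/(1 - f) that the speedup approaches as P grows without bound.

```python
def amdahl_speedup(f: float, p: float) -> float:
    """Overall speedup when a fraction f of the original execution time is sped up p times."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    if p <= 0:
        raise ValueError("p must be positive")
    return 1.0 / ((1.0 - f) + f / p)


def speedup_ceiling(f: float) -> float:
    """Limit of amdahl_speedup(f, p) as p grows without bound: 1 / (1 - f)."""
    return float("inf") if f == 1.0 else 1.0 / (1.0 - f)


FRACTIONS = (0.2, 0.5, 0.8, 0.9, 0.99)  # illustrative values of f

print("   f:  " + "  ".join(f"{f:6.2f}" for f in FRACTIONS))
for p in (1, 4, 16, 64):
    print(f"P={p:>3}: " + "  ".join(f"{amdahl_speedup(f, p):6.2f}" for f in FRACTIONS))
print("P=inf: " + "  ".join(f"{speedup_ceiling(f):6.2f}" for f in FRACTIONS))
```

Even with P = 64, f = 0.5 yields less than 2x (about 1.97). The serial part caps the gain no matter how much parallel hardware is added.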


The Principle of Locality

  • Knuth made the original observation about program locality in 1971: "… less than 4 percent of a program generally accounts for more than half of its running time."
  • 90/10 rule: a program spends 90% of its execution time in only 10% of the code
  • Two types of locality
    • Temporal locality (locality in time)
    • Spatial locality (locality in space)
  • Memory subsystem design heavily leverages the locality concept for better performance
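The effect of locality on a memory subsystem can be illustrated with a toy cache model. This is a minimal Python sketch of a direct-mapped cache. The 64-byte lines, 256 lines (16 KB), 4-byte elements and the three access traces are all hypothetical parameters chosen for illustration. The model only counts hits and misses; it does not model timing.

```python
def direct_mapped_hit_rate(addresses, line_size=64, num_lines=256):
    """Fraction of byte addresses that hit in a simple direct-mapped cache model."""
    tags = [None] * num_lines          # one tag per cache line, initially empty
    hits = 0
    for addr in addresses:
        block = addr // line_size      # memory block containing this byte
        index = block % num_lines      # cache line the block maps to
        tag = block // num_lines
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag          # miss: bring the whole line into the cache
    return hits / len(addresses)


ELEM = 4  # hypothetical 4-byte array elements

# Spatial locality: walk 4096 consecutive elements; 16 elements share each 64-byte line
sequential = [ELEM * i for i in range(4096)]
# No spatial locality: touch one element per line (64-byte stride)
strided = [64 * i for i in range(4096)]
# Temporal locality: re-walk the same 256-element (1 KB) array 10 times
reused = [ELEM * i for i in range(256)] * 10

for name, trace in [("sequential", sequential), ("strided", strided), ("reused", reused)]:
    print(f"{name:>10}: hit rate = {direct_mapped_hit_rate(trace):.3f}")
```

The sequential walk misses only once per line (15 of every 16 accesses hit), and the strided walk never reuses a line. The reused array misses only on its first pass through 16 lines. Fetching whole lines exploits spatial locality, and keeping them around exploits temporal locality.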

Example of Performance Evaluation (I)

Operation            Frequency   Clock cycle count
ALU Ops (reg-reg)    43%         1
Loads                21%         2
Stores               12%         2
Branches             24%         2

Assume 25% of the ALU ops directly use a loaded operand that is not used again. We propose adding ALU instructions that have one source operand in memory. These new reg-mem instructions take 2 clock cycles. Also assume that the extended instruction set increases the branch clock cycle count by 1 but has no impact on the clock cycle time. Would this change improve performance?
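One way to work the question is sketched in Python below. The key interpretive assumption, labeled in the code, is that each new reg-mem ALU instruction replaces exactly one load plus one reg-reg ALU op. Under that assumption, 25% × 43% = 10.75% of the original instructions disappear. Because the cycle time is unchanged, the total cycle count decides the outcome.

```python
# Instruction mix from the table: name -> (frequency, clock cycles)
ORIGINAL_MIX = {
    "alu_reg_reg": (0.43, 1),
    "load": (0.21, 2),
    "store": (0.12, 2),
    "branch": (0.24, 2),
}


def total_cycles(mix):
    """Sum of frequency * cycles, normalized to the ORIGINAL instruction count."""
    return sum(freq * cycles for freq, cycles in mix.values())


def proposed_mix():
    # Assumption: each folded ALU op absorbs exactly one load whose value is not reused
    folded = 0.25 * ORIGINAL_MIX["alu_reg_reg"][0]  # 0.1075 of the original instructions
    return {
        "alu_reg_reg": (0.43 - folded, 1),
        "alu_reg_mem": (folded, 2),  # new reg-mem ALU instructions, 2 cycles each
        "load": (0.21 - folded, 2),  # loads that only fed a folded ALU op are gone
        "store": (0.12, 2),
        "branch": (0.24, 3),         # branches take one extra cycle
    }


old_cycles = total_cycles(ORIGINAL_MIX)  # 1.57, equal to the old CPI since frequencies sum to 1
new = proposed_mix()
new_cycles = total_cycles(new)           # 1.7025
new_instruction_count = sum(freq for freq, _ in new.values())  # 0.8925
new_cpi = new_cycles / new_instruction_count
speedup = old_cycles / new_cycles        # same cycle time, so time ratio = cycle ratio

print(f"old CPI = {old_cycles:.4f}")
print(f"new CPI = {new_cpi:.4f} over {new_instruction_count:.4f} of the original instructions")
print(f"speedup = {speedup:.3f}")
```

Under these assumptions, the new design executes about 11% fewer instructions but needs about 8% more clock cycles, so the speedup is about 0.92. The change would not improve performance.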


Example of Performance Evaluation (II)

FP instructions = 25%
Average CPI of FP instructions = 4.0
Average CPI of other instructions = 1.33
FPSQRT = 2%, CPI of FPSQRT = 20

  • Design Option 1: decrease the CPI of FPSQRT to 2
  • Design Option 2: decrease the average CPI of all FP instructions to 2.5

$$
\begin{aligned}
\text{Original CPI} &= 0.25 \times 4 + 1.33 \times (1 - 0.25) = 2.0\\
\text{Option 1 CPI} &= 2.0 - 2\% \times (20 - 2) = 1.64\\
\text{Option 2 CPI} &= 0.25 \times 2.5 + 1.33 \times (1 - 0.25) = 1.625\\
\text{Speedup of Option 1} &= 2 / 1.64 = 1.22\\
\text{Speedup of Option 2} &= 2 / 1.625 = 1.23
\end{aligned}
$$
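The same comparison in code, as a minimal Python sketch. It assumes the "1.33" on the slide stands for 4/3, which reproduces the slide's 2.0 and 1.625 exactly. The instruction count and clock rate are assumed unchanged, so speedup is simply the CPI ratio.

```python
FP_FRACTION = 0.25
FP_CPI = 4.0
OTHER_CPI = 4.0 / 3.0   # assumption: the slide's "1.33" is 4/3
FPSQRT_FRACTION = 0.02
FPSQRT_CPI = 20.0


def mix_cpi(fp_cpi, other_cpi=OTHER_CPI, fp_fraction=FP_FRACTION):
    """Weighted CPI of a two-class instruction mix (FP vs. everything else)."""
    return fp_fraction * fp_cpi + (1.0 - fp_fraction) * other_cpi


original_cpi = mix_cpi(FP_CPI)
# Option 1: only FPSQRT gets faster (20 -> 2 cycles)
option1_cpi = original_cpi - FPSQRT_FRACTION * (FPSQRT_CPI - 2.0)
# Option 2: all FP instructions average 2.5 cycles
option2_cpi = mix_cpi(2.5)

for name, cpi in [("Option 1", option1_cpi), ("Option 2", option2_cpi)]:
    # Same instruction count and clock rate, so speedup is the CPI ratio
    print(f"{name}: CPI = {cpi:.3f}, speedup = {original_cpi / cpi:.2f}")
```

With the clock rate held fixed, Option 2 comes out marginally ahead (1.23 vs. 1.22).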

Example of Performance Evaluation (III)

Clock freq = 1.4 GHz
FP instructions = 25%
Average CPI of FP instructions = 4.0
Average CPI of other instructions = 1.33
FPSQRT = 2%, CPI of FPSQRT = 20

  • Design Option 1: decrease the CPI of FPSQRT to 2, clock freq = 1.2 GHz
  • Design Option 2: decrease the average CPI of all FP instructions to 2.5, clock freq = 1.1 GHz

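$$
\begin{aligned}
\text{Original: } & \text{CPI} = 2.0,\ \text{IPC} = 1/2,\ \text{Inst/s} = \tfrac{1}{2} \times 1.4\,\text{G} = 0.70\,\text{G inst/s}\\
\text{Option 1: } & \text{CPI} = 1.64,\ \text{IPC} = 1/1.64,\ \text{Inst/s} = \tfrac{1}{1.64} \times 1.2\,\text{G} = 0.73\,\text{G inst/s}\\
\text{Option 2: } & \text{CPI} = 1.625,\ \text{IPC} = 1/1.625,\ \text{Inst/s} = \tfrac{1}{1.625} \times 1.1\,\text{G} = 0.68\,\text{G inst/s}
\end{aligned}
$$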
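Once clock frequency differs between designs, CPI alone no longer ranks them. Instruction throughput (clock rate / CPI) does, assuming every option runs the same instruction count. This minimal Python sketch uses the CPIs and clock rates from the slide.

```python
CLOCK_HZ = {"Original": 1.4e9, "Option 1": 1.2e9, "Option 2": 1.1e9}
CPI = {"Original": 2.0, "Option 1": 1.64, "Option 2": 1.625}


def instructions_per_second(cpi, clock_hz):
    """IPC * clock rate = clock rate / CPI."""
    return clock_hz / cpi


rates = {name: instructions_per_second(CPI[name], CLOCK_HZ[name]) for name in CPI}
for name, rate in rates.items():
    print(f"{name}: {rate / 1e9:.2f} G inst/s")

best = max(rates, key=rates.get)
print(f"highest throughput: {best}")
```

Option 2 has the lowest CPI, but its slower clock drops it below even the original design. Option 1 delivers the most instructions per second.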

Study Guide: Glossary

  • Amdahl’s Law
  • Benchmark
    • Toy, kernel, synthetic, application
  • CPI
  • Harmonic Mean
  • Locality
  • Speedup