











An overview of the university course CS/EE 6810, focusing on measuring performance, cost, and power in computer architecture. Topics include instruction-level parallelism, the memory hierarchy, multiprocessors, storage systems, networks, and processor technology trends. Students will learn about performance metrics, technology trends, and power consumption trends. The course includes lectures, homework assignments, and midterms.












Background: CS 3810 or equivalent, based on Hennessy and Patterson’s Computer Organization and Design
Text for CS/EE 6810: Hennessy and Patterson’s Computer Architecture: A Quantitative Approach, 4th Edition
Topics:
- Measuring performance/cost/power
- Instruction-level parallelism, dynamic and static
- Memory hierarchy
- Multiprocessors
- Storage systems and networks
Office hours: MEB 3414, by appointment
TA: Kenneth Williams; TA office hours: TBA
Special accommodations, add/drop policies: see class webpage
Class webpage and class mailing list at http://www.eng.utah.edu/~cs
Grades:
- Two midterms, 25% each
- Homework assignments, 50% (you may skip one)
- No tolerance for cheating
A 15x performance improvement can be attributed to architectural innovations
Modern trends:
- Clock speed improvements are slowing:
  - power constraints
  - pipelines are already doing less work per stage
- It is difficult to further optimize a single core for performance
- Multi-cores: each new processor generation will accommodate more cores
DRAM: density increases by 40-60% per year; latency has decreased by only 33% in 10 years (the memory wall!); bandwidth improves twice as fast as latency decreases
Disk: density improves by 100% every year; latency improvement is similar to DRAM
Networks: primary focus on bandwidth; 10 Mb/s to 100 Mb/s in 10 years; 100 Mb/s to 1 Gb/s in 5 years
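The growth figures above compound year over year. The small sketch below assumes idealized geometric growth and uses two hypothetical helpers, `total_growth` and `annual_rate`. It converts between an annual rate and a total improvement, using the numbers from the trends above.

```python
def total_growth(annual_rate, years):
    """Total improvement factor after compounding `annual_rate` (0.4 = 40%/year) for `years`."""
    return (1.0 + annual_rate) ** years

def annual_rate(total_factor, years):
    """Annual growth rate implied by a total improvement of `total_factor` over `years`."""
    return total_factor ** (1.0 / years) - 1.0

# DRAM density at 40% and 60% per year, over 10 years
print(f"DRAM density, 10 yrs @ 40%/yr: {total_growth(0.40, 10):.1f}x")
print(f"DRAM density, 10 yrs @ 60%/yr: {total_growth(0.60, 10):.1f}x")

# DRAM latency: a 33% reduction means new latency = 0.67 x old latency
print(f"DRAM latency improvement over 10 yrs: {1 / 0.67:.2f}x")

# Network bandwidth: 10x in 10 years vs. 10x in 5 years
print(f"10 Mb/s -> 100 Mb/s in 10 yrs: {annual_rate(10, 10):.1%} per year")
print(f"100 Mb/s -> 1 Gb/s in 5 yrs: {annual_rate(10, 5):.1%} per year")
```

With these inputs, DRAM density grows roughly 29x to 110x over a decade, while latency improves only about 1.5x.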
Dynamic power: $P_{\text{dyn}} \propto \text{activity} \times \text{capacitance} \times \text{voltage}^2 \times \text{frequency}$
Capacitance per transistor and voltage are decreasing, but the number of transistors and the frequency are increasing at a faster rate
Leakage power is also rising and will soon match dynamic power
Power consumption is already between 100 and 150 W in high-performance processors today
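The dynamic power relation is a proportionality, so only ratios between design points are meaningful. The sketch below uses a hypothetical helper, `relative_dynamic_power`. It returns dynamic power relative to a baseline, given how much each factor scales. The scenarios are made-up examples, and leakage power is not modeled.

```python
def relative_dynamic_power(activity=1.0, capacitance=1.0, voltage=1.0, frequency=1.0):
    """Dynamic power relative to a baseline; each argument is that factor's
    scaling relative to the baseline (1.0 = unchanged).
    Based on P_dyn proportional to activity * capacitance * voltage^2 * frequency.
    Leakage power is ignored."""
    return activity * capacitance * voltage ** 2 * frequency

# Hypothetical: lower voltage and frequency by 15% each -> about 61% of baseline power
print(relative_dynamic_power(voltage=0.85, frequency=0.85))

# Hypothetical generation step: capacitance per transistor x0.7, transistor count x2,
# voltage x0.9, frequency x1.2 -> total dynamic power still rises (about 1.36x)
print(relative_dynamic_power(capacitance=0.7 * 2.0, voltage=0.9, frequency=1.2))
```

The second case mirrors the trend described above. Per-transistor capacitance and voltage shrink, but transistor count and frequency grow faster, so total power still increases.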
Consider 25 programs from a benchmark set: how do we capture the behavior of all 25 programs with a single number?
[Table comparing Sys-A, Sys-B, and Sys-C]
- Total (average) execution time
- Total (average) weighted execution time
- Average of normalized execution times
- Geometric mean of normalized execution times
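The four summary options can be computed directly. The sketch below uses made-up execution times (in seconds) for three programs on a system under test and on a reference machine. It also uses hypothetical weights; none of these numbers come from the slides.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # Computed in log space so long products do not overflow or underflow
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

# Hypothetical data: execution times in seconds for three programs
times = [2.0, 10.0, 4.0]      # system under test
ref_times = [1.0, 20.0, 4.0]  # reference machine
weights = [0.5, 0.25, 0.25]   # hypothetical workload weights, summing to 1

normalized = [t / r for t, r in zip(times, ref_times)]

total_time = sum(times)                                     # total execution time
avg_time = arithmetic_mean(times)                           # average execution time
weighted_time = sum(w * t for w, t in zip(weights, times))  # weighted (average) execution time
am_normalized = arithmetic_mean(normalized)                 # average of normalized times
gm_normalized = geometric_mean(normalized)                  # GM of normalized times

print(total_time, avg_time, weighted_time, am_normalized, gm_normalized)
```

On this data, the average of normalized times is about 1.17, so the system looks slower than the reference. The GM is 1.0, so the two look equal. The choice of summary changes the conclusion.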
We fixed a reference machine X and ran 4 programs A, B, C, D on it such that each program ran for 1 second
The exact same workload (the four programs execute the same number of instructions that they did on machine X) is run on a new machine Y, and the execution times for the programs are 0.8, 1.1, 0.5, and 2 seconds
- With the AM of normalized execution times, we can conclude that Y is 1.1 times slower than X: perhaps not for all workloads, but definitely for one specific workload (where all programs run on the reference machine for an equal number of cycles)
- With the GM, you may find inconsistencies
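The arithmetic behind this example can be checked directly. Every program takes 1 second on X, so Y's normalized times are simply its execution times.

```python
import math

x_times = [1.0, 1.0, 1.0, 1.0]  # reference machine X: programs A, B, C, D
y_times = [0.8, 1.1, 0.5, 2.0]  # new machine Y, same instruction counts

normalized = [y / x for y, x in zip(y_times, x_times)]

am = sum(normalized) / len(normalized)
gm = math.prod(normalized) ** (1 / len(normalized))

# AM: running all four programs back to back takes 4.4 s on Y vs. 4.0 s on X
print(f"AM of normalized times: {am:.3f}")
# GM: computed from the same data, but it corresponds to no real workload
print(f"GM of normalized times: {gm:.3f}")
```

The AM of 1.1 matches the total-time ratio for the equal-time workload (4.4 s vs. 4 s). The GM of about 0.969 would instead suggest that Y is roughly 3% faster than X. This is one way the inconsistency shows up.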
GM: does not require a reference machine, but does not predict performance very well
So we multiplied execution times and determined that Sys-A is 1.2x faster... but on what workload?
AM: does predict performance for a specific workload, but that workload was determined by executing programs on a reference machine
Every year or so, the reference machine will have to be updated
Advantage of GM: no reference machine required
Disadvantage of GM: does not represent any “real entity” and may not accurately predict performance
Disadvantage of AM of normalized execution times: needs weights (which may change over time)
Advantage of AM of normalized execution times: can represent a real workload
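The "no reference machine required" property of the GM can be demonstrated. The ratio of two machines' GMs of normalized times is the same whichever reference is used, but the ratio of their AMs of normalized times is not. The sketch below uses hypothetical execution times for two machines (P and Q) and two candidate reference machines (R1 and R2).

```python
import math

def gm(xs):
    return math.prod(xs) ** (1 / len(xs))

def am(xs):
    return sum(xs) / len(xs)

def normalize(times, ref):
    # Execution time of each program divided by the reference machine's time
    return [t / r for t, r in zip(times, ref)]

# Hypothetical execution times (seconds) for two programs
p = [1.0, 8.0]
q = [2.0, 4.0]
r1 = [1.0, 1.0]
r2 = [1.0, 10.0]

for name, ref in (("R1", r1), ("R2", r2)):
    gm_ratio = gm(normalize(p, ref)) / gm(normalize(q, ref))
    am_ratio = am(normalize(p, ref)) / am(normalize(q, ref))
    print(f"ref {name}: GM ratio P/Q = {gm_ratio:.3f}, AM ratio P/Q = {am_ratio:.3f}")
```

With R1, the AM of normalized times says P is 1.5x slower than Q. With R2, it says P is faster (ratio 0.75). The GM ratio is 1.0 in both cases. The trade-off above still applies: this consistency does not make the GM correspond to any real workload.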
Assume that an architectural innovation only affects CPI
For 3 programs, base CPIs: 1.2, 1.8, 2.5; CPIs for proposed model: 1.4, 1.9, 2.
What is the best way to summarize performance with a single number? AM, HM, or GM of CPIs?
AM of CPI for base case = 1.2 cyc
HM of CPI = 1 / (AM of IPC); defines a workload where every program runs for an equal number of cycles
GM of CPI: warm fuzzy number, not necessarily representing any workload
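The sketch below takes the base CPIs from the example (1.2, 1.8, 2.5), computes the three means, and checks the workload interpretations. The AM of CPI equals the aggregate CPI when every program executes the same number of instructions. The HM of CPI equals the aggregate CPI when every program runs for the same number of cycles. The instruction and cycle budgets (1,000,000 each) are arbitrary choices for illustration.

```python
import math

cpis = [1.2, 1.8, 2.5]  # base CPIs of the three programs

am = sum(cpis) / len(cpis)
hm = len(cpis) / sum(1 / c for c in cpis)  # equals 1 / (AM of IPC)
gm = math.prod(cpis) ** (1 / len(cpis))

# Workload 1: each program executes the same (arbitrary) number of instructions
instrs = 1_000_000
total_cycles = sum(c * instrs for c in cpis)
cpi_equal_instrs = total_cycles / (instrs * len(cpis))

# Workload 2: each program runs for the same (arbitrary) number of cycles
cycles = 1_000_000
total_instrs = sum(cycles / c for c in cpis)
cpi_equal_cycles = (cycles * len(cpis)) / total_instrs

print(f"AM = {am:.4f}, aggregate CPI with equal instructions = {cpi_equal_instrs:.4f}")
print(f"HM = {hm:.4f}, aggregate CPI with equal cycles = {cpi_equal_cycles:.4f}")
print(f"GM = {gm:.4f}")
```

The GM falls between the HM and the AM but matches neither workload, which is why it is described as a "warm fuzzy number".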