



















Performance Measures, Timing, Timing Mechanisms, Measurement Pitfalls, Modeling Discretization Error, Relative Error, Profiling, Profiling Errors, DCPI Architecture
Typology: Slides




















Performance measures (metrics)
Timing
Profiling
class22.ppt
difference between start and finish of an operation
synonyms: running time, elapsed time, response time, latency, completion time, execution time
most straightforward performance measure
running time normalized to some reference time
(e.g. time/reference time)
number of floating point operations depends on the particular convolution algorithm: n^2 matrix-vector product vs nlogn fast Fourier transform. An FFT with a bad MFLOPS rate may run faster than a matrix-vector product with a good MFLOPS rate.
a program that runs faster will convolve more images per second.
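The point above can be checked with a little arithmetic. In the sketch below, the operation counts (n^2 and n log2 n, constant factors ignored) follow the slide, while the problem size and both MFLOPS rates are made-up illustrative numbers, not measurements. The helper names `run_time_secs`, `matvec_flops`, `fft_flops`, and `log2_pow2` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Running time in seconds for `flops` floating point operations
   executed at a sustained rate of `mflops` million flops/sec. */
double run_time_secs(double flops, double mflops)
{
    assert(mflops > 0.0);
    return flops / (mflops * 1e6);
}

/* log2 of n, for n a power of two (integer loop, no libm needed). */
unsigned log2_pow2(unsigned long n)
{
    unsigned k = 0;
    while (n > 1) {
        n >>= 1;
        k++;
    }
    return k;
}

/* Approximate operation counts, constant factors ignored. */
double matvec_flops(unsigned long n) { return (double)n * (double)n; }
double fft_flops(unsigned long n)    { return (double)n * log2_pow2(n); }

void print_comparison(void)
{
    unsigned long n = 1UL << 20;   /* 2^20 points (illustrative) */
    /* Assumed rates: the matrix-vector product gets the "good" one. */
    double t_mv  = run_time_secs(matvec_flops(n), 200.0);
    double t_fft = run_time_secs(fft_flops(n), 20.0);
    printf("matvec: %g s at 200 MFLOPS, fft: %g s at 20 MFLOPS\n",
           t_mv, t_fft);
}
```

With these assumed numbers the FFT does far less work, so it finishes first even at a tenth of the MFLOPS rate.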
Unix getclock() command
returns elapsed time since epoch (e.g., Jan 1, 1970)
coarse grained (e.g., us resolution on Alpha)

    long int secs, ns;
    struct timespec start, stop;

    getclock(TIMEOFDAY, &start);
    P();                          /* code being timed */
    getclock(TIMEOFDAY, &stop);
    secs = (stop.tv_sec - start.tv_sec);
    ns = (stop.tv_nsec - start.tv_nsec);
    printf("%ld ns\n", secs * 1000000000L + ns);
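getclock() is specific to Digital Unix. As a rough portable counterpart, the sketch below uses the POSIX clock_gettime() call, which fills in the same struct timespec. It assumes a POSIX system that provides CLOCK_MONOTONIC. The helper names `elapsed_ns` and `time_call_ns` are hypothetical, and the `P()` defined here is only a stand-in for the code being timed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Nanoseconds from *start to *stop. The nanosecond difference may be
   negative; adding it to the scaled seconds gives the right total. */
long long elapsed_ns(const struct timespec *start,
                     const struct timespec *stop)
{
    long long secs = (long long)stop->tv_sec - (long long)start->tv_sec;
    long long ns   = (long long)stop->tv_nsec - (long long)start->tv_nsec;
    return secs * 1000000000LL + ns;
}

/* Stand-in for the code being timed (P() in the slides). */
void P(void)
{
    volatile unsigned long sink = 0;
    unsigned long i;
    for (i = 0; i < 100000UL; i++)
        sink += i;
}

/* Time one call to fn() with the monotonic clock. */
long long time_call_ns(void (*fn)(void))
{
    struct timespec start, stop;
    int rc;

    rc = clock_gettime(CLOCK_MONOTONIC, &start);
    assert(rc == 0);
    fn();
    rc = clock_gettime(CLOCK_MONOTONIC, &stop);
    assert(rc == 0);
    (void)rc;   /* keeps rc "used" when assertions are compiled out */
    return elapsed_ns(&start, &stop);
}

void print_timing(void)
{
    printf("%lld ns\n", time_call_ns(P));
}
```

Keeping the arithmetic in integers avoids the format mismatch that arises when a floating point product is printed with an integer conversion.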
counts system events (CYCLES, IMISS, DMISS, BRANCHMP)
very fine grained
short time span (e.g., 9 seconds on 450 MHz Alpha)

    unsigned int counterRoutine[] = { /* Alpha cycle counter */
        0x601fc000u,
        0x401f0000u,
        0x6bfa8001u
    };
    unsigned int (*counter)(void) = (void *)counterRoutine;

Using the Alpha cycle counter

    unsigned int cycles;

    cycles = counter();
    P();
    cycles = counter() - cycles;
    printf("%u cycles\n", cycles);
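The short time span quoted above is consistent with a 32-bit counter: 2^32 cycles at 450 MHz is about 9.5 seconds. The sketch below is not Alpha-specific. It assumes a free-running 32-bit counter and shows why unsigned subtraction still gives the right cycle count across a single wraparound. The names `cycles_elapsed` and `wrap_seconds` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Cycles between two readings of a free-running 32-bit counter.
   Unsigned subtraction is modulo 2^32, so the result is still
   correct if the counter wrapped once between the readings. */
uint32_t cycles_elapsed(uint32_t start, uint32_t stop)
{
    return (uint32_t)(stop - start);
}

/* Seconds until a 32-bit counter wraps at a clock rate of hz. */
double wrap_seconds(double hz)
{
    assert(hz > 0.0);
    return 4294967296.0 / hz;   /* 2^32 cycles */
}

void print_wrap_time(void)
{
    printf("wraps after %.2f s at 450 MHz\n", wrap_seconds(450e6));
}
```

If the timed code runs longer than one wrap period, the counter laps itself and the difference is no longer meaningful, which is why the counter suits only short intervals.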
need to measure large enough chunks of work
but how large is large enough?
artificial hits or misses
cold start misses due to context swapping
CS 213 F’
timer period: $dt$ secs/tick
timer resolution: $1/dt$ ticks/sec
[Figure: timeline of clock interrupts (ticks) $T_1, T_2, \ldots, T_n$, spaced $dt$ apart, with the program execution time running from start to finish]
Assume here that $T_k = T_{k-1} + dt$
[Figure: timeline of ticks $T_1, T_2, \ldots, T_n$ with period $dt$; $T_{start}$ and $T_{finish}$ bound the actual program execution time]
measured time: $(T_n - T_1)$
actual time: $(T_n - T_1) + (T_{finish} - T_n) - (T_{start} - T_1)$
absolute error = measured time - actual time
$f_{start} = (T_{start} - T_1)/dt$ (fraction of interval overreported)
$f_{finish} = (T_{finish} - T_n)/dt$ (fraction of interval underreported)
absolute error = $dt \cdot f_{start} - dt \cdot f_{finish} = dt\,(f_{start} - f_{finish})$
max absolute error = $\pm dt$
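The error model above can be tried out numerically. The sketch below assumes an idealized timer whose ticks fall exactly at 0, dt, 2dt, ... and which always reports the time of the most recent tick. The names `timer_reading` and `abs_error` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Value reported at time t by a tick-driven timer with period dt:
   the time of the most recent tick (ticks at 0, dt, 2*dt, ...). */
double timer_reading(double t, double dt)
{
    assert(t >= 0.0 && dt > 0.0);
    /* Truncation equals floor here because t/dt is non-negative. */
    return (double)(long long)(t / dt) * dt;
}

/* Absolute error = measured time - actual time. */
double abs_error(double t_start, double t_finish, double dt)
{
    double measured = timer_reading(t_finish, dt)
                    - timer_reading(t_start, dt);
    double actual   = t_finish - t_start;
    return measured - actual;
}

void print_errors(void)
{
    /* Start late in an interval, finish early: overreported. */
    printf("error = %g\n", abs_error(0.75, 2.25, 1.0));
    /* Start early in an interval, finish late: underreported. */
    printf("error = %g\n", abs_error(0.25, 1.75, 1.0));
}
```

In the first call f_start = 0.75 and f_finish = 0.25, so the error is dt (f_start - f_finish) = 0.5. In the second the fractions are swapped and the error is -0.5.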
[Figure: timeline showing an actual running time that spans nearly two timer periods]
actual time = near $2dt$
measured time = $dt$
absolute measurement error = $-dt$
    double start, end, dt;

    start = get_etime();
    while (start == (end = get_etime()))
        ;                         /* spin until the timer value changes */
    dt = end - start;
    printf("dt = %lf\n", dt);

Digital Unix Alpha systems: dt = 1ms
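The same idea can be written against any timer source. The sketch below is a variation rather than the slide's exact method: it waits for two successive changes in the timer value so that the measured interval starts on a tick boundary. The names `measure_period` and `fake_timer` are hypothetical. `fake_timer` only imitates a coarse timer for demonstration, and a real timer function would be passed in its place.

```c
#include <assert.h>
#include <stdio.h>

/* Spin until the timer value changes twice and return the difference
   between the two new values, i.e. one full timer period. */
double measure_period(double (*read_timer)(void))
{
    double start, first, second;

    assert(read_timer != NULL);
    start = read_timer();
    while ((first = read_timer()) == start)
        ;                         /* wait for the next tick */
    while ((second = read_timer()) == first)
        ;                         /* wait for the tick after that */
    return second - first;
}

/* Stand-in timer for demonstration only: advances by 1 ms on every
   fifth call, imitating a timer polled faster than it ticks. */
double fake_timer(void)
{
    static long calls = 0;
    return (double)(calls++ / 5) * 0.001;
}

void print_period(void)
{
    printf("dt = %f\n", measure_period(fake_timer));
}
```

The approach only works when the timer can be polled much faster than it ticks. Otherwise the loop may skip ticks and report a multiple of the period.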
Let $t$ and $t'$ be the actual and measured running times of the loop, respectively, and let $dt$ be the timer period.
Also, let $t'-t$ be the absolute error and let $|t'-t|/t$ be the relative error.
Problem: What value of $t'$ will result in a relative error less than or equal to $E_{max}$?
Fact (1): $|t'-t| \le dt$
Fact (2): $t' - dt \le t$
We want $|t'-t|/t \le E_{max}$. It suffices that:
$dt/t \le E_{max}$ (Fact 1)
$dt/E_{max} \le t$ (algebra)
$dt/E_{max} \le t' - dt$ (Fact 2)
$dt/E_{max} + dt \le t'$ (algebra)
Example: for a timed for loop with $dt = .001$ and $E_{max} = .05$:
$.001/.05 + .001 \le t'$
$t' \ge 0.021$ seconds (21 ms)
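The bound $t' \ge dt/E_{max} + dt$ translates directly into a rule for how long a timed loop has to run. In the sketch below, the names `min_measured_time` and `min_iterations` are hypothetical, and `secs_per_iter` is an assumed rough estimate of the time per loop iteration that the caller must supply.

```c
#include <assert.h>
#include <stdio.h>

/* Smallest measured time t' that guarantees a relative error of at
   most e_max for a timer with period dt:  t' >= dt/e_max + dt. */
double min_measured_time(double dt, double e_max)
{
    assert(dt > 0.0 && e_max > 0.0);
    return dt / e_max + dt;
}

/* Loop iterations needed for the measured time to reach that bound,
   given a rough estimate of the time per iteration (rounded up). */
long min_iterations(double dt, double e_max, double secs_per_iter)
{
    double need;
    long n;

    assert(secs_per_iter > 0.0);
    need = min_measured_time(dt, e_max) / secs_per_iter;
    n = (long)need;
    if ((double)n < need)
        n++;                      /* ceiling without libm */
    return n;
}

void print_bound(void)
{
    /* Illustrative values: 10 ms timer period, 10% relative error. */
    printf("t' >= %g s\n", min_measured_time(0.01, 0.1));
}
```

Because the bound is stated in terms of the measured time, it can be checked after the fact: if the reported time falls below it, the iteration count is raised and the measurement repeated.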
discretization error
but can’t always measure short procedures in loops
cache effects due to ordering and context switches
src translation
binary translation
direct simulation
statistical sampling