


























Covers the optimization of vector operations, focusing on the limitations of compilers and techniques to improve performance. Topics include code motion, the vector abstract data type (ADT), optimization examples, time scales, cycles per element, and procedure calls. The document also discusses the importance of understanding compiler capabilities and limitations.
Typology: Lab Reports



























Machine-Independent Optimizations
  - Code motion
  - Reduction in strength
  - Common subexpression sharing
Tuning
  - Identifying performance bottlenecks
class10.ppt
Easily see 10:1 performance range depending on how code is written
Must optimize at multiple levels:
  - algorithm, data representations, procedures, and loops
How programs are compiled and executed
How to measure program performance and identify bottlenecks
How to improve performance without destroying code modularity and generality
The compiler must not cause any change in program behavior under any possible condition
  - Often prevents it from making optimizations when they would only affect behavior under pathological conditions
Behavior that may be obvious to the programmer can be obfuscated by languages and coding styles
  - e.g., data ranges may be more limited than variable types suggest
Most analysis is performed only within procedures
  - Whole-program analysis is too expensive in most cases
Most analysis is based only on static information
  - Compiler has difficulty anticipating run-time inputs
When in doubt, the compiler must be conservative
Optimizations you should do regardless of processor / compiler
Reduce frequency with which computation is performed
  - If it will always produce the same result
  - Especially moving code out of a loop

    for (i = 0; i < n; i++) {
        int ni = n*i;
        for (j = 0; j < n; j++)
            a[ni + j] = b[j];
    }
Replace costly operation with simpler one
Shift, add instead of multiply or divide
    16*x --> x << 4
  - Utility is machine dependent
  - Depends on cost of multiply or divide instruction
  - On Pentium II or III, integer multiply only requires 4 CPU cycles
Recognize sequence of products

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[n*i + j] = b[j];

    int ni = 0;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++)
            a[ni + j] = b[j];
        ni += n;
    }
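The loop used on these slides can be written out in its three forms and checked for equivalence. This is a minimal sketch, assuming `a` holds n*n ints and `b` holds n ints; the function names `fill_naive`, `fill_code_motion`, and `fill_strength_reduced` are hypothetical and not from the slides.

```c
/* Baseline: n*i is recomputed on every inner iteration. */
void fill_naive(int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            a[n * i + j] = b[j];
}

/* Code motion: n*i is computed once per outer iteration. */
void fill_code_motion(int *a, const int *b, int n)
{
    for (int i = 0; i < n; i++) {
        int ni = n * i;
        for (int j = 0; j < n; j++)
            a[ni + j] = b[j];
    }
}

/* Reduction in strength: the multiply becomes a running addition. */
void fill_strength_reduced(int *a, const int *b, int n)
{
    int ni = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            a[ni + j] = b[j];
        ni += n;   /* ni equals n*i at the start of each outer iteration */
    }
}
```

All three versions store the same values; only the amount of arithmetic per iteration differs. Whether the rewrite pays off depends on the cost of multiplication on the target machine.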
Reading and writing registers much faster than reading/writing memory
Compiler not always able to determine whether variable can be held in register
Possibility of Aliasing
See example later
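One way to see why aliasing blocks register allocation is to compare a loop that accumulates through a pointer with one that accumulates in a local variable. The sketch below is illustrative only; `sum_to_dest` and `sum_local` are hypothetical names, not code from the slides.

```c
/* Accumulates directly into *dest. Because dest might point into a[],
 * the compiler generally has to read and write memory on every iteration. */
void sum_to_dest(int *a, int n, int *dest)
{
    *dest = 0;
    for (int i = 0; i < n; i++)
        *dest += a[i];
}

/* Accumulates in a local that can be held in a register,
 * and stores the result to memory once at the end. */
void sum_local(int *a, int n, int *dest)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    *dest = sum;
}
```

With `a = {1, 2, 3}` and `dest = &a[0]`, the first version leaves 5 in `a[0]` (the initial store of 0 overwrites the element before it is summed), while the second leaves 6. Since the two functions differ when the arguments alias, a conservative compiler cannot turn the first into the second on its own.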
[Figure: vector representation, with fields labeled length and data]
vec_ptr new_vec(int len)
  - Create vector of specified length
int get_vec_element(vec_ptr v, int index, int *dest)
  - Retrieve vector element, store at *dest
  - Return 0 if out of bounds, 1 if successful
int *get_vec_start(vec_ptr v)
  - Return pointer to start of vector data
Similar to array implementations in Pascal, ML, Java
  - E.g., always do bounds checking
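The procedures listed above could be implemented roughly as follows. This is a sketch under assumptions: the struct layout (a length plus a pointer to heap-allocated data) is inferred from the figure labels, the type name `vec_rec` is hypothetical, and the zero-initialization and NULL-on-failure behavior are choices made here, not details given on the slides. `vec_length` is included because the later examples call it.

```c
#include <stdlib.h>

/* Assumed layout: element count plus pointer to heap-allocated int data. */
typedef struct {
    int length;
    int *data;
} vec_rec, *vec_ptr;

/* Create vector of specified length; returns NULL on failure. */
vec_ptr new_vec(int len)
{
    if (len < 0)
        return NULL;
    vec_ptr v = malloc(sizeof(vec_rec));
    if (v == NULL)
        return NULL;
    v->length = len;
    v->data = NULL;
    if (len > 0) {
        v->data = calloc((size_t)len, sizeof(int));  /* zero-filled */
        if (v->data == NULL) {
            free(v);
            return NULL;
        }
    }
    return v;
}

/* Return the number of elements in the vector. */
int vec_length(vec_ptr v)
{
    return v->length;
}

/* Retrieve vector element, store at *dest.
 * Return 0 if out of bounds, 1 if successful. */
int get_vec_element(vec_ptr v, int index, int *dest)
{
    if (index < 0 || index >= v->length)
        return 0;
    *dest = v->data[index];
    return 1;
}

/* Return pointer to start of vector data. */
int *get_vec_start(vec_ptr v)
{
    return v->data;
}
```

The bounds check in `get_vec_element` runs on every access, which is the per-element cost that the later optimizations work to reduce.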
    void combine1(vec_ptr v, int *dest)
    {
        int i;
        *dest = 0;
        for (i = 0; i < vec_length(v); i++) {
            int val;
            get_vec_element(v, i, &val);
            *dest += val;
        }
    }
Compute sum of all elements of vector
Store result at destination location
Convenient way to express performance of program that operates on vectors or lists
Length = n
T = CPE*n + Overhead
[Plot: Cycles (0 to 1000) versus Elements (0 to 200); vsum1 Slope = 4., vsum2 Slope = 3.]
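Given cycle counts measured at several vector lengths, CPE is the slope of the best-fit line and Overhead is its intercept. The following is a minimal sketch of an ordinary least-squares fit; the function name `fit_cpe` and its interface are hypothetical, and how the cycle counts are measured is left out.

```c
#include <stddef.h>

/* Least-squares fit of cycles = cpe * n + overhead.
 * Returns 0 on success, -1 if the fit is undefined
 * (fewer than two samples, or all n values equal). */
int fit_cpe(const double *n, const double *cycles, size_t count,
            double *cpe, double *overhead)
{
    if (count < 2)
        return -1;

    double sn = 0.0, sc = 0.0, snn = 0.0, snc = 0.0;
    for (size_t i = 0; i < count; i++) {
        sn  += n[i];
        sc  += cycles[i];
        snn += n[i] * n[i];
        snc += n[i] * cycles[i];
    }

    double denom = (double)count * snn - sn * sn;
    if (denom == 0.0)
        return -1;

    *cpe = ((double)count * snc - sn * sc) / denom;   /* slope */
    *overhead = (sc - *cpe * sn) / (double)count;     /* intercept */
    return 0;
}
```

For synthetic points that lie exactly on cycles = 4n + 80 (for example n = 50, 100, 150, 200), the fit returns a CPE of 4 and an overhead of 80. Real measurements are noisy, so a fit over many lengths is usually more reliable than a slope taken from two points.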
    void combine1(vec_ptr v, int *dest)
    {
        int i;
        *dest = 0;
        for (i = 0; i < vec_length(v); i++) {
            int val;
            get_vec_element(v, i, &val);
            *dest += val;
        }
    }
Compute sum of all elements of integer vector
Store result at destination location
Vector data structure and operations defined via abstract data type
CPE: 42.06 (Compiled -g), 31.25 (Compiled -O2)
Move call to vec_length out of inner loop
  - Value does not change from one iteration to next
  - Code motion
CPE: 20.66 (Compiled -O2)
  - vec_length requires only constant time, but significant overhead

    void combine2(vec_ptr v, int *dest)
    {
        int i;
        int length = vec_length(v);
        *dest = 0;
        for (i = 0; i < length; i++) {
            int val;
            get_vec_element(v, i, &val);
            *dest += val;
        }
    }

    void lower(char *s)
    {
        int i;
        for (i = 0; i < strlen(s); i++)
            if (s[i] >= 'A' && s[i] <= 'Z')
                s[i] -= ('A' - 'a');
    }
Extracted from 213 lab submissions, Fall, 1998
    void lower(char *s)
    {
        int i = 0;
        if (i >= strlen(s))
            goto done;
    loop:
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
        i++;
        if (i < strlen(s))
            goto loop;
    done:
        ;   /* a label must be followed by a statement */
    }

strlen executed every iteration
strlen linear in length of string
  - Must scan string until finds '\0'
Overall performance is quadratic

    void lower(char *s)
    {
        int i;
        int len = strlen(s);
        for (i = 0; i < len; i++)
            if (s[i] >= 'A' && s[i] <= 'Z')
                s[i] -= ('A' - 'a');
    }
Move call to strlen outside of loop
Since result does not change from one iteration to another
Form of code motion
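The difference between the two versions can be made visible without a timer by counting how many characters the length computation examines. This sketch swaps `strlen` for a hypothetical instrumented stand-in, `counted_strlen`; the names `chars_scanned`, `lower1`, and `lower2` are likewise assumptions made for illustration.

```c
#include <stddef.h>

/* Running total of characters examined by counted_strlen. */
unsigned long chars_scanned = 0;

/* Hypothetical stand-in for strlen that records how much it scans. */
size_t counted_strlen(const char *s)
{
    size_t len = 0;
    while (s[len] != '\0')
        len++;
    chars_scanned += len + 1;   /* includes the terminating '\0' */
    return len;
}

/* Length recomputed in every loop test: quadratic in string length. */
void lower1(char *s)
{
    for (size_t i = 0; i < counted_strlen(s); i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
}

/* Length computed once before the loop: linear in string length. */
void lower2(char *s)
{
    size_t len = counted_strlen(s);
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
}
```

For a string of n characters, `lower1` evaluates the loop test n + 1 times and scans n + 1 characters each time, for (n + 1)^2 in total, while `lower2` scans n + 1 characters once. With the 12-character string "Hello, World" that is 169 versus 13.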