Optimizing Vector Operations: Understanding Compiler Limitations and Techniques, Lab Reports of Computer Science

Drexel University Computer Science

This document covers the optimization of vector operations, focusing on the limitations of compilers and on techniques to improve performance. Topics include code motion, the vector abstract data type (ADT), optimization examples, time scales, cycles per element, and procedure calls. It also discusses the importance of understanding compiler capabilities and limitations.

Typology: Lab Reports

Pre 2010

Uploaded on 08/19/2009

koofers-user-95a 🇺🇸


15-213

“The course that gives CMU its Zip!”

Code Optimization I:

Machine Independent Optimizations

Sept. 26, 2002

Topics

Machine-Independent Optimizations
- Code motion
- Reduction in strength
- Common subexpression sharing

Tuning
- Identifying performance bottlenecks

class10.ppt


Great Reality #4

There’s more to performance than asymptotic complexity

Constant factors matter too!
- Easily see 10:1 performance range depending on how code is written
- Must optimize at multiple levels: algorithm, data representations, procedures, and loops

Must understand system to optimize performance
- How programs are compiled and executed
- How to measure program performance and identify bottlenecks
- How to improve performance without destroying code modularity and generality

Limitations of Optimizing Compilers

Operate Under Fundamental Constraint
- Must not cause any change in program behavior under any possible condition
- Often prevents the compiler from making optimizations that would only affect behavior under pathological conditions

Behavior that may be obvious to the programmer can be obfuscated by languages and coding styles
- e.g., data ranges may be more limited than variable types suggest

Most analysis is performed only within procedures
- Whole-program analysis is too expensive in most cases

Most analysis is based only on static information
- Compiler has difficulty anticipating run-time inputs

When in doubt, the compiler must be conservative

Machine-Independent Optimizations

Optimizations you should do regardless of processor / compiler

Code Motion

Reduce frequency with which computation is performed
- If it will always produce same result
- Especially moving code out of loop

Before:

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[n*i + j] = b[j];

After:

    for (i = 0; i < n; i++) {
        int ni = n*i;
        for (j = 0; j < n; j++)
            a[ni + j] = b[j];
    }
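As a minimal sketch, the two loop nests above can be wrapped in functions and checked against each other. The function names `copy_rows_naive` and `copy_rows_hoisted` are hypothetical, and `a` is assumed to hold at least n*n elements while `b` holds at least n.

```c
#include <assert.h>

/* Original form: n*i is recomputed on every inner-loop iteration. */
void copy_rows_naive(int *a, const int *b, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            a[n*i + j] = b[j];
}

/* After code motion: n*i is computed once per outer iteration. */
void copy_rows_hoisted(int *a, const int *b, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int ni = n*i;               /* invariant inside the inner loop */
        for (int j = 0; j < n; j++)
            a[ni + j] = b[j];
    }
}
```

Both functions should fill `a` identically; only the number of multiplications differs.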

Reduction in Strength

Replace costly operation with simpler one

Shift, add instead of multiply or divide

    16*x --> x << 4

- Utility machine dependent
- Depends on cost of multiply or divide instruction
- On Pentium II or III, integer multiply only requires 4 CPU cycles

Recognize sequence of products

Before:

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[n*i + j] = b[j];

After:

    int ni = 0;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++)
            a[ni + j] = b[j];
        ni += n;
    }
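Both rewrites can be sketched as small functions. The names `times16` and `copy_rows_reduced` are hypothetical, and the shift version uses an unsigned operand because left-shifting a negative signed value is undefined in C.

```c
#include <assert.h>

/* 16*x rewritten as a shift by 4 (16 == 2^4). */
unsigned times16(unsigned x)
{
    return x << 4;
}

/* The product n*i is replaced by a running sum: ni grows by n per row. */
void copy_rows_reduced(int *a, const int *b, int n)
{
    assert(n >= 0);
    int ni = 0;                     /* equals n*i at the top of each row */
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            a[ni + j] = b[j];
        ni += n;                    /* add instead of multiply */
    }
}
```

Whether the shift is actually faster depends on the machine, as the slide notes; a modern compiler will often make this substitution itself.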

Make Use of Registers

Reading and writing registers much faster than reading/writing memory

Limitation

Compiler not always able to determine whether variable can be held in register

Possibility of Aliasing

See example later
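One common illustration of the aliasing problem, sketched here with hypothetical function names (it is not necessarily the example the lecture shows later): the two functions below look interchangeable, but they differ when both pointers refer to the same location, so a compiler cannot silently turn the first into the second.

```c
#include <assert.h>

/* Reads *yp from memory each time it is needed. */
void add_twice_mem(int *xp, int *yp)
{
    assert(xp && yp);
    *xp += *yp;
    *xp += *yp;     /* if xp == yp, *yp has already changed here */
}

/* Reads *yp once and keeps it in a local, a natural register candidate. */
void add_twice_reg(int *xp, int *yp)
{
    assert(xp && yp);
    int y = *yp;
    *xp += 2 * y;
}
```

With distinct pointers both add twice the value of `*yp` to `*xp`. With `xp == yp` and an initial value of 3, the first version yields 12 and the second yields 9.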

Vector ADT

[Figure: vector record with a length field and a data pointer to an array of elements indexed 0, 1, 2, ..., length-1]

Procedures

vec_ptr new_vec(int len)
- Create vector of specified length

int get_vec_element(vec_ptr v, int index, int *dest)
- Retrieve vector element, store at *dest
- Return 0 if out of bounds, 1 if successful

int *get_vec_start(vec_ptr v)
- Return pointer to start of vector data

Similar to array implementations in Pascal, ML, Java
- E.g., always do bounds checking
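A minimal sketch of how these procedures might be implemented, assuming the record holds just a length and a data pointer as in the figure. The struct layout, the zero-initialized storage, and the `free_vec` helper are assumptions made for illustration; `vec_length` is not listed above but is used by the later examples.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed layout: a length plus a pointer to the element array. */
typedef struct {
    int len;
    int *data;
} vec_rec, *vec_ptr;

/* Create vector of specified length; returns NULL on failure. */
vec_ptr new_vec(int len)
{
    if (len < 0)
        return NULL;
    vec_ptr v = malloc(sizeof(vec_rec));
    if (v == NULL)
        return NULL;
    v->len = len;
    v->data = NULL;
    if (len > 0) {
        v->data = calloc((size_t)len, sizeof(int));
        if (v->data == NULL) {
            free(v);
            return NULL;
        }
    }
    return v;
}

/* Bounds-checked read: 0 if out of bounds, 1 if successful. */
int get_vec_element(vec_ptr v, int index, int *dest)
{
    assert(v != NULL && dest != NULL);
    if (index < 0 || index >= v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}

/* Return pointer to start of vector data. */
int *get_vec_start(vec_ptr v)
{
    assert(v != NULL);
    return v->data;
}

/* Return number of elements. */
int vec_length(vec_ptr v)
{
    assert(v != NULL);
    return v->len;
}

/* Hypothetical helper, not part of the interface listed above. */
void free_vec(vec_ptr v)
{
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}
```

The bounds check in `get_vec_element` is what makes each element access comparatively expensive in the examples that follow.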

Optimization Example

    void combine1(vec_ptr v, int *dest)
    {
        int i;
        *dest = 0;
        for (i = 0; i < vec_length(v); i++) {
            int val;
            get_vec_element(v, i, &val);
            *dest += val;
        }
    }

Procedure
- Compute sum of all elements of vector
- Store result at destination location

Cycles Per Element

Convenient way to express performance of program that operates on vectors or lists

Length = n

T = CPE * n + Overhead

[Figure: plot of Cycles (0 to 1000) against Elements (0 to 200) for two functions, vsum1 (Slope = 4.) and vsum2 (Slope = 3.)]
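The relation T = CPE * n + Overhead can be turned around to estimate CPE from measurements. The sketch below fits a line through two (elements, cycles) points; the function names are hypothetical, and any numbers fed to it are the caller's own measurements, not values from the lecture.

```c
#include <assert.h>

/* CPE is the slope of the line through two (n, T) measurements. */
double cpe_from_two_points(double n1, double t1, double n2, double t2)
{
    assert(n1 != n2);
    return (t2 - t1) / (n2 - n1);
}

/* Overhead is the intercept: what remains after removing CPE * n. */
double overhead_from_point(double n, double t, double cpe)
{
    return t - cpe * n;
}

/* Predicted cycle count for a vector of n elements. */
double predicted_cycles(double cpe, double overhead, double n)
{
    return cpe * n + overhead;
}
```

For instance, invented measurements of 280 cycles at 50 elements and 680 cycles at 150 elements give a CPE of 4 and an overhead of 80 cycles. In practice a least-squares fit over many lengths is more robust than two points.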

Optimization Example

    void combine1(vec_ptr v, int *dest)
    {
        int i;
        *dest = 0;
        for (i = 0; i < vec_length(v); i++) {
            int val;
            get_vec_element(v, i, &val);
            *dest += val;
        }
    }

Procedure
- Compute sum of all elements of integer vector
- Store result at destination location
- Vector data structure and operations defined via abstract data type

Pentium II/III Performance: Clock Cycles / Element
- 42.06 (Compiled -g)
- 31.25 (Compiled -O2)

Move vec_length Call Out of Loop

Optimization

    void combine2(vec_ptr v, int *dest)
    {
        int i;
        int length = vec_length(v);
        *dest = 0;
        for (i = 0; i < length; i++) {
            int val;
            get_vec_element(v, i, &val);
            *dest += val;
        }
    }

- Move call to vec_length out of inner loop
  - Value does not change from one iteration to next
  - Code motion
- CPE: 20.66 (Compiled -O2)
  - vec_length requires only constant time, but significant overhead
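To make the saving concrete, the sketch below counts how often `vec_length` is called by each version. The record layout and the `vec_length_calls` counter are assumptions added purely for instrumentation; they are not part of the lecture's ADT.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed minimal record layout for the vector ADT. */
typedef struct {
    int len;
    int *data;
} vec_rec, *vec_ptr;

long vec_length_calls = 0;      /* instrumentation only */

int vec_length(vec_ptr v)
{
    assert(v != NULL);
    vec_length_calls++;
    return v->len;
}

int get_vec_element(vec_ptr v, int index, int *dest)
{
    if (index < 0 || index >= v->len)
        return 0;
    *dest = v->data[index];
    return 1;
}

/* Calls vec_length once per loop test: n + 1 times in total. */
void combine1(vec_ptr v, int *dest)
{
    *dest = 0;
    for (int i = 0; i < vec_length(v); i++) {
        int val;
        get_vec_element(v, i, &val);
        *dest += val;
    }
}

/* Calls vec_length exactly once, before the loop. */
void combine2(vec_ptr v, int *dest)
{
    int length = vec_length(v);
    *dest = 0;
    for (int i = 0; i < length; i++) {
        int val;
        get_vec_element(v, i, &val);
        *dest += val;
    }
}
```

For a vector of n elements, `combine1` makes n + 1 calls and `combine2` makes one, while both produce the same sum.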

Code Motion Example #2

    void lower(char *s)
    {
        int i;
        for (i = 0; i < strlen(s); i++)
            if (s[i] >= 'A' && s[i] <= 'Z')
                s[i] -= ('A' - 'a');
    }

Procedure to Convert String to Lower Case
- Extracted from 213 lab submissions, Fall, 1998

Convert Loop To Goto Form

    void lower(char *s)
    {
        int i = 0;
        if (i >= strlen(s))
            goto done;
    loop:
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
        i++;
        if (i < strlen(s))
            goto loop;
    done:
        return;   /* a label must be followed by a statement */
    }

- strlen executed every iteration
- strlen linear in length of string
  - Must scan string until it finds '\0'
- Overall performance is quadratic

Improving Performance

    void lower(char *s)
    {
        int i;
        int len = strlen(s);
        for (i = 0; i < len; i++)
            if (s[i] >= 'A' && s[i] <= 'Z')
                s[i] -= ('A' - 'a');
    }

- Move call to strlen outside of loop
- Since result does not change from one iteration to another
- Form of code motion
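The difference between the two versions can be checked by counting the characters that the length computation examines. In this sketch `counted_strlen` is a hypothetical stand-in for `strlen` with a global counter, and `lower_slow` and `lower_fast` are illustrative names for the two forms of `lower`.

```c
#include <assert.h>
#include <stddef.h>

unsigned long chars_scanned = 0;    /* instrumentation only */

/* Same result as strlen, but records how many characters it examined. */
size_t counted_strlen(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    chars_scanned += n + 1;         /* n characters plus the terminator */
    return n;
}

/* Length recomputed on every loop test: quadratic scanning work. */
void lower_slow(char *s)
{
    for (size_t i = 0; i < counted_strlen(s); i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
}

/* Length computed once before the loop: linear scanning work. */
void lower_fast(char *s)
{
    size_t len = counted_strlen(s);
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
}
```

For a string of length n, the slow version examines (n + 1)^2 characters in length computations and the fast version examines n + 1, so a 5-character string costs 36 versus 6.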
