Code Optimization: Techniques and Strategies for Improving Performance, Slides of Introduction to Computers

Carnegie Mellon University (CMU), Introduction to Computers

This document from a CS 213 class in 1998 covers various techniques for optimizing code, including basic optimizations like reduction in strength, code motion, and common subexpression sharing, as well as advanced optimizations such as code scheduling, unrolling, and pipelining. The document also discusses optimization blockers and provides examples and advice for optimizing performance without destroying code modularity and generality.

Typology: Slides

2010/2011

Uploaded on 10/07/2011

rolla45 🇺🇸


Code Optimization

November 3, 1998

Topics

•Basic optimizations

–Reduction in strength

–Code motion

–Common subexpression sharing

•Optimization blockers

•Advanced optimizations

–Code scheduling

–Unrolling & pipelining

•Advice

15-213

class21.ppt


CS 213 F’


Great Reality

There’s more to performance than asymptotic complexity

Constant factors matter too!

Easily see 10:1 performance range depending on how code is written

Must optimize at multiple levels: algorithm, data representations, procedures, and loops

Must understand system to optimize performance

How programs are compiled and executed

How to measure program performance and identify bottlenecks

How to improve performance without destroying code modularity and generality


Limitations of Optimizing Compilers

Work under Tight Restriction

Cannot perform optimization if it changes behavior under any realizable circumstance

Even if circumstances seem quite bizarre

Have No Understanding of Application

Limited information about data ranges

Don’t always make best trade-offs

Some Don’t Try Very Hard

Increase cost of compilation

More chances for compiler errors


Basic Optimizations

Reduction in Strength

Replace costly operation with simpler one

Shift, add instead of multiply or divide

Integer multiplication requires 8-16 cycles on the Alpha 21164

Procedure with no stack frame

Keep data in registers rather than memory

Pointer arithmetic

Code Motion

Reduce frequency with which computation performed

– If it will always produce same result

– Especially moving code out of loop

Share Common Subexpressions

Reuse portions of expressions
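As a rough illustration of sharing common subexpressions, the sketch below sums the four neighbors of an element in a row-major array. The function names and the neighbor-sum task are hypothetical, chosen only for this example. Both versions return the same value; the second computes `i*n + j` once and reuses it.

```c
#include <stddef.h>

/* Hypothetical example: sum the four neighbors of element (i, j) in an
 * n-by-n row-major array. The caller must ensure 0 < i < n-1 and
 * 0 < j < n-1 so that every neighbor exists. */

/* Naive form: the product with n is written out in three separate
 * index expressions. */
long neighbors_naive(const long *grid, size_t n, size_t i, size_t j)
{
    long up    = grid[(i - 1) * n + j];
    long down  = grid[(i + 1) * n + j];
    long left  = grid[i * n + j - 1];
    long right = grid[i * n + j + 1];
    return up + down + left + right;
}

/* Shared form: the common subexpression i*n + j is computed once. */
long neighbors_shared(const long *grid, size_t n, size_t i, size_t j)
{
    size_t inj = i * n + j;      /* shared subexpression */
    long up    = grid[inj - n];
    long down  = grid[inj + n];
    long left  = grid[inj - 1];
    long right = grid[inj + 1];
    return up + down + left + right;
}
```

An optimizing compiler will often perform this rewrite itself, but writing it by hand makes the intent explicit and does not depend on the optimization level.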


Multiply / Divide Example

Unsigned integers, power of 2

Most possible optimizations

    void uweight4(unsigned long x, unsigned long *dest)
    {
        dest[0] = 4*x;
        dest[1] = 4*4*x;
        dest[2] = -4*x;
        dest[3] = x / 4;
        dest[4] = x % 4;
    }

Code Sequences

    s4addq      # 4*x
    stq         # dest[0] = 4*x
    sll         # 16*x
    stq         # dest[1] = 16*x
    lda
    mulq        # -4*x
    stq         # dest[2] = -4*x
    srl         # x / 4
    stq         # dest[3] = x / 4
    and         # x % 4
    stq         # dest[4] = x % 4
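The strength reductions on this slide can be written directly in C. The following is a minimal sketch with hypothetical helper names; it relies only on the fact that unsigned arithmetic in C is modular, so each shift or mask gives exactly the same result as the multiply, divide, or remainder it replaces.

```c
/* Shift-and-mask equivalents of multiply, divide, and remainder by
 * powers of 2 for unsigned values. Helper names are illustrative. */
unsigned long mul4(unsigned long x)  { return x << 2; }   /* 4*x   */
unsigned long mul16(unsigned long x) { return x << 4; }   /* 4*4*x */
unsigned long div4(unsigned long x)  { return x >> 2; }   /* x / 4 */
unsigned long rem4(unsigned long x)  { return x & 3UL; }  /* x % 4 */
```

In practice there is rarely a reason to write these by hand: a compiler can apply them on its own because they are valid for every unsigned input.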


Multiply / Divide Example

Signed integers, power of 2

Multiplication same as for unsigned

Correct rounding of negatives for division

Shift / And combination would produce positive remainder

    void weight4(long int x, long int *dest)
    {
        dest[3] = x / 4;
        dest[4] = x % 4;
    }

Division Code

    addq
    cmovge      # (x >= 0)
    sra         # x / 4
    stq         # dest[3] = x / 4
    s4addq      # 4 * (x / 4)
    subq        # x - (4 * (x / 4))
    stq         # dest[4] = x % 4
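A plain arithmetic shift rounds negative values toward minus infinity, while C division rounds toward zero, which is why the signed case needs extra work. The sketch below shows one common way to get the correct rounding: add a bias of 3 to negative inputs before shifting. The function names are hypothetical, and the code assumes that `>>` on a negative signed value is an arithmetic shift. C leaves that implementation-defined; GCC and Clang document it as arithmetic. The bias step presumably corresponds to the addq / cmovge pair in the sequence above, although the operands are not visible here.

```c
#include <stdint.h>

/* Signed division by 4 with rounding toward zero.
 * ASSUMPTION: >> on a negative int64_t is an arithmetic shift
 * (implementation-defined in C; true for GCC and Clang). */
int64_t div4_signed(int64_t x)
{
    int64_t bias = (x >> 63) & 3;   /* 3 if x is negative, else 0 */
    return (x + bias) >> 2;         /* matches x / 4 */
}

/* Remainder computed as x - 4*(x/4), so it has the same sign as x,
 * unlike the always-nonnegative result of x & 3. */
int64_t rem4_signed(int64_t x)
{
    return x - 4 * div4_signed(x);
}
```

For example, with x = -7 the bias makes the shifted value -4, giving a quotient of -1 and a remainder of -3, the same results as `-7 / 4` and `-7 % 4`.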


Omitting Stack Frame

Reduces strength of general procedure call

Leaf Procedure

Does not call any other procedures

All Local Variables Can be Held in Registers

Not too many

No local structures or arrays

Suppose array int a[6] were allocated in registers

» How would you generate code for a[i]?

No address operations

&x cannot be generated if x is in a register

Performance Improvements

Minor saving in stack space

Eliminates time to setup and undo stack frame


Keeping Data in Registers

Computing Integer Sum

z = x + y

Integer data stored in registers

r

    addq $1, $2, $

1 clock cycle

Data addresses stored in registers

r

    ldq $4, 0($1)
    ldq $5, 0($2)
    addq $4, $5, $6
    stq $6, 0($3)

4 clock cycles

Computing Double Precision Sum

z = x + y

Register data: 4 clock cycles

Memory data: 7 clock cycles


Memory Optimization Example (Cont.)

Procedure product

Compute product of array elements and store at *dest

Accumulate in register

Each iteration takes ~6 cycles (roughly twice as fast)

    # $ = i, $ = vals, $ = cnt
    # $f = prod
    Loop: s8addq                  # &vals[i]
          ldt    $f1,0($1)        # $f = vals[i]
          mult   $f10,$f1,$f      # prod *= vals[i]
          addq   $2,1,$           # i++
          cmplt                   # if (i


Blocker #1: Memory Aliasing

Aliasing

Two different memory references specify single location

Example

double a[3] = {

product1(a, &a[2],

product2(a, &a[2],

Observations

Easy to have happen in C

– Since allowed to do address arithmetic

– Direct access to storage structures

Get in habit of introducing local variables

– Accumulating within loops

– Your way of telling compiler not to check for aliasing
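To make the aliasing hazard concrete, here is a minimal sketch. The slide names product1 and product2, but their bodies are not shown in this text, so the definitions below are assumed reconstructions: product1 accumulates directly in memory through dest, and product2 accumulates in a local variable and stores once.

```c
/* ASSUMED reconstructions of the procedures named on the slide. */

/* Accumulates in memory: every iteration reads and writes *dest. */
void product1(double vals[], double *dest, long cnt)
{
    long i;
    *dest = 1.0;
    for (i = 0; i < cnt; i++)
        *dest *= vals[i];
}

/* Accumulates in a local variable and stores to *dest once at the end. */
void product2(double vals[], double *dest, long cnt)
{
    long i;
    double prod = 1.0;
    for (i = 0; i < cnt; i++)
        prod *= vals[i];
    *dest = prod;
}
```

With the illustrative initializer `{1.0, 2.0, 3.0}` (an assumption, since the slide's initializer is cut off), `product1(a, &a[2], 3)` leaves `a[2]` equal to 4.0, because it overwrites `a[2]` with 1.0 before reading it as an input. `product2` on the same data leaves `a[2]` equal to 6.0. Since the two procedures differ when dest aliases an element of vals, a compiler cannot turn one into the other on its own.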


Pointer Code Example

Procedure product3

Compute product of array elements and store at *dest

Each iteration takes ~5 cycles

Can’t do much better (in this case), since mult takes 4 cycles

With more functional units or lower mult latency, we could do better: requires loop unrolling or software pipelining (discussed later)

    void product3(double vals[], double *dest, long int cnt)
    {
        double *val_end = vals + cnt;   /* one past the last element */
        double prod = 1.0;
        if (cnt <= 0) {
            *dest = prod;
            return;
        }
        while (vals != val_end)
            prod *= *vals++;
        *dest = prod;
    }

    # $ = vals, $ = val_end
    # $f = prod
    Loop: ldt    $f1,0($16)       # $f = *vals
          mult   $f10,$f1,$f      # prod *= *vals
          addq   $16,8,$          # vals++
          subq   $16,$2,$         # if (vals != val_end)
          bne    $1,Loop          #   continue looping


Code Motion

Move Computation out of Frequently Executed Section

if guaranteed to always give same result

out of loop

    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            a[n*i + j] = b[j];

    for (i = 0; i < n; i++) {
        int ni = n*i;
        for (j = 0; j < n; j++)
            a[ni + j] = b[j];
    }


Code Motion Examples

Sum Integers from 1 to n!

Bad

    sum = 0;
    for (i = 0; i <= fact(n); i++)
        sum += i;

Better

    sum = 0;
    fn = fact(n);
    for (i = 0; i <= fn; i++)
        sum += i;

    sum = 0;
    for (i = fact(n); i > 0; i--)
        sum += i;

Best

    fn = fact(n);
    sum = fn * (fn + 1) / 2;


Blocker #2: Procedure Calls

Why couldn’t the compiler move fact(n) out of the inner loop?

Procedure May Have Side Effects

i.e., alters global state each time called

Function May Not Return Same Value for Given Arguments

Depends on other parts of global state

Why doesn’t compiler look at code for fact(n)?

Linker may overload with different version

Unless declared static
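The sketch below illustrates why a call in a loop test is hard for a compiler to hoist. It uses a hypothetical variant of fact() that bumps a global counter on every call; the counter and the two summing functions are assumptions made for this example only. Moving the call out of the loop changes how many times the side effect happens, so the transformation is only safe when the programmer knows it is.

```c
/* Hypothetical fact() with a side effect on global state. */
static long call_count = 0;      /* altered by every call to fact() */

long fact(long n)
{
    long result = 1;
    call_count++;                /* side effect visible outside fact() */
    while (n > 1)
        result *= n--;
    return result;
}

/* fact(n) is evaluated in the loop test on every iteration. */
long sum_calling_in_loop(long n)
{
    long i, sum = 0;
    for (i = 0; i <= fact(n); i++)
        sum += i;
    return sum;
}

/* fact(n) is evaluated once; the programmer performed the code motion. */
long sum_hoisted(long n)
{
    long i, sum = 0, fn = fact(n);
    for (i = 0; i <= fn; i++)
        sum += i;
    return sum;
}
```

For n = 3 both functions return 21, but the first calls fact() eight times (once per loop test) and the second calls it once. The results agree while the observable global state does not.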
