Slides on Classical, ILP Optimization - Advanced Compilers | EECS 583, Study notes of Electrical and Electronics Engineering

Material Type: Notes; Professor: Mahlke; Class: Advanced Compilers; Subject: Electrical Engineering And Computer Science; University: University of Michigan - Ann Arbor; Term: Winter 2003;


Uploaded on 09/02/2009

EECS 583 - Lecture 13
Classical, ILP Optimization
University of Michigan
February 19, 2003


Constant Combining

• Combine 2 dependent ops into 1 by combining the literals

    r1 = r2 + 4
    r5 = r1 - 9
  →
    r5 = r2 - 5

• First op often becomes dead

• Rules (ops X and Y in same BB)
  - X is of the form rx +- K
  - dest(X) != src1(X)
  - Y is of the form ry +- K (comparison also ok)
  - Y consumes dest(X)
  - src1(X) not modified in (X…Y)

    r1 = r2 + 4
    r3 = r1 < 0
    r2 = r3 + 6
    r7 = r1 - 3
    r8 = r7 + 5
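The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the `Op` record and `combine_constants` function are hypothetical names, every op is assumed to have the form dest = src + const (comparisons are left out), and the dead first op is not removed.

```python
from dataclasses import dataclass

@dataclass
class Op:
    """Hypothetical op of the form dest = src + const (const may be negative)."""
    dest: str
    src: str
    const: int

def combine_constants(block):
    """Apply constant combining within one basic block (list of Op)."""
    out = []
    for y in block:
        new_y = y
        redefined = set()  # registers written between candidate X and Y
        # Walk backward to the most recent definition of Y's source.
        for x in reversed(out):
            if x.dest == y.src:
                # X must not overwrite its own source, and src1(X) must
                # still hold the same value when Y executes.
                if x.dest != x.src and x.src not in redefined:
                    new_y = Op(y.dest, x.src, x.const + y.const)
                break
            redefined.add(x.dest)
        out.append(new_y)
    return out

# r1 = r2 + 4 ; r5 = r1 - 9   becomes   r1 = r2 + 4 ; r5 = r2 - 5
combined = combine_constants([Op("r1", "r2", 4), Op("r5", "r1", -9)])

# r2 is overwritten between X and Y, so r7 = r1 - 3 must stay as it is.
blocked = combine_constants(
    [Op("r1", "r2", 4), Op("r2", "r3", 6), Op("r7", "r1", -3)]
)
```

After combining, `r1 = r2 + 4` has no remaining consumer in the first block, which is why a later dead-code pass can often delete it.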


Loop Optimizations

• The most important set of optimizations
  - Because programs spend so much time in loops

• Optimize given that you know a sequence of code will be repeatedly executed

• Optis
  - Invariant code removal
  - Global variable migration
  - Induction variable strength reduction
  - Induction variable elimination


Recall Loop Terminology

• r1, r4 are basic induction variables
• r7 is a derived induction variable

[Figure: loop control-flow graph with labels "loop preheader", "loop header", "backedge BB", and "exit BB". Operations shown in the figure:]

    r1 = 3
    r2 = 10
    r4 = r4 + 1
    r7 = r4 * 3
    r2 = 0
    r3 = r2 + 1
    r1 = r1 + 2
    store (r1, r3)


Global Variable Migration

• Assign a global variable temporarily to a register for the duration of the loop
  - Load in preheader
  - Store at exit points

• Rules
  - X is a load or store
  - address(X) not modified in the loop
  - if X not executed on every iteration, then X must provably not cause an exception
  - All memory ops in loop whose address can equal address(X) must always have the same address as X

    r4 = load(r5)
    r4 = r4 + 1
    r8 = load(r5)
    r7 = r8 * r
    store(r5, r4)
    store(r5,r7)
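To make the effect concrete, here is a small Python simulation of a loop that increments one memory cell, before and after migration. It is only a sketch under simplifying assumptions: the `Memory` class, the function names, and the address `0x10` are invented for illustration, the loop has a single exit, and no other memory op in the loop can touch the same address.

```python
class Memory:
    """Toy memory that counts how many loads and stores it serves."""
    def __init__(self, cells):
        self.cells = dict(cells)
        self.accesses = 0

    def load(self, addr):
        self.accesses += 1
        return self.cells[addr]

    def store(self, addr, value):
        self.accesses += 1
        self.cells[addr] = value

def loop_original(mem, addr, n):
    # The global is loaded and stored on every iteration.
    for _ in range(n):
        r4 = mem.load(addr)
        r4 = r4 + 1
        mem.store(addr, r4)

def loop_migrated(mem, addr, n):
    r4 = mem.load(addr)      # load in the preheader
    for _ in range(n):
        r4 = r4 + 1          # loop body works on the register only
    mem.store(addr, r4)      # store at the loop exit

before = Memory({0x10: 5})
after = Memory({0x10: 5})
loop_original(before, 0x10, 4)
loop_migrated(after, 0x10, 4)
```

Both versions leave the same value in memory, but the migrated loop performs two memory accesses in total instead of two per iteration.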


Class Problem

Optimize this applying
  1. loop invariant removal
  2. global variable migration + other optis

    r2 = 10
    r7 = r4 * r
    r6 = load(r10)
    r3 = 1 / r
    r3 = r4 * r
    r3 = r3 + r
    r2 = r2 + 1
    store(r10,r3)
    store (r2, r3)


Induction Variable Elimination

• Remove unnecessary basic induction variables from the loop by substituting uses with another BIV

• Rules (same init val, same inc)
  - Find 2 basic induction vars x,y
  - x,y in same family
    - incremented in same places
  - increments equal
  - initial values equal
  - x not live when you exit loop
  - for each BB where x is defined, there are no uses of x between first/last defn of x and last/first defn of y

    r1 = r1 - 1
    r2 = r2 - 1
    r9 = r2 + r
    r7 = r1 * r
    r4 = load(r1)
    store(r2, r7)


Induction Variable Elimination (2)

• 5 variants discussed in thesis
  1. Trivial - induction variable that is never used except by the increments themselves, not live at loop exit
  2. Same increment, same initial value
  3. Same increment, initial values are a known constant offset from one another
  4. Same increment, know nothing about relation of initial values
  5. Different increments, know nothing about initial values

• The higher the number, the more complex the elimination
  - Also, the more expensive it is
  - 1,2 are basically free, so always should be done
  - 3-5 require preheader operations
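As a rough illustration of why the cheaper variants are free while the later ones need preheader operations, the Python sketch below eliminates a basic induction variable x in favor of y when both use the same increment. It is one plausible reading of the "same increment" variants, not the thesis's algorithm; the function names and the `delta` temporary are assumptions made for this example.

```python
def loop_with_two_bivs(x0, y0, inc, n):
    """Original loop: x and y are both basic induction variables."""
    x, y = x0, y0
    uses = []
    for _ in range(n):
        x += inc
        y += inc
        uses.append((x * 2, y + 1))   # one use of each induction variable
    return uses

def loop_with_x_eliminated(x0, y0, inc, n):
    """x is removed; its uses are rewritten in terms of y."""
    delta = x0 - y0                   # preheader operation (0 if inits match)
    y = y0
    uses = []
    for _ in range(n):
        y += inc
        uses.append(((y + delta) * 2, y + 1))
    return uses

same_init = (loop_with_two_bivs(0, 0, 4, 3), loop_with_x_eliminated(0, 0, 4, 3))
diff_init = (loop_with_two_bivs(7, 2, 4, 3), loop_with_x_eliminated(7, 2, 4, 3))
```

When the initial values are equal, `delta` is zero and uses of x become plain uses of y, so nothing extra is needed. When the initial values differ, the subtraction in the preheader is the added cost.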


ILP Optimization

• Traditional optimizations
  - Redundancy elimination
  - Reducing operation count

• ILP (instruction-level parallelism) optimizations
  - Increase the amount of parallelism and the ability to overlap operations
  - Operation count is secondary, often trade extra instructions for more parallelism (but avoid code explosion)

• ILP increased by breaking dependences
  - True or flow = read after write dependence
  - False or (anti/output) = write after read, write after write
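The three dependence kinds can be told apart mechanically from the registers each op reads and writes. The Python sketch below is illustrative only: the `(dest, sources)` tuple encoding and the `classify` function are assumptions for this example, and it looks at just two ops at a time without considering intervening definitions.

```python
def classify(first, second):
    """Return the register dependences from `first` to a later op `second`.

    Each op is a (dest, sources) pair, e.g. ("r1", ["r2", "r3"]).
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    kinds = []
    if dest1 in srcs2:
        kinds.append("flow")     # read after write (true dependence)
    if dest2 in srcs1:
        kinds.append("anti")     # write after read (false dependence)
    if dest1 == dest2:
        kinds.append("output")   # write after write (false dependence)
    return kinds

a = ("r1", ["r2", "r3"])   # r1 = r2 + r3
b = ("r3", ["r4", "r5"])   # r3 = r4 + r5
c = ("r1", ["r7", "r8"])   # r1 = r7 * r8
d = ("r7", ["r1", "r5"])   # r7 = r1 + r5

a_to_b = classify(a, b)
a_to_c = classify(a, c)
c_to_d = classify(c, d)
```

Only the flow dependence from c to d reflects real data movement. The anti and output dependences come from register re-use, which is what makes them candidates for removal.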


Register Renaming

• Remove dependences caused by variable re-use
  - Re-use of source variables
  - Re-use of temporaries
  - Anti, output dependences

• Create a new variable to hold each unique life time

• Very simple transformation with straight-line code
  - Make each def a unique register
  - Substitute new name into subsequent uses

    a: r1 = r2 + r3
    b: r3 = r4 + r5
    c: r1 = r7 * r8
    d: r7 = r1 + r5
    e: r1 = r3 + 4
    f: r4 = r7 + 4
  →
    a: r1 = r2 + r3
    b: r13 = r4 + r5
    c: r11 = r7 * r8
    d: r17 = r11 + r5
    e: r21 = r13 + 4
    f: r14 = r17 + 4
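For straight-line code, the transformation can be sketched in Python as a single forward pass. This is a minimal sketch under stated assumptions: ops are `(dest, sources)` pairs, the `rename` function and the fresh-name scheme starting at r100 are invented for illustration (the slide uses different new names), fresh names are assumed not to clash with existing registers, and live-out values would still need copies back to their original names.

```python
import itertools

def rename(ops, first_fresh=100):
    """Rename defs in straight-line code so re-used registers get new names."""
    fresh = itertools.count(first_fresh)
    current = {}   # original register -> name currently holding its value
    seen = set()   # original names read or written so far
    renamed = []
    for dest, srcs in ops:
        # Substitute the latest name into each use (literals pass through).
        new_srcs = [current.get(s, s) for s in srcs]
        seen.update(srcs)
        # A def of an already-referenced register starts a new lifetime.
        new_dest = f"r{next(fresh)}" if dest in seen else dest
        seen.add(dest)
        current[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed

ops = [
    ("r1", ["r2", "r3"]),   # a
    ("r3", ["r4", "r5"]),   # b
    ("r1", ["r7", "r8"]),   # c
    ("r7", ["r1", "r5"]),   # d
    ("r1", ["r3", "4"]),    # e
    ("r4", ["r7", "4"]),    # f
]
result = rename(ops)
```

On the six-op example this leaves op a alone and gives b through f fresh destinations, so only the flow dependences c to d, b to e, and d to f remain.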


Rename with Copy

• Renaming within a web
  - The worst case is a web that spans all defs/uses
  - Want to enable some of the defs within the web to be reordered or executed in parallel

• Xform
  - Rename def
  - Rename uses for which def is the only reaching def
  - Insert copy
    - orig_dest = new_dest

[Figure: control-flow graph with several defs (y = ...) and uses (... = y) of y]


Predicate Promotion

• Predicate promotion or predicate speculation
  - Remove dependence between CMPP and predicated operation
  - Modify predicate of an operation to an ancestor predicate
  - Operation executes more often than it should, "speculated"

•   x = … if p
  →
    x = … if p2
  - Where p2 is an ancestor of p
  - Legal if x not live on p2 - p
  - And, op will not cause a spurious exception

    r1 = r2 + r3
    r7 = 0
    p1,p2 = CMPP.UN.UC(r1 < r5)
    r4 = r5 * r6 if p1
    r7 = r8 + r9 if p2
    r10 = r4 + 4 if p1
    r11 = r7 + 1 if T
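The legality condition can be checked on the example by simulating it. The Python sketch below is an illustration under assumptions: CMPP.UN.UC is modeled as p1 = condition and p2 = its complement, the `run` and `observable` helpers and the input values are made up, and r4 is assumed not to be live after this code (only r10 and r11 are observed).

```python
def run(regs, promote_r4=False, promote_r7=False):
    """Execute the example; a promoted op runs under T instead of its predicate."""
    r = dict(regs)
    r["r1"] = r["r2"] + r["r3"]
    r["r7"] = 0
    p1 = r["r1"] < r["r5"]   # UN destination: the condition
    p2 = not p1              # UC destination: the complement
    if p1 or promote_r4:
        r["r4"] = r["r5"] * r["r6"]
    if p2 or promote_r7:
        r["r7"] = r["r8"] + r["r9"]
    if p1:
        r["r10"] = r["r4"] + 4
    r["r11"] = r["r7"] + 1
    return r

def observable(r):
    # Assumption: only r10 and r11 are live after the code.
    return (r["r10"], r["r11"])

base = {"r2": 1, "r3": 2, "r4": 7, "r6": 3, "r8": 4, "r9": 5, "r10": 0}
p1_true = dict(base, r5=10)    # r1 = 3 < 10, so p1 holds
p1_false = dict(base, r5=0)    # r1 = 3 < 0 fails, so p2 holds

r4_ok = all(
    observable(run(regs)) == observable(run(regs, promote_r4=True))
    for regs in (p1_true, p1_false)
)
r7_ok = observable(run(p1_true)) == observable(run(p1_true, promote_r7=True))
```

Promoting `r4 = r5 * r6` is safe here because r4 is only consumed under p1. Promoting `r7 = r8 + r9` without renaming is not, because r7 is live on T - p2: the use in `r11 = r7 + 1` expects the value 0 whenever p2 is false.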


Class Problem

  1. Promote everything to its highest predicate w/o renaming
  2. Promote any defs of r1, r2 that remain predicated to True using promotion with renaming

    r1 = 0 if T
    p1 = CMPP.UN(r3 < r4) if T
    r2 = r6 + 3 if p1
    p2,p3 = CMPP.UN.UC(r5 < r6) if p1
    r1 = r5 + 1 if p2
    r10 = r2 + r3 if p2
    r1 = r3 * 3 if p3
    r11 = load(r1) if p3
    store (r1, r10) if T
    store (r3, r11) if T


Back Substitution

• Generation of expressions by compiler frontends is very sequential
  - Account for operator precedence
  - Apply left-to-right within same precedence

• Back substitution
  - Create larger expressions
    - Iteratively substitute RHS expression for LHS variable
  - Note - may correspond to multiple source statements
  - Enable subsequent optis

• Optimization
  - Re-compute expression in a more favorable manner

    y = a + b + c - d + e - f;

    r9 = r1 + r2
    r10 = r9 + r3
    r11 = r10 - r4
    r12 = r11 + r5
    r13 = r12 - r

    Subs r12:
      r13 = r11 + r5 - r
    Subs r11:
      r13 = r10 - r4 + r5 - r
    Subs r
      r13 = r9 + r3 - r4 + r5 - r
    Subs r
      r13 = r1 + r2 + r3 - r4 + r5 - r
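One way to see what "a more favorable manner" can mean is to back-substitute the chain for y = a + b + c - d + e - f into a flat list of signed terms and then recombine the terms pairwise. The Python sketch below assumes that goal is a shorter dependence chain; the temp names t1..t5 and all function names are hypothetical, and subtraction is modeled as adding a negated term.

```python
# Sequential code for y = a + b + c - d + e - f as dest -> (left, op, right).
defs = {
    "t1": ("a", "+", "b"),
    "t2": ("t1", "+", "c"),
    "t3": ("t2", "-", "d"),
    "t4": ("t3", "+", "e"),
    "t5": ("t4", "-", "f"),
}

def back_substitute(name):
    """Expand `name` into a flat list of (sign, leaf) terms."""
    if name not in defs:
        return [(1, name)]
    left, op, right = defs[name]
    sign = 1 if op == "+" else -1
    terms = back_substitute(left)
    terms += [(sign * s, leaf) for s, leaf in back_substitute(right)]
    return terms

def chain_height(name):
    """Length of the longest dependence chain that computes `name`."""
    if name not in defs:
        return 0
    left, _, right = defs[name]
    return 1 + max(chain_height(left), chain_height(right))

def balanced_height(n_terms):
    """Height of a balanced binary tree that combines n_terms values."""
    height = 0
    while n_terms > 1:
        n_terms = (n_terms + 1) // 2
        height += 1
    return height

def evaluate_balanced(terms, env):
    """Combine the terms pairwise, level by level."""
    values = [sign * env[leaf] for sign, leaf in terms]
    while len(values) > 1:
        values = [sum(values[i:i + 2]) for i in range(0, len(values), 2)]
    return values[0]

terms = back_substitute("t5")
env = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
value = evaluate_balanced(terms, env)
```

For the six-term expression the original chain is five ops deep, while a balanced recombination such as (a + b) + (c - d) + (e - f) needs only three levels and yields the same result for integer operands.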