Induction Variable Strength Reduction and Elimination in Compiler Optimization - Prof. Sco, Study notes of Electrical and Electronics Engineering

Covers two compiler optimization techniques: induction variable strength reduction and induction variable elimination. These techniques aim to improve loop performance by replacing expensive induction variable computations (multiplies, shifts) with cheaper increments and by eliminating induction variables that are no longer needed. The document also covers examples of their application and potential issues.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

koofers-user-xwo

EECS 583 – Class 11
ILP Optimization
University of Michigan
February 15, 2006

Reading Material

Today's class:
» "Compiler Code Transformations for Superscalar-Based High-Performance Systems", S. Mahlke et al., Supercomputing '92, Nov. 1992, pp. 808-817.

Material for the next lecture:
» "Machine Description Driven Compilers for EPIC Processors", B. Rau, V. Kathail, and S. Aditya, HP Technical Report, HPL-98-40, 1998.


Induction Variable Strength Reduction

Create basic induction variables from derived induction variables

Induction variables:
» BIV (i++)
» DIV (j = i * 4)
» A DIV can be converted into a BIV that is incremented by 4

Issues:
» Initial and increment values
» Where to place increments

Example loop body:
r5 = r4 - 3
r4 = r4 + 1
r7 = r4 * r…
r6 = r4 << 2
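A minimal Python sketch of the DIV-to-BIV conversion above, assuming a loop whose only BIV update is i++ (the function names are hypothetical). The first version recomputes j = i * 4 with a multiply every iteration; the second computes j's initial value once before the loop and adds 4 where i is incremented, which addresses both listed issues (initial/increment values and increment placement).

```python
def div_recomputed(n):
    """j = i * 4 recomputed from the BIV i on every iteration."""
    i, out = 0, []
    for _ in range(n):
        i += 1            # BIV update: i++
        j = i * 4         # DIV: one multiply per iteration
        out.append(j)
    return out


def div_as_biv(n):
    """Same values, with j carried as a BIV incremented by 4."""
    i = 0
    j = i * 4             # initial value, computed once before the loop
    out = []
    for _ in range(n):
        i += 1            # BIV update: i++
        j += 4            # increment placed at the update of i (1 * 4)
        out.append(j)
    return out


print(div_recomputed(5))
print(div_as_biv(5))
```

Both calls print the same list, but the second loop body contains only additions.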


Induction Variable Strength Reduction (2)

Rules:
» X is a *, <<, + or - operation
» src1(X) is a basic induction variable
» src2(X) is invariant
» No other ops modify dest(X)
» dest(X) != src(X) for all srcs
» dest(X) is a register

Transformation:
» Insert the following into the preheader:
    new_reg = RHS(X)
» If opcode(X) is not add/sub, insert at the bottom of the preheader:
    new_inc = inc(src1(X)) opcode(X) src2(X)
  else:
    new_inc = inc(src1(X))
» Insert the following at each update of src1(X):
    new_reg += new_inc
» Change X to:
    dest(X) = new_reg
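The rules above can be checked with a small Python simulation. This is a sketch under stated assumptions: the function names are hypothetical, the loop has a single update of the BIV, X is placed after that update (as r7 and r6 are in the example), and Python's unbounded integers stand in for machine registers (the same identities hold with wrap-around arithmetic). `original_loop` recomputes dest(X) from the BIV each iteration; `reduced_loop` applies the preheader initialization, the new_inc rule, and the `new_reg += new_inc` update.

```python
import operator

# Opcodes allowed for X by the rules: *, <<, + or -.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "<<": operator.lshift}


def new_increment(opcode, biv_inc, src2):
    """new_inc = inc(src1(X)) for add/sub, else inc(src1(X)) opcode(X) src2(X)."""
    if opcode in ("+", "-"):
        return biv_inc
    return OPS[opcode](biv_inc, src2)


def original_loop(opcode, src2, biv_init, biv_inc, n):
    """Before the transformation: X recomputes dest(X) from the BIV."""
    biv, out = biv_init, []
    for _ in range(n):
        biv += biv_inc                          # update of src1(X)
        out.append(OPS[opcode](biv, src2))      # X: dest(X) = src1(X) op src2(X)
    return out


def reduced_loop(opcode, src2, biv_init, biv_inc, n):
    """After the transformation: X becomes a copy of new_reg."""
    biv = biv_init
    new_reg = OPS[opcode](biv, src2)                # preheader: new_reg = RHS(X)
    new_inc = new_increment(opcode, biv_inc, src2)  # bottom of the preheader
    out = []
    for _ in range(n):
        biv += biv_inc                          # update of src1(X) ...
        new_reg += new_inc                      # ... gets new_reg += new_inc
        out.append(new_reg)                     # X: dest(X) = new_reg
    return out


for opcode, src2 in [("-", 3), ("*", 9), ("<<", 2)]:
    print(opcode, original_loop(opcode, src2, 0, 1, 4), reduced_loop(opcode, src2, 0, 1, 4))
```

The `*` case uses an arbitrary invariant of 9 because the second operand of `r7 = r4 * r…` is cut off in the example.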



Class Problem

Optimize this code by applying induction variable strength reduction and induction variable elimination.

r5 = r5 + 1
r11 = r5 * 2
r10 = r11 + 2
r12 = load(r10+0)

r9 = r1 << 1
r4 = r9 - 10
r3 = load(r4+4)
r3 = r3 + 1
store(r4+0, r3)
r7 = r3 << 2
r6 = load(r7+0)
r13 = r2 - 1
r1 = r1 + 1
r2 = r2 + 1

r1 = 0
r2 = 0

Live-out: r13, r12, r6, r…


ILP Optimization

Traditional optimizations:
» Redundancy elimination
» Reducing operation count

ILP (instruction-level parallelism) optimizations:
» Increase the amount of parallelism and the ability to overlap operations
» Operation count is secondary: these optimizations often accept extra instructions in exchange for parallelism (while avoiding code explosion)

ILP is increased by breaking dependences:
» True (flow) = read after write
» False (anti/output) = write after read, write after write
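The three dependence kinds can be stated as a check on the registers two operations read and write. A Python sketch, assuming a hypothetical `Op` record with one destination register and a tuple of source registers (memory dependences are ignored):

```python
from typing import NamedTuple, Optional, Set, Tuple


class Op(NamedTuple):
    dest: Optional[str]          # register written (None for stores/branches)
    srcs: Tuple[str, ...]        # registers read


def dependences(earlier: Op, later: Op) -> Set[str]:
    """Register dependences that force `later` to stay after `earlier`."""
    kinds = set()
    if earlier.dest is not None and earlier.dest in later.srcs:
        kinds.add("RAW")   # true / flow: later reads what earlier wrote
    if later.dest is not None and later.dest in earlier.srcs:
        kinds.add("WAR")   # anti: later overwrites what earlier reads
    if earlier.dest is not None and earlier.dest == later.dest:
        kinds.add("WAW")   # output: both write the same register
    return kinds


a = Op("r1", ("r2", "r3"))                    # r1 = r2 + r3
print(dependences(a, Op("r4", ("r1",))))      # r4 = r1 ...  -> RAW
print(dependences(a, Op("r2", ("r5",))))      # r2 = r5 ...  -> WAR
print(dependences(a, Op("r1", ("r6",))))      # r1 = r6 ...  -> WAW
print(dependences(a, Op("r7", ("r6",))))      # renamed dest -> no dependence
```

Only the RAW case reflects real value flow; the WAR and WAW cases disappear once the later operation writes a different register, which is what register renaming exploits.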


Global Register Renaming

The straight-line code strategy does not work:
» A single use may have multiple reaching defs

Web = collection of defs/uses that have possible value flow between them:
» Identify webs:
    Take a def, add all its uses
    Take all uses, add all their reaching defs
    Take all defs, add all their uses
    Repeat until the solution is stable
» Each web is renamed if its name is the same as another web's

[Figure: example control-flow graph with several defs and uses of x and y]
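The web-building iteration can be sketched in Python as a fixed point over def-use information for a single variable. This assumes reaching definitions are already computed and supplied as a map from each use to its reaching defs; the labels `d1`, `u1`, etc. are hypothetical:

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Web = Tuple[FrozenSet[str], FrozenSet[str]]   # (defs, uses)


def identify_webs(all_defs: Iterable[str], reaching_defs: Dict[str, Set[str]]) -> List[Web]:
    """Group the defs/uses of one variable into webs.

    all_defs: def labels; reaching_defs: use label -> defs that reach it.
    """
    all_defs = list(all_defs)
    uses_of: Dict[str, Set[str]] = {d: set() for d in all_defs}
    for use, defs in reaching_defs.items():
        for d in defs:
            uses_of[d].add(use)

    webs: List[Web] = []
    assigned: Set[str] = set()
    for start in all_defs:
        if start in assigned:
            continue
        web_defs: Set[str] = {start}           # take a def ...
        web_uses: Set[str] = set()
        changed = True
        while changed:                         # ... and repeat until stable
            changed = False
            for d in list(web_defs):           # add all uses of the defs
                if not uses_of[d] <= web_uses:
                    web_uses |= uses_of[d]
                    changed = True
            for u in list(web_uses):           # add all reaching defs of the uses
                if not reaching_defs[u] <= web_defs:
                    web_defs |= reaching_defs[u]
                    changed = True
        assigned |= web_defs
        webs.append((frozenset(web_defs), frozenset(web_uses)))
    return webs


for defs, uses in identify_webs(["d1", "d2", "d3"], {"u1": {"d1", "d2"}, "u2": {"d3"}}):
    print(sorted(defs), sorted(uses))
```

In the example, `d1` and `d2` both reach `u1` (say, from the two arms of an if/else), so they must keep the same name, while `d3`/`u2` form a separate web that may be renamed.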


Rename with Copy

Renaming within a web:
» The worst case is a web that spans all defs/uses
» Want to enable some of the defs within the web to be reordered or executed in parallel

Xform:
» Rename the def
» Rename the uses for which that def is the only reaching def
» Insert copy:
    orig_dest = new_dest

[Figure: defs and uses of y before and after renaming one def of y and inserting a copy]
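A Python sketch of the rename-with-copy transformation on a hypothetical linearized mini-IR, where each instruction is `(op_id, dest, sources)` and reaching-definition information is assumed to be precomputed per (instruction, source register):

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical mini-IR: (op_id, dest or None, list of source registers).
Instr = Tuple[str, Optional[str], List[str]]


def rename_with_copy(ops: List[Instr], reaching: Dict[Tuple[str, str], FrozenSet[str]],
                     def_id: str, new_name: str) -> List[Instr]:
    """Rename the dest of `def_id`, rewrite the uses it solely reaches, insert a copy.

    reaching maps (op_id, source register) to the ids of the defs that reach it.
    """
    out: List[Instr] = []
    for op_id, dest, srcs in ops:
        # A use is renamed only if def_id is its *only* reaching def.
        new_srcs = [new_name if reaching.get((op_id, s)) == frozenset({def_id}) else s
                    for s in srcs]
        if op_id == def_id:
            out.append((op_id, new_name, new_srcs))          # rename the def
            out.append((op_id + ".copy", dest, [new_name]))  # orig_dest = new_dest
        else:
            out.append((op_id, dest, new_srcs))
    return out


ops = [("d1", "y", ["a"]), ("u1", "z", ["y"]), ("u2", "w", ["y"])]
reaching = {("u1", "y"): frozenset({"d1"}), ("u2", "y"): frozenset({"d0", "d1"})}
for instr in rename_with_copy(ops, reaching, "d1", "y1"):
    print(instr)
```

Here `u1` can read `y1` directly and no longer depends on the copy, while `u2`, which is also reached by some other def `d0`, keeps reading `y` through the copy.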


Promote with Copy

Similar to rename with copy:
» Promotion alone is not legal because a live value would be destroyed

Rename the destination; the op can then be promoted to any ancestor:
» Might as well choose True
» Substitute uses for which the def is the only reaching def
» Insert copy old_dest = new_dest if orig_pred (guarded by the original predicate)
» Again, must ensure the operation will not cause a spurious exception

Before:
r7 = 0
p1,p2 = CMPP.UN.UC(r1 < r5)
r7 = load(r8) if p2
r12 = r7 + 1 if p2
r10 = r4 + 4 if p1
r11 = r7 + 1 if T
r1 = r2 + r3

After:
r7 = 0
p1,p2 = CMPP.UN.UC(r1 < r5)
r… = load(r8) if T
r7 = r… if p2
r12 = r… + 1 if p2
r10 = r4 + 4 if p1
r11 = r7 + 1 if T
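The legality argument can be checked by simulating the example for both values of p2. In this Python sketch, `r20` is a hypothetical name for the renamed load destination (the actual register number is lost in the example), the `r10`/`p1` operation is omitted, and the load from r8 is assumed not to fault when p2 is false:

```python
def before(p2, mem, r8):
    """Original predicated code; None means the register is never written."""
    r7, r12 = 0, None
    if p2:
        r7 = mem[r8]                 # r7 = load(r8) if p2
    if p2:
        r12 = r7 + 1                 # r12 = r7 + 1 if p2
    r11 = r7 + 1                     # r11 = r7 + 1 if T
    return r7, r11, r12


def promote_only(p2, mem, r8):
    """Illegal: promote the load to T without renaming."""
    r7, r12 = 0, None
    r7 = mem[r8]                     # clobbers the live value r7 = 0 when p2 is false
    if p2:
        r12 = r7 + 1
    r11 = r7 + 1
    return r7, r11, r12


def promote_with_copy(p2, mem, r8):
    """Rename the load's dest (hypothetical r20), promote it to T, add a guarded copy."""
    r7, r12 = 0, None
    r20 = mem[r8]                    # r20 = load(r8) if T (assumed not to fault)
    if p2:
        r7 = r20                     # r7 = r20 if p2: copy under the original predicate
    if p2:
        r12 = r20 + 1                # use whose only reaching def was the load
    r11 = r7 + 1                     # r7 has two reaching defs here: not substituted
    return r7, r11, r12


mem, r8 = {100: 41}, 100
for p2 in (True, False):
    print(p2, before(p2, mem, r8), promote_only(p2, mem, r8), promote_with_copy(p2, mem, r8))
```

When p2 is false, `promote_only` overwrites the live value 0 in r7 and changes r11, while `promote_with_copy` matches the original code for both predicate values.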


Class Problem

r1 = 0 if T
p1 = CMPP.UN(r3 < r4) if T
r2 = r6 + 3 if p1
p2,p3 = CMPP.UN.UC(r5 < r6) if p1
r1 = r5 + 1 if p2
r10 = r2 + r3 if p2
r1 = r3 * 3 if p3
r11 = load(r1) if p3
store (r1, r10) if T
store (r3, r11) if T

1. Promote everything to its highest predicate without renaming.
2. Promote any defs of r1, r2 that remain predicated to True using promotion with renaming.


Tree Height Reduction

Re-compute the expression as a balanced binary tree:
» Obey precedence rules
» Essentially re-parenthesize

Effects:
» Height reduced (n terms): from n-1 (assuming unit latency) to ceil(log2(n))
» Number of operations remains constant

Cost:
» Temporary registers "live" longer

Watch out for:
» Always ok for integer arithmetic
» Floating-point: may not be, since reassociation can change rounding!!

original:
r9 = r1 + r2
r10 = r9 + r3
r11 = r10 - r4
r12 = r11 + r5
r13 = r12 - r6

after back subs:
r13 = r1 + r2 + r3 - r4 + r5 - r6

[Figure: balanced expression tree combining (r1 + r2), (r3 - r4) and (r5 - r6) into r13]

final code:
t1 = r1 + r2
t2 = r3 - r4
t3 = r5 - r6
t4 = t1 + t2
r13 = t4 + t3
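A Python sketch of tree height reduction for a chain of additions and subtractions, assuming unit latency and a pass that pairs adjacent terms level by level (the function names are hypothetical). Applied to the example, it reproduces the final code above with a height of ceil(log2(6)) = 3 instead of 5:

```python
import math

FLIP = {"+": "-", "-": "+"}


def tree_height_reduce(terms, dest):
    """Re-parenthesize a chain of +/- terms into a balanced tree (sketch).

    terms: list of (sign, operand) pairs, first sign "+", at least two terms.
    Returns (three-address code, tree height in levels).
    """
    assert len(terms) >= 2 and terms[0][0] == "+"
    level, code, height = list(terms), [], 0
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            (sa, a), (sb, b) = level[i], level[i + 1]
            # (-a) + b == -(a - b): flip the inner operator under a negated left term
            op = sb if sa == "+" else FLIP[sb]
            tmp = f"t{len(code) + 1}"
            code.append([tmp, a, op, b])
            nxt.append((sa, tmp))
        if len(level) % 2:
            nxt.append(level[-1])              # odd term waits for the next level
        level = nxt
        height += 1
    code[-1][0] = dest                         # last op writes the destination
    return [f"{d} = {a} {op} {b}" for d, a, op, b in code], height


def evaluate(code, env):
    """Run the generated code on concrete register values."""
    env = dict(env)
    for line in code:
        d, rhs = line.split(" = ")
        a, op, b = rhs.split()
        env[d] = env[a] + env[b] if op == "+" else env[a] - env[b]
    return env


terms = [("+", "r1"), ("+", "r2"), ("+", "r3"), ("-", "r4"), ("+", "r5"), ("-", "r6")]
code, height = tree_height_reduce(terms, "r13")
print("\n".join(code))
print(height, math.ceil(math.log2(len(terms))))

# Reassociation is safe for integer arithmetic but not for floating point:
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```

The last line prints `False`, a concrete instance of the floating-point warning above.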


Fancier Tree Height Reduction

Take advantage of literals:
» Reassociate to maximize opportunities for combining literals at compile time
» Reduces the amount of computation

after back subs:
r13 = r1 + 4 + r2 - 3 + r3 - 6

reassociate:
r13 = r1 + r2 + r3 + (4 - 3 - 6)

simplify:
r13 = r1 + r2 + r3 - 5

balance:
[Figure: balanced tree combining (r1 + r2) and (r3 - 5) into r13]
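Literal folding before balancing can be sketched the same way. This Python sketch (hypothetical function names) treats int operands as literals and string operands as registers, and folds all literals into a single constant:

```python
def fold_literals(terms):
    """Split +/- terms into register terms and one folded constant (sketch).

    terms: list of (sign, operand); an int operand is a literal, a str a register.
    """
    regs, const = [], 0
    for sign, x in terms:
        if isinstance(x, int):
            const += x if sign == "+" else -x   # combined at compile time
        else:
            regs.append((sign, x))
    return regs, const


def render(regs, const):
    """Print the reassociated expression with the folded literal last."""
    if not regs:
        return str(const)
    parts = [r if i == 0 and s == "+" else f"{s} {r}" for i, (s, r) in enumerate(regs)]
    if const > 0:
        parts.append(f"+ {const}")
    elif const < 0:
        parts.append(f"- {-const}")
    return " ".join(parts)


terms = [("+", "r1"), ("+", 4), ("+", "r2"), ("-", 3), ("+", "r3"), ("-", 6)]
regs, const = fold_literals(terms)
print("r13 =", render(regs, const))
print(len(terms) - 1, len(regs) + (1 if const else 0) - 1)   # op counts: before, after
```

The second line prints `5 3`: three register terms plus one literal need three operations instead of five.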


Class Problem

Assume latencies: + = 1, * = 3

Operand arrival times:
0 r…
0 r…
0 r…
1 r…
2 r…
0 r…

r10 = r1 * r2
r11 = r10 + r3
r12 = r11 + r4
r13 = r12 - r5
r14 = r13 + r…

» Back substitute
» Re-express in tree-height reduced form
» Account for latency and arrival times


Optimizing Unrolled Loops

loop:
r1 = load(r2)
r3 = load(r4)
r5 = r1 * r3
r6 = r6 + r5
r2 = r2 + 4
r4 = r4 + 4
if (r4 < 400) goto loop

unroll 3 times:

loop:
(iter1)
r1 = load(r2)
r3 = load(r4)
r5 = r1 * r3
r6 = r6 + r5
r2 = r2 + 4
r4 = r4 + 4
(iter2)
r1 = load(r2)
r3 = load(r4)
r5 = r1 * r3
r6 = r6 + r5
r2 = r2 + 4
r4 = r4 + 4
(iter3)
r1 = load(r2)
r3 = load(r4)
r5 = r1 * r3
r6 = r6 + r5
r2 = r2 + 4
r4 = r4 + 4
if (r4 < 400) goto loop

Unroll = replicate the loop body n-1 times (n copies in total for an unroll factor of n).
Hope to enable overlap of operation execution from different iterations.
Not possible as written: every copy reuses r1, r3 and r5 (anti and output dependences), and r2, r4 and r6 are updated serially from one copy to the next (flow dependences).
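Why overlap is not possible can be made concrete by computing which registers carry a dependence from one unrolled copy to another. This Python sketch models the three copies of the loop body above (branch omitted) as (dest, sources) pairs and tracks only the nearest reaching def for each dependence; the per-copy names such as `r1_0` are hypothetical:

```python
# Loop body from the slide (branch omitted): (dest, sources).
BODY = [
    ("r1", ("r2",)),          # r1 = load(r2)
    ("r3", ("r4",)),          # r3 = load(r4)
    ("r5", ("r1", "r3")),     # r5 = r1 * r3
    ("r6", ("r6", "r5")),     # r6 = r6 + r5
    ("r2", ("r2",)),          # r2 = r2 + 4
    ("r4", ("r4",)),          # r4 = r4 + 4
]


def unroll(body, copies, rename=()):
    """Replicate the body; registers in `rename` get a per-copy name (e.g. r1_0)."""
    ops = []
    for k in range(copies):
        def name(r):
            return f"{r}_{k}" if r in rename else r
        for dest, srcs in body:
            ops.append((k, name(dest), tuple(name(s) for s in srcs)))
    return ops


def dependences(ops):
    """Direct register dependences as (kind, reg, from_copy, to_copy)."""
    last_def, reads_since_def, deps = {}, {}, set()
    for j, (k, dest, srcs) in enumerate(ops):
        for s in srcs:                               # RAW: nearest earlier def
            if s in last_def:
                deps.add(("RAW", s, ops[last_def[s]][0], k))
        if dest in last_def:                         # WAW: nearest earlier def
            deps.add(("WAW", dest, ops[last_def[dest]][0], k))
        for i in reads_since_def.get(dest, []):      # WAR: reads since that def
            deps.add(("WAR", dest, ops[i][0], k))
        for s in srcs:
            reads_since_def.setdefault(s, []).append(j)
        last_def[dest] = j
        reads_since_def[dest] = []                   # this write starts a new value
    return deps


def cross_copy_regs(ops):
    """Registers that carry a dependence between different unrolled copies."""
    return {reg for _, reg, a, b in dependences(ops) if a != b}


print(sorted(cross_copy_regs(unroll(BODY, 3))))
print(sorted(cross_copy_regs(unroll(BODY, 3, rename={"r1", "r3", "r5"}))))
```

Giving r1, r3 and r5 a fresh name in each copy removes their cross-copy dependences, but the r2, r4 and r6 chains are true dependences that renaming alone does not break.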