Data-Flow Analysis and Common Subexpressions in Compiler Optimization - Prof. Chao Wen Tse, Study notes of Computer Science

These notes cover data-flow analysis and common subexpression elimination (CSE) in the context of compiler optimization. They introduce directed acyclic graphs (DAGs) and their role in exposing common subexpressions, discuss the difficulties that assignments create, and describe the data-flow analysis algorithm and its limitations.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

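Local optimizations
Consider the expression: a + a * (b - c) + (b - c) * d
[Figure: the expression drawn as a tree (left) and as a directed acyclic graph (right). In the tree, a and b - c each appear twice; in the DAG, each appears once and is shared.]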
CMSC 430 Lecture 14, Page 1
Local optimizations
Common subexpressions (CSE)
  portion of an expression
  repeated multiple times
  computes the same value
  can reuse the previously computed value
Directed acyclic graph (DAG)
  program representation
  nodes can have multiple parents
  no cycles allowed
  exposes common subexpressions
Building a DAG for an expression
  maintain a hash table for leaves and expressions
  unique name for each node: its value number
  reuse nodes found in the hash table

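
The hash-table construction can be sketched as follows. This is a minimal illustration, not the course's implementation: the function name `build_dag` is hypothetical, and Python's standard `ast` module is used only as a convenient expression parser. Each distinct leaf or (operator, left, right) triple gets one value number, and a repeated key reuses the existing node.

```python
import ast

def build_dag(source):
    """Return (nodes, root): nodes[i] is the key of the node with value number i."""
    table = {}   # key -> value number (the hash table)
    nodes = []   # value number -> key

    def number(key):
        # Reuse the node if the key was seen before, else create a new one.
        if key not in table:
            table[key] = len(nodes)
            nodes.append(key)
        return table[key]

    def visit(node):
        if isinstance(node, ast.Name):
            return number(("leaf", node.id))
        if isinstance(node, ast.BinOp):
            left = visit(node.left)
            right = visit(node.right)
            return number((type(node.op).__name__, left, right))
        raise ValueError("only names and binary operators are supported")

    root = visit(ast.parse(source, mode="eval").body)
    return nodes, root

nodes, root = build_dag("a + a * (b - c) + (b - c) * d")
for value_number, key in enumerate(nodes):
    print(value_number, key)
```

The expression tree has 13 nodes, while the DAG built here has 9, because the leaf `a` and the subexpression `b - c` are each stored once.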
Directed acyclic graphs
What about assignment?
  complicates detection of common subexpressions
  identical expression, different value
  must ensure each value has a unique node
One solution: renaming
  add subscripts to variable names (e.g., x becomes x_i)
  increment the subscript of a name when it is the target (LHS) of an assignment
  later references to the variable use the new subscript
Example
  a_1 = a_0 + b_0
Can apply to an entire basic block

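Directed acyclic graph example
  Code           After renaming
  a = b + c      a_0 = b_0 + c_0
  b = a - d      b_1 = a_0 - d_0
  c = b + c      c_1 = b_1 + c_0
  d = a - d      d_1 = a_0 - d_0
[Figure: DAG for the renamed block, with leaves b_0, c_0, d_0, two + nodes, and one - node. The node labels that survive are a_0, c_1, and a single node carrying both b_1 and d_1.]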
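
The renaming rule and the node sharing it enables can be sketched together. This is an illustrative sketch under assumptions: the function name `rename_and_share` and the tuple encoding of statements are hypothetical, and a name that has never appeared before is given subscript 0, which matches the renamed code in the example.

```python
def rename_and_share(block):
    """block: list of (target, left, op, right) statements of one basic block."""
    version = {}   # variable -> current subscript
    dag = {}       # (op, left, right) over renamed operands -> labels on that node
    renamed = []

    def use(var):
        # A reference uses the current subscript (0 if the name is new).
        version.setdefault(var, 0)
        return f"{var}_{version[var]}"

    for target, left, op, right in block:
        l, r = use(left), use(right)
        # An assignment target gets the next subscript.
        version[target] = version[target] + 1 if target in version else 0
        name = f"{target}_{version[target]}"
        renamed.append(f"{name} = {l} {op} {r}")
        # Identical renamed expressions map to the same DAG node.
        dag.setdefault((op, l, r), []).append(name)
    return renamed, dag

block = [("a", "b", "+", "c"),
         ("b", "a", "-", "d"),
         ("c", "b", "+", "c"),
         ("d", "a", "-", "d")]
renamed, dag = rename_and_share(block)
print("\n".join(renamed))
print(dag)
```

After renaming, `a_0 - d_0` appears twice with the same subscripts, so `b_1` and `d_1` label one node. The two `+` expressions differ (`b_0 + c_0` versus `b_1 + c_0`) and stay separate.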

Common subexpressions
Going beyond basic blocks
  can no longer build DAGs
  must consider control flow
Examples
  Possible kill:
    c = a+b
    if (...) a = ...
    d = a+b
  Possible gen:
    if (...) c = a+b
    d = a+b
We handle these conditions using data-flow analysis

Data-flow analysis
Compile-time reasoning about the run-time flow of values in the program
  represent facts about run-time behavior
  represent the effect of executing each basic block
  propagate facts around the control flow graph
Formulated as a set of simultaneous equations
  sets attached to the nodes and edges
  lattice to describe the relation between values
  usually represented as a bit or bit vector
Limitations
  answers must be conservative
  often need to approximate information
  assume all possible paths can be taken
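
The bit-vector representation can be illustrated with ordinary integers. This is a sketch under assumptions: the universe `EXPRS` and the helper names are hypothetical, and the block effect is taken to have the shape OUT = GEN ∪ (IN − KILL) used by the equations in these notes. Each fact gets one bit, so union, intersection, and set difference become bitwise operations.

```python
EXPRS = ["a+b", "a*c", "b-d"]                    # hypothetical universe of facts
BIT = {e: 1 << i for i, e in enumerate(EXPRS)}   # one bit position per fact

def to_bits(facts):
    """Pack a set of facts into an integer bit vector."""
    bits = 0
    for f in facts:
        bits |= BIT[f]
    return bits

def to_set(bits):
    """Unpack an integer bit vector back into a set of facts."""
    return {e for e in EXPRS if bits & BIT[e]}

def transfer(in_bits, gen_bits, kill_bits):
    # OUT = GEN ∪ (IN − KILL), written with bitwise or, and, not.
    return gen_bits | (in_bits & ~kill_bits)

in_bits = to_bits({"a+b", "b-d"})
out_bits = transfer(in_bits, to_bits({"a*c"}), to_bits({"a+b"}))
print(to_set(out_bits))
```

Here the block kills `a+b` and generates `a*c`, so `b-d` passes through unchanged and `a*c` is added.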

Data-flow analysis
Algorithm
  1. build control flow graph (CFG)
  2. initial (local) data gathering
  3. propagate information around the graph
  4. post-processing, if needed

Example control flow graph
  if (b) then
    a = 1
    c = a+b
  else
    b = 1
    c = a+b

[Figure: control flow graph with blocks A, B, C, D. A: if (b). B: a := 1; c := a+b. C: b := 1; c := a+b. D: the block following the conditional.]

Available expressions
Definition
  An expression is defined at point p if its value is computed at p.
  An expression is killed at a point p if one of its argument variables is defined at p.
  An expression e is available at a point p in a procedure if every path leading to p contains a prior definition of e that is not killed between its definition and p.
Global common subexpression elimination
  If, at some definition point p for e, e is available with name x, we can replace the evaluation with a reference to x.
  requires a global naming scheme
  natural analog to parts of value numbering
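
A small sketch of computing availability at block entries by iteration. Assumptions, all hypothetical: the function name `available_expressions`, a four-block diamond (A branches to B and C, which both flow to D), and hand-written GEN and KILL sets in which B and C each compute `a+b` after redefining one of its operands. Because availability requires every path, the values from predecessors are combined with intersection, and non-entry blocks start from the full universe.

```python
def available_expressions(preds, gen, kill, universe):
    """AVAIL(b) = intersection over x in pred(b) of GEN(x) | (AVAIL(x) - KILL(x))."""
    # The entry block (no predecessors) has nothing available on entry.
    avail = {b: (set(universe) if preds[b] else set()) for b in preds}
    changed = True
    while changed:
        changed = False
        for b in preds:
            if not preds[b]:
                continue
            new = set(universe)
            for x in preds[b]:
                new &= gen[x] | (avail[x] - kill[x])
            if new != avail[b]:
                avail[b] = new
                changed = True
    return avail

preds = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
gen = {"A": set(), "B": {"a+b"}, "C": {"a+b"}, "D": set()}
kill = {"A": set(), "B": {"a+b"}, "C": {"a+b"}, "D": set()}
avail = available_expressions(preds, gen, kill, {"a+b"})
print(avail)
```

With these inputs `a+b` is available on entry to D, since both incoming paths compute it last. It is not available on entry to B or C.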

Reaching definitions
The problem:
  What are the assignments (or definitions) of a variable x that may reach a particular reference to x?
Why is this useful?
  Constant propagation:
    [Figure: control flow graph containing the definitions a = 1, a = 2, b = 3, a = 2 and the uses = b, = a]
  Loop invariant code motion:
    L: a = a + 4
       b = 20
       c = b + a
       if (...) goto L

Reaching definitions
A definition of a variable x is a statement that assigns, or may assign, a value to x.
A definition d reaches a program point p if there exists a path from the point immediately following d to p such that d is not killed along that path.
REACH(b) is the set of definitions reaching the entry of basic block b
DEF(b) is the set of local definitions in b that reach the end of b
KILL(b) is the set of definitions killed by b (definitions of any variable that b redefines)
Equations:
$$\mathrm{REACH}(b) = \bigcup_{x \in \mathrm{pred}(b)} \bigl(\mathrm{DEF}(x) \cup (\mathrm{REACH}(x) - \mathrm{KILL}(x))\bigr)$$
Best case for REACH(b) = $\emptyset$
Worst case for REACH(b) = all definitions
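
The REACH equation can be solved by repeated evaluation until nothing changes. The sketch below makes several assumptions: the function name `reaching_definitions`, the block names, and the definition labels `d1` and `d2` are all hypothetical. Block `entry` contains `d1: a = 1`, block `then` contains `d2: a = 2`, and block `join` is reached both directly from `entry` and through `then`.

```python
def reaching_definitions(preds, defs, kill):
    """REACH(b) = union over x in pred(b) of DEF(x) | (REACH(x) - KILL(x))."""
    reach = {b: set() for b in preds}   # start from the empty set
    changed = True
    while changed:
        changed = False
        for b in preds:
            new = set()
            for x in preds[b]:
                new |= defs[x] | (reach[x] - kill[x])
            if new != reach[b]:
                reach[b], changed = new, True
    return reach

preds = {"entry": [], "then": ["entry"], "join": ["entry", "then"]}
defs = {"entry": {"d1"}, "then": {"d2"}, "join": set()}
# Each definition of a kills the other definition of a.
kill = {"entry": {"d2"}, "then": {"d1"}, "join": set()}
reach = reaching_definitions(preds, defs, kill)
print(reach)
```

Both `d1` and `d2` reach `join`, so a use of `a` there cannot be replaced by a single constant. Only `d1` reaches `then`.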

Live variables
Definition:
  A definition d is live at program point p if the variable v defined by d may be used along some path in the program starting at p without being redefined between d and p.
  Otherwise, the definition is dead.
Why is this useful?
  global analysis to locate dead assignments.
[Figure: control flow graph with several assignments to a and b (a = ..., b = ...) followed by the uses = b and = a]

Live variables
Slightly different, since the information at a basic block is based on what happens later in the program.
A backward data-flow problem.
LIVE(b) is the set of definitions live on exit from block b.
KILL(b) is as before.
USE(b) is the set of locally exposed uses.
succ(b) is the set of basic blocks that are immediate successors of b in the control flow graph.
Equations:
$$\mathrm{LIVE}(b) = \bigcup_{x \in \mathrm{succ}(b)} \bigl(\mathrm{USE}(x) \cup (\mathrm{LIVE}(x) - \mathrm{KILL}(x))\bigr)$$
Best case for LIVE(b) = $\emptyset$
Worst case for LIVE(b) = all definitions
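
A sketch of the backward LIVE computation and its use for spotting a dead assignment. It simplifies by tracking variable names rather than individual definitions, which is an assumption made here for brevity. The function name `live_variables`, the three-block chain, and the USE and KILL sets are hypothetical.

```python
def live_variables(succs, use, kill):
    """LIVE(b) = union over x in succ(b) of USE(x) | (LIVE(x) - KILL(x))."""
    live = {b: set() for b in succs}    # start from the empty set
    changed = True
    while changed:
        changed = False
        for b in succs:
            new = set()
            for x in succs[b]:
                new |= use[x] | (live[x] - kill[x])
            if new != live[b]:
                live[b], changed = new, True
    return live

# B1 assigns a and b, B2 reads a, B3 reassigns b without reading it.
succs = {"B1": ["B2"], "B2": ["B3"], "B3": []}
use = {"B1": set(), "B2": {"a"}, "B3": set()}
kill = {"B1": {"a", "b"}, "B2": set(), "B3": {"b"}}
live = live_variables(succs, use, kill)

# Variables assigned in B1 that are not live on exit from B1. This flags a dead
# assignment only because B1 itself never reads b after assigning it.
dead_in_B1 = kill["B1"] - live["B1"]
print(live, dead_in_B1)
```

Only `a` is live on exit from B1, so the assignment to `b` in B1 is a candidate for removal.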

What do these have in common?
$$\mathrm{AVAIL}(b) = \bigcap_{x \in \mathrm{pred}(b)} \bigl(\mathrm{GEN}(x) \cup (\mathrm{AVAIL}(x) - \mathrm{KILL}(x))\bigr)$$
$$\mathrm{REACH}(b) = \bigcup_{x \in \mathrm{pred}(b)} \bigl(\mathrm{DEF}(x) \cup (\mathrm{REACH}(x) - \mathrm{KILL}(x))\bigr)$$
$$\mathrm{LIVE}(b) = \bigcup_{x \in \mathrm{succ}(b)} \bigl(\mathrm{USE}(x) \cup (\mathrm{LIVE}(x) - \mathrm{KILL}(x))\bigr)$$
Confluence operator or meet function: union or intersection
Behavior for block: GEN and KILL
A direction: forward (confluence over predecessors) or backward (over successors)
Best case set value:
Worst case set value:
General equations:
$$\mathrm{IN}(b) = \bigwedge_{p \in \mathrm{pred}(b)} \mathrm{OUT}(p)$$ †
$$\mathrm{OUT}(b) = \mathrm{GEN}(b) \cup (\mathrm{IN}(b) - \mathrm{KILL}(b))$$
† Reverse graph for backward problem.

Data-flow analysis frameworks
Use the same framework for all data-flow problems
  given local information GEN, KILL
  start with some initial values for sets IN, OUT
  iterate through nodes in the flow graph, recompute transfer functions until the sets stabilize
Framework has three components
  Domain of values: $L$
  Operator for combining values: $\wedge$
  A set of transfer functions ($L \to L$): $F$
Usefulness of unified framework
  Defines a collection of properties that guarantee correctness and convergence
  Can describe speed of convergence and precision of result for a family of analysis problems
  Can re-use code to solve new analysis problems
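
The code re-use point can be illustrated with one solver parameterized by meet operator, initial value, and direction. This is a minimal round-robin sketch, not a tuned worklist algorithm. The function name `solve`, its parameters, and the diamond-shaped graph are hypothetical. Nodes without predecessors keep an empty IN set, which is an assumed boundary condition. A backward problem is handled by reversing the edges.

```python
def solve(nodes, edges, gen, kill, meet, init, forward=True):
    """IN(b) = meet of OUT(p) over predecessors; OUT(b) = GEN(b) | (IN(b) - KILL(b))."""
    if not forward:
        edges = [(dst, src) for src, dst in edges]   # reverse the graph
    preds = {n: [s for s, d in edges if d == n] for n in nodes}
    in_ = {n: set() for n in nodes}
    out = {n: set(init) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if preds[n]:
                in_[n] = set(meet(*(out[p] for p in preds[n])))
            new_out = gen[n] | (in_[n] - kill[n])
            if new_out != out[n]:
                out[n], changed = new_out, True
    return in_, out

nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
empty = {n: set() for n in nodes}

# Available expressions: intersection, start from the universe.
avail_gen = {"A": set(), "B": {"a+b"}, "C": {"a+b"}, "D": set()}
avail_in, _ = solve(nodes, edges, avail_gen, empty, set.intersection, {"a+b"})

# Reaching definitions: union, start from the empty set.
reach_gen = {"A": set(), "B": {"a:=1"}, "C": {"b:=1"}, "D": set()}
reach_in, _ = solve(nodes, edges, reach_gen, empty, set.union, set())
print(avail_in["D"], reach_in["D"])
```

Only the meet operator, the initial value, and the GEN and KILL sets differ between the two calls. The iteration itself is shared.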

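Data-flow lattices
Definitions
  1. a lattice is a set $L$ and a meet operation $\wedge$ such that, $\forall a, b, c \in L$
     (a) $a \wedge a = a$ [idempotent]
     (b) $a \wedge b = b \wedge a$ [commutative]
     (c) $a \wedge (b \wedge c) = (a \wedge b) \wedge c$ [associative]
  2. $\wedge$ imposes a partial order on $L$, $\forall a, b \in L$
     (a) $a \ge b \Leftrightarrow a \wedge b = b$
     (b) $a > b \Leftrightarrow a \ge b$ and $a \ne b$
  3. a lattice may have a bottom element $\bot$
     (a) $\forall a \in L$, $\bot \wedge a = \bot$
     (b) $\forall a \in L$, $a \ge \bot$
  4. a lattice may have a top element $\top$
     (a) $\forall a \in L$, $\top \wedge a = a$
     (b) $\forall a \in L$, $\top \ge a$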

Data-flow lattices
Available expressions example:
  let $D = \{\, x \mid x \subseteq \{e_1, e_2, e_3\} \,\}$
Partial ordering
  $\{e_1, e_2\}$ vs. $\{e_3\}$
Single lattice vs. one for each variable
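
The lattice laws can be checked by brute force on the eight subsets of three expressions. The sketch assumes set intersection as the meet operation, the usual choice for available expressions but not stated on this page. The helper names `meet` and `geq` are hypothetical.

```python
from itertools import combinations

E = ["e1", "e2", "e3"]
# D is the set of all subsets of {e1, e2, e3}.
D = [frozenset(c) for r in range(len(E) + 1) for c in combinations(E, r)]

def meet(a, b):
    # Assumed meet for available expressions: set intersection.
    return a & b

def geq(a, b):
    # The induced partial order: a >= b exactly when a meet b = b.
    return meet(a, b) == b

idempotent = all(meet(a, a) == a for a in D)
commutative = all(meet(a, b) == meet(b, a) for a in D for b in D)
associative = all(meet(a, meet(b, c)) == meet(meet(a, b), c)
                  for a in D for b in D for c in D)

top, bottom = frozenset(E), frozenset()
x, y = frozenset({"e1", "e2"}), frozenset({"e3"})
comparable = geq(x, y) or geq(y, x)
print(idempotent, commutative, associative, comparable, set(meet(x, y)))
```

Under this meet, the full set acts as the top element and the empty set as the bottom element. The sets {e1, e2} and {e3} are not ordered with respect to each other, which is why the order is only partial; their meet is the empty set.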