



Notes on data-flow analysis and common subexpression elimination (CSE) in compiler optimization. They cover directed acyclic graphs (DAGs) and their role in exposing common subexpressions, the complications introduced by assignment, and the data-flow analysis algorithm and its limitations.
Typology: Study notes




Local optimizations
Consider the expression:
a + a * ( b - c ) + ( b - c ) * d
[Figure: the expression drawn as a tree (leaves a, a, b, c, b, c, d) and as a directed acyclic graph (leaves a, b, c, d)]
CMSC 430
Lecture 14, Page 1
Local optimizations: Common subexpressions (CSE)
portion of expressions
repeated multiple times
computes same value
can reuse previously computed value
Directed acyclic graph (DAG)
program representation
nodes can have multiple parents
no cycles allowed
exposes common subexpressions
Building a DAG for an expression
maintain a hash table for leaves and expressions
unique name for each node: its value number
reuse nodes found in hash table
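The hash-table construction above can be sketched as follows. This is a minimal illustration, not the lecture's implementation: the class name `DagBuilder` and its methods are hypothetical, value numbers are simply list indices, and the sketch does not treat `x + y` and `y + x` as the same node.

```python
class DagBuilder:
    """Builds an expression DAG by hashing leaves and operator nodes."""

    def __init__(self):
        self.table = {}   # hash table: node key -> value number
        self.nodes = []   # value number -> node key

    def _number(self, key):
        # Reuse the node if it is already in the hash table.
        if key not in self.table:
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]

    def leaf(self, name):
        return self._number(("leaf", name))

    def op(self, operator, left, right):
        # Operands are value numbers, so equal keys mean equal subexpressions.
        return self._number((operator, left, right))


# a + a * (b - c) + (b - c) * d
dag = DagBuilder()
a, b, c, d = (dag.leaf(n) for n in "abcd")
bc_first = dag.op("-", b, c)
bc_second = dag.op("-", b, c)          # found in the table, same node
left = dag.op("+", a, dag.op("*", dag.leaf("a"), bc_first))
root = dag.op("+", left, dag.op("*", bc_second, d))
```

Both occurrences of `b - c` receive the same value number, so the DAG has 9 nodes where the tree has 13.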
Directed acyclic graphs: What about assignment?
assignment complicates detection of common subexpressions
identical expression, different value
must ensure each value has a unique node
One solution: renaming
add subscripts to variable names (e.g., x → x_i)
increment the subscript of a name when it is the target (LHS) of an assignment
variable references use the new subscript
Example
a_1 = a_0 <op> <operand>_0
Renaming can be applied to an entire basic block
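A minimal sketch of renaming over a basic block, under stated assumptions: statements are strings of the form `target = expression`, a variable read before any assignment in the block gets subscript 0, and a variable first assigned without having been read also starts at 0. The function name `rename_block` is hypothetical.

```python
import re


def rename_block(statements):
    """Rename variables in a basic block so each value gets a unique name."""
    version = {}  # variable name -> current subscript

    def use(name):
        # A variable read before any assignment in the block has subscript 0.
        version.setdefault(name, 0)
        return f"{name}_{version[name]}"

    renamed = []
    for stmt in statements:
        target, expr = (part.strip() for part in stmt.split("=", 1))
        # Rename uses first: they refer to values that exist before the assignment.
        new_expr = re.sub(r"[A-Za-z_]\w*", lambda m: use(m.group(0)), expr)
        # Assumed convention: increment if the name was seen before, else start at 0.
        version[target] = version[target] + 1 if target in version else 0
        renamed.append(f"{target}_{version[target]} = {new_expr}")
    return renamed


block = ["a = b + c", "b = a - d", "c = b + c", "d = a - d"]
result = rename_block(block)
```

After renaming, `b_1` and `d_1` are both computed as `a_0 - d_0`, so the identical right-hand sides really do denote the same value and can share a DAG node.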
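Directed acyclic graph example
Code          After renaming
a = b + c     a_0 = b_0 + c_0
b = a - d     b_1 = a_0 - d_0
c = b + c     c_1 = b_1 + c_0
d = a - d     d_1 = a_0 - d_0
[Figure: the resulting DAG, with leaves b_0, c_0, d_0 and interior nodes a_0, c_1, and a single node labeled b_1, d_1]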
Common subexpressions: Going beyond basic blocks
can no longer build DAGs
must consider control flow
Examples
possible kill:
  c = a+b
  if (...) a = ...
  d = a+b
possible gen:
  if (...) c = a+b
  d = a+b
We handle these conditions using data-flow analysis
Data-flow analysis
compile-time reasoning about the run-time flow of values in the program
represent facts about run-time behavior
represent effect of executing each basic block
propagate facts around control flow graph
Formulated as a set of simultaneous equations
sets attached to the nodes and edges
lattice to describe relation between values
usually represented as a bit or a bit vector
Limitations
answers must be conservative
often need to approximate information
assume all possible paths can be taken
Data-flow analysis: Algorithm
... if needed
Example
  if (b) then a = 1
              c = a+b
  else        c = a+b
              b = 1
[Figure: example control flow graph with nodes if (b), a := 1, b := 1, c := a+b, c := a+b]
Available expressions: Definition
An expression is defined at point p if its value is computed at p.
An expression is killed at a point p if one of its argument variables is defined at p.
An expression e is available at a point p in a procedure if every path leading to p contains a prior definition of e that is not killed between its definition and p.
Global common subexpression elimination
If, at some definition point for p = e, e is available with name x, we can replace the evaluation with a reference to x.
requires a global naming scheme
natural analog to parts of value numbering
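The availability definition can be checked on a small flow graph with an iterative solver. This is a sketch under assumptions: the function name, the block names, and the per-block GEN/KILL sets are hypothetical, expressions are plain strings, and the solver starts optimistically (everything available) except at the entry block. The two graphs model the "possible kill" and "possible gen" situations.

```python
def available_expressions(blocks, preds, gen, kill, entry):
    """Expressions available on entry to each block: the intersection, over
    all predecessors p, of GEN(p) | (AVAIL(p) - KILL(p))."""
    universe = set().union(*gen.values(), *kill.values())
    # Optimistic start: everything is available, except at the entry block.
    avail = {b: set(universe) for b in blocks}
    avail[entry] = set()
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                continue
            outs = [gen[p] | (avail[p] - kill[p]) for p in preds[b]]
            new = set.intersection(*outs) if outs else set()
            if new != avail[b]:
                avail[b] = new
                changed = True
    return avail


blocks = ["B1", "B2", "B3"]
preds = {"B1": [], "B2": ["B1"], "B3": ["B1", "B2"]}

# Possible kill: B1: c = a+b; B2 (conditional): a = ...; B3: d = a+b
kill_case = available_expressions(
    blocks, preds,
    gen={"B1": {"a+b"}, "B2": set(), "B3": {"a+b"}},
    kill={"B1": set(), "B2": {"a+b"}, "B3": set()},
    entry="B1",
)

# Possible gen: B1: if (...); B2 (conditional): c = a+b; B3: d = a+b
gen_case = available_expressions(
    blocks, preds,
    gen={"B1": set(), "B2": {"a+b"}, "B3": {"a+b"}},
    kill={"B1": set(), "B2": set(), "B3": set()},
    entry="B1",
)
```

In both cases `a+b` is not available on entry to `B3`, because some path to `B3` either kills it or never computes it, so `d = a+b` cannot be replaced by a reference to `c`.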
Reaching definitions
The problem:
What are the assignments (or definitions) of a variable x that may reach a particular reference to x?
Why is this useful?
Constant propagation:
[Figure: flow-graph fragments containing the definitions a = 1, a = 2, b = 3, a = 2 and the uses = b, = a]
Loop invariant code motion:
[Figure: code fragment containing b = 20, a = a + 4, c = b + a, and if (...) goto L]
Reaching definitions
A definition of a variable x is a statement that assigns, or may assign, a value to x.
A definition d reaches a program point p if there exists a path from the point immediately following d to p such that d is not killed along that path.
REACH(b) is the set of definitions reaching the entry of basic block b.
DEF(b) is the set of local definitions in b that reach the end of b.
KILL(b) is the set of definitions killed by b (definitions of the variables that b assigns).
Equations:
$\mathrm{REACH}(b) = \bigcup_{x \in \mathrm{pred}(b)} \bigl(\mathrm{DEF}(x) \cup (\mathrm{REACH}(x) - \mathrm{KILL}(x))\bigr)$
Best case for REACH(b) = $\emptyset$
Worst case for REACH(b) = all definitions
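The REACH equation can be solved by simple iteration from the best case (the empty set). The sketch below is illustrative only: the function name, block names, and definition labels `d0`, `d1` are hypothetical, and KILL is modeled as a set of definition labels so that the set difference is well-typed.

```python
def reaching_definitions(blocks, preds, defs, kill):
    """REACH(b) = union over predecessors x of DEF(x) | (REACH(x) - KILL(x))."""
    reach = {b: set() for b in blocks}   # start from the best case
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new = set()
            for x in preds[b]:
                new |= defs[x] | (reach[x] - kill[x])
            if new != reach[b]:
                reach[b] = new
                changed = True
    return reach


# Hypothetical diamond: B0: d0: a = 1; B1: d1: a = 2; B2: empty; B3: ... = a
blocks = ["B0", "B1", "B2", "B3"]
preds = {"B0": [], "B1": ["B0"], "B2": ["B0"], "B3": ["B1", "B2"]}
defs = {"B0": {"d0"}, "B1": {"d1"}, "B2": set(), "B3": set()}
kill = {"B0": {"d1"}, "B1": {"d0"}, "B2": set(), "B3": set()}

reach = reaching_definitions(blocks, preds, defs, kill)
```

Both `d0` and `d1` reach `B3`, so the use of `a` there cannot be replaced by a single constant. If `B1` were removed, only `d0` would reach and constant propagation would apply.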
Live variables: Definition
A definition d is live at program point p if the variable v defined by d may be used along some path in the program starting at p without being redefined between d and p. Otherwise, the definition is dead.
Why is this useful?
Global analysis to locate dead assignments.
[Figure: flow-graph fragments containing assignments to a and b (a =, b =) and the uses = a, = b]
Live variables
Slightly different, since the information at a basic block is based on what happens later in the program: this is a backward data-flow problem.
LIVE(b) is the set of definitions live on exit from block b.
KILL(b) is as before.
USE(b) is the set of locally exposed uses.
succ(b) is the set of basic blocks that are immediate successors of b in the control flow graph.
Equations:
$\mathrm{LIVE}(b) = \bigcup_{x \in \mathrm{succ}(b)} \bigl(\mathrm{USE}(x) \cup (\mathrm{LIVE}(x) - \mathrm{KILL}(x))\bigr)$
Best case for LIVE(b) = $\emptyset$
Worst case for LIVE(b) = all definitions
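The LIVE equation can be iterated in the same way, visiting blocks in reverse order because information flows backward. This is a sketch under assumptions: for brevity the sets hold variable names rather than individual definitions, and the function name and the three-block graph are hypothetical.

```python
def live_variables(blocks, succs, use, kill):
    """LIVE(b) = union over successors x of USE(x) | (LIVE(x) - KILL(x))."""
    live = {b: set() for b in blocks}    # start from the best case
    changed = True
    while changed:
        changed = False
        for b in reversed(blocks):       # backward problem: visit successors first
            new = set()
            for x in succs[b]:
                new |= use[x] | (live[x] - kill[x])
            if new != live[b]:
                live[b] = new
                changed = True
    return live


# Hypothetical graph: B1: a = 1; b = 2   B2 (loops on itself): c = a   B3: return c
blocks = ["B1", "B2", "B3"]
succs = {"B1": ["B2"], "B2": ["B2", "B3"], "B3": []}
use = {"B1": set(), "B2": {"a"}, "B3": {"c"}}
kill = {"B1": {"a", "b"}, "B2": {"c"}, "B3": set()}

live = live_variables(blocks, succs, use, kill)
```

Here `b` is not live on exit from `B1`, so the assignment `b = 2` is dead and can be removed.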
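What do these have in common?
$\mathrm{AVAIL}(b) = \bigcap_{x \in \mathrm{pred}(b)} \bigl(\mathrm{GEN}(x) \cup (\mathrm{AVAIL}(x) - \mathrm{KILL}(x))\bigr)$
$\mathrm{REACH}(b) = \bigcup_{x \in \mathrm{pred}(b)} \bigl(\mathrm{DEF}(x) \cup (\mathrm{REACH}(x) - \mathrm{KILL}(x))\bigr)$
$\mathrm{LIVE}(b) = \bigcup_{x \in \mathrm{succ}(b)} \bigl(\mathrm{USE}(x) \cup (\mathrm{LIVE}(x) - \mathrm{KILL}(x))\bigr)$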
Confluence Operator or Meet Function:
union or intersection
Behavior for block:
GEN and KILL
A direction:
forward (confluence over predecessors) or backward (over
successors)
Best case set value:
Worst case set value:
General equations:
$\mathrm{IN}(b) = \bigwedge_{p \in \mathrm{pred}(b)} \mathrm{OUT}(p)$ †
$\mathrm{OUT}(b) = \mathrm{GEN}(b) \cup (\mathrm{IN}(b) - \mathrm{KILL}(b))$
† Reverse graph for backward problem.
Data-flow analysis frameworks
Use the same framework for all data-flow problems
given local information GEN, KILL
start with some initial values for sets IN, OUT
iterate through nodes in the flow graph, recomputing transfer functions until the sets stabilize
Framework has three components
Domain of values
Operator for combining values
A set of transfer functions
Usefulness of unified framework
Defines a collection of properties that guarantee correctness and convergence
Can describe speed of convergence and precision of result for a family of analysis problems
Can re-use code to solve new analysis problems
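The code re-use point can be illustrated with one solver parameterized by the confluence operator, the edge direction, and the initial value. This is a sketch, not the lecture's algorithm: the names `solve`, `union`, and `intersection` are hypothetical, and a block with no incoming edges is assumed to have an empty IN set. A backward problem is run by passing successor lists as the incoming edges, that is, by reversing the graph.

```python
def union(sets):
    return set().union(*sets)


def intersection(sets):
    return set.intersection(*sets)


def solve(blocks, edges_in, gen, kill, meet, init):
    """IN(b) = meet of OUT(p) over p in edges_in[b];
    OUT(b) = GEN(b) | (IN(b) - KILL(b)). Iterates until the sets stabilize."""
    in_ = {b: set(init) for b in blocks}
    out = {b: gen[b] | (in_[b] - kill[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            sources = [out[p] for p in edges_in[b]]
            # Assumed boundary condition: no incoming edges means an empty IN set.
            new_in = meet(sources) if sources else set()
            new_out = gen[b] | (new_in - kill[b])
            if new_in != in_[b] or new_out != out[b]:
                in_[b], out[b] = new_in, new_out
                changed = True
    return in_, out


blocks = ["B1", "B2", "B3"]
preds = {"B1": [], "B2": ["B1"], "B3": ["B1", "B2"]}
succs = {"B1": ["B2", "B3"], "B2": ["B3"], "B3": []}

# Forward, intersection: available expressions (B2 kills a+b).
avail_in, _ = solve(
    blocks, preds,
    gen={"B1": {"a+b"}, "B2": set(), "B3": {"a+b"}},
    kill={"B1": set(), "B2": {"a+b"}, "B3": set()},
    meet=intersection, init={"a+b"},
)

# Backward, union: live variables, run on the reversed graph (GEN is USE).
live_out, _ = solve(
    blocks, succs,
    gen={"B1": set(), "B2": {"a"}, "B3": {"c"}},
    kill={"B1": {"a", "c"}, "B2": set(), "B3": set()},
    meet=union, init=set(),
)
```

Only the arguments change between the two analyses. For the backward run, the first returned map holds the sets on exit from each block, because the graph was reversed.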
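Data-flow lattices: Definitions
A lattice is a set $L$ and a meet operation $\wedge$ such that, $\forall a, b, c \in L$:
(a) $a \wedge a = a$ [idempotent]
(b) $a \wedge b = b \wedge a$ [commutative]
(c) $a \wedge (b \wedge c) = (a \wedge b) \wedge c$ [associative]
$\wedge$ imposes a partial order on $L$, $\forall a, b \in L$:
(a) $a \geq b \Leftrightarrow a \wedge b = b$
(b) $a > b \Leftrightarrow a \geq b$ and $a \neq b$
bottom element $\bot$
(a) $\forall a \in L,\ \bot \wedge a = \bot$
(b) $\forall a \in L,\ a \geq \bot$
top element $\top$
(a) $\forall a \in L,\ \top \wedge a = a$
(b) $\forall a \in L,\ \top \geq a$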
Data-flow lattices: Available expressions example
let $D = \{\, x \mid x \subseteq \{e_1, e_2, e_3\} \,\}$
Partial ordering
$\{e_1, e_2\}$ vs. $\{e_3\}$
Single lattice vs. one for each variable
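The lattice laws can be checked exhaustively on this small domain. The sketch below assumes that D is the set of all subsets of {e1, e2, e3} and that the meet is set intersection, the confluence operator used for available expressions; the helper names `meet` and `geq` are hypothetical.

```python
from itertools import combinations, product

E = ["e1", "e2", "e3"]
# D: every subset of {e1, e2, e3}
D = [frozenset(c) for r in range(len(E) + 1) for c in combinations(E, r)]


def meet(a, b):
    return a & b              # assumed meet: set intersection


def geq(a, b):
    return meet(a, b) == b    # a >= b  <=>  a meet b = b


# Idempotent, commutative, associative.
assert all(meet(a, a) == a for a in D)
assert all(meet(a, b) == meet(b, a) for a, b in product(D, D))
assert all(meet(a, meet(b, c)) == meet(meet(a, b), c)
           for a, b, c in product(D, D, D))

# With intersection as meet, top is the full set and bottom is the empty set.
top, bottom = frozenset(E), frozenset()
assert all(meet(top, a) == a and geq(top, a) for a in D)
assert all(meet(bottom, a) == bottom and geq(a, bottom) for a in D)

# {e1, e2} and {e3} are not ordered with respect to each other.
x, y = frozenset({"e1", "e2"}), frozenset({"e3"})
comparable = geq(x, y) or geq(y, x)
```

The last check shows why the order is only partial: neither of `{e1, e2}` and `{e3}` is greater than or equal to the other, and their meet is the bottom element.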