









An overview of data flow analysis, a program analysis technique used during compilation. It covers the transformation of source code into abstract syntax trees (ASTs) and control flow graphs (CFGs), the concept of data flow facts, and several data flow analyses, both forward and backward. It also discusses the role of lattices in data flow analysis and the distinction between forward and backward problems.










Data Flow Analysis
CMSC 631: Program Analysis and Understanding
Spring 2009
Source Code → Abstract Syntax Tree → Control Flow Graph → Object Code
- Source code: i.e., sequences of characters
  - Awkward to work with
- Use a lexer (like flex) to recognize tokens
- Use a parser (like bison) to group words structurally
Example program:
    x := a + b;
    y := a * b;
    while (y > a) {
        a := a + 1;
        x := a + b
    }
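[Figure: AST of the example program. A Program node has children such as := (with children x and + (a, b)) and while (with a test on y and a, and a Block body containing := (a, + (a, 1)), ...)]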
- ASTs are abstract:
  - They don't contain all information in the program
  - Any ambiguity has been resolved
- In this class, we will generally begin at the AST level
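- ASTs still contain many kinds of control flow:
  - Loops, e.g., for, while, repeat...until
  - Branches, e.g., if, ?:, switch
  - Control flow inside expressions, e.g., (42 * y) + (z > 5 ? 12 * z : z + 20)
- This makes ASTs awkward to analyze directly, at least for dataflow analysis
- Control flow graph (CFG):
  - Each node represents a statement
  - Edges represent control flow
- Statements are simple instructions, e.g.:
  - Assignments x := y op z or x := op z
  - Copy statements x := y
  - Branches goto L or if x relop y goto L
  - etc.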
[Figure: CFG of the example program. x := a + b → y := a * b → y > a; when the test is true, y > a → a := a + 1 → x := a + b, which loops back to y > a]
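To make the example concrete, here is one possible encoding of its CFG as successor and predecessor maps. This is a sketch under assumptions: the node ids (`entry`, `s1` to `s5`, `exit`) and the helper `build_cfg` are hypothetical names chosen for illustration.

```python
# Hypothetical node ids mapped to the statement or test each node holds.
NODES = {
    "entry": "entry",
    "s1": "x := a + b",
    "s2": "y := a * b",
    "s3": "y > a",       # loop test
    "s4": "a := a + 1",
    "s5": "x := a + b",
    "exit": "exit",
}

# Control-flow edges as (source, target) pairs.
EDGES = [
    ("entry", "s1"),
    ("s1", "s2"),
    ("s2", "s3"),
    ("s3", "s4"),    # test true: enter the loop body
    ("s4", "s5"),
    ("s5", "s3"),    # back edge to the loop test
    ("s3", "exit"),  # test false: leave the loop
]


def build_cfg(nodes, edges):
    """Return (succ, pred) maps; every node gets an entry, possibly empty."""
    succ = {n: [] for n in nodes}
    pred = {n: [] for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
        pred[dst].append(src)
    return succ, pred


succ, pred = build_cfg(NODES, EDGES)
```

The loop test `s3` is the only join point, with predecessors `s2` and `s5`.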
- Data flow analysis works best on properties about how the program computes
- It considers all paths through the program, including infeasible paths
- An expression e is available at program point p if:
  - e is computed on every path to p, and
  - the value of e has not changed since the last time e was computed on the paths to p
- If an expression is available, it need not be recomputed
- Example: in the CFG below, where are a + b, a * b, and a + 1 available?
[Figure: the example CFG with entry and exit nodes. entry → x := a + b → y := a * b → y > a; y > a → a := a + 1 → x := a + b → back to y > a; y > a → exit]
Gen and Kill sets for the example:

    Stmt          Gen       Kill
    x := a + b    a + b     (none)
    y := a * b    a * b     (none)
    a := a + 1    (none)    a + 1, a + b, a * b
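The Gen and Kill columns above follow mechanically from each assignment. This is a minimal sketch that assumes expressions are `(left, op, right)` tuples and statements are `(target, expr)` pairs. The helpers `variables`, `gen_kill`, and `show` are hypothetical.

```python
# Expressions are (left, op, right) tuples; operands that are identifiers are variables.
A_PLUS_B = ("a", "+", "b")
A_TIMES_B = ("a", "*", "b")
A_PLUS_1 = ("a", "+", "1")
ALL_EXPRS = {A_PLUS_B, A_TIMES_B, A_PLUS_1}


def variables(expr):
    """Variables read by an expression."""
    left, _op, right = expr
    return {v for v in (left, right) if v.isidentifier()}


def gen_kill(stmt, all_exprs):
    """Gen/Kill of an assignment `target := expr` for available expressions."""
    target, expr = stmt
    # Kill: every expression that reads the variable being overwritten.
    kill = {e for e in all_exprs if target in variables(e)}
    # Gen: the computed expression, unless the assignment overwrites one of its
    # own operands (as in a := a + 1), which makes the value stale immediately.
    gen = set() if target in variables(expr) else {expr}
    return gen, kill


def show(expr):
    return " ".join(expr)


# One row per statement, keyed by its source text.
TABLE = {f"{t} := {show(e)}": gen_kill((t, e), ALL_EXPRS)
         for t, e in [("x", A_PLUS_B), ("y", A_TIMES_B), ("a", A_PLUS_1)]}
```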
Solution for the example (available expressions at each point):

    Out(x := a + b)            = {a + b}
    Out(y := a * b)            = {a + b, a * b}
    In(y > a)                  = {a + b}   ({a + b, a * b} from before the loop, met with {a + b} from the back edge)
    Out(a := a + 1)            = Ø
    Out(x := a + b), in loop   = {a + b}
    In(exit)                   = {a + b}
- Available expressions is a forward must analysis:
  - Forward = data flow from In to Out
  - Must = at a join point, the property must hold on all paths that are joined
- Notation:
  - succ(s) = { immediate successor statements of s }
  - pred(s) = { immediate predecessor statements of s }
  - In(s) = program point just before executing s
  - Out(s) = program point just after executing s
- Data flow equations:
$$In(s) = \bigcap_{s' \in pred(s)} Out(s')$$
$$Out(s) = Gen(s) \cup (In(s) - Kill(s))$$
- A variable v is live at program point p if:
  - v will be used on some execution path originating from p...
  - before v is overwritten
- Optimization:
  - If a variable is not live, there is no need to keep it in a register
  - If a variable is dead at an assignment, the assignment can be eliminated
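This is a sketch of live variables as a backward may analysis on the example loop. It iterates In(s) = Use(s) ∪ (Out(s) − Def(s)), where Out(s) is the union over successors. The Use/Def sets are written by hand and the node ids are hypothetical. The sketch also assumes nothing is live at `exit`, meaning x is not read after the loop. Under that assumption, both assignments to x come out dead.

```python
# Use/Def sets for the example loop (node ids are hypothetical).
USE = {"s1": {"a", "b"}, "s2": {"a", "b"}, "s3": {"y", "a"},
       "s4": {"a"}, "s5": {"a", "b"}, "exit": set()}
DEF = {"s1": {"x"}, "s2": {"y"}, "s3": set(),
       "s4": {"a"}, "s5": {"x"}, "exit": set()}
SUCC = {"s1": ["s2"], "s2": ["s3"], "s3": ["s4", "exit"],
        "s4": ["s5"], "s5": ["s3"], "exit": []}
PRED = {n: [p for p in SUCC if n in SUCC[p]] for n in SUCC}


def live_variables():
    """Backward may analysis: In(s) = Use(s) ∪ (Out(s) − Def(s))."""
    live_in = {s: set() for s in SUCC}    # start from the empty set
    live_out = {s: set() for s in SUCC}
    worklist = set(SUCC)
    while worklist:
        s = worklist.pop()
        # Out(s) is the union of In over successors (empty at exit).
        live_out[s] = set().union(*(live_in[t] for t in SUCC[s]))
        temp = USE[s] | (live_out[s] - DEF[s])
        if temp != live_in[s]:
            live_in[s] = temp
            worklist |= set(PRED[s])      # facts flow backward to predecessors
    return live_in, live_out


LIVE_IN, LIVE_OUT = live_variables()
# An assignment is dead if the variable it defines is not live right after it.
DEAD = [s for s in ("s1", "s2", "s4", "s5") if not (DEF[s] & LIVE_OUT[s])]
```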
- Reaching definitions: a definition of v reaches point p if there is a path from the definition to p on which there is no intervening assignment to v
- Forward or backward? Forward
- May or must? May
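This sketch computes reaching definitions on the same loop as a forward may analysis: meet is ∪ and every set starts empty. It makes two assumptions. Each definition is named by the hypothetical id of the statement that makes it. Definitions of a and b made before the fragment are ignored.

```python
# Each definition is named by the (hypothetical) id of the statement making it.
DEFINES = {"s1": "x", "s2": "y", "s4": "a", "s5": "x"}
SUCC = {"s1": ["s2"], "s2": ["s3"], "s3": ["s4", "exit"],
        "s4": ["s5"], "s5": ["s3"], "exit": []}
PRED = {n: [p for p in SUCC if n in SUCC[p]] for n in SUCC}


def gen_kill(s):
    """A definition generates itself and kills the other definitions of its variable."""
    if s not in DEFINES:
        return set(), set()
    var = DEFINES[s]
    kill = {d for d, v in DEFINES.items() if v == var and d != s}
    return {s}, kill


def reaching_definitions():
    """Forward may analysis: In(s) = ∪ Out(pred), Out(s) = Gen(s) ∪ (In(s) − Kill(s))."""
    rd_in = {s: set() for s in SUCC}
    rd_out = {s: set() for s in SUCC}    # start from the empty set
    worklist = set(SUCC)
    while worklist:
        s = worklist.pop()
        rd_in[s] = set().union(*(rd_out[p] for p in PRED[s]))
        gen, kill = gen_kill(s)
        temp = gen | (rd_in[s] - kill)
        if temp != rd_out[s]:
            rd_out[s] = temp
            worklist |= set(SUCC[s])
    return rd_in, rd_out


RD_IN, RD_OUT = reaching_definitions()
```

Both definitions of x (`s1` and `s5`) reach the loop test, because `s5` kills `s1` only inside the loop body.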
- Most analyses fit one of these four categories; a few don't fit (e.g., bidirectional analysis)

                May                       Must
    Forward     Reaching definitions      Available expressions
    Backward    Live variables            Very busy expressions
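- Example: available expressions
[Figure: Hasse diagram of the subsets of {a + b, a * b, a + 1}. "Top" is {a + b, a * b, a + 1}; below it are {a + b, a * b}, {a + b, a + 1}, {a * b, a + 1}; then {a + b}, {a * b}, {a + 1}; "bottom" is (none), the empty set]
- A partial order (P, ≤) requires:
  - ≤ is reflexive: x ≤ x
  - ≤ is anti-symmetric: x ≤ y and y ≤ x ⇒ x = y
  - ≤ is transitive: x ≤ y and y ≤ z ⇒ x ≤ z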
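- ⊓ is the meet or greatest lower bound operation:
$$x \sqcap y \le x \quad\text{and}\quad x \sqcap y \le y$$
$$\text{if } z \le x \text{ and } z \le y, \text{ then } z \le x \sqcap y$$
- ⊔ is the join or least upper bound operation:
$$x \le x \sqcup y \quad\text{and}\quad y \le x \sqcup y$$
$$\text{if } x \le z \text{ and } y \le z, \text{ then } x \sqcup y \le z$$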
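- Top ⊤ and bottom ⊥ satisfy:
$$x \sqcap \bot = \bot \qquad x \sqcap \top = x \qquad x \sqcup \bot = x \qquad x \sqcup \top = \top$$
- The ordering can be recovered from meet or join:
$$x \le y \iff x \sqcap y = x \qquad\qquad x \le y \iff x \sqcup y = y$$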
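In the available-expressions lattice above, ≤ is ⊆ and meet is ∩. Join is ∪, top is the set of all three expressions, and bottom is the empty set. The sketch below brute-forces the laws on that eight-element lattice. It checks only this finite example and is not a general proof. The helper names are hypothetical.

```python
from itertools import combinations

EXPRS = ("a + b", "a * b", "a + 1")
# Lattice elements: every subset of EXPRS, ordered by inclusion.
P = [frozenset(c) for r in range(len(EXPRS) + 1) for c in combinations(EXPRS, r)]
TOP = frozenset(EXPRS)      # every expression available
BOTTOM = frozenset()        # (none)


def leq(x, y):
    return x <= y           # x ≤ y iff x ⊆ y


def meet(x, y):
    return x & y            # greatest lower bound


def join(x, y):
    return x | y            # least upper bound


def check_lattice():
    """Brute-force check of the partial-order and meet/join laws on P."""
    for x in P:
        assert leq(x, x)                                   # reflexive
        assert meet(x, BOTTOM) == BOTTOM and meet(x, TOP) == x
        assert join(x, BOTTOM) == x and join(x, TOP) == TOP
        for y in P:
            if leq(x, y) and leq(y, x):
                assert x == y                              # anti-symmetric
            m, j = meet(x, y), join(x, y)
            assert leq(m, x) and leq(m, y)                 # lower bound
            assert leq(x, j) and leq(y, j)                 # upper bound
            assert leq(x, y) == (m == x)                   # x ≤ y iff x ⊓ y = x
            assert leq(x, y) == (j == y)                   # x ≤ y iff x ⊔ y = y
            for z in P:
                if leq(x, y) and leq(y, z):
                    assert leq(x, z)                       # transitive
                if leq(z, x) and leq(z, y):
                    assert leq(z, m)                       # greatest lower bound
                if leq(x, z) and leq(y, z):
                    assert leq(j, z)                       # least upper bound
    return True
```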
Out(s) = Top for all statements s
// Slight acceleration: could set Out(s) = Gen(s) ∪ (Top - Kill(s))
W := { all statements }      (worklist)
repeat
    Take s from W
    In(s) := ∩_{s' ∈ pred(s)} Out(s')
    temp := Gen(s) ∪ (In(s) - Kill(s))
    if (temp != Out(s)) {
        Out(s) := temp
        W := W ∪ succ(s)
    }
until W = ∅
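The worklist algorithm above translates almost line for line into Python. In this sketch the Gen/Kill sets come from the example's table and the node ids are hypothetical. `entry` is modeled as killing every expression, which is an assumption made so that nothing is available on entry. The code computes the In/Out sets for the example.

```python
ALL = frozenset({"a + b", "a * b", "a + 1"})   # Top: every expression available

# Gen/Kill per node, following the example's table. Modeling entry as killing
# every expression (an assumption) makes Out(entry) empty.
GEN = {"entry": set(), "s1": {"a + b"}, "s2": {"a * b"}, "s3": set(),
       "s4": set(), "s5": {"a + b"}, "exit": set()}
KILL = {"entry": set(ALL), "s1": set(), "s2": set(), "s3": set(),
        "s4": {"a + 1", "a + b", "a * b"}, "s5": set(), "exit": set()}
SUCC = {"entry": ["s1"], "s1": ["s2"], "s2": ["s3"], "s3": ["s4", "exit"],
        "s4": ["s5"], "s5": ["s3"], "exit": []}
PRED = {n: [p for p in SUCC if n in SUCC[p]] for n in SUCC}


def available_expressions():
    out = {s: set(ALL) for s in SUCC}      # Out(s) = Top for all statements s
    inn = {}
    worklist = set(SUCC)                    # W := { all statements }
    while worklist:                         # until W = Ø
        s = worklist.pop()                  # take s from W
        inn[s] = set(ALL)                   # meet over no predecessors is Top
        for p in PRED[s]:
            inn[s] &= out[p]                # In(s) := ∩ Out(s') over pred(s)
        temp = GEN[s] | (inn[s] - KILL[s])
        if temp != out[s]:
            out[s] = temp
            worklist |= set(SUCC[s])        # W := W ∪ succ(s)
    return inn, out


IN, OUT = available_expressions()
```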
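- Generalizing the loop body:
$$In(s) := \bigcap_{s' \in pred(s)} Out(s')$$
$$temp := Gen(s) \cup (In(s) - Kill(s)), \text{ i.e., a function } f_s(In(s))$$
$$temp := f_s\Big(\bigcap_{s' \in pred(s)} Out(s')\Big)$$
- f_s is monotonic:
$$x \le y \Rightarrow f_s(x) \le f_s(y)$$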
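Monotonicity can be checked exhaustively on a small lattice. This sketch builds f_s(x) = Gen(s) ∪ (x − Kill(s)) for the three example statements and tests x ⊆ y ⇒ f_s(x) ⊆ f_s(y) over all subsets. The helper names are hypothetical, and a brute-force check like this only covers this finite example.

```python
from itertools import combinations

EXPRS = ("a + b", "a * b", "a + 1")
SUBSETS = [frozenset(c) for r in range(len(EXPRS) + 1) for c in combinations(EXPRS, r)]


def transfer(gen, kill):
    """Build f_s(x) = Gen(s) ∪ (x − Kill(s))."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda x: gen | (x - kill)


# Transfer functions for the three statements in the Gen/Kill table.
F = {
    "x := a + b": transfer({"a + b"}, set()),
    "y := a * b": transfer({"a * b"}, set()),
    "a := a + 1": transfer(set(), {"a + 1", "a + b", "a * b"}),
}


def is_monotonic(f):
    """Check x ≤ y ⇒ f(x) ≤ f(y) for all pairs, with ≤ read as ⊆."""
    return all(f(x) <= f(y) for x in SUBSETS for y in SUBSETS if x <= y)
```

For contrast, a complement function such as `lambda x: frozenset(EXPRS) - x` reverses the order and fails this check.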
- Initialize to Top: every expression is available, or no definitions reach this point
  - Most optimistic assumption
  - Strongest possible hypothesis
- The algorithm then always moves down in the lattice (with meet)
- Live variables: P = sets of variables, Top = empty set
- Available expressions: P = sets of expressions, Top = set of all expressions
Forward analysis:

Out(s) = Top for all s
W := { all statements }
repeat
    Take s from W
    temp := f_s(⊓_{s' ∈ pred(s)} Out(s'))
    if (temp != Out(s)) {
        Out(s) := temp
        W := W ∪ succ(s)
    }
until W = ∅
Backward analysis:

In(s) = Top for all s
W := { all statements }
repeat
    Take s from W
    temp := f_s(⊓_{s' ∈ succ(s)} In(s'))
    if (temp != In(s)) {
        In(s) := temp
        W := W ∪ pred(s)
    }
until W = ∅
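The two variants differ only in which neighbors are read and which are re-queued, so a single hypothetical `solve` function can cover both. This sketch assumes facts are frozensets and that the meet over an empty set of neighbors is Top. The two usages at the end are small made-up programs.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List

Node = Hashable
Fact = FrozenSet[str]


def solve(succ: Dict[Node, List[Node]],
          transfer: Dict[Node, Callable[[Fact], Fact]],
          meet: Callable[[Fact, Fact], Fact],
          top: Fact,
          forward: bool = True) -> Dict[Node, Fact]:
    """Worklist solver. Returns Out(s) for forward problems, In(s) for backward ones."""
    pred: Dict[Node, List[Node]] = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    # Forward: read from predecessors, re-queue successors. Backward: the reverse.
    sources, dependents = (pred, succ) if forward else (succ, pred)
    value = {n: top for n in succ}            # everything starts at Top
    worklist = set(succ)
    while worklist:
        s = worklist.pop()
        incoming = top                         # meet over an empty set is Top
        for p in sources[s]:
            incoming = meet(incoming, value[p])
        temp = transfer[s](incoming)
        if temp != value[s]:
            value[s] = temp
            worklist.update(dependents[s])
    return value


def gen_kill(gen, kill):
    """Gen/Kill transfer function: x ↦ Gen ∪ (x − Kill)."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda x: gen | (x - kill)


# Backward, may (meet = ∪, Top = Ø): live variables for  b := a; c := b; return c
LIVE = solve({"n1": ["n2"], "n2": ["n3"], "n3": []},
             {"n1": gen_kill({"a"}, {"b"}),     # Use = {a}, Def = {b}
              "n2": gen_kill({"b"}, {"c"}),     # Use = {b}, Def = {c}
              "n3": gen_kill({"c"}, set())},    # Use = {c}
             meet=lambda x, y: x | y, top=frozenset(), forward=False)

# Forward, must (meet = ∩, Top = all expressions): available expressions on a diamond
EXPRS = frozenset({"a + b", "a * b"})
AVAIL = solve({"e": ["l", "r"], "l": ["j"], "r": ["j"], "j": []},
              {"e": gen_kill(set(), EXPRS),     # nothing available on entry
               "l": gen_kill({"a + b"}, set()),
               "r": gen_kill({"a + b", "a * b"}, set()),
               "j": gen_kill(set(), set())},
              meet=lambda x, y: x & y, top=EXPRS)
```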
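- Consider the update:
    temp := f_s(⊓_{s' ∈ pred(s)} Out(s'))
    if (temp != Out(s)) { ... }
- Claim: Out(s) only shrinks
  - Every Out(s) starts at Top, so its first update cannot move it up; suppose the claim holds for every Out(s') so far
  - Then ⊓_{s' ∈ pred(s)} Out(s') shrinks
  - Since f_s is monotonic, f_s(⊓_{s' ∈ pred(s)} Out(s')) shrinks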
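- Termination: the successive values of each Out(s) form a descending chain x_0 ≥ x_1 ≥ x_2 ≥ ...; if the lattice has finite height, each Out(s) can change only finitely often, so the algorithm terminates
- Complexity is measured in terms of:
  - n = number of statements in the program
  - k = height of the lattice
  - (assumes the meet operation takes O(1) time)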
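You can observe the descending-chain argument directly by recording every value each Out(s) takes. This sketch reruns available expressions on the example with the same assumed encoding, in which `entry` kills everything. It checks that each history strictly decreases and changes at most k = 3 times, the height of the three-expression lattice. The helper names are hypothetical.

```python
ALL = frozenset({"a + b", "a * b", "a + 1"})   # Top; the lattice height k is 3

GEN = {s: frozenset(g) for s, g in {
    "entry": set(), "s1": {"a + b"}, "s2": {"a * b"}, "s3": set(),
    "s4": set(), "s5": {"a + b"}, "exit": set()}.items()}
KILL = {s: frozenset(k) for s, k in {
    "entry": ALL, "s1": set(), "s2": set(), "s3": set(),
    "s4": {"a + 1", "a + b", "a * b"}, "s5": set(), "exit": set()}.items()}
SUCC = {"entry": ["s1"], "s1": ["s2"], "s2": ["s3"], "s3": ["s4", "exit"],
        "s4": ["s5"], "s5": ["s3"], "exit": []}
PRED = {n: [p for p in SUCC if n in SUCC[p]] for n in SUCC}


def trace_out_values():
    """Run the available-expressions worklist, recording every value of each Out(s)."""
    out = {s: ALL for s in SUCC}
    history = {s: [ALL] for s in SUCC}     # x0, x1, x2, ... for each statement
    worklist = set(SUCC)
    while worklist:
        s = worklist.pop()
        inn = ALL
        for p in PRED[s]:
            inn = inn & out[p]
        temp = GEN[s] | (inn - KILL[s])
        if temp != out[s]:
            out[s] = temp
            history[s].append(temp)
            worklist |= set(SUCC[s])
    return history


def is_strictly_descending(chain):
    return all(later < earlier for earlier, later in zip(chain, chain[1:]))


HISTORY = trace_out_values()
```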
- Starting from Top and moving down requires a meet semilattice with top
- This computes the greatest fixpoint
- Dually, starting from bottom and moving up with join computes the least fixpoint
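- Monotonicity implies:
$$f(x \sqcap y) \le f(x) \sqcap f(y)$$
- A function f is distributive if:
$$f(x \sqcap y) = f(x) \sqcap f(y)$$
[Figure: nodes f and g both flow into h, which flows into k]
- For distributive functions, taking the meet at the join point gives the same result as taking the meet of the results along each path:
$$k(h(f(\top) \sqcap g(\top))) = k(h(f(\top)) \sqcap h(g(\top))) = k(h(f(\top))) \sqcap k(h(g(\top)))$$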
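Gen/Kill transfer functions are distributive over ∩, because (x ∩ y) − K = (x − K) ∩ (y − K). This sketch checks that property exhaustively and evaluates the f, g, h, k example both ways. The concrete Gen/Kill sets assigned to f, g, h, and k are made up for illustration.

```python
from itertools import combinations

EXPRS = ("a + b", "a * b", "a + 1")
TOP = frozenset(EXPRS)
SUBSETS = [frozenset(c) for r in range(len(EXPRS) + 1) for c in combinations(EXPRS, r)]


def transfer(gen, kill):
    """Gen/Kill transfer function x ↦ Gen ∪ (x − Kill)."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda x: gen | (x - kill)


def is_distributive(fn):
    """Check f(x ⊓ y) = f(x) ⊓ f(y) for all pairs, with ⊓ = ∩."""
    return all(fn(x & y) == fn(x) & fn(y) for x in SUBSETS for y in SUBSETS)


# Hypothetical transfer functions for the nodes f, g, h, k of the figure.
f = transfer({"a + b"}, {"a + 1"})
g = transfer({"a * b"}, set())
h = transfer(set(), {"a * b"})
k = transfer({"a + 1"}, set())

at_join = k(h(f(TOP) & g(TOP)))            # meet applied where the paths join
over_paths = k(h(f(TOP))) & k(h(g(TOP)))   # meet of the two path results
```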
- A basic block is a sequence of statements such that:
  - No statement except the last is a branch
  - There are no branches to any statement in the block except the first
- In practice, analyses work on basic blocks:
  - Compute Gen/Kill for each basic block
  - Store only In/Out for each basic block
  - Typical basic block ~5 statements
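Composing Gen/Kill functions over a block keeps them in Gen/Kill form. Running (G1, K1) and then (G2, K2) gives Gen = G2 ∪ (G1 − K2) and Kill = K1 ∪ K2. This sketch folds a block that way and compares the result against running the statements one by one. Treating the example's loop body as a single block is an illustration, and the helper names are hypothetical.

```python
from itertools import combinations

EXPRS = ("a + b", "a * b", "a + 1")
SUBSETS = [frozenset(c) for r in range(len(EXPRS) + 1) for c in combinations(EXPRS, r)]


def compose_block(stmts):
    """Fold per-statement (gen, kill) pairs, in execution order, into one pair."""
    gen, kill = frozenset(), frozenset()
    for g, k in stmts:
        # Later kills remove earlier gens; later gens are always added.
        gen = frozenset(g) | (gen - frozenset(k))
        kill = kill | frozenset(k)
    return gen, kill


def apply_transfer(gen, kill, x):
    return gen | (x - kill)


def run_statements(stmts, x):
    """Apply each statement's transfer function in turn (for comparison)."""
    for g, k in stmts:
        x = apply_transfer(frozenset(g), frozenset(k), x)
    return x


# The loop body of the example, a := a + 1; x := a + b, treated as one block.
BODY = [(set(), {"a + 1", "a + b", "a * b"}),   # a := a + 1
        ({"a + b"}, set())]                      # x := a + b
BLOCK_GEN, BLOCK_KILL = compose_block(BODY)
```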
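- Let G = (V, E) be the CFG
- Let k be the height of the lattice
- If G is acyclic, visit nodes in topological order
  - Visit the source of each edge before its target
  - One pass then suffices, no matter what size the lattice
- If G has cycles, visit nodes in an order from depth-first search (e.g., reverse postorder)
  - The relevant measure is the nesting depth: the maximum number of back edges on any cycle-free path
  - A back edge goes from a node to an ancestor in the DFS tree
  - Running time is … (the bound assumes ∀x. f(x) ≤ x)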
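This sketch shows the DFS-based ordering for the example CFG. It computes reverse postorder and detects back edges, which are edges to a node still on the DFS stack, i.e., an ancestor. The encoding and function names are assumptions. The recursion is fine for small graphs, but very deep ones would need an explicit stack.

```python
SUCC = {"entry": ["s1"], "s1": ["s2"], "s2": ["s3"], "s3": ["s4", "exit"],
        "s4": ["s5"], "s5": ["s3"], "exit": []}


def reverse_postorder(succ, start):
    """Reverse DFS postorder; for an acyclic graph this is a topological order."""
    seen, post = set(), []

    def dfs(n):
        seen.add(n)
        for m in succ[n]:
            if m not in seen:
                dfs(m)
        post.append(n)          # n finishes after all its DFS descendants

    dfs(start)
    return post[::-1]


def back_edges(succ, start):
    """Edges (n, m) whose target m is an ancestor of n in the DFS tree."""
    seen, on_stack, result = set(), set(), []

    def dfs(n):
        seen.add(n)
        on_stack.add(n)
        for m in succ[n]:
            if m in on_stack:
                result.append((n, m))   # m is still being explored: an ancestor
            elif m not in seen:
                dfs(m)
        on_stack.discard(n)

    dfs(start)
    return result


ORDER = reverse_postorder(SUCC, "entry")
BACK = back_edges(SUCC, "entry")
```

In this order, the loop test `s3` comes before the loop body, and the only back edge is the one from `s5` to `s3`.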
- Flow-sensitive analysis: the order of statements is taken into account
  - I.e., we keep track of facts per program point
- Flow-insensitive analysis: the analysis is the same regardless of statement order
  - Standard example: types
- (Not always followed in literature)
- "Collapse" larger constructs into smaller ones, combining data flow equations
  - Eventually the program is collapsed into a single node!
- "Expand out" back to the original constructs, rebuilding information
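[Figure: an If node with Then and Else successors that join at z is collapsed into a single IfThenElse node followed by z]
$$f_{ite} = (f_{then} \circ f_{if}) \sqcap (f_{else} \circ f_{if})$$
$$Out(if) = f_{if}(In(ite))$$
$$Out(then) = (f_{then} \circ f_{if})(In(ite))$$
$$Out(else) = (f_{else} \circ f_{if})(In(ite))$$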
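You can write the if-then-else rule with ordinary function composition. In this sketch the region, its Gen/Kill sets, and the helper names are hypothetical, and meet is ∩ as for available expressions. The last lines show the "expand out" step, which recovers Out(if), Out(then), and Out(else) from In(ite).

```python
def transfer(gen, kill):
    """Gen/Kill transfer function x ↦ Gen ∪ (x − Kill)."""
    gen, kill = frozenset(gen), frozenset(kill)
    return lambda x: gen | (x - kill)


def compose(f, g):
    """(f ∘ g)(x) = f(g(x))."""
    return lambda x: f(g(x))


def if_then_else(f_if, f_then, f_else, meet):
    """Collapsed region: f_ite = (f_then ∘ f_if) ⊓ (f_else ∘ f_if)."""
    then_path, else_path = compose(f_then, f_if), compose(f_else, f_if)
    return lambda x: meet(then_path(x), else_path(x))


# Hypothetical region for available expressions (meet = ∩):
f_if = transfer(set(), set())            # branch test with empty Gen/Kill (assumption)
f_then = transfer({"a + b"}, set())      # then: x := a + b
f_else = transfer({"a * b"}, set())      # else: y := a * b
f_ite = if_then_else(f_if, f_then, f_else, meet=lambda x, y: x & y)

# "Expand out": recover the facts inside the region from In(ite).
in_ite = frozenset({"a + 1"})
out_if = f_if(in_ite)
out_then = compose(f_then, f_if)(in_ite)
out_else = compose(f_else, f_if)(in_ite)
```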
- Not quite as nice (regions are usually single entry but often not single exit)
- Assumes it is easy to compose functions, compute meet, etc.
  - Not really the case
- Function calls: lots of proposed solutions in data flow analysis literature
  - Simple approach: a call to a function kills all data flow facts
  - May be able to improve depending on language, e.g., a function call may not affect locals
- Pointers: but what about values stored in the heap?
  - Not modeled in traditional data flow
  - Assume all data flow facts are killed (!)
  - Or, assume a write through x may affect any variable whose address has been taken
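This sketch shows the two conservative choices for available expressions, assuming expressions are `(left, op, right)` tuples. A call kills everything. A write through a pointer kills every expression that reads a variable whose address has been taken. The sketch does not show how the address-taken set is found; here it is simply given, which is an assumption. The helper names are hypothetical.

```python
def variables(expr):
    """Variables read by an expression given as a (left, op, right) tuple."""
    left, _op, right = expr
    return {v for v in (left, right) if v.isidentifier()}


def kill_for_store(all_exprs, address_taken):
    """Kill set for a write through a pointer (*x := ...): any variable whose
    address has been taken may change, so kill every expression reading one."""
    return {e for e in all_exprs if variables(e) & address_taken}


def kill_for_call(all_exprs):
    """Most conservative choice: a call kills every data flow fact."""
    return set(all_exprs)


ALL_EXPRS = {("a", "+", "b"), ("c", "*", "d"), ("a", "+", "1")}
ADDRESS_TAKEN = {"a"}        # assumed: the program takes &a somewhere
```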
- Not so much bang for the buck!
- For Java. The first three are for C.
- Memory leak detection
- Security vulnerability checking (tainting, information leaks)