Lecture Slides on Data Flow Analysis | CMSC 631
Data flow analysis
Abstract Syntax Trees
x = ab; y = ab; while (y > a+b) { a = a+1; x = a+b; }
[Figure: AST for the code above. Nested StmtList nodes hold x = ab, y = ab, and a whileStmt whose condition is y > a+b and whose body is a StmtList containing a = a+1 and x = a+b.]
AST stopped at statement/expression level for brevity
Control Flow Graph
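x = ab; y = ab; while (y > a+b) { a = a+1; x = a+b; }
[Figure: control flow graph. Entry, x = ab, y = ab, if y > a+b; the if node branches to Exit and to the loop body a = a+1, x = a+b, which flows back to the if node.]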
Choosing a representation
- Control flow graph is more general
- AST allows for more efficient algorithms
- but new programming constructs require changing the algorithm
- e.g., continue, break, switch, try-catch-finally, goto
- program transformations may not leave the program in AST form
- bytecode/machine code isn’t in AST form
- although you may be able to recover it
Data flow analysis
- A framework for proving facts about a program
- reasoning about lots of little facts
- little or no interaction between facts
- based on all paths through program
- including infeasible paths
- e.g., which assignments to x can be seen at this read of x?
Reaching definitions
- Each assignment to a variable is a definition
- defs(v) represents the set of all definitions of v
- Assume all variables are scalars
Computing In(S)
- If S has one predecessor P, In(S) = Out(P)
- Otherwise,
- $\mathrm{In}(S) = \bigwedge_{P \in \mathrm{Pred}(S)} \mathrm{Out}(P)$, where $\bigwedge$ is the meet
- The meet function defines how to combine alternatives
- For reaching definitions, meet = union
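The rule for In(S) can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the names `compute_in`, `meet_union`, `preds`, and `out` are hypothetical, and returning the empty set for a node with no predecessors is an assumption made for the sketch.

```python
from functools import reduce

def meet_union(a, b):
    # For reaching definitions, the meet of two alternatives is their union.
    return a | b

def compute_in(node, preds, out, meet=meet_union):
    # Combine Out(P) over every predecessor P of the node.
    incoming = [out[p] for p in preds[node]]
    if not incoming:
        return frozenset()  # assumption: no predecessors means nothing reaches
    return reduce(meet, incoming)

# A join point with two predecessors, each carrying a set of definition numbers.
preds = {"join": ["then", "else"]}
out = {"then": frozenset({1, 3}), "else": frozenset({2, 3})}
in_join = compute_in("join", preds, out)
```

With a single predecessor, `reduce` returns that predecessor's Out set unchanged, which matches the first bullet above.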
Iterative solution
- For control flow graphs with cycles, we can't solve the equations directly
- a direct solution computes each final value in terms of other final values that are already known, and a cycle makes that impossible
- Use iterative solution
- Can compute dataflow values in any order
- some orders are more efficient than others
- computation will converge to the right answer
Initial value
- For iterative solution
- might need Out(S) before we get a chance to compute In(S)
- Need an initial value for Out(S) of all statements other than Entry
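A round-robin version of the iterative solution might look like the sketch below. It assumes reaching definitions (meet = union, initial Out value = empty set) and a hypothetical three-node graph A, B, C in which C loops back to B; the function name `solve` and the Gen/Kill tables are made up for illustration. It runs the same problem in two visiting orders to show that the result is the same while the number of passes differs.

```python
def solve(nodes, preds, gen, kill, order):
    """Round-robin reaching definitions; returns (Out sets, number of passes)."""
    out = {n: frozenset() for n in nodes}  # initial value for every Out(S)
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for n in order:
            # In(S) is the union of Out(P) over all predecessors P.
            in_n = frozenset().union(*(out[p] for p in preds[n]))
            new_out = gen[n] | (in_n - kill[n])
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return out, passes

nodes = ["A", "B", "C"]
preds = {"A": [], "B": ["A", "C"], "C": ["B"]}  # C -> B is the loop back edge
gen = {"A": frozenset({1}), "B": frozenset({2}), "C": frozenset({3})}
kill = {n: frozenset() for n in nodes}

out_fwd, passes_fwd = solve(nodes, preds, gen, kill, ["A", "B", "C"])
out_rev, passes_rev = solve(nodes, preds, gen, kill, ["C", "B", "A"])
```

Visiting nodes in the direction of control flow lets values propagate further within a single pass, which is why the forward order needs fewer passes here.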
Control Flow Graph
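parameter a; parameter b; x = ab; y = ab; while (y > a+b) { a = a+1; x = a+b; }
Numbered definitions: 1: parameter a, 2: parameter b, 3: x = ab, 4: y = ab, 5: a = a+1, 6: x = a+b
defs(x) = {3,6} defs(y) = {4} defs(a) = {1,5} defs(b) = {2}
[Figure: control flow graph Entry, 1, 2, 3, 4, if y > a+b, with the if node branching to Exit and to the loop body 5, 6, annotated with the reaching definitions computed at each step of the iteration.]
First pass: Out(Entry) = {}, Out(1) = {1}, Out(2) = {1,2}, Out(3) = {1,2,3}, Out(4) = {1,2,3,4}, Out(if) = {1,2,3,4}, Out(5) = {2,3,4,5}, Out(6) = {2,4,5,6}
After convergence: Out(if) = {1,2,3,4,5,6}, Out(5) = {2,3,4,5,6}, Out(6) = {2,4,5,6}, Out(Exit) = {1,2,3,4,5,6}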
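The example can be checked mechanically. The sketch below encodes the graph with definitions numbered 1 to 6 as in the figure labels and iterates to a fixed point. The node names "entry", "if", and "exit", the dictionary layout, and the transfer rule Out = Gen ∪ (In − Kill) with Kill(d) = defs of the variable that d assigns are assumptions of this sketch.

```python
# Which variable each numbered definition assigns.
var_of = {1: "a", 2: "b", 3: "x", 4: "y", 5: "a", 6: "x"}

# Successor edges of the control flow graph; 6 loops back to the test.
succs = {"entry": [1], 1: [2], 2: [3], 3: [4], 4: ["if"],
         "if": [5, "exit"], 5: [6], 6: ["if"], "exit": []}
preds = {n: [] for n in succs}
for n, targets in succs.items():
    for t in targets:
        preds[t].append(n)

def defs(v):
    # All definitions of variable v.
    return {d for d, var in var_of.items() if var == v}

def transfer(n, in_set):
    # Nodes that define nothing pass their input through unchanged.
    if n not in var_of:
        return set(in_set)
    # Gen = {n}; Kill = every definition of the same variable.
    return {n} | (in_set - defs(var_of[n]))

out = {n: set() for n in succs}  # initial value: empty set
changed = True
while changed:
    changed = False
    for n in succs:
        in_n = set().union(*(out[p] for p in preds[n]))
        new_out = transfer(n, in_n)
        if new_out != out[n]:
            out[n] = new_out
            changed = True
```

After the loop, `out["if"]` holds all six definitions, while `out[5]` has lost definition 1 and `out[6]` has lost definitions 1 and 3, since each assignment kills the other definitions of its variable.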
More control flow problems
- Definitely uninitialized variables
- Possibly uninitialized variables
- compare with definitely initialized variables
- What are Gen and Kill?
- What is Out(Entry)?
- What is the meet function?
- What is the initial value?
Available expressions
- An expression e is available at point p if on all paths to p, e must have been computed and since that computation, none of the variables in e have been modified
- i.e., computation of e here would be redundant
- Gen( x = a+b ) = { a+b }
- Kill( x = a+b ) = any expression using x
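A small sketch of the Gen and Kill rules for available expressions follows. The tuple encoding of statements and expressions and the names `vars_of` and `transfer` are hypothetical. One assumption goes beyond the bullets above: an expression is not generated when the assigned variable appears in it (as in a = a+1), because the assignment immediately invalidates it.

```python
def vars_of(expr):
    # expr is (left, op, right); operands are variable names (str) or constants.
    return {t for t in (expr[0], expr[2]) if isinstance(t, str)}

def transfer(stmt, avail_in):
    # stmt is (target, expr), meaning "target = expr".
    target, expr = stmt
    # Kill: drop every available expression that uses the assigned variable.
    avail_out = {e for e in avail_in if target not in vars_of(e)}
    # Gen: the computed expression, unless the assignment itself invalidates it.
    if target not in vars_of(expr):
        avail_out.add(expr)
    return avail_out

a_plus_b = ("a", "+", "b")
after_x = transfer(("x", a_plus_b), set())            # x = a+b
after_a = transfer(("a", ("a", "+", 1)), after_x)     # a = a+1

# "On all paths" means the alternatives at a join are combined by intersection.
at_join = {a_plus_b, ("a", "*", "b")} & {a_plus_b}
```

Here a+b is available after x = a+b and is killed again by a = a+1, which is why the loop in the running example has to recompute it.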
Questions
- Does it terminate?
- Does it compute a valid answer?
Definitions
- Meet function: $\wedge$
- Meet function is commutative and associative
- $x \wedge x = x$
- Unique bottom $\bot$ and top $\top$ elements
Ordering
- $x \le y$ if and only if $x \wedge y = x$
- A function f is monotone if for all x and y,
- $x \le y$ implies $f(x) \le f(y)$
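The ordering and the monotonicity condition can be tested by brute force on a small lattice. The sketch below assumes the reaching-definitions lattice over three hypothetical definitions, where meet = union, so x ≤ y holds exactly when x is a superset of y. The helper names and the deliberately non-monotone function `toggle` are invented for illustration.

```python
from itertools import combinations

def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def meet(x, y):
    return x | y  # reaching definitions: meet = union

def leq(x, y):
    # The ordering is derived from the meet: x <= y iff meet(x, y) == x.
    return meet(x, y) == x

def is_monotone(f, elements):
    # Check that x <= y implies f(x) <= f(y) for every pair.
    return all(leq(f(x), f(y))
               for x in elements for y in elements if leq(x, y))

elements = powerset({1, 2, 3})
gen, kill = frozenset({3}), frozenset({1})

def gen_kill(s):
    return gen | (s - kill)

def toggle(s):
    # Artificial counterexample: maps the empty set to {1} and the rest to {}.
    return frozenset({1}) if not s else frozenset()

gen_kill_ok = is_monotone(gen_kill, elements)
toggle_ok = is_monotone(toggle, elements)
```

Under this ordering the empty set is the top element, which is consistent with using it as the initial value for reaching definitions.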
Lattice example
[Figure: lattice of 3-bit vectors, drawn top to bottom]
111 ($\top$)
011 101 110
001 010 100
000 ($\bot$)
meet is bit-vector logical and
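The lattice laws can be confirmed for this example with a few lines of Python, using integers 0 to 7 as the 3-bit vectors. This is only a sanity check under that encoding; the names `meet`, `leq`, `TOP`, and `BOTTOM` are chosen for the sketch.

```python
TOP, BOTTOM = 0b111, 0b000
elements = range(8)  # every 3-bit vector

def meet(x, y):
    return x & y  # bit-vector logical and

def leq(x, y):
    return meet(x, y) == x

idempotent = all(meet(x, x) == x for x in elements)
commutative = all(meet(x, y) == meet(y, x)
                  for x in elements for y in elements)
associative = all(meet(meet(x, y), z) == meet(x, meet(y, z))
                  for x in elements for y in elements for z in elements)
top_is_identity = all(meet(TOP, x) == x for x in elements)
bottom_is_least = all(leq(BOTTOM, x) for x in elements)

# 011 and 101 are incomparable; their meet is the element below both.
incomparable = not leq(0b011, 0b101) and not leq(0b101, 0b011)
meet_011_101 = meet(0b011, 0b101)
```

Each row of the figure corresponds to vectors with the same number of set bits, and moving down a row clears one bit.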
Relating to data flow analysis
- Top is the value to initialize non-entry nodes to
- the identity element for the meet function
- If the node function is monotone
- each re-evaluation of a node moves down the lattice, if it moves at all
- If the height of the lattice is finite, the iteration must terminate
Is it accurate?
- We want the meet over all paths solution
- $\mathrm{MOP}(B) = \bigwedge_{p \in \mathrm{Path}(\mathrm{Entry}, B)} f_p(\mathrm{Init})$
- note that the set of paths can be infinite if there are loops
- As good as we can do given the framework
- Iterative analysis computes Maximum Fixed Point solution
- largest solution, ordered by $\le$, that is a fixed point of the iterative computation
- bottom is also a fixed point, but often not maximal
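We know $x \wedge y \le x$
since f is monotone, $f(x \wedge y) \le f(x)$
which means $f(x \wedge y) \wedge f(x) = f(x \wedge y)$
and $f(x \wedge y) \wedge f(y) = f(x \wedge y)$
$f(x \wedge y) \wedge f(x) \wedge f(y) = f(x \wedge y) \wedge f(y) = f(x \wedge y)$
so $f(x \wedge y) \le f(x) \wedge f(y)$
[Figure: control flow graph in which Entry branches to a and b, both flow into c, and c flows into d.]
MeetOverAllPaths(dIn) = $f_c(f_a(\mathrm{EntryOut})) \wedge f_c(f_b(\mathrm{EntryOut}))$
MaximalFixedPoint(dIn) = $f_c(f_a(\mathrm{EntryOut}) \wedge f_b(\mathrm{EntryOut}))$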
Distributive problems
- For a distributive problem
- you can push transfer functions over meets without causing any reduction in accuracy
- Which problems are distributive?
- reaching definitions, very busy expressions, live variables, available expressions
- Which are not?
- most formulations of constant propagation
Constant propagation
[Figure: control flow graph. Entry branches to x = 1 and to x = -; both branches join at y = x*x.]
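The loss of precision can be reproduced with a toy constant-propagation lattice. The sketch below assumes the second branch assigns x = -1 (the value is cut off in the figure text, so this is a guess made only for illustration), and it uses a three-level lattice of TOP (no information yet), a concrete constant, and BOTTOM (not a constant). The function names are hypothetical.

```python
TOP, BOTTOM = "TOP", "BOTTOM"

def meet(a, b):
    # TOP is the identity; two different constants meet at BOTTOM.
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOTTOM

def square(x):
    # Transfer function for y = x*x applied to the abstract value of x.
    if x in (TOP, BOTTOM):
        return x
    return x * x

# Meet over all paths: apply the transfer function per path, then combine.
mop_y = meet(square(1), square(-1))

# Iterative solution: combine at the join first, then apply the function.
mfp_y = square(meet(1, -1))
```

Along each path y is the constant 1, so the meet over all paths finds it, but the iterative solution has already merged x to BOTTOM at the join and can no longer tell.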
All Gen/Kill problems are distributive
- If $\mathrm{Out}_S = \mathrm{Gen}_S \cup (\mathrm{In}_S - \mathrm{Kill}_S)$
- Problem is distributive
- left as an exercise for the reader
- and/or exam question
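Short of a proof, the claim can be checked exhaustively on a small universe. The sketch below tries every Gen and Kill set over three hypothetical facts and tests f(x meet y) = f(x) meet f(y) for both union and intersection as the meet. It is evidence for the claim on this universe, not a substitute for the exercise.

```python
from itertools import combinations

def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def union(x, y):
    return x | y

def intersection(x, y):
    return x & y

def distributes(f, meet, elements):
    # f distributes over meet if it can be pushed through without loss.
    return all(f(meet(x, y)) == meet(f(x), f(y))
               for x in elements for y in elements)

subsets = powerset({1, 2, 3})

def make_transfer(gen, kill):
    # Gen/Kill transfer function: Out = Gen ∪ (In − Kill).
    return lambda s: gen | (s - kill)

all_distributive = all(
    distributes(make_transfer(g, k), m, subsets)
    for g in subsets for k in subsets for m in (union, intersection)
)
```

The check passes for both meets because set difference and union with a fixed set each distribute over union and over intersection.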
Are all problems monotone?
- No, you have to be careful
- Consider constant propagation of truth values
- What is the rule for if x then y else z?