Data Flow Analysis in CMSC 631: Program Analysis and Understanding - Prof. Michael W. Hick, Study notes of Computer Science

An overview of data flow analysis, a technique used in compiler optimization. It covers the structure of a compiler, the concept of a control flow graph (CFG), and the use of CFGs in data flow analysis. It also discusses variations on CFGs, available expressions, and liveness analysis. From the Fall 2007 edition of CMSC 631.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009


Data Flow Analysis

CMSC 631: Program Analysis and Understanding

Fall 2007

CMSC 631 2

Compiler Structure

• Source code is parsed to produce an abstract syntax tree (AST)
• The AST is transformed into a control flow graph (CFG)
• Data flow analysis operates on the control flow graph (and other intermediate representations)

[Figure: Source Code → Abstract Syntax Tree → Control Flow Graph → Object Code]


Control-Flow Graph (CFG)

• A directed graph where
  ■ Each node represents a statement
  ■ Edges represent control flow
• Statements may be
  ■ Assignments x := y op z or x := op z
  ■ Copy statements x := y
  ■ Branches goto L or if x relop y goto L
  ■ etc.
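To make the statement forms concrete, here is a minimal Python sketch of three-address statements together with the sets of variables each one writes (`defs`) and reads (`uses`), which analyses such as liveness rely on. The class and function names are hypothetical; the slides do not prescribe a representation.

```python
from dataclasses import dataclass
from typing import Union

Operand = Union[str, int]  # a variable name or an integer constant


@dataclass(frozen=True)
class BinAssign:  # x := y op z
    x: str
    y: Operand
    op: str
    z: Operand


@dataclass(frozen=True)
class UnAssign:  # x := op z
    x: str
    op: str
    z: Operand


@dataclass(frozen=True)
class Copy:  # x := y
    x: str
    y: Operand


@dataclass(frozen=True)
class Goto:  # goto L
    label: str


@dataclass(frozen=True)
class CondGoto:  # if x relop y goto L
    x: Operand
    relop: str
    y: Operand
    label: str


def _vars(*operands):
    """Keep only variable names; integer constants are not variables."""
    return {o for o in operands if isinstance(o, str)}


def defs(s):
    """Variables written by statement s."""
    return {s.x} if isinstance(s, (BinAssign, UnAssign, Copy)) else set()


def uses(s):
    """Variables read by statement s."""
    if isinstance(s, BinAssign):
        return _vars(s.y, s.z)
    if isinstance(s, UnAssign):
        return _vars(s.z)
    if isinstance(s, Copy):
        return _vars(s.y)
    if isinstance(s, CondGoto):
        return _vars(s.x, s.y)
    return set()  # goto L reads nothing


# a := a + 1 both reads and writes a
inc = BinAssign("a", "a", "+", 1)
print(defs(inc), uses(inc))
```

Integer constants are filtered out of `uses`, so `a := a + 1` reads only `a`.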


Control-Flow Graph Example

[Figure: CFG with nodes x := a + b, y := a * b, y > a, a := a + 1, x := a + b]


Graph Example with Entry and Exit

[Figure: the same CFG with added entry and exit nodes: entry, x := a + b, y := a * b, y > a, a := a + 1, x := a + b, exit]
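A minimal Python sketch of this graph as a successor map, with pred(s) derived by inverting it. The figure's arrows are not recoverable here, so the edges are an assumption: `y > a` either enters the loop body (`a := a + 1`, then the second `x := a + b`, which returns to `y > a`) or leaves to `exit`. That shape is consistent with the data flow sets on the later slides. Numeric node ids keep the two `x := a + b` statements distinct; `LABEL`, `SUCC`, and `predecessors` are hypothetical names.

```python
LABEL = {
    "entry": "entry",
    1: "x := a + b",
    2: "y := a * b",
    3: "y > a",
    4: "a := a + 1",
    5: "x := a + b",
    "exit": "exit",
}

# Assumed edges: node 3 is the loop test
SUCC = {"entry": [1], 1: [2], 2: [3], 3: [4, "exit"], 4: [5], 5: [3], "exit": []}


def predecessors(succ):
    """Invert the successor map: pred(s) = { s' | s is in succ(s') }."""
    pred = {n: [] for n in succ}
    for src, dsts in succ.items():
        for dst in dsts:
            pred[dst].append(src)
    return pred


PRED = predecessors(SUCC)

# A join point has more than one incoming edge
JOIN_POINTS = [n for n, ps in PRED.items() if len(ps) > 1]

for n in SUCC:
    print(f"{LABEL[n]:12} pred={PRED[n]} succ={SUCC[n]}")
print("join points:", [LABEL[n] for n in JOIN_POINTS])
```

Node 3 (`y > a`) is the only node with two predecessors, so it is where the loop's back edge meets the straight-line code.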


CFG vs. AST

• CFGs are much simpler than ASTs
  ■ Fewer forms, less redundancy, simple expressions
• But... the AST is a more faithful representation
  ■ CFGs introduce temporaries
  ■ CFGs lose the block structure of the program
• So with an AST, it is
  ■ Easier to report errors and other messages
  ■ Easier to explain to the programmer
  ■ Easier to unparse to produce readable code


Data Flow Analysis

• A framework for proving facts about programs
• Reasons about lots of little facts
• Little or no interaction between facts
  ■ Works best on properties about how the program computes
• Based on all paths through the program
  ■ Including infeasible paths


Available Expressions

• Expression e is available at program point p if
  ■ e is computed on every path to p, and
  ■ the value of e has not changed since the last time e was computed on that path
• Optimization
  ■ If an expression is available, it need not be recomputed
    - (At least, if it’s still in a register somewhere)


Computing Available Expressions

[Figure: the example CFG (entry, x := a + b, y := a * b, y > a, a := a + 1, x := a + b, exit) annotated with the available-expression sets {a + b}, {a + b, a * b}, {a + b, a * b}, ∅, {a + b}, {a + b}, {a + b}, {a + b}]


Terminology

• A join point is a program point where two branches meet
• Available expressions is a forward must problem
  ■ Forward = Data flow from in to out
  ■ Must = At a join point, the property must hold on all paths that are joined


Data Flow Equations

• Let s be a statement
  ■ succ(s) = { immediate successor statements of s }
  ■ pred(s) = { immediate predecessor statements of s }
  ■ In(s) = program point just before executing s
  ■ Out(s) = program point just after executing s
• $In(s) = \bigcap_{s' \in pred(s)} Out(s')$
• $Out(s) = Gen(s) \cup (In(s) - Kill(s))$
  ■ These are also called transfer functions
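A minimal Python sketch that solves these two equations for available expressions on the example CFG by re-evaluating them, in a fixed order, until no Out set changes. Assumptions not stated on the slides: the edges are `entry → x := a + b → y := a * b → y > a`, with `y > a` leading either to `a := a + 1 → x := a + b → y > a` or to `exit`; the candidate expressions are `a+b`, `a*b`, and `a+1`; and In(entry) is the empty set, since nothing has been computed before the program starts.

```python
# Example CFG (assumed edges; node 3 is the loop test y > a)
SUCC = {"entry": [1], 1: [2], 2: [3], 3: [4, "exit"], 4: [5], 5: [3], "exit": []}
# (variable assigned, expression computed) per node; None means "none"
STMT = {
    "entry": (None, None),
    1: ("x", "a+b"),
    2: ("y", "a*b"),
    3: (None, None),  # the test y > a assigns nothing
    4: ("a", "a+1"),
    5: ("x", "a+b"),
    "exit": (None, None),
}
OPERANDS = {"a+b": {"a", "b"}, "a*b": {"a", "b"}, "a+1": {"a"}}
ALL_EXPRS = set(OPERANDS)  # Top: every expression


def preds(n):
    return [p for p, succs in SUCC.items() if n in succs]


def gen(n):
    var, expr = STMT[n]
    # s generates e if it computes e and does not then overwrite one of e's operands
    return {expr} if expr is not None and var not in OPERANDS[expr] else set()


def kill(n):
    var, _ = STMT[n]
    # Assigning var invalidates every expression that mentions var
    return {e for e, ops in OPERANDS.items() if var in ops}


def available_expressions():
    out = {n: set(ALL_EXPRS) for n in SUCC}  # start every Out(s) at Top
    in_ = {}
    changed = True
    while changed:
        changed = False
        for n in SUCC:
            ps = preds(n)
            # In(s) = intersection of Out(s') over pred(s); empty at the entry
            in_[n] = set.intersection(*(out[p] for p in ps)) if ps else set()
            new_out = gen(n) | (in_[n] - kill(n))  # Out(s) = Gen(s) ∪ (In(s) - Kill(s))
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return in_, out


in_, out = available_expressions()
print(sorted(in_[3]))  # In(y > a)
```

On the first pass In(y > a) is {a+b, a*b}, because the back edge still carries Top; the second pass intersects in Out of the second `x := a + b`, which is {a+b}, and the sets then stop changing.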


Liveness Analysis

• A variable v is live at program point p if
  ■ v will be used on some execution path originating from p...
  ■ before v is overwritten
• Optimization
  ■ If a variable is not live, there is no need to keep it in a register
  ■ If a variable is dead at an assignment, the assignment can be eliminated


Computing Live Variables

[Figure: the example CFG (x := a + b, y := a * b, y > a, a := a + 1, x := a + b) annotated with the live-variable sets {x}, {x, y, a}, {x, y, a}, {y, a, b}, {y, a, b}, {x, a, b}, {a, b}, {x, y, a, b}, {x, y, a, b}]
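A minimal Python sketch of liveness as a backward may problem on the example CFG: Out(s) is the union of In over succ(s), and In(s) = Use(s) ∪ (Out(s) − Def(s)), with every set starting empty. It assumes the example's loop structure (the test `y > a` leads to `a := a + 1` or to `exit`) and that `x` is read after the program exits, which is consistent with the {x} set among the figure's annotations; both are assumptions, not facts from the slides.

```python
# Example CFG (assumed edges; node 3 is the loop test y > a)
SUCC = {"entry": [1], 1: [2], 2: [3], 3: [4, "exit"], 4: [5], 5: [3], "exit": []}
# Variables defined (written) and used (read) by each node
DEF = {"entry": set(), 1: {"x"}, 2: {"y"}, 3: set(), 4: {"a"}, 5: {"x"}, "exit": set()}
USE = {"entry": set(), 1: {"a", "b"}, 2: {"a", "b"}, 3: {"y", "a"},
       4: {"a"}, 5: {"a", "b"}, "exit": {"x"}}  # assumption: x is read after exit


def live_variables():
    live_in = {n: set() for n in SUCC}   # may analysis: start at Top = empty set
    live_out = {n: set() for n in SUCC}
    changed = True
    while changed:
        changed = False
        # Backward problem: visiting nodes in reverse order speeds convergence
        for n in reversed(list(SUCC)):
            out_n = set().union(*(live_in[s] for s in SUCC[n]))  # Out = ∪ In(succ)
            in_n = USE[n] | (out_n - DEF[n])                      # In = Use ∪ (Out - Def)
            if (in_n, out_n) != (live_in[n], live_out[n]):
                live_in[n], live_out[n] = in_n, out_n
                changed = True
    return live_in, live_out


live_in, live_out = live_variables()
# An assignment whose target is dead right after it can be eliminated
dead = [n for n in SUCC if DEF[n] and not (DEF[n] & live_out[n])]
print(live_in[1], dead)
```

The `dead` list is empty here: every assignment in this example writes a variable that is live afterwards, so none can be eliminated.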


Very Busy Expressions

• An expression e is very busy at point p if
  ■ On every path from p, expression e is used before any of its operands is changed
• Optimization
  ■ Can hoist very busy expression computation to p
• What kind of problem?
  ■ Forward or backward? Backward
  ■ May or must? Must
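A minimal Python sketch of very busy expressions as a backward must problem, on a hypothetical diamond-shaped program rather than the running example: a branch on `c` leads either to `x := a + b` or to `y := a + b; a := 0`, and both paths reach the exit. Out(s) is the intersection of In over succ(s), In(s) = Gen(s) ∪ (Out(s) − Kill(s)), every set starts at Top, and Out is empty at the exit.

```python
OPERANDS = {"a+b": {"a", "b"}, "a*b": {"a", "b"}}
ALL_EXPRS = set(OPERANDS)  # Top: every expression
# Hypothetical program: 1 branches on c; 5 is the exit
SUCC = {1: [2, 3], 2: [5], 3: [4], 4: [5], 5: []}
# (variable assigned, expression computed) per node
STMT = {
    1: (None, None),   # if c ...
    2: ("x", "a+b"),   # x := a + b
    3: ("y", "a+b"),   # y := a + b
    4: ("a", None),    # a := 0
    5: (None, None),   # exit
}


def gen(n):
    _, expr = STMT[n]
    # The operands are read before the assignment happens, so e is always used here
    return {expr} if expr is not None else set()


def kill(n):
    var, _ = STMT[n]
    return {e for e, ops in OPERANDS.items() if var in ops}


def very_busy():
    vb_in = {n: set(ALL_EXPRS) for n in SUCC}   # must analysis: start at Top
    vb_out = {n: set(ALL_EXPRS) for n in SUCC}
    changed = True
    while changed:
        changed = False
        for n in reversed(list(SUCC)):
            succs = SUCC[n]
            # Out = ∩ In(succ); boundary: nothing is very busy at the exit
            out_n = set.intersection(*(vb_in[s] for s in succs)) if succs else set()
            in_n = gen(n) | (out_n - kill(n))
            if (in_n, out_n) != (vb_in[n], vb_out[n]):
                vb_in[n], vb_out[n] = in_n, out_n
                changed = True
    return vb_in, vb_out


vb_in, vb_out = very_busy()
print(vb_out[1])  # very busy right after the branch
```

`a+b` is very busy right after the branch (node 1), so it could be hoisted there; it is not very busy after `y := a + b`, because the following `a := 0` changes an operand before any further use.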


Reaching Definitions

• A definition of a variable v is an assignment to v
• A definition of variable v reaches point p if
  ■ There is a path from the definition to p along which there is no intervening assignment to v
• Also called def-use information
• What kind of problem?
  ■ Forward or backward? Forward
  ■ May or must? May
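A minimal Python sketch of reaching definitions as a forward may problem on the example CFG, naming each definition by its node id: In(s) is the union of Out over pred(s), and every set starts empty. It assumes the example's loop structure (`y > a` leads to `a := a + 1 → x := a + b → y > a` or to `exit`); the helper names are hypothetical.

```python
# Example CFG (assumed edges; node 3 is the loop test y > a)
SUCC = {"entry": [1], 1: [2], 2: [3], 3: [4, "exit"], 4: [5], 5: [3], "exit": []}
# Each definition is named by its node id; None means the node assigns nothing
ASSIGNS = {"entry": None, 1: "x", 2: "y", 3: None, 4: "a", 5: "x", "exit": None}


def defs_of(var):
    return {n for n, v in ASSIGNS.items() if v == var}


def gen(n):
    return {n} if ASSIGNS[n] is not None else set()


def kill(n):
    # A definition of v kills every other definition of v
    v = ASSIGNS[n]
    return defs_of(v) - {n} if v is not None else set()


def reaching_definitions():
    out = {n: set() for n in SUCC}   # may analysis: start at Top = empty set
    in_ = {n: set() for n in SUCC}
    changed = True
    while changed:
        changed = False
        for n in SUCC:
            preds = [p for p, succs in SUCC.items() if n in succs]
            in_[n] = set().union(*(out[p] for p in preds))  # In = ∪ Out(pred)
            new_out = gen(n) | (in_[n] - kill(n))
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return in_, out


in_, out = reaching_definitions()
print(sorted(in_[3]))  # definitions reaching y > a
```

Both definitions of `x` (nodes 1 and 5) reach `y > a`: node 1 along the path that enters the loop for the first time, node 5 along the back edge. Only node 5's definition of `x` survives past node 5 itself.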


Space of Data Flow Analyses

• Most data flow analyses can be classified this way
  ■ A few don’t fit, e.g., bidirectional analyses
• Lots of literature on data flow analysis

              May                    Must
Forward       Reaching definitions   Available expressions
Backward      Live variables         Very busy expressions


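Meet and Join Operations

• $\sqcap$ is the meet or greatest lower bound operation:
  ■ $x \sqcap y \le x$ and $x \sqcap y \le y$
  ■ if $z \le x$ and $z \le y$, then $z \le x \sqcap y$
• $\sqcup$ is the join or least upper bound operation:
  ■ $x \le x \sqcup y$ and $y \le x \sqcup y$
  ■ if $x \le z$ and $y \le z$, then $x \sqcup y \le z$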
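For the powerset lattices used by most analyses in these notes, ordered by ⊆, meet is ∩ and join is ∪. A small Python sketch checks that claim by brute force against the greatest-lower-bound and least-upper-bound definitions, over every pair of subsets of a hypothetical three-element base set.

```python
from itertools import chain, combinations

BASE = ("a", "b", "c")  # hypothetical small base set


def powerset(items):
    """All subsets of items, as frozensets."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


P = powerset(BASE)


def leq(x, y):   # the partial order: subset inclusion
    return x <= y


def meet(x, y):  # candidate greatest lower bound
    return x & y


def join(x, y):  # candidate least upper bound
    return x | y


def is_glb(m, x, y):
    """m is a lower bound of x and y, and every lower bound z satisfies z <= m."""
    return leq(m, x) and leq(m, y) and all(
        leq(z, m) for z in P if leq(z, x) and leq(z, y))


def is_lub(j, x, y):
    """j is an upper bound of x and y, and every upper bound z satisfies j <= z."""
    return leq(x, j) and leq(y, j) and all(
        leq(j, z) for z in P if leq(x, z) and leq(y, z))


ok = all(is_glb(meet(x, y), x, y) and is_lub(join(x, y), x, y) for x in P for y in P)
print(len(P), ok)
```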


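Lattices

• A partial order $(P, \le)$ is a lattice if meet and join exist for every pair of elements in P
• A bounded lattice (e.g., any finite lattice) has unique elements $\bot$ and $\top$ such that
  ■ $\bot \le x$ for every $x \in P$
  ■ $x \le \top$ for every $x \in P$
• In a lattice, $x \le y$ iff $x \sqcap y = x$ iff $x \sqcup y = y$
• A partial order is a complete lattice if meet and join are defined on any set $S \subseteq P$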


Useful Lattices

• $(2^S, \subseteq)$ forms a lattice for any set $S$
  ■ $2^S$ is the powerset of $S$ (set of all subsets)
• If $(S, \le)$ is a lattice, so is $(S, \ge)$
  ■ I.e., lattices can be flipped

• The lattice for constant propagation
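Assuming the usual flat lattice for constant propagation (⊤ for "no information yet" above every integer constant, ⊥ for "not a constant" below all of them, distinct constants incomparable), here is a minimal Python sketch of its meet, with the order recovered from the standard lattice identity x ≤ y iff x ⊓ y = x. `TOP` and `BOTTOM` are hypothetical sentinel names.

```python
TOP = "TOP"        # hypothetical sentinel: no information yet
BOTTOM = "BOTTOM"  # hypothetical sentinel: not a constant


def meet(x, y):
    """Greatest lower bound in the flat constant-propagation lattice."""
    if x == TOP:
        return y
    if y == TOP:
        return x
    if x == BOTTOM or y == BOTTOM:
        return BOTTOM
    return x if x == y else BOTTOM  # two different constants meet at BOTTOM


def leq(x, y):
    """x <= y in the lattice iff meet(x, y) == x."""
    return meet(x, y) == x


print(meet(3, TOP), meet(3, 3), meet(3, 4))
```

The lattice has height 2 no matter how many constants there are, which is what keeps constant propagation terminating quickly.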


Forward Must Data Flow Algorithm

Out(s) = Top for all statements s
    // Slight acceleration: Could set Out(s) = Gen(s) ∪ (Top - Kill(s))
W := { all statements }   (worklist)
repeat
    Take s from W
    In(s) := ∩ { Out(s′) | s′ ∈ pred(s) }
    temp := Gen(s) ∪ (In(s) - Kill(s))
    if (temp != Out(s)) {
        Out(s) := temp
        W := W ∪ succ(s)
    }
until W = ∅
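A runnable Python sketch of this algorithm for available expressions on the example CFG, following the pseudocode line by line. Assumptions beyond the slide: a FIFO queue stands in for "take s from W"; In(entry) is the empty set, since the intersection over an empty pred set would otherwise be Top; the edges follow the example's loop (`y > a` leads to `a := a + 1 → x := a + b → y > a` or to `exit`); and the candidate expressions are `a+b`, `a*b`, and `a+1`.

```python
from collections import deque

# Example CFG (assumed edges; node 3 is the loop test y > a)
SUCC = {"entry": [1], 1: [2], 2: [3], 3: [4, "exit"], 4: [5], 5: [3], "exit": []}
PRED = {n: [p for p, succs in SUCC.items() if n in succs] for n in SUCC}
STMT = {"entry": (None, None), 1: ("x", "a+b"), 2: ("y", "a*b"), 3: (None, None),
        4: ("a", "a+1"), 5: ("x", "a+b"), "exit": (None, None)}
OPERANDS = {"a+b": {"a", "b"}, "a*b": {"a", "b"}, "a+1": {"a"}}
TOP = frozenset(OPERANDS)  # the set of all expressions


def gen(s):
    var, expr = STMT[s]
    return {expr} if expr is not None and var not in OPERANDS[expr] else set()


def kill(s):
    var, _ = STMT[s]
    return {e for e, ops in OPERANDS.items() if var in ops}


def forward_must():
    out = {s: set(TOP) for s in SUCC}  # Out(s) = Top for all statements s
    in_ = {}
    work = deque(SUCC)                 # W := { all statements }
    while work:                        # until W = ∅
        s = work.popleft()             # take s from W
        ps = PRED[s]
        # In(s) := ∩ Out(s') over pred(s); assumption: empty for the entry node
        in_[s] = set.intersection(*(out[p] for p in ps)) if ps else set()
        temp = gen(s) | (in_[s] - kill(s))
        if temp != out[s]:
            out[s] = temp
            for t in SUCC[s]:          # W := W ∪ succ(s)
                if t not in work:
                    work.append(t)
    return in_, out


in_, out = forward_must()
print(sorted(in_[3]))
```

The membership check before appending keeps W a set, as in W := W ∪ succ(s). The loop ends because a statement is re-added only when a predecessor's Out actually changes.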


Forward Data Flow, Again

Out(s) = Top for all statements s
W := { all statements }   (worklist)
repeat
    Take s from W
    temp := f_s(⊓ { Out(s′) | s′ ∈ pred(s) })   (f_s a monotonic transfer function)
    if (temp != Out(s)) {
        Out(s) := temp
        W := W ∪ succ(s)
    }
until W = ∅
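This version needs only a meet, a Top element, and a monotonic f_s per statement, so it also covers problems whose transfer functions are not gen/kill. Below is a Python sketch of the generic loop, instantiated for constant propagation on a hypothetical branching program. Assumptions: each variable's lattice is the flat one (⊤ = no information yet, then the constants, then ⊥ = not a constant, written `TOP` and `NAC`); every variable is `NAC` at the entry, since inputs are unknown; and the entry node's In is that fixed boundary value.

```python
from functools import reduce

TOP, NAC = "TOP", "NAC"  # hypothetical sentinels: no info yet (⊤) / not a constant (⊥)


def solve_forward(succ, entry, boundary, top, meet, transfer):
    """Generic worklist: Out(s) := f_s(meet of Out(s') over pred(s))."""
    pred = {n: [p for p in succ if n in succ[p]] for n in succ}
    out = {n: top for n in succ}  # Out(s) = Top for all s
    work = list(succ)             # W := { all statements }
    while work:
        s = work.pop(0)
        # Assumption: the entry node starts from a fixed boundary value
        in_s = boundary if s == entry else reduce(meet, (out[p] for p in pred[s]))
        temp = transfer(s, in_s)
        if temp != out[s]:
            out[s] = temp
            for t in succ[s]:     # W := W ∪ succ(s)
                if t not in work:
                    work.append(t)
    return out


VARS = ("c", "w", "x", "y", "z")
# Hypothetical program: (target, rhs); rhs is an int or ("+", operand, operand)
PROGRAM = {
    "entry": None,
    1: ("y", 3),                # y := 3
    2: None,                    # branch on c (assigns nothing)
    3: ("x", 1),                # x := 1
    4: ("x", 2),                # x := 2
    5: ("z", ("+", "y", 1)),    # z := y + 1
    6: ("w", ("+", "x", "y")),  # w := x + y
}
SUCC = {"entry": [1], 1: [2], 2: [3, 4], 3: [5], 4: [5], 5: [6], 6: []}


def meet_val(a, b):
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else NAC


def meet_env(e1, e2):
    return {v: meet_val(e1[v], e2[v]) for v in VARS}


def transfer(node, env):
    """Monotonic (but not gen/kill) transfer function for one statement."""
    new = dict(env)
    stmt = PROGRAM[node]
    if stmt is not None:
        target, rhs = stmt
        if isinstance(rhs, int):
            new[target] = rhs
        else:
            _, a, b = rhs
            va = a if isinstance(a, int) else env[a]
            vb = b if isinstance(b, int) else env[b]
            if NAC in (va, vb):
                new[target] = NAC
            elif TOP in (va, vb):
                new[target] = TOP
            else:
                new[target] = va + vb
    return new


out = solve_forward(SUCC, "entry", {v: NAC for v in VARS},
                    {v: TOP for v in VARS}, meet_env, transfer)
print(out[6])
```

After the join at node 5, `y` is still the constant 3 and `z := y + 1` folds to 4, while `x` (1 on one path, 2 on the other) becomes `NAC`.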


Lattices (P, ≤)

• Available expressions
  ■ P = sets of expressions
  ■ $S_1 \sqcap S_2 = S_1 \cap S_2$
  ■ Top = set of all expressions
• Reaching definitions
  ■ P = sets of definitions (assignment statements)
  ■ $S_1 \sqcap S_2 = S_1 \cup S_2$
  ■ Top = empty set


Fixpoints

• We always start with Top
  ■ Most optimistic assumption
    - “every expression is available,” “no definitions reach this point”
  ■ Strongest possible hypothesis
    - i.e., true of the fewest states
• Revise as we encounter contradictions
  ■ Always move down in the lattice (with meet)
• Result: A greatest fixpoint


Lattices (P, ≤), cont’d

• Live variables
  ■ P = sets of variables
  ■ $S_1 \sqcap S_2 = S_1 \cup S_2$
  ■ Top = empty set
• Very busy expressions
  ■ P = sets of expressions
  ■ $S_1 \sqcap S_2 = S_1 \cap S_2$
  ■ Top = set of all expressions


Termination Revisited (cont’d)

• A descending chain in a lattice is a sequence
  ■ $x_0 > x_1 > x_2 > \cdots$
• The height of a lattice is the length of the longest descending chain in the lattice
• Then, data flow analysis must terminate in O(nk) time
  ■ n = # of statements in program
  ■ k = height of lattice
  ■ assumes meet operation takes O(1) time
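A sketch of where the bound comes from, assuming the worklist algorithm starts every Out(s) at Top, each f_s is monotonic, and each statement has at most two successors (as with `goto L` or `if x relop y goto L`) and a bounded number of predecessors, so one step costs O(1) meets. Since Out(s) starts at Top and can only move down, it changes at most k times, and each change adds at most |succ(s)| statements to W:

```latex
\begin{align*}
\#\{\text{statements taken from } W\}
  &\le \underbrace{n}_{\text{initial } W} + \sum_{s} k \cdot |\mathit{succ}(s)| \\
  &\le n + k \cdot 2n \\
  &= O(nk)
\end{align*}
```

Each statement taken from W costs O(1) under these assumptions, giving O(nk) time overall.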


Least vs. Greatest Fixpoints

• Data flow tradition: Start with Top, use meet
  ■ To do this, we need a complete meet semilattice with top, of finite height
    - complete meet semilattice = meets defined for any set
    - finite height ensures termination
  ■ Computes greatest fixpoint
• Denotational semantics tradition: Start with Bottom, use join
  ■ Computes least fixpoint


Distributive Data Flow Problems

• By monotonicity, we also have $f(x \sqcap y) \le f(x) \sqcap f(y)$
• A function f is distributive if $f(x \sqcap y) = f(x) \sqcap f(y)$
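Every gen/kill transfer function f(X) = Gen ∪ (X − Kill) is distributive, while constant propagation is the classic monotonic but non-distributive problem. A Python sketch checks both by brute force: all gen/kill functions over a hypothetical three-fact universe, and a hypothetical statement `z := x + y` reached by two paths where (x, y) is (1, 2) on one path and (2, 1) on the other.

```python
from itertools import chain, combinations

UNIVERSE = ("e1", "e2", "e3")  # hypothetical data flow facts


def subsets(items):
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def gen_kill(gen, kill):
    """Transfer function f(X) = gen ∪ (X - kill)."""
    return lambda x: gen | (x - kill)


SETS = subsets(UNIVERSE)
# Check f(x ∩ y) == f(x) ∩ f(y) and f(x ∪ y) == f(x) ∪ f(y) for every gen/kill f
gen_kill_distributive = all(
    f(x & y) == f(x) & f(y) and f(x | y) == f(x) | f(y)
    for g in SETS for k in SETS
    for f in [gen_kill(g, k)]
    for x in SETS for y in SETS
)

# Constant propagation is monotonic but not distributive (TOP omitted for brevity)
NAC = "NAC"  # hypothetical sentinel: not a constant


def meet_val(a, b):
    return a if a == b else NAC


def f_add(env):
    """Transfer function for z := x + y."""
    x, y = env["x"], env["y"]
    return {**env, "z": NAC if NAC in (x, y) else x + y}


env1, env2 = {"x": 1, "y": 2}, {"x": 2, "y": 1}
merged = {v: meet_val(env1[v], env2[v]) for v in env1}    # meet first: x, y become NAC
z_meet_first = f_add(merged)["z"]                          # f(x ⊓ y)
z_per_path = meet_val(f_add(env1)["z"], f_add(env2)["z"])  # f(x) ⊓ f(y)
print(gen_kill_distributive, z_meet_first, z_per_path)
```

Here f(x ⊓ y) is `NAC` while f(x) ⊓ f(y) is 3, so the monotonicity inequality is strict: applying the meet before the transfer function loses the fact that z = 3 on both paths.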


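Benefit of Distributivity

• Joins lose no information
  ■ With distributive transfer functions, taking the meet at a join point gives the same result as applying each path's transfer functions separately and taking the meet afterward

[Figure: CFG fragment with transfer functions f, g, h, and k meeting at a join point]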