CMSC 631 – Program Analysis and Understanding Fall 2007
Analyzing & understanding software
- Three main focus areas:
■ Formal systems and notations
- Vocabulary for talking about programs
■ Static analysis
- Automatic reasoning about source code
■ Programming language features
- Affects programs and how we reason about them
Personnel
• Instructor: Michael Hicks
■ Office: 4131 AVW
■ E-mail: [email protected]
■ Office hours: TWF 10am-11am
• Grader: Brent Gordon
Prerequisite
• CMSC 430 or equivalent compiler class
■ Ideas we will use in this class:
- Parse trees/abstract syntax trees
- BNF notation for grammars
- Type checking (usually little coverage in a compilers class)
- Data flow analysis (coverage varies in a compilers class)
- Tools like yacc and lex may be useful for your project
■ We won’t use most of the other material
- So even without taking a compilers class, you may be OK
- Talk to me if you’re not sure
Expectations: Homework
- First half of class: two kinds of assignments
■ Programming assignments (20% of grade)
- Every two weeks
- Implement the ideas we see in lecture
■ Written assignments (10% of grade)
- Every week
- Short problem sets
- This is how you will learn things
■ Much more effective than (just) listening to a lecture
Late Policy on Assignments
- Programming assignments: Due at midnight
■ We use Marmoset for submissions
- http://submit.cs.umd.edu
- Written assignments: Due at start of class
■ No late submissions
- Contact me about extenuating circumstances
■ E.g., religious holidays
■ Inform me as soon as possible
Expectations: Participation
- Will need to read some papers for class
■ More during the second half of the semester
■ Should come prepared to contribute to discussion
- (Possible) student presentations later in the semester
■ Read 1-2 papers on a topic
■ Present a lecture in class about the material
- 10% of grade on class participation
Expectations: Project
- Class goal: Teach you how to do research
■ So you have to do research as part of the class
- Substantial research project (35% of grade)
■ Any topic vaguely related to the class
- Will post some suggestions for projects later on
- May also be able to share project with other class
■ Completed in groups of size 2 (possibly 1 or 3)
- This will consume the second half of the semester
Academic Dishonesty
- Don’t do it
Software Chat
- http://www.cs.umd.edu/projects/softchat
- Weekly meeting about PL and SE research
- Mondays at 11am in 3258 AVW this fall
■ Starting Sep. 10
- Topics include
■ Current research in the department
■ Practice talks
■ Interesting recent papers
20 Ideas and Applications in Program Analysis in 40 Minutes
Abstract Interpretation
- Rice’s Theorem: Any non-trivial semantic property of programs is undecidable
■ Uh-oh! We can’t do anything. So much for this course...
- Need to make some kind of approximation
■ Abstract the behavior of the program
■ ...and then analyze the abstraction
- Seminal papers: Cousot and Cousot, 1977, 1979
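To make "abstract the behavior, then analyze the abstraction" concrete, here is a minimal sketch using a sign domain. The domain, the function names, and the example expression x * x + 1 are assumptions chosen for illustration; they are not taken from the slides or from the Cousot papers.

```python
# Abstract values: a tiny sign domain (an illustrative assumption).
NEG, ZERO, POS, TOP = "neg", "zero", "pos", "top"

def alpha(n: int) -> str:
    """Abstraction: map a concrete integer to its sign."""
    if n < 0:
        return NEG
    if n == 0:
        return ZERO
    return POS

def abs_mul(a: str, b: str) -> str:
    """Abstract multiplication on signs."""
    if ZERO in (a, b):
        return ZERO          # 0 * anything = 0
    if TOP in (a, b):
        return TOP           # unknown sign stays unknown
    return POS if a == b else NEG

def abs_add(a: str, b: str) -> str:
    """Abstract addition: loses precision when the signs differ."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    if a == b:
        return a             # pos+pos = pos, neg+neg = neg, top+top = top
    return TOP               # e.g. pos + neg could have any sign

def analyze_square_plus_one(x: str) -> str:
    """Abstractly evaluate x * x + 1 for an abstract input x."""
    return abs_add(abs_mul(x, x), alpha(1))
```

For a known sign the analysis proves the result positive without running the program on any concrete input. For an input of unknown sign (`TOP`) it answers `TOP`, even though x * x + 1 is in fact always positive: the answer is sound but approximate, which is the price of decidability.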
Control-Flow Graph
[Figure: control-flow graph whose nodes are annotated with dataflow facts for x (x = *, x = 3, x = 6, x = ?)]
Lattices and Termination
- Dataflow facts form a lattice
- Each statement has a transformation function
■ Out(S) = Gen(S) ∪ (In(S) − Kill(S))
- Terminates because
■ Finite height lattice
■ Monotone transformation functions
[Figure: lattice of facts for x, with x = ? and x = * as the extreme elements and the constants x = 3, x = 6, ... between them]
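The transformation function Out(S) = Gen(S) ∪ (In(S) − Kill(S)) can be iterated to a fixed point with a worklist. The sketch below does this for reaching definitions, which is one standard instance of the Gen/Kill form; the choice of reaching definitions, the four-node graph, and the labels `d1` and `d3` are assumptions made for illustration.

```python
from collections import deque

def solve_forward(nodes, succs, gen, kill):
    """Iterate Out(S) = Gen(S) | (In(S) - Kill(S)) to a fixed point.

    In(S) is the union of Out(P) over the predecessors P of S.
    """
    preds = {n: [] for n in nodes}
    for n in nodes:
        for s in succs[n]:
            preds[s].append(n)

    out = {n: frozenset() for n in nodes}    # start from the empty set
    work = deque(nodes)
    while work:
        n = work.popleft()
        in_n = frozenset().union(*(out[p] for p in preds[n]))
        new_out = gen[n] | (in_n - kill[n])
        if new_out != out[n]:                # fact changed: revisit successors
            out[n] = new_out
            work.extend(succs[n])
    return out

# Hypothetical CFG: 1: x = 3 (d1); 2: loop test; 3: x = 6 (d3); 4: exit.
succs = {1: [2], 2: [3, 4], 3: [2], 4: []}
empty = frozenset()
gen = {1: frozenset({"d1"}), 2: empty, 3: frozenset({"d3"}), 4: empty}
kill = {1: frozenset({"d3"}), 2: empty, 3: frozenset({"d1"}), 4: empty}
reaching = solve_forward([1, 2, 3, 4], succs, gen, kill)
```

Here `reaching[4]` contains both `d1` and `d3`, since either assignment may be the most recent one at the exit. The loop terminates for the reasons the slide gives: each Out set can only grow (the transfer functions are monotone) and there are finitely many definitions (the lattice has finite height).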
Lambda Calculus
- Three syntactic forms
■ x (variable)
■ λx.e (function)
■ e1 e2 (function application)
- One reduction rule
■ (λx.e1) e2 → e1[e2/x] (replace x by e2 in e1)
- Can represent any computable function!
Example
- Conditionals
■ true = λx.λy.x    false = λx.λy.y
■ if a then b else c = a b c
- if true then b else c = (λx.λy.x) b c → (λy.b) c → b
- if false then b else c = (λx.λy.y) b c → (λy.y) c → c
- Can also represent numbers, pairs, data structures, etc.
- Result: Lingua franca of PL
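The encoding of conditionals can be tried directly with Python lambdas. This is a sketch using the standard Church encodings; the names `if_then_else`, `succ`, and `to_int` are chosen here for illustration, and `to_int` is only a decoding helper, not part of the lambda calculus.

```python
# Church booleans: a boolean is a function that picks one of two arguments.
true = lambda x: lambda y: x
false = lambda x: lambda y: y

# "if a then b else c" is just the application a b c.
if_then_else = lambda a: lambda b: lambda c: a(b)(c)

# Church numerals, as one example of "numbers": n applies f n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

def to_int(n):
    """Decode a Church numeral by counting applications of f."""
    return n(lambda k: k + 1)(0)
```

For instance, `if_then_else(true)("b")("c")` selects `"b"`, and `to_int(succ(succ(zero)))` decodes to 2. Python evaluates both branches before the boolean selects one, so this mirrors the encoding but not lazy evaluation of the branches.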
Operational Semantics
- Evaluation is described operationally, as the steps of some abstract machine
■ Program states are reduced according to some transition relation →. An example is our lambda calculus rule:
■ (λx.e1) e2 → e1[e2/x]
- There are different styles of abstract machine
■ Small-step (as above), big-step (a.k.a. natural semantics), SECD machine …
- The meaning of a program is its fully reduced form (a.k.a. a value)
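A small-step semantics can be written as a function that performs one transition at a time. The sketch below is a call-by-value variant of the reduction rule, under two assumptions that are not in the slides: terms are encoded as tuples (`("var", x)`, `("lam", x, body)`, `("app", f, a)`), and the program is closed, so substitution never has to rename variables.

```python
def subst(e, x, v):
    """Return e[v/x]. Assumes v is closed, so no variable capture can occur."""
    tag = e[0]
    if tag == "var":
        return v if e[1] == x else e
    if tag == "lam":
        # An inner binder for the same name shadows x: stop substituting.
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, v))
    return ("app", subst(e[1], x, v), subst(e[2], x, v))

def is_value(e):
    return e[0] == "lam"

def step(e):
    """One call-by-value small step, or None if no rule applies."""
    if e[0] != "app":
        return None
    f, a = e[1], e[2]
    if not is_value(f):                  # reduce the function position first
        f2 = step(f)
        return None if f2 is None else ("app", f2, a)
    if not is_value(a):                  # then the argument
        a2 = step(a)
        return None if a2 is None else ("app", f, a2)
    return subst(f[2], f[1], a)          # (λx.e1) v → e1[v/x]

def evaluate(e, fuel=1000):
    """Apply step until no rule applies, giving up when fuel runs out."""
    while fuel > 0:
        nxt = step(e)
        if nxt is None:
            return e
        e, fuel = nxt, fuel - 1
    raise RuntimeError("out of fuel: term may not terminate")

TRUE = ("lam", "x", ("lam", "y", ("var", "x")))
FALSE = ("lam", "x", ("lam", "y", ("var", "y")))
IDENT = ("lam", "z", ("var", "z"))
```

Evaluating `("app", ("app", TRUE, IDENT), FALSE)` takes two steps and ends at `IDENT`. The `fuel` bound is a practical guard, since evaluation of a lambda term need not terminate.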
Denotational Semantics
- The meaning of a program is defined as a mathematical object, like a function or number
■ Rather than a sequence of machine states
- The semantics is given in terms of an interpretation function [|.|]
■ Takes a program fragment as its argument and returns its meaning as the result, e.g., as a mathematical object
- Things get interesting when trying to define denotations for recursive constructs
Denotational Semantics example
- b ::= true | false | b ∨ b | b ∧ b
- e ::= 0 | 1 | … | e + e | e * e
- s ::= e | if b then s else s
■ [| true |] = true
■ [| b1 ∨ b2 |] = [| b1 |] or [| b2 |]
■ [| if b then s1 else s2 |] = [| s1 |] iff [| b |] holds, [| s2 |] iff [| b |] does not hold
■ How would we handle a while loop?
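The interpretation function for this small language can be sketched as three mutually supporting Python functions, one per syntactic category, each mapping a syntax tree to a Python boolean or integer. The tuple encoding of syntax and the names `den_b`, `den_e`, and `den_s` are assumptions for illustration.

```python
def den_b(b):
    """[| b |]: boolean expressions denote Python booleans."""
    tag = b[0]
    if tag == "true":
        return True
    if tag == "false":
        return False
    if tag == "or":
        return den_b(b[1]) or den_b(b[2])
    if tag == "and":
        return den_b(b[1]) and den_b(b[2])
    raise ValueError(f"not a boolean expression: {b!r}")

def den_e(e):
    """[| e |]: arithmetic expressions denote integers."""
    tag = e[0]
    if tag == "num":
        return e[1]
    if tag == "add":
        return den_e(e[1]) + den_e(e[2])
    if tag == "mul":
        return den_e(e[1]) * den_e(e[2])
    raise ValueError(f"not an arithmetic expression: {e!r}")

def den_s(s):
    """[| s |]: a statement denotes the integer its chosen branch denotes."""
    if s[0] == "if":
        return den_s(s[2]) if den_b(s[1]) else den_s(s[3])
    return den_e(s)

# if (false ∨ true) then 1 + 2 else 0
program = ("if", ("or", ("false",), ("true",)),
           ("add", ("num", 1), ("num", 2)),
           ("num", 0))
```

Each definition is compositional: the meaning of a term is built only from the meanings of its subterms. `den_s(program)` is 3.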
Axiomatic Semantics
- With the aforementioned semantics, we define the behavior of programs, and then reason about programs in terms of this behavior
■ Are two programs equivalent? Does a program terminate? Does a program implement a particular specification?
- Alternatively, axiomatic semantics defines the meaning of a program as what one can prove about it
■ Hoare, Dijkstra, Gries, others
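Simply-typed λ-calculus
- e ::= x | n | λx:τ.e | e e
- τ ::= int | τ → τ
- A ⊢ e : τ (in type environment A, expression e has type τ)
[Figure: typing rules, including the side condition x ∈ dom(A) for variables]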
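The judgment "in type environment A, expression e has type τ" can be read as a recursive function from an environment and an expression to a type. The sketch below assumes a grammar with variables, integer literals, annotated functions, and applications, with `int` as the only base type; the tuple encoding and the names `typeof` and `arrow` are illustrative choices.

```python
INT = "int"

def arrow(t1, t2):
    """The function type t1 → t2."""
    return ("->", t1, t2)

def typeof(A, e):
    """Return the type of e in environment A (a dict), or raise TypeError."""
    tag = e[0]
    if tag == "var":
        if e[1] not in A:                 # side condition: x ∈ dom(A)
            raise TypeError(f"unbound variable {e[1]}")
        return A[e[1]]
    if tag == "num":
        return INT
    if tag == "lam":                      # ("lam", x, annotated type, body)
        _, x, t, body = e
        return arrow(t, typeof({**A, x: t}, body))
    if tag == "app":
        tf = typeof(A, e[1])
        ta = typeof(A, e[2])
        # The function must have an arrow type whose domain matches the argument.
        if isinstance(tf, tuple) and tf[0] == "->" and tf[1] == ta:
            return tf[2]
        raise TypeError("ill-typed application")
    raise TypeError(f"unknown term {e!r}")

identity = ("lam", "x", INT, ("var", "x"))
```

`typeof({}, identity)` gives `int → int`, and applying `identity` to a literal gives `int`. Applying a number to a number raises `TypeError`, which is the checker rejecting a program no typing rule can derive.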
Subtyping
- Liskov:
■ If for each object o1 of type S there is an object o2 of type T such that for all programs P defined in terms of T, the behavior of P is unchanged when o1 is substituted for o2, then S is a subtype of T.
- Informal statement
■ If anyone expecting a T can be given an S instead, then S is a subtype of T.
Other Technologies and Topics
- Control-flow analysis
- CFL reachability and polymorphism
- Constraint-based analysis
- Alias and pointer analysis
- Region-based memory management
- Garbage collection
- Theorem proving
- More …
Applications: Abstract Interpretation
- Polyspace
■ Looks for race conditions, out-of-bounds array accesses, null pointer dereferences, non-initialized data access, etc.
■ Also includes arithmetic equation solver
- ASTREE
■ Used to detect all possible runtime failures (divide by zero, null pointer dereference, array out-of-bounds access) in embedded code
■ Used regularly on Airbus avionics software
- Stacktool
■ Abstractly interprets machine code to check for possible stack overflow in embedded systems
Applications: Type Systems
- Type qualifiers
■ Format-string vulnerabilities, deadlocks, file I/O protocol errors, kernel security holes
- Vault and Cyclone
■ Memory allocation and deallocation errors, library protocol errors, misuse of locks
Applications: Proof Assistants
- Twelf, Coq, Isabelle/HOL
■ Propositions can be expressed as types, and their proofs are expressed as terms having that type
■ Proposition: A → A, Proof: λx:A.x
■ Type checking thus becomes proof checking
■ Can be used for more convincing formal proofs, or even for proof-carrying code
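The proposition A → A and its proof term can be written out in a proof assistant. The sketch below uses Lean 4, which is not one of the systems named on the slide and is chosen here only as an assumption for illustration; the theorem name is likewise made up.

```lean
-- Proposition: A → A. Proof term: the identity function on proofs of A.
theorem a_implies_a (A : Prop) : A → A :=
  fun (x : A) => x
```

When Lean accepts this definition, it has type-checked the term `fun (x : A) => x` against the type `A → A`, which is exactly the sense in which type checking becomes proof checking.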
Conclusion
- PL has a great mix of theory and practice
■ Very deep theory
■ But lots of practical applications
- Recent exciting new developments
■ Focus on program correctness instead of speed
■ Forget about full correctness, though
■ Scalability to large programs essential