Type Systems in Programming Languages
Prof. Jeffrey S. Foster
CMSC 631 – Program Analysis and Understanding, Spring 2009
- Consider the (untyped) lambda calculus
  - false = λx.λy.x
  - 0 (Scott) = λx.λy.x
- Everything is encoded as a function
  - So we can easily misuse combinators
    - false 0, if 0 then ..., etc.
  - This is no better than assembly language!
The Need for a Type System
- A type system is some mechanism for distinguishing
good programs from bad
  - Good programs = well typed
  - Bad programs = ill typed or not typable
- Examples:
  - 0 + 1 // well typed
  - false 0 // ill typed: can’t apply a boolean
  - 1 + (if true then 0 else false) // ill typed: can’t add boolean to integer
What is a Type System?
“A type system is a tractable syntactic method for
proving the absence of certain program behaviors
by classifying phrases according to the kinds of
values they compute.”
- Benjamin Pierce, Types and Programming Languages
A Definition of Type Systems
- e ::= n | x | λx:t.e | e e
  - Functions include the type of their argument
  - We don’t really need this, but it will come in handy
- t ::= int | t → t
  - t1 → t2 is the type of a function that, given an argument of type t1, returns a result of type t2
    - t1 is the domain, and t2 is the range
Simply-Typed Lambda Calculus
- Our type system will prove judgments of the form
  - A ⊢ e : t
  - “In type environment A, expression e has type t”
Type Judgments
- A type environment is a map from variables to
types (a kind of symbol table)
  - ∅ is the empty type environment
- A closed term e is well-typed if ∅ ⊢ e : t for some t
  - We’ll abbreviate this as ⊢ e : t
- A, x:t is just like A, except x now has type t
  - The type of x in A, x:t is t
  - The type of z≠x in A, x:t is the type of z in A
- When we see a variable in a program, we look in
the type environment to find its type
Type Environments
Type Rules
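$$\frac{}{A \vdash n : \mathit{int}} \qquad \frac{x \in \mathrm{dom}(A)}{A \vdash x : A(x)}$$

$$\frac{A, x{:}t \vdash e : t'}{A \vdash \lambda x{:}t.\,e : t \to t'} \qquad \frac{A \vdash e_1 : t \to t' \quad A \vdash e_2 : t}{A \vdash e_1\ e_2 : t'}$$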
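The four type rules translate almost line for line into a recursive checker. The following is a minimal sketch in Python, not part of the original slides; the tuple encoding of terms and types and the name `type_of` are assumptions made for illustration.

```python
# Hypothetical encoding (an assumption, not from the slides):
#   types: "int" or ("fun", domain, range)
#   terms: ("num", n) | ("var", x) | ("lam", x, t, body) | ("app", e1, e2)

def type_of(env, e):
    """Return t such that env |- e : t, or raise TypeError if e is ill typed."""
    tag = e[0]
    if tag == "num":                      # A |- n : int
        return "int"
    if tag == "var":                      # x in dom(A) gives A |- x : A(x)
        if e[1] not in env:
            raise TypeError(f"unbound variable {e[1]}")
        return env[e[1]]
    if tag == "lam":                      # check the body under A, x:t
        _, x, t, body = e
        return ("fun", t, type_of({**env, x: t}, body))
    if tag == "app":                      # e1 : t -> t' and e2 : t give e1 e2 : t'
        t1 = type_of(env, e[1])
        t2 = type_of(env, e[2])
        if t1 == "int" or t1[1] != t2:
            raise TypeError("ill-typed application")
        return t1[2]
    raise ValueError(f"unknown term {e!r}")

identity = ("lam", "x", "int", ("var", "x"))
print(type_of({}, identity))                       # ('fun', 'int', 'int')
print(type_of({}, ("app", identity, ("num", 3))))  # int
```

Because every lambda carries its argument type, the checker never has to guess: each rule is applied by looking only at the shape of the term.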
Progress
- Suppose ⊢ e : t. Then either e is a value, or
there exists e′ such that e → e′
- Proof by induction on e
  - Base cases n, λx.e – these are values, so we’re done
  - Base case x – can’t happen (empty type environment)
  - Inductive case e1 e2 – If e1 is not a value, then by induction we can evaluate it, so we’re done, and similarly for e2. Otherwise both e1 and e2 are values. Inspection of the type rules shows that e1 must have a function type, and therefore must be a lambda since it’s a value. Therefore we can make progress.
Preservation
- If ⊢ e : t and e → e′ then ⊢ e′ : t
- Proof by induction on e → e′
  - Induction (easier than the base case!). Expression e must have the form e1 e2.
  - Assume ⊢ e1 e2 : t and e1 e2 → e′. Then we have ⊢ e1 : t′ → t and ⊢ e2 : t′.
  - Then there are three cases.
    - If e1 → e1′, then by induction ⊢ e1′ : t′ → t, so e1′ e2 has type t
    - If the reduction is inside e2, the argument is similar
Preservation, cont’d
- Otherwise (λx.e) v → e[v\x]. Then we have

$$\frac{x{:}t' \vdash e : t}{\vdash \lambda x.\,e : t' \to t}$$

  - Thus we have
    - x:t′ ⊢ e : t
    - ⊢ v : t′
  - Then by the substitution lemma (not shown) we have
    - ⊢ e[v\x] : t
  - And so we have preservation
Substitution Lemma
- If A ⊢ v : t and A, x:t ⊢ e : t′, then A ⊢ e[v\x] : t′
- Proof: Induction on the structure of e
- For lazy semantics, we’d prove
  - If A ⊢ e1 : t and A, x:t ⊢ e : t′, then A ⊢ e[e1\x] : t′
- So we have
  - Progress: Suppose ⊢ e : t. Then either e is a value, or there exists e′ such that e → e′
  - Preservation: If ⊢ e : t and e → e′ then ⊢ e′ : t
- Putting these together, we get soundness
  - If ⊢ e : t then either there exists a value v such that e →* v, or e diverges (doesn’t terminate).
- What does this mean?
  - Evaluation getting stuck is bad, so
  - “Well-typed programs don’t go wrong”
Soundness
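To make "stuck" concrete, here is a small call-by-value evaluator, written as a sketch rather than as the course's implementation. The term encoding, the `fuel` bound, and the function names are assumptions for illustration. A well-typed closed term ends in a value (or runs out of fuel if it diverges), while an ill-typed one such as `0 1` is reported as stuck.

```python
# Hypothetical encoding: ("num", n) | ("var", x) | ("lam", x, body) | ("app", e1, e2)

def is_value(e):
    return e[0] in ("num", "lam")

def subst(e, x, v):
    # e[v\x]: replace free occurrences of x in e by the closed value v.
    tag = e[0]
    if tag == "num":
        return e
    if tag == "var":
        return v if e[1] == x else e
    if tag == "lam":
        # An inner binder with the same name shadows x, so stop there.
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, v))
    return ("app", subst(e[1], x, v), subst(e[2], x, v))

def step(e):
    # One reduction step e -> e', or None when no rule applies.
    if e[0] != "app":
        return None
    f, a = e[1], e[2]
    if not is_value(f):                     # reduce the function position first
        f2 = step(f)
        return None if f2 is None else ("app", f2, a)
    if not is_value(a):                     # then the argument
        a2 = step(a)
        return None if a2 is None else ("app", f, a2)
    if f[0] == "lam":                       # (\x.e) v -> e[v\x]
        return subst(f[2], f[1], a)
    return None                             # e.g. applying a number: stuck

def evaluate(e, fuel=1000):
    # Step until a value is reached; the fuel bound only guards against divergence.
    for _ in range(fuel):
        if is_value(e):
            return e
        nxt = step(e)
        if nxt is None:
            return "stuck"
        e = nxt
    return "out of fuel"

identity = ("lam", "x", ("var", "x"))
print(evaluate(("app", identity, ("num", 3))))   # ('num', 3)
print(evaluate(("app", ("num", 0), ("num", 1)))) # stuck
```

Progress says that `step` never returns `None` on a well-typed non-value, and preservation says the result of `step` has the same type, so the "stuck" branch is unreachable for well-typed programs.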
e ::= ... | (e, e) | fst e | snd e
- Or, maybe, just add functions
  - pair : t → t′ → t × t′
  - fst : t × t′ → t
  - snd : t × t′ → t′
Product Types (Tuples)
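$$\frac{A \vdash e : t \times t'}{A \vdash \mathit{fst}\ e : t} \qquad \frac{A \vdash e : t \times t'}{A \vdash \mathit{snd}\ e : t'} \qquad \frac{A \vdash e_1 : t \quad A \vdash e_2 : t'}{A \vdash (e_1, e_2) : t \times t'}$$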
e ::= ... | inL_{t2} e | inR_{t1} e
| (case e of x1:t1 → e1 | x2:t2 → e2)
Sum Types (Tagged Unions)
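$$\frac{A \vdash e : t_1}{A \vdash \mathit{inL}_{t_2}\ e : t_1 + t_2} \qquad \frac{A \vdash e : t_2}{A \vdash \mathit{inR}_{t_1}\ e : t_1 + t_2}$$

$$\frac{A \vdash e : t_1 + t_2 \quad A, x_1{:}t_1 \vdash e_1 : t \quad A, x_2{:}t_2 \vdash e_2 : t}{A \vdash (\mathit{case}\ e\ \mathit{of}\ x_1{:}t_1 \to e_1 \mid x_2{:}t_2 \to e_2) : t}$$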
- Self application is not checkable in our system
  - It would require a type t such that t = t→t′
    - (We’ll see this next, but so far...)
- The simply-typed lambda calculus is strongly
normalizing
  - Every program has a normal form
  - I.e., every program halts!
Self Application and Types
$$\frac{\dfrac{A, x{:}? \vdash x : t \to t' \qquad A, x{:}? \vdash x : t}{A, x{:}? \vdash x\ x : \ldots}}{A \vdash \lambda x{:}?.\,x\ x : \ldots}$$
ML Datatypes Example
- type list = Int of int | Cons of int * int list
  - Equivalent to μα.int+(int×α)
- (Int 3) equivalent to
  - fold (inL_{int×μα.int+(int×α)} 3)
- (Cons (2, (Int 3))) equivalent to
  - fold (inR_{int} (2, fold (inL_{int×μα.int+(int×α)} 3)))
- match e with Int x -> e1 | Cons x -> e2 same as
  - case (unfold e) of
    - x:int → e1
    - | x:int×(μα.int+(int×α)) → e2
- In the pure lambda calculus, every term is typable
with recursive types
  - (Pure = variables, functions, applications only)
- Most languages have some kind of “recursive” type
  - E.g., for data structures like lists, trees, etc.
- However, usually two recursive types that define
the same structure but use a different name are
considered different
  - E.g., struct foo { int x; struct foo *next; } is different from struct bar { int x; struct bar *next; }
Discussion
- We’ve discussed simple types so far
  - Integers, functions, pairs, unions
  - Extensions for recursive types and updatable refs
- Type systems have nice properties
  - Type checking is straightforward (needs annotations)
  - Well typed programs don’t go “wrong”
    - They don’t get stuck in the operational semantics
- But... we can’t type check all good programs
Recap
- How can we build more flexible type systems?
  - More programs type check
  - Type checking is still tractable
- How can we reduce the annotation burden?
  - Type inference
Up Next: Improving Types
• Observation: λx.x returns its argument exactly
and places no constraints on the type of x
  - The identity function works for any argument type
• We can express this with universal quantification:
  - λx.x : ∀α.α→α
  - For any type α, the identity function has type α→α
  - This is also known as parametric polymorphism
Parametric Polymorphism
System F: annotated polymorphism
- Let’s extend our system as follows:
  - t ::= α | int | t → t | ∀α.t
  - e ::= n | x | λx.e | e e | Λα.e | e [t]
- That is, we add polymorphic types, and we add explicit type abstraction (generalization) …
  - Annotated code locations at which a value of polymorphic type is created
- … and type application (instantiation)
  - Explicitly annotated code locations at which a value of polymorphic type is used
- This system is due to Girard and, concurrently, Reynolds
• Polymorphic functions map types to terms
  - Normal functions map terms to terms
• Examples
  - Λα.λx:α.x : ∀α.α→α
  - Λα.Λβ.λx:α.λy:β.x : ∀α.∀β.α→β→α
  - Λα.Λβ.λx:α.λy:β.y : ∀α.∀β.α→β→β
Defining Polymorphic Functions
• When we use a parametric polymorphic type, we
apply (or instantiate) it with a particular type
  - In System F this is done by hand:
  - (Λα.λx:α.x)[t1] : t1 → t1
  - (Λα.λx:α.x)[t2] : t2 → t2
• This is where the term parametric comes from
  - The type ∀α.α→α is a “function” in the domain of types, and it is passed a parameter at instantiation time
Instantiation
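Instantiation is just substitution on types. The sketch below is an illustration under assumed names and an assumed tuple encoding of types. It shows how the type of `e [u]` could be computed from a type `∀α.t` by replacing α with `u`.

```python
# Hypothetical encoding: "int" | ("tvar", a) | ("fun", t1, t2) | ("forall", a, t)

def subst_type(t, a, u):
    # t[u\a]: replace free occurrences of type variable a in t by u.
    if t == "int":
        return t
    if t[0] == "tvar":
        return u if t[1] == a else t
    if t[0] == "fun":
        return ("fun", subst_type(t[1], a, u), subst_type(t[2], a, u))
    # forall: an inner binder for the same name shadows a.
    # Simplification: assumes u has no free variable that t's binders would capture.
    return t if t[1] == a else ("forall", t[1], subst_type(t[2], a, u))

def instantiate(poly, u):
    # Type of e [u], given e : forall a. t.
    if poly[0] != "forall":
        raise TypeError("not a polymorphic type")
    return subst_type(poly[2], poly[1], u)

alpha = ("tvar", "a")
id_type = ("forall", "a", ("fun", alpha, alpha))   # forall a. a -> a
print(instantiate(id_type, "int"))                 # ('fun', 'int', 'int')
```

The same polymorphic type yields a different monomorphic type at each use, which is exactly the "passed a parameter at instantiation time" reading.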
- Let’s consider the simply typed lambda calculus
with integers
  - e ::= n | x | λx:t.e | e e
  - (No parametric polymorphism)
- Type inference: Given a bare term (with no type
annotations), can we reconstruct a valid typing
for it, or show that it has no valid typing?
Type Inference
- Problem: Consider the rule for functions
- Without type annotations, where do we get t?
  - We’ll use type variables to stand for as-yet-unknown types
    - t ::= α | int | t → t
  - We’ll generate equality constraints t = t among the types and type variables
    - And then we’ll solve the constraints to compute a typing
Type Language
$$\frac{A, x{:}t \vdash e : t'}{A \vdash \lambda x{:}t.\,e : t \to t'}$$
Type Inference Rules
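$$\frac{}{A \vdash n : \mathit{int}} \qquad \frac{x \in \mathrm{dom}(A)}{A \vdash x : A(x)}$$

$$\frac{A, x{:}\alpha \vdash e : t' \quad \alpha\ \text{fresh}}{A \vdash \lambda x.\,e : \alpha \to t'} \qquad \frac{A \vdash e_1 : t_1 \quad A \vdash e_2 : t_2 \quad t_1 = t_2 \to \beta \quad \beta\ \text{fresh}}{A \vdash e_1\ e_2 : \beta}$$

(The premise t1 = t2→β in the application rule is a “generated” constraint.)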
- We collect all constraints appearing in the
derivation into some set C to be solved
- Here, C contains just α→α = int→β
  - Solution: α = int = β
- Thus this program is typable, and we can derive
a typing by replacing α and β by int in the proof
Example
$$\frac{\dfrac{A, x{:}\alpha \vdash x : \alpha}{A \vdash (\lambda x.\,x) : \alpha \to \alpha} \qquad A \vdash 3 : \mathit{int} \qquad \alpha \to \alpha = \mathit{int} \to \beta}{A \vdash (\lambda x.\,x)\ 3 : \beta}$$
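The constraint-generating rules can be sketched as a traversal that returns a type and appends equalities to a list. This is an illustrative sketch only; the encoding, the helper `make_fresh`, and the variable names `a0`, `a1` (standing in for α and β) are assumptions.

```python
import itertools

# Hypothetical encoding:
#   types: "int" | ("tvar", name) | ("fun", t1, t2)
#   terms: ("num", n) | ("var", x) | ("lam", x, body) | ("app", e1, e2)

def make_fresh():
    counter = itertools.count()
    return lambda: ("tvar", f"a{next(counter)}")

def infer(env, e, fresh, constraints):
    # Return a type for e, appending equality constraints (lhs, rhs) as they arise.
    tag = e[0]
    if tag == "num":
        return "int"
    if tag == "var":
        return env[e[1]]
    if tag == "lam":                       # the parameter gets a fresh type variable
        a = fresh()
        body_t = infer({**env, e[1]: a}, e[2], fresh, constraints)
        return ("fun", a, body_t)
    if tag == "app":                       # generate t1 = t2 -> b with b fresh
        t1 = infer(env, e[1], fresh, constraints)
        t2 = infer(env, e[2], fresh, constraints)
        b = fresh()
        constraints.append((t1, ("fun", t2, b)))
        return b
    raise ValueError(f"unknown term {e!r}")

constraints = []
program = ("app", ("lam", "x", ("var", "x")), ("num", 3))   # (\x.x) 3
result = infer({}, program, make_fresh(), constraints)
print(result)       # ('tvar', 'a1')
print(constraints)  # [(('fun', ('tvar', 'a0'), ('tvar', 'a0')), ('fun', 'int', ('tvar', 'a1')))]
```

The single generated constraint corresponds to α→α = int→β from the example, with `a0` playing the role of α and `a1` the role of β.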
- We can solve the equality constraints using the
following rewrite rules, which reduce a larger set
of constraints to a smaller set
  - C ∪ {int = int} ⇒ C
  - C ∪ {α = t} ⇒ C[t\α]
  - C ∪ {t = α} ⇒ C[t\α]
  - C ∪ {t1→t2 = t1′→t2′} ⇒ C ∪ {t1 = t1′} ∪ {t2 = t2′}
  - C ∪ {int = t1→t2} ⇒ unsatisfiable
  - C ∪ {t1→t2 = int} ⇒ unsatisfiable
Solving Equality Constraints
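One way to realize these rewrite rules is a worklist loop that records variable bindings in a substitution instead of rewriting the whole set C. The sketch below is an assumption-laden illustration (tuple-encoded types, hypothetical names `solve`, `apply`, `occurs`). It also rejects α = t when α occurs in t, so that no recursive types are inferred.

```python
# Hypothetical encoding: "int" | ("tvar", name) | ("fun", t1, t2)

def is_var(t):
    return t != "int" and t[0] == "tvar"

def apply(s, t):
    # Apply substitution s (dict: variable name -> type) to type t.
    if t == "int":
        return t
    if t[0] == "tvar":
        return apply(s, s[t[1]]) if t[1] in s else t
    return ("fun", apply(s, t[1]), apply(s, t[2]))

def occurs(a, t):
    # Does type variable a occur in t?
    if t == "int":
        return False
    if t[0] == "tvar":
        return t[1] == a
    return occurs(a, t[1]) or occurs(a, t[2])

def solve(constraints):
    # Return a substitution solving all equalities, or raise TypeError.
    s = {}
    work = list(constraints)
    while work:
        lhs, rhs = work.pop()
        lhs, rhs = apply(s, lhs), apply(s, rhs)
        if lhs == rhs:                           # e.g. int = int: drop it
            continue
        if is_var(rhs) and not is_var(lhs):      # t = a is handled like a = t
            lhs, rhs = rhs, lhs
        if is_var(lhs):                          # a = t: record the binding
            if occurs(lhs[1], rhs):
                raise TypeError("occurs check failed")
            s[lhs[1]] = rhs
        elif lhs != "int" and rhs != "int":      # arrow = arrow: split
            work.append((lhs[1], rhs[1]))
            work.append((lhs[2], rhs[2]))
        else:                                    # int = arrow or arrow = int
            raise TypeError("unsatisfiable: int vs. function type")
    return s

a, b = ("tvar", "a"), ("tvar", "b")
s = solve([(("fun", a, a), ("fun", "int", b))])  # a -> a = int -> b
print(apply(s, a), apply(s, b))                  # int int
```

Keeping the substitution separate from the worklist avoids recomputing C[t\α] eagerly, at the cost of applying `s` whenever a constraint is popped.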
Termination
- We can prove that the constraint solving
algorithm terminates.
- For each rewriting rule, either
  - We reduce the size of the constraint set
  - We reduce the number of “arrow” constructors in the constraint set
- As a result, the constraint always gets “smaller”
and eventually becomes empty
  - A similar argument is made for strong normalization in the simply-typed lambda calculus
- We don’t have recursive types, so we shouldn’t
infer them
- So in the operation C[t\α], require that α ∉ FV(t)
- In practice, it may be better to allow α ∈ FV(t) and do
the occurs check at the end
  - But that can be awkward to implement
Occurs Check
- Computing C[t\α] by substitution is inefficient
- Instead, use a union-find data structure to
represent equal types
  - The terms are in a union-find forest
  - When a variable and a term are equated, we union them so they have the same ECR (equivalence class representative)
- Want the ECR to be the concrete type with which variables have been unified, if one exists. Can read off solution by reading the ECR of each set.
Unifying a Variable and a Type
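The union-find idea can be sketched as follows. The `Node` class, the `shape` field, and the function names are assumptions for illustration. The occurs check is left out, so `read` assumes the solved types are acyclic.

```python
class Node:
    # A union-find node. shape is None for a type variable,
    # "int" for int, or ("fun", Node, Node) for a function type.
    def __init__(self, shape=None):
        self.parent = self
        self.shape = shape

def find(n):
    # Return the ECR of n, halving paths along the way.
    while n.parent is not n:
        n.parent = n.parent.parent
        n = n.parent
    return n

def unify(a, b):
    a, b = find(a), find(b)
    if a is b:
        return
    if a.shape is None:               # variable: the other side becomes the ECR
        a.parent = b
    elif b.shape is None:
        b.parent = a
    elif a.shape == "int" and b.shape == "int":
        a.parent = b
    elif a.shape != "int" and b.shape != "int":
        a.parent = b                  # merge the arrows, then unify their parts
        unify(a.shape[1], b.shape[1])
        unify(a.shape[2], b.shape[2])
    else:
        raise TypeError("unsatisfiable: int vs. function type")

def read(n):
    # Read the solved type off the ECR ("?" marks an unconstrained variable).
    n = find(n)
    if n.shape is None:
        return "?"
    if n.shape == "int":
        return "int"
    return ("fun", read(n.shape[1]), read(n.shape[2]))

alpha, beta, int_t = Node(), Node(), Node("int")
unify(Node(("fun", alpha, alpha)), Node(("fun", int_t, beta)))  # a -> a = int -> b
print(read(alpha), read(beta))                                   # int int
```

A variable is always pointed at the other class, so whenever a class contains a concrete type, that type ends up as the ECR and the solution can be read off directly.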
Attempting Type Inference
- Let’s extend simply-typed calculus as follows:
  - t ::= α | int | t → t | ∀α.t
  - e ::= n | x | λx.e | e e
- Type inference will automatically infer where to generalize a term, to introduce polymorphic types, and where to instantiate them
- This rule is exactly the same as System F, but we
just “magically” pick which t′ to instantiate with
  - You’re surely wondering about algorithmics. We’ll get to that …
Instantiation
$$\frac{A \vdash e : \forall\alpha.t}{A \vdash e : t[t' \backslash \alpha]}$$
- Question: When is it safe to generalize
(quantify) a type variable α in the type of
expression e?
- Answer: Whenever we can redo the typing
proof for e, choosing α to be anything we want,
and still have a valid typing proof.
Generalization
- The choice of the type of x is purely local to
type checking λx.x
  - There is no interaction with the outside environment
  - Thus we can generalize the type of x
Examples
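$$\frac{A, x{:}\alpha \vdash x : \alpha}{A \vdash \lambda x.\,x : \alpha \to \alpha} \qquad \frac{A, x{:}\mathit{int} \vdash x : \mathit{int}}{A \vdash \lambda x.\,x : \mathit{int} \to \mathit{int}} \qquad \frac{A, x{:}(i \to i) \vdash x : (i \to i)}{A \vdash \lambda x.\,x : (i \to i) \to (i \to i)}$$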
- The function restricts the type of x, so we
cannot introduce a type variable
  - Thus we cannot generalize the type of x
  - We can only generalize when the function doesn’t “look at” its parameter
Examples (cont’d)
$$\frac{A, x{:}\mathit{int} \vdash x + 3 : \mathit{int}}{A \vdash \lambda x.\,x + 3 : \mathit{int} \to \mathit{int}}$$
- The choice of the type of x depends on the type
environment
  - In the first derivation, x and y have the same type; if we generalize the type of x, they could have different types
  - Thus we cannot generalize the type of x
Examples (cont’d)
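$$\frac{A, y{:}\alpha, x{:}\alpha \vdash \mathit{if}\ p\ \mathit{then}\ x\ \mathit{else}\ y : \alpha}{A, y{:}\alpha \vdash \lambda x.\,\mathit{if}\ p\ \mathit{then}\ x\ \mathit{else}\ y : \alpha \to \alpha}$$

$$\frac{A, y{:}\alpha, x{:}\mathit{int} \vdash \mathit{if}\ p\ \mathit{then}\ x\ \mathit{else}\ y : \mathit{int}}{A, y{:}\alpha \vdash \lambda x.\,\mathit{if}\ p\ \mathit{then}\ x\ \mathit{else}\ y : \mathit{int} \to \mathit{int}}$$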
- We can generalize any type variable that is
unconstrained by the environment
  - Warning: This won’t quite work with refs
Generalization Rule
$$\frac{A \vdash e : t \quad \alpha \notin FV(A)}{A \vdash e : \forall\alpha.t}$$
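The side condition α ∉ FV(A) is easy to compute. The following sketch, with assumed names `ftv`, `ftv_env`, and `generalize` and an assumed tuple encoding, quantifies every variable that is free in the type but not free in the environment.

```python
# Hypothetical encoding: "int" | ("tvar", a) | ("fun", t1, t2) | ("forall", a, t)

def ftv(t):
    # Free type variables of a type.
    if t == "int":
        return set()
    if t[0] == "tvar":
        return {t[1]}
    if t[0] == "fun":
        return ftv(t[1]) | ftv(t[2])
    return ftv(t[2]) - {t[1]}            # ("forall", a, body) binds a

def ftv_env(env):
    # FV(A): variables free in any type recorded in the environment.
    out = set()
    for t in env.values():
        out |= ftv(t)
    return out

def generalize(env, t):
    # Quantify each variable free in t but not free in env.
    for a in sorted(ftv(t) - ftv_env(env), reverse=True):
        t = ("forall", a, t)
    return t

alpha = ("tvar", "a")
id_t = ("fun", alpha, alpha)
print(generalize({}, id_t))              # ('forall', 'a', ('fun', ('tvar', 'a'), ('tvar', 'a')))
print(generalize({"y": alpha}, id_t))    # ('fun', ('tvar', 'a'), ('tvar', 'a'))
```

The second call mirrors the `if p then x else y` example: α is constrained by the type of y in the environment, so it is left unquantified.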
- Suppose we have
  - A ⊢ e : t and α ∉ FV(A)
- Then let u be any type. By induction, can show
  - A[u\α] ⊢ e : t[u\α]
  - But then since α ∉ FV(A), that’s equivalent to
  - A ⊢ e : t[u\α]
Another Justification
- A type inference algorithm that explicitly solves
the equality constraints on-line
- Instead of implicit global substitution (like we used
before), threads the substitution through the
inference
- In practice, use previous algorithm, plus generalize
at let and instantiate at variable uses.
  - For let x = e1 in e2: solve for the type of e1, generalize it, then instantiate its solution when doing inference on e2
Algorithm W
- Parametric polymorphic type inference
  let x = λx.x in   // x : ∀α.α→α
  x 3;              // x : β→β, β = int
  x (λy.y)          // x : γ→γ, γ = δ→δ
- This would be untypable in a monomorphic type
system
Example
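The let-polymorphism example can be reproduced with a compact inference sketch in the style of Algorithm W. This is an illustration, not the course's code. The class name `Infer`, the scheme representation `(quantified_names, type)`, and the binder name `id` are assumptions. Since the term language here has no sequencing, the two uses of the let-bound identity are combined as `(id (λy.y)) (id 3)`.

```python
import itertools

class Infer:
    # Types: "int" | ("tvar", name) | ("fun", t1, t2); schemes: (names, type).
    def __init__(self):
        self.subst = {}                          # solved variables, threaded through
        self.counter = itertools.count()

    def fresh(self):
        return ("tvar", f"t{next(self.counter)}")

    def resolve(self, t):
        # Apply the current substitution to t.
        if t == "int":
            return t
        if t[0] == "tvar":
            return self.resolve(self.subst[t[1]]) if t[1] in self.subst else t
        return ("fun", self.resolve(t[1]), self.resolve(t[2]))

    def ftv(self, t):
        t = self.resolve(t)
        if t == "int":
            return set()
        if t[0] == "tvar":
            return {t[1]}
        return self.ftv(t[1]) | self.ftv(t[2])

    def unify(self, t1, t2):
        t1, t2 = self.resolve(t1), self.resolve(t2)
        if t1 == t2:
            return
        if t1 != "int" and t1[0] == "tvar":
            if t1[1] in self.ftv(t2):
                raise TypeError("occurs check failed")
            self.subst[t1[1]] = t2
        elif t2 != "int" and t2[0] == "tvar":
            self.unify(t2, t1)
        elif t1 != "int" and t2 != "int":
            self.unify(t1[1], t2[1])
            self.unify(t1[2], t2[2])
        else:
            raise TypeError("cannot unify int with a function type")

    def instantiate(self, scheme):
        # Replace each quantified variable by a fresh one.
        names, t = scheme
        mapping = {a: self.fresh() for a in names}
        def go(t):
            t = self.resolve(t)
            if t == "int":
                return t
            if t[0] == "tvar":
                return mapping.get(t[1], t)
            return ("fun", go(t[1]), go(t[2]))
        return go(t)

    def generalize(self, env, t):
        # Quantify variables free in t but not free in the environment.
        env_vars = set()
        for names, ty in env.values():
            env_vars |= self.ftv(ty) - set(names)
        return (sorted(self.ftv(t) - env_vars), self.resolve(t))

    def infer(self, env, e):
        tag = e[0]
        if tag == "num":
            return "int"
        if tag == "var":                         # instantiate at variable uses
            return self.instantiate(env[e[1]])
        if tag == "lam":                         # lambda-bound names stay monomorphic
            a = self.fresh()
            return ("fun", a, self.infer({**env, e[1]: ([], a)}, e[2]))
        if tag == "app":
            f, arg, b = self.infer(env, e[1]), self.infer(env, e[2]), self.fresh()
            self.unify(f, ("fun", arg, b))
            return b
        if tag == "let":                         # ("let", x, e1, e2): generalize at let
            scheme = self.generalize(env, self.infer(env, e[2]))
            return self.infer({**env, e[1]: scheme}, e[3])
        raise ValueError(f"unknown term {e!r}")

ident = ("lam", "x", ("var", "x"))
uses = ("app", ("app", ("var", "id"), ("lam", "y", ("var", "y"))),
        ("app", ("var", "id"), ("num", 3)))
poly_prog = ("let", "id", ident, uses)           # let id = \x.x in (id (\y.y)) (id 3)
mono_prog = ("app", ("lam", "id", uses), ident)  # same uses, but id is lambda-bound

w = Infer()
print(w.resolve(w.infer({}, poly_prog)))         # int
```

Running inference on `mono_prog` raises `TypeError`, because a lambda-bound `id` must have a single type that is both `int → …` and `(… → …) → …`; this matches the remark that the program is untypable in a monomorphic system.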
- We’ve just seen parametric polymorphism
  - System F and Hindley-Milner style polymorphism
- Another popular form is subtype polymorphism
  - As in OO programming
  - These two can be combined (e.g., Java Generics)
- Some languages also have ad-hoc polymorphism
  - E.g., + operator that works on ints and floats
  - E.g., overloading in Java
Kinds of Polymorphism
e ::= x | λx.e | e e
| ref e allocation
| !e dereference
| e := e assignment
| e; e sequencing
- Notice that this is not C
- Variables cannot be updated; only references can
- I.e., there are no l-values or r-values
- This is a language with updatable references
An Imperative Language
!(ref 0)
let x = ref 0 in
x := !x + 1
let x = ref 0 in
λy. x := !x + 1; !x
Examples
- t ::= ... | ref t
  - Note: in ML this type is written t ref
Type Checking Rules
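$$\frac{A \vdash e : t}{A \vdash \mathit{ref}\ e : \mathit{ref}\ t} \qquad \frac{A \vdash e : \mathit{ref}\ t}{A \vdash\ !e : t} \qquad \frac{A \vdash e_1 : \mathit{ref}\ t \quad A \vdash e_2 : t}{A \vdash e_1 := e_2 : t}$$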
- Sometimes in imperative programs we write
expressions that have some side effect but no
interesting result
- To represent this directly, use unit:
  - e ::= ... | ()
  - t ::= ... | unit
Unit and the Unit Type
$$\frac{}{A \vdash () : \mathit{unit}} \qquad \frac{A \vdash e_1 : \mathit{ref}\ t \quad A \vdash e_2 : t}{A \vdash e_1 := e_2 : \mathit{unit}}$$
- Now we need to keep track of memory
  - State is a map from locations to values
  - Our redexes will be tuples ‹State, expression›
  - As a consequence, order of evaluation matters
- As before, evaluation will yield a fully-evaluated
term, also called a value
  - v ::= x | λx.e
  - e ::= v | e e | ref e | !e | e := e
Operational Semantics
- Only allow values to be generalized
  - v ::= x | n | λx.e
  - e ::= v | e e | ref e | !e | e := e
  - Intuition: Values cannot later be updated
  - This solution is due to Wright and Felleisen
    - Tofte found a much more complicated solution
Solution: The Value Restriction
$$\frac{A \vdash v : t_1 \quad A, x{:}\forall\alpha.t_1 \vdash e_2 : t_2 \quad \alpha = FV(t_1) - FV(A)}{A \vdash \mathit{let}\ x = v\ \mathit{in}\ e_2 : t_2}$$
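In code, the value restriction amounts to a syntactic test on the let-bound expression before generalizing. The sketch below assumes the type of the bound expression has already been inferred and is passed in. The encoding and the names `is_syntactic_value` and `let_scheme` are hypothetical.

```python
# Hypothetical encoding:
#   terms: ("num", n) | ("var", x) | ("lam", x, body) | ("app", e1, e2)
#          | ("ref", e) | ("deref", e) | ("assign", e1, e2)
#   types: "int" | ("tvar", a) | ("ref", t) | ("fun", t1, t2)

def is_syntactic_value(e):
    # v ::= x | n | \x.e
    return e[0] in ("var", "num", "lam")

def ftv(t):
    # Free type variables of a (quantifier-free) type.
    if t == "int":
        return set()
    if t[0] == "tvar":
        return {t[1]}
    if t[0] == "ref":
        return ftv(t[1])
    return ftv(t[1]) | ftv(t[2])

def let_scheme(env_ftv, bound_expr, t):
    # Scheme (quantified names, type) for x in: let x = bound_expr in ...
    # env_ftv is FV(A); t is the already-inferred type of bound_expr.
    if is_syntactic_value(bound_expr):
        return (sorted(ftv(t) - env_ftv), t)
    return ([], t)                       # not a value: keep x monomorphic

alpha = ("tvar", "a")
identity = ("lam", "x", ("var", "x"))
id_t = ("fun", alpha, alpha)
print(let_scheme(set(), identity, id_t))                    # (['a'], ('fun', ('tvar', 'a'), ('tvar', 'a')))
print(let_scheme(set(), ("ref", identity), ("ref", id_t)))  # ([], ('ref', ('fun', ('tvar', 'a'), ('tvar', 'a'))))
```

The second call shows the case the restriction exists for: `ref (λx.x)` allocates a cell, so its type variable must stay shared across all uses rather than being quantified.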
- Handles higher-order functions
- Handles data structures smoothly
- Works in infinite domains
  - Set of types is unlimited
- No forward/backward distinction
- Polymorphism provides context-sensitivity
Benefits of Type Inference
- Flow-insensitive
  - Types are the same at all program points
  - May produce coarse results
  - Type inference failure can be hard to understand
- Polymorphic type inference may not scale
  - Exponential in worst case
  - Seems fine in practice (witness ML)
Drawbacks to Type Inference