Unification-Based Alias Analysis: Adding a Level of Indirection and Weak Updates - Prof. J, Study notes of Computer Science

These slides cover type qualifiers, including unification-based alias analysis, a method for determining whether two memory locations may alias. The approach adds a level of indirection (stores map abstract locations to types) and distinguishes strong updates from weak updates. The document also covers restrict annotations and the results of const inference.

Uploaded on 02/13/2009

Type Qualifiers:

Lightweight Specifications to Improve Software Quality

CMSC 631, Fall 2003

Software Quality Today

Even after large, extensive testing efforts, commercial software is shipped riddled with errors ("bugs").
-- PITAC Report to the President, February 24, 1999

Trustworthy Computing is computing that is available, reliable, and secure as electricity, water services and telephony....No Trustworthy Computing platform exists today.
-- Bill Gates, January 15, 2002 (highest priority for Microsoft)

Conclusion?

Software is buggy

So What?

• Software has always been buggy

• But now...

  • More people use software
  • Computers keep getting faster
    • Speed/quality tradeoff changing
  • Cost of fixing bugs is high

Common Techniques for Software Quality

• Testing

• Code auditing

• Drawbacks: Expensive, difficult, error-prone, limited assurances

• What more can we do?

  • Tools that analyze source code
  • Techniques for avoiding programming mistakes

Tools Need Specifications

put_tty_queue_nolock(c, tty);

spin_lock_irqsave(&tty->read_lock, flags);

spin_unlock_irqrestore(&tty->read_lock, flags);

• Goal: Add specifications to programs in a way that...

  • Programmers will accept
    • Lightweight
  • Scales to large programs
  • Solves many different problems

Type Qualifiers

• Extend standard type systems (C, Java, ML)

  • Programmers already use types
  • Programmers understand types
  • Get programmers to write down a little more...

const int                  ANSI C

ptr(tainted char)          Security vulnerabilities

int → ptr(open FILE)       File operations

Application: Format String Vulnerabilities

• I/O functions in C use format strings

printf("Hello!");              prints: Hello!

printf("Hello, %s!", name);    prints: Hello, <name>!

• Instead of printf("%s", name); why not printf(name);?

Format String Attacks

• Adversary-controlled format specifier

name := <data-from-network>

printf(name); /* Oops */

  • Attacker sets name = “%s%s%s” to crash program
  • Attacker sets name = “...%n...” to write to memory

• Lots of these bugs in the wild

  • New ones weekly on bugtraq mailing list
  • Too restrictive to forbid variable format strings
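
To make the hazard concrete, here is a minimal C sketch of the safe pattern. The helper name `format_untrusted` is hypothetical and not part of any library. It treats adversary-supplied text purely as data by always passing it as an argument to a constant "%s" format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a library function): copies untrusted text into
 * `out` by passing it as an ARGUMENT to a constant "%s" format. Any '%'
 * directives inside `untrusted` are therefore never interpreted. */
static int format_untrusted(char *out, size_t out_size, const char *untrusted) {
    assert(out != NULL && untrusted != NULL && out_size > 0);
    return snprintf(out, out_size, "%s", untrusted);
}
```

Calling `format_untrusted(buf, sizeof buf, "%s%s%n")` stores the six characters `%s%s%n` verbatim, whereas `printf(name)` with the same input would interpret them as format directives.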

Using Tainted and Untainted

• Add qualifier annotations

int printf(untainted char *fmt, ...)

tainted char *getenv(const char *)

tainted = may be controlled by adversary

untainted = must not be controlled by adversary

Subtyping

void f(tainted int);

untainted int a;

f(a);    /* OK: f accepts tainted or untainted data */

void g(untainted int);

tainted int b;

g(b);    /* Error: g accepts only untainted data */

untainted ≤ tainted        tainted ≰ untainted

untainted < tainted
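
The subtyping rule can be sketched as a check on a two-point lattice. This is an illustrative encoding, not cqual's implementation: the names `qual`, `qual_leq`, and `call_ok` are assumptions introduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Two-point qualifier lattice from the slide: untainted < tainted.
 * The enum values are chosen so that integer order matches lattice order. */
typedef enum { UNTAINTED = 0, TAINTED = 1 } qual;

/* q1 <= q2: a value qualified q1 may be used where q2 is expected. */
static bool qual_leq(qual q1, qual q2) {
    assert(q1 == UNTAINTED || q1 == TAINTED);
    assert(q2 == UNTAINTED || q2 == TAINTED);
    return q1 <= q2;
}

/* Hypothetical check for passing an argument to a parameter. */
static bool call_ok(qual arg, qual param) {
    return qual_leq(arg, param);
}
```

With this encoding, `call_ok(UNTAINTED, TAINTED)` models the accepted call `f(a)`, and `call_ok(TAINTED, UNTAINTED)` models the rejected call to `g`.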

Demo of cqual

http://www.cs.umd.edu/~jfoster

Satisfiability via Graph Reachability

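[Figure: constraint graph over the qualifier constants untainted and tainted and the qualifier variables a0 through a8, with edges for constraints such as a6 ≤ a1, a2 ≤ a4, and a3 = a5.]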

Is there an inconsistent path through the graph?

tainted ≤ a6 ≤ a1 ≤ a3 ≤ a5 ≤ a7 ≤ untainted (an inconsistent path)

Satisfiability in Linear Time

• Initial program of size n

  • Fixed set of qualifiers tainted, untainted, ...

• Constraint generation yields O(n) constraints

  • Recursive abstract syntax tree walk

• Graph reachability takes O(n) time

  • Works for semi-lattices, discrete p.o., products
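
The reachability test can be sketched as follows. This is a simplified illustration under stated assumptions: the type `cgraph` and its functions are hypothetical names, and the adjacency matrix is used only for brevity, which makes this version quadratic. The linear bound on the slide would need adjacency lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

/* Constraint graph: edge[u][v] means the constraint u <= v. */
typedef struct {
    int n;                               /* number of nodes in use */
    bool edge[MAX_NODES][MAX_NODES];
} cgraph;

static void cgraph_init(cgraph *g, int n) {
    assert(n > 0 && n <= MAX_NODES);
    g->n = n;
    memset(g->edge, 0, sizeof g->edge);
}

static void add_leq(cgraph *g, int u, int v) {
    assert(u >= 0 && u < g->n && v >= 0 && v < g->n);
    g->edge[u][v] = true;
}

/* Iterative depth-first search: is `to` reachable from `from`?
 * Each node is pushed at most once, so the stack cannot overflow. */
static bool reaches(const cgraph *g, int from, int to) {
    bool seen[MAX_NODES] = { false };
    int stack[MAX_NODES];
    int top = 0;
    stack[top++] = from;
    seen[from] = true;
    while (top > 0) {
        int u = stack[--top];
        if (u == to) return true;
        for (int v = 0; v < g->n; v++) {
            if (g->edge[u][v] && !seen[v]) {
                seen[v] = true;
                stack[top++] = v;
            }
        }
    }
    return false;
}
```

If one node stands for the constant tainted and another for untainted, the constraints are unsatisfiable exactly when `reaches(&g, tainted_node, untainted_node)` is true.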

The Story So Far...

• Type qualifiers as subtyping system

  • Qualifiers live on the standard types
  • Programs → constraints → graphs

• Useful for a number of real-world problems

• Up next: State change and type qualifiers

  • A glimpse of a more complex system

• Followed by: Applications, experiments

Application: Locking

Lock x;

lock(x);                  /* x : locked Lock */

...critical section...

unlock(x);                /* x : unlocked Lock */

Flow-Sensitive Type Qualifiers

• Standard type systems are flow-insensitive

  • Types don't change during execution

/* x : int */ x := ...; /* x : int */

• We need flow-sensitivity

  • Qualifiers may change during execution

/* y : locked Lock */ y := ...; /* y : unlocked Lock */

Some Challenges

• How do we deal with aliasing?

p = &x; *p = ...;

• How do we make the analysis scale?

  • Too expensive to model full state at each point

• What happens when too much is aliased?

  • How does the programmer control aliasing?

Modeling State with Abstract Stores

• Track each variable's type at each point

  • Abstract stores map variables to types
  • ...and types contain qualifiers

{ x : t, y : r, z : s, ... }

x := ...;

{ x : t', y : r, z : s, ... }

y := ...;

{ x : t', y : r', z : s, ... }

What About Aliasing?

• Suppose p points to x:

{ x : q int, p : ptr(q int), ... }

*p := ...;

{ x : q int, p : ptr(q' int), ... }

  • Variable names alone are insufficient

• Solution: Add a level of indirection

  • Stores map locations to types
  • Pointer types point to locations

Unification-Based Alias Analysis

• Initial flow-insensitive pass computes aliasing

  • Before flow-sensitive analysis
  • Simultaneous with standard type inference
    • Types are not flow-sensitive, only qualifiers

• Associate a location r with each pointer

  • Unify locations that may alias

*p : ptr_r(int)        x : ptr_s(int)

p = &x;    /* require r = s */
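
Unifying locations is commonly implemented with a union-find structure. The sketch below assumes that approach. The slides do not show an implementation, and the names `loc_set`, `fresh_loc`, `unify`, and `may_alias` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LOCS 64

/* Union-find over abstract locations; parent[i] == i marks a representative. */
typedef struct {
    int parent[MAX_LOCS];
    int count;
} loc_set;

static void locs_init(loc_set *s) { s->count = 0; }

/* Create a fresh abstract location, initially aliased with nothing else. */
static int fresh_loc(loc_set *s) {
    assert(s->count < MAX_LOCS);
    int id = s->count++;
    s->parent[id] = id;
    return id;
}

/* Find the representative of x, shortening the path as we go (path halving). */
static int find_loc(loc_set *s, int x) {
    assert(x >= 0 && x < s->count);
    while (s->parent[x] != x) {
        s->parent[x] = s->parent[s->parent[x]];
        x = s->parent[x];
    }
    return x;
}

/* Unify two locations: afterwards they are treated as one ("may alias"). */
static void unify(loc_set *s, int a, int b) {
    int ra = find_loc(s, a);
    int rb = find_loc(s, b);
    if (ra != rb) s->parent[ra] = rb;
}

static bool may_alias(loc_set *s, int a, int b) {
    return find_loc(s, a) == find_loc(s, b);
}
```

For the assignment `p = &x` on the slide, an analysis built this way would call `unify` on the location r that p points to and the location s of x, which enforces the requirement r = s.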

Using Locations in Stores

• Suppose p points to x:

*p : ptr_r(int)        x : ptr_r(int)

{ r : q int, h : ptr(r), ... }

*p := ...;

{ r : q' int, h : ptr(r), ... }

What About Scalability?

• Stores are too big

{ r : t, h : r, n : s, ... }

  • A program of size n may have
    • n locations
    • n program points
    • ⇒ n² space to represent stores

• We need a more compact representation

  • Idea: represent differences between stores

Constraint Resolution Example

[Figure: store constraint graph with nodes e0, e1, e2, and t. Edge and check labels:
  Alloc r : unlocked Lock
  Assign r : locked Lock
  check e1(r) : unlocked Lock
  Assign h : q t
  Assign r : unlocked Lock
  check t(r) : locked Lock
Legend: green = { r : unlocked Lock }, red = { r : locked Lock }]

Strong Updates

  • In e2, location r has qualifier q'
  • We've replaced r's qualifier
  • This is called a strong update
  • Location r is linear

[Figure: store graph with nodes e0, e1, e2 and edges labeled Alloc r : q int and Assign r : q' int]

Weak Updates

  • What if r allocated twice?
    • Only one is actually updated

[Figure: store graph with nodes e0, e1, e2 and edges labeled Alloc r : q int (twice) and Assign r : q' int]

  • In e2, location r has qualifier q + q'
  • We've merged r's new and old qualifiers
  • This is called a weak update
  • Location r is non-linear
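
The difference between the two kinds of update can be sketched in a few lines. The representation is an assumption made for illustration: a qualifier is modeled as the set of lock states a location may be in, and merging is set union. The names `abs_loc` and `assign_qual` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed representation: a qualifier is the set of lock states a location
 * may be in, encoded as bit flags. Merging two qualifiers is set union. */
enum { Q_UNLOCKED = 1, Q_LOCKED = 2 };

typedef struct {
    unsigned qual;   /* current set of possible states */
    bool linear;     /* true if this abstract location stands for one cell */
} abs_loc;

/* Assign new_qual to the location.
 * Linear: strong update, the old qualifier is replaced.
 * Non-linear: weak update, old and new qualifiers are merged. */
static void assign_qual(abs_loc *loc, unsigned new_qual) {
    assert(loc != NULL);
    if (loc->linear) {
        loc->qual = new_qual;
    } else {
        loc->qual |= new_qual;
    }
}
```

After a weak update the location may be either locked or unlocked as far as the analysis knows, which is why a later check for a single definite state can fail on non-linear locations.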

Weak Update Example

void foo(int i) {
    do_with_lock(locks[i]);              /* r unlocked */
}

void do_with_lock(lock *restrict l) {    /* copy r to h */
    spin_lock(l);                        /* h locked (strong update) */
    work();
    spin_unlock(l);                      /* h unlocked */
}                                        /* copy h to r; r unlocked */

• h represents only one location

  • Safe to perform strong update (replacement)

Check Restrict with Type and Effect System

• Types extended with abstract locations

  • Flow-insensitive, unification-based may-alias analysis

t ::= ... | ref_r(t)        pointer to abstract loc r

• Effects are sets of locations

L ::= Ø | {r} | L1 ∪ L2 | L1 ∩ L2

Type Rules

  • A ⊢ e : t; L
    • In environment A, expression e has type t
    • evaluating e has effect L

A ⊢ e : ref_r(t); L
---------------------
A ⊢ *e : t; L ∪ {r}

Restrict

• restrict x = e1 in e2

  • x is a pointer initialized to e1
  • x is in scope only within e2
  • within e2, only x and copies derived from x can be used to access *x
  • outside of e2, values derived from x cannot be used
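
One way to picture how effects could enforce this is the sketch below. The slides do not spell out the checking rule, so this is an assumed reading: inside e2 the pointer x gets a fresh location, and the body is accepted only if its effect never mentions the original location that e1 pointed to. All names (`effect`, `effect_single`, `restrict_ok`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An effect is a set of abstract locations, here at most 64 of them,
 * stored as one bit per location id. */
typedef uint64_t effect;

static effect effect_empty(void) { return 0; }

static effect effect_single(int loc) {
    assert(loc >= 0 && loc < 64);
    return (effect)1 << loc;
}

static effect effect_union(effect a, effect b) { return a | b; }

static bool effect_has(effect e, int loc) {
    assert(loc >= 0 && loc < 64);
    return ((e >> loc) & 1u) != 0;
}

/* Assumed check for `restrict x = e1 in e2`: the body e2 is accepted only
 * if its effect never touches the original location that e1 pointed to. */
static bool restrict_ok(effect body_effect, int original_loc) {
    return !effect_has(body_effect, original_loc);
}
```

In the lock example, the body of `do_with_lock` would touch only the fresh location h, so a check of this shape would accept it. A body that also dereferenced a pointer to r directly would be rejected.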

More Features

• Low-cost polymorphism

  • Use effects to avoid merging stores at fn calls

• Some path-sensitivity

  • Different types on if-then-else branches

Demo

Qualifier Inference Architecture

[Figure: architecture diagram with boxes labeled restrict annotations; Flow-insensitive: Type qualifiers, Alias Analysis, Effect inference; Flow-sensitive: Linearity inference, Type qualifiers]

Experiment: Const Inference

• In C, l-values can be declared const

  • const int x; // x = ... illegal
  • const int *p; // p = ... legal, *p = ... illegal

• Cqual can automatically infer const

  • Left-hand side of assignment must not be const
  • Everything else may be const
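
The inference rule can be illustrated with two small functions. These are made-up examples rather than code from the analyzed programs: in the first the pointee is never assigned, so const may be inferred. In the second the pointee appears on the left-hand side of an assignment, so it may not.

```c
#include <assert.h>
#include <stddef.h>

/* *s is only read, never the target of an assignment, so `const` can be
 * inferred for the pointee (and is written out here). */
static size_t count_char(const char *s, char c) {
    assert(s != NULL);
    size_t n = 0;
    for (; *s != '\0'; s++) {   /* s itself changes; *s is never written */
        if (*s == c) n++;
    }
    return n;
}

/* buf[i] appears on the left-hand side of an assignment, so the pointee
 * must not be const. */
static void fill(char *buf, size_t len, char c) {
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        buf[i] = c;
    }
}
```

Declaring `fill` with `const char *buf` would be rejected by a C compiler at the assignment, which mirrors the constraint that the left-hand side of an assignment must not be const.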

Results: Const Inference

[Figure: bar chart, 0% to 100%, for woman-3.0a, patch-2., m4-1., diffutils-2., ssh-1.2., and uucp-1. (version numbers truncated in the source); legend: Declared, Mono, Poly, Neither]

Results: Format String Vulnerabilities

• Analyzed 10 popular unix daemon programs

• Annotations shared across applications

  • One annotated header file for standard libraries

• Found several known vulnerabilities

  • Including ones we didn’t know about

• User interface critical

Results: Locking

• Looked for simple deadlocks in Linux 2.4.

  • Double acquires/releases

• Analyzed 892 files in linux/drivers individually

• Analyzed 589 modules (all linked files)

  • 14 type errors ⇒ deadlocks
  • ~41/892 fail to typecheck but appear correct
  • ~237/589 fail to typecheck
    • added restrict by hand to remove type errors due to aliasing for 64

Eliminating Weak Updates

• How many more strong updates with confine inference?

• Metric: # lock updates involved in error

  • Lower bound: Assume all updates are strong
    • But unsound!

Type Errors

[Figure: breakdown of the 589 total modules; labels: No type errors; No type errors because of weak updates; Potential improvement]

Actual Improvement in Strong Updates

• 138/152: Same as assuming updates strong

  • Optimal result

• 14/152: Confine misses some strong updates

  • See paper for numbers

Experimental Summary

• Overall, confine inference gets 95% of cases

  • Could eliminate 3277 type errors
  • Does eliminate 3116 type errors
  • (Includes duplicates from duplicated modules)

• Remaining type errors

  • Deadlocks (found 4 new ones)
  • Aliasing conservatism and lack of path sensitivity

Running Time: Locking

[Figure: running time in seconds (0 to 90) versus preprocessed lines of code (0K to 800K); series: Flow-Sensitive, Flow-Insensitive, Parsing]