Probability, Bayes Nets and Causality - Image Processing and Analysis | CSE 591, Study notes of Computer Science

Material Type: Notes; Class: Introduction to Image Processing and Analysis; Subject: Computer Science and Engineering; University: Arizona State University - Tempe; Term: Fall 2003;

CSE 591 - FALL 03.
Chitta Baral
Department of Computer Science and Engineering
Arizona State University
Tempe, AZ 85287-5406 USA
http://www.public.asu.edu/~cbaral/cse571-f99/
October 12, 2003

PROBABILITY, BAYES NETS AND CAUSALITY

Probability, Bayes nets and Causality

$P(A \mid B)$ means: belief in $A$ under the assumption that $B$ is known with absolute certainty.

$P(A \mid B) = P(A)$: $A$ and $B$ are independent.

$P(A \mid B, C) = P(A \mid C)$: $A$ and $B$ are conditionally independent given $C$.

Dawid's notation: $(A \perp\!\!\!\perp B \mid C)$

Bayesian philosophers see the conditional relationship as more basic than that of joint events.

$P(A, B) = P(A \mid B)\, P(B)$
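The definitions above can be checked numerically. The following is a minimal sketch, assuming a hypothetical joint distribution over three binary variables built so that $A$ and $B$ are conditionally independent given $C$. The numbers and the helper names `prob` and `cond` are illustrative and do not come from the slides.

```python
from itertools import product

# Hypothetical model: C is a common cause of A and B (numbers are illustrative).
p_c = {0: 0.4, 1: 0.6}            # P(C = c)
p_a_given_c = {0: 0.2, 1: 0.7}    # P(A = 1 | C = c)
p_b_given_c = {0: 0.5, 1: 0.1}    # P(B = 1 | C = c)

def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one

# Full joint P(a, b, c), stored as a table over all eight assignments.
joint = {
    (a, b, c): p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
    for a, b, c in product((0, 1), repeat=3)
}

def prob(**fixed):
    """Marginal probability of a partial assignment over the names a, b, c."""
    total = 0.0
    for (a, b, c), p in joint.items():
        values = {"a": a, "b": b, "c": c}
        if all(values[k] == v for k, v in fixed.items()):
            total += p
    return total

def cond(target, given):
    """P(target | given), where both arguments are dicts over a, b, c."""
    return prob(**target, **given) / prob(**given)

# Product rule: P(A, B) = P(A | B) P(B)
product_lhs = prob(a=1, b=1)
product_rhs = cond({"a": 1}, {"b": 1}) * prob(b=1)

# Conditional independence: P(A | B, C) = P(A | C)
ci_lhs = cond({"a": 1}, {"b": 1, "c": 1})
ci_rhs = cond({"a": 1}, {"c": 1})

# Marginal independence fails in this model: P(A | B) differs from P(A)
marg_lhs = cond({"a": 1}, {"b": 1})
marg_rhs = prob(a=1)
```

In this sketch `ci_lhs` and `ci_rhs` agree, while `marg_lhs` and `marg_rhs` do not, which shows that conditional independence given $C$ does not imply plain independence.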

Bayesian Networks

Goal:

  • to provide a convenient means of expressing substantive assumptions

  • to facilitate economical representations of joint probability functions

  • to facilitate efficient inferences from observations

Idea: Directed acyclic graphs are used to represent causal or temporal relationships.

Basic decomposition scheme

$P(A, B) = P(A \mid B)\, P(B)$

$P(x_1, x_2, x_3) = P(x_1 \wedge x_2 \wedge x_3) = P(x_1 \mid x_2, x_3)\, P(x_2, x_3) = P(x_1 \mid x_2, x_3)\, P(x_2 \mid x_3)\, P(x_3)$
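The decomposition holds for any joint distribution, not only for specially structured ones. As a small sanity check, the sketch below draws a hypothetical random joint over three binary variables (the seed and the helper names are assumptions made for illustration) and confirms that the product of conditionals reproduces every joint entry.

```python
import random
from itertools import product

random.seed(0)

# Hypothetical joint distribution over (x1, x2, x3), normalised random weights.
weights = {xs: random.random() for xs in product((0, 1), repeat=3)}
z = sum(weights.values())
joint = {xs: w / z for xs, w in weights.items()}

def marginal(fixed):
    """Sum the joint over all entries that agree with `fixed` (index -> value)."""
    return sum(p for xs, p in joint.items()
               if all(xs[i] == v for i, v in fixed.items()))

def chain_rule(x1, x2, x3):
    """P(x1 | x2, x3) * P(x2 | x3) * P(x3), each factor derived from the joint."""
    p3 = marginal({2: x3})
    p23 = marginal({1: x2, 2: x3})
    p123 = marginal({0: x1, 1: x2, 2: x3})
    return (p123 / p23) * (p23 / p3) * p3

# Largest disagreement between the joint and its chain-rule reconstruction.
max_error = max(abs(joint[xs] - chain_rule(*xs)) for xs in joint)
```

A Bayesian network gains its economy by dropping variables from the conditioning sets of these factors whenever an independence assumption allows it.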

Inference with Bayesian Networks

Prediction and abduction

  • $x$: a set of observations

  • $y$: a set of variables deemed important for prediction or diagnosis

Need to compute $P(y \mid x)$.

$P(y \mid x) = \frac{P(y, x)}{P(x)} = \frac{\sum_{s} P(y, x, s)}{\sum_{y, s} P(y, x, s)}$

where $s$ stands for the variables other than $x$ and $y$.
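The ratio of sums can be computed directly from a table of the joint distribution. The sketch below is one straightforward way to do it, assuming a hypothetical joint over three binary variables named `y`, `x` and `s`. The function name `conditional` and all numbers are illustrative.

```python
def conditional(joint, names, query, evidence):
    """P(query | evidence), obtained by summing the joint over all other variables.

    joint    : dict mapping full assignments (tuples) to probabilities
    names    : tuple of variable names, in the same order as the tuples
    query    : dict name -> value for the variables y
    evidence : dict name -> value for the observations x
    """
    def total(fixed):
        return sum(p for assignment, p in joint.items()
                   if all(assignment[names.index(n)] == v for n, v in fixed.items()))

    numerator = total({**query, **evidence})   # sum over s of P(y, x, s)
    denominator = total(evidence)              # sum over y and s of P(y, x, s)
    return numerator / denominator

# Hypothetical joint over (y, x, s); the eight entries sum to 1.
names = ("y", "x", "s")
joint = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.15, (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.05, (1, 0, 1): 0.10, (1, 1, 0): 0.25, (1, 1, 1): 0.10,
}

p_y_given_x = conditional(joint, names, {"y": 1}, {"x": 1})
```

Summing a full table takes time exponential in the number of variables, which is why the network structure is exploited in practice.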

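An example: The Network

Prior probabilities: $P(\mathit{tampering})$, $P(\mathit{fire})$

Directed edges: $(\mathit{tampering}, \mathit{alarm})$, $(\mathit{fire}, \mathit{alarm})$, $(\mathit{fire}, \mathit{smoke})$, $(\mathit{alarm}, \mathit{leaving})$, $(\mathit{leaving}, \mathit{report})$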

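Local probability distributions:

$P(\mathit{alarm} \mid \mathit{fire}, \mathit{tampering})$

$P(\mathit{alarm} \mid \mathit{fire}, \neg\mathit{tampering})$

$P(\mathit{alarm} \mid \neg\mathit{fire}, \mathit{tampering})$

$P(\mathit{alarm} \mid \neg\mathit{fire}, \neg\mathit{tampering})$

$P(\mathit{smoke} \mid \mathit{fire})$

$P(\mathit{smoke} \mid \neg\mathit{fire})$

$P(\mathit{leaving} \mid \mathit{alarm})$

$P(\mathit{leaving} \mid \neg\mathit{alarm})$

$P(\mathit{report} \mid \mathit{leaving})$

$P(\mathit{report} \mid \neg\mathit{leaving})$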
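A network of this shape can be stored as a parent list plus one small table per variable. The sketch below is an assumption-laden illustration: every number in `cpt` is a placeholder chosen for the example and is not taken from the slides, and the names `parents`, `cpt` and `joint_probability` are hypothetical.

```python
from itertools import product

# Parents of each variable, following the five directed edges of the network.
parents = {
    "tampering": (),
    "fire": (),
    "alarm": ("fire", "tampering"),
    "smoke": ("fire",),
    "leaving": ("alarm",),
    "report": ("leaving",),
}

# P(variable = True | parent values). Placeholder numbers, NOT the slide's values.
cpt = {
    "tampering": {(): 0.02},
    "fire": {(): 0.01},
    "alarm": {(True, True): 0.5, (True, False): 0.99,
              (False, True): 0.85, (False, False): 0.0001},
    "smoke": {(True,): 0.9, (False,): 0.01},
    "leaving": {(True,): 0.88, (False,): 0.001},
    "report": {(True,): 0.75, (False,): 0.01},
}

order = tuple(parents)  # insertion order above is already a topological order

def joint_probability(assignment):
    """Product of the local probabilities P(v | parents(v)) for one full assignment."""
    p = 1.0
    for var in order:
        key = tuple(assignment[q] for q in parents[var])
        p_true = cpt[var][key]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

# The factorised joint sums to 1 over all 2**6 assignments.
total = sum(joint_probability(dict(zip(order, values)))
            for values in product((True, False), repeat=len(order)))

# Numbers needed by the network versus an explicit joint table.
n_network_parameters = sum(len(table) for table in cpt.values())
n_joint_parameters = 2 ** len(order) - 1
```

With six binary variables the network needs 12 numbers where an explicit joint table would need 63, which is the economy of representation the goals refer to.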

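Different kinds of inferences

  • Diagnostic inferences: $P(\mathit{fire} \mid \mathit{report})$

  • Causal inferences (prediction): $P(\mathit{leaving} \mid \mathit{tampering})$

  • Intercausal inferences: $P(\mathit{fire} \mid \mathit{alarm}, \mathit{tampering})$

  • Mixed inferences: $P(\mathit{alarm} \mid \mathit{report}, \mathit{fire})$

An illustration:

$P(\mathit{tampering} \mid \mathit{report}, \mathit{smoke}) = \frac{P(\mathit{tampering}, \mathit{report}, \mathit{smoke})}{P(\mathit{report}, \mathit{smoke})}$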
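Each kind of inference is a ratio of two sums over the joint distribution, so all of them can be answered by brute-force enumeration on a network this small. The sketch below assumes placeholder probabilities (not the slide's values) and a hypothetical helper `query`. It is meant to show the mechanics, not an efficient algorithm.

```python
from itertools import product

parents = {
    "tampering": (), "fire": (),
    "alarm": ("fire", "tampering"), "smoke": ("fire",),
    "leaving": ("alarm",), "report": ("leaving",),
}

# P(variable = True | parent values). Placeholder numbers, NOT the slide's values.
cpt = {
    "tampering": {(): 0.02},
    "fire": {(): 0.01},
    "alarm": {(True, True): 0.5, (True, False): 0.99,
              (False, True): 0.85, (False, False): 0.0001},
    "smoke": {(True,): 0.9, (False,): 0.01},
    "leaving": {(True,): 0.88, (False,): 0.001},
    "report": {(True,): 0.75, (False,): 0.01},
}
order = tuple(parents)

def joint_probability(assignment):
    """Product of P(v | parents(v)) over all variables."""
    p = 1.0
    for var in order:
        p_true = cpt[var][tuple(assignment[q] for q in parents[var])]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

def query(target, evidence):
    """P(target = True | evidence), summing the joint over unobserved variables."""
    numerator = denominator = 0.0
    for values in product((True, False), repeat=len(order)):
        assignment = dict(zip(order, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint_probability(assignment)
        denominator += p
        if assignment[target]:
            numerator += p
    return numerator / denominator

diagnostic = query("fire", {"report": True})
causal = query("leaving", {"tampering": True})
intercausal = query("fire", {"alarm": True, "tampering": True})
fire_given_alarm = query("fire", {"alarm": True})
illustration = query("tampering", {"report": True, "smoke": True})
```

Under these placeholder numbers `intercausal` comes out far smaller than `fire_given_alarm`: once tampering is known, it accounts for the alarm and fire becomes much less likely.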


× 0. 9 × 0.

×

×

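Similarly, we can also compute $f_1(\mathit{alarm} = T, \mathit{tampering} = F)$, $f_1(\mathit{alarm} = F, \mathit{tampering} = T)$ and $f_1(\mathit{alarm} = F, \mathit{tampering} = F)$.

We can now write the denominator as:

$\sum_{\mathit{tampering}, \mathit{leaving}, \mathit{alarm}} P(\mathit{tampering})\, P(\mathit{leaving} \mid \mathit{alarm})\, P(\mathit{report} = T \mid \mathit{leaving})\, f_1(\mathit{alarm}, \mathit{tampering})$

$= \sum_{\mathit{tampering}, \mathit{leaving}} P(\mathit{tampering})\, P(\mathit{report} = T \mid \mathit{leaving}) \sum_{\mathit{alarm}} P(\mathit{leaving} \mid \mathit{alarm})\, f_1(\mathit{alarm}, \mathit{tampering})$

Let us denote $\sum_{\mathit{alarm}} P(\mathit{leaving} \mid \mathit{alarm})\, f_1(\mathit{alarm}, \mathit{tampering})$ by $f_2(\mathit{leaving}, \mathit{tampering})$. We can compute it as we computed $f_1$.

The denominator can now be written as:

$\sum_{\mathit{tampering}, \mathit{leaving}} P(\mathit{tampering})\, P(\mathit{report} = T \mid \mathit{leaving})\, f_2(\mathit{leaving}, \mathit{tampering})$

$= \sum_{\mathit{tampering}} P(\mathit{tampering}) \sum_{\mathit{leaving}} P(\mathit{report} = T \mid \mathit{leaving})\, f_2(\mathit{leaving}, \mathit{tampering})$

Let us denote $\sum_{\mathit{leaving}} P(\mathit{report} = T \mid \mathit{leaving})\, f_2(\mathit{leaving}, \mathit{tampering})$ by $f_3(\mathit{tampering})$ and compute it like the other $f_i$s.

The denominator can now be written as:

$\sum_{\mathit{tampering}} P(\mathit{tampering})\, f_3(\mathit{tampering})$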
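The successive factors $f_1$, $f_2$, $f_3$ amount to summing out one variable at a time. The sketch below follows that order and checks the result against a brute-force sum. Two things are assumptions: the definition of `f1` as the sum over fire of $P(\mathit{fire})\,P(\mathit{alarm} \mid \mathit{fire}, \mathit{tampering})\,P(\mathit{smoke} = T \mid \mathit{fire})$, which is inferred from the structure of the denominator, and every probability value, which is a placeholder rather than the slide's number.

```python
from itertools import product

B = (True, False)

# Placeholder values of P(variable = True | parents); NOT the slide's numbers.
p_tampering = 0.02
p_fire = 0.01
p_alarm = {(True, True): 0.5, (True, False): 0.99,
           (False, True): 0.85, (False, False): 0.0001}  # keyed by (fire, tampering)
p_smoke = {True: 0.9, False: 0.01}       # keyed by fire
p_leaving = {True: 0.88, False: 0.001}   # keyed by alarm
p_report = {True: 0.75, False: 0.01}     # keyed by leaving

def pr(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def f1(alarm, tampering):
    """Sum out fire (assumed definition, with smoke = T as evidence)."""
    return sum(pr(p_fire, fire) * pr(p_alarm[(fire, tampering)], alarm) * p_smoke[fire]
               for fire in B)

def f2(leaving, tampering):
    """Sum out alarm: sum over alarm of P(leaving | alarm) f1(alarm, tampering)."""
    return sum(pr(p_leaving[alarm], leaving) * f1(alarm, tampering) for alarm in B)

def f3(tampering):
    """Sum out leaving: sum over leaving of P(report = T | leaving) f2(leaving, tampering)."""
    return sum(p_report[leaving] * f2(leaving, tampering) for leaving in B)

# Denominator P(report = T, smoke = T) and the posterior of tampering.
denominator = sum(pr(p_tampering, t) * f3(t) for t in B)
numerator = p_tampering * f3(True)
posterior = numerator / denominator

# Brute-force check: sum the full product over all four unobserved variables.
brute_force = sum(
    pr(p_tampering, t) * pr(p_fire, f) * pr(p_alarm[(f, t)], a)
    * p_smoke[f] * pr(p_leaving[a], lv) * p_report[lv]
    for t, f, a, lv in product(B, repeat=4)
)
```

Each intermediate factor depends on at most two variables, so the work stays small even though the brute-force sum ranges over all sixteen combinations.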

Main issues and challenges

  • Computing the conditional probabilities efficiently

  • Inference in general networks is NP-hard

  • Many efficient algorithms are defined for particular kinds of networks (say, for trees).

  • Algorithm based on message passing architecture for trees.

  • Join-tree propagation

  • Cutset conditioning

  • Hybrid combinations of the above two

  • Approximation methods: stochastic simulation.
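As an illustration of the last item, the sketch below uses rejection sampling, one simple form of stochastic simulation: variables are sampled in topological order and samples that disagree with the evidence are discarded. The three-variable fragment, its placeholder probabilities and the function names are assumptions made for the example.

```python
import random

# Placeholder P(alarm = True | fire, tampering); NOT the slide's numbers.
P_ALARM = {(True, True): 0.5, (True, False): 0.99,
           (False, True): 0.85, (False, False): 0.0001}
P_FIRE = 0.01
P_TAMPERING = 0.02

def sample_once(rng):
    """Forward-sample (fire, tampering, alarm) in topological order."""
    fire = rng.random() < P_FIRE
    tampering = rng.random() < P_TAMPERING
    alarm = rng.random() < P_ALARM[(fire, tampering)]
    return fire, tampering, alarm

def estimate_fire_given_alarm(n_samples, seed=0):
    """Estimate P(fire | alarm) by keeping only the samples in which alarm is True."""
    rng = random.Random(seed)
    accepted = hits = 0
    for _ in range(n_samples):
        fire, _tampering, alarm = sample_once(rng)
        if alarm:
            accepted += 1
            hits += fire
    return hits / accepted, accepted

estimate, accepted = estimate_fire_given_alarm(200_000)

# Exact value for comparison, computed from the same placeholder numbers.
p_alarm_and_fire = P_FIRE * (P_TAMPERING * 0.5 + (1 - P_TAMPERING) * 0.99)
p_alarm_no_fire = (1 - P_FIRE) * (P_TAMPERING * 0.85 + (1 - P_TAMPERING) * 0.0001)
exact = p_alarm_and_fire / (p_alarm_and_fire + p_alarm_no_fire)
```

Because the evidence is rare, most samples are thrown away. That inefficiency is the usual motivation for weighted schemes such as likelihood weighting.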

Causal networks can predict the effect of actions. (Simple joint distributions cannot.)

Stability and autonomy

Autonomy: It is possible to change one parent-child relationship in the network without changing the others.

Stability: One can predict the effect of external interventions with minimum of extra information.

Autonomy and intervention: Instead of specifying a new probability function for each of the many possible interventions, we specify merely the immediate changes implied by the intervention. Because of autonomy, the change is local.

Definition: Causal Bayesian network

Let $P(v)$ be a probability distribution on a set $V$ of variables, and let $P_x(v)$ denote the distribution resulting from the intervention $do(X = x)$ which sets any subset $X$ of variables to constants $x$. Denote by $\mathbf{P}_*$ the set of all interventional distributions $P_x(v)$, $X \subseteq V$, including $P(v)$, which represents no intervention. A DAG $G$ is said to be a causal Bayesian network compatible with $\mathbf{P}_*$ iff the following three conditions hold for every $P_x \in \mathbf{P}_*$:

(i) $P_x(v)$ is Markov relative to $G$.

(ii) $P_x(v_i) = 1$, for all $V_i \in X$, whenever $v_i$ is consistent with $X = x$.

(iii) $P_x(v_i \mid pa_i) = P(v_i \mid pa_i)$ for all $V_i \notin X$, whenever $pa_i$ is consistent with $X = x$.

Properties:

  • For all $v$ consistent with $x$: $P_x(v) = \prod_{\{i \mid V_i \notin X\}} P(v_i \mid pa_i)$

  • For all $i$: $P(v_i \mid pa_i) = P_{pa_i}(v_i)$

(The above ensures that conditional probabilities with respect to parents correspond to causal effects.)
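The first property (the truncated factorization) is easy to exercise on a tiny network. The sketch below assumes a hypothetical three-variable DAG with edges z → x, z → y and x → y and made-up probabilities. It computes $P_x(v)$ by dropping the factor of the intervened variable, then contrasts intervening on `x` with merely observing it.

```python
from itertools import product

# Hypothetical causal Bayesian network; z confounds x and y.
parents = {"z": (), "x": ("z",), "y": ("x", "z")}
cpt = {  # P(variable = 1 | parent values), illustrative numbers
    "z": {(): 0.5},
    "x": {(1,): 0.8, (0,): 0.2},                                  # keyed by (z,)
    "y": {(1, 1): 0.9, (1, 0): 0.5, (0, 1): 0.6, (0, 0): 0.1},    # keyed by (x, z)
}
order = ("z", "x", "y")

def local(var, assignment):
    """P(v_i | pa_i) for the values in `assignment`."""
    p_one = cpt[var][tuple(assignment[p] for p in parents[var])]
    return p_one if assignment[var] == 1 else 1.0 - p_one

def p_do(assignment, intervention):
    """P_x(v): product of P(v_i | pa_i) over the variables NOT intervened on.

    Returns 0 when the assignment disagrees with the intervention.
    An empty intervention gives the ordinary joint P(v).
    """
    if any(assignment[k] != v for k, v in intervention.items()):
        return 0.0
    p = 1.0
    for var in order:
        if var not in intervention:
            p *= local(var, assignment)
    return p

def assignments():
    for values in product((0, 1), repeat=len(order)):
        yield dict(zip(order, values))

# Intervening: P(y = 1 | do(x = 1))
p_y_do_x = sum(p_do(a, {"x": 1}) for a in assignments() if a["y"] == 1)

# Observing: P(y = 1 | x = 1), computed from the unmodified joint
p_x = sum(p_do(a, {}) for a in assignments() if a["x"] == 1)
p_y_given_x = sum(p_do(a, {}) for a in assignments()
                  if a["x"] == 1 and a["y"] == 1) / p_x

# Second property: setting all parents of y gives back P(y | pa_y)
p_y_do_parents = sum(p_do(a, {"x": 1, "z": 1}) for a in assignments() if a["y"] == 1)
interventional_total = sum(p_do(a, {"x": 1}) for a in assignments())
```

With these numbers `p_y_do_x` is 0.7 while `p_y_given_x` is 0.82. The gap arises because observing `x = 1` also carries information about the common cause `z`, whereas setting `x` does not.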

$S_1$ remains true regardless of what we learn or know about the season or the pavement.

Falling barometer predicts rain, does not explain it.

Functional Causal Models

Two views of non-determinism

  • Laplace's (1814) conception of natural phenomena: Nature's laws are deterministic, and randomness surfaces merely due to our ignorance of the underlying boundary condition.

  • Modern (quantum mechanical) conception of physics: All relationships are inherently stochastic.

Why Pearl's book uses Laplace's conception of causality (besides the fact that it is used in genetics, econometrics and social sciences)

  • It is more general. Every stochastic model can be emulated by many functional relationships (with stochastic inputs), but not the other way round.

$F$ is a set of functions $\{f_1, \ldots, f_n\}$ giving rise to a set of structural equations of the form:

$x_i = f_i(pa_i, u_i), \quad i = 1, \ldots, n$
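Structural equations of this form can be written down as ordinary functions and solved in topological order. The sketch below is a hypothetical three-variable model: the variable names `rain`, `sprinkler` and `wet`, the thresholds, and the choice of uniform noise terms are all assumptions made for illustration.

```python
import random

# Each entry: variable -> (names of its parents pa_i, function f_i(pa_i, u_i)).
# Every u_i is assumed uniform on [0, 1) and independent of the others.
structural_equations = {
    "rain":      ((),                    lambda pa, u: u < 0.3),
    "sprinkler": (("rain",),             lambda pa, u: (not pa["rain"]) and u < 0.5),
    "wet":       (("rain", "sprinkler"), lambda pa, u: (pa["rain"] or pa["sprinkler"]) and u < 0.9),
}

def solve(u):
    """Evaluate x_i = f_i(pa_i, u_i) in topological order for one draw of the u_i."""
    x = {}
    for name, (pa_names, f) in structural_equations.items():
        x[name] = f({p: x[p] for p in pa_names}, u[name])
    return x

# Once u is fixed, the solution is fully determined.
fixed_solution = solve({"rain": 0.1, "sprinkler": 0.2, "wet": 0.5})

# Randomness enters only through the u_i.
rng = random.Random(0)
samples = [solve({name: rng.random() for name in structural_equations})
           for _ in range(50_000)]
wet_rate = sum(s["wet"] for s in samples) / len(samples)
```

In this sketch the exact value of P(wet) is 0.9 × (0.3 + 0.7 × 0.5) = 0.585, and `wet_rate` should land close to it.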

Types of queries that can be answered using functional causal models

  • Prediction: Would the pavement be slippery if we find the sprinkler off?

  • Interventions: Would the pavement be slippery if we make sure that the sprinkler is off?

  • Counterfactuals: Would the pavement be slippery had the sprinkler been off, given that the pavement is in fact not slippery and the sprinkler is on?
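The three queries differ in how evidence and actions are combined, which a small functional model makes concrete. The sketch below assumes a hypothetical model with binary background variables. The equations, the probabilities and the helper names are illustrative only.

```python
from itertools import product

# P(u = 1) for independent binary background variables (illustrative numbers).
p_u = {"u_rain": 0.3, "u_sprinkler": 0.5, "u_slip": 0.9}

def solve(u, do=None):
    """Solve the structural equations for one u; `do` replaces equations by constants."""
    do = do or {}
    x = {}
    x["rain"] = do.get("rain", u["u_rain"] == 1)
    x["sprinkler"] = do.get("sprinkler", (not x["rain"]) and u["u_sprinkler"] == 1)
    x["slippery"] = do.get("slippery", (x["rain"] or x["sprinkler"]) and u["u_slip"] == 1)
    return x

def worlds():
    """Yield (u, P(u)) for every joint value of the background variables."""
    names = tuple(p_u)
    for values in product((0, 1), repeat=len(names)):
        u = dict(zip(names, values))
        weight = 1.0
        for n in names:
            weight *= p_u[n] if u[n] == 1 else 1.0 - p_u[n]
        yield u, weight

def probability(event, evidence=None, do=None):
    """P(event holds in the model modified by `do` | evidence seen in the actual model)."""
    evidence = evidence or {}
    num = den = 0.0
    for u, w in worlds():
        actual = solve(u)
        if any(actual[k] != v for k, v in evidence.items()):
            continue                      # u is incompatible with what was observed
        den += w
        outcome = solve(u, do)            # same u, possibly modified equations
        if all(outcome[k] == v for k, v in event.items()):
            num += w
    return num / den

# Prediction: we FIND the sprinkler off.
prediction = probability({"slippery": True}, evidence={"sprinkler": False})
# Intervention: we MAKE SURE the sprinkler is off.
intervention = probability({"slippery": True}, do={"sprinkler": False})
# Counterfactual: sprinkler on and pavement not slippery; what if it had been off?
counterfactual = probability({"slippery": True},
                             evidence={"slippery": False, "sprinkler": True},
                             do={"sprinkler": False})
```

In this sketch finding the sprinkler off raises the probability of rain, so `prediction` (about 0.415) exceeds `intervention` (0.27). The counterfactual is 0, because the evidence rules out rain and switching the sprinkler off removes the only remaining cause.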

Prediction using Markovian causal models:

Causal diagram: A graph obtained by having edges from each member of $PA_i$ to $X_i$.

If the causal diagram is acyclic then the corresponding model is called semi-Markovian. In that case the values of the $X$ variables will be uniquely determined by the $U$ variables. The joint distribution $P(x_1, \ldots, x_n)$ is determined uniquely by the distribution $P(u)$ of the error variables.

If in addition the error terms are mutually independent, the model is called Markovian.

Theorem (Pearl and Verma): Every Markovian causal model $M$ induces a distribution $P(x_1, \ldots, x_n)$ that satisfies the Markov condition relative to the causal diagram $G$ associated with $M$; that is, each variable $X_i$ is independent of all its non-descendants, given its parents $PA_i$ in $G$.

Theorem (Druzdzel and Simon): For every Bayesian network $G$ characterized by a distribution $P$, there exists a functional model that generates a distribution identical to $P$.
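One way to see why such a functional model exists is to turn each conditional probability into a threshold on an independent uniform noise term. The sketch below does this for a hypothetical two-node network A → B. It is an illustration of the idea under assumed numbers, not a reproduction of the construction used in the theorem's proof.

```python
import random

# Hypothetical Bayesian network A -> B, given by its conditional probabilities.
p_a = 0.3
p_b_given_a = {True: 0.8, False: 0.1}

# Functional model: each variable is a deterministic function of its parents
# and of its own noise term, which is uniform on [0, 1) and independent.
def f_a(u_a):
    return u_a < p_a

def f_b(a, u_b):
    return u_b < p_b_given_a[a]

rng = random.Random(1)
n = 100_000
counts = {(a, b): 0 for a in (True, False) for b in (True, False)}
for _ in range(n):
    a = f_a(rng.random())
    b = f_b(a, rng.random())
    counts[(a, b)] += 1

# Distribution generated by the functional model versus the network's own joint.
empirical = {k: c / n for k, c in counts.items()}
exact = {
    (a, b): (p_a if a else 1 - p_a) * (p_b_given_a[a] if b else 1 - p_b_given_a[a])
    for a in (True, False) for b in (True, False)
}
max_gap = max(abs(empirical[k] - exact[k]) for k in exact)
```

Since the noise terms are mutually independent, the model in this sketch is Markovian, and each variable depends on the rest only through its parents.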

Advantages of doing prediction using causal-functional specification over the probabilistic specification

When organizing knowledge using Markov causal models reliable