Bayesian Networks 1, Lecture notes of Probability and Statistics

Lecture notes on Bayesian networks and inference with Boolean and continuous variables. They cover probability, Bayesian inference, and explaining away, and explain how to define a belief network and compute inferences with it. Likely to be useful as study notes for university students studying Artificial Intelligence or related topics.

Typology: Lecture notes

2021/2022

Uploaded on 05/11/2023

Artificial Intelligence 15-381
Mar 27, 2007
Bayesian Networks 1
Michael S. Lewicki, Carnegie Mellon
AI: Bayes Nets 1

Recap of last lecture

  • Probability: precise representation of uncertainty
  • Probability theory: optimal updating of knowledge based on new information
  • Bayesian inference with Boolean variables
  • Inference combines sources of knowledge
  • Inference is sequential

Bayes' rule gives the posterior as the likelihood times the prior, divided by the normalizing constant:

$$P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid \bar{D})\,P(\bar{D})}$$

$$P(D \mid T) = \frac{0.9 \times 0.001}{0.9 \times 0.001 + 0.1 \times 0.999} = 0.0089$$

Sequential inference with two test results:

$$P(D \mid T_1, T_2) = \frac{P(T_2 \mid D)\,P(T_1 \mid D)\,P(D)}{P(T_2)\,P(T_1)}$$
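The single-test calculation in the recap can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `posterior` is made up for illustration, and the second update assumes a second positive test with the same error rates that is conditionally independent of the first given D. Each update renormalizes over D and not-D.

```python
def posterior(prior, p_pos_given_d, p_pos_given_not_d):
    """Return P(D | T) for one positive test result, using Bayes' rule."""
    numerator = p_pos_given_d * prior                      # likelihood * prior
    evidence = numerator + p_pos_given_not_d * (1.0 - prior)  # normalizing constant
    return numerator / evidence


# Numbers from the recap: P(T|D) = 0.9, P(T|not D) = 0.1, P(D) = 0.001.
p_after_one = posterior(0.001, 0.9, 0.1)

# Sequential inference: the first posterior becomes the prior for the
# second test (ASSUMPTION: same error rates, independent given D).
p_after_two = posterior(p_after_one, 0.9, 0.1)

print(f"after one positive test:  {p_after_one:.4f}")
print(f"after two positive tests: {p_after_two:.4f}")
```

Reusing the posterior as the next prior is what "inference is sequential" amounts to in this sketch.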

Bayesian inference with continuous variables (recap)

$$p(\theta \mid y, n) = \frac{p(y \mid \theta, n)\,p(\theta \mid n)}{p(y \mid n)}, \qquad p(y \mid n) = \int p(y \mid \theta, n)\,p(\theta \mid n)\,d\theta$$

Here p(θ|y,n) is the posterior, p(y|θ,n) the likelihood, p(θ|n) the prior, and p(y|n) the normalizing constant.

Figure: the prior (uniform), p(θ | y=0, n=0), plotted over θ from 0 to 1; the likelihood (Binomial), p(y | θ, n=5), for θ = 0.05, 0.2, 0.35 and 0.5; and the posterior (beta), p(θ | y=1, n=5), plotted over θ from 0 to 1.

$$p(\theta \mid y, n) \propto \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y}$$

Today: Inference with more complex dependencies

  • How do we represent (model) more complex probabilistic relationships?
  • How do we use these models to draw inferences?
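Before moving on, the continuous-variable recap can be checked numerically. The sketch below is illustrative only: the helper name `grid_posterior` and the grid size are assumptions, not something from the slides. It evaluates the Binomial likelihood on a grid of θ values, assumes a uniform prior, and approximates the normalizing constant p(y|n) with the trapezoidal rule, using the y = 1, n = 5 case from the figure.

```python
from math import comb


def grid_posterior(y, n, num_points=1001):
    """Approximate p(theta | y, n) on a uniform grid, assuming a uniform prior."""
    thetas = [i / (num_points - 1) for i in range(num_points)]
    # Unnormalized posterior: Binomial likelihood times a constant prior of 1.
    unnorm = [comb(n, y) * t**y * (1 - t) ** (n - y) for t in thetas]
    # The trapezoidal rule approximates the normalizing constant p(y | n).
    step = thetas[1] - thetas[0]
    evidence = step * (sum(unnorm) - 0.5 * (unnorm[0] + unnorm[-1]))
    return thetas, [u / evidence for u in unnorm], evidence


thetas, posterior_density, evidence = grid_posterior(y=1, n=5)
step = thetas[1] - thetas[0]
posterior_mean = step * sum(t * p for t, p in zip(thetas, posterior_density))

print(f"p(y=1 | n=5) is approximately {evidence:.4f}")
print(f"posterior mean of theta is approximately {posterior_mean:.4f}")
```

With a uniform prior the posterior is a Beta(y+1, n-y+1) density, so the grid result can be compared with the closed-form mean (y+1)/(n+2).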

How do we represent these relationships?

Types of probabilistic relationships

  • Direct cause (A → B): specify P(B|A).
  • Indirect cause (A → B → C): specify P(B|A) and P(C|B). C is independent of A given B.
  • Common cause (A → B and A → C): specify P(B|A) and P(C|A). Are B and C independent?
  • Common effect (A → C and B → C): specify P(C|A,B). Are A and B independent?

Belief networks

Figure: a belief network in which "wife" and "burglar" each point to "open door".

  • In belief networks, causal relationships are represented in directed acyclic graphs.
  • Arrows indicate causal relationships between the nodes.

How can we determine what is happening before we go in? We need more information. What else can we observe?
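Stepping back to the relationship types listed earlier, the independence statements can be explored numerically on two tiny three-node networks. This is only a sketch: every probability below is a hypothetical number chosen for illustration, and the helper names (`bern`, `chain`, `common_effect`, `prob`) are not from the slides.

```python
from itertools import product


def bern(p_true, value):
    """Probability of a Boolean value when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def chain(a, b, c):
    """Indirect cause A -> B -> C (HYPOTHETICAL probabilities)."""
    return bern(0.3, a) * bern(0.9 if a else 0.2, b) * bern(0.7 if b else 0.1, c)


def common_effect(a, b, c):
    """Common effect A -> C <- B (HYPOTHETICAL probabilities)."""
    p_c = {(True, True): 0.95, (True, False): 0.8,
           (False, True): 0.7, (False, False): 0.05}[(a, b)]
    return bern(0.3, a) * bern(0.6, b) * bern(p_c, c)


def prob(joint, **fixed):
    """Sum the joint over all assignments that match the fixed values."""
    total = 0.0
    for a, b, c in product((True, False), repeat=3):
        values = {"a": a, "b": b, "c": c}
        if all(values[name] == v for name, v in fixed.items()):
            total += joint(a, b, c)
    return total


# Chain: once B is known, A adds nothing about C.
chain_c_given_ab = prob(chain, a=True, b=True, c=True) / prob(chain, a=True, b=True)
chain_c_given_b = prob(chain, b=True, c=True) / prob(chain, b=True)

# Common effect: A and B are independent until C is observed.
ce_a = prob(common_effect, a=True)
ce_a_given_b = prob(common_effect, a=True, b=True) / prob(common_effect, b=True)
ce_a_given_c = prob(common_effect, a=True, c=True) / prob(common_effect, c=True)
ce_a_given_bc = (prob(common_effect, a=True, b=True, c=True)
                 / prob(common_effect, b=True, c=True))

print(chain_c_given_ab, chain_c_given_b)
print(ce_a, ce_a_given_b, ce_a_given_c, ce_a_given_bc)
```

In the common-effect network, learning B changes the belief in A only after C has been observed, which is the pattern the open-door example relies on.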

Explaining away

Figure: the same network with a new node, "car in garage", linked to "wife".

  • Suppose we notice that the car is in the garage.
  • Now we infer that it’s probably my wife, and not a burglar.
  • This fact “explains away” the hypothesis of a burglar.

Note that there is no direct causal link between “burglar” and “car in garage”. Yet seeing the car changes our beliefs about the burglar.

Figure: the network extended with a further node, "damaged door", linked to "burglar".

  • We could also notice the door was damaged, in which case we reach the opposite conclusion.

How do we make this inference process more precise? Let’s start by writing down the conditional probabilities.

Defining the belief network

  • Each link in the graph represents a conditional relationship between nodes.
  • To compute the inference, we must specify the conditional probabilities.
  • Let’s start with the open door. What do we specify?

    W   B   P(O|W,B)
    F   F   0.
    F   T   0.
    T   F   0.
    T   T   0.

What else do we need to specify? The prior probabilities: P(W) and P(B).

Finally, we specify the remaining conditionals:

    W   P(C|W)
    F   0.
    T   0.

    B   P(D|B)
    F   0.
    T   0.

Figure: the full network. "wife" and "burglar" point to "open door", "wife" points to "car in garage", and "burglar" points to "damaged door".

Now what?
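One way to see what has to be specified is to count table rows. The sketch below encodes only the graph structure of the door network; the dictionary name `parents` and the node labels are illustrative choices. It counts one probability per combination of parent values for each Boolean node and compares the total with the size of an unstructured joint table.

```python
# Parents of each Boolean node in the door network from the notes.
parents = {
    "wife": [],
    "burglar": [],
    "open_door": ["wife", "burglar"],
    "car_in_garage": ["wife"],
    "damaged_door": ["burglar"],
}


def cpt_rows(node):
    """Rows in the node's table: one per combination of parent values."""
    return 2 ** len(parents[node])


rows_per_node = {node: cpt_rows(node) for node in parents}
network_parameters = sum(rows_per_node.values())

# A full joint table over N Boolean variables needs 2^N - 1 free numbers.
full_joint_parameters = 2 ** len(parents) - 1

print(rows_per_node)
print(network_parameters, "numbers for the network vs",
      full_joint_parameters, "for the full joint")
```

The two priors contribute one number each, and the open-door table contributes the four rows shown above.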

Calculating probabilities using the joint distribution

  • P(o,w,¬b,c,¬d) = P(o|w,¬b) P(c|w) P(¬d|¬b) P(w) P(¬b) = 0.05 × 0.95 × 0.999 × 0.05 × 0.999 ≈ 0.0024
  • This is essentially the probability that my wife is home and leaves the door open.

Calculating probabilities in a general Bayesian belief network

  • Note that by specifying all the conditional probabilities, we have also specified the joint probability. For the directed graph in the figure (nodes A, B, C, D, E):

    P(A,B,C,D,E) = P(A) P(B|C) P(C|A) P(D|C,E) P(E|A,C)

  • The general expression is:

$$P(x_1, \ldots, x_n) \equiv P(X_1 = x_1 \wedge \cdots \wedge X_n = x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))$$

  • With this we can calculate (in principle) any joint probability.
  • This implies that we can also calculate any conditional probability.

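The product formula can be sketched in Python using the door network. The table below contains only the five entries that appear in the worked example of the notes, so the function can evaluate that one assignment and will raise a `KeyError` for any other. The names `parents`, `cpt` and `joint_probability`, and the single-letter node labels, are illustrative assumptions.

```python
# Parents of each node: W (wife), B (burglar), O (open door),
# C (car in garage), D (damaged door).
parents = {"W": (), "B": (), "O": ("W", "B"), "C": ("W",), "D": ("B",)}

# P(node = value | parent values), keyed by (node, value, parent values).
# Only the entries used in the notes' worked example are listed.
cpt = {
    ("W", True, ()): 0.05,              # P(w)
    ("B", False, ()): 0.999,            # P(not b)
    ("O", True, (True, False)): 0.05,   # P(o | w, not b)
    ("C", True, (True,)): 0.95,         # P(c | w)
    ("D", False, (False,)): 0.999,      # P(not d | not b)
}


def joint_probability(assignment):
    """Multiply P(x_i | parents(X_i)) over all nodes of the network."""
    prob = 1.0
    for node, node_parents in parents.items():
        parent_values = tuple(assignment[p] for p in node_parents)
        prob *= cpt[(node, assignment[node], parent_values)]
    return prob


p = joint_probability({"O": True, "W": True, "B": False, "C": True, "D": False})
print(f"P(o, w, not b, c, not d) = {p:.6f}")
```

Keying the table by node, value and parent values keeps the lookup a direct transcription of P(x_i | parents(X_i)).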
Calculating conditional probabilities

  • Using the joint we can compute any conditional probability too.
  • The conditional probability of any one subset of variables given another disjoint subset is

$$P(S_1 \mid S_2) = \frac{P(S_1 \wedge S_2)}{P(S_2)} = \frac{\sum_{p \in S_1 \wedge S_2} p}{\sum_{p \in S_2} p}$$

    where p ∈ S is shorthand for all the entries of the joint matching subset S.
  • How many terms are in this sum? $2^N$

The number of terms in the sums is exponential in the number of variables. In fact, querying general Bayes nets is NP-complete.

So what do we do?

  • There are special cases of Bayes nets for which there are fast, exact algorithms:
    • variable elimination
    • belief propagation
  • There are also many approximations:
    • stochastic (MCMC) approximations
    • approximations
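Before turning to faster methods, the brute-force sum over joint entries can be sketched directly. The code below enumerates all 2^5 assignments of the door network. Five table entries come from the worked example in the notes; the remaining five are HYPOTHETICAL placeholders, marked in the comments, because their values are not legible in the source. All names are illustrative.

```python
from itertools import product

NODES = ("W", "B", "O", "C", "D")
PARENTS = {"W": (), "B": (), "O": ("W", "B"), "C": ("W",), "D": ("B",)}

# P(node = True | parent values), keyed by (node, parent values).
P_TRUE = {
    ("W", ()): 0.05,              # from the worked example
    ("B", ()): 0.001,             # 1 - 0.999, from the worked example
    ("O", (True, False)): 0.05,   # from the worked example
    ("O", (False, False)): 0.01,  # HYPOTHETICAL placeholder
    ("O", (False, True)): 0.25,   # HYPOTHETICAL placeholder
    ("O", (True, True)): 0.75,    # HYPOTHETICAL placeholder
    ("C", (True,)): 0.95,         # from the worked example
    ("C", (False,)): 0.01,        # HYPOTHETICAL placeholder
    ("D", (False,)): 0.001,       # 1 - 0.999, from the worked example
    ("D", (True,)): 0.5,          # HYPOTHETICAL placeholder
}


def joint(assignment):
    """Product of P(x_i | parents(X_i)) for one full assignment."""
    prob = 1.0
    for node in NODES:
        parent_values = tuple(assignment[p] for p in PARENTS[node])
        p_true = P_TRUE[(node, parent_values)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


def conditional(query, evidence):
    """P(query | evidence): ratio of two sums over matching joint entries."""
    numerator = denominator = 0.0
    for values in product((True, False), repeat=len(NODES)):
        a = dict(zip(NODES, values))
        if all(a[k] == v for k, v in evidence.items()):
            p = joint(a)
            denominator += p
            if all(a[k] == v for k, v in query.items()):
                numerator += p
    return numerator / denominator


p_b_given_o = conditional({"B": True}, {"O": True})
p_b_given_o_c = conditional({"B": True}, {"O": True, "C": True})
p_b_given_o_d = conditional({"B": True}, {"O": True, "D": True})

print(f"P(b | o)    = {p_b_given_o:.4f}")
print(f"P(b | o, c) = {p_b_given_o_c:.4f}")
print(f"P(b | o, d) = {p_b_given_o_d:.4f}")
```

Under these placeholder numbers, seeing the car lowers the belief in a burglar and seeing a damaged door raises it, which matches the qualitative "explaining away" story. The loop visits every joint entry, so its cost doubles with each added variable.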

A general one-layer causal network

  • Could either model causes and effects
  • Or equivalently stochastic binary features.
  • Each input x_i encodes the probability that the ith binary input feature is present.
  • The set of features represented by #j is defined by weights f_ij which encode the probability that feature i is an instance of #j.

The data: a set of stochastic binary patterns

Figure (panels a, b, c): each column is a distinct eight-dimensional binary feature.

There are five underlying causal feature patterns. What are they?

Figure (panels a, b, c): the true hidden causes of the data.

This is a learning problem, which we’ll cover in a later lecture.

Hierarchical Statistical Models

Figure: a Bayesian belief network with states S_i, their parents pa(S_i), and the data D.

The joint probability of binary states is

$$P(S \mid W) = \prod_i P(S_i \mid \mathrm{pa}(S_i), W)$$

The probability of S_i depends only on its parents:

$$P(S_i \mid \mathrm{pa}(S_i), W) = \begin{cases} h\left(\sum_j S_j w_{ji}\right) & \text{if } S_i = 1 \\ 1 - h\left(\sum_j S_j w_{ji}\right) & \text{if } S_i = 0 \end{cases}$$

The function h specifies how causes are combined: h(u) = 1 − exp(−u), u > 0.

Main points:

  • hierarchical structure allows model to form high order representations
  • upper states are priors for lower states
  • weights encode higher order features
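The conditional for a single state can be sketched as follows. This is a minimal illustration of the formula with h(u) = 1 − exp(−u); the function name `p_state`, the parent states and the weights in the usage lines are hypothetical, and the weights are assumed to be non-negative.

```python
import math


def h(u):
    """Combination function from the notes: h(u) = 1 - exp(-u)."""
    return 1.0 - math.exp(-u)


def p_state(s_i, parent_states, weights):
    """P(S_i = s_i | pa(S_i), W) for binary parent states S_j and weights w_ji."""
    u = sum(s_j * w_ji for s_j, w_ji in zip(parent_states, weights))
    p_on = h(u)
    return p_on if s_i == 1 else 1.0 - p_on


# HYPOTHETICAL example: three parents, the first and third are active.
p_on = p_state(1, [1, 0, 1], [0.5, 2.0, 1.0])
p_off = p_state(0, [1, 0, 1], [0.5, 2.0, 1.0])
print(f"P(S_i = 1 | parents) = {p_on:.4f}")
print(f"P(S_i = 0 | parents) = {p_off:.4f}")
```

With this choice of h, the probability that S_i stays off is the product of exp(−w_ji) over the active parents, so each active cause independently gets a chance to switch S_i on.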