Artificial Intelligence
Mar 27, 2007
Bayesian Networks 1
AI: Bayes Nets 1, Michael S. Lewicki, Carnegie Mellon
Recap of last lecture
• Probability: precise representation of uncertainty
• Probability theory: optimal updating of knowledge based on new information
• Bayesian inference with Boolean variables
• Inference combines sources of knowledge
• Inference is sequential
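Bayes' rule (posterior = likelihood × prior / normalizing constant):
$$P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid \bar{D})\,P(\bar{D})}$$
$$P(D \mid T) = \frac{0.9 \times 0.001}{0.9 \times 0.001 + 0.1 \times 0.999} \approx 0.0089$$
$$P(D \mid T_1, T_2) = \frac{P(T_2 \mid D)\,P(T_1 \mid D)\,P(D)}{P(T_2)\,P(T_1)}$$
Here $P(T_2)$ is the normalizing constant of the second update, that is, the probability of $T_2$ once $T_1$ has already been observed.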
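The recap formulas can be checked numerically. The sketch below uses the numbers from the slide for the first test; the second test is an assumption made only for illustration (same error rates as the first, and independent of it given D). The function name `posterior` is hypothetical.

```python
def posterior(prior, p_pos_given_d, p_pos_given_not_d):
    """Bayes' rule for a Boolean hypothesis D after one positive test T."""
    numerator = p_pos_given_d * prior
    # Normalizing constant: total probability of a positive test.
    evidence = numerator + p_pos_given_not_d * (1.0 - prior)
    return numerator / evidence

# Numbers from the recap: P(T|D) = 0.9, P(T|not D) = 0.1, P(D) = 0.001.
after_one_test = posterior(0.001, 0.9, 0.1)

# Sequential inference: the first posterior becomes the prior for the second test.
# Assumption: the second test has the same error rates as the first.
after_two_tests = posterior(after_one_test, 0.9, 0.1)

print(f"P(D|T1)    = {after_one_test:.4f}")   # 0.0089
print(f"P(D|T1,T2) = {after_two_tests:.4f}")  # 0.0750
```

Even after a positive result the disease remains unlikely because the prior is so small; a second positive result raises the posterior by roughly an order of magnitude.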
Bayesian inference with continuous variables (recap)
$$p(\theta \mid y, n) = \frac{p(y \mid \theta, n)\,p(\theta \mid n)}{p(y \mid n)}, \qquad p(y \mid n) = \int p(y \mid \theta, n)\,p(\theta \mid n)\,d\theta$$
(posterior = likelihood × prior / normalizing constant)
$$p(\theta \mid y, n) \propto \binom{n}{y}\,\theta^{y}(1-\theta)^{n-y}$$
[Figure: prior (uniform), p(θ | y=0, n=0); likelihood (Binomial), p(y | θ, n=5) for θ = 0.05, 0.2, 0.35, 0.5; posterior (beta), p(θ | y=1, n=5).]
Today: Inference with more complex dependencies
- How do we represent (model) more complex probabilistic relationships?
- How do we use these models to draw inferences?
- How do we represent these relationships?
Types of probabilistic relationships
- Direct cause: A → B, specified by P(B|A).
- Indirect cause: A → B → C, specified by P(B|A) and P(C|B). C is independent of A given B.
- Common cause: A → B and A → C, specified by P(B|A) and P(C|A). Are B and C independent?
- Common effect: A → C and B → C, specified by P(C|A,B). Are A and B independent?
Belief networks
[Figure: belief network with edges wife → open door and burglar → open door.]
- In belief networks, causal relationships are represented in directed acyclic graphs.
- Arrows indicate causal relationships between the nodes.
How can we determine what is happening before we go in? We need more information. What else can we observe?
Explaining away
[Figure: the same network with a new node, car in garage, whose parent is wife.]
- Suppose we notice that the car is in the garage.
- Now we infer that it's probably my wife, and not a burglar.
- This fact "explains away" the hypothesis of a burglar.
Note that there is no direct causal link between "burglar" and "car in garage". Yet, seeing the car changes our beliefs about the burglar.
[Figure: the same network with a new node, damaged door, whose parent is burglar.]
- We could also notice the door was damaged, in which case we reach the opposite conclusion.
How do we make this inference process more precise? Let's start by writing down the conditional probabilities.
Defining the belief network
[Figure: belief network with edges wife → open door, burglar → open door, wife → car in garage, burglar → damaged door.]
- Each link in the graph represents a conditional relationship between nodes.
- To compute the inference, we must specify the conditional probabilities.
- Let's start with the open door. What do we specify?

W   B   P(O|W,B)
F   F   0.
F   T   0.
T   F   0.
T   T   0.

What else do we need to specify? The prior probabilities, P(W) and P(B).
Finally, we specify the remaining conditionals:

W   P(C|W)
F   0.
T   0.

B   P(D|B)
F   0.
T   0.

Now what?
Calculating probabilities using the joint distribution
- P(o,w,¬b,c,¬d) = P(o|w,¬b) P(c|w) P(¬d|¬b) P(w) P(¬b) = 0.05 × 0.95 × 0.999 × 0.05 × 0.999 ≈ 0.0024
- This is essentially the probability that my wife is home and leaves the door open.
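The product above can be reproduced directly. This minimal sketch uses only the five numbers that appear in the worked example; the constant names and the helper `bern` are hypothetical.

```python
# Values read off the worked example above.
P_W = 0.05                 # P(w): wife is home
P_B = 0.001                # P(b): burglar, so P(not b) = 0.999
P_O_GIVEN_W_NOT_B = 0.05   # P(o | w, not b)
P_C_GIVEN_W = 0.95         # P(c | w)
P_D_GIVEN_NOT_B = 0.001    # P(d | not b), so P(not d | not b) = 0.999

def bern(p_true, value):
    """Probability of a Boolean value when P(True) = p_true."""
    return p_true if value else 1.0 - p_true

# P(o, w, not b, c, not d) as a product of one factor per node.
joint = (
    bern(P_O_GIVEN_W_NOT_B, True)    # open door, given wife home and no burglar
    * bern(P_C_GIVEN_W, True)        # car in garage, given wife home
    * bern(P_D_GIVEN_NOT_B, False)   # door not damaged, given no burglar
    * bern(P_W, True)                # wife home
    * bern(P_B, False)               # no burglar
)
print(f"{joint:.6f}")
```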
Calculating probabilities in a general Bayesian belief network
[Figure: directed graph with edges A → C, A → E, C → B, C → D, C → E, E → D.]
- Note that by specifying all the conditional probabilities, we have also specified the joint probability. For the directed graph above:
$$P(A,B,C,D,E) = P(A)\,P(B \mid C)\,P(C \mid A)\,P(D \mid C,E)\,P(E \mid A,C)$$
- The general expression is:
$$P(x_1, \ldots, x_n) \equiv P(X_1 = x_1 \wedge \ldots \wedge X_n = x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))$$
- With this we can calculate (in principle) any joint probability.
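The general expression translates almost line by line into code. The sketch below encodes the parent sets of the five-node example graph; every number in `CPT` is hypothetical and chosen only so that the example runs. It also checks that the factored joint sums to one over all 32 assignments.

```python
from itertools import product

# Parent sets read off P(A) P(B|C) P(C|A) P(D|C,E) P(E|A,C).
PARENTS = {"A": (), "B": ("C",), "C": ("A",), "D": ("C", "E"), "E": ("A", "C")}

# Hypothetical CPTs: P(node = True | parents), keyed by the tuple of parent values.
CPT = {
    "A": {(): 0.3},
    "B": {(False,): 0.1, (True,): 0.8},
    "C": {(False,): 0.2, (True,): 0.7},
    "D": {(False, False): 0.05, (False, True): 0.5,
          (True, False): 0.6, (True, True): 0.9},
    "E": {(False, False): 0.1, (False, True): 0.4,
          (True, False): 0.5, (True, True): 0.95},
}

def joint(assignment):
    """P(x_1, ..., x_n) as the product over nodes of P(x_i | parents(X_i))."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob

names = sorted(PARENTS)
total = sum(
    joint(dict(zip(names, values)))
    for values in product([False, True], repeat=len(names))
)
print(total)  # 1.0 up to floating-point rounding
```

Storing one small table per node takes 1 + 2 + 2 + 4 + 4 = 13 numbers here, compared with 31 free parameters for an unstructured joint over five binary variables.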
" This implies that we can also calculate any conditional probability.
Calculating conditional probabilities
- Using the joint we can compute any conditional probability too.
- The conditional probability of any one subset of variables given another disjoint subset is
$$P(S_1 \mid S_2) = \frac{P(S_1 \wedge S_2)}{P(S_2)} = \frac{\sum_{p \in S_1 \wedge S_2} p}{\sum_{p \in S_2} p}$$
  where p ∈ S is shorthand for all the entries of the joint matching subset S.
- How many terms are in this sum? $2^N$
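The ratio of sums can be sketched by brute-force enumeration on the wife/burglar network. Values marked "example" are the ones used in the worked joint calculation; values marked "assumed" are hypothetical and were picked only to make the explaining-away effect visible. The function names are hypothetical as well.

```python
from itertools import product

P_W = 0.05    # example: P(w)
P_B = 0.001   # example: P(b)
P_O = {       # P(o | W, B), keyed by (W, B)
    (False, False): 0.001,  # assumed
    (False, True): 0.9,     # assumed
    (True, False): 0.05,    # example
    (True, True): 0.9,      # assumed
}
P_C = {False: 0.01, True: 0.95}   # P(c | W): False entry assumed, True entry example
P_D = {False: 0.001, True: 0.5}   # P(d | B): False entry example, True entry assumed

NAMES = ("o", "w", "b", "c", "d")

def bern(p_true, value):
    """Probability of a Boolean value when P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(o, w, b, c, d):
    """Joint probability as the product of one factor per node."""
    return (bern(P_O[(w, b)], o) * bern(P_C[w], c) * bern(P_D[b], d)
            * bern(P_W, w) * bern(P_B, b))

def prob(**fixed):
    """Sum all joint entries that match the fixed variable values."""
    total = 0.0
    for values in product([False, True], repeat=len(NAMES)):
        entry = dict(zip(NAMES, values))
        if all(entry[name] == value for name, value in fixed.items()):
            total += joint(**entry)
    return total

def conditional(query, given):
    """P(S1 | S2) = P(S1 and S2) / P(S2)."""
    return prob(**query, **given) / prob(**given)

p_open = conditional({"b": True}, {"o": True})
p_open_car = conditional({"b": True}, {"o": True, "c": True})
p_open_damage = conditional({"b": True}, {"o": True, "d": True})
print(p_open, p_open_car, p_open_damage)
```

Under these assumed numbers, seeing the car lowers the probability of a burglar and seeing a damaged door raises it. Each call to `prob` loops over all 2^5 joint entries, which is the cost that grows exponentially with more variables.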
The number of terms in the sums is exponential in the number of variables. In fact, querying general Bayes nets is NP-complete.
So what do we do?
- There are special cases of Bayes nets for which there are fast, exact algorithms:
  - variable elimination
  - belief propagation
- There are also many approximations:
  - stochastic (MCMC) approximations
  - approximations
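As one illustration of the exact methods named above, here is a minimal sketch of variable elimination on a three-node chain A → B → C (the "indirect cause" pattern). All numbers are hypothetical. The idea is to sum out A once, keep the resulting factor over B, and reuse it, instead of summing the full joint.

```python
from itertools import product

VALUES = (False, True)

# Hypothetical CPTs for the chain A -> B -> C.
P_A = {True: 0.3, False: 0.7}
P_B_GIVEN_A = {True: {True: 0.9, False: 0.1},     # indexed [a][b]
               False: {True: 0.2, False: 0.8}}
P_C_GIVEN_B = {True: {True: 0.6, False: 0.4},     # indexed [b][c]
               False: {True: 0.05, False: 0.95}}

def marginal_c_brute_force(c):
    """P(C = c) by summing the full joint over A and B."""
    return sum(
        P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_B[b][c]
        for a, b in product(VALUES, repeat=2)
    )

def marginal_c_elimination(c):
    """P(C = c) by eliminating A first, then B."""
    # Summing out A leaves a factor over B alone, which is P(B).
    factor_b = {b: sum(P_A[a] * P_B_GIVEN_A[a][b] for a in VALUES) for b in VALUES}
    # Summing out B leaves P(C = c).
    return sum(factor_b[b] * P_C_GIVEN_B[b][c] for b in VALUES)

print(marginal_c_brute_force(True), marginal_c_elimination(True))
```

On a chain of length n the brute-force sum has a number of terms exponential in n, while elimination does a constant amount of work per link. The saving depends on the graph structure and the elimination order, so this does not contradict the hardness of the general case.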
A general one-layer causal network
• Could either model causes and effects
• Or equivalently stochastic binary features.
• Each input x_i encodes the probability that the ith binary input feature is present.
• The set of features represented by #j is defined by weights f_ij which encode the probability that feature i is an instance of #j.
The data: a set of stochastic binary patterns
[Figure: panels a, b, c. Each column is a distinct eight-dimensional binary feature.]
There are five underlying causal feature patterns. What are they?
[Figure: panels a, b, c, now also showing the true hidden causes of the data.]
This is a learning problem, which we'll cover in a later lecture.
Hierarchical Statistical Models
A Bayesian belief network:
[Figure: network showing a state S_i, its parents pa(S_i), and the data D.]
The joint probability of binary states is
$$P(S \mid W) = \prod_i P(S_i \mid \mathrm{pa}(S_i), W)$$
The probability of S_i depends only on its parents:
$$P(S_i \mid \mathrm{pa}(S_i), W) = \begin{cases} h\big(\sum_j S_j w_{ji}\big) & \text{if } S_i = 1 \\ 1 - h\big(\sum_j S_j w_{ji}\big) & \text{if } S_i = 0 \end{cases}$$
The function h specifies how causes are combined, $h(u) = 1 - \exp(-u)$, $u > 0$.
Main points:
- hierarchical structure allows model to form high order representations
- upper states are priors for lower states
- weights encode higher order features
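The conditional above is easy to evaluate for a single unit. The sketch below follows the stated form of h; the parent states and weights are hypothetical, and the function name `p_state` is not from the lecture.

```python
import math

def h(u):
    """Combination function from the text: h(u) = 1 - exp(-u)."""
    return 1.0 - math.exp(-u)

def p_state(s_i, parent_states, weights):
    """P(S_i = s_i | pa(S_i), W) for binary parent states S_j and weights w_ji >= 0."""
    u = sum(s_j * w_ji for s_j, w_ji in zip(parent_states, weights))
    p_on = h(u)
    return p_on if s_i == 1 else 1.0 - p_on

# Hypothetical unit with three parents, two of them active.
parents = [1, 0, 1]
weights = [0.5, 2.0, 1.0]
p_on = p_state(1, parents, weights)
p_off = p_state(0, parents, weights)
print(p_on, p_off)
```

With this choice of h, the probability that the unit stays off is the product of exp(-w_ji) over the active parents, so each active cause independently gets a chance to turn the unit on. This is the noisy-OR way of combining causes.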