Bayesian Networks 1, Lecture notes of Probability and Statistics

Carnegie Mellon University (CMU), Probability and Statistics

Lecture notes on Bayesian networks and inference with Boolean and continuous variables. They cover probability, Bayesian inference, and explaining away, and show how to define a belief network and compute inferences. Likely to be useful as study notes for university students studying Artificial Intelligence or related topics.


Artificial Intelligence

15-381

Mar 27, 2007

Bayesian Networks 1

Michael S. Lewicki, Carnegie Mellon

AI: Bayes Nets 1

Recap of last lecture

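• Probability: precise representation of uncertainty

• Probability theory: optimal updating of knowledge based on new information

• Bayesian inference with Boolean variables

$$P(D \mid T) = \frac{P(T \mid D)\,P(D)}{P(T \mid D)\,P(D) + P(T \mid \bar{D})\,P(\bar{D})}$$

Here $P(D \mid T)$ is the posterior, $P(T \mid D)$ the likelihood, $P(D)$ the prior, and the denominator the normalizing constant.

• Inference combines sources of knowledge

$$P(D \mid T) = \frac{0.9 \times 0.001}{0.9 \times 0.001 + 0.1 \times 0.999} = 0.0089$$

• Inference is sequential

$$P(D \mid T_1, T_2) = \frac{P(T_2 \mid D)\,P(T_1 \mid D)\,P(D)}{P(T_2)\,P(T_1)}$$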
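The recap arithmetic can be checked with a few lines of Python. This is only a sketch: the function name `posterior` is made up for illustration, and the second test is assumed to have the same error rates as the first and to be conditionally independent of it given D. Each update renormalizes using the current belief, so in this sketch the second normalizer corresponds to P(T2 | T1).

```python
def posterior(prior, p_pos_given_d, p_pos_given_not_d):
    """Bayes' rule for a Boolean hypothesis D after one positive test T."""
    numerator = p_pos_given_d * prior
    evidence = numerator + p_pos_given_not_d * (1.0 - prior)
    return numerator / evidence


# Values from the recap: P(T|D) = 0.9, P(T|not D) = 0.1, P(D) = 0.001.
p_after_one = posterior(0.001, 0.9, 0.1)

# ASSUMPTION: a second positive test with the same error rates,
# conditionally independent of the first given D.
# The posterior after the first test becomes the prior for the second.
p_after_two = posterior(p_after_one, 0.9, 0.1)

print(round(p_after_one, 4))
print(round(p_after_two, 4))
```

Under these assumptions a single positive test still leaves the disease unlikely, and a second positive test raises the belief further, which is the sense in which inference is sequential.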


Bayesian inference with continuous variables (recap)

$$p(\theta \mid y, n) = \frac{p(y \mid \theta, n)\,p(\theta \mid n)}{p(y \mid n)}$$

Here $p(\theta \mid y, n)$ is the posterior, $p(y \mid \theta, n)$ the likelihood, $p(\theta \mid n)$ the prior, and the normalizing constant is

$$p(y \mid n) = \int p(y \mid \theta, n)\,p(\theta \mid n)\,d\theta$$

With a uniform prior and a Binomial likelihood, the posterior is a beta distribution:

$$p(\theta \mid y, n) \propto \binom{n}{y}\,\theta^{y}\,(1 - \theta)^{n - y}$$

[Figure: the uniform prior p(θ | y=0, n=0); the Binomial likelihoods p(y | θ, n=5) for θ = 0.05, 0.2, 0.35, 0.5; and the beta posterior p(θ | y=1, n=5).]

Today: Inference with more complex dependencies

How do we represent (model) more complex probabilistic relationships?
How do we use these models to draw inferences?

How do we represent these relationships?

Types of probabilistic relationships

• Direct cause (A → B): P(B|A)
• Indirect cause (A → B → C): P(B|A), P(C|B). C is independent of A given B.
• Common cause (A → B, A → C): P(B|A), P(C|A). Are B and C independent?
• Common effect (A → C, B → C): P(C|A,B). Are A and B independent?

Belief networks

[Figure: belief network in which "wife" and "burglar" each point to "open door".]

In belief networks, causal relationships are represented in directed acyclic graphs.
Arrows indicate causal relationships between the nodes.
How can we determine what is happening before we go in? We need more information. What else can we observe?

Explaining away

[Figure: the same network extended with "car in garage", a child of "wife".]

Suppose we notice that the car is in the garage.
Now we infer that it's probably my wife, and not a burglar.
This fact "explains away" the hypothesis of a burglar.
Note that there is no direct causal link between "burglar" and "car in garage". Yet, seeing the car changes our beliefs about the burglar.

[Figure: the network extended again with "damaged door", a child of "burglar".]

We could also notice the door was damaged, in which case we reach the opposite conclusion.
How do we make this inference process more precise? Let's start by writing down the conditional probabilities.

Defining the belief network

[Figure: belief network with edges wife → open door, burglar → open door, wife → car in garage, burglar → damaged door.]

Each link in the graph represents a conditional relationship between nodes.
To compute the inference, we must specify the conditional probabilities.
Let's start with the open door. What do we specify?

W   B   P(O|W,B)
F   F   0.
F   T   0.
T   F   0.
T   T   0.

What else do we need to specify? The prior probabilities, P(W) and P(B).

Finally, we specify the remaining conditionals:

W   P(C|W)
F   0.
T   0.

B   P(D|B)
F   0.
T   0.

Now what?

Calculating probabilities using the joint distribution

$$P(o, w, \neg b, c, \neg d) = P(o \mid w, \neg b)\,P(c \mid w)\,P(\neg d \mid \neg b)\,P(w)\,P(\neg b)$$

$$= 0.05 \times 0.95 \times 0.999 \times 0.05 \times 0.999 \approx 0.0024$$

This is essentially the probability that my wife is home and leaves the door open.
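As a quick check of the product above, here is a small Python sketch. The dictionary keys are just labels chosen for readability; the five numbers are the factor values that appear in the worked example.

```python
import math

# Factor values as they appear in the worked example.
factors = {
    "P(o | w, not b)": 0.05,
    "P(c | w)": 0.95,
    "P(not d | not b)": 0.999,
    "P(w)": 0.05,
    "P(not b)": 0.999,
}

# The joint probability of one full assignment is the product of one
# entry from each node's conditional probability table.
joint_probability = math.prod(factors.values())
print(f"{joint_probability:.6f}")
```

The point of the sketch is that a full assignment needs only one lookup per node, however many variables the network has.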

Calculating probabilities in a general Bayesian belief network

Note that by specifying all the conditional probabilities, we have also specified the joint probability. For the directed graph above:

$$P(A, B, C, D, E) = P(A)\,P(B \mid C)\,P(C \mid A)\,P(D \mid C, E)\,P(E \mid A, C)$$

[Figure: directed graph over A, B, C, D, E with edges A → C, A → E, C → B, C → D, C → E, E → D.]

The general expression is:

$$P(x_1, \ldots, x_n) \equiv P(X_1 = x_1 \wedge \ldots \wedge X_n = x_n) = \prod_{i=1}^{n} P(x_i \mid \mathrm{parents}(X_i))$$

• With this we can calculate (in principle) any joint probability.

• This implies that we can also calculate any conditional probability.

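The general product formula translates almost directly into code. The sketch below uses the parent sets read off the five-variable factorization; every number in `CPT` is hypothetical and chosen only for illustration, and the names `PARENTS`, `CPT`, and `joint` are not from the lecture.

```python
from itertools import product

# Parents of each node, read off the factorization
# P(A) P(B|C) P(C|A) P(D|C,E) P(E|A,C).
PARENTS = {"A": (), "B": ("C",), "C": ("A",), "D": ("C", "E"), "E": ("A", "C")}

# HYPOTHETICAL numbers: P(node = True | parents), keyed by the tuple of
# parent values in the order listed in PARENTS.
CPT = {
    "A": {(): 0.3},
    "B": {(False,): 0.1, (True,): 0.8},
    "C": {(False,): 0.2, (True,): 0.7},
    "D": {(False, False): 0.05, (False, True): 0.5,
          (True, False): 0.6, (True, True): 0.9},
    "E": {(False, False): 0.1, (False, True): 0.4,
          (True, False): 0.3, (True, True): 0.95},
}


def joint(assignment):
    """Product over nodes of P(x_i | parents(X_i)) for a full assignment."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(assignment[p] for p in parents)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


names = list(PARENTS)
all_true = joint({name: True for name in names})

# Sanity check: the joint over all 2^5 assignments should sum to 1.
total = sum(joint(dict(zip(names, values)))
            for values in product([False, True], repeat=len(names)))
print(all_true, total)
```

Storing only P(node = True | parents) is a design choice that works because the variables are Boolean; a multi-valued node would need a full row per parent configuration.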
Calculating conditional probabilities

Using the joint we can compute any conditional probability too.
The conditional probability of any one subset of variables given another disjoint subset is

$$P(S_1 \mid S_2) = \frac{P(S_1 \wedge S_2)}{P(S_2)} = \frac{\sum_{p \in S_1 \wedge S_2} p}{\sum_{p \in S_2} p}$$

where $p \in S$ is shorthand for all the entries of the joint matching subset S.
How many terms are in this sum? $2^N$
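A brute-force version of this sum can be sketched for the door network. The entries marked "from the worked example" are the values that appear in the joint calculation earlier in the notes (with P(b) and P(d | not b) obtained as complements); all other entries are HYPOTHETICAL placeholders, since those table values are not recoverable here. The names `conditional`, `NAMES`, `PARENTS`, and `CPT` are illustrative.

```python
from itertools import product

NAMES = ("W", "B", "O", "C", "D")  # wife, burglar, open door, car, damaged door
PARENTS = {"W": (), "B": (), "O": ("W", "B"), "C": ("W",), "D": ("B",)}

# P(node = True | parents), keyed by parent values in PARENTS order.
CPT = {
    "W": {(): 0.05},                    # from the worked example
    "B": {(): 0.001},                   # complement of P(not b) = 0.999
    "O": {(True, False): 0.05,          # from the worked example
          (False, False): 0.01,         # HYPOTHETICAL
          (False, True): 0.25,          # HYPOTHETICAL
          (True, True): 0.75},          # HYPOTHETICAL
    "C": {(True,): 0.95,                # from the worked example
          (False,): 0.01},              # HYPOTHETICAL
    "D": {(False,): 0.001,              # complement of P(not d | not b) = 0.999
          (True,): 0.5},                # HYPOTHETICAL
}


def joint(a):
    """Joint probability of a full assignment via the product of CPT entries."""
    prob = 1.0
    for node, parents in PARENTS.items():
        p_true = CPT[node][tuple(a[p] for p in parents)]
        prob *= p_true if a[node] else 1.0 - p_true
    return prob


def conditional(query, evidence):
    """P(query | evidence): ratio of two sums over matching joint entries."""
    numerator = denominator = 0.0
    for values in product([False, True], repeat=len(NAMES)):  # 2^N entries
        a = dict(zip(NAMES, values))
        if all(a[k] == v for k, v in evidence.items()):
            p = joint(a)
            denominator += p
            if all(a[k] == v for k, v in query.items()):
                numerator += p
    return numerator / denominator


p_b_open = conditional({"B": True}, {"O": True})
p_b_open_car = conditional({"B": True}, {"O": True, "C": True})
p_b_open_damaged = conditional({"B": True}, {"O": True, "D": True})
print(p_b_open, p_b_open_car, p_b_open_damaged)
```

With these placeholder numbers, seeing the car lowers the belief in a burglar and seeing a damaged door raises it sharply, which matches the qualitative "explaining away" story. The loop visits every one of the 2^N joint entries, which is exactly what makes this approach impractical for large networks.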

The number of terms in the sums is exponential in the number of variables. In fact, querying general Bayes nets is NP-complete.

So what do we do?

There are special cases of Bayes nets for which there are fast, exact algorithms:
- variable elimination
- belief propagation
There are also many approximations:
- stochastic (MCMC) approximations
- approximations
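To illustrate why exact algorithms can be fast on special structures, here is a sketch of variable elimination on the simplest case, a chain of Boolean nodes such as the indirect-cause pattern A → B → C. The function names and all numbers are hypothetical. Each variable is summed out once, so the cost grows linearly with the chain length, whereas the brute-force version enumerates every joint entry.

```python
from itertools import product


def chain_marginal(prior, transitions):
    """P(last node = True) on a chain X_0 -> X_1 -> ... -> X_n.

    prior: P(X_0 = True)
    transitions: one pair per link,
        (P(X_k = True | X_{k-1} = False), P(X_k = True | X_{k-1} = True))
    """
    p_true = prior
    for p_if_false, p_if_true in transitions:
        # Eliminate the previous variable with a two-term sum.
        p_true = p_if_false * (1.0 - p_true) + p_if_true * p_true
    return p_true


def chain_marginal_brute_force(prior, transitions):
    """Same quantity, computed by summing all 2^(n+1) joint entries."""
    n_nodes = len(transitions) + 1
    total = 0.0
    for values in product([False, True], repeat=n_nodes):
        p = prior if values[0] else 1.0 - prior
        for k, (p_if_false, p_if_true) in enumerate(transitions, start=1):
            p_k = p_if_true if values[k - 1] else p_if_false
            p *= p_k if values[k] else 1.0 - p_k
        if values[-1]:
            total += p
    return total


# HYPOTHETICAL numbers for A -> B -> C.
prior_a = 0.3
links = [(0.2, 0.9), (0.1, 0.6)]
fast = chain_marginal(prior_a, links)
slow = chain_marginal_brute_force(prior_a, links)
print(fast, slow)
```

Both functions return the same marginal; only the amount of work differs. General variable elimination applies the same idea to arbitrary graphs, where the cost depends on the order in which variables are summed out.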

A general one-layer causal network

• Could either model causes and effects
• Or equivalently stochastic binary features.
• Each input x_i encodes the probability that the ith binary input feature is present.
• The set of features represented by #j is defined by weights f_ij which encode the probability that feature i is an instance of #j.

The data: a set of stochastic binary patterns

[Figure: panels a, b, c of binary patterns, including the true hidden causes of the data.]

Each column is a distinct eight-dimensional binary feature.
There are five underlying causal feature patterns. What are they?
This is a learning problem, which we'll cover in a later lecture.

Hierarchical Statistical Models

A Bayesian belief network:

[Figure: network showing the parents pa(S_i), a state S_i, and the data D.]

The joint probability of binary states is

$$P(S \mid W) = \prod_{i} P(S_i \mid \mathrm{pa}(S_i), W)$$

The probability of S_i depends only on its parents:

$$P(S_i \mid \mathrm{pa}(S_i), W) = \begin{cases} h\left(\sum_j S_j w_{ji}\right) & \text{if } S_i = 1 \\ 1 - h\left(\sum_j S_j w_{ji}\right) & \text{if } S_i = 0 \end{cases}$$

The function h specifies how causes are combined, h(u) = 1 − exp(−u), u > 0.

Main points:

hierarchical structure allows model to form high order representations
upper states are priors for lower states
weights encode higher order features
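The conditional for a single state can be sketched in a few lines of Python. The function names and the example weights are hypothetical; only the form of h and the two-case definition come from the notes. Because 1 − h(u) = exp(−u), each active parent independently multiplies the probability that S_i stays off by exp(−w_ji), which gives the combination rule a noisy-OR character.

```python
import math


def h(u):
    """Combination function from the notes: h(u) = 1 - exp(-u)."""
    return 1.0 - math.exp(-u)


def p_state(s_i, parent_states, weights):
    """P(S_i = s_i | pa(S_i), W) for binary parents and weights w_ji >= 0."""
    u = sum(s_j * w_ji for s_j, w_ji in zip(parent_states, weights))
    p_on = h(u)
    return p_on if s_i == 1 else 1.0 - p_on


# HYPOTHETICAL weights from two parents to one child state.
w = (0.5, 2.0)
p_no_cause = p_state(1, (0, 0), w)    # no active parent
p_one_cause = p_state(1, (1, 0), w)   # only the weaker cause active
p_both = p_state(1, (1, 1), w)        # both causes active
print(p_no_cause, p_one_cause, p_both)
```

In this sketch a state with no active parents is never on, and adding active parents can only increase the probability that it is on.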
