Understanding Conditional Probability and Bayesian Networks in Probabilistic Inference

This document from the University of San Francisco's Department of Computer Science covers probabilistic inference. It reviews the basics of probability, conditional probability, and Bayes' rule, works through the Monty Hall problem, and introduces Bayesian networks for probabilistic inference, with examples and formulas throughout.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

koofers-user-okv

Artificial Intelligence Programming
Probabilistic Inference
Chris Brooks
Department of Computer Science
University of San Francisco
Probability Review
Probability allows us to represent a belief about a statement, or a likelihood that a statement is true.
$P(rain) = 0.6$ means that we believe it is 60% likely that it is currently raining.
Axioms:
$0 \le P(a) \le 1$
The probability of $(A \lor B)$ is $P(A) + P(B) - P(A \land B)$.
Tautologies have $P = 1$.
Contradictions have $P = 0$.
Department of Computer Science, University of San Francisco
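As a quick sanity check of the axioms, here is a minimal Python sketch. It treats a fair six-sided die as the sample space; the die, the two events, and the helper names are hypothetical illustrations, not taken from the slides. Using exact fractions, it verifies the rule for $A \lor B$ along with the tautology and contradiction cases.

```python
from fractions import Fraction

# Hypothetical sample space: a fair six-sided die, each outcome equally likely.
OUTCOMES = range(1, 7)

def prob(event):
    """P(event) for an event given as a predicate over outcomes."""
    hits = sum(1 for o in OUTCOMES if event(o))
    return Fraction(hits, len(OUTCOMES))

def is_even(o):  # event A
    return o % 2 == 0

def is_high(o):  # event B: roll greater than 3
    return o > 3

p_a, p_b = prob(is_even), prob(is_high)
p_a_and_b = prob(lambda o: is_even(o) and is_high(o))
p_a_or_b = prob(lambda o: is_even(o) or is_high(o))

# Axioms: 0 <= P(a) <= 1, tautology has P = 1, contradiction has P = 0.
assert 0 <= p_a <= 1 and 0 <= p_b <= 1
assert prob(lambda o: True) == 1 and prob(lambda o: False) == 0
# Inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B).
assert p_a_or_b == p_a + p_b - p_a_and_b
print(p_a_or_b)  # 2/3
```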
Conditional Probability
Once we begin to make observations about the value of certain variables, our belief in other variables changes.
Once we notice that it’s cloudy, $P(Rain)$ goes up.
This is called conditional probability.
Written as: $P(Rain \mid Cloudy)$
$P(a \mid b) = \frac{P(a \land b)}{P(b)}$, or $P(a \land b) = P(a \mid b)\,P(b)$
This is called the product rule.
Conditional Probability
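Example: $P(Cloudy) = 0.3$
$P(Rain) = 0.2$
$P(Cloudy \land Rain) = 0.15$
$P(Cloudy \land \lnot Rain) = 0.15$
$P(\lnot Cloudy \land Rain) = 0.05$
$P(\lnot Cloudy \land \lnot Rain) = 0.65$
The four joint entries sum to 1, and each marginal is a sum of joint entries: $P(Cloudy) = 0.15 + 0.15 = 0.3$ and $P(Rain) = 0.15 + 0.05 = 0.2$.
Initially, $P(Rain) = 0.2$. Once we see that it’s cloudy,
$P(Rain \mid Cloudy) = \frac{P(Rain \land Cloudy)}{P(Cloudy)} = \frac{0.15}{0.3} = 0.5$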
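A conditional probability can be read straight off a joint table. The Python sketch below stores the four joint entries in a dictionary; the function names are illustrative. The entries are derived from the marginals $P(Cloudy) = 0.3$, $P(Rain) = 0.2$ and $P(Cloudy \land Rain) = 0.15$, so that everything agrees. The code recovers the marginals by summing and applies $P(a \mid b) = P(a \land b)/P(b)$.

```python
from fractions import Fraction as F

# Joint distribution over (Cloudy, Rain); entries chosen to match the
# marginals P(Cloudy) = 0.3, P(Rain) = 0.2 and P(Cloudy and Rain) = 0.15.
JOINT = {
    (True, True): F(15, 100),
    (True, False): F(15, 100),
    (False, True): F(5, 100),
    (False, False): F(65, 100),
}

def p_cloudy(value=True):
    """Marginal P(Cloudy = value): sum Rain out of the joint."""
    return sum(p for (c, _), p in JOINT.items() if c == value)

def p_rain(value=True):
    """Marginal P(Rain = value): sum Cloudy out of the joint."""
    return sum(p for (_, r), p in JOINT.items() if r == value)

def p_rain_given_cloudy():
    """P(Rain | Cloudy) = P(Rain and Cloudy) / P(Cloudy)."""
    return JOINT[(True, True)] / p_cloudy(True)

assert sum(JOINT.values()) == 1
# Product rule: P(Rain and Cloudy) = P(Rain | Cloudy) P(Cloudy).
assert JOINT[(True, True)] == p_rain_given_cloudy() * p_cloudy(True)
print(p_cloudy(), p_rain(), p_rain_given_cloudy())  # 3/10 1/5 1/2
```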
Combinations of events
The probability of $(A \land B)$ is $P(A \mid B)\,P(B)$.
What if $A$ and $B$ are independent?
Then $P(A \mid B)$ is $P(A)$, and $P(A \land B)$ is $P(A)\,P(B)$.
Example:
What is the probability of “heads” five times in a row?
What is the probability of at least one “head”?
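Assume a fair coin and independent tosses; the slide leaves both implicit. The independence rule then gives $P(\text{five heads}) = (1/2)^5 = 1/32$. The complement rule gives $P(\text{at least one head}) = 1 - (1/2)^5 = 31/32$. A short Python check compares these exact values with a brute-force enumeration of all $2^5$ sequences:

```python
from fractions import Fraction
from itertools import product

P_HEADS = Fraction(1, 2)  # assumption: fair coin
N = 5                     # number of independent tosses

# Independence: P(H and H and ... and H) = P(H) ** N.
p_all_heads = P_HEADS ** N
# Complement: "at least one head" is the negation of "no heads at all".
p_at_least_one = 1 - (1 - P_HEADS) ** N

# Brute force over all 2**N sequences, equally likely for a fair coin.
seqs = list(product("HT", repeat=N))
assert p_all_heads == Fraction(sum(s == ("H",) * N for s in seqs), len(seqs))
assert p_at_least_one == Fraction(sum("H" in s for s in seqs), len(seqs))
print(p_all_heads, p_at_least_one)  # 1/32 31/32
```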
Bayes’ Rule
Often, we want to know how a probability changes as a
result of an observation.
Recall the Product Rule:
$P(a \land b) = P(a \mid b)\,P(b)$
$P(a \land b) = P(b \mid a)\,P(a)$
We can set these equal to each other:
$P(a \mid b)\,P(b) = P(b \mid a)\,P(a)$
And then divide by $P(a)$ (assuming $P(a) > 0$):
$P(b \mid a) = \frac{P(a \mid b)\,P(b)}{P(a)}$
This equality is known as Bayes’ theorem (or Bayes’ rule, or Bayes’ law).
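Applied to the cloudy/rain example, Bayes’ rule turns $P(Rain \mid Cloudy)$ around into $P(Cloudy \mid Rain)$. Here is a minimal Python sketch that reuses the example's numbers $P(Rain \mid Cloudy) = 0.5$, $P(Cloudy) = 0.3$ and $P(Rain) = 0.2$. The `bayes` helper is illustrative.

```python
from fractions import Fraction as F

def bayes(p_a_given_b, p_b, p_a):
    """Return P(b | a) = P(a | b) P(b) / P(a); requires P(a) > 0."""
    if p_a == 0:
        raise ValueError("P(a) must be nonzero")
    return p_a_given_b * p_b / p_a

# a = Rain, b = Cloudy: P(Cloudy | Rain) from P(Rain | Cloudy), P(Cloudy), P(Rain).
p_cloudy_given_rain = bayes(F(1, 2), F(3, 10), F(1, 5))
print(p_cloudy_given_rain)  # 3/4
```

This agrees with computing it directly as $P(Cloudy \land Rain)/P(Rain) = 0.15/0.2 = 0.75$.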

Monty Hall Problem
From the game show “Let’s Make a Deal”.
Pick one of three doors. A fabulous prize is behind one door; goats are behind the other two doors.
Monty opens one of the doors you did not pick and shows a goat.
Monty then offers you the chance to switch to the other unopened door.
Should you switch?

Monty Hall Problem
Problem clarification:
The prize location is selected randomly.
Monty always opens a door and allows contestants to switch.
When Monty has a choice about which door to open, he chooses randomly.
Variables:
Prize: $P \in \{p_A, p_B, p_C\}$
Choose: $C \in \{c_A, c_B, c_C\}$
Monty: $M \in \{m_A, m_B, m_C\}$
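Under the three clarifying assumptions above, a Monte Carlo simulation gives an empirical answer to “Should you switch?”. Below is a sketch in Python; the function names, trial count, and seed are arbitrary choices made for illustration.

```python
import random

DOORS = ("A", "B", "C")

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the prize."""
    prize = rng.choice(DOORS)   # prize location selected randomly
    choice = rng.choice(DOORS)  # contestant's initial pick
    # Monty opens a goat door the contestant did not pick,
    # choosing randomly when two such doors are available.
    monty = rng.choice([d for d in DOORS if d != choice and d != prize])
    if switch:
        # Switch to the one door that is neither the pick nor Monty's door.
        choice = next(d for d in DOORS if d != choice and d != monty)
    return choice == prize

def win_rate(switch, trials=100_000, seed=0):
    """Fraction of simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print(f"stay:   {win_rate(False):.3f}")
print(f"switch: {win_rate(True):.3f}")
```

With this many trials, staying wins about one third of the time and switching about two thirds.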

Monty Hall Problem
Without loss of generality, assume:
Choose door A.
Monty opens door B.
$P(p_A \mid c_A, m_B) = ?$
Applying Bayes’ rule, with every term conditioned on $c_A$:
$P(p_A \mid c_A, m_B) = \frac{P(m_B \mid c_A, p_A)\, P(p_A \mid c_A)}{P(m_B \mid c_A)}$
$P(m_B \mid c_A, p_A) = 1/2$ (with the prize behind A, Monty can open B or C and chooses randomly)
$P(p_A \mid c_A) = ?$
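The slides leave the remaining terms as questions. Under the stated assumptions, $P(p_A \mid c_A) = 1/3$, because the prize is placed independently of the contestant's choice. The denominator follows by summing over prize locations: $P(m_B \mid c_A) = \frac{1}{2} \cdot \frac{1}{3} + 0 \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} = \frac{1}{2}$. So $P(p_A \mid c_A, m_B) = 1/3$, and switching to C wins with probability 2/3. The Python sketch below computes the same quantities exactly. `p_monty_opens` is a hypothetical helper that encodes Monty's behaviour as described in the clarifications.

```python
from fractions import Fraction

DOORS = ("A", "B", "C")
PRIOR = Fraction(1, 3)  # P(p_X | c_A): prize placed uniformly at random

def p_monty_opens(m, chosen, prize):
    """P(m | c, p): Monty opens a door that is neither chosen nor the prize,
    picking uniformly at random when he has two options."""
    options = [d for d in DOORS if d != chosen and d != prize]
    return Fraction(1, len(options)) if m in options else Fraction(0)

chosen, opened = "A", "B"
# Denominator P(m_B | c_A): sum over where the prize could be.
p_m = sum(p_monty_opens(opened, chosen, p) * PRIOR for p in DOORS)
# Bayes' rule for each possible prize location.
posterior = {p: p_monty_opens(opened, chosen, p) * PRIOR / p_m for p in DOORS}

print(f"P(m_B | c_A) = {p_m}")  # P(m_B | c_A) = 1/2
print(f"stay with A: {posterior['A']}, switch to C: {posterior['C']}")
# stay with A: 1/3, switch to C: 2/3
```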

Probabilistic Inference
Problems in working with the joint probability distribution:
Exponentially large table ($2^n$ entries for $n$ Boolean variables).
We need a data structure that captures the fact that many variables do not influence each other.
For example, the color of Bart’s hat does not influence whether Homer is hungry.
We call this structure a Bayesian network (or a belief network).

Bayesian Network
A Bayesian network is a directed graph in which each node is annotated with probability information. A network has:
A set of random variables that correspond to the nodes in the network.
A set of directed edges or arrows connecting nodes. These represent influence. If there is an arrow from X to Y, then X is the parent of Y.
Each node keeps a conditional probability distribution indicating the probability of each value it can take, conditional on its parents’ values.
No cycles (it is a directed acyclic graph).
The topology of the network specifies which variables directly influence other variables (the conditional independence relationships).
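One possible way to hold the structure of a Bayesian network is a mapping from each node to its parents. The Python sketch below borrows its node names from the burglary example, and `topological_order` is a hypothetical helper. It uses Kahn's algorithm to produce an ordering with every parent before its children, and it raises an error if the graph has a cycle, which enforces the “no cycles” requirement.

```python
from collections import deque

# Each node maps to the tuple of its parents (arrows point parent -> child).
PARENTS = {
    "Burglary": (),
    "Earthquake": (),
    "Alarm": ("Burglary", "Earthquake"),
    "JohnCalls": ("Alarm",),
    "MaryCalls": ("Alarm",),
}

def topological_order(parents):
    """Kahn's algorithm: return nodes with every parent before its children.
    Raises ValueError if the directed graph contains a cycle."""
    children = {n: [] for n in parents}
    indegree = {n: len(ps) for n, ps in parents.items()}
    for node, ps in parents.items():
        for p in ps:
            children[p].append(node)
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(parents):
        raise ValueError("graph has a cycle; not a Bayesian network")
    return order

print(topological_order(PARENTS))
# ['Burglary', 'Earthquake', 'Alarm', 'JohnCalls', 'MaryCalls']
```

A parents-first ordering like this is also the order in which the CPTs can be multiplied or sampled.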

Burglary example
Two neighbors will call when they hear your alarm.
John sometimes overreacts.
Mary sometimes misses the alarm.
Two things can set off the alarm: Earthquake and Burglary.
Given who has called, what’s the probability of a burglary?
Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls
P(B) = .001
P(E) = .002
B E | P(A)
T T | .95
T F | .94
F T | .29
F F | .001
A | P(J)
T | .90
F | …
A | P(M)
T | .70
F | .01

Network structure
Each node has a conditional probability table (CPT).
This gives the probability of each value of that node, given its parents’ values. For each assignment of the parents, these probabilities sum to 1.
Nodes with no parents just contain priors.

Summarizing uncertainty
Notice that we don’t need to have nodes for all the reasons why Mary might not call.
A probabilistic approach lets us summarize all of this information in $P(\lnot M \mid A)$.
This allows a small agent to deal with large worlds that have a large number of possibly uncertain outcomes.
How would we handle this with logic?

Implicitly representing the full JPD
Recall that the full joint distribution allows us to calculate the probability of any variable, given all the others.
Because of conditional independence, the joint distribution can be split into separate tables. These are the CPTs seen in the Bayesian network.
Therefore, we can use this information to perform computations:
$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(x_i))$
$P(A \land \lnot E \land \lnot B \land J \land M) = P(J \mid A)\,P(M \mid A)\,P(A \mid \lnot B \land \lnot E)\,P(\lnot B)\,P(\lnot E)$
$= 0.90 \times 0.70 \times 0.001 \times 0.999 \times 0.998 \approx 0.000628$
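The chain-rule product for a single full assignment is easy to check numerically. Here is a minimal Python sketch that uses the CPT entries quoted in the product above.

```python
import math

# CPT entries used in the product for P(A, not E, not B, J, M).
factors = {
    "P(J | A)": 0.90,
    "P(M | A)": 0.70,
    "P(A | not B, not E)": 0.001,
    "P(not B)": 1 - 0.001,  # P(B) = 0.001
    "P(not E)": 1 - 0.002,  # P(E) = 0.002
}

# Chain rule: the joint entry is the product of one CPT entry per node.
p = math.prod(factors.values())
print(f"{p:.6f}")  # 0.000628
```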

Some examples
What is the probability that both Mary and John call, given that the alarm sounded?
$P(M \mid A)\,P(J \mid A) = 0.70 \times 0.90 = 0.63$
What is the probability of a break-in, given that we hear an alarm?
$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)} \approx 0.374$, where $P(A \mid B)$ is found by summing over the values of Earthquake, and $P(A)$ by summing over both Burglary and Earthquake.
What is the probability of a break-in, given that both John and Mary called?
$P(B \mid J, M) = \alpha\,[P(B \land J \land M \land A \land \lnot E) + P(B \land J \land M \land A \land E) + P(B \land J \land M \land \lnot A \land \lnot E) + P(B \land J \land M \land \lnot A \land E)]$, where $\alpha = 1/P(J \land M)$.
This last example shows a form of inference.
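The enumeration in the last two examples can be written out directly. This Python sketch sums the chain-rule product over the unobserved variables and then normalizes. One value is an assumption: the slide's entry for $P(J \mid \lnot A)$ is cut off, so the code uses 0.05, the value in Russell and Norvig's version of this network. That value does not affect $P(B \mid A)$, which comes out to about 0.374. With it, $P(B \mid J, M)$ comes out to about 0.284.

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}  # P(J=true | A); 0.05 is an ASSUMED value
P_M = {True: 0.70, False: 0.01}  # P(M=true | A)

def pr(p_true, value):
    """Probability that a Boolean variable takes `value`, given P(true)."""
    return p_true if value else 1 - p_true

def joint(b, e, a, j, m):
    """Chain rule: product of each node's CPT entry given its parents."""
    return (pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
            * pr(P_J[a], j) * pr(P_M[a], m))

def p_burglary_given_calls(j=True, m=True):
    """P(B=true | J=j, M=m): sum out A and E, then normalize over B."""
    unnorm = {b: sum(joint(b, e, a, j, m)
                     for e, a in product((True, False), repeat=2))
              for b in (True, False)}
    return unnorm[True] / (unnorm[True] + unnorm[False])

def p_burglary_given_alarm():
    """P(B=true | A=true): J and M sum out to 1, so only B, E, A matter."""
    unnorm = {b: sum(pr(P_B, b) * pr(P_E, e) * P_A[(b, e)]
                     for e in (True, False))
              for b in (True, False)}
    return unnorm[True] / sum(unnorm.values())

print(f"{p_burglary_given_alarm():.3f}")  # 0.374
print(f"{p_burglary_given_calls():.3f}")  # 0.284
```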

Constructing a Bayesian network
There are often several ways to construct a Bayesian network.
The knowledge engineer needs to discover conditional independence relationships.
Parents of a node should be those variables that directly influence its value.
JohnCalls is influenced by Earthquake, but not directly.
John and Mary calling don’t directly influence each other.
Formally, we believe that:
$P(MaryCalls \mid JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls \mid Alarm)$

Compactness and Node Ordering
Bayesian networks allow for a more compact representation of the domain: redundant information is removed.
Example: say we have 30 Boolean nodes, each with 5 parents.
Each CPT will contain $2^5 = 32$ numbers.
Total: $30 \times 32 = 960$.
Joint: $2^{30}$ entries, nearly all redundant.
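The parameter counts in this example can be reproduced in a few lines. Here is a Python sketch that assumes Boolean variables, so each CPT row needs a single number, $P(X = true \mid \text{parents})$.

```python
n_nodes, n_parents = 30, 5

cpt_rows = 2 ** n_parents           # one row per assignment of 5 Boolean parents
network_total = n_nodes * cpt_rows  # numbers stored across all 30 CPTs
joint_entries = 2 ** n_nodes        # entries in the full joint table

print(cpt_rows, network_total, joint_entries)  # 32 960 1073741824
```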

Building a network
Begin with root causes.
Then add the variables they directly influence.
Then add their direct children.
This is called a causal model: reasoning in terms of cause and effect.
Estimates are much easier to come up with this way.
We could try to build from effect to cause, but then the network would be more complex, and the CPTs hard to estimate: $P(E \mid B) = ?$

Conditional Independence
Recall that conditional independence means that two variables are independent of each other, given the observation of a third variable:
$P(a \land b \mid c) = P(a \mid c)\,P(b \mid c)$
A node is conditionally independent of its nondescendants, given its parents.
Given Alarm, JohnCalls is independent of Burglary and Earthquake.
A node is conditionally independent of all other nodes, given its parents, its children, and its children’s other parents (together, its Markov blanket).
Burglary is independent of JohnCalls given Alarm and Earthquake.
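These independence statements can be checked numerically. The approach is to build the full joint distribution of the burglary network from its CPTs and compare conditional probabilities. Here is a Python sketch. The slide's entry for $P(J \mid \lnot A)$ is cut off, so 0.05 is an assumed value; the equalities checked here hold whatever value is used. `cond` is a hypothetical helper.

```python
from itertools import product

P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=true | B, E)
P_J = {True: 0.90, False: 0.05}  # P(J=true | A); 0.05 is an ASSUMED value
P_M = {True: 0.70, False: 0.01}  # P(M=true | A)
TF = (True, False)

def pr(p_true, value):
    return p_true if value else 1 - p_true

# Full joint over worlds (B, E, A, J, M), built with the chain rule.
JOINT = {
    (b, e, a, j, m): pr(P_B, b) * pr(P_E, e) * pr(P_A[(b, e)], a)
    * pr(P_J[a], j) * pr(P_M[a], m)
    for b, e, a, j, m in product(TF, repeat=5)
}
IDX = {"B": 0, "E": 1, "A": 2, "J": 3, "M": 4}

def cond(query, evidence):
    """P(query | evidence); both are dicts such as {"B": True}."""
    def match(world, assignment):
        return all(world[IDX[v]] == val for v, val in assignment.items())
    num = sum(p for w, p in JOINT.items() if match(w, {**query, **evidence}))
    den = sum(p for w, p in JOINT.items() if match(w, evidence))
    return num / den

# Given Alarm, JohnCalls is independent of Burglary and Earthquake.
for a, b, e in product(TF, repeat=3):
    assert abs(cond({"J": True}, {"A": a, "B": b, "E": e})
               - cond({"J": True}, {"A": a})) < 1e-12

# Burglary is independent of JohnCalls given Alarm and Earthquake.
for a, e, j in product(TF, repeat=3):
    assert abs(cond({"B": True}, {"A": a, "E": e, "J": j})
               - cond({"B": True}, {"A": a, "E": e})) < 1e-12

# P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls | Alarm).
for j, a, e, b in product(TF, repeat=4):
    assert abs(cond({"M": True}, {"J": j, "A": a, "E": e, "B": b})
               - cond({"M": True}, {"A": a})) < 1e-12

print("independence checks passed")
```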

Conditional Independence
[Figure: two diagrams of a node X with parents U_1 … U_m, children Y_1 … Y_n, and nodes Z_1j … Z_nj]

Variable Elimination
$P(A \mid B, E)$ will result in a $2 \times 2 \times 2$ matrix, stored in $f_A$.
We then need to compute the sum over the different possible values of $A$.
This sum is $\sum_a f_A(a, B, E)\, f_J(a)\, f_M(a)$.
We can process $E$ the same way.
In essence, we’re doing dynamic programming here: intermediate sums are stored (memoized) and reused instead of being recomputed.
Complexity:
Polytrees (one undirected path between any two nodes): linear in the number of nodes.
Multiply connected networks: exponential in the worst case.
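The elimination step described above can be sketched with NumPy arrays as factors. In this Python sketch, index 0 means true and index 1 means false, and the array names mirror the slide's $f_A$, $f_J$ and $f_M$. `np.einsum` multiplies the factors and sums out $A$, then $E$, for the query $P(B \mid J = true, M = true)$. The slide's value for $P(J \mid \lnot A)$ is cut off, so this sketch assumes 0.05, the value in Russell and Norvig's version of the network.

```python
import numpy as np

# Factors; index 0 = true, 1 = false along every axis.
f_B = np.array([0.001, 0.999])          # P(B)
f_E = np.array([0.002, 0.998])          # P(E)
p_a_true = np.array([[0.95, 0.94],      # P(A=true | B, E): rows B, columns E
                     [0.29, 0.001]])
f_A = np.stack([p_a_true, 1 - p_a_true])  # axes (a, b, e): a 2 x 2 x 2 array
f_J = np.array([0.90, 0.05])            # P(J=true | A), evidence J = true; 0.05 ASSUMED
f_M = np.array([0.70, 0.01])            # P(M=true | A), evidence M = true

# Sum out A: g(B, E) = sum_a f_A(a, B, E) f_J(a) f_M(a)
g_BE = np.einsum("abe,a,a->be", f_A, f_J, f_M)
# Sum out E the same way: h(B) = sum_e f_E(e) g(B, e)
h_B = np.einsum("e,be->b", f_E, g_BE)
# Multiply in the prior on B and normalize.
posterior = f_B * h_B / np.sum(f_B * h_B)
print(np.round(posterior, 3))  # [0.284 0.716]
```

The intermediate factor `g_BE` is computed once and reused for every value of $E$. That reuse is the memoization the slide refers to.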

Scaling Up
Many techniques have been developed to allow Bayesian networks to scale to hundreds of nodes.
Clustering:
Nodes are joined together to make the network into a polytree.
The CPT within each merged node grows, but the network structure is simplified.
Approximate inference:
Monte Carlo sampling is used to estimate conditional probabilities.

Monte Carlo sampling example
Network: Cloudy → Sprinkler; Cloudy → Rain; Sprinkler → WetGrass ← Rain
P(C) = .5
C | P(S)
t | .10
f | .50
C | P(R)
t | .80
f | .20
S R | P(W)
t t | .99
t f | .90
f t | .90
f f | …
Start at the top of the network and select a random sample (say it’s true).
Draw a random sample for its children, conditioned on $Cloudy = true$.
$P(Sprinkler \mid Cloudy = true) = \langle 0.1, 0.9 \rangle$. Say we select false.
$P(Rain \mid Cloudy = true) = \langle 0.8, 0.2 \rangle$. Say we select true.
$P(WetGrass \mid Sprinkler = false, Rain = true) = \langle 0.9, 0.1 \rangle$. Say we select true.
This gives us a sample for $\langle Cloudy, \lnot Sprinkler, Rain, WetGrass \rangle$.
As we increase the number of samples, this provides an estimate of $P(Cloudy, \lnot Sprinkler, Rain, WetGrass)$.
We choose the query we are interested in and then sample the network “enough” times to determine the probability of that event occurring.
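The procedure on this slide is usually called prior (or direct) sampling. Below is a minimal Python sketch of it for the sprinkler network; the function names, sample count, and seed are illustrative. The slide's entry for $P(WetGrass \mid \lnot Sprinkler, \lnot Rain)$ is cut off, so the code assumes 0.0, as labelled in a comment. That entry does not affect the event estimated here, whose exact probability is $0.5 \times 0.9 \times 0.8 \times 0.9 = 0.324$.

```python
import random

# CPTs from the slide: probability that each variable is true.
P_CLOUDY = 0.5
P_SPRINKLER = {True: 0.10, False: 0.50}  # given Cloudy
P_RAIN = {True: 0.80, False: 0.20}       # given Cloudy
P_WET = {(True, True): 0.99, (True, False): 0.90,
         (False, True): 0.90,
         (False, False): 0.0}            # given (Sprinkler, Rain); f f entry ASSUMED

def sample(rng):
    """Draw one event top-down, each variable conditioned on its sampled parents."""
    cloudy = rng.random() < P_CLOUDY
    sprinkler = rng.random() < P_SPRINKLER[cloudy]
    rain = rng.random() < P_RAIN[cloudy]
    wet = rng.random() < P_WET[(sprinkler, rain)]
    return cloudy, sprinkler, rain, wet

def estimate(event, n=100_000, seed=1):
    """Fraction of n prior samples for which event(sample) is true."""
    rng = random.Random(seed)
    return sum(event(sample(rng)) for _ in range(n)) / n

def is_target(s):
    """The slide's query: Cloudy, not Sprinkler, Rain, WetGrass."""
    return s == (True, False, True, True)

print(f"{estimate(is_target):.3f}")  # close to the exact value 0.324
```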

Applications of Bayesian Networks
Diagnosis (widely used in Microsoft’s products)
Medical diagnosis
Spam filtering
Expert systems applications (plant control, monitoring)
Robotic control