
Understanding Conditional Probability and Bayesian Networks in Probabilistic Inference - P, Study notes of Computer Science

University of San Francisco (USF), Computer Science

Prof. Christopher H. Brooks

This document from the University of San Francisco's Department of Computer Science explores probabilistic inference. It covers the basics of probability, conditional probability, Bayes' rule, and the Monty Hall problem, with examples and formulas, and introduces Bayesian networks for probabilistic inference.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009


Artificial Intelligence Programming

Probabilistic Inference

Chris Brooks

Department of Computer Science

University of San Francisco

Probability Review

Probability allows us to represent a belief about a statement, or a likelihood that a statement is true.

P(rain) = 0.6 means that we believe it is 60% likely that it is currently raining.

Axioms:

0 ≤ P(a) ≤ 1

The probability of (A ∨ B) is P(A) + P(B) − P(A ∧ B)

Tautologies have P = 1

Contradictions have P = 0

Department of Computer Science, University of San Francisco
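As a quick sanity check of the axioms, here is a minimal Python sketch on a hypothetical example that is not in the slides: one roll of a fair six-sided die, with A = "even" and B = "greater than 3". The helper names are illustrative only.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
outcomes = range(1, 7)

def prob(event):
    """Probability of an event (a predicate on outcomes) under a uniform distribution."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def is_even(o):
    return o % 2 == 0

def is_high(o):
    return o > 3

p_or = prob(lambda o: is_even(o) or is_high(o))    # P(A ∨ B)
p_and = prob(lambda o: is_even(o) and is_high(o))  # P(A ∧ B)
rhs = prob(is_even) + prob(is_high) - p_and        # P(A) + P(B) − P(A ∧ B)

print(p_or, rhs)  # 2/3 2/3
assert p_or == rhs
assert prob(lambda o: True) == 1   # tautology
assert prob(lambda o: False) == 0  # contradiction
```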

Conditional Probability

Once we begin to make observations about the value of certain variables, our belief in other variables changes.

Once we notice that it’s cloudy, P(Rain) goes up.

This is called conditional probability.

Written as: P(Rain|Cloudy)

$$P(a \mid b) = \frac{P(a \wedge b)}{P(b)}$$

or P(a ∧ b) = P(a|b)P(b)

This is called the product rule.


Conditional Probability

Example:

P(Cloudy) = 0.3

P(Rain) = 0.2

P(Cloudy ∧ Rain) = 0.15

P(Cloudy ∧ ¬Rain) = 0.1

P(¬Cloudy ∧ Rain) = 0.1

P(¬Cloudy ∧ ¬Rain) = 0.65

Initially, P(Rain) = 0.2. Once we see that it’s cloudy,

$$P(Rain \mid Cloudy) = \frac{P(Rain \wedge Cloudy)}{P(Cloudy)} = \frac{0.15}{0.3} = 0.5$$
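The calculation above can be sketched in Python with the slide's numbers. The helper name `conditional` is hypothetical; the last line checks that the product rule recovers the joint entry.

```python
# Values taken from the Cloudy/Rain example.
p_cloudy = 0.3
p_rain_and_cloudy = 0.15

def conditional(p_a_and_b, p_b):
    """P(a | b) = P(a ∧ b) / P(b); undefined when P(b) = 0."""
    if p_b == 0:
        raise ValueError("P(b) must be nonzero")
    return p_a_and_b / p_b

p_rain_given_cloudy = conditional(p_rain_and_cloudy, p_cloudy)
print(p_rain_given_cloudy)  # 0.5

# Product rule: P(a ∧ b) = P(a | b) P(b) gives back the joint entry.
assert abs(p_rain_given_cloudy * p_cloudy - p_rain_and_cloudy) < 1e-12
```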


Combinations of events

The probability of (A ∧ B) is P(A|B)P(B).

What if A and B are independent?

Then P(A|B) is P(A), and P(A ∧ B) is P(A)P(B).

Example:

What is the probability of “heads” five times in a row?

What is the probability of at least one “head”?
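Both questions follow from the independence rule. This sketch assumes a fair coin and, for the second question, the same five flips (the slide does not say how many). It computes the answers exactly with `fractions.Fraction` and cross-checks them by enumerating all 2^5 equally likely sequences.

```python
from fractions import Fraction
from itertools import product

n = 5                      # number of flips
p_heads = Fraction(1, 2)   # assumes a fair coin

# Independent flips: P(H ∧ H ∧ ... ∧ H) = P(H)^n
p_all_heads = p_heads ** n
# "At least one head" is the complement of "no heads at all".
p_at_least_one = 1 - (1 - p_heads) ** n

# Brute-force check over all 2^n equally likely sequences.
seqs = list(product("HT", repeat=n))
assert p_all_heads == Fraction(sum(s == ("H",) * n for s in seqs), len(seqs))
assert p_at_least_one == Fraction(sum("H" in s for s in seqs), len(seqs))
print(p_all_heads, p_at_least_one)  # 1/32 31/32
```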


Bayes’ Rule

Often, we want to know how a probability changes as a result of an observation.

Recall the Product Rule:

P(a∧b) = P(a|b)P(b)

P(a∧b) = P(b|a)P(a)

We can set these equal to each other:

P(a|b)P(b) = P(b|a)P(a)

And then divide by P(a):

$$P(b \mid a) = \frac{P(a \mid b)\,P(b)}{P(a)}$$

This equality is known as Bayes’ theorem (or rule or law).
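To see Bayes' rule in action, this minimal sketch reuses the stated Cloudy/Rain numbers (P(Rain|Cloudy) = 0.5, P(Cloudy) = 0.3, P(Rain) = 0.2) to turn P(Rain|Cloudy) into P(Cloudy|Rain). The function name `bayes` is illustrative. The result agrees with computing P(Cloudy ∧ Rain)/P(Rain) = 0.15/0.2 directly.

```python
def bayes(p_a_given_b, p_b, p_a):
    """Bayes' rule: P(b | a) = P(a | b) P(b) / P(a)."""
    return p_a_given_b * p_b / p_a

# a = Rain, b = Cloudy, with the values from the earlier example.
p_cloudy_given_rain = bayes(0.5, 0.3, 0.2)
print(round(p_cloudy_given_rain, 3))  # 0.75
```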


Monty Hall Problem

From the game show “Let’s Make a Deal”.

Pick one of three doors. A fabulous prize is behind one door; goats are behind the other two doors.

Monty opens one of the doors you did not pick and shows a goat.

Monty then offers you the chance to switch to the other unopened door.

Should you switch?

Monty Hall Problem

Problem clarification:

Prize location selected randomly.

Monty always opens a door and allows contestants to switch.

When Monty has a choice about which door to open, he chooses randomly.

Variables:

Prize: P ∈ {p_A, p_B, p_C}

Choose: C ∈ {c_A, c_B, c_C}

Monty: M ∈ {m_A, m_B, m_C}
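Under the three assumptions above, the game is easy to simulate. This sketch is not part of the original slides and its function names are hypothetical; it plays many rounds with a seeded random generator and compares the win rate of staying with that of switching.

```python
import random

DOORS = ("A", "B", "C")

def play(switch, rng):
    """Play one round under the stated rules; return True if the contestant wins."""
    prize = rng.choice(DOORS)   # prize location selected uniformly at random
    choice = rng.choice(DOORS)  # contestant's initial pick
    # Monty opens a door that is neither the contestant's pick nor the prize;
    # when two such doors exist, he chooses between them at random.
    monty = rng.choice([d for d in DOORS if d != choice and d != prize])
    if switch:
        # Switch to the one remaining unopened door.
        choice = next(d for d in DOORS if d != choice and d != monty)
    return choice == prize

def win_rate(switch, trials=100_000, seed=0):
    """Fraction of simulated rounds won with the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print(f"stay:   {win_rate(False):.3f}")  # close to 1/3
print(f"switch: {win_rate(True):.3f}")   # close to 2/3
```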

Monty Hall Problem

Without loss of generality, assume:

Choose door A.

Monty opens door B.

$$P(p_A \mid c_A, m_B) = \; ?$$

Monty Hall Problem

Without loss of generality, assume:

Choose door A.

Monty opens door B.

$$P(p_A \mid c_A, m_B) = \frac{P(m_B \mid c_A, p_A)\,P(p_A \mid c_A)}{P(m_B \mid c_A)}$$

Monty Hall Problem

$$P(p_A \mid c_A, m_B) = \frac{P(m_B \mid c_A, p_A)\,P(p_A \mid c_A)}{P(m_B \mid c_A)}$$

$$P(m_B \mid c_A, p_A) = \; ?$$

Monty Hall Problem

$$P(p_A \mid c_A, m_B) = \frac{P(m_B \mid c_A, p_A)\,P(p_A \mid c_A)}{P(m_B \mid c_A)}$$

$$P(m_B \mid c_A, p_A) = 1/2$$

$$P(p_A \mid c_A) = \; ?$$
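The terms in this derivation can be computed exactly. The sketch below encodes the stated rules as P(m_B | c_A, prize) and applies Bayes' rule with a uniform prior P(p_A | c_A) = 1/3 (the prize location is selected randomly), summing over the prize location to get the denominator P(m_B | c_A). It is an illustrative check; the function names are hypothetical.

```python
from fractions import Fraction

DOORS = "ABC"
prior = {d: Fraction(1, 3) for d in DOORS}  # P(p_d | c_A): prize placed uniformly

def p_monty_opens(opened, chosen, prize):
    """P(m_opened | c_chosen, p_prize) under the rules on the clarification slide."""
    options = [d for d in DOORS if d != chosen and d != prize]
    return Fraction(1, len(options)) if opened in options else Fraction(0)

chosen, opened = "A", "B"
# Denominator P(m_B | c_A), obtained by summing over the prize location.
evidence = sum(p_monty_opens(opened, chosen, d) * prior[d] for d in DOORS)
posterior = {d: p_monty_opens(opened, chosen, d) * prior[d] / evidence for d in DOORS}

print(p_monty_opens("B", "A", "A"))  # 1/2
print(evidence)                      # 1/2
print(posterior)  # {'A': Fraction(1, 3), 'B': Fraction(0, 1), 'C': Fraction(2, 3)}
```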

Probabilistic Inference

Problems in working with the joint probability distribution:

Exponentially large table (2^n entries for n Boolean variables).

We need a data structure that captures the fact that many variables do not influence each other.

For example, the color of Bart’s hat does not influence whether Homer is hungry.

We call this structure a Bayesian network (or a belief network).

Bayesian Network

A Bayesian network is a directed graph in which each node is annotated with probability information. A network has:

A set of random variables that correspond to the nodes in the network.

A set of directed edges or arrows connecting nodes. These represent influence. If there is an arrow from X to Y, then X is the parent of Y.

Each node keeps a conditional probability distribution indicating the probability of each value it can take, conditional on its parents’ values.

No cycles (it is a directed acyclic graph).

The topology of the network specifies which variables directly influence other variables (the conditional independence relationships).

Burglary example

[Figure: Burglary and Earthquake are the parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.]

P(B) = .001

P(E) = .002

P(A | B, E): B = T, E = T: .95; B = T, E = F: .94; B = F, E = T: .29; B = F, E = F: .001

P(J | A = T) = .90

P(M | A): A = T: .70; A = F: .01

Two neighbors will call when they hear your alarm.

John sometimes overreacts.

Mary sometimes misses the alarm.

Two things can set off the alarm: an earthquake or a burglary.

Given who has called, what’s the probability of a burglary?

Network structure

Each node has a conditional probability table (CPT).

This gives the probability of each value of that node, given its parents’ values. For each combination of parent values, these probabilities sum to 1.

Nodes with no parents just contain priors.

Summarizing uncertainty

Notice that we don’t need to have nodes for all the reasons why Mary might not call.

A probabilistic approach lets us summarize this information in P(¬M | A).

This allows a small agent to deal with large worlds that have a large number of possibly uncertain outcomes.

How would we handle this with logic?

Implicitly representing the full JPD

Recall that the full joint distribution allows us to calculate the probability of any variable, given all the others.

Conditionally independent variables can be stored in separate tables. These are the CPTs seen in the Bayesian network.

Therefore, we can use this information to perform computations:

$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid \mathit{parents}(x_i))$$

$$P(A \wedge \neg E \wedge \neg B \wedge J \wedge M) = P(J \mid A)\,P(M \mid A)\,P(A \mid \neg B \wedge \neg E)\,P(\neg B)\,P(\neg E)$$

$$= 0.90 \times 0.70 \times 0.001 \times 0.999 \times 0.998 \approx 0.000628$$
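The product on this slide is easy to reproduce. This sketch hard-codes only the CPT entries the example needs, using the values from the burglary network.

```python
# CPT entries from the burglary network (only those needed here).
P_B = 0.001                  # P(Burglary)
P_E = 0.002                  # P(Earthquake)
P_A_given_notB_notE = 0.001  # P(Alarm | ¬Burglary, ¬Earthquake)
P_J_given_A = 0.90           # P(JohnCalls | Alarm)
P_M_given_A = 0.70           # P(MaryCalls | Alarm)

# P(A ∧ ¬E ∧ ¬B ∧ J ∧ M) = P(J|A) P(M|A) P(A|¬B∧¬E) P(¬B) P(¬E)
joint = P_J_given_A * P_M_given_A * P_A_given_notB_notE * (1 - P_B) * (1 - P_E)
print(f"{joint:.6f}")  # 0.000628
```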

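Some examples

What is the probability that both Mary and John call, given that the alarm sounded?

P(M|A) · P(J|A) = 0.70 × 0.90 = 0.63 (JohnCalls and MaryCalls are conditionally independent given Alarm).

What is the probability of a break-in, given that we hear an alarm?

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}, \qquad P(A \mid B) = 0.95\,P(E) + 0.94\,P(\neg E)$$

where P(A) is obtained by summing P(A | B, E)P(B)P(E) over the four combinations of B and E. With the CPT values from the burglary example, P(B|A) ≈ 0.374.

What is the probability of a break-in, given that both John and Mary called?

$$P(B \mid J, M) = \alpha \left[ P(B \wedge J \wedge M \wedge A \wedge \neg E) + P(B \wedge J \wedge M \wedge A \wedge E) + P(B \wedge J \wedge M \wedge \neg A \wedge \neg E) + P(B \wedge J \wedge M \wedge \neg A \wedge E) \right]$$

where α = 1/P(J ∧ M) normalizes the result.

This last example shows a form of inference.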
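These queries can be answered by enumerating the full joint distribution, as in the last example. This sketch assumes P(J | ¬A) = 0.05, the value used in Russell and Norvig's version of this network, because that entry is not legible in the burglary figure; the P(B | A) result does not depend on it. The function names are hypothetical.

```python
from itertools import product

# CPTs for the burglary network (Boolean variables).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,     # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}  # P(J=true | A); the A=false entry is an ASSUMPTION
P_M = {True: 0.70, False: 0.01}  # P(M=true | A)

def bern(p_true, value):
    """Probability that a Boolean variable takes `value` when P(true) = p_true."""
    return p_true if value else 1 - p_true

def joint(b, e, a, j, m):
    """Full joint as a product of CPT entries: P(b)P(e)P(a|b,e)P(j|a)P(m|a)."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

def query_burglary(evidence):
    """P(Burglary=true | evidence), summing the joint over all unobserved variables."""
    totals = {True: 0.0, False: 0.0}
    for b, e, a, j, m in product((True, False), repeat=5):
        world = {"E": e, "A": a, "J": j, "M": m}
        if all(world[k] == v for k, v in evidence.items()):
            totals[b] += joint(b, e, a, j, m)
    return totals[True] / (totals[True] + totals[False])  # normalize (the α factor)

print(round(query_burglary({"A": True}), 3))             # 0.374
print(round(query_burglary({"J": True, "M": True}), 3))  # 0.284
```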

Constructing a Bayesian network

There are often several ways to construct a Bayesian network.

The knowledge engineer needs to discover conditional independence relationships.

Parents of a node should be those variables that directly influence its value.

JohnCalls is influenced by Earthquake, but not directly.

John and Mary calling don’t influence each other once we know whether the alarm sounded.

Formally, we believe that:

$$P(MaryCalls \mid JohnCalls, Alarm, Earthquake, Burglary) = P(MaryCalls \mid Alarm)$$

Compactness and Node Ordering

Bayesian networks allow for a more compact representation of the domain.

Redundant information is removed.

Example: say we have 30 Boolean nodes, each with 5 parents.

Each CPT will contain 2^5 = 32 numbers.

Total: 30 × 32 = 960.

Joint: 2^30 entries, nearly all redundant.
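The counts above assume Boolean variables, so a node with k parents needs 2^k numbers (one P(true) per combination of parent values, since P(false) = 1 − P(true)). A tiny sketch of the arithmetic:

```python
n_nodes = 30
n_parents = 5

# One number per combination of parent values (P(false) is implied).
per_cpt = 2 ** n_parents
network_total = n_nodes * per_cpt
# The full joint needs one entry per assignment to all 30 variables.
joint_total = 2 ** n_nodes

print(per_cpt, network_total, joint_total)  # 32 960 1073741824
```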

Building a network

Begin with root causes.

Then add the variables they directly influence.

Then add their direct children.

This is called a causal model: reasoning in terms of cause and effect.

Estimates are much easier to come up with this way.

We could try to build from effect to cause, but then the network would be more complex, and the CPTs hard to estimate.

P(E|B) = ?

Conditional Independence

Recall that conditional independence means that two variables are independent of each other, given the observation of a third variable.

$$P(a \wedge b \mid c) = P(a \mid c)\,P(b \mid c)$$

A node is conditionally independent of its nondescendants, given its parents.

Given Alarm, JohnCalls is independent of Burglary and Earthquake.

A node is conditionally independent of all other nodes, given its parents, its children, and its children’s other parents (its Markov blanket).

Burglary is independent of JohnCalls given Alarm and Earthquake.

[Figure: two diagrams of a node X with parents U1, ..., Um, children Y1, ..., Yn, and the children’s other parents Z1j, ..., Znj, illustrating the two independence properties above.]
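Both properties can be checked numerically by building the full joint distribution from the burglary CPTs and comparing conditional probabilities. This sketch again assumes P(J | ¬A) = 0.05, since that entry is not legible in the burglary figure; the independence statements hold for any value of that entry. The helper names are hypothetical.

```python
from itertools import product

# Burglary-network CPTs; P(J=true | A=false) = 0.05 is an ASSUMED value.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
NAMES = ("B", "E", "A", "J", "M")
INDEX = {name: i for i, name in enumerate(NAMES)}

def bern(p_true, value):
    return p_true if value else 1 - p_true

# Full joint distribution built from the CPTs (32 entries).
JOINT = {
    (b, e, a, j, m): (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
                      * bern(P_J[a], j) * bern(P_M[a], m))
    for b, e, a, j, m in product((True, False), repeat=5)
}

def prob(**fixed):
    """Probability of a partial assignment, e.g. prob(A=True, J=True)."""
    return sum(p for world, p in JOINT.items()
               if all(world[INDEX[k]] == v for k, v in fixed.items()))

def cond(target, given):
    """P(target | given), both passed as dicts of variable name -> value."""
    return prob(**target, **given) / prob(**given)

# Burglary's Markov blanket is {Alarm, Earthquake}: also observing JohnCalls changes nothing.
blanket_ok = abs(cond({"B": True}, {"A": True, "E": False, "J": True})
                 - cond({"B": True}, {"A": True, "E": False})) < 1e-12
# Given its parent Alarm, JohnCalls is independent of the nondescendant Burglary.
nondesc_ok = abs(cond({"J": True}, {"A": True, "B": True})
                 - cond({"J": True}, {"A": True})) < 1e-12
print(blanket_ok, nondesc_ok)  # True True
```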

Variable Elimination

P(A|B, E) will result in a 2 × 2 × 2 matrix, stored in f_A.

We then need to compute the sum over the different possible values of A:

$$\sum_{a} f_A(a, B, E) \times f_J(a) \times f_M(a)$$

We can process E the same way.

In essence, we’re doing dynamic programming here, and exploiting the same memoization process.

Complexity:

Polytrees (at most one undirected path between any two nodes): linear in the number of nodes, when the number of parents per node is bounded.

Multiply connected networks: exponential in the worst case.
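A minimal sketch of variable elimination for P(B | J = true, M = true), eliminating A and then E as described above. Factors are represented as (variables, table) pairs, and the helpers `make_factor`, `multiply`, and `sum_out` are hypothetical names, not a library API. The evidence J = true, M = true is folded into f_J(A) and f_M(A); P(J | ¬A) = 0.05 is an assumed value because that entry is not legible in the burglary figure.

```python
from itertools import product

def make_factor(variables, fn):
    """Tabulate fn over all Boolean assignments to `variables`."""
    return (variables, {vals: fn(dict(zip(variables, vals)))
                        for vals in product((True, False), repeat=len(variables))})

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    fv, ft = f
    gv, gt = g
    out_vars = fv + tuple(v for v in gv if v not in fv)
    def value(asg):
        return ft[tuple(asg[v] for v in fv)] * gt[tuple(asg[v] for v in gv)]
    return make_factor(out_vars, value)

def sum_out(var, f):
    """Eliminate `var` by summing the factor over its two values."""
    fv, ft = f
    keep = tuple(v for v in fv if v != var)
    def value(asg):
        return sum(ft[tuple(x if v == var else asg[v] for v in fv)] for x in (True, False))
    return make_factor(keep, value)

def bern(p_true, value):
    return p_true if value else 1 - p_true

P_A = {(True, True): 0.95, (True, False): 0.94, (False, True): 0.29, (False, False): 0.001}
f_B = make_factor(("B",), lambda x: bern(0.001, x["B"]))
f_E = make_factor(("E",), lambda x: bern(0.002, x["E"]))
f_A = make_factor(("A", "B", "E"), lambda x: bern(P_A[(x["B"], x["E"])], x["A"]))  # 2 x 2 x 2
f_J = make_factor(("A",), lambda x: 0.90 if x["A"] else 0.05)  # 0.05 is an ASSUMED value
f_M = make_factor(("A",), lambda x: 0.70 if x["A"] else 0.01)

# Sum out A: sum_a f_A(a, B, E) f_J(a) f_M(a), giving a factor over (B, E).
f_AJM = sum_out("A", multiply(multiply(f_A, f_J), f_M))
# Process E the same way, leaving a factor over B alone.
f_EAJM = sum_out("E", multiply(f_E, f_AJM))
_, table = multiply(f_B, f_EAJM)
z = sum(table.values())             # equals P(J = true, M = true)
print(round(table[(True,)] / z, 3))  # 0.284
```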

Scaling Up

Many techniques have been developed to allow Bayesian networks to scale to hundreds of nodes.

Clustering:

Nodes are joined together to make the network into a polytree.

The CPT within each joined node grows, but the network structure is simplified.

Approximate inference:

Monte Carlo sampling is used to estimate conditional probabilities.

Monte Carlo sampling example

[Figure: sprinkler network. Cloudy is the parent of Sprinkler and Rain; Sprinkler and Rain are the parents of WetGrass.]

P(C) = .5

P(S | C): C = t: .10; C = f: .50

P(R | C): C = t: .80; C = f: .20

P(W | S, R): S = t, R = t: .99; S = t, R = f: .90; S = f, R = t: .90

Start at the top of the network and select a random sample (say it’s true).

Draw a random sample from its children, conditioned on Cloudy = true.

P(Sprinkler | Cloudy = true) = <0.1, 0.9>. Say we select false.

P(Rain | Cloudy = true) = <0.8, 0.2>. Say we select true.

P(WetGrass | Sprinkler = false, Rain = true) = <0.9, 0.1>. Say we select true.

This gives us a sample for <Cloudy, ¬Sprinkler, Rain, WetGrass>.

As we increase the number of samples, this provides an estimate of P(Cloudy, ¬Sprinkler, Rain, WetGrass).

We choose the query we are interested in and then sample the network “enough” times to determine the probability of that event occurring.
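The sampling procedure above (often called prior sampling) can be sketched in a few lines of Python. The P(W | ¬S, ¬R) entry is not legible in the figure, so 0.0 is assumed; it does not affect the event estimated here, whose exact probability is P(c)P(¬s|c)P(r|c)P(w|¬s,r) = 0.5 × 0.9 × 0.8 × 0.9 = 0.324. The function names are hypothetical.

```python
import random

# CPTs from the sprinkler network. P(W | ¬S, ¬R) = 0.0 is an ASSUMPTION
# (that entry is not legible); it does not affect the query estimated below.
P_C = 0.5
P_S = {True: 0.10, False: 0.50}  # P(Sprinkler=true | Cloudy)
P_R = {True: 0.80, False: 0.20}  # P(Rain=true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}  # P(WetGrass=true | S, R)

def prior_sample(rng):
    """Sample every variable top-down, each conditioned on its sampled parents."""
    c = rng.random() < P_C
    s = rng.random() < P_S[c]
    r = rng.random() < P_R[c]
    w = rng.random() < P_W[(s, r)]
    return c, s, r, w

def estimate(event, n=200_000, seed=1):
    """Fraction of n samples in which `event` holds."""
    rng = random.Random(seed)
    return sum(event(*prior_sample(rng)) for _ in range(n)) / n

est = estimate(lambda c, s, r, w: c and not s and r and w)
exact = 0.5 * 0.9 * 0.8 * 0.9  # P(c) P(¬s|c) P(r|c) P(w|¬s,r)
print(f"estimate {est:.3f}, exact {exact:.3f}")
```

More samples shrink the sampling error roughly in proportion to 1/√n, which is what “enough” times trades off against running time.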

Applications of Bayesian Networks

Diagnosis (widely used in Microsoft’s products)

Medical diagnosis

Spam filtering

Expert systems applications (plant control, monitoring)

Robotic control
