Joint Probability Distribution, Notation - Review Sheet | CSE 571, Study notes of Computer Science

Material Type: Notes; Class: Artificial Intelligence; Subject: Computer Science and Engineering; University: Arizona State University - Tempe; Term: Spring 2002;

Uploaded on 09/02/2009

CSE571 Class Notes 11/3/02 Geoff Beerbower
Things to come:
Progol (learning in AnsProlog)
Bayes Nets (reasoning with polytrees)
Functional Causal Models
Adding Probability to AnsProlog
Learning Causality
Learning Bayes Nets
Some background
The first non-monotonic reasoning papers appeared around 1980.
Around the same time, some people thought of including probabilities in
reasoning models.
  o e.g. “normally birds fly” could be written as “P(fly|bird) = 0.98”
The idea didn’t take hold, partly because of the difficulty of obtaining the
probability values.
Things have changed, and it is now much easier to find the probability values
(e.g. on the internet)
Judea Pearl’s book about Probabilistic Reasoning has become the foundation of
using probability in AI
Joint Probability Distribution
a b c P(a,b,c)
T T T 0.1
T T F 0.2
T F T 0.1
T F F 0.3
F T T 0.05
F T F 0.05
F F T 0.2
F F F 0
total: 1.0
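
A minimal sketch of the table above in Python, assuming the joint distribution is stored as a dictionary keyed by the truth values (a, b, c). The names `joint` and `marginal` are illustrative and do not come from the notes.

```python
# Joint distribution P(a, b, c) from the table above, keyed by (a, b, c).
joint = {
    (True, True, True): 0.1,
    (True, True, False): 0.2,
    (True, False, True): 0.1,
    (True, False, False): 0.3,
    (False, True, True): 0.05,
    (False, True, False): 0.05,
    (False, False, True): 0.2,
    (False, False, False): 0.0,
}

def marginal(joint, index, value):
    """Sum the joint over all rows where the variable at `index` equals `value`."""
    return sum(p for row, p in joint.items() if row[index] == value)

# A valid joint distribution must sum to 1 over all rows.
total = sum(joint.values())

# P(a = T) is obtained by summing out b and c.
p_a_true = marginal(joint, 0, True)
```

Here `total` checks the "total" row of the table, and `p_a_true` adds up the four rows in which a is true.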
Notation:
P(A^B) means the probability of A and B
P(A, B) also means the probability of A and B
P(A|B) means the probability of A given B

Formula: P(A,B) = P(A)P(B|A)
Meaning: The probability of A and B is equal to the probability of A times the probability of B given A.
This formula is often useful when rewritten as:
P(B|A) = P(A,B)/P(A)
assuming P(A) is not equal to zero.

Example:
Question: What is the probability of b being true given that a is true? That is, P(b = T | a = T) = ?
(Note: This is not the same as “What is the probability of both a and b being true?”)
Solution: Use the formula P(B|A) = P(A,B)/P(A) and the table above.
P(b = T | a = T) = P(a = T, b = T) / P(a = T)
= [P(a = T, b = T, c = T) + P(a = T, b = T, c = F)] / Σ_{b,c} P(a = T, b, c)
= (0.1 + 0.2) / (0.1 + 0.2 + 0.1 + 0.3)
= 3/7

Formula: P(B|A) = P(B)P(A|B) / P(A)
Meaning: The probability of B given A is equal to the probability of B times the probability of A given B, divided by the probability of A.
This formula is useful when you are given P(A|B) and want to switch to P(B|A).
Examples of variables:
a = bird, b = flies
a = symptom, b = disease
  o P(a|b) - the probability that someone with the disease b has the symptom a.
  o P(b|a) - the probability that someone with the symptom a has the disease b.
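
The worked example and the second formula can be checked numerically. The following is a sketch, assuming the joint table from these notes is stored as exact fractions; the helper `prob` and the variable names are illustrative, not part of the notes.

```python
from fractions import Fraction

NAMES = ("a", "b", "c")

# Joint P(a, b, c) from the notes' table, in hundredths to keep the arithmetic exact.
joint = {
    (True, True, True): Fraction(10, 100),
    (True, True, False): Fraction(20, 100),
    (True, False, True): Fraction(10, 100),
    (True, False, False): Fraction(30, 100),
    (False, True, True): Fraction(5, 100),
    (False, True, False): Fraction(5, 100),
    (False, False, True): Fraction(20, 100),
    (False, False, False): Fraction(0, 100),
}

def prob(**fixed):
    """Probability of the given assignments, summing out every other variable."""
    return sum(
        p for row, p in joint.items()
        if all(row[NAMES.index(name)] == value for name, value in fixed.items())
    )

# Definition of conditional probability: P(b|a) = P(a,b) / P(a).
p_b_given_a = prob(a=True, b=True) / prob(a=True)

# Switching direction: P(b|a) = P(b) P(a|b) / P(a).
p_a_given_b = prob(a=True, b=True) / prob(b=True)
p_b_given_a_switched = prob(b=True) * p_a_given_b / prob(a=True)
```

Both routes give the same value for P(b = T | a = T), which is what the second formula asserts.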

Definition: A directed acyclic graph D over a finite set V is a graph in which all edges are directed, with no multiple edges and no directed cycles. Absence of directed cycles means that, following the arrows in the graph, it is impossible to return to the point you started from.

Given
Node: X
Parent nodes of X, written as: Pa(X) = {Y1, Y2, Y3, …, Yk}
Non-descendant nodes: Z
Then:
P(X | Y1, Y2, Y3, …, Yk, Z) = P(X | Y1, Y2, Y3, …, Yk)
[Figure: nodes Y1, Y2, Y3, …, Yk, X, and Z]

If we can identify each node's parents, then
P(A1, A2, A3, …, An) = P(A1 | A2, A3, …, An) … P(Ak | Ak+1, …, An) … P(An-1 | An) P(An)
can be written as
P(A1, A2, A3, …, An) = P(A1 | Pa(A1)) P(A2 | Pa(A2)) … P(Ak | Pa(Ak)) … P(An-1 | Pa(An-1)) P(An)
In order to do this we must order the variables so that the parents of A1 are a subset of {A2, A3, A4, …, An}, the parents of A2 are a subset of {A3, A4, A5, …, An}, etc.
[Figure: the nodes An, …, A4, A3, A2, A1 in order, with brackets labeled “Pa(A1) is subset of”, “Pa(A2) is subset of”, and “Pa(A3) is subset of”]
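
The ordering condition and the factored product can be sketched in Python. Everything specific below is an assumption made for illustration: the three-node graph, the probability values, and the helper names are made up and do not come from the notes.

```python
from itertools import product

# Hypothetical network: A3 -> A2, and both A2 and A3 -> A1.
parents = {"A1": ("A2", "A3"), "A2": ("A3",), "A3": ()}
order = ["A1", "A2", "A3"]  # each node's parents appear later in this list

# Made-up values of P(node = True | parent values), keyed by the parents' values.
p_true = {
    "A1": {(True, True): 0.9, (True, False): 0.6,
           (False, True): 0.5, (False, False): 0.1},
    "A2": {(True,): 0.7, (False,): 0.2},
    "A3": {(): 0.4},
}

def ordering_is_valid(order, parents):
    """Check that Pa(A_k) is a subset of the nodes listed after A_k."""
    return all(set(parents[node]) <= set(order[i + 1:])
               for i, node in enumerate(order))

def joint(assign):
    """P(A1, ..., An) computed as the product of P(A_k | Pa(A_k))."""
    result = 1.0
    for node in order:
        pt = p_true[node][tuple(assign[p] for p in parents[node])]
        result *= pt if assign[node] else 1.0 - pt
    return result

# Sanity check: the factored joint sums to 1 over all assignments.
total = sum(joint(dict(zip(order, values)))
            for values in product([True, False], repeat=len(order)))
```

In this sketch the factored form needs 4 + 2 + 1 = 7 numbers, the same as the 2^3 - 1 free entries of a full joint table over three variables, because the made-up graph is fully connected. The saving appears when nodes have few parents.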

Bayes Nets (Bayesian Networks)
Using a graph and several small conditional probability tables, we can create a Bayesian network.
[Figure: example network over the nodes A1, A2, A3, …, An]
For nodes without parents, we need individual probability values.
For all other nodes, we use these small tables, one per node, giving its probability for each combination of its parents' values.

Question: How do we write something like P(A2, A5 | A1, A7)?
Solution 1
The simplest way is to use the formula: P(B|A) = P(A,B)/P(A)
That would give us:
P(A2, A5 | A1, A7) = P(A1, A2, A5, A7) / P(A1, A7)
= Σ_{3,4,6,8,…,n} P(A1, A2, A3, …, An) / Σ_{2,3,4,5,6,8,…,n} P(A1, A2, A3, …, An)
Drawback: the number of terms in the summations can be very large.
In some cases, this method is the best you can do.

A2  A3  P(A1 | A2, A3)
T   T
T   F
F   T
F   F
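
Solution 1 can be sketched as brute-force summation over a full joint table. The example below assumes the small a, b, c table from earlier in these notes rather than a network over A1, …, An, and the helper names `sum_out` and `conditional` are illustrative only.

```python
NAMES = ("a", "b", "c")

# Joint P(a, b, c) from the table earlier in the notes.
joint = {
    (True, True, True): 0.1,
    (True, True, False): 0.2,
    (True, False, True): 0.1,
    (True, False, False): 0.3,
    (False, True, True): 0.05,
    (False, True, False): 0.05,
    (False, False, True): 0.2,
    (False, False, False): 0.0,
}

def sum_out(fixed):
    """Sum the joint over every variable not mentioned in `fixed`.

    Returns the sum and the number of terms that were added.
    """
    terms = [p for row, p in joint.items()
             if all(row[NAMES.index(n)] == v for n, v in fixed.items())]
    return sum(terms), len(terms)

def conditional(query, evidence):
    """P(query | evidence) = P(query, evidence) / P(evidence)."""
    numerator, n_num = sum_out({**query, **evidence})
    denominator, n_den = sum_out(evidence)
    return numerator / denominator, n_num + n_den

# P(b = T | a = T), plus the number of joint entries that had to be added up.
p, n_terms = conditional({"b": True}, {"a": True})
```

With n binary variables and k of them fixed, each sum has 2^(n-k) terms, which is the drawback noted above: here the numerator needs 2 terms and the denominator 4, but the count doubles with every additional unfixed variable.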