Understanding Conditional Probability: Definition, Rules, and Applications - Prof. Dilip S, Study notes of Statistics

These notes cover conditional probability: its definition, its consistency with various models of probability, the axioms and rules it satisfies, and its applications. Topics include the chain rule (product rule), the theorem of total probability, and examples such as the birthday surprise problem. The notes also discuss the importance of conditional probabilities in probabilistic analyses.


Uploaded on 02/24/2010

ECE 313 — Probability with Engineering Applications Fall 2000
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign 13.1
ECE 313 - Lecture 13 © 2000 Dilip V. Sarwate, University of Illinois at Urbana-Champaign, All Rights Reserved Slide 1 of 39
Introduction
- The conditional probability of an event B given that event A occurred is our revised estimate of the chances that B occurred in light of partial knowledge of the outcome of the experiment, viz. knowing that A occurred
- To avoid trivialities, we assume that A, sometimes called the conditioning event, has nonzero probability
Definition of conditional probability
- The conditional probability of B given A is denoted by P(B|A)
- Read this as “the probability of B given A” or “the probability of B conditioned on A”
- Definition: If P(A) > 0, P(B|A) is defined as P(B|A) = P(AB)/P(A)
- P(B|A) can be larger than, smaller than, or the same as P(B)
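
The definition can be tried out on a small sample space. The sketch below assumes two fair six-sided dice with 36 equally likely outcomes; the dice, the helper names `prob` and `cond_prob`, and the chosen events are illustrative assumptions, not part of the lecture. It shows one case each where P(B|A) is larger than, smaller than, and equal to P(B).

```python
from fractions import Fraction
from itertools import product

# Assumed illustration (not from the lecture): two fair six-sided dice.
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) for a predicate on outcomes, with equally likely outcomes."""
    favorable = sum(1 for w in OMEGA if event(w))
    return Fraction(favorable, len(OMEGA))

def cond_prob(b, a):
    """P(B|A) = P(AB)/P(A); only defined when P(A) > 0."""
    p_a = prob(a)
    if p_a == 0:
        raise ValueError("conditioning event has zero probability")
    return prob(lambda w: a(w) and b(w)) / p_a

def sum_is_8(w):
    return w[0] + w[1] == 8

def sum_is_7(w):
    return w[0] + w[1] == 7

def first_is_6(w):
    return w[0] == 6

def first_is_1(w):
    return w[0] == 1

print("P(sum=8)           =", prob(sum_is_8))
print("P(sum=8 | first=6) =", cond_prob(sum_is_8, first_is_6))  # larger
print("P(sum=8 | first=1) =", cond_prob(sum_is_8, first_is_1))  # smaller
print("P(sum=7)           =", prob(sum_is_7))
print("P(sum=7 | first=6) =", cond_prob(sum_is_7, first_is_6))  # the same
```

Exact fractions are used so that the three comparisons are not blurred by floating-point rounding.
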
Consistent with various models
- The definition of conditional probability is consistent with
  - the classical approach to probability
  - the relative frequency approach
- Conditional probabilities can also be discussed for events defined in terms of random variables
- P{X = k | X > n}? or P{X ≤ k | a < X < b}?
Geometric RVs are memoryless
- Let X denote a geometric random variable with parameter p
- For k > 0, P{X = k+r | X > r} = P{X = k}
- Given that the event {X > r} has occurred, that is, the first r trials ended in a “failure”, the probability that we need to wait for an additional k trials to observe the first success is the same as P{X = k}
- It’s as if the first r trials are forgotten!
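
The memoryless property can be checked numerically. This is a minimal sketch assuming X counts the trial on which the first success occurs, so that P{X = k} = (1 – p)^(k–1) p for k = 1, 2, … and P{X > r} = (1 – p)^r; the function names and the value p = 1/3 are hypothetical choices for illustration.

```python
from fractions import Fraction

def geom_pmf(k, p):
    """P{X = k}: first success occurs on trial k (k = 1, 2, ...)."""
    return (1 - p) ** (k - 1) * p

def geom_tail(r, p):
    """P{X > r}: the first r trials are all failures."""
    return (1 - p) ** r

def cond_pmf(k, r, p):
    """P{X = k+r | X > r}; for k > 0 the event {X = k+r} lies inside {X > r}."""
    return geom_pmf(k + r, p) / geom_tail(r, p)

p = Fraction(1, 3)  # assumed parameter value, chosen only for the demo
for r in (0, 1, 5):
    for k in (1, 2, 7):
        assert cond_pmf(k, r, p) == geom_pmf(k, p)
print("memoryless property holds for the sampled (k, r) pairs")
```
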
Binomial random variables
- Let X denote a binomial random variable with parameters (n, p)
- Given that the event {X = k} has occurred, the conditional probability that the j-th trial resulted in a success is k/n, independent of the value of p
- The conditional probability of successes on both the i-th and j-th trials (i ≠ j) is k(k–1)/[n(n–1)]
- and so on
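
A brute-force check of these two claims is sketched below. It enumerates all 2^n success/failure sequences, weights each by p^s (1 – p)^(n–s), and conditions on exactly k successes. The function name, the 0-based trial indices, and the values n = 5, k = 2 are assumptions made for the illustration; p must lie strictly between 0 and 1 so that P{X = k} > 0.

```python
from fractions import Fraction
from itertools import product

def conditional_success_prob(n, k, p, trials):
    """P(every trial listed in `trials` is a success | X = k), trials 0-indexed."""
    numerator = Fraction(0)
    denominator = Fraction(0)
    for outcome in product((0, 1), repeat=n):
        successes = sum(outcome)
        if successes != k:
            continue  # outside the conditioning event {X = k}
        weight = p ** successes * (1 - p) ** (n - successes)
        denominator += weight
        if all(outcome[t] == 1 for t in trials):
            numerator += weight
    return numerator / denominator

n, k = 5, 2
for p in (Fraction(3, 10), Fraction(7, 10)):
    single = conditional_success_prob(n, k, p, trials=[2])
    pair = conditional_success_prob(n, k, p, trials=[0, 3])
    print(f"p = {p}: one trial -> {single}, two trials -> {pair}")
```

Both values of p give the same conditional probabilities, which is the sense in which the result does not depend on p.
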
Axioms are satisfied
- Conditional probabilities are a probability measure, that is, they satisfy the axioms of probability theory
- All the consequences of the axioms (rules of probability) also apply to conditional probabilities
- Caveat: Everything must be conditioned on the same event. No mixing and matching allowed

Rules? What rules?

- P(Ω|A) = 1
- P(∅|A) = 0
- P(B^c|A) = 1 – P(B|A)
- If B ⊂ C, then P(B|A) ≤ P(C|A)
- If BC = ∅, then P((B ∪ C)|A) = P(B|A) + P(C|A)
- More generally, P((B ∪ C)|A) = P(B|A) + P(C|A) – P(BC|A)
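
These rules can be spot-checked by counting. The sketch below assumes a sample space of two fair dice and three arbitrarily chosen events A, B, C; none of this is from the lecture. Every probability is conditioned on the same event A, in keeping with the caveat that conditioning events must not be mixed.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space for the demo: two fair six-sided dice.
OMEGA = frozenset(product(range(1, 7), repeat=2))

def P(event, given=OMEGA):
    """P(event | given) by counting equally likely outcomes."""
    if not given:
        raise ValueError("conditioning event has zero probability")
    return Fraction(len(event & given), len(given))

A = {w for w in OMEGA if w[0] + w[1] >= 7}   # conditioning event
B = {w for w in OMEGA if w[0] % 2 == 0}      # first die is even
C = {w for w in OMEGA if w[1] == 6}          # second die shows 6

assert P(OMEGA, A) == 1
assert P(set(), A) == 0
assert P(OMEGA - B, A) == 1 - P(B, A)               # complement rule
assert P(B & C, A) <= P(B, A)                       # BC is a subset of B
assert P(B | C, A) == P(B, A) + P(C, A) - P(B & C, A)
print("all conditional-probability rules hold for this example")
```
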

Left side versus right side

- An expression such as P((B ∪ C)|(A ∪ D)) is commonly written as P(B ∪ C|A ∪ D)
- Everything to the right of the vertical bar is the conditioning event; it is a single set
- Everything to the left of the vertical bar is the conditioned event; it is a single set
- Even if A, B, C, and D are disjoint, P(B ∪ C|A ∪ D) ≠ P(B) + P(C|A) + P(D)

Is that all there is to it?

- OK, so you can update your probabilities to conditional probabilities if you know that event A occurred
  - Is that all there is to it?
  - Is the notion of conditional probability just a one-trick pony?
  - Surely life holds more than that?
- Actually, conditional probabilities are fundamental tools in probabilistic analyses

The chain rule or product rule

- P(B|A) = P(AB)/P(A)
- P(AB) = P(B|A)P(A)
- Note that P(AB) can also be expressed as P(A|B)P(B)
- The conditional probability P(B|A) can be used to compute the joint probability P(AB)
- The joint probability P(AB) is the conditional probability P(B|A) times P(A), the probability of the conditioning event

Generalization of the chain rule

- More generally, P(ABCD…) = P(A)P(B|A)P(C|AB)P(D|ABC)…
- Product of first two terms is P(AB)
- P(C|AB)P(AB) = P(ABC), so that the product of the first three terms is P(ABC), and so on …
- For ABCD… to occur, A must occur, and if A has occurred, so must B (with probability P(B|A)); if both A and B, then C must …

Applications of the chain rule

- Example: A random sample of size k is drawn without replacement from the set {1, 2, … , n}. What is the probability that the sample is exactly {1, 2, 3, … , k–1, n}?
- Simple answer: There are $\binom{n}{k}$ equally likely subsets that could have been drawn, and so the desired probability is just $1/\binom{n}{k}$
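
A minimal sketch of how the chain rule could reach the same answer, offered as an assumption about the approach rather than the lecture's own derivation: let the i-th event be "the i-th draw lands on a not-yet-drawn member of the target set". Given that the first i–1 draws succeeded, k–i+1 target elements remain among n–i+1 remaining elements, so the product is (k/n)·((k–1)/(n–1))·…·(1/(n–k+1)). The function names below are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

def prob_by_chain_rule(n, k):
    """Multiply the conditional probabilities that each draw hits the target set."""
    result = Fraction(1)
    for i in range(k):
        # After i successful draws: k - i targets left among n - i elements.
        result *= Fraction(k - i, n - i)
    return result

def prob_by_enumeration(n, k):
    """Count ordered draws without replacement whose set equals the target."""
    target = set(range(1, k)) | {n}          # {1, 2, ..., k-1, n}
    draws = list(permutations(range(1, n + 1), k))
    hits = sum(1 for d in draws if set(d) == target)
    return Fraction(hits, len(draws))

for n, k in [(5, 2), (6, 3), (7, 4)]:
    assert prob_by_chain_rule(n, k) == Fraction(1, comb(n, k))
    assert prob_by_chain_rule(n, k) == prob_by_enumeration(n, k)
print("chain rule agrees with 1/C(n, k) and with direct enumeration")
```
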

Further generalization of the chain rule

- P(ABCD…) = P(A)P(B|A)P(C|AB)P(D|ABC)…
- Every probability result also applies to conditional probabilities
- The chain rule applies to computation of conditional probabilities by conditioning everything on the given event H (say)
- P(ABCD…|H) = P(A|H)P(B|AH)P(C|ABH)P(D|ABCH)…

It’s not just for breakfast any more!

- P(AB) + P(AB^c) = P(A)
- P(AB) = P(A|B)P(B)
- P(AB^c) = P(A|B^c)P(B^c)
- Hence, P(A) = P(A|B)P(B) + P(A|B^c)P(B^c) and P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
- These formulas are totally unlike the ones seen previously

It’s not ‘the same thing, only different…’

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)

P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)

- These formulas are totally unlike the ones seen previously
- On the right side, we have probabilities conditioned on different events
- Previously, we were conditioning on the same event throughout

…it’s something much much more!

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)

- B and B^c cannot occur simultaneously on the same trial
- To find P(A), first imagine that B occurred
- From P(A|B), we can determine P(AB) = P(A|B)P(B)
- Next imagine that B^c occurred
- From P(A|B^c), we can determine P(AB^c) = P(A|B^c)P(B^c)
- The sum of these two numbers is P(A)!

Oatmeal or haute cuisine?

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)

- We knew how to obtain conditional probabilities from “regular” probabilities
- P(A|B) = P(AB)/P(B)
- New result allows us to find unconditional probabilities from conditional probabilities
- It is a fundamentally important result
- It is also very simple (uses horse sense)

The theorem of total probability

P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)

- This fundamental result is called the theorem of total probability
- The probability of the event A is the weighted average of the probabilities of A conditioned on B and on B^c
- In the Ross textbook, this result is Eq. (3.1) on page 72

Applications

- Example: Box I has 3 green and 2 red balls, while Box II has 2 green and 2 red balls. A ball is drawn at random from Box I and transferred to Box II. Then, a ball is drawn at random from Box II. What is the probability that the ball drawn from Box II is green?
- Note that the color of the ball transferred from Box I to Box II is not known

Example (continued)

- The color of the ball transferred is not known, but it’s either green or red for sure!

[Figure: Box I and Box II]

Example (continued)

- Box I has 3g, 2r; Box II has 2g, 2r
- After the transfer, Box II has 5 balls in it
- G = event ball drawn from Box II is green
- A = event ball transferred is red
- P(G|A) = 2/5
- P(G|A^c) = 3/5
- P(A) = 2/5
- P(G) = P(G|A)P(A) + P(G|A^c)P(A^c) = (2/5)(2/5) + (3/5)(3/5) = 13/25
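
The two-box calculation can be reproduced with exact fractions. The sketch below computes P(G) once with the theorem of total probability and once by listing the 25 equally likely (transferred ball, drawn ball) pairs; the variable names and the list representation of the boxes are illustrative assumptions.

```python
from fractions import Fraction

box1 = ["g"] * 3 + ["r"] * 2   # Box I: 3 green, 2 red
box2 = ["g"] * 2 + ["r"] * 2   # Box II: 2 green, 2 red

# Theorem of total probability, conditioning on the transferred ball's color.
p_A = Fraction(box1.count("r"), len(box1))                    # A: transferred ball is red
p_G_given_A = Fraction(box2.count("g"), len(box2) + 1)        # Box II gains a red ball
p_G_given_Ac = Fraction(box2.count("g") + 1, len(box2) + 1)   # Box II gains a green ball
p_G = p_G_given_A * p_A + p_G_given_Ac * (1 - p_A)

# Cross-check: every (transferred, drawn) pair is equally likely.
pairs = [(t, d) for t in box1 for d in box2 + [t]]
p_G_enum = Fraction(sum(1 for _, d in pairs if d == "g"), len(pairs))

print("P(G) by total probability:", p_G)
print("P(G) by enumeration:      ", p_G_enum)
```
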

A built-in test for checking answers

- The probability of event A is the weighted average of P(A|B) and P(A|B^c)
- P(A) = P(A|B)P(B) + P(A|B^c)P(B^c) = P(A|B)P(B) + P(A|B^c)[1 – P(B)]
- The linear function y = a•x + b•(1 – x) has value b at x = 0 and a at x = 1
- For 0 < x < 1, y is between a and b
- P(A) is between P(A|B) and P(A|B^c)

Example (checking our work)

- P(G|A) = 2/5
- P(G|A^c) = 3/5
- P(G) = P(G|A)P(A) + P(G|A^c)P(A^c)
- P(G|A) = 2/5 ≤ P(G) = 13/25 ≤ P(G|A^c) = 3/5
- If the check is satisfied, it does not imply that your work is right; there may be other mistakes, e.g. you computed P(G) = 12/25
- But, if the check is not satisfied, …
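
The check is easy to automate. This is a sketch under the assumption that the two conditional probabilities are already known; the function name and the sample "wrong" answers are hypothetical. A value that fails the check is certainly wrong, while a value that passes may still be wrong.

```python
from fractions import Fraction

def passes_weighted_average_check(p_a, p_a_given_b, p_a_given_bc):
    """True if p_a lies between the two conditional probabilities (inclusive)."""
    low, high = sorted((p_a_given_b, p_a_given_bc))
    return low <= p_a <= high

p_G_given_A = Fraction(2, 5)
p_G_given_Ac = Fraction(3, 5)

# Correct answer from the two-box example: passes.
print(passes_weighted_average_check(Fraction(13, 25), p_G_given_A, p_G_given_Ac))
# Hypothetical wrong answer that still passes: the check is necessary, not sufficient.
print(passes_weighted_average_check(Fraction(1, 2), p_G_given_A, p_G_given_Ac))
# Hypothetical wrong answer that fails: a mistake is guaranteed.
print(passes_weighted_average_check(Fraction(16, 25), p_G_given_A, p_G_given_Ac))
```
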

Generalizations of the theorem I

- P(A) = P(A|B)P(B) + P(A|B^c)P(B^c)
- Since conditional probabilities form a probability measure, a similar result also holds for conditional probabilities
- P(A|C) = P(A|BC)P(B|C) + P(A|B^c C)P(B^c|C)
- All probabilities in the first equation are now conditioned on C (in addition to any previously existing conditioning)
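
The conditioned form of the theorem can also be verified directly from the definition of conditional probability. The derivation below is a sketch that assumes P(BC) > 0 and P(B^c C) > 0, so that every conditional probability in it is defined.

```latex
\begin{align*}
P(A \mid C)
  &= \frac{P(AC)}{P(C)}
   = \frac{P(ABC) + P(AB^{c}C)}{P(C)} \\
  &= \frac{P(A \mid BC)\,P(BC)}{P(C)}
   + \frac{P(A \mid B^{c}C)\,P(B^{c}C)}{P(C)} \\
  &= P(A \mid BC)\,P(B \mid C) + P(A \mid B^{c}C)\,P(B^{c} \mid C)
\end{align*}
```

The first line splits AC into the disjoint pieces ABC and AB^c C, the second applies the product rule to each piece, and the third uses P(BC)/P(C) = P(B|C) and P(B^c C)/P(C) = P(B^c|C).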

Another Example

- You and a friend (also taking ECE 313) are at a party with N–1 other people when suddenly a conga line forms. Assume that all (N+1)! orderings are equally likely
- What is the probability that your friend is ahead of you in the conga line?
- Answer: 1/2 (by symmetry)
- If there were a different (correct) answer, then by symmetry you would be ahead of your friend with the same probability ≠ 1/2, and the two probabilities would not sum to 1

Do it by the theorem…

- Both you and your friend are equally likely to be anywhere in the conga line
- P(you are in j-th position) = 1/(N + 1)
- P(friend ahead|you in j-th) = (j – 1)/N
- Why j–1? Why N and not N+1?
- P(friend ahead) = sum over j = 1, …, N+1 of [(j–1)/N]•[1/(N+1)] = [0 + 1 + … + N]/[N•(N + 1)] = 1/2
- 1 + 2 + … + N = N•(N + 1)/2 !!!!
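
Both routes to the answer can be checked for small parties. The sketch below evaluates the total-probability sum exactly and, as a cross-check, enumerates all (N+1)! orderings; labeling "you" as person 0 and "friend" as person 1 is an assumption made for the illustration, and N must be at least 1.

```python
from fractions import Fraction
from itertools import permutations

def friend_ahead_by_theorem(N):
    """Sum over your position j of P(friend ahead | you in j-th) * P(you in j-th)."""
    people = N + 1
    return sum(
        Fraction(j - 1, N) * Fraction(1, people)
        for j in range(1, people + 1)
    )

def friend_ahead_by_enumeration(N):
    """Fraction of equally likely orderings in which person 1 precedes person 0."""
    orderings = list(permutations(range(N + 1)))
    favorable = sum(1 for line in orderings if line.index(1) < line.index(0))
    return Fraction(favorable, len(orderings))

for N in range(1, 6):
    assert friend_ahead_by_theorem(N) == Fraction(1, 2)
    assert friend_ahead_by_enumeration(N) == Fraction(1, 2)
print("P(friend ahead) = 1/2 for N = 1, ..., 5")
```
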

Summary

- The chain rule or product rule allows us to compute a joint probability (i.e. probability of an intersection) as the product of various conditional probabilities
- The theorem of total probability allows us to find an unconditional probability from conditional probabilities
- We discussed some examples of the applications of these rules