





Weekly notes for EE2012 2013/14 covering conditional probability and independence in probability theory: the concept of conditional probability, how it is calculated, and its relationship with the sample space, as well as independence between two or more events and its implications. Examples and formulas illustrate the concepts.
Typology: Lecture notes






Book sections covered this week: 2.4 – 2.6.2, 2.6.4.
A central application of probability theory is to answer questions of the form “If event B occurs, how does that affect the probability of A occurring?” For instance,
Conditional probability captures this change in the probability of event A after finding out that B occurred in an experiment. Almost always, it is not that B actually occurred, but that we hypothesize that it did. In other words, we ask: “If B occurs, what happens to our belief that A occurred in the same experiment?” It is as if we are given a clue about the outcome that does not pinpoint it exactly, but only tells us that it belongs to a certain set B.
Example 1: Suppose we have the sample space S = { 1 , 2 , 3 , 4 , 5 , 6 }, where the elementary events are equi-probable. Let
$$A = \{2, 4, 6\}, \qquad (1)$$
$$B = \{4, 5, 6\}, \qquad (2)$$
$$C = \{1, 2\}. \qquad (3)$$
If we assume or know that B occurred, then the sample space effectively shrinks to B, because the outcome must come from B. The so-called conditional probability of A given B, denoted P(A|B), must then be
$$P(A|B) = \frac{|A \cap B|}{|B|} = \frac{|\{4, 6\}|}{|\{4, 5, 6\}|} = \frac{2}{3}.$$
Before assuming or knowing that B occurred, we had P(A) = 1/2, so the information that B occurred changed the probability of A. To compute another conditional probability, say P(C|A), we likewise count the elements of A ∩ C and divide by the cardinality of A, which is now the sample space. Therefore
$$P(C|A) = \frac{|A \cap C|}{|A|} = \frac{|\{2\}|}{|\{2, 4, 6\}|} = \frac{1}{3},$$
which happens to be equal to P (C), the original probability of C. So we see that information that an event occurs does not always have an impact on the probability of another event occurring. This is another critical concept called “independence” that we will introduce in the next section.
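For a finite sample space with equiprobable outcomes, the counting rule used in Example 1, P(A|B) = |A ∩ B| / |B|, is easy to check mechanically. The Python sketch below reproduces the numbers of Example 1 with exact fractions; the helper names `cond_prob_equiprobable` and `prob_equiprobable` are our own illustrative choices, not from the notes.

```python
from fractions import Fraction

def cond_prob_equiprobable(a: set, b: set) -> Fraction:
    """P(A|B) on an equiprobable finite sample space: |A ∩ B| / |B|."""
    if not b:
        raise ValueError("the conditioning event must be non-empty")
    return Fraction(len(a & b), len(b))

def prob_equiprobable(a: set, s: set) -> Fraction:
    """Unconditional probability P(A) = |A| / |S|."""
    return Fraction(len(a), len(s))

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {4, 5, 6}
C = {1, 2}

p_a_given_b = cond_prob_equiprobable(A, B)  # B acts as the new sample space
p_c_given_a = cond_prob_equiprobable(C, A)  # A acts as the new sample space

print("P(A)   =", prob_equiprobable(A, S))  # 1/2
print("P(A|B) =", p_a_given_b)              # 2/3
print("P(C)   =", prob_equiprobable(C, S))  # 1/3
print("P(C|A) =", p_c_given_a)              # 1/3
```

Using `Fraction` rather than floats keeps the equality P(C|A) = P(C) exact instead of approximate.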
In general, for repeatable experiments, the relative frequency of A conditioned on B is
$$f_{A|B}(n) = \frac{n_{A \cap B}}{n_B} = \frac{n_{A \cap B}/n}{n_B/n} = \frac{f_{A \cap B}(n)}{f_B(n)}.$$
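By extension, for non-repeatable experiments, we define the conditional probability of A given B, for P(B) > 0, as
$$P(A|B) = \frac{P(A \cap B)}{P(B)}. \qquad (5)$$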
From (5), we have the important expressions
$$P(A \cap B) = P(A|B)P(B) = P(B|A)P(A). \qquad (6)$$
These are so useful and important in probabilistic applications that we must commit them to memory.
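The relative-frequency view and the definition P(A|B) = P(A ∩ B)/P(B) can be compared numerically by simulating the fair-die experiment of Example 1. This is a sketch under the assumption that Python's `random` module is an adequate model of a fair die; the seed and the number of rolls are arbitrary choices. It estimates f_{A|B}(n) = n_{A∩B}/n_B, compares it with the exact value 2/3, and checks the multiplication rule P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A) exactly.

```python
import random
from fractions import Fraction

S = [1, 2, 3, 4, 5, 6]
A = {2, 4, 6}
B = {4, 5, 6}

def relative_freq_a_given_b(n: int, seed: int = 2012) -> float:
    """Estimate f_{A|B}(n) = n_{A∩B} / n_B from n simulated die rolls."""
    rng = random.Random(seed)
    n_b = 0   # rolls in which B occurred
    n_ab = 0  # rolls in which both A and B occurred
    for _ in range(n):
        outcome = rng.choice(S)
        if outcome in B:
            n_b += 1
            if outcome in A:
                n_ab += 1
    if n_b == 0:
        raise ValueError("B never occurred; increase n")
    return n_ab / n_b

# Exact probabilities on the equiprobable sample space
p_a = Fraction(len(A), len(S))
p_b = Fraction(len(B), len(S))
p_ab = Fraction(len(A & B), len(S))
p_a_given_b = p_ab / p_b   # conditional probability P(A|B)
p_b_given_a = p_ab / p_a   # conditional probability P(B|A)

# Multiplication rule: both factorizations recover P(A ∩ B)
assert p_a_given_b * p_b == p_b_given_a * p_a == p_ab

estimate = relative_freq_a_given_b(100_000)
print(f"f_A|B(n) = {estimate:.4f}, P(A|B) = {float(p_a_given_b):.4f}")
```

With about 50,000 rolls landing in B, the estimate typically agrees with 2/3 to two or three decimal places.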
As in the previous section, let $B_1, \ldots, B_n$ partition S. Suppose we are interested only in $P(B_j|A)$ but instead know $P(A|B_k)$, $k = 1, \ldots, n$. Bayes’ Rule, derived almost trivially from (5), (6) and (10), says that
$$P(B_j|A) = \frac{P(A|B_j)P(B_j)}{\sum_{k=1}^{n} P(A|B_k)P(B_k)}.$$
This little equation is one of the most important results in engineering applications of probability, as illustrated in the next example.
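Before the example, a small exact check may help. The Python sketch below implements Bayes’ Rule for a finite partition (the function name `bayes_posterior` is hypothetical) and verifies it on the fair die of Example 1, using the partition B1 = {1, 2, 3}, B2 = {4, 5, 6} and A = {2, 4, 6}; the posteriors must match direct counting, P(B_j|A) = |A ∩ B_j| / |A|.

```python
from fractions import Fraction

def bayes_posterior(likelihoods, priors):
    """Return [P(B_j|A) for each j] from P(A|B_k) and P(B_k) over a partition.

    The denominator is the total probability P(A) = sum_k P(A|B_k) P(B_k).
    """
    if len(likelihoods) != len(priors):
        raise ValueError("need one likelihood per partition element")
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]  # P(A|B_k) P(B_k)
    p_a = sum(joint)
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return [j / p_a for j in joint]

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
partition = [{1, 2, 3}, {4, 5, 6}]

priors = [Fraction(len(b), len(S)) for b in partition]          # P(B_k)
likelihoods = [Fraction(len(A & b), len(b)) for b in partition]  # P(A|B_k)
posteriors = bayes_posterior(likelihoods, priors)

# Direct counting on the reduced sample space A
direct = [Fraction(len(A & b), len(A)) for b in partition]
print(posteriors)  # [Fraction(1, 3), Fraction(2, 3)]
assert posteriors == direct
```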
Example 3: This is an example from the medical world. Suppose there is a rare incurable disease that strikes 1 in 10^5 people on average. A test for the disease is 98 percent accurate, meaning that the probability of a false positive is 0.02, and the probability of a false negative is 0.02. If a patient tests positive, what is the probability that he has the disease? At first glance, given that the test seems pretty accurate, the patient would appear to be doomed. But let’s be more careful. Let A = “Patient has the disease” and B = “Test is positive”. What we want is P (A|B), but what we have are
$$P(B|A) = 0.98 \quad\text{and}\quad P(B|A^c) = 0.02. \qquad (14)$$
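We also know that $P(A) = 10^{-5}$, and so $P(A^c) = 1 - 10^{-5}$. Applying Bayes’ Rule with the partition {A, A^c}, we get
$$P(A|B) = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|A^c)P(A^c)} = \frac{0.98 \times 10^{-5}}{0.98 \times 10^{-5} + 0.02\,(1 - 10^{-5})} \approx 4.9 \times 10^{-4}.$$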
This is rather surprising! A test that errs only 2 percent of the time turns out to be nearly useless for predicting the presence of the disease: a positive result raises the probability of having it only to about 0.05 percent. The reason is that the false-alarm rate of the test (0.02) is much higher than the prior probability of having the disease ($10^{-5}$). Therefore, a good test for a very rare disease must be much more accurate than one for a common disease.
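The arithmetic of Example 3 can be sketched in a few lines of Python. The function `posterior_positive` is a hypothetical helper that applies Bayes’ Rule to the two-set partition {A, A^c}. The last two lines use hypothetical variations, not taken from the notes, to illustrate the closing remark: the same test applied to a more common disease, and a test with a far smaller false-positive rate applied to the same rare disease.

```python
def posterior_positive(prevalence: float, sensitivity: float, false_positive: float) -> float:
    """P(disease | positive test) via Bayes' Rule with the partition {A, A^c}."""
    true_pos = sensitivity * prevalence              # P(B|A) P(A)
    false_pos = false_positive * (1.0 - prevalence)  # P(B|A^c) P(A^c)
    return true_pos / (true_pos + false_pos)

# Numbers from Example 3: P(A) = 1e-5, P(B|A) = 0.98, P(B|A^c) = 0.02
p = posterior_positive(1e-5, 0.98, 0.02)
print(f"P(A|B) = {p:.2e}")  # about 4.90e-04

# Hypothetical variations (not from the notes)
print(f"prevalence 1e-2:          {posterior_positive(1e-2, 0.98, 0.02):.3f}")
print(f"false-positive rate 1e-7: {posterior_positive(1e-5, 0.98, 1e-7):.3f}")
```

The false-positive term dominates the denominator whenever P(B|A^c) is large compared with P(A), which is exactly the situation described above.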
2 Independence of Events
Independence between two events is a concept that is easily understood:
On the other hand, some events are clearly not independent:
You should be able to see that the concept of independence is closely tied to the notion of conditional probability introduced in the last section. To be precise, two events A and B with P(B) > 0 are said to be independent if and only if
$$P(A|B) = P(A) \qquad (16)$$
and, equivalently, P (B|A) = P (B). So knowing that A occurred does not change our belief (or certainty) that B occurred, and vice versa. Equation (16) also means that
$$P(A \cap B) = P(A)P(B), \qquad (17)$$
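which is a necessary and sufficient condition for independence between two events A and B. Unlike (16), it remains meaningful when P(B) = 0, which is why it is often taken as the definition of independence.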
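The product condition P(A ∩ B) = P(A)P(B) can be tested directly on the events of Example 1. This minimal sketch assumes the equiprobable die of that example; the helper names `prob` and `independent` are our own. It confirms that A and C are independent (consistent with P(C|A) = P(C) found earlier), while A and B are not.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}

def prob(event: set) -> Fraction:
    """P(E) for the equiprobable finite sample space S."""
    return Fraction(len(event), len(S))

def independent(a: set, b: set) -> bool:
    """Test the product condition P(A ∩ B) == P(A) P(B)."""
    return prob(a & b) == prob(a) * prob(b)

A = {2, 4, 6}
B = {4, 5, 6}
C = {1, 2}

print(independent(A, C))  # True:  P(A ∩ C) = 1/6 = (1/2)(1/3)
print(independent(A, B))  # False: P(A ∩ B) = 1/3, but (1/2)(1/2) = 1/4
```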
If we say that A, B and C are independent, what do we mean? It must be that, given, say, B ∩ C, the probability of A is unchanged, i.e.
$$P(A|B \cap C) = P(A). \qquad (18)$$
The events must of course also be pairwise independent, i.e. P(A|B) = P(A), etc. In other words, given the occurrence of one of the other events, or of some combination (union or intersection) of the other two, the probability of A, B or C remains the same. So saying that three events are independent requires not only that they be pairwise independent, but also that
$$P(A|B \cap C) = P(A), \quad\text{i.e.}\quad P(A \cap B \cap C) = P(A)P(B)P(C). \qquad (19)$$
This result generalizes as follows: a collection of n events $A_1, \ldots, A_n$ is independent if and only if
$$P\Big[\bigcap_{i \in I} A_i\Big] = \prod_{i \in I} P[A_i] \qquad (20)$$
for all index sets $I \subseteq \{1, 2, \ldots, n\}$.
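Condition (20) can be checked by brute force over all index sets with at least two elements (sets of size 0 or 1 satisfy it trivially). The sketch below, with the hypothetical helper `mutually_independent`, applies it to a standard two-coin example that is not from these notes: A = “first toss is heads”, B = “second toss is heads”, C = “both tosses agree”. Every pair passes, yet the triple fails, which illustrates why (19) is required in addition to pairwise independence.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

# Two fair coin tosses; the four outcomes ('H','H'), ('H','T'), ... are equiprobable.
S = set(product("HT", repeat=2))

def prob(event: set) -> Fraction:
    return Fraction(len(event), len(S))

def mutually_independent(events: list) -> bool:
    """Check (20): P(∩_{i∈I} A_i) == ∏_{i∈I} P(A_i) for every I with |I| >= 2."""
    n = len(events)
    for size in range(2, n + 1):
        for idx in combinations(range(n), size):
            inter = set.intersection(*(events[i] for i in idx))
            if prob(inter) != prod(prob(events[i]) for i in idx):
                return False
    return True

A = {s for s in S if s[0] == "H"}   # first toss is heads
B = {s for s in S if s[1] == "H"}   # second toss is heads
C = {s for s in S if s[0] == s[1]}  # both tosses agree

print(mutually_independent([A, B]), mutually_independent([A, C]), mutually_independent([B, C]))  # True True True
print(mutually_independent([A, B, C]))  # False: P(A ∩ B ∩ C) = 1/4, not 1/8
```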
A Bernoulli trial is an experiment that has a sample space with two elements. We can call these two possible outcomes success/failure, up/down, head/tail, black/white, or anything else that suits the situation. For concreteness, we will use the success/failure terminology from now on. Suppose the probability of success is p. A very important experiment that models many problems consists of performing a fixed number n of independent and identical Bernoulli trials, i.e. every trial has the same probability of success p. For simplicity of notation, let a “success” be denoted by 1, and a “failure” by 0. If n = 3, we have
$$\begin{aligned}
P[\{000\}] &= (1-p)^3 && (26)\\
P[\{001\}] &= (1-p)^2 p && (27)\\
P[\{110\}] &= (1-p)p^2, \text{ etc.} && (28)
\end{aligned}$$
because of the independence of the Bernoulli sub-experiments. Then the probability of 2 successes is
$$P[\text{“Two successes”}] = P[\{011, 101, 110\}] = 3p^2(1-p) \qquad (29)$$
because each of the outcomes in the event has the same probability $p^2(1-p)$. In general, since the number of ways to obtain k successes in n trials is $\binom{n}{k}$, we have the probability of k successes in n independent identical Bernoulli trials as
$$p_n(k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n. \qquad (30)$$
This is a very important probability law, called the Binomial probability law.
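The step from the individual outcome probabilities (26)–(28) to the binomial law (30) can be checked by enumeration. In this sketch the success probability p = 0.3 is an arbitrary choice and the helper names are our own: it sums the probabilities of all length-3 sequences with exactly two 1s, compares the result with 3p²(1 − p) and with the binomial formula, and confirms that the binomial probabilities for n = 10 sum to 1.

```python
from itertools import product
from math import comb, isclose

def sequence_prob(seq: tuple, p: float) -> float:
    """Probability of one outcome of independent Bernoulli(p) trials (1 = success)."""
    k = sum(seq)
    return p**k * (1 - p) ** (len(seq) - k)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial law: p_n(k) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p = 0.3  # arbitrary success probability for the check
n = 3
outcomes = list(product((0, 1), repeat=n))  # (0,0,0), (0,0,1), ..., (1,1,1)

# Summing over the outcomes with exactly two 1s reproduces 3 p^2 (1 - p).
two_successes = sum(sequence_prob(s, p) for s in outcomes if sum(s) == 2)
assert isclose(two_successes, 3 * p**2 * (1 - p))
assert isclose(two_successes, binomial_pmf(2, n, p))

# The binomial probabilities over k = 0, ..., n add up to 1.
assert isclose(sum(binomial_pmf(k, 10, p) for k in range(11)), 1.0)
```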
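Example 5: A student did not study for a multiple-choice exam with 10 questions and 4 choices in each question. Find the probability that he answers five or more questions correctly if he randomly picks one of the four choices in each question. We can reasonably assume that P[“correct answer in i-th question”] = 0.25, independently for each question. The probability of getting k answers correct is then given by the binomial law as
$$p_{10}(k) = \binom{10}{k} (0.25)^k (0.75)^{10-k}, \qquad k = 0, 1, \ldots, 10. \qquad (31)$$
Therefore the probability of passing, i.e. of answering five or more questions correctly, is
$$\sum_{k=5}^{10} \binom{10}{k} (0.25)^k (0.75)^{10-k} \approx 0.0781. \qquad (32)$$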
This is rather low, and it would of course become even lower if the number of choices per question were increased.
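The sum in Example 5 can be evaluated exactly with rational arithmetic. This sketch assumes, as the example does, independent questions with success probability 1/4 each; the helper name `binomial_pmf` is our own. Keeping p as a `Fraction` gives the exact value, 40961/524288, which is about 0.0781.

```python
from fractions import Fraction
from math import comb

n = 10              # number of questions
p = Fraction(1, 4)  # probability of guessing one question correctly

def binomial_pmf(k: int) -> Fraction:
    """p_10(k) = C(10, k) (1/4)^k (3/4)^(10 - k), computed exactly."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_pass = sum(binomial_pmf(k) for k in range(5, n + 1))  # five or more correct
print(p_pass, float(p_pass))  # 40961/524288, about 0.0781
```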
We are often interested in the number of independent Bernoulli trials needed before encountering the first success, for instance:
Suppose success in the k-th trial is denoted by $A_k$. Then the event “first success at the m-th trial” is equivalent to failing the first m − 1 times and then succeeding on the m-th trial, i.e. $A_1^c A_2^c \cdots A_{m-1}^c A_m$, where we have dispensed with the intersection notation for convenience. But since the trials are independent and identical sub-experiments, we can write
$$P[A_1^c A_2^c \cdots A_{m-1}^c A_m] = P[A_m] \prod_{i=1}^{m-1} P[A_i^c] = p(1-p)^{m-1} \qquad (33)$$
if p is the probability of success in any one of the Bernoulli trials. Hence the probability that the first success in a sequence of Bernoulli trials occurs at the m-th trial is
$$p_m = p(1-p)^{m-1}, \qquad m = 1, 2, \ldots \qquad (34)$$
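This is known as the geometric probability law. The probability that the first success occurs later than the N-th trial is given by a simple expression:
$$\begin{aligned}
P[\text{first success after trial } N] &= \sum_{m=N+1}^{\infty} p(1-p)^{m-1} && (35)\\
&= p \sum_{k=N}^{\infty} (1-p)^k && (36)\\
&= p\,\frac{(1-p)^N}{1-(1-p)} && (37)\\
&= (1-p)^N. && (38)
\end{aligned}$$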
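The geometric law and its tail formula can be sanity-checked numerically. In this sketch, p = 0.2 and N = 5 are arbitrary choices, the helper names are our own, and Python's `random` module is assumed to be an adequate source of Bernoulli trials. It truncates the infinite sum at a large index, compares it with the closed form (1 − p)^N, and then estimates the same tail probability by simulating sequences of trials until the first success.

```python
import random
from math import isclose

def geometric_pmf(m: int, p: float) -> float:
    """Probability that the first success occurs at trial m (m >= 1)."""
    return p * (1 - p) ** (m - 1)

def tail_prob(N: int, p: float) -> float:
    """Closed form: probability that the first success occurs after trial N."""
    return (1 - p) ** N

p, N = 0.2, 5
# Truncating the infinite sum at a large index approximates the closed form closely.
partial = sum(geometric_pmf(m, p) for m in range(N + 1, 2000))
assert isclose(partial, tail_prob(N, p), rel_tol=1e-9)

def first_success(p: float, rng: random.Random) -> int:
    """Run Bernoulli(p) trials until the first success and return its index."""
    m = 1
    while rng.random() >= p:  # rng.random() < p counts as a success
        m += 1
    return m

rng = random.Random(0)
samples = [first_success(p, rng) for _ in range(100_000)]
empirical = sum(s > N for s in samples) / len(samples)
print(f"simulated P[first success after trial {N}] = {empirical:.4f}, (1-p)^N = {tail_prob(N, p):.4f}")
```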
Example: Let $p = 10^{-6}$; we can then answer the question at the top of this sub-section. The event “win after fewer than 20 tries” is {1, 2, ..., 19}, which has