Conditional Probability and Independence in Probability Theory, Lecture notes of Statistics

National University of Singapore Statistics

Weekly notes for EE2012 2013/14, covering conditional probability and independence in probability theory. The notes explain the concept of conditional probability, its calculation, and its relationship with the sample space. They also discuss independence between two or more events and its implications, with examples and formulas to illustrate the concepts.

Typology: Lecture notes

2020/2021

Uploaded on 03/25/2021

mike-kern 🇸🇬

Weekly Notes for EE2012 2013/14 – Week 3

T. J. Lim

February 4, 2014

Book sections covered this week: 2.4 – 2.6.2, 2.6.4.

1 Conditional Probability

1.1 Definition

A central application of probability theory is to answer questions of the form “If event B occurs, how does that affect the probability of A occurring?” For instance,

• A = “Paul committed the crime” and B = “Paul has an alibi”. If at the start we already know that Paul is a serial offender of this sort of crime and that he lives in the area, we may say that P(A) is quite large, say 0.5. But after knowing that he has an alibi, e.g. he was seen at another place by a number of people, i.e. event B occurred, then P(A) becomes much smaller, probably 0.

• A = “Jim will not feel well tomorrow”, and B = “Jim had 5 beers tonight”. P(A) would be very different given that B occurred, compared to the case if B did not occur.

• A = “Waiting time at a clinic is more than 20 minutes” and B = “Doctor has not arrived yet”. Again, P(A) would be significantly affected by the occurrence of event B.

Conditional probability captures this change in probability of event A after finding out that B occurred in an experiment. Almost always, it is not that B actually occurred, but that we hypothesize that it did. In other words, we ask “what if B occurs, then what will happen to our belief that A occurred in the same experiment”? It is as if we are given a clue as to the outcome obtained that does not pinpoint the exact outcome, but only that it belongs in a certain set B.


Example 1: Suppose we have the sample space S = {1, 2, 3, 4, 5, 6}, where the elementary events are equi-probable. Let

$$A = \{2, 4, 6\} \tag{1}$$
$$B = \{4, 5, 6\} \tag{2}$$
$$C = \{1, 2\}. \tag{3}$$

If we assume or know that B occurred, then the sample space effectively shrinks to B because the outcome must come from B. Then the so-called conditional probability of A given B is denoted P(A|B) and must be

$$P(A|B) = \frac{|A \cap B|}{|B|} = \frac{2}{3}.$$

Before assuming or knowing that B occurred, P(A) = 1/2, so the information that B occurred affected the probability of A. To compute another conditional probability, say P(C|A), we would also find the number of elements in A ∩ C, and then divide it by the cardinality of A, which is the new sample space. Therefore

$$P(C|A) = \frac{|A \cap C|}{|A|} = \frac{1}{3},$$

which happens to be equal to P(C), the original probability of C. So we see that information that an event occurs does not always have an impact on the probability of another event occurring. This is another critical concept called “independence” that we will introduce in the next section.
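The counting argument in Example 1 can be reproduced with a few lines of Python. This is only a sketch: the helper names `prob` and `cond_prob` are made up for illustration, and exact fractions are used so the comparison with P(C) is not blurred by rounding.

```python
from fractions import Fraction

# Sample space and events from Example 1 (equi-probable outcomes assumed).
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {4, 5, 6}
C = {1, 2}

def prob(event, space=S):
    """P(event) by counting outcomes in an equi-probable sample space."""
    return Fraction(len(event & space), len(space))

def cond_prob(event, given):
    """P(event | given): the sample space shrinks to `given`."""
    return Fraction(len(event & given), len(given))

p_a_given_b = cond_prob(A, B)   # |A ∩ B| / |B|
p_c_given_a = cond_prob(C, A)   # |A ∩ C| / |A|

print("P(A)   =", prob(A))
print("P(A|B) =", p_a_given_b)
print("P(C)   =", prob(C))
print("P(C|A) =", p_c_given_a)
```

Conditioning on B changes the probability of A, while conditioning on A leaves the probability of C unchanged.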

In general, for repeatable experiments, the relative frequency of A conditioned on B is

$$f_{A|B}(n) = \frac{n_{A \cap B}}{n_B} = \frac{n_{A \cap B}/n}{n_B/n} = \frac{f_{A \cap B}(n)}{f_B(n)}.$$

By extension, for non-repeatable experiments, we can define

$$P(A|B) = \frac{P(A \cap B)}{P(B)}. \tag{5}$$

From (5), we have the important expressions

$$P(A \cap B) = P(A|B)P(B) = P(B|A)P(A). \tag{6}$$

These are so useful and important in probabilistic applications that we must commit them to memory.
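The relative-frequency reading of P(A|B) can be checked with a small simulation. The sketch below assumes a fair six-sided die and reuses the events A = {2, 4, 6} and B = {4, 5, 6} from Example 1; the function name and the fixed seed are illustrative choices, not part of the notes.

```python
import random

def conditional_relative_frequency(n, seed=0):
    """Estimate P(A|B) as n_{A∩B} / n_B over n simulated die rolls."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    A = {2, 4, 6}
    B = {4, 5, 6}
    n_B = 0                        # number of trials in which B occurred
    n_AB = 0                       # number of trials in which both A and B occurred
    for _ in range(n):
        outcome = rng.randint(1, 6)
        if outcome in B:
            n_B += 1
            if outcome in A:
                n_AB += 1
    return n_AB / n_B

estimate = conditional_relative_frequency(100_000)
print(estimate)  # should be close to 2/3 for large n
```

Only the trials in which B occurred contribute to the denominator, which is the "shrunken sample space" idea expressed in terms of counts.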

1.3 Bayes’ Rule

As in the previous section, let B_1, ..., B_n partition S. Suppose we are interested only in P(B_j|A) but instead know P(A|B_k), k = 1, ..., n. Bayes’ Rule, derived almost trivially from (5), (6) and (10), says that

$$P(B_j|A) = \frac{P(A|B_j)P(B_j)}{\sum_{k=1}^{n} P(A|B_k)P(B_k)}.$$

This little equation is one of the most important results in engineering applications of probability, as illustrated in the next example.

Example 3: This is an example from the medical world. Suppose there is a rare incurable disease that strikes 1 in 10^5 people on average. A test for the disease is 98 percent accurate, meaning that the probability of a false positive is 0.02, and the probability of a false negative is 0.02. If a patient tests positive, what is the probability that he has the disease? At first glance, given that the test seems pretty accurate, the patient would appear to be doomed. But let’s be more careful. Let A = “Patient has the disease” and B = “Test is positive”. What we want is P (A|B), but what we have are

$$P(B|A) = 0.98 \quad \text{and} \quad P(B|A^c) = 0.02. \tag{14}$$

We also know that P(A) = 10^-5 and so P(A^c) = 1 − 10^-5. Applying Bayes’ Rule, we get

$$P(A|B) = \frac{0.98 \times 10^{-5}}{0.98 \times 10^{-5} + 0.02 \times (1 - 10^{-5})} = 4.9 \times 10^{-4}. \tag{15}$$

This is rather surprising! A test that fails only 2 percent of the time on average turns out to be quite useless in predicting the presence of the disease. The reason is that the false alarm rate of the test is much higher than the likelihood of having the disease. Therefore, a good test for a very rare disease must be much more accurate than one for a common disease.
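The arithmetic in Example 3 is easy to reproduce. The sketch below applies Bayes' Rule to the two-set partition {A, A^c}; the function name `posterior` and its parameter names are hypothetical, chosen only for this illustration.

```python
def posterior(prior, p_pos_given_disease, p_pos_given_healthy):
    """P(disease | positive test) via Bayes' Rule over the partition {A, A^c}."""
    numerator = p_pos_given_disease * prior
    denominator = numerator + p_pos_given_healthy * (1.0 - prior)
    return numerator / denominator

# Numbers from Example 3: prevalence 1e-5, P(B|A) = 0.98, P(B|A^c) = 0.02.
p_disease_given_positive = posterior(1e-5, 0.98, 0.02)
print(f"{p_disease_given_positive:.2e}")

# Lowering the false positive rate well below the prevalence changes the picture.
print(posterior(1e-5, 0.98, 1e-6))
```

The second call illustrates the closing remark of the example: the posterior only becomes large once the false alarm rate is small compared with the prevalence of the disease.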

2 Independence of Events

2.1 Concept and Definition

Independence between two events is a concept that is easily understood:

• Whether it’s cloudy today has no bearing on how many people are in class tomorrow. “Cloudy today” and “More than 20 people in class tomorrow” are independent events.

• You flip a coin, and then roll a die. The outcome of the coin flip should have no influence on the outcome of the die roll. Therefore, “Coin turns up Heads” and “Top face of die is 5” are independent events.

On the other hand, some events are clearly not independent:

• “Paul’s dog died yesterday” and “Paul is feeling down today” are not independent because knowing the former event occurred, the latter event is more likely.

• You roll two dice and note both faces on top. The events “The total is 3” and “One of the faces is 6” are not independent because knowing the former implies that the latter did not occur.

You should be able to see that the concept of independence is closely tied to the notion of conditional probability just introduced in the last section. To be precise, two events A and B are said to be independent, if and only if

$$P(A|B) = P(A) \tag{16}$$

and, equivalently, P (B|A) = P (B). So knowing that A occurred does not change our belief (or certainty) that B occurred, and vice versa. Equation (16) also means that

$$P(A \cap B) = P(A)P(B) \tag{17}$$

which is a necessary and sufficient condition for independence between two events A and B.
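The product test P(A ∩ B) = P(A)P(B) can be applied mechanically to the coin-and-die and two-dice examples above. The sketch below assumes fair coins and dice, so that every outcome is equally likely; the helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """P(event) by counting, where `event` is a predicate on outcomes."""
    return Fraction(sum(1 for w in space if event(w)), len(space))

def independent(e1, e2, space):
    """True if and only if P(e1 ∩ e2) = P(e1) P(e2)."""
    both = lambda w: e1(w) and e2(w)
    return prob(both, space) == prob(e1, space) * prob(e2, space)

# Flip a coin, then roll a die: 12 equally likely outcomes.
coin_die = list(product("HT", range(1, 7)))
heads = lambda w: w[0] == "H"
five = lambda w: w[1] == 5

# Roll two dice: 36 equally likely outcomes.
two_dice = list(product(range(1, 7), repeat=2))
total_is_3 = lambda w: w[0] + w[1] == 3
has_a_6 = lambda w: 6 in w

print(independent(heads, five, coin_die))         # coin and die example
print(independent(total_is_3, has_a_6, two_dice)) # two-dice example
```

In the two-dice case the intersection is empty, so P(A ∩ B) = 0 while both P(A) and P(B) are positive, and the product test fails.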

2.2 More Than Two Events

If we say that A, B and C are independent, what do we mean? It must be that given, say, B ∩ C, the probability of A is unchanged, i.e.

$$P(A|B \cap C) = P(A). \tag{18}$$

The events must of course also be pairwise independent, i.e. P(A|B) = P(A), etc. In other words, given the occurrence of some combination of the other two events (union or intersection), or of one of the other events, the probability of A, B or C remains the same. So three events are independent only if they are pairwise independent and, in addition,

$$P(A|B \cap C) = P(A), \quad \text{i.e.} \quad P(A \cap B \cap C) = P(A)P(B)P(C), \tag{19}$$

where the second form follows from P(A ∩ B ∩ C) = P(A|B ∩ C)P(B ∩ C) together with the pairwise independence P(B ∩ C) = P(B)P(C).

This result generalizes as follows: A collection of n events A_1, ..., A_n are independent if and only if

$$P\left[\bigcap_{i \in I} A_i\right] = \prod_{i \in I} P[A_i], \tag{20}$$

for all index sets I ⊆ {1, 2, ..., n}.
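Condition (20) has to hold for every index set, not just for pairs. The sketch below checks it by brute force on a standard textbook illustration that is not taken from these notes: two fair coin tosses with A = "first toss is heads", B = "second toss is heads" and C = "both tosses agree". The function name `mutually_independent` is an illustrative choice.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod  # available in Python 3.8+

def mutually_independent(events, space):
    """Check the product rule (20) for every index set with at least two events."""
    for r in range(2, len(events) + 1):
        for subset in combinations(events, r):
            intersection = set(space).intersection(*subset)
            lhs = Fraction(len(intersection), len(space))
            rhs = prod(Fraction(len(e), len(space)) for e in subset)
            if lhs != rhs:
                return False
    return True

# Two fair coin tosses: four equally likely outcomes (an assumed example).
space = set(product("HT", repeat=2))
A = {w for w in space if w[0] == "H"}   # first toss is heads
B = {w for w in space if w[1] == "H"}   # second toss is heads
C = {w for w in space if w[0] == w[1]}  # both tosses agree

pairwise = all(mutually_independent([x, y], space)
               for x, y in combinations([A, B, C], 2))
print(pairwise)                                # every pair passes the product test
print(mutually_independent([A, B, C], space))  # the triple intersection does not
```

Here each pair satisfies the product rule, but P(A ∩ B ∩ C) = 1/4 while P(A)P(B)P(C) = 1/8, so pairwise independence alone is not enough.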

Toss a coin 6 times, and note the sequence of heads and tails. Knowledge that the 2nd toss is a head does not change the probability of the 3rd toss being a tail, etc. Therefore we have a sequence of 6 independent sub-experiments.

3.2 Bernoulli Trials and the Binomial Probability Law

A Bernoulli trial is an experiment that has a sample space with two elements. We can call these two possible outcomes success/failure, up/down, head/tail, black/white, or anything else that suits the situation. For concreteness we will use the success/failure terminology from now on. Suppose the probability of success is p. A very important experiment that models many problems consists of performing a fixed number n of independent and identical Bernoulli trials, i.e. every trial has the same probability of success p. For simplicity of notation, let a “success” be denoted by 1, and a “failure” by 0. If n = 3, we have

$$P[\{000\}] = (1-p)^3 \tag{26}$$
$$P[\{001\}] = (1-p)^2 p \tag{27}$$
$$P[\{110\}] = (1-p)p^2, \text{ etc.} \tag{28}$$

because of the independence of the Bernoulli sub-experiments. Then the probability of 2 successes is

$$P[\text{“Two successes”}] = P[\{011, 101, 110\}] = 3p^2(1-p) \tag{29}$$

because each of the outcomes in the event has the same probability of p^2(1 − p). In general, since the number of ways to obtain k successes in n trials is the binomial coefficient $\binom{n}{k}$, we have the probability of k successes in n independent identical Bernoulli trials as

$$p_n(k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n. \tag{30}$$

This is a very important probability law, called the Binomial probability law.
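As a quick sanity check on the Binomial probability law, the sketch below evaluates it with Python's `math.comb` and compares the n = 3, k = 2 case with the value 3p^2(1 − p) obtained by listing outcomes. The function name `binomial_pmf` and the choice p = 0.3 are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.3                                 # arbitrary success probability for the check
two_of_three = binomial_pmf(2, 3, p)    # should match 3 p^2 (1 - p)
total = sum(binomial_pmf(k, 3, p) for k in range(4))

print(two_of_three, 3 * p**2 * (1 - p))
print(total)                            # the probabilities over k = 0..n sum to 1
```

Summing over all k gives 1 (up to floating-point rounding), as any probability law over the possible numbers of successes must.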

Example 5: A student did not study for a multiple-choice exam with 10 questions and 4 choices in each question. Find the probability that he answers five or more questions correctly if he randomly picks one of the four choices in each question. We can reasonably assume that P [“correct answer in i-th question”] = 0.25. The probability of getting k answers correct is given by the binomial law as

$$p_n(k) = \binom{10}{k} 0.25^k \, 0.75^{10-k}, \quad k = 0, 1, \ldots, 10. \tag{31}$$

Therefore the probability of passing is

$$\sum_{k=5}^{10} \binom{10}{k} 0.25^k \, 0.75^{10-k} = 0.0781. \tag{32}$$

This is rather low, and would of course become even lower if the number of choices were increased.
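The figure in Example 5 can be reproduced by summing the binomial law over k = 5, ..., 10. The sketch below does this; the helper name `prob_at_least` is hypothetical, and the five-choice variant is added only to illustrate the closing remark.

```python
from math import comb

def prob_at_least(k_min, n, p):
    """P[at least k_min successes in n independent Bernoulli(p) trials]."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Example 5: 10 questions, 4 choices each, random guessing.
p_pass_4_choices = prob_at_least(5, 10, 0.25)
print(round(p_pass_4_choices, 4))

# Same exam with 5 choices per question (an assumed variation).
p_pass_5_choices = prob_at_least(5, 10, 0.20)
print(p_pass_5_choices < p_pass_4_choices)
```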

3.3 Geometric Probability Law

We are often interested in the number of independent Bernoulli trials needed before encountering the first success, for instance:

• What is the probability that you win the top lottery prize after fewer than 20 tries if the probability of winning is 10^-6?

• What is the probability that we need to re-transmit a packet over the Internet more than once, if the probability of failure of each packet is 10^-3?

• How large does the probability of success p have to be in order for you to meet your first success within 5 attempts?

Suppose success in the k-th trial is denoted by A_k. Then the event “First success at the m-th trial” is equivalent to failing the first m − 1 times, and then succeeding in the m-th time, i.e. $A_1^c A_2^c \cdots A_{m-1}^c A_m$, where we have dispensed with the intersection notation for convenience. But since the trials are independent and identical sub-experiments, we can write

$$P[A_1^c A_2^c \cdots A_{m-1}^c A_m] = P[A_m] \prod_{i=1}^{m-1} P[A_i^c] = p(1-p)^{m-1} \tag{33}$$

if p is the probability of success in any one of the Bernoulli trials. Hence the probability of the first success in a sequence of Bernoulli trials occurring at the m-th trial is

$$p_m = p(1-p)^{m-1}, \quad m = 1, 2, \ldots \tag{34}$$

This is known as the geometric probability law. The probability of the first success occurring later than the N-th trial is given by a simple expression:

\begin{align}
P[\{N+1, N+2, \ldots\}] &= \sum_{m=N+1}^{\infty} p(1-p)^{m-1} \tag{35} \\
&= p \sum_{k=N}^{\infty} (1-p)^{k} \tag{36} \\
&= p \, \frac{(1-p)^{N}}{1-(1-p)} \tag{37} \\
&= (1-p)^{N}. \tag{38}
\end{align}
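The closed form (38) can be checked numerically by summing the geometric law directly. In the sketch below the function names and the test values p = 0.3, N = 5 are illustrative assumptions, and the infinite sum is truncated at a point where the remaining terms are negligible.

```python
def geometric_pmf(m, p):
    """P[first success occurs at trial m], for m = 1, 2, ..."""
    return p * (1 - p) ** (m - 1)

def geometric_tail(N, p):
    """P[first success occurs later than trial N], using the closed form (1 - p)^N."""
    return (1 - p) ** N

p, N = 0.3, 5
# Truncated version of the infinite sum in (35); the terms beyond m = 200 are negligible here.
partial_sum = sum(geometric_pmf(m, p) for m in range(N + 1, 201))

print(partial_sum, geometric_tail(N, p))

# Re-transmission question from the list above: success probability 1 - 1e-3 per packet,
# and "more than once" means the first success comes later than trial 1.
print(geometric_tail(1, 1 - 1e-3))
```

The tail is simply the probability that the first N trials all fail, which is why no sum is needed in the closed form.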

Example: Let p = 10^-6, and then we can answer the question at the top of this sub-section. The event “win after fewer than 20 tries” is {1, 2, ..., 19}, which has
