assignment work sheet chapter four, Exercises of Probability and Statistics

Typology: Exercises

2018/2019

Uploaded on 12/31/2019

maldamotuma
Chapter 4

Conditional Probability

4.1 Discrete Conditional Probability

Conditional Probability

In this section we ask and answer the following question. Suppose we assign a distribution function to a sample space and then learn that an event E has occurred. How should we change the probabilities of the remaining events? We shall call the new probability for an event F the conditional probability of F given E and denote it by P(F|E).

Example 4.1 An experiment consists of rolling a die once. Let X be the outcome. Let F be the event {X = 6}, and let E be the event {X > 4}. We assign the distribution function m(ω) = 1/6 for ω = 1, 2, ..., 6. Thus, P(F) = 1/6. Now suppose that the die is rolled and we are told that the event E has occurred. This leaves only two possible outcomes: 5 and 6. In the absence of any other information, we would still regard these outcomes to be equally likely, so the probability of F becomes 1/2, making P(F|E) = 1/2. □

Example 4.2 In the Life Table (see Appendix C), one finds that in a population of 100,000 females, 89.835% can expect to live to age 60, while 57.062% can expect to live to age 80. Given that a woman is 60, what is the probability that she lives to age 80?

This is an example of a conditional probability. In this case, the original sample space can be thought of as a set of 100,000 females. The events E and F are the subsets of the sample space consisting of all women who live at least 60 years, and at least 80 years, respectively. We consider E to be the new sample space, and note that F is a subset of E. Thus, the size of E is 89,835, and the size of F is 57,062. So, the probability in question equals 57,062/89,835 = .6352. Thus, a woman who is 60 has a 63.52% chance of living to age 80. □

Example 4.3 Consider our voting example from Section 1.2: three candidates A, B, and C are running for office. We decided that A and B have an equal chance of winning and C is only 1/2 as likely to win as A. Let A be the event “A wins,” B that “B wins,” and C that “C wins.” Hence, we assigned probabilities P(A) = 2/5, P(B) = 2/5, and P(C) = 1/5.

Suppose that before the election is held, A drops out of the race. As in Example 4.1, it would be natural to assign new probabilities to the events B and C which are proportional to the original probabilities. Writing Ã for the event that A does not win, we would have P(B|Ã) = 2/3, and P(C|Ã) = 1/3. It is important to note that any time we assign probabilities to real-life events, the resulting distribution is only useful if we take into account all relevant information. In this example, we may have knowledge that most voters who favor A will vote for C if A is no longer in the race. This will clearly make the probability that C wins greater than the value of 1/3 that was assigned above. □

In these examples we assigned a distribution function and then were given new information that determined a new sample space, consisting of the outcomes that are still possible, and caused us to assign a new distribution function to this space.

We want to make formal the procedure carried out in these examples. Let Ω = {ω_1, ω_2, ..., ω_r} be the original sample space with distribution function m(ω_j) assigned. Suppose we learn that the event E has occurred. We want to assign a new distribution function m(ω_j|E) to Ω to reflect this fact. Clearly, if a sample point ω_j is not in E, we want m(ω_j|E) = 0. Moreover, in the absence of information to the contrary, it is reasonable to assume that the probabilities for ω_k in E should have the same relative magnitudes that they had before we learned that E had occurred. For this we require that

$$m(\omega_k|E) = c\,m(\omega_k)$$

for all ω_k in E, with c some positive constant. But we must also have

$$\sum_{E} m(\omega_k|E) = c \sum_{E} m(\omega_k) = 1.$$

Thus,

$$c = \frac{1}{\sum_{E} m(\omega_k)} = \frac{1}{P(E)}.$$

(Note that this requires us to assume that P(E) > 0.) Thus, we will define

$$m(\omega_k|E) = \frac{m(\omega_k)}{P(E)}$$

for ω_k in E. We will call this new distribution the conditional distribution given E. For a general event F, this gives

$$P(F|E) = \sum_{F \cap E} m(\omega_k|E) = \sum_{F \cap E} \frac{m(\omega_k)}{P(E)} = \frac{P(F \cap E)}{P(E)}.$$

We call P(F|E) the conditional probability of F occurring given that E occurs, and compute it using the formula

$$P(F|E) = \frac{P(F \cap E)}{P(E)}.$$
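The formula P(F|E) = P(F ∩ E)/P(E) can be checked directly on a finite sample space. The following is a minimal sketch, not part of the original text: the function name `conditional_probability` and the dictionary representation of the distribution function m are illustrative assumptions. It reproduces the values found in Examples 4.1 and 4.3.

```python
from fractions import Fraction


def conditional_probability(m, F, E):
    """Return P(F|E) = P(F ∩ E) / P(E) for a finite distribution m.

    m maps each sample point to its probability; F and E are sets of
    sample points. (Hypothetical helper, for illustration only.)
    """
    p_E = sum(m[w] for w in E)
    if p_E == 0:
        raise ValueError("P(E) must be positive")
    p_F_and_E = sum(m[w] for w in F & E)
    return p_F_and_E / p_E


# Example 4.1: one roll of a fair die, F = {X = 6}, E = {X > 4}.
die = {w: Fraction(1, 6) for w in range(1, 7)}
p_die = conditional_probability(die, {6}, {5, 6})

# Example 4.3: candidate A drops out, so we condition on {B, C}.
vote = {"A": Fraction(2, 5), "B": Fraction(2, 5), "C": Fraction(1, 5)}
p_B = conditional_probability(vote, {"B"}, {"B", "C"})
p_C = conditional_probability(vote, {"C"}, {"B", "C"})

print(p_die, p_B, p_C)  # 1/2 2/3 1/3
```

Exact fractions are used so that the results can be compared with the values in the text without rounding.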

[Figure 4.2: Reverse tree diagram.]

Bayes Probabilities

Our original tree measure gave us the probabilities for drawing a ball of a given color, given the urn chosen. We have just calculated the inverse probability that a particular urn was chosen, given the color of the ball. Such an inverse probability is called a Bayes probability and may be obtained by a formula that we shall develop later. Bayes probabilities can also be obtained by simply constructing the tree measure for the two-stage experiment carried out in reverse order. We show this tree in Figure 4.2. The paths through the reverse tree are in one-to-one correspondence with those in the forward tree, since they correspond to individual outcomes of the experiment, and so they are assigned the same probabilities. From the forward tree, we find that the probability of a black ball is

$$\frac{1}{2} \cdot \frac{2}{5} + \frac{1}{2} \cdot \frac{1}{2} = \frac{9}{20}.$$

The probabilities for the branches at the second level are found by simple division. For example, if x is the probability to be assigned to the top branch at the second level, we must have

$$\frac{9}{20} \cdot x = \frac{1}{5},$$

or x = 4/9. Thus, P(I|B) = 4/9, in agreement with our previous calculations. The reverse tree then displays all of the inverse, or Bayes, probabilities.
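The reverse-tree calculation can be sketched in a few lines. The forward-tree branch probabilities used here are an assumption, since the page that states them is not part of this excerpt: each urn is chosen with probability 1/2, a black ball is drawn from urn I with probability 2/5 and from urn II with probability 1/2. These values are consistent with the 9/20 and 4/9 quoted above.

```python
from fractions import Fraction

# Forward tree (assumed values, see the lead-in): P(urn) and P(color | urn).
p_urn = {"I": Fraction(1, 2), "II": Fraction(1, 2)}
p_color_given_urn = {
    "I": {"b": Fraction(2, 5), "w": Fraction(3, 5)},
    "II": {"b": Fraction(1, 2), "w": Fraction(1, 2)},
}

# Each path (urn, color) has the same probability in both trees.
path = {
    (u, c): p_urn[u] * p_color_given_urn[u][c]
    for u in p_urn
    for c in ("b", "w")
}

# First level of the reverse tree: probability of each color.
p_color = {c: sum(path[(u, c)] for u in p_urn) for c in ("b", "w")}

# Second level of the reverse tree: P(urn | color), found by division.
p_urn_given_color = {(u, c): path[(u, c)] / p_color[c] for (u, c) in path}

print(p_color["b"], p_urn_given_color[("I", "b")])  # 9/20 4/9
```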

Example 4.6 We consider now a problem called the Monty Hall problem. This has long been a favorite problem but was revived by a letter from Craig Whitaker to Marilyn vos Savant for consideration in her column in Parade Magazine.^1 Craig wrote:

^1 Marilyn vos Savant, Ask Marilyn, Parade Magazine, 9 September; 2 December; 17 February 1990, reprinted in Marilyn vos Savant, Ask Marilyn, St. Martins, New York, 1992.

Suppose you’re on Monty Hall’s Let’s Make a Deal! You are given the choice of three doors, behind one door is a car, the others, goats. You pick a door, say 1, Monty opens another door, say 3, which has a goat. Monty says to you “Do you want to pick door 2?” Is it to your advantage to switch your choice of doors?

Marilyn gave a solution concluding that you should switch, and if you do, your probability of winning is 2/3. Several irate readers, some of whom identified themselves as having a PhD in mathematics, said that this is absurd since after Monty has ruled out one door there are only two possible doors and they should still each have the same probability 1/2 so there is no advantage to switching. Marilyn stuck to her solution and encouraged her readers to simulate the game and draw their own conclusions from this. We also encourage the reader to do this (see Exercise 11).

Other readers complained that Marilyn had not described the problem completely. In particular, the way in which certain decisions were made during a play of the game was not specified. This aspect of the problem will be discussed in Section 4.3. We will assume that the car was put behind a door by rolling a three-sided die which made all three choices equally likely. Monty knows where the car is, and always opens a door with a goat behind it. Finally, we assume that if Monty has a choice of doors (i.e., the contestant has picked the door with the car behind it), he chooses each door with probability 1/2. Marilyn clearly expected her readers to assume that the game was played in this manner.

As is the case with most apparent paradoxes, this one can be resolved through careful analysis. We begin by describing a simpler, related question. We say that a contestant is using the “stay” strategy if he picks a door, and, if offered a chance to switch to another door, declines to do so (i.e., he stays with his original choice). Similarly, we say that the contestant is using the “switch” strategy if he picks a door, and, if offered a chance to switch to another door, takes the offer. Now suppose that a contestant decides in advance to play the “stay” strategy. His only action in this case is to pick a door (and decline an invitation to switch, if one is offered). What is the probability that he wins a car? The same question can be asked about the “switch” strategy.

Using the “stay” strategy, a contestant will win the car with probability 1/3, since 1/3 of the time the door he picks will have the car behind it. On the other hand, if a contestant plays the “switch” strategy, then he will win whenever the door he originally picked does not have the car behind it, which happens 2/3 of the time.
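The “stay” and “switch” strategies are easy to simulate. The sketch below follows the assumptions stated in the text (car placed uniformly at random, Monty always opens a goat door, and he picks each door with probability 1/2 when he has a choice). It also assumes that the contestant's first pick is uniformly random, which the text does not specify; the function names `play` and `win_rate` are illustrative.

```python
import random


def play(switch, rng):
    """Play one game; return True if the contestant wins the car."""
    car = rng.randrange(3)
    pick = rng.randrange(3)  # assumption: first pick is uniformly random
    # Monty opens a goat door that the contestant did not pick.
    # If two such doors exist, he chooses each with probability 1/2.
    goat_doors = [d for d in range(3) if d != pick and d != car]
    opened = rng.choice(goat_doors)
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning probability of a strategy by simulation."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

With 100,000 trials the two estimates should land close to 1/3 and 2/3, in line with the analysis above.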

This very simple analysis, though correct, does not quite solve the problem that Craig posed. Craig asked for the conditional probability that you win if you switch, given that you have chosen door 1 and that Monty has chosen door 3. To solve this problem, we set up the problem before getting this information and then compute the conditional probability given this information. This is a process that takes place in several stages; the car is put behind a door, the contestant picks a door, and finally Monty opens a door. Thus it is natural to analyze this using a tree measure. Here we make an additional assumption that if Monty has a choice

[Figure 4.4: Conditional probabilities for the Monty Hall problem.]

if the contestant chooses the door with the car, so that Monty has a choice of two doors, he chooses each of them with probability 1/2. Now suppose instead that in the case that he has a choice, he chooses the door with the larger number with probability 3/4. In the “switch” vs. “stay” problem, the probability of winning with the “switch” strategy is still 2/3. However, in the original problem, if the contestant switches, he wins with probability 4/7. The reader can check this by noting that the same two paths as before are the only two possible paths in the tree. The path leading to a win, if the contestant switches, has probability 1/3, while the path which leads to a loss, if the contestant switches, has probability 1/4. □
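The value 4/7 can be checked by dividing the probability of the winning path by the total probability of the two possible paths. This is a small sketch under the assumptions of the text (contestant has picked door 1, Monty has opened door 3); the function name and the parameter `p_larger`, Monty's probability of opening the larger-numbered door when he has a choice, are illustrative.

```python
from fractions import Fraction


def switch_win_given_1_and_3(p_larger):
    """P(win by switching | contestant picked door 1, Monty opened door 3)."""
    third = Fraction(1, 3)
    # Car behind door 2: Monty is forced to open door 3; switching wins.
    win_path = third * 1
    # Car behind door 1: Monty may open door 2 or 3 and opens door 3
    # with probability p_larger; switching loses.
    lose_path = third * p_larger
    # Car behind door 3: Monty cannot open door 3, so this path has
    # probability 0 and drops out.
    return win_path / (win_path + lose_path)


print(switch_win_given_1_and_3(Fraction(1, 2)))  # 2/3
print(switch_win_given_1_and_3(Fraction(3, 4)))  # 4/7
```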

Independent Events

It often happens that the knowledge that a certain event E has occurred has no effect on the probability that some other event F has occurred, that is, that P(F|E) = P(F). One would expect that in this case, the equation P(E|F) = P(E) would also be true. In fact (see Exercise 1), each equation implies the other. If these equations are true, we might say that F is independent of E. For example, you would not expect the knowledge of the outcome of the first toss of a coin to change the probability that you would assign to the possible outcomes of the second toss, that is, you would not expect that the second toss depends on the first. This idea is formalized in the following definition of independent events.

Definition 4.1 Two events E and F are independent if both E and F have positive probability and if

P(E|F) = P(E),

and

P(F|E) = P(F). □

As noted above, if both P(E) and P(F) are positive, then each of the above equations implies the other, so that to see whether two events are independent, only one of these equations need be checked (see Exercise 1). The following theorem provides another way to check for independence.

Theorem 4.1 If P(E) > 0 and P(F) > 0, then E and F are independent if and only if P(E ∩ F) = P(E)P(F).

Proof. Assume first that E and F are independent. Then P(E|F) = P(E), and so

$$P(E \cap F) = P(E|F)P(F) = P(E)P(F).$$

Assume next that P(E ∩ F) = P(E)P(F). Then

$$P(E|F) = \frac{P(E \cap F)}{P(F)} = P(E).$$

Also,

$$P(F|E) = \frac{P(F \cap E)}{P(E)} = P(F).$$

Therefore, E and F are independent. □

Example 4.7 Suppose that we have a coin which comes up heads with probability p, and tails with probability q. Now suppose that this coin is tossed twice. Using a frequency interpretation of probability, it is reasonable to assign to the outcome (H, H) the probability p^2, to the outcome (H, T) the probability pq, and so on. Let E be the event that heads turns up on the first toss and F the event that tails turns up on the second toss. We will now check that with the above probability assignments, these two events are independent, as expected. We have P(E) = p^2 + pq = p, P(F) = pq + q^2 = q. Finally P(E ∩ F) = pq, so P(E ∩ F) = P(E)P(F). □

Example 4.8 It is often, but not always, intuitively clear when two events are independent. In Example 4.7, let A be the event “the first toss is a head” and B the event “the two outcomes are the same.” Then

$$P(B|A) = \frac{P(B \cap A)}{P(A)} = \frac{P\{HH\}}{P\{HH, HT\}} = \frac{1/4}{1/2} = \frac{1}{2} = P(B).$$

Therefore, A and B are independent, but the result was not so obvious. □
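The product test of Theorem 4.1 can be applied mechanically to Examples 4.7 and 4.8 by enumerating the four outcomes of two tosses. The sketch below is illustrative only; the helper names `two_toss_distribution`, `prob`, and `independent` are assumptions, and the specific value p = 1/3 is chosen merely as an example of an unfair coin.

```python
from fractions import Fraction
from itertools import product


def two_toss_distribution(p):
    """Distribution for two tosses of a coin with P(heads) = p."""
    q = 1 - p
    weight = {"H": p, "T": q}
    return {w: weight[w[0]] * weight[w[1]] for w in product("HT", repeat=2)}


def prob(m, event):
    """P(event), where event is a predicate on outcomes."""
    return sum(pr for w, pr in m.items() if event(w))


def independent(m, e, f):
    """Check P(E ∩ F) = P(E)P(F), the criterion of Theorem 4.1."""
    return prob(m, lambda w: e(w) and f(w)) == prob(m, e) * prob(m, f)


first_head = lambda w: w[0] == "H"    # E in Example 4.7, A in Example 4.8
second_tail = lambda w: w[1] == "T"   # F in Example 4.7
same = lambda w: w[0] == w[1]         # B in Example 4.8

fair = two_toss_distribution(Fraction(1, 2))
biased = two_toss_distribution(Fraction(1, 3))

print(independent(biased, first_head, second_tail))  # True for any p
print(independent(fair, first_head, same))           # True
print(independent(biased, first_head, same))         # False
```

The last line suggests that the independence in Example 4.8 relies on the fair-coin values 1/4 and 1/2 used in the computation: for p = 1/3, P(A ∩ B) = 1/9 while P(A)P(B) = 5/27.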

If we have several random variables X_1, X_2, ..., X_n which correspond to a given experiment, then we can consider the joint random variable X̄ = (X_1, X_2, ..., X_n) defined by taking an outcome ω of the experiment, and writing, as an n-tuple, the corresponding n outcomes for the random variables X_1, X_2, ..., X_n. Thus, if the random variable X_i has, as its set of possible outcomes the set R_i, then the set of possible outcomes of the joint random variable X̄ is the Cartesian product of the R_i’s, i.e., the set of all n-tuples of possible outcomes of the X_i’s.

Example 4.11 (Example 4.10 continued) In the coin-tossing example above, let X_i denote the outcome of the ith toss. Then the joint random variable X̄ = (X_1, X_2, X_3) has eight possible outcomes. Suppose that we now define Y_i, for i = 1, 2, 3, as the number of heads which occur in the first i tosses. Then Y_i has {0, 1, ..., i} as possible outcomes, so at first glance, the set of possible outcomes of the joint random variable Ȳ = (Y_1, Y_2, Y_3) should be the set

{(a_1, a_2, a_3) : 0 ≤ a_1 ≤ 1, 0 ≤ a_2 ≤ 2, 0 ≤ a_3 ≤ 3}.

However, the outcome (1, 0, 1) cannot occur, since we must have a_1 ≤ a_2 ≤ a_3. The solution to this problem is to define the probability of the outcome (1, 0, 1) to be 0.

We now illustrate the assignment of probabilities to the various outcomes for the joint random variables X̄ and Ȳ. In the first case, each of the eight outcomes should be assigned the probability 1/8, since we are assuming that we have a fair coin. In the second case, since Y_i has i + 1 possible outcomes, the set of possible outcomes has size 24. Only eight of these 24 outcomes can actually occur, namely the ones satisfying a_1 ≤ a_2 ≤ a_3 in which each coordinate exceeds the previous one by at most 1 (an outcome such as (0, 2, 2) is also impossible). Each of these outcomes corresponds to exactly one of the outcomes of the random variable X̄, so it is natural to assign probability 1/8 to each of these. We assign probability 0 to the other 16 outcomes. In each case, the probability function is called a joint distribution function. □
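The joint distribution of Ȳ = (Y_1, Y_2, Y_3) can be built by enumerating the eight equally likely toss sequences. This is a minimal sketch assuming a fair coin, as in the example; the variable names are illustrative, and heads is encoded as 1.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The eight outcomes of X-bar = (X1, X2, X3), with 1 = heads, 0 = tails.
tosses = list(product((0, 1), repeat=3))

# Joint distribution of Y-bar: running head counts after 1, 2, 3 tosses.
joint_Y = Counter()
for x in tosses:
    y = (x[0], x[0] + x[1], x[0] + x[1] + x[2])
    joint_Y[y] += Fraction(1, 8)

# The 24 candidate triples with 0 <= a1 <= 1, 0 <= a2 <= 2, 0 <= a3 <= 3.
candidates = [
    (a1, a2, a3) for a1 in range(2) for a2 in range(3) for a3 in range(4)
]
possible = [y for y in candidates if joint_Y[y] > 0]

print(len(candidates), len(possible))  # 24 8
print(joint_Y[(1, 0, 1)])              # 0: this outcome cannot occur
```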

We collect the above ideas in a definition.

Definition 4.3 Let X_1, X_2, ..., X_n be random variables associated with an experiment. Suppose that the sample space (i.e., the set of possible outcomes) of X_i is the set R_i. Then the joint random variable X̄ = (X_1, X_2, ..., X_n) is defined to be the random variable whose outcomes consist of ordered n-tuples of outcomes, with the ith coordinate lying in the set R_i. The sample space Ω of X̄ is the Cartesian product of the R_i’s:

Ω = R_1 × R_2 × · · · × R_n.

The joint distribution function of X̄ is the function which gives the probability of each of the outcomes of X̄. □

Example 4.12 (Example 4.10 continued) We now consider the assignment of probabilities in the above example. In the case of the random variable X̄, the probability of any outcome (a_1, a_2, a_3) is just the product of the probabilities P(X_i = a_i), for i = 1, 2, 3. However, in the case of Ȳ, the probability assigned to the outcome (1, 1, 0) is not the product of the probabilities P(Y_1 = 1), P(Y_2 = 1), and P(Y_3 = 0). The difference between these two situations is that the value of X_i does not affect the value of X_j, if i ≠ j, while the values of Y_i and Y_j affect one another. For example, if Y_1 = 1, then Y_2 cannot equal 0. This prompts the next definition. □

              Not smoke   Smoke   Total
Not cancer        40        10      50
Cancer             7         3      10
Totals            47        13      60

Table 4.1: Smoking and cancer.

           S = 0    S = 1
C = 0      40/60    10/60
C = 1       7/60     3/60

Table 4.2: Joint distribution.

Definition 4.4 The random variables X_1, X_2, ..., X_n are mutually independent if

P(X_1 = r_1, X_2 = r_2, ..., X_n = r_n) = P(X_1 = r_1)P(X_2 = r_2) · · · P(X_n = r_n)

for any choice of r_1, r_2, ..., r_n. Thus, if X_1, X_2, ..., X_n are mutually independent, then the joint distribution function of the random variable

X̄ = (X_1, X_2, ..., X_n)

is just the product of the individual distribution functions. When two random variables are mutually independent, we shall say more briefly that they are independent. □

Example 4.13 In a group of 60 people, the numbers who do or do not smoke and do or do not have cancer are reported as shown in Table 4.1. Let Ω be the sample space consisting of these 60 people. A person is chosen at random from the group. Let C(ω) = 1 if this person has cancer and 0 if not, and S(ω) = 1 if this person smokes and 0 if not. Then the joint distribution of {C, S} is given in Table 4.2. For example P (C = 0, S = 0) = 40/60, P (C = 0, S = 1) = 10/60, and so forth. The distributions of the individual random variables are called marginal distributions. The marginal distributions of C and S are:

pC =
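Marginal distributions are obtained from a joint distribution by summing over the other variable. The sketch below recomputes them from the counts of Table 4.1 and then applies the product test of Definition 4.4; the variable names are illustrative and not taken from the original text.

```python
from fractions import Fraction

# Counts from Table 4.1, keyed by (C, S): C = 1 means cancer, S = 1 means smokes.
counts = {(0, 0): 40, (0, 1): 10, (1, 0): 7, (1, 1): 3}
total = sum(counts.values())  # 60 people

# Joint distribution of (C, S), as in Table 4.2.
joint = {cs: Fraction(n, total) for cs, n in counts.items()}

# Marginal distributions: sum the joint distribution over the other variable.
p_C = {c: sum(joint[(c, s)] for s in (0, 1)) for c in (0, 1)}
p_S = {s: sum(joint[(c, s)] for c in (0, 1)) for s in (0, 1)}

# C and S are independent only if every joint entry equals the product
# of the corresponding marginal entries.
c_s_independent = all(
    joint[(c, s)] == p_C[c] * p_S[s] for c in (0, 1) for s in (0, 1)
)

print(p_C)  # marginal of C: 50/60 and 10/60 in lowest terms
print(p_S)  # marginal of S: 47/60 and 13/60
print(c_s_independent)  # False
```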

The sample space is R^3 = R × R × R with R = {1, 2, 3, 4, 5, 6}. If ω = (1, 3, 6), then X_1(ω) = 1, X_2(ω) = 3, and X_3(ω) = 6, indicating that the first roll was a 1, the second was a 3, and the third was a 6. The probability assigned to any sample point is

m(ω) =

Example 4.15 Consider next a Bernoulli trials process with probability p for success on each experiment. Let X_j(ω) = 1 if the jth outcome is success and X_j(ω) = 0 if it is a failure. Then X_1, X_2, ..., X_n is an independent trials process. Each X_j has the same distribution function

$$m_j = \begin{pmatrix} 0 & 1 \\ q & p \end{pmatrix},$$

where q = 1 − p. If S_n = X_1 + X_2 + · · · + X_n, then

$$P(S_n = j) = \binom{n}{j} p^j q^{n-j},$$

and S_n has, as distribution, the binomial distribution b(n, p, j). □
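The binomial formula can be compared with a brute-force sum over all outcomes of the independent trials process, where each outcome's probability is the product of the individual probabilities. This is an illustrative sketch; the function names `b` and `brute_force` and the test values n = 4, p = 1/3 are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product
from math import comb


def b(n, p, j):
    """Binomial probability b(n, p, j) = C(n, j) p^j q^(n-j)."""
    q = 1 - p
    return comb(n, j) * p**j * q ** (n - j)


def brute_force(n, p, j):
    """P(S_n = j) by summing over all 0/1 sequences of length n."""
    q = 1 - p
    total = 0
    for xs in product((0, 1), repeat=n):
        if sum(xs) == j:
            weight = 1
            for x in xs:
                # Independence: multiply the individual probabilities.
                weight *= p if x == 1 else q
            total += weight
    return total


p = Fraction(1, 3)
print(b(4, p, 2), brute_force(4, p, 2))  # 8/27 8/27
```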

Bayes’ Formula

In our examples, we have considered conditional probabilities of the following form: Given the outcome of the second stage of a two-stage experiment, find the probability for an outcome at the first stage. We have remarked that these probabilities are called Bayes probabilities.

We return now to the calculation of more general Bayes probabilities. Suppose we have a set of events H_1, H_2, ..., H_m that are pairwise disjoint and such that

Ω = H_1 ∪ H_2 ∪ · · · ∪ H_m.

We call these events hypotheses. We also have an event E that gives us some information about which hypothesis is correct. We call this event evidence.

Before we receive the evidence, then, we have a set of prior probabilities P(H_1), P(H_2), ..., P(H_m) for the hypotheses. If we know the correct hypothesis, we know the probability for the evidence. That is, we know P(E|H_i) for all i. We want to find the probabilities for the hypotheses given the evidence. That is, we want to find the conditional probabilities P(H_i|E). These probabilities are called the posterior probabilities.

To find these probabilities, we write them in the form

$$P(H_i|E) = \frac{P(H_i \cap E)}{P(E)}. \qquad (4.1)$$

           Number having           The results
Disease    this disease      ++      +−      −+      −−
d_1            3215         2110     301     704     100
d_2            2125          396     132    1187     410
d_3            4660          510    3568      73     509
Total         10000

Table 4.3: Diseases data.

We can calculate the numerator from our given information by

$$P(H_i \cap E) = P(H_i)P(E|H_i). \qquad (4.2)$$

Since one and only one of the events H_1, H_2, ..., H_m can occur, we can write the probability of E as

$$P(E) = P(H_1 \cap E) + P(H_2 \cap E) + \cdots + P(H_m \cap E).$$

Using Equation 4.2, the above expression can be seen to equal

$$P(H_1)P(E|H_1) + P(H_2)P(E|H_2) + \cdots + P(H_m)P(E|H_m). \qquad (4.3)$$

Using (4.1), (4.2), and (4.3) yields Bayes’ formula:

$$P(H_i|E) = \frac{P(H_i)P(E|H_i)}{\sum_{k=1}^{m} P(H_k)P(E|H_k)}.$$

Although this is a very famous formula, we will rarely use it. If the number of hypotheses is small, a simple tree measure calculation is easily carried out, as we have done in our examples. If the number of hypotheses is large, then we should use a computer. Bayes probabilities are particularly appropriate for medical diagnosis. A doctor is anxious to know which of several diseases a patient might have. She collects evidence in the form of the outcomes of certain tests. From statistical studies the doctor can find the prior probabilities of the various diseases before the tests, and the probabilities for specific test outcomes, given a particular disease. What the doctor wants to know is the posterior probability for the particular disease, given the outcomes of the tests.

Example 4.16 A doctor is trying to decide if a patient has one of three diseases d_1, d_2, or d_3. Two tests are to be carried out, each of which results in a positive (+) or a negative (−) outcome. There are four possible test patterns ++, +−, −+, and −−. National records have indicated that, for 10,000 people having one of these three diseases, the distribution of diseases and test results is as in Table 4.3.

From this data, we can estimate the prior probabilities for each of the diseases and, given a particular disease, the probability of a particular test outcome. For example, the prior probability of disease d_1 may be estimated to be 3215/10,000 = .3215. The probability of the test result +−, given disease d_1, may be estimated to be 301/3215 = .094.
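Bayes’ formula can be applied to the data of Table 4.3 as follows. This is a sketch for illustration only: priors and likelihoods are estimated from the table counts as described in the example, and the names `table`, `prior`, `likelihood`, and `posterior` are assumptions, with the test patterns written in ASCII as "++", "+-", "-+", "--".

```python
from fractions import Fraction

# Counts from Table 4.3: disease -> test pattern -> number of people.
table = {
    "d1": {"++": 2110, "+-": 301, "-+": 704, "--": 100},
    "d2": {"++": 396, "+-": 132, "-+": 1187, "--": 410},
    "d3": {"++": 510, "+-": 3568, "-+": 73, "--": 509},
}
total = sum(sum(row.values()) for row in table.values())  # 10,000

# Prior P(H_i): fraction of people having each disease.
prior = {d: Fraction(sum(row.values()), total) for d, row in table.items()}

# Likelihood P(E | H_i): fraction of each disease group with each pattern.
likelihood = {
    d: {t: Fraction(n, sum(row.values())) for t, n in row.items()}
    for d, row in table.items()
}


def posterior(test_pattern):
    """Bayes' formula: P(H_i | E) for every disease H_i."""
    joint = {d: prior[d] * likelihood[d][test_pattern] for d in table}
    evidence = sum(joint.values())  # P(E), as in expression (4.3)
    return {d: joint[d] / evidence for d in table}


for pattern in ("++", "+-", "-+", "--"):
    row = posterior(pattern)
    print(pattern, {d: round(float(pr), 3) for d, pr in row.items()})
```

Because the priors and likelihoods are both estimated from the same table, the posterior for a pattern reduces to that pattern's column divided by the column total, for example 2110/3016 for d_1 given ++.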

[Figure 4.5: Forward and reverse tree diagrams.]

Three gamblers, A, B and C, take 12 balls of which 4 are white and 8 black. They play with the rules that the drawer is blindfolded, A is to draw first, then B and then C, the winner to be the one who first draws a white ball. What is the ratio of their chances?^2

From his answer it is clear that Huygens meant that each ball is replaced after drawing. However, John Hudde, the mayor of Amsterdam, assumed that he meant to sample without replacement and corresponded with Huygens about the difference in their answers. Hacking remarks that “Neither party can understand what the other is doing.”^3 By the time of de Moivre’s book, The Doctrine of Chances, these distinctions were well understood. De Moivre defined independence and dependence as follows:

Two Events are independent, when they have no connexion one with the other, and that the happening of one neither forwards nor obstructs the happening of the other. Two Events are dependent, when they are so connected together as that the Probability of either’s happening is altered by the happening of the other.^4

De Moivre used sampling with and without replacement to illustrate that the probability that two independent events both happen is the product of their probabilities, and for dependent events that:

^2 Quoted in F. N. David, Games, Gods and Gambling (London: Griffin, 1962), p. 119.
^3 I. Hacking, The Emergence of Probability (Cambridge: Cambridge University Press, 1975), p. 99.
^4 A. de Moivre, The Doctrine of Chances, 3rd ed. (New York: Chelsea, 1967), p. 6.

The Probability of the happening of two Events dependent, is the product of the Probability of the happening of one of them, by the Probability which the other will have of happening, when the first is considered as having happened; and the same Rule will extend to the happening of as many Events as may be assigned.^5

The formula that we call Bayes’ formula, and the idea of computing the probability of a hypothesis given evidence, originated in a famous essay of Thomas Bayes. Bayes was an ordained minister in Tunbridge Wells near London. His mathematical interests led him to be elected to the Royal Society in 1742, but none of his results were published within his lifetime. The work upon which his fame rests, “An Essay Toward Solving a Problem in the Doctrine of Chances,” was published in 1763, two years after his death.^6 Bayes reviewed some of the basic concepts of probability and then considered a new kind of inverse probability problem requiring the use of conditional probability.

Bernoulli, in his study of processes that we now call Bernoulli trials, had proven his famous law of large numbers which we will study in Chapter 8. This theorem assured the experimenter that if he knew the probability p for success, he could predict that the proportion of successes would approach this value as he increased the number of experiments. Bernoulli himself realized that in most interesting cases you do not know the value of p and saw his theorem as an important step in showing that you could determine p by experimentation.

To study this problem further, Bayes started by assuming that the probability p for success is itself determined by a random experiment. He assumed in fact that this experiment was such that this value for p is equally likely to be any value between 0 and 1. Without knowing this value we carry out n experiments and observe m successes. Bayes proposed the problem of finding the conditional probability that the unknown probability p lies between a and b. He obtained the answer:

$$P(a \le p < b \mid m \text{ successes in } n \text{ trials}) = \frac{\int_a^b x^m (1 - x)^{n-m}\,dx}{\int_0^1 x^m (1 - x)^{n-m}\,dx}.$$
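For small m and n the two integrals in Bayes’ answer can be evaluated exactly by expanding (1 − x)^(n−m) with the binomial theorem and integrating term by term. The sketch below does this with exact fractions; the function names `integral` and `bayes_posterior` are illustrative assumptions, not part of the original text.

```python
from fractions import Fraction
from math import comb


def integral(m, n, lo, hi):
    """Exact value of the integral of x^m (1-x)^(n-m) from lo to hi.

    Uses (1-x)^(n-m) = sum_k C(n-m, k) (-1)^k x^k and integrates each
    monomial x^(m+k) term by term.
    """
    lo, hi = Fraction(lo), Fraction(hi)
    total = Fraction(0)
    for k in range(n - m + 1):
        power = m + k + 1
        coeff = Fraction((-1) ** k * comb(n - m, k), power)
        total += coeff * (hi**power - lo**power)
    return total


def bayes_posterior(m, n, a, b):
    """P(a <= p < b | m successes in n trials) under a uniform prior."""
    return integral(m, n, a, b) / integral(m, n, 0, 1)


# With no data the answer is just the length of the interval.
print(bayes_posterior(0, 0, Fraction(1, 4), Fraction(3, 4)))  # 1/2
# After 1 success in 1 trial, small values of p become less likely.
print(bayes_posterior(1, 1, 0, Fraction(1, 2)))  # 1/4
```

For large n the alternating sum involves many large terms, so this exact approach is only practical for modest values of m and n.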

We shall see in the next section how this result is obtained. Bayes clearly wanted to show that the conditional distribution function, given the outcomes of more and more experiments, becomes concentrated around the true value of p. Thus, Bayes was trying to solve an inverse problem. The computation of the integrals was too difficult for exact solution except for small values of m and n, and so Bayes tried approximate methods. His methods were not very satisfactory and it has been suggested that this discouraged him from publishing his results.

However, his paper was the first in a series of important studies carried out by Laplace, Gauss, and other great mathematicians to solve inverse problems. They studied this problem in terms of errors in measurements in astronomy. If an astronomer were to know the true value of a distance and the nature of the random

^5 ibid, p. 7.
^6 T. Bayes, “An Essay Toward Solving a Problem in the Doctrine of Chances,” Phil. Trans. Royal Soc. London, vol. 53 (1763), pp. 370–418.

(a) Which of the following pairs of these events are independent? (1) A, B (2) A, D (3) A, E (4) D, E (b) Which of the following triples of these events are independent? (1) A, B, C (2) A, B, D (3) C, D, E

6 From a deck of five cards numbered 2, 4, 6, 8, and 10, respectively, a card is drawn at random and replaced. This is done three times. What is the probability that the card numbered 2 was drawn exactly two times, given that the sum of the numbers on the three draws is 12?

7 A coin is tossed twice. Consider the following events. A: Heads on the first toss. B: Heads on the second toss. C: The two tosses come out the same.

(a) Show that A, B, C are pairwise independent but not independent. (b) Show that C is independent of A and B but not of A ∩ B.

8 Let Ω = {a, b, c, d, e, f }. Assume that m(a) = m(b) = 1/8 and m(c) = m(d) = m(e) = m(f ) = 3/16. Let A, B, and C be the events A = {d, e, a}, B = {c, e, a}, C = {c, d, a}. Show that P (A ∩ B ∩ C) = P (A)P (B)P (C) but no two of these events are independent.

9 What is the probability that a family of two children has

(a) two boys given that it has at least one boy? (b) two boys given that the first child is a boy?

10 In Example 4.2, we used the Life Table (see Appendix C) to compute a conditional probability. The number 93,753 in the table, corresponding to 40-year-old males, means that of all the males born in the United States in 1950, 93.753% were alive in 1990. Is it reasonable to use this as an estimate for the probability of a male, born this year, surviving to age 40?

11 Simulate the Monty Hall problem. Carefully state any assumptions that you have made when writing the program. Which version of the problem do you think that you are simulating?

12 In Example 4.17, how large must the prior probability of cancer be to give a posterior probability of .5 for cancer given a positive test?

13 Two cards are drawn from a bridge deck. What is the probability that the second card drawn is red?

14 If P(B̃) = 1/4 and P(A|B) = 1/2, what is P(A ∩ B)?

15 (a) What is the probability that your bridge partner has exactly two aces, given that she has at least one ace? (b) What is the probability that your bridge partner has exactly two aces, given that she has the ace of spades?

16 Prove that for any three events A, B, C, each having positive probability,

P (A ∩ B ∩ C) = P (A)P (B|A)P (C|A ∩ B).

17 Prove that if A and B are independent so are

(a) A and B̃. (b) Ã and B̃.

18 A doctor assumes that a patient has one of three diseases d 1 , d 2 , or d 3. Before any test, he assumes an equal probability for each disease. He carries out a test that will be positive with probability .8 if the patient has d 1 , .6 if he has disease d 2 , and .4 if he has disease d 3. Given that the outcome of the test was positive, what probabilities should the doctor now assign to the three possible diseases?

19 In a poker hand, John has a very strong hand and bets 5 dollars. The probability that Mary has a better hand is .04. If Mary had a better hand she would raise with probability .9, but with a poorer hand she would only raise with probability .1. If Mary raises, what is the probability that she has a better hand than John does?

20 The Polya urn model for contagion is as follows: We start with an urn which contains one white ball and one black ball. At each second we choose a ball at random from the urn and replace this ball and add one more of the color chosen. Write a program to simulate this model, and see if you can make any predictions about the proportion of white balls in the urn after a large number of draws. Is there a tendency to have a large fraction of balls of the same color in the long run?

21 It is desired to find the probability that in a bridge deal each player receives an ace. A student argues as follows. It does not matter where the first ace goes. The second ace must go to one of the other three players and this occurs with probability 3/4. Then the next must go to one of two, an event of probability 1/2, and finally the last ace must go to the player who does not have an ace. This occurs with probability 1/4. The probability that all these events occur is the product (3/4)(1/2)(1/4) = 3/32. Is this argument correct?

22 One coin in a collection of 65 has two heads. The rest are fair. If a coin, chosen at random from the lot and then tossed, turns up heads 6 times in a row, what is the probability that it is the two-headed coin?