

































































5.1 Introduction
5.2 Bivariate and Multivariate Probability Distributions
5.3 Marginal and Conditional Probability Distributions
5.4 Independent Random Variables
5.5 The Expected Value of a Function of Random Variables
5.6 Special Theorems
5.7 The Covariance of Two Random Variables
5.8 The Expected Value and Variance of Linear Functions of Random Variables
5.9 The Multinomial Probability Distribution
5.10 The Bivariate Normal Distribution (Optional)
5.11 Conditional Expectations
5.12 Summary
References and Further Readings
The intersection of two or more events is frequently of interest to an experimenter. For example, a gambler playing blackjack is interested in the event of drawing both an ace and a face card from a 52-card deck. A biologist, observing the number of animals surviving in a litter, is concerned about the intersection of these events:
A: The litter contains $n$ animals.
B: $y$ animals survive.
Similarly, observing both the height and the weight of an individual represents the intersection of a specific pair of events associated with height–weight measurements.
224 Chapter 5 Multivariate Probability Distributions
Most important to statisticians are intersections that occur in the course of sampling. Suppose that $Y_1, Y_2, \ldots, Y_n$ denote the outcomes of $n$ successive trials of an experiment. For example, this sequence could represent the weights of $n$ people or the measurements of $n$ physical characteristics for a single person. A specific set of outcomes, or sample measurements, may be expressed in terms of the intersection of the $n$ events $(Y_1 = y_1), (Y_2 = y_2), \ldots, (Y_n = y_n)$, which we will denote as $(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n)$ or, more compactly, as $(y_1, y_2, \ldots, y_n)$. Calculation of the probability of this intersection is essential in making inferences about the population from which the sample was drawn and is a major reason for studying multivariate probability distributions.
Many random variables can be defined over the same sample space. For example, consider the experiment of tossing a pair of dice. The sample space contains 36 sample points, corresponding to the $mn = (6)(6) = 36$ ways in which numbers may appear on the faces of the dice. Any one of the following random variables could be defined over the sample space and might be of interest to the experimenter:
$Y_1$: The number of dots appearing on die 1.
$Y_2$: The number of dots appearing on die 2.
$Y_3$: The sum of the number of dots on the dice.
$Y_4$: The product of the number of dots appearing on the dice.
The 36 sample points associated with the experiment are equiprobable and correspond to the 36 numerical events $(y_1, y_2)$. Thus, throwing a pair of 1s is the simple event $(1, 1)$. Throwing a 2 on die 1 and a 3 on die 2 is the simple event $(2, 3)$. Because all pairs $(y_1, y_2)$ occur with the same relative frequency, we assign probability $1/36$ to each sample point. For this simple example, the intersection $(y_1, y_2)$ contains at most one sample point. Hence, the bivariate probability function is
$$p(y_1, y_2) = P(Y_1 = y_1, Y_2 = y_2) = 1/36, \quad y_1 = 1, 2, \ldots, 6, \; y_2 = 1, 2, \ldots, 6.$$
A graph of the bivariate probability function for the die-tossing experiment is shown in Figure 5.1. Notice that a nonzero probability is assigned to a point $(y_1, y_2)$ in the plane if and only if $y_1 = 1, 2, \ldots, 6$ and $y_2 = 1, 2, \ldots, 6$. Thus, exactly 36 points in the plane are assigned nonzero probabilities. Further, the probabilities are assigned in such a way that the sum of the nonzero probabilities is equal to 1. In Figure 5.1 the points assigned nonzero probabilities are represented in the $(y_1, y_2)$ plane, whereas the probabilities associated with these points are given by the lengths of the lines above them. Figure 5.1 may be viewed as a theoretical, three-dimensional relative frequency histogram for the pairs of observations $(y_1, y_2)$. As in the single-variable discrete case, the theoretical histogram provides a model for the sample histogram that would be obtained if the die-tossing experiment were repeated a large number of times.
straightforward. For the die-tossing experiment, $P(2 \le Y_1 \le 3, 1 \le Y_2 \le 2)$ is
$$P(2 \le Y_1 \le 3, 1 \le Y_2 \le 2) = p(2, 1) + p(2, 2) + p(3, 1) + p(3, 2) = 4/36 = 1/9.$$
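The die-tossing probabilities can be checked by brute-force enumeration. The following is a minimal sketch in Python, assuming only the standard library (`fractions.Fraction` keeps the arithmetic exact); the helper name `rect_prob` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

# Joint probability function for the die-tossing experiment:
# each of the 6 x 6 = 36 equiprobable pairs (y1, y2) gets probability 1/36.
joint = {(y1, y2): Fraction(1, 36) for y1, y2 in product(range(1, 7), repeat=2)}

# Sanity checks: every probability is nonnegative and they sum to 1.
assert all(p >= 0 for p in joint.values())
assert sum(joint.values()) == 1

def rect_prob(pmf, y1_range, y2_range):
    """P(a1 <= Y1 <= a2, b1 <= Y2 <= b2), summing p(y1, y2) over the rectangle."""
    (a1, a2), (b1, b2) = y1_range, y2_range
    return sum(p for (y1, y2), p in pmf.items()
               if a1 <= y1 <= a2 and b1 <= y2 <= b2)

prob = rect_prob(joint, (2, 3), (1, 2))
print(prob)  # 1/9
```

Enumeration is only practical for small finite supports like this one, but it mirrors the text exactly: a probability for a region is the sum of $p(y_1, y_2)$ over the points in that region.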
EXAMPLE 5.1 A local supermarket has three checkout counters. Two customers arrive at the counters at different times when the counters are serving no other customers. Each customer chooses a counter at random, independently of the other. Let $Y_1$ denote the number of customers who choose counter 1 and $Y_2$, the number who select counter 2. Find the joint probability function of $Y_1$ and $Y_2$.
Solution We might proceed with the derivation in many ways. The most direct is to consider the sample space associated with the experiment. Let the pair $\{i, j\}$ denote the simple event that the first customer chose counter $i$ and the second customer chose counter $j$, where $i, j = 1, 2$, and 3. Using the $mn$ rule, the sample space consists of $3 \times 3 = 9$ sample points. Under the assumptions given earlier, each sample point is equally likely and has probability 1/9. The sample space associated with the experiment is
$$S = [\{1, 1\}, \{1, 2\}, \{1, 3\}, \{2, 1\}, \{2, 2\}, \{2, 3\}, \{3, 1\}, \{3, 2\}, \{3, 3\}].$$
Notice that sample point $\{1, 1\}$ is the only sample point corresponding to $(Y_1 = 2, Y_2 = 0)$ and hence $P(Y_1 = 2, Y_2 = 0) = 1/9$. Similarly, $P(Y_1 = 1, Y_2 = 1) = P(\{1, 2\} \text{ or } \{2, 1\}) = 2/9$. Table 5.1 contains the probabilities associated with each possible pair of values for $Y_1$ and $Y_2$, that is, the joint probability function for $Y_1$ and $Y_2$. As always, the results of Theorem 5.1 hold for this example.
Table 5.1 Probability function for $Y_1$ and $Y_2$, Example 5.1

              y1
  y2      0      1      2
  0      1/9    2/9    1/9
  1      2/9    2/9     0
  2      1/9     0      0
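Table 5.1 can be reproduced by listing the nine sample points $\{i, j\}$ and counting, for each one, how many customers chose counter 1 and counter 2. A minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Sample points {i, j}: first customer picks counter i, second picks counter j.
# By the mn rule there are 3 x 3 = 9 points, each with probability 1/9.
counters = (1, 2, 3)
joint = defaultdict(Fraction)  # missing keys start at Fraction(0)
for i, j in product(counters, repeat=2):
    y1 = (i == 1) + (j == 1)  # number of customers choosing counter 1
    y2 = (i == 2) + (j == 2)  # number of customers choosing counter 2
    joint[(y1, y2)] += Fraction(1, 9)

# Print with y2 indexing rows and y1 indexing columns, as in Table 5.1.
for y2 in range(3):
    print(y2, [str(joint.get((y1, y2), Fraction(0))) for y1 in range(3)])
```

Pairs such as $(2, 1)$ never occur, so they simply have no entry; `joint.get` reports them as 0 without inserting them.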
As in the case of univariate random variables, the distinction between jointly discrete and jointly continuous random variables may be characterized in terms of their (joint) distribution functions.
DEFINITION 5.2 For any random variables $Y_1$ and $Y_2$, the joint (bivariate) distribution function $F(y_1, y_2)$ is
$$F(y_1, y_2) = P(Y_1 \le y_1, Y_2 \le y_2), \quad -\infty < y_1 < \infty, \; -\infty < y_2 < \infty.$$
5.2 Bivariate and Multivariate Probability Distributions 227
For two discrete variables $Y_1$ and $Y_2$, $F(y_1, y_2)$ is given by
$$F(y_1, y_2) = \sum_{t_1 \le y_1} \sum_{t_2 \le y_2} p(t_1, t_2).$$
For the die-tossing experiment,
$$F(2, 3) = P(Y_1 \le 2, Y_2 \le 3) = p(1, 1) + p(1, 2) + p(1, 3) + p(2, 1) + p(2, 2) + p(2, 3).$$
Because $p(y_1, y_2) = 1/36$ for all pairs of values of $y_1$ and $y_2$ under consideration, $F(2, 3) = 6/36 = 1/6$.
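The double sum that defines $F(y_1, y_2)$ in the discrete case translates directly into a filtered sum over the support. A minimal sketch, assuming the die-tossing probability function above; `joint_cdf` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# Die-tossing joint probability function: 1/36 on each pair (y1, y2).
joint = {(y1, y2): Fraction(1, 36) for y1, y2 in product(range(1, 7), repeat=2)}

def joint_cdf(pmf, y1, y2):
    """F(y1, y2): sum of p(t1, t2) over all t1 <= y1 and t2 <= y2."""
    return sum((p for (t1, t2), p in pmf.items() if t1 <= y1 and t2 <= y2),
               Fraction(0))

print(joint_cdf(joint, 2, 3))  # 1/6
```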
EXAMPLE 5.2 Consider the random variables $Y_1$ and $Y_2$ of Example 5.1. Find $F(-1, 2)$, $F(1.5, 2)$, and $F(5, 7)$.
Solution Using the results in Table 5.1, we see that
$$F(-1, 2) = P(Y_1 \le -1, Y_2 \le 2) = P(\varnothing) = 0.$$
Further,
$$F(1.5, 2) = P(Y_1 \le 1.5, Y_2 \le 2) = p(0, 0) + p(0, 1) + p(0, 2) + p(1, 0) + p(1, 1) + p(1, 2) = 8/9.$$
Similarly,
$$F(5, 7) = P(Y_1 \le 5, Y_2 \le 7) = 1.$$
Notice that $F(y_1, y_2) = 1$ for all $y_1, y_2$ such that $\min\{y_1, y_2\} \ge 2$. Also, $F(y_1, y_2) = 0$ if $\min\{y_1, y_2\} < 0$.
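Because $F$ is defined for every real pair of arguments, a short sketch can evaluate it at non-integer points such as $(1.5, 2)$ and spot-check the two closing remarks on a grid. It assumes Table 5.1 as the probability function; `table_5_1` and `joint_cdf` are hypothetical names chosen for this illustration, and the grid check is a spot check, not a proof.

```python
from fractions import Fraction

# Table 5.1 (Example 5.1): p(y1, y2); pairs not listed have probability 0.
table_5_1 = {
    (0, 0): Fraction(1, 9), (1, 0): Fraction(2, 9), (2, 0): Fraction(1, 9),
    (0, 1): Fraction(2, 9), (1, 1): Fraction(2, 9),
    (0, 2): Fraction(1, 9),
}

def joint_cdf(pmf, y1, y2):
    """F(y1, y2) = P(Y1 <= y1, Y2 <= y2); y1 and y2 may be any real numbers."""
    return sum((p for (t1, t2), p in pmf.items() if t1 <= y1 and t2 <= y2),
               Fraction(0))

print(joint_cdf(table_5_1, -1, 2))   # 0
print(joint_cdf(table_5_1, 1.5, 2))  # 8/9
print(joint_cdf(table_5_1, 5, 7))    # 1

# Spot-check on a grid: F = 1 when min(y1, y2) >= 2, and F = 0 when min(y1, y2) < 0.
grid = [x / 2 for x in range(-4, 9)]  # -2.0, -1.5, ..., 4.0
for a in grid:
    for b in grid:
        if min(a, b) >= 2:
            assert joint_cdf(table_5_1, a, b) == 1
        if min(a, b) < 0:
            assert joint_cdf(table_5_1, a, b) == 0
```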
Two random variables are said to be jointly continuous if their joint distribution function $F(y_1, y_2)$ is continuous in both arguments.
DEFINITION 5.3 Let $Y_1$ and $Y_2$ be continuous random variables with joint distribution function $F(y_1, y_2)$. If there exists a nonnegative function $f(y_1, y_2)$ such that
$$F(y_1, y_2) = \int_{-\infty}^{y_1} \int_{-\infty}^{y_2} f(t_1, t_2)\, dt_2\, dt_1$$
for all $-\infty < y_1 < \infty$, $-\infty < y_2 < \infty$, then $Y_1$ and $Y_2$ are said to be jointly continuous random variables. The function $f(y_1, y_2)$ is called the joint probability density function.
Bivariate cumulative distribution functions satisfy a set of properties similar to those specified for univariate cumulative distribution functions.
Volumes under this surface correspond to probabilities. Thus, $P(a_1 \le Y_1 \le a_2, b_1 \le Y_2 \le b_2)$ is the shaded volume shown in Figure 5.2 and is equal to
$$\int_{b_1}^{b_2} \int_{a_1}^{a_2} f(y_1, y_2)\, dy_1\, dy_2.$$
EXAMPLE 5.3 Suppose that a radioactive particle is randomly located in a square with sides of unit length. That is, if two regions within the unit square and of equal area are considered, the particle is equally likely to be in either region. Let $Y_1$ and $Y_2$ denote the coordinates of the particle’s location. A reasonable model for the relative frequency histogram for $Y_1$ and $Y_2$ is the bivariate analogue of the univariate uniform density function:
$$f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 1, \; 0 \le y_2 \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$
a Sketch the probability density surface.
b Find $F(.2, .4)$.
c Find $P(.1 \le Y_1 \le .3, 0 \le Y_2 \le .5)$.
Solution a The sketch is shown in Figure 5.3.
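b
$$F(.2, .4) = \int_{-\infty}^{.4} \int_{-\infty}^{.2} f(y_1, y_2)\, dy_1\, dy_2 = \int_{0}^{.4} \int_{0}^{.2} (1)\, dy_1\, dy_2 = \int_{0}^{.4} \Big[ y_1 \Big]_{0}^{.2} dy_2 = \int_{0}^{.4} .2\, dy_2 = .08.$$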
The probability $F(.2, .4)$ corresponds to the volume under $f(y_1, y_2) = 1$, which is shaded in Figure 5.3. As geometric considerations indicate, the desired probability (volume) is equal to .08, which we obtained through integration at the beginning of this part.
[Figure 5.3: Geometric representation of $f(y_1, y_2)$, Example 5.3; axes $y_1$, $y_2$, $f(y_1, y_2)$; the shaded volume is $F(.2, .4)$.]
c
$$P(.1 \le Y_1 \le .3, 0 \le Y_2 \le .5) = \int_{0}^{.5} \int_{.1}^{.3} f(y_1, y_2)\, dy_1\, dy_2 = \int_{0}^{.5} \int_{.1}^{.3} 1\, dy_1\, dy_2 = .10.$$
This probability corresponds to the volume under the density function $f(y_1, y_2) = 1$ that is above the region $.1 \le y_1 \le .3$, $0 \le y_2 \le .5$. Like the solution in part (b), the current solution can be obtained by using elementary geometric concepts. The density or height of the surface is equal to 1, and hence the desired probability (volume) is
$$P(.1 \le Y_1 \le .3, 0 \le Y_2 \le .5) = (.2)(.5)(1) = .10.$$
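The geometric argument used in parts (b) and (c) can be written as a tiny function: with a density of constant height 1 on the unit square, the volume over a rectangle is the area of the part of that rectangle lying inside the square. A minimal sketch; the function name `uniform_square_prob` is hypothetical.

```python
def uniform_square_prob(a1, a2, b1, b2):
    """P(a1 <= Y1 <= a2, b1 <= Y2 <= b2) when f(y1, y2) = 1 on the unit square.

    The density has constant height 1, so the volume over a rectangle equals
    the area of the portion of that rectangle inside [0, 1] x [0, 1].
    """
    width = max(0.0, min(a2, 1.0) - max(a1, 0.0))
    height = max(0.0, min(b2, 1.0) - max(b1, 0.0))
    return width * height

# Part (b): F(.2, .4) = P(Y1 <= .2, Y2 <= .4); the -inf lower limits are clipped to 0.
print(round(uniform_square_prob(float("-inf"), 0.2, float("-inf"), 0.4), 10))  # 0.08
# Part (c): P(.1 <= Y1 <= .3, 0 <= Y2 <= .5)
print(round(uniform_square_prob(0.1, 0.3, 0.0, 0.5), 10))  # 0.1
```

The clipping step is what handles limits such as $-\infty$ in part (b): outside the unit square the density is 0, so those parts contribute no volume.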
A slightly more complicated bivariate model is illustrated in the following example.
EXAMPLE 5.4 Gasoline is to be stocked in a bulk tank once at the beginning of each week and then sold to individual customers. Let $Y_1$ denote the proportion of the capacity of the bulk tank that is available after the tank is stocked at the beginning of the week. Because of the limited supplies, $Y_1$ varies from week to week. Let $Y_2$ denote the proportion of the capacity of the bulk tank that is sold during the week. Because $Y_1$ and $Y_2$ are both proportions, both variables take on values between 0 and 1. Further, the amount sold, $y_2$, cannot exceed the amount available, $y_1$. Suppose that the joint density function for $Y_1$ and $Y_2$ is given by
$$f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$
A sketch of this function is given in Figure 5.4. Find the probability that less than one-half of the tank will be stocked and more than one-quarter of the tank will be sold.
Solution We want to find $P(0 \le Y_1 \le .5, Y_2 > .25)$. For any continuous random variable, the probability of observing a value in a region is the volume under the density function above the region of interest. The density function $f(y_1, y_2)$ is positive only in the
[Figure 5.4: The joint density function for Example 5.4; axes $y_1$, $y_2$, $f(y_1, y_2)$.]
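The worked solution for Example 5.4 breaks off in this excerpt. As a numerical cross-check only, and not the book's method, the probability can be approximated with a two-dimensional midpoint rule and compared with the exact iterated integral $\int_{.25}^{.5}\int_{.25}^{y_1} 3y_1\, dy_2\, dy_1 = 5/128$. The function names and the grid size `n` are arbitrary choices for this sketch.

```python
def f(y1, y2):
    """Joint density of Example 5.4: 3*y1 on 0 <= y2 <= y1 <= 1, else 0."""
    return 3.0 * y1 if 0.0 <= y2 <= y1 <= 1.0 else 0.0

def midpoint_double_integral(g, a1, a2, b1, b2, n=500):
    """Approximate the integral of g over [a1, a2] x [b1, b2] with an n x n midpoint rule."""
    h1, h2 = (a2 - a1) / n, (b2 - b1) / n
    total = 0.0
    for i in range(n):
        y1 = a1 + (i + 0.5) * h1
        for j in range(n):
            y2 = b1 + (j + 0.5) * h2
            total += g(y1, y2)
    return total * h1 * h2

# P(0 <= Y1 <= .5, Y2 > .25): because Y2 <= Y1 <= .5, integrating y2 over
# [.25, .5] suffices; the density is already 0 wherever y2 > y1.
approx = midpoint_double_integral(f, 0.0, 0.5, 0.25, 0.5)
print(approx, 5 / 128)  # exact value of the iterated integral: 5/128 = 0.0390625
```

Encoding the triangular support inside `f` lets the integration run over a plain rectangle; the price is a small discretization error along the boundary $y_2 = y_1$, which shrinks as `n` grows.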
distributions of the populations of joint observations $(y_1, y_2, \ldots, y_n)$ for the discrete case and the continuous case, respectively. In the continuous case,
$$P(Y_1 \le y_1, Y_2 \le y_2, \ldots, Y_n \le y_n) = F(y_1, \ldots, y_n) = \int_{-\infty}^{y_1} \int_{-\infty}^{y_2} \cdots \int_{-\infty}^{y_n} f(t_1, t_2, \ldots, t_n)\, dt_n \ldots dt_1$$
for every set of real numbers $(y_1, y_2, \ldots, y_n)$. Multivariate distribution functions defined by this equality satisfy properties similar to those specified for the bivariate case.
5.1 Contracts for two construction jobs are randomly assigned to one or more of three firms, A, B, and C. Let $Y_1$ denote the number of contracts assigned to firm A and $Y_2$ the number of contracts assigned to firm B. Recall that each firm can receive 0, 1, or 2 contracts.
a Find the joint probability function for $Y_1$ and $Y_2$.
b Find $F(1, 0)$.
5.2 Three balanced coins are tossed independently. One of the variables of interest is $Y_1$, the number of heads. Let $Y_2$ denote the amount of money won on a side bet in the following manner. If the first head occurs on the first toss, you win \$1. If the first head occurs on toss 2 or on toss 3 you win \$2 or \$3, respectively. If no heads appear, you lose \$1 (that is, win -\$1).
a Find the joint probability function for $Y_1$ and $Y_2$.
b What is the probability that fewer than three heads will occur and you will win \$1 or less? [That is, find $F(2, 1)$.]
5.3 Of nine executives in a business firm, four are married, three have never married, and two are divorced. Three of the executives are to be selected for promotion. Let $Y_1$ denote the number of married executives and $Y_2$ denote the number of never-married executives among the three selected for promotion. Assuming that the three are randomly selected from the nine available, find the joint probability function of $Y_1$ and $Y_2$.
5.4 Given here is the joint probability function associated with data obtained in a study of automobile accidents in which a child (under age 5 years) was in the car and at least one fatality occurred. Specifically, the study focused on whether or not the child survived and what type of seatbelt (if any) he or she used. Define
$$Y_1 = \begin{cases} 0, & \text{if the child survived}, \\ 1, & \text{if not}, \end{cases} \qquad \text{and} \qquad Y_2 = \begin{cases} 0, & \text{if no belt used}, \\ 1, & \text{if adult belt used}, \\ 2, & \text{if car-seat belt used}. \end{cases}$$
Notice that $Y_1$ is the number of fatalities per child and, since children's car seats usually utilize two belts, $Y_2$ is the number of seatbelts in use at the time of the accident.

               y1
  y2        0      1     Total
  0        .38    .17    .55
  1        .14    .02    .16
  2        .24    .05    .29
  Total    .76    .24    1.00
Exercises 233
a Verify that the preceding probability function satisfies Theorem 5.1.
b Find $F(1, 2)$. What is the interpretation of this value?
5.5 Refer to Example 5.4. The joint density of $Y_1$, the proportion of the capacity of the tank that is stocked at the beginning of the week, and $Y_2$, the proportion of the capacity sold during the week, is given by
$$f(y_1, y_2) = \begin{cases} 3y_1, & 0 \le y_2 \le y_1 \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$
a Find $F(1/2, 1/3) = P(Y_1 \le 1/2, Y_2 \le 1/3)$.
b Find $P(Y_2 \le Y_1/2)$, the probability that the amount sold is less than half the amount purchased.
5.6 Refer to Example 5.3. If a radioactive particle is randomly located in a square of unit length, a reasonable model for the joint density function for $Y_1$ and $Y_2$ is
$$f(y_1, y_2) = \begin{cases} 1, & 0 \le y_1 \le 1, \; 0 \le y_2 \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$
a What is $P(Y_1 - Y_2 > .5)$?
b What is $P(Y_1 Y_2 < .5)$?
5.7 Let $Y_1$ and $Y_2$ have joint density function
$$f(y_1, y_2) = \begin{cases} e^{-(y_1 + y_2)}, & y_1 > 0, \; y_2 > 0, \\ 0, & \text{elsewhere}. \end{cases}$$
a What is $P(Y_1 < 1, Y_2 > 5)$?
b What is $P(Y_1 + Y_2 < 3)$?
5.8 Let $Y_1$ and $Y_2$ have the joint probability density function given by
$$f(y_1, y_2) = \begin{cases} k y_1 y_2, & 0 \le y_1 \le 1, \; 0 \le y_2 \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$
a Find the value of $k$ that makes this a probability density function.
b Find the joint distribution function for $Y_1$ and $Y_2$.
c Find $P(Y_1 \le 1/2, Y_2 \le 3/4)$.
5.9 Let $Y_1$ and $Y_2$ have the joint probability density function given by
$$f(y_1, y_2) = \begin{cases} k(1 - y_2), & 0 \le y_1 \le y_2 \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$
a Find the value of $k$ that makes this a probability density function.
b Find $P(Y_1 \le 3/4, Y_2 \ge 1/2)$.
5.10 An environmental engineer measures the amount (by weight) of particulate pollution in air samples of a certain volume collected over two smokestacks at a coal-operated power plant. One of the stacks is equipped with a cleaning device. Let $Y_1$ denote the amount of pollutant per sample collected above the stack that has no cleaning device and let $Y_2$ denote the amount of pollutant per sample collected above the stack that is equipped with the cleaning device.
5.3 Marginal and Conditional Probability Distributions 235
5.15 The management at a fast-food outlet is interested in the joint behavior of the random variables $Y_1$, defined as the total time between a customer’s arrival at the store and departure from the service window, and $Y_2$, the time a customer waits in line before reaching the service window. Because $Y_1$ includes the time a customer waits in line, we must have $Y_1 \ge Y_2$. The relative frequency distribution of observed values of $Y_1$ and $Y_2$ can be modeled by the probability density function
$$f(y_1, y_2) = \begin{cases} e^{-y_1}, & 0 \le y_2 \le y_1 < \infty, \\ 0, & \text{elsewhere}, \end{cases}$$
with time measured in minutes. Find
a $P(Y_1 < 2, Y_2 > 1)$.
b $P(Y_1 \ge 2Y_2)$.
c $P(Y_1 - Y_2 \ge 1)$. (Notice that $Y_1 - Y_2$ denotes the time spent at the service window.)
5.16 Let $Y_1$ and $Y_2$ denote the proportions of time (out of one workday) during which employees I and II, respectively, perform their assigned tasks. The joint relative frequency behavior of $Y_1$ and $Y_2$ is modeled by the density function
$$f(y_1, y_2) = \begin{cases} y_1 + y_2, & 0 \le y_1 \le 1, \; 0 \le y_2 \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$
a Find $P(Y_1 < 1/2, Y_2 > 1/4)$.
b Find $P(Y_1 + Y_2 \le 1)$.
5.17 Let $(Y_1, Y_2)$ denote the coordinates of a point chosen at random inside a unit circle whose center is at the origin. That is, $Y_1$ and $Y_2$ have a joint density function given by
$$f(y_1, y_2) = \begin{cases} \dfrac{1}{\pi}, & y_1^2 + y_2^2 \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$
Find $P(Y_1 \le Y_2)$.
5.18 An electronic system has one each of two different types of components in joint operation. Let $Y_1$ and $Y_2$ denote the random lengths of life of the components of type I and type II, respectively. The joint density function is given by
$$f(y_1, y_2) = \begin{cases} (1/8)\, y_1 e^{-(y_1 + y_2)/2}, & y_1 > 0, \; y_2 > 0, \\ 0, & \text{elsewhere}. \end{cases}$$
(Measurements are in hundreds of hours.) Find $P(Y_1 > 1, Y_2 > 1)$.
Recall that the distinct values assumed by a discrete random variable represent mutually exclusive events. Similarly, for all distinct pairs of values $y_1, y_2$, the bivariate events $(Y_1 = y_1, Y_2 = y_2)$, represented by $(y_1, y_2)$, are mutually exclusive events. It follows that the univariate event $(Y_1 = y_1)$ is the union of bivariate events of the type $(Y_1 = y_1, Y_2 = y_2)$, with the union being taken over all possible values for $y_2$.
For example, reconsider the die-tossing experiment of Section 5.2, where
$Y_1$ = number of dots on the upper face of die 1,
$Y_2$ = number of dots on the upper face of die 2.
Then
$$\begin{aligned} P(Y_1 = 1) &= p(1, 1) + p(1, 2) + p(1, 3) + \cdots + p(1, 6) \\ &= 1/36 + 1/36 + 1/36 + \cdots + 1/36 = 6/36 = 1/6, \\ P(Y_1 = 2) &= p(2, 1) + p(2, 2) + p(2, 3) + \cdots + p(2, 6) = 1/6, \\ &\;\;\vdots \\ P(Y_1 = 6) &= p(6, 1) + p(6, 2) + p(6, 3) + \cdots + p(6, 6) = 1/6. \end{aligned}$$
Expressed in summation notation, probabilities about the variable $Y_1$ alone are
$$P(Y_1 = y_1) = p_1(y_1) = \sum_{y_2 = 1}^{6} p(y_1, y_2).$$
Similarly, probabilities corresponding to values of the variable $Y_2$ alone are given by
$$p_2(y_2) = P(Y_2 = y_2) = \sum_{y_1 = 1}^{6} p(y_1, y_2).$$
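The summations above amount to accumulating the joint probabilities along each margin. A minimal sketch for the die-tossing experiment, assuming only the Python standard library; `marginals` is a hypothetical helper name.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Die-tossing joint probability function: 1/36 on each pair (y1, y2).
joint = {(y1, y2): Fraction(1, 36) for y1, y2 in product(range(1, 7), repeat=2)}

def marginals(pmf):
    """Return (p1, p2): p1(y1) sums p(y1, y2) over y2; p2(y2) sums over y1."""
    p1, p2 = defaultdict(Fraction), defaultdict(Fraction)
    for (y1, y2), p in pmf.items():
        p1[y1] += p
        p2[y2] += p
    return dict(p1), dict(p2)

p1, p2 = marginals(joint)
print(p1)  # every value y1 = 1, ..., 6 has marginal probability 1/6
```

A single pass over the joint table fills both marginals at once, which is convenient when the table is stored as a dictionary keyed by $(y_1, y_2)$.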
Summation in the discrete case corresponds to integration in the continuous case, which leads us to the following definition.
DEFINITION 5.4 a Let $Y_1$ and $Y_2$ be jointly discrete random variables with probability function $p(y_1, y_2)$. Then the marginal probability functions of $Y_1$ and $Y_2$, respectively, are given by
$$p_1(y_1) = \sum_{\text{all } y_2} p(y_1, y_2) \quad \text{and} \quad p_2(y_2) = \sum_{\text{all } y_1} p(y_1, y_2).$$
b Let $Y_1$ and $Y_2$ be jointly continuous random variables with joint density function $f(y_1, y_2)$. Then the marginal density functions of $Y_1$ and $Y_2$, respectively, are given by
$$f_1(y_1) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 \quad \text{and} \quad f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1.$$
The term marginal, as applied to the univariate probability functions of $Y_1$ and $Y_2$, has intuitive meaning. To find $p_1(y_1)$, we sum $p(y_1, y_2)$ over all values of $y_2$ and hence accumulate the probabilities on the $y_1$ axis (or margin). The discrete and continuous cases are illustrated in the following two examples.
[Figure 5.6: Geometric representation of $f(y_1, y_2)$, Example 5.6; axes $y_1$, $y_2$, $f(y_1, y_2)$.]
would be a triangular probability density that would look like the side of the wedge in Figure 5.6. If the probability were accumulated along the $y_2$ axis (accumulating along lines parallel to the $y_1$ axis), the resulting density would be uniform. We will confirm these visual solutions by applying Definition 5.4. Then, if $0 \le y_1 \le 1$,
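$$f_1(y_1) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 = \int_{0}^{1} 2y_1\, dy_2 = 2y_1 \Big[ y_2 \Big]_{0}^{1} = 2y_1,$$
and if $y_1 < 0$ or $y_1 > 1$,
$$f_1(y_1) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2 = \int_{0}^{1} 0\, dy_2 = 0.$$
Thus,
$$f_1(y_1) = \begin{cases} 2y_1, & 0 \le y_1 \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$
Similarly, if $0 \le y_2 \le 1$,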
$$f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1 = \int_{0}^{1} 2y_1\, dy_1 = \Big[ y_1^2 \Big]_{0}^{1} = 1,$$
and if $y_2 < 0$ or $y_2 > 1$,
$$f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1 = \int_{0}^{1} 0\, dy_1 = 0.$$
Summarizing,
$$f_2(y_2) = \begin{cases} 1, & 0 \le y_2 \le 1, \\ 0, & \text{elsewhere}. \end{cases}$$
Graphs of $f_1(y_1)$ and $f_2(y_2)$ trace triangular and uniform probability densities, respectively, as expected.
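The marginal densities just derived can be confirmed numerically by integrating out the other variable, as in Definition 5.4(b). This is a minimal sketch under stated assumptions: the infinite limits are replaced by the finite interval $[-1, 2]$, which covers the region where the density is positive, and the midpoint rule with `n` subintervals is an arbitrary choice; the helper name `marginal` is hypothetical.

```python
def f(y1, y2):
    """Joint density of this example: 2*y1 on the unit square, 0 elsewhere."""
    return 2.0 * y1 if 0.0 <= y1 <= 1.0 and 0.0 <= y2 <= 1.0 else 0.0

def marginal(density, fixed, axis, lo=-1.0, hi=2.0, n=3000):
    """Integrate the joint density over the other variable with a midpoint rule.

    axis=1 returns f1(fixed) (integrating out y2); axis=2 returns f2(fixed).
    [lo, hi] stands in for (-inf, inf) and must cover the density's support.
    """
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        t = lo + (k + 0.5) * h
        total += density(fixed, t) if axis == 1 else density(t, fixed)
    return total * h

for y in (0.25, 0.5, 0.9):
    print(y, marginal(f, y, axis=1), marginal(f, y, axis=2))
# f1(y) should be close to 2*y and f2(y) close to 1 for y in [0, 1].
```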
We now turn our attention to conditional distributions, looking first at the discrete case. The multiplicative law (Section 2.8) gives the probability of the intersection $A \cap B$ as
$$P(A \cap B) = P(A)P(B \mid A),$$
where $P(A)$ is the unconditional probability of $A$ and $P(B \mid A)$ is the probability of $B$ given that $A$ has occurred. Now consider the intersection of the two numerical events, $(Y_1 = y_1)$ and $(Y_2 = y_2)$, represented by the bivariate event $(y_1, y_2)$. It follows directly from the multiplicative law of probability that the bivariate probability for the intersection $(y_1, y_2)$ is
$$p(y_1, y_2) = p_1(y_1)\, p(y_2 \mid y_1) = p_2(y_2)\, p(y_1 \mid y_2).$$
The probabilities $p_1(y_1)$ and $p_2(y_2)$ are associated with the univariate probability distributions for $Y_1$ and $Y_2$ individually (recall Chapter 3). Using the interpretation of conditional probability discussed in Chapter 2, $p(y_1 \mid y_2)$ is the probability that the random variable $Y_1$ equals $y_1$, given that $Y_2$ takes on the value $y_2$.
DEFINITION 5.5 If $Y_1$ and $Y_2$ are jointly discrete random variables with joint probability function $p(y_1, y_2)$ and marginal probability functions $p_1(y_1)$ and $p_2(y_2)$, respectively, then the conditional discrete probability function of $Y_1$ given $Y_2$ is
$$p(y_1 \mid y_2) = P(Y_1 = y_1 \mid Y_2 = y_2) = \frac{P(Y_1 = y_1, Y_2 = y_2)}{P(Y_2 = y_2)} = \frac{p(y_1, y_2)}{p_2(y_2)},$$
provided that $p_2(y_2) > 0$.
Thus, $P(Y_1 = 2 \mid Y_2 = 3)$ is the conditional probability that $Y_1 = 2$ given that $Y_2 = 3$. A similar interpretation can be attached to the conditional probability $p(y_2 \mid y_1)$. Note that $p(y_1 \mid y_2)$ is undefined if $p_2(y_2) = 0$.
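Definition 5.5 can be applied mechanically to any joint table: restrict to the row for the given $y_2$ and divide by that row's total $p_2(y_2)$. The table for Example 5.5 is not reproduced in this excerpt, so this minimal sketch uses Table 5.1 (Example 5.1) instead; `table_5_1` and `conditional_y1_given_y2` are hypothetical names.

```python
from fractions import Fraction

# Table 5.1 (Example 5.1); pairs not listed have probability 0.
table_5_1 = {
    (0, 0): Fraction(1, 9), (1, 0): Fraction(2, 9), (2, 0): Fraction(1, 9),
    (0, 1): Fraction(2, 9), (1, 1): Fraction(2, 9),
    (0, 2): Fraction(1, 9),
}

def conditional_y1_given_y2(pmf, y2):
    """p(y1 | y2) = p(y1, y2) / p2(y2); raises ValueError when p2(y2) = 0."""
    p2 = sum((p for (_, t2), p in pmf.items() if t2 == y2), Fraction(0))
    if p2 == 0:
        raise ValueError(f"p2({y2}) = 0, so p(y1 | {y2}) is undefined")
    return {t1: p / p2 for (t1, t2), p in pmf.items() if t2 == y2}

cond = conditional_y1_given_y2(table_5_1, 1)
print(cond)  # p(0 | 1) = p(1 | 1) = 1/2; y1 = 2 is absent because p(2, 1) = 0
```

Raising an error when $p_2(y_2) = 0$ reflects the remark above that the conditional probability function is undefined there.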
EXAMPLE 5.7 Refer to Example 5.5 and find the conditional distribution of $Y_1$ given that $Y_2 = 1$. That is, given that one of the two people on the committee is a Democrat, find the conditional distribution for the number of Republicans selected for the committee.
Solution The joint probabilities are given in Table 5.2. To find $p(y_1 \mid Y_2 = 1)$, we concentrate on the row corresponding to $Y_2 = 1$. Then
$$P(Y_1 = 0 \mid Y_2 = 1) = \frac{p(0, 1)}{p_2(1)}, \qquad P(Y_1 = 1 \mid Y_2 = 1) = \frac{p(1, 1)}{p_2(1)},$$
and
$$P(Y_1 \ge 2 \mid Y_2 = 1) = \frac{p(2, 1)}{p_2(1)}.$$
In the randomly selected committee, if one person is a Democrat (equivalently, if $Y_2 = 1$), there is a high probability that the other will be a Republican (equivalently, $Y_1 = 1$).
DEFINITION 5.7 Let $Y_1$ and $Y_2$ be jointly continuous random variables with joint density $f(y_1, y_2)$ and marginal densities $f_1(y_1)$ and $f_2(y_2)$, respectively. For any $y_2$ such that $f_2(y_2) > 0$, the conditional density of $Y_1$ given $Y_2 = y_2$ is given by
$$f(y_1 \mid y_2) = \frac{f(y_1, y_2)}{f_2(y_2)}$$
and, for any $y_1$ such that $f_1(y_1) > 0$, the conditional density of $Y_2$ given $Y_1 = y_1$ is given by
$$f(y_2 \mid y_1) = \frac{f(y_1, y_2)}{f_1(y_1)}.$$
Note that the conditional density $f(y_1 \mid y_2)$ is undefined for all $y_2$ such that $f_2(y_2) = 0$. Similarly, $f(y_2 \mid y_1)$ is undefined if $y_1$ is such that $f_1(y_1) = 0$.
EXAMPLE 5.8 A soft-drink machine has a random amount $Y_2$ in supply at the beginning of a given day and dispenses a random amount $Y_1$ during the day (with measurements in gallons). It is not resupplied during the day, and hence $Y_1 \le Y_2$. It has been observed that $Y_1$ and $Y_2$ have a joint density given by
$$f(y_1, y_2) = \begin{cases} 1/2, & 0 \le y_1 \le y_2 \le 2, \\ 0, & \text{elsewhere}. \end{cases}$$
That is, the points $(y_1, y_2)$ are uniformly distributed over the triangle with the given boundaries. Find the conditional density of $Y_1$ given $Y_2 = y_2$. Evaluate the probability that less than 1/2 gallon will be sold, given that the machine contains 1.5 gallons at the start of the day.
Solution The marginal density of $Y_2$ is given by
$$f_2(y_2) = \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_1.$$
Thus,
$$f_2(y_2) = \begin{cases} \int_{0}^{y_2} (1/2)\, dy_1 = (1/2)y_2, & 0 \le y_2 \le 2, \\ \int_{-\infty}^{\infty} 0\, dy_1 = 0, & \text{elsewhere}. \end{cases}$$
Note that $f_2(y_2) > 0$ if and only if $0 < y_2 \le 2$. Thus, for any $0 < y_2 \le 2$, using Definition 5.7,
$$f(y_1 \mid y_2) = \frac{f(y_1, y_2)}{f_2(y_2)} = \frac{1/2}{(1/2)(y_2)} = \frac{1}{y_2}, \quad 0 \le y_1 \le y_2.$$
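Also, $f(y_1 \mid y_2)$ is undefined if $y_2 \le 0$ or $y_2 > 2$. The probability of interest is
$$P(Y_1 \le 1/2 \mid Y_2 = 1.5) = \int_{-\infty}^{1/2} f(y_1 \mid y_2 = 1.5)\, dy_1 = \int_{0}^{1/2} \frac{1}{1.5}\, dy_1 = \frac{1/2}{1.5} = \frac{1}{3}.$$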
If the machine contains 2 gallons at the start of the day, then
$$P(Y_1 \le 1/2 \mid Y_2 = 2) = \int_{0}^{1/2} \frac{1}{2}\, dy_1 = \frac{1}{4}.$$
Thus, the conditional probability that $Y_1 \le 1/2$ given $Y_2 = y_2$ changes appreciably depending on the particular choice of $y_2$.
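The two conditional probabilities in Example 5.8 can be checked by coding the conditional density $f(y_1 \mid y_2) = f(y_1, y_2)/f_2(y_2)$ and integrating it numerically over $[0, 1/2]$. A minimal sketch; the helper names and the midpoint-rule resolution `n` are assumptions made for illustration.

```python
def f2(y2):
    """Marginal density of Y2 in Example 5.8: y2/2 on [0, 2], 0 elsewhere."""
    return 0.5 * y2 if 0.0 <= y2 <= 2.0 else 0.0

def cond_density(y1, y2):
    """f(y1 | y2) = f(y1, y2) / f2(y2), which equals 1/y2 on 0 <= y1 <= y2 (0 < y2 <= 2)."""
    if not 0.0 < y2 <= 2.0:
        raise ValueError("f(y1 | y2) is undefined when f2(y2) = 0")
    joint = 0.5 if 0.0 <= y1 <= y2 else 0.0
    return joint / f2(y2)

def cond_prob_y1_at_most(c, y2, n=10_000):
    """P(Y1 <= c | Y2 = y2), integrating f(y1 | y2) over [0, c] with the midpoint rule."""
    h = c / n
    return sum(cond_density((k + 0.5) * h, y2) for k in range(n)) * h

print(cond_prob_y1_at_most(0.5, 1.5))  # close to 1/3
print(cond_prob_y1_at_most(0.5, 2.0))  # close to 1/4
```

Dividing by the marginal rather than hard-coding $1/y_2$ keeps the sketch tied to Definition 5.7; the closed form $1/y_2$ would of course give the same values directly.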
5.19 In Exercise 5.1, we determined that the joint distribution of $Y_1$, the number of contracts awarded to firm A, and $Y_2$, the number of contracts awarded to firm B, is given by the entries in the following table.

              y1
  y2      0      1      2
  0      1/9    2/9    1/9
  1      2/9    2/9     0
  2      1/9     0      0
a Find the marginal probability distribution of $Y_1$.
b According to results in Chapter 3, $Y_1$ has a binomial distribution with $n = 2$ and $p = 1/3$. Is there any conflict between this result and the answer you provided in part (a)?
5.20 Refer to Exercise 5.2.
a Derive the marginal probability distribution for your winnings on the side bet.
b What is the probability that you obtained three heads, given that you won \$1 on the side bet?
5.21 In Exercise 5.3, we determined that the joint probability distribution of $Y_1$, the number of married executives, and $Y_2$, the number of never-married executives, is given by
$$p(y_1, y_2) = \frac{\binom{4}{y_1} \binom{3}{y_2} \binom{2}{3 - y_1 - y_2}}{\binom{9}{3}},$$
where $y_1$ and $y_2$ are integers, $0 \le y_1 \le 3$, $0 \le y_2 \le 3$, and $1 \le y_1 + y_2 \le 3$.
a Find the marginal probability distribution of $Y_1$, the number of married executives among the three selected for promotion.
b Find $P(Y_1 = 1 \mid Y_2 = 2)$.
c If we let $Y_3$ denote the number of divorced executives among the three selected for promotion, then $Y_3 = 3 - Y_1 - Y_2$. Find $P(Y_3 = 1 \mid Y_2 = 1)$.
d Compare the marginal distribution derived in (a) with the hypergeometric distributions with $N = 9$, $n = 3$, and $r = 4$ encountered in Section 3.7.
5.22 In Exercise 5.4, you were given the following joint probability function for
$$Y_1 = \begin{cases} 0, & \text{if child survived}, \\ 1, & \text{if not}, \end{cases} \qquad \text{and} \qquad Y_2 = \begin{cases} 0, & \text{if no belt used}, \\ 1, & \text{if adult belt used}, \\ 2, & \text{if car-seat belt used}. \end{cases}$$