





































6.1 Introduction
In this chapter we consider inferential methods for comparing two population proportions $p_1$ and $p_2$. More specifically, we consider methods for making inferences about the difference $p_1 - p_2$ between the two population proportions. The inferential methods for a single proportion $p$ discussed in Chapter 5 are based on a large sample size normal approximation to the sampling distribution of $\hat{p}$. The inferential methods we will discuss in this chapter are based on an analogous large sample size normal approximation to the sampling distribution of $\hat{p}_1 - \hat{p}_2$. Sections 6.2 and 6.3 deal with inferential methods appropriate when the data consist of independent random samples. The modifications needed for dependent (paired) samples are discussed in Section 6.4.
6.2 Estimation for two proportions (independent samples)
In some applications there are two actual physical dichotomous populations, so that $p_1$ denotes the population success proportion for population one and $p_2$ denotes the population success proportion for population two. In other applications, such as randomized comparative experiments, $p_1$ and $p_2$ denote hypothetical population success probabilities corresponding to two treatments. We will assume that the data correspond to two independent sequences of Bernoulli trials: a sequence of $n_1$ Bernoulli trials with population success probability $p_1$ and an independent sequence of $n_2$ Bernoulli trials with population success probability $p_2$. The assumption that these are independent sequences of Bernoulli trials means that the outcomes of all $n_1 + n_2$ trials are independent. When sampling from physical populations these assumptions are equivalent to assuming that the data consist of two independent simple random samples (of sizes $n_1$ and $n_2$) selected with replacement from dichotomous populations with population success proportions $p_1$ and $p_2$. In this context the assumption of independence basically means that the method used to select the random sample from the first population is not influenced by the method used to select the random sample from the second population, and vice versa.

The observed success proportions $\hat{p}_1$ and $\hat{p}_2$ are the obvious estimates of the two population success proportions $p_1$ and $p_2$, and the difference $\hat{p}_1 - \hat{p}_2$ between these observed success proportions is the obvious estimate of the difference $p_1 - p_2$ between the two population success proportions. The behavior of $\hat{p}_1 - \hat{p}_2$ as an estimator of $p_1 - p_2$ can be determined from its sampling distribution. As you might expect, since $\hat{p}_1$ and $\hat{p}_2$ are unbiased estimators of $p_1$ and $p_2$, $\hat{p}_1 - \hat{p}_2$ is an unbiased estimator of $p_1 - p_2$. Thus the sampling distribution of $\hat{p}_1 - \hat{p}_2$ has mean equal to $p_1 - p_2$. The standard deviation of the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is the population standard error of $\hat{p}_1 - \hat{p}_2$

$$\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}.$$
Notice that the population variance $\mathrm{var}(\hat{p}_1 - \hat{p}_2)$ (the square of $\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2)$) is equal to the sum of the population variance of $\hat{p}_1$ and the population variance of $\hat{p}_2$. This property is a consequence of our assumption that the random samples are independent. This expression for the standard error of the difference between two sample success proportions is not appropriate if the random samples are not independent.

As was the case for the sampling distribution of a single sample proportion, the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is not the same when $\hat{p}_1$ and $\hat{p}_2$ are based on samples selected without replacement as it is when they are based on samples selected with replacement. In both sampling situations, the mean of the sampling distribution of $\hat{p}_1 - \hat{p}_2$ is $p_1 - p_2$. Thus $\hat{p}_1 - \hat{p}_2$ is an unbiased estimator of $p_1 - p_2$, whether the samples are selected with or without replacement. On the other hand, as with a single proportion, the standard error of $\hat{p}_1 - \hat{p}_2$ is smaller when the samples are selected without replacement. This implies that, strictly speaking, the confidence interval estimates of $p_1 - p_2$ given below, which are based on the assumption that the samples are selected with replacement, are not appropriate when the samples are selected without replacement. However, if the sizes of the two populations are both very large relative to the sizes of the samples, then, for practical purposes, we can ignore the fact that the samples were selected without replacement. Hence, when we have samples selected without replacement and we know that the populations are very large, it is not unreasonable to compute a confidence interval estimate of $p_1 - p_2$ as if the samples were selected with replacement.
Remark. When $\hat{p}_1$ and $\hat{p}_2$ are computed from independent simple random samples of sizes $n_1$ and $n_2$ selected without replacement from dichotomous populations of sizes $N_1$ and $N_2$, the population standard error of $\hat{p}_1 - \hat{p}_2$

$$\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2) = \sqrt{f_1\,\frac{p_1(1 - p_1)}{n_1} + f_2\,\frac{p_2(1 - p_2)}{n_2}}$$

is smaller than the population standard error for independent samples selected with replacement. In this situation there are two finite population correction factors, $f_1 = (N_1 - n_1)/(N_1 - 1)$ and $f_2 = (N_2 - n_2)/(N_2 - 1)$, and the effect on the standard error is most noticeable when one or both of the $N$'s is small relative to the corresponding $n$. If $N_1$ and $N_2$ are both very large relative to the respective $n_1$ and $n_2$, then $f_1 \approx 1$, $f_2 \approx 1$, and the two standard errors are essentially equal.
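The effect of the finite population correction factors can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `se_diff` and all of the input values (success proportions, sample sizes, and population sizes) are hypothetical and chosen only for illustration.

```python
import math

def se_diff(p1, n1, p2, n2, N1=None, N2=None):
    """Population standard error of p1_hat - p2_hat.

    If a population size N is supplied, the finite population correction
    f = (N - n) / (N - 1) is applied to that sample's variance term;
    otherwise sampling with replacement is assumed (f = 1).
    """
    f1 = 1.0 if N1 is None else (N1 - n1) / (N1 - 1)
    f2 = 1.0 if N2 is None else (N2 - n2) / (N2 - 1)
    return math.sqrt(f1 * p1 * (1 - p1) / n1 + f2 * p2 * (1 - p2) / n2)

# Hypothetical inputs: p1 = .4, p2 = .3, n1 = n2 = 100.
with_repl = se_diff(0.4, 100, 0.3, 100)                        # with replacement
small_pop = se_diff(0.4, 100, 0.3, 100, N1=500, N2=500)        # N only 5 times n
large_pop = se_diff(0.4, 100, 0.3, 100, N1=10**6, N2=10**6)    # N much larger than n

print(with_repl, small_pop, large_pop)
```

With these made-up values the corrected standard error is visibly smaller when each population is only five times the sample size, and nearly identical to the with-replacement value when the populations are very large.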
This probability statement indicates that the probability that the actual difference $p_1 - p_2$ is within $1.96\,\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2)$ units of the observed difference $\hat{p}_1 - \hat{p}_2$ is approximately .95. As was the case with the analogous interval for one proportion, this interval is not computable, since it involves the population standard error $\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2)$, which depends on the unknown parameters $p_1$ and $p_2$ and is therefore also unknown. The method we used to derive the Wilson confidence interval for a single proportion will not work in the present context. Therefore, we will consider a confidence interval estimate of the difference $p_1 - p_2$ based on the estimated difference $\hat{p}_1 - \hat{p}_2$ and the estimated standard error of $\hat{p}_1 - \hat{p}_2$

$$\widehat{\mathrm{S.E.}}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}.$$
We will refer to this estimated standard error as the standard error for estimation. The margin of error of $\hat{p}_1 - \hat{p}_2$ is obtained by multiplying this estimated standard error by a suitable constant $k$. (Recall that $k = 1.96$ for a 95% confidence level, $k = 1.645$ for a 90% confidence level, and $k = 2.576$ for a 99% confidence level.) The 95% margin of error of $\hat{p}_1 - \hat{p}_2$ is

$$\mathrm{M.E.}(\hat{p}_1 - \hat{p}_2) = 1.96\,\widehat{\mathrm{S.E.}}(\hat{p}_1 - \hat{p}_2)$$

and the interval from $(\hat{p}_1 - \hat{p}_2) - \mathrm{M.E.}(\hat{p}_1 - \hat{p}_2)$ to $(\hat{p}_1 - \hat{p}_2) + \mathrm{M.E.}(\hat{p}_1 - \hat{p}_2)$ is a 95% confidence interval estimate of the difference $p_1 - p_2$. Thus we can claim that we are 95% confident that the difference $p_1 - p_2$ between the population success proportions lies between these two limits.

Recall that it is the estimate $\hat{p}_1 - \hat{p}_2$ and the margin of error $\mathrm{M.E.}(\hat{p}_1 - \hat{p}_2)$ which vary from sample to sample. Therefore, the 95% confidence level applies to the method used to generate the confidence interval estimate.
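The interval computation can be sketched in a few lines of code. This is an illustrative sketch rather than a definitive implementation: the function name `diff_ci`, the lookup table `Z_MULTIPLIER`, and the success counts used in the example (40 of 100 and 30 of 100) are hypothetical and do not come from the text.

```python
import math

# Multipliers k quoted in the text for the three usual confidence levels.
Z_MULTIPLIER = {90: 1.645, 95: 1.96, 99: 2.576}

def diff_ci(x1, n1, x2, n2, level=95):
    """Large-sample confidence interval for p1 - p2 (independent samples).

    x1, x2 are the observed numbers of successes; n1, n2 the sample sizes.
    Uses the standard error for estimation (unpooled).
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    me = Z_MULTIPLIER[level] * se          # margin of error
    diff = p1_hat - p2_hat
    return diff - me, diff + me

# Hypothetical data: 40 successes in 100 trials versus 30 successes in 100 trials.
lo, hi = diff_ci(40, 100, 30, 100)
print(round(lo, 4), round(hi, 4))
```

Because the interval for these hypothetical counts contains zero, such data would not allow us to say which of the two population success proportions is larger.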
Example. Rural versus urban voter preferences. Suppose that a polling organization has separate listings of all the registered voters in a large rural district and a large urban district and wishes to compare the proportions of voters in these districts who favor a proposition which is to appear on an upcoming election ballot. Let $p_1$ denote the proportion of all registered voters in the rural district who favor the proposition at the time of the poll and let $p_2$ denote the proportion of all registered voters in the urban district who favor the proposition at the time of the poll. (In terms of the box of balls analogy of Chapter 5, we now have two boxes of balls with $p_1$ denoting the proportion of red balls in box one and $p_2$ denoting the proportion of red balls in box two.)
The most obvious way to obtain independent random samples in this scenario is to: (1) randomly generate a set of $n_1$ labels for the rural district, contact the corresponding voters, and compute the estimate $\hat{p}_1$ for the voters in the rural district; and (2) randomly generate a set of $n_2$ labels for the urban district, contact the corresponding voters, and compute the estimate $\hat{p}_2$ for the voters in the urban district. (Select a simple random sample of balls from box one and compute $\hat{p}_1$ and, independently, select a simple random sample of balls from box two and compute $\hat{p}_2$.) Assuming that simple random samples are selected (with replacement or from large populations), this method clearly yields independent samples and the confidence interval method described above is valid.

Now suppose that we do not have separate listings of the rural voters and the urban voters but instead have a single listing of all registered voters in a large district which includes both rural and urban voters. In this situation we could randomly generate a set of $n$ labels for the entire district, contact the corresponding voters, and, in addition to determining whether each voter favors the proposition, also determine whether the voter lives in a rural or urban area. We could then partition the simple random sample of $n$ voters into the subsample of $n_1$ voters who live in a rural area and the subsample of $n_2$ voters who live in an urban area. (This is like labeling the balls in box one with a one, labeling the balls in box two with a two, then combining the balls in a single box, selecting a simple random sample of $n$ balls from this box, and dividing it to get a sample of $n_1$ balls from box one and a sample of $n_2$ balls from box two.) This approach yields independent random samples but, technically (based on the formal definition), these random samples are not simple random samples, since the sample sizes $n_1$ and $n_2$ were not selected in advance. Actually this is not a problem, since it is readily verified that the samples can be viewed as independent sequences of Bernoulli trials (exactly if selection is with replacement, and approximately if selection is without replacement from a large population and both subpopulations are also large). Therefore, the confidence interval method described above is also valid when this alternate method of forming independent random samples by partitioning a simple random sample is used.
Example. An opinion poll. The purpose of this example is to demonstrate the application of a 95% confidence interval for $p_1 - p_2$. To make the numbers more realistic we will use numbers from a New York Times/CBS News poll conducted September 9–13,
$p_1 - p_2$ is between −.2263 and −.0923 (or equivalently that $p_2 - p_1$ is between .0923 and .2263). Since this entire interval (for $p_1 - p_2$) is negative, we can conclude that we are 95% confident that the population proportion of whites who would have responded yes if all had been asked is less than the analogous population proportion for blacks by at least .0923 and perhaps as much as .2263. In other words, we are 95% confident that the percentage of all blacks who would have responded yes exceeds the corresponding percentage for whites by between 9.23 and 22.63 percentage points.
Another common application of this confidence interval for the difference between two population proportions is for randomized comparative experiments. Consider a randomized comparative experiment where $N = n_1 + n_2$ available units are randomly assigned to receive one of two treatments (with $n_1$ units assigned to treatment 1 and the remaining $n_2$ units assigned to treatment 2). We can imagine two hypothetical populations of responses and two population success proportions corresponding to the two treatments. The first hypothetical population is the collection of responses (S or F), corresponding to all $N$ available units, which we would observe if all $N$ available units were subjected to treatment 1, and $p_1$ is the proportion of successes among these units. The second hypothetical population and population success proportion $p_2$ are defined similarly to correspond to the responses we would observe if all $N$ available units were subjected to treatment 2.

The model corresponding to the assumptions we made to justify the confidence interval for $p_1 - p_2$ treats the data as if they constitute independent simple random samples selected with replacement from these two hypothetical populations. In terms of balls in a box, this means that we are assuming that we have independent simple random samples selected with replacement from two separate boxes of balls, with each box containing $N$ balls. Clearly this model is not appropriate for this application; a more appropriate model treats the data as two dependent random samples selected without replacement from a single box of $N$ balls. Fortunately, even though the underlying assumptions are not valid for this application, the method still works reasonably well. Before we describe why, it is helpful to consider a specific example.
Example. Leading questions. The wording of questions in surveys can have a major impact on the responses elicited. The effect of wording of questions was investigated in Schuman and Presser, Attitude measurement and the gun control paradox, Public Opinion Quarterly, 41 winter 1977–1978, 427–438. Two groups of adults were used to estimate the difference in response to the following two versions of a question regarding gun control.
Fortunately, provided that $n_1$ and $n_2$ are reasonably large, the effects of these two violations of the underlying assumptions tend to cancel each other, and the confidence interval based on the assumptions of independent simple random samples selected with replacement works reasonably well for randomized comparative experiments.
Remark. The approach discussed in Section 5.4, where one of the confidence limits of a 90% confidence interval serves as a 95% confidence bound, also applies in the present context. Thus, we can find an upper or lower 95% confidence bound for $p_1 - p_2$ by selecting the appropriate confidence limit from a 90% confidence interval estimate of $p_1 - p_2$.
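As a brief sketch of this remark, the two one-sided 95% bounds can be read off the limits of a 90% interval (multiplier 1.645). The function name `bounds_95` and the counts in the example are hypothetical and serve only as an illustration.

```python
import math

def bounds_95(x1, n1, x2, n2):
    """Return (lower 95% bound, upper 95% bound) for p1 - p2.

    Each bound is one limit of the 90% confidence interval; the two
    bounds are meant to be used separately, not as a 95% interval.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    me90 = 1.645 * se                      # 90% margin of error
    diff = p1_hat - p2_hat
    return diff - me90, diff + me90

# Hypothetical data: 40 of 100 successes versus 30 of 100 successes.
lower, upper = bounds_95(40, 100, 30, 100)
print(round(lower, 4), round(upper, 4))
```

Here we would report either that we are 95% confident that $p_1 - p_2$ is at least the lower bound, or that it is at most the upper bound, depending on which direction is of interest.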
6.3 Testing hypotheses about two proportions (independent samples)
In this section we will consider hypothesis tests for hypotheses relating two population success proportions $p_1$ and $p_2$. The tests we consider are based on the same normal approximation to the sampling distribution of $\hat{p}_1 - \hat{p}_2$ that we used for confidence estimation. Thus we will assume that the data on which the hypothesis test is based correspond to two independent simple random samples of sizes $n_1$ and $n_2$, selected with replacement, from dichotomous populations with population success proportions $p_1$ and $p_2$, or equivalently, that the data correspond to the outcomes of two independent sequences of $n_1$ and $n_2$ Bernoulli trials with success probabilities $p_1$ and $p_2$. However, as with confidence estimation, for practical purposes we do not need to worry about whether the samples are selected with or without replacement, provided both of the populations are very large; these tests are also applicable to randomized comparative experiments.

Many hypotheses about the relationship between the population proportions $p_1$ and $p_2$ can be expressed as hypotheses about the relationship between $p_1 - p_2$ and zero, e.g., $p_1 > p_2$ is equivalent to $p_1 - p_2 > 0$. Therefore, we will consider tests which are based on a suitably standardized value of the difference $\hat{p}_1 - \hat{p}_2$ between the observed success proportions. The P-value for a hypothesis about the relationship between a single proportion $p$ and a hypothesized value $p_0$ is computed under the assumption that $p = p_0$; therefore, we used $p = p_0$ in the standard error of $\hat{p}$ for the Z-statistic of the test. The P-value for a hypothesis about the relationship between $p_1$ and $p_2$ is computed under the assumption that $p_1 = p_2$; therefore, we need to determine a suitable standard error of $\hat{p}_1 - \hat{p}_2$ (the standard error for testing) under this assumption.
Notice that $p_1 = p_2$ (i.e., $p_1 - p_2 = 0$) specifies a common value for $p_1$ and $p_2$ but does not specify what this common value is, e.g., we might have $p_1 = p_2 = .5$ or $p_1 = p_2 = .1$. When $p_1 = p_2$, $\hat{p}_1$ and $\hat{p}_2$ are estimates of the same population success proportion. This suggests that we can pool or combine the information in the two random samples to obtain a pooled estimate, $\hat{p}$, of this common population success proportion. This pooled estimate $\hat{p}$ can then be used to get an estimate of $\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2)$ that is suitable for use in the hypothesis test. If we let $p$ denote the common population success proportion under the assumption that $p_1 = p_2$, then the population standard error of $\hat{p}_1 - \hat{p}_2$ simplifies to

$$\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2) = \sqrt{p(1 - p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$
Replacing $p$ in this population standard error by the pooled estimate $\hat{p}$ gives the standard error for testing

$$\widehat{\mathrm{S.E.}}(\hat{p}_1 - \hat{p}_2) = \sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$

where

$$\hat{p} = \frac{\text{the total number of successes in both samples}}{\text{the total number of observations in both samples}} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}.$$
When testing $H_0: p_1 \le p_2$ versus $H_1: p_1 > p_2$, values of $\hat{p}_1 - \hat{p}_2$ which are sufficiently larger than zero provide evidence against the null hypothesis $H_0: p_1 \le p_2$ and in favor of the research hypothesis $H_1: p_1 > p_2$. Thus large (positive) values of

$$Z_{\mathrm{calc}} = \frac{\hat{p}_1 - \hat{p}_2}{\widehat{\mathrm{S.E.}}(\hat{p}_1 - \hat{p}_2)},$$

where $\widehat{\mathrm{S.E.}}(\hat{p}_1 - \hat{p}_2)$ denotes the standard error for testing, favor the research hypothesis, and the P-value is the probability that a standard normal variable takes on a value at least as large as $Z_{\mathrm{calc}}$, i.e., the P-value is the area under the standard normal density curve to the right of $Z_{\mathrm{calc}}$. The steps for performing a hypothesis test for $H_0: p_1 \le p_2$ versus $H_1: p_1 > p_2$ are summarized below.
Figure 1. P-value for $H_0: p_1 \le p_2$ versus $H_1: p_1 > p_2$ (standard normal density curve; horizontal axis marked at 0 and $Z_{\mathrm{calc}}$).
is, if the P-value is not small enough, then there is not sufficient evidence to conclude that the population success proportion $p_1$ is less than the population success proportion $p_2$.
When testing $H_0: p_1 = p_2$ versus $H_1: p_1 \ne p_2$, values of $\hat{p}_1 - \hat{p}_2$ which are sufficiently far away from zero in either direction provide evidence against the null hypothesis $H_0: p_1 = p_2$ and in favor of the research hypothesis $H_1: p_1 \ne p_2$. Thus sufficiently large values of the absolute value of $Z_{\mathrm{calc}}$ (as defined above) favor the research hypothesis, and the P-value is the probability that a standard normal variable takes on a value below $-|Z_{\mathrm{calc}}|$ or above $|Z_{\mathrm{calc}}|$, i.e., the P-value is the combined area under the standard normal density curve to the left of $-|Z_{\mathrm{calc}}|$ and to the right of $|Z_{\mathrm{calc}}|$. The steps for performing a hypothesis test for $H_0: p_1 = p_2$ versus $H_1: p_1 \ne p_2$ are summarized below.
Figure 3. P-value for $H_0: p_1 = p_2$ versus $H_1: p_1 \ne p_2$ (standard normal density curve; horizontal axis marked at $-Z_{\mathrm{calc}}$, 0, and $Z_{\mathrm{calc}}$).
2a. If the P-value is small enough (less than .05 for a test at the 5% level of significance), conclude that the data favor $H_1: p_1 \ne p_2$ over $H_0: p_1 = p_2$. That is, if the P-value is small enough, then there is sufficient evidence to conclude that the population success proportions $p_1$ and $p_2$ are different.
2b. If the P-value is not small enough (is not less than .05 for a test at the 5% level of significance), conclude that the data do not favor $H_1: p_1 \ne p_2$ over $H_0: p_1 = p_2$. That is, if the P-value is not small enough, then there is not sufficient evidence to conclude that the population success proportions $p_1$ and $p_2$ are different.
Example. An HIV vaccine trial. This example is based on a study described in Flynn et al., Placebo-controlled phase 3 trial of a recombinant glycoprotein 120 vaccine to prevent HIV-1 infection, J. of Infect. Dis., 191 Mar. 1, 2005, 654–665. A double-blind randomized trial was conducted to investigate the effect of an rgp120 vaccine among men who have sex with men and among women at high risk for heterosexual transmission of type 1 HIV. A group of 5403 volunteers (5095 men and 308 women) was randomly divided into two groups: a control group ($n_1 = 1805$) and a vaccine group ($n_2 = 3598$). Each volunteer received 7 injections of either placebo or vaccine over a 30 month period. These individuals were tracked for a period of 3 years to see whether they developed HIV-1.

We can envision two hypothetical populations based on this group of 5403 individuals and these two experimental treatments. Since these 5403 volunteers do not form a random sample from some well defined population of people at high risk for developing HIV-1, we should restrict our inferences to these 5403 volunteers. Let $p_1$ denote the proportion of this group of 5403 volunteers who would develop HIV-1 within 3 years if all 5403 volunteers were given the placebo. Let $p_2$ denote the proportion of this group of 5403 volunteers who would develop HIV-1 within 3 years if all 5403 volunteers were given the vaccine. We can also think of these proportions as the probabilities that one of these 5403 volunteers would develop HIV-1 within 3 years if he or she were treated with the placebo ($p_1$) or with the vaccine ($p_2$). In terms of these parameters our research hypothesis is $H_1: p_1 > p_2$ (the vaccine reduces the risk of developing HIV-1) and our null hypothesis is $H_0: p_1 \le p_2$ (the vaccine does not reduce the risk of developing HIV-1).

By the end of the 3 years, 126 of the 1805 individuals treated with the placebo developed HIV-1, while 241 of the 3598 individuals treated with the vaccine developed HIV-1. The observed proportions are $\hat{p}_1 = .0698$ and $\hat{p}_2 = .0670$, and the difference is $\hat{p}_1 - \hat{p}_2 = .0028$.
The fact that this difference is positive ($\hat{p}_1$ is greater than $\hat{p}_2$) shows that there is some evidence in favor of the research hypothesis $p_1 > p_2$. We need to determine whether observing a difference of .0028, with samples of size $n_1 = 1805$ and $n_2 = 3598$, is sufficiently surprising under the assumption that $p_1 \le p_2$ to allow us to reject this null hypothesis as untenable. When we use the standard error for testing to standardize this difference we get $Z_{\mathrm{calc}} = .3892$. The corresponding P-value, $P(Z \ge .3892) = .3486$, is quite large. In words, this means that (for these sample sizes) if the null hypothesis were true ($p_1$ was actually no greater than $p_2$), then we would observe a difference this far above zero about 34.86% of the time. In other words, for the volunteers used in this study, these data do not provide enough evidence to allow us to claim that this vaccine is better than a placebo.
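The calculation in this example can be reproduced with a short script. This is a minimal sketch under the assumptions of this section; the function names `upper_tail` and `two_prop_z_test` and the `alternative` labels are our own hypothetical choices, while the counts (126 of 1805 and 241 of 3598) are those reported above. The standard normal tail area is obtained from the complementary error function in Python's standard `math` module.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def two_prop_z_test(x1, n1, x2, n2, alternative="greater"):
    """Z-statistic and P-value using the pooled standard error for testing.

    alternative: "greater"   for H1: p1 > p2,
                 "less"      for H1: p1 < p2,
                 "two-sided" for H1: p1 != p2.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                       # pooled estimate of common p
    se_test = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_test
    if alternative == "greater":
        p_value = upper_tail(z)
    elif alternative == "less":
        p_value = upper_tail(-z)
    elif alternative == "two-sided":
        p_value = 2 * upper_tail(abs(z))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value

# HIV vaccine trial: placebo group first, vaccine group second.
z, p_value = two_prop_z_test(126, 1805, 241, 3598, alternative="greater")
print(round(z, 4), round(p_value, 4))
```

The same function covers the other two forms of the research hypothesis by changing the `alternative` argument.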
Example. Scotland coronary prevention study. This example is based on the West of Scotland Coronary Prevention Study as described in Shepherd et al., Prevention of coronary heart disease with pravastatin in men with hypercholesterolemia, New England Journal of Medicine, 333 Nov. 16, 1995, 1301–1307, and Ford et al., Long-term follow-up of the West of Scotland coronary prevention study, New England Journal of Medicine,
the null hypothesis was true ($p_1$ was actually no less than $p_2$), then we would almost never (less than .01% of the time) observe a difference this far below zero. Therefore, these data provide very strong evidence in favor of the research hypothesis that pravastatin reduces the probability of a cardiac event, in the sense that the probability that one of these 6595 men would have a cardiac event within five years would be lower if he was treated with pravastatin than if he was treated with placebo. In addition to this conclusion that pravastatin reduces the probability of a cardiac event, we can construct a confidence interval to quantify the practical importance of this reduction. In this example we are 95% confident that $p_1 - p_2$ is between −.0344 and −.0108 ($p_2 - p_1$ is between .0108 and .0344). In summary, for these 6595 men, we have very strong evidence (P-value < .0001) that pravastatin reduces the risk of a cardiac event (versus placebo). We estimate that about 7.53% of these men would have a cardiac event if they all were treated with a placebo, and we are 95% confident that if they all were treated with pravastatin we would see a 1.08 to 3.44 percentage point reduction in this percentage. Since we are dealing with small percentages, it is instructive to note that a reduction from 7.53% ($\hat{p}_2$) to 5.27% ($\hat{p}_1$) is a 30% reduction ($(7.53 - 5.27)/7.53 = .3001$) in the risk of a man having a cardiac event.

A follow-up to this study tracked the men used in this trial for ten additional years to assess the long term effects of treatment with pravastatin. At the end of the five year trial, treatment with pravastatin or placebo ceased, and the patients returned to the care of their primary care physicians. Five years after the conclusion of the trial, 38.7% of the original pravastatin group and 35.2% of the original placebo group were being treated with statin drugs. The purpose of the follow-up study was to assess long-term effects regardless of treatment received after the initial trial period. For this part of the study, let $p_3$ denote the proportion of this group of 6595 men who would experience a cardiac event within 15 years of the beginning of the initial trial if all 6595 men were subjected to the five year pravastatin treatment. Let $p_4$ denote the analogous proportion if all the men were subjected to the placebo treatment. In terms of these parameters our research hypothesis is $H_1: p_3 < p_4$ (pravastatin reduces the long-term risk of a coronary event) and our null hypothesis is $H_0: p_3 \ge p_4$ (pravastatin does not reduce the long-term risk of a coronary event).

By the end of the 15 year period, 390 of the 3302 men treated with pravastatin had experienced a cardiac event and 509 of the 3293 men treated with a placebo had experienced a cardiac event. The observed proportions are $\hat{p}_3 = .1181$ and $\hat{p}_4 = .1546$, and the difference is $\hat{p}_3 - \hat{p}_4 = -.0365$. The fact that this difference is negative ($\hat{p}_3$ is less than $\hat{p}_4$) shows that there is some evidence in favor of the research hypothesis $p_3 < p_4$. Since the sample sizes for this test are the same as for the test above and since the difference in this case is more extreme than before, we know that the P-value will be even smaller.
In this case, when we use the standard error for testing to standardize this difference we get $Z_{\mathrm{calc}} = -4.3146$. The corresponding P-value, $P(Z \le -4.3146)$, is extremely small (approximately $8.0 \times 10^{-6}$). Therefore, these data provide very strong evidence in favor of the research hypothesis that the five year pravastatin treatment reduces the probability of a cardiac event in the long term, in the sense that the probability that one of these 6595 men would have a cardiac event within 15 years would be lower if he was treated with pravastatin than if he was treated with placebo. In this case we are 95% confident that $p_4$ exceeds $p_3$ by at least .0199 and perhaps as much as .0530. Here we would estimate that about 15.46% of these men would have a cardiac event within 15 years if they were all given the five year placebo treatment, and we are 95% confident that the five year pravastatin treatment would reduce this percentage by between 1.99 and 5.30 percentage points.
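This example uses two different standard errors for the same data: the pooled standard error for testing in the Z-statistic and the unpooled standard error for estimation in the confidence interval. The sketch below, which is only an illustration with variable names of our own choosing, recomputes both from the 15 year counts reported above (390 of 3302 and 509 of 3293).

```python
import math

# 15 year follow-up counts: pravastatin group (3) and placebo group (4).
x3, n3 = 390, 3302
x4, n4 = 509, 3293

p3_hat, p4_hat = x3 / n3, x4 / n4
diff = p3_hat - p4_hat

# Standard error for testing: pooled estimate, valid under p3 = p4.
pooled = (x3 + x4) / (n3 + n4)
se_test = math.sqrt(pooled * (1 - pooled) * (1 / n3 + 1 / n4))

# Standard error for estimation: separate estimates of p3 and p4.
se_est = math.sqrt(p3_hat * (1 - p3_hat) / n3 + p4_hat * (1 - p4_hat) / n4)

z = diff / se_test
p_value = 0.5 * math.erfc(-z / math.sqrt(2))      # lower tail area P(Z <= z)
ci_95 = (diff - 1.96 * se_est, diff + 1.96 * se_est)

print(round(z, 4), p_value)
print(round(ci_95[0], 4), round(ci_95[1], 4))
```

For these data the two standard errors are close, because the two observed proportions are not far apart, but they are not interchangeable: the pooled version is appropriate only when the calculation assumes $p_3 = p_4$.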
6.4 Inference for two proportions (paired samples)
The inferential methods for comparing two population success proportions $p_1$ and $p_2$ we have considered thus far require independent estimates $\hat{p}_1$ and $\hat{p}_2$. We will now show how these methods can be modified when $\hat{p}_1$ and $\hat{p}_2$ are dependent. In some situations each unit in the first sample is paired with a corresponding unit in the second sample. The units which form a pair may be the same unit measured at two times or measured under two treatments; or the units which form a pair may be distinct units which are matched on the basis of characteristics believed to be related to the response of interest.

Consider the problem of assessing the effect of a debate between two candidates (A and B) in an upcoming election on voter opinion. Let $p_1$ denote the population proportion of all voters who favor candidate A on the day before the debate and let $p_2$ denote the population proportion of all voters who favor candidate A on the day after the debate. Instead of selecting two independent simple random samples of voters, we could select a single simple random sample of voters and get responses (whether the voter favors candidate A) for each of these voters one day before the debate and one day after the debate.

Suppose that we wish to compare two methods of training workers to perform a complex task. Let $p_1$ denote the probability that a worker could perform this task satisfactorily if the worker was trained using the first method and let $p_2$ denote the probability that a worker could perform this task satisfactorily if the worker was trained using the second method. Instead of randomly assigning workers to two groups, we could use preliminary information about the ability of the workers to perform this task to form matched pairs of workers (each having essentially the same ability). For each pair we could randomly assign one member to be trained using the first method and the other to be trained using the second method. Then we could determine whether each worker could successfully perform the task.
When computing the P-value for a hypothesis test we will assume that $p_1 = p_2$, which is equivalent to assuming that $p_{SF} = p_{FS}$. Under this assumption the population standard error of $\hat{p}_1 - \hat{p}_2$ simplifies to

$$\mathrm{S.E.}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_{SF} + p_{FS}}{n}}.$$

Thus for hypothesis testing the estimated standard error of $\hat{p}_1 - \hat{p}_2$ is

$$\widehat{\mathrm{S.E.}}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{SF} + \hat{p}_{FS}}{n}}.$$

The Z-statistic for this situation is

$$Z_{\mathrm{calc}} = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{(\hat{p}_{SF} + \hat{p}_{FS})/n}} = \frac{n_{SF} - n_{FS}}{\sqrt{n_{SF} + n_{FS}}},$$
where $n_{SF}$ and $n_{FS}$ are the respective frequencies of (S,F) and (F,S) pairs. Notice that this test statistic depends only on the frequencies $n_{SF}$ and $n_{FS}$; it does not depend on the sample size $n$.

Example. Instant coffee purchases. This example is based on a study described in Grover and Srinivasan, A simultaneous approach to market segmentation and market structuring, J. of Marketing Research, 24 May 1987, 139–153. The authors selected a simple random sample of households from the 4657 households constituting the 1981 MRCA market research panel. The data summarized in Table 2 correspond to a simple random sample of $n = 541$ households selected from the subpopulation of the MRCA households that purchased decaffeinated instant coffee at least twice during the one year study period. These purchases are recorded as Sanka or other to indicate the brand of coffee purchased. Let $p_1$ denote the population proportion of households that chose Sanka on the first purchase and let $p_2$ denote the population proportion of households that chose Sanka on the second purchase.
Table 2. Instant coffee purchase data

first purchase   second purchase   freq.   rel. freq.
Sanka            Sanka             155     .2865
Sanka            other              49     .0906
other            Sanka              76     .1405
other            other             261     .4824
total                              541     1.0000
In this sample 37.71% of the first purchases were Sanka and 42.70% of the second purchases were Sanka. Note that $\hat{p}_{SF} = .0906$ and $\hat{p}_{FS} = .1405$, indicating that 9.06% of the households switched from Sanka to other and 14.05% of the households switched from other to Sanka. In this case $\hat{p}_1 - \hat{p}_2 = .3771 - .4270 = -.0499$, the standard error for estimation is

$$\widehat{\mathrm{S.E.}}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{SF} + \hat{p}_{FS} - (\hat{p}_{SF} - \hat{p}_{FS})^2}{n}} = \sqrt{\frac{.0906 + .1405 - (-.0499)^2}{541}} = .0206$$
and the 95% margin of error is $\mathrm{M.E.}(\hat{p}_1 - \hat{p}_2) = .0403$. This gives a 95% confidence interval for $p_1 - p_2$ ranging from $-.0499 - .0403 = -.0902$ to $-.0499 + .0403 = -.0096$. Thus we are 95% confident that the proportion of all households in the subpopulation defined above that chose Sanka first is between .0096 and .0902 smaller than the proportion of all households that chose Sanka second. In other words, for this population of decaffeinated instant coffee purchasers, we are 95% confident that the percentage of all households that chose Sanka on the second purchase is .96 to 9.02 percentage points higher than the percentage of all households that chose Sanka on the first purchase.

To demonstrate the method, consider a test of the null hypothesis $H_0: p_1 = p_2$ (the same proportion purchase Sanka first as second) versus the research hypothesis $H_1: p_1 \ne p_2$ (the proportions are different). For this test the Z-statistic is
$$Z_{\mathrm{calc}} = \frac{n_{SF} - n_{FS}}{\sqrt{n_{SF} + n_{FS}}} = \frac{49 - 76}{\sqrt{49 + 76}} = -2.415$$

and the P-value is $P(Z \le -2.415) + P(Z \ge 2.415) = .0157$. Therefore, there is sufficient evidence to conclude that $p_1$ and $p_2$ are different.
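The paired-sample calculations for the coffee data can be sketched as follows. The counts are those of Table 2; the variable names are our own. The formula used here for the standard error for estimation, $\sqrt{[\hat{p}_{SF} + \hat{p}_{FS} - (\hat{p}_{SF} - \hat{p}_{FS})^2]/n}$, is the usual large-sample expression for paired proportions and is an assumption on our part, chosen because it reduces to the testing formula above when $p_{SF} = p_{FS}$ and reproduces the reported margin of error.

```python
import math

# Table 2 frequencies: (first purchase, second purchase), S = Sanka, F = other.
n_ss, n_sf, n_fs, n_ff = 155, 49, 76, 261
n = n_ss + n_sf + n_fs + n_ff                 # 541 households

p1_hat = (n_ss + n_sf) / n                    # Sanka on first purchase
p2_hat = (n_ss + n_fs) / n                    # Sanka on second purchase
p_sf, p_fs = n_sf / n, n_fs / n               # proportions of switchers
diff = p1_hat - p2_hat                        # equals p_sf - p_fs

# Standard error for estimation (assumed formula, see lead-in) and 95% interval.
se_est = math.sqrt((p_sf + p_fs - (p_sf - p_fs) ** 2) / n)
me = 1.96 * se_est
ci_95 = (diff - me, diff + me)

# Test of H0: p1 = p2 uses only the two switching frequencies.
z = (n_sf - n_fs) / math.sqrt(n_sf + n_fs)
p_value = math.erfc(abs(z) / math.sqrt(2))    # two-sided: 2 * P(Z >= |z|)

print(round(diff, 4), round(me, 4), [round(c, 4) for c in ci_95])
print(round(z, 3), round(p_value, 4))
```

Note that the 416 households that made the same choice on both purchases contribute to $n$ in the interval but play no role in the test statistic.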
Another situation where an inference about $p_1 - p_2$ is based on dependent estimates $\hat{p}_1$ and $\hat{p}_2$ arises when a single sample of units is categorized into three or more categories. Suppose that three or more candidates are listed on a ballot and we want to compare the proportion of all voters who favor candidate A, $p_A$, with the proportion of all voters who favor candidate B, $p_B$. Let $p_C = 1 - (p_A + p_B)$ denote the proportion of all voters who favor neither A nor B or who have no opinion. The probability model for this situation given in Table 3 is determined by the corresponding population probabilities $p_A$, $p_B$, and $p_C$. Notice that these three probabilities must sum to one.
Table 3. Probability model for trichotomous responses

response   probability
A          p_A
B          p_B
C          p_C