These notes cover the construction of confidence intervals and hypothesis tests for the difference in means of two normal distributions with known and unknown variances. They treat pooled t methods, Welch t methods, and nonparametric methods for comparing samples from nonnormal distributions, and they also introduce the shift model and the Hodges-Lehmann estimator for estimating the shift parameter.
In many statistical applications, interest focuses on comparing two probability distributions. For example, an education researcher might be interested in determining if the distributions of standardized test scores for students in public and private schools are equal, or a medical researcher might be interested in determining if mean blood pressure levels are the same in patients on two different treatment protocols.
This chapter considers statistical methods for comparing independent random samples from two continuous distributions.
Let X1, X2, ..., Xn and Y1, Y2, ..., Ym be independent random samples, of sizes n and m, from normal distributions with parameters
μx = E(X), σx = SD(X), μy = E(Y), σy = SD(Y).
This section focuses on answering statistical questions about the difference in means, μx−μy.
The difference in sample means, X − Y, can be used to estimate the difference in means. The statistic X − Y is a normal random variable with
\[
E(\bar{X} - \bar{Y}) = \mu_x - \mu_y
\quad\text{and}\quad
\mathrm{Var}(\bar{X} - \bar{Y}) = \frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}.
\]
10.1.1 Known variances
Assume that σx and σy are known.
Confidence intervals for μx − μy. If σx and σy are known, then
\[
(\bar{X} - \bar{Y}) \pm z(\alpha/2)\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}
\]
is a 100(1 − α)% confidence interval for μx − μy, where z(α/2) is the 100(1 − α/2)% point of the standard normal distribution.
The demonstration of this procedure uses the fact that the standardized difference in sample means (Z) is a standard normal random variable.
Hypothesis tests of μx − μy = δo. If σx and σy are known, then the standardized difference when μx − μy = δo,
\[
Z = \frac{(\bar{X} - \bar{Y}) - \delta_o}{\sqrt{\dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{m}}},
\]
can be used as the test statistic. The following table gives the rejection regions for one sided and two sided 100α% tests:
Alternative Hypothesis     Rejection Region
μx − μy < δo               Z ≤ −z(α)
μx − μy > δo               Z ≥ z(α)
μx − μy ≠ δo               |Z| ≥ z(α/2)
where z(p) is the 100(1 − p)% point of the standard normal distribution.
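To make the recipe concrete, here is a minimal Mathematica sketch of the known-variance interval and test. It uses only built-in functions; the sample values are the illustration samples from Section 10.1.4 below, and the known standard deviations σx = σy = 2 are an assumption made purely for this example.

(* Known-variance z interval and test; sigmax = sigmay = 2 is assumed for illustration *)
xvals = {14.87, 13.92, 11.01, 16.62, 12.83, 11.67};
yvals = {11.85, 10.51, 9.57, 11.81, 11.11, 11.95, 12.34, 7.13};
sigmax = 2; sigmay = 2;
n = Length[xvals]; m = Length[yvals];
diff = Mean[xvals] - Mean[yvals];
se = Sqrt[sigmax^2/n + sigmay^2/m];
zcut = Quantile[NormalDistribution[0, 1], 0.975];
{diff - zcut*se, diff + zcut*se}                     (* 95% confidence interval *)
z = (diff - 0)/se;                                   (* test of mu_x - mu_y = 0 *)
2*(1 - CDF[NormalDistribution[0, 1], Abs[z]])        (* two sided p value *)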
Exercise 1 Assume the following data are the values of independent random samples from normal distributions with common standard deviation 2.
(1) Construct a 95% confidence interval for the difference in means, μx − μy.
(2) Consider a test of the null hypothesis that μx − μy = 4 versus the alternative hypothesis that μx − μy ≠ 4.
Would the null hypothesis be accepted or rejected at the 5% significance level? State the conclusion and report the observed significance level (p value).
10.1.2 Pooled t methods
Pooled t methods are used when X and Y have a common unknown variance: σ² = σx² = σy².
Pooling information. Let Sx² and Sy² be the sample variances of the X and Y samples, respectively, and let
\[
S_p^2 = \frac{(n - 1)S_x^2 + (m - 1)S_y^2}{n + m - 2}.
\]
Then the statistic Sp² is the pooled estimate of σ².
The statistic Sp² is a weighted average of the separate estimates of σ². If n = m, then the weights are equal; otherwise, the estimate based on the larger sample is given the larger weight.
Note that
(1) Sp² is an unbiased estimator of σ², and
(2) the statistic
\[
V = \frac{(n + m - 2)S_p^2}{\sigma^2}
\]
has a chi-square distribution with (n + m − 2) degrees of freedom.
The first property follows from properties of expectation:
\[
E(S_p^2) = \frac{1}{n + m - 2}\left((n - 1)E(S_x^2) + (m - 1)E(S_y^2)\right)
         = \frac{1}{n + m - 2}\left((n - 1)\sigma^2 + (m - 1)\sigma^2\right) = \sigma^2.
\]
To prove the second property, note first that
\[
V = \frac{(n + m - 2)S_p^2}{\sigma^2} = \frac{(n - 1)S_x^2}{\sigma^2} + \frac{(m - 1)S_y^2}{\sigma^2}
\]
is a sum of independent chi-square random variables with (n − 1) and (m − 1) df.
The result follows from the fact that the sum of independent chi-square random variables has a chi-square distribution with df equal to the sum of the separate degrees of freedom.
Theorem 2 (Approximate Standardization) Under the assumptions of this section, the statistic
\[
T = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{S_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}}
\]
has a Student t distribution with (n + m − 2) degrees of freedom.
This theorem allows us to develop confidence procedures and tests for the difference in population means when the X and Y distributions have a common unknown variance.
Tests based on the pooled t statistic are generalized likelihood ratio tests.
Confidence intervals for μx − μy. If the value of σ² = σx² = σy² is estimated from the data, then the theorem above can be used to demonstrate that
\[
(\bar{X} - \bar{Y}) \pm t_{n+m-2}(\alpha/2)\sqrt{S_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}
\]
is a 100(1 − α)% confidence interval for μx − μy, where tn+m−2(α/2) is the 100(1 − α/2)% point on the Student t distribution with (n + m − 2) df.
Hypothesis tests of μx − μy = δo. If the value of σ² = σx² = σy² is estimated from the data, then the approximate standardization when μx − μy = δo,
\[
T = \frac{(\bar{X} - \bar{Y}) - \delta_o}{\sqrt{S_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}},
\]
can be used as the test statistic. The following table gives the rejection regions for one sided and two sided 100α% tests:
Alternative Hypothesis     Rejection Region
μx − μy < δo               T ≤ −tn+m−2(α)
μx − μy > δo               T ≥ tn+m−2(α)
μx − μy ≠ δo               |T| ≥ tn+m−2(α/2)
where tn+m−2(p) is the 100(1 − p)% point of the Student t distribution with (n + m − 2) df.
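The pooled calculations can also be carried out directly from the formulas above. The following Mathematica sketch uses the xvals and yvals samples listed in Section 10.1.4 below, so its results can be checked against the 90% interval and two sided p value reported there.

(* Pooled t interval and test computed from the formulas in this section *)
xvals = {14.87, 13.92, 11.01, 16.62, 12.83, 11.67};
yvals = {11.85, 10.51, 9.57, 11.81, 11.11, 11.95, 12.34, 7.13};
n = Length[xvals]; m = Length[yvals];
sp2 = ((n - 1) Variance[xvals] + (m - 1) Variance[yvals])/(n + m - 2);   (* pooled estimate of sigma^2 *)
se = Sqrt[sp2 (1/n + 1/m)];
diff = Mean[xvals] - Mean[yvals];
tcut = Quantile[StudentTDistribution[n + m - 2], 0.95];
{diff - tcut*se, diff + tcut*se}                       (* 90% confidence interval *)
t = (diff - 1)/se;                                     (* test of mu_x - mu_y = 1 *)
2*(1 - CDF[StudentTDistribution[n + m - 2], Abs[t]])   (* two sided p value *)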
Exercise 3 (Shoemaker, JSE, 1996, 4(2), 14 paragraphs ) Normal body temperatures of 148 subjects were taken several times over two consecutive days. A total of 130 values are reported below.
(1) X sample: 65 temperatures (in degrees Fahrenheit) for women
(2) Y sample: 65 temperatures (in degrees Fahrenheit) for men
[Figure: side-by-side box plots of body temperature (degrees Fahrenheit) for women and men, and an enhanced normal probability plot of the standardized values (expected versus observed quantiles).]
The normal probability plot is enhanced by including the results of 100 simulations from the standard normal distribution.
The plot suggests that normal theory methods are reasonable (but says nothing about whether or not the population variances are equal).
Assume these data are the values of independent random samples from normal distributions with a common variance.
Exercise 4 (Larsen & Marx, Prentice-Hall, 1986, p. 373). Electroencephalograms are records showing fluctuations of electrical activity in the brain. Among the several different kinds of brain waves produced, the dominant ones are usually alpha waves. These have a characteristic frequency of anywhere from 8 to 13 cycles per second.
As part of a study to determine if sensory deprivation over an extended period of time has any effect on alpha-wave pattern, 20 male inmates in a Canadian prison were randomly split into two equal-sized groups. Members of one group (control group) were allowed to remain in their cells, while members of the other group (treated group) were placed in solitary confinement. After seven days, alpha-wave frequencies were measured in all 20 men:
(1) Average number of cycles per second for members of the control group:
(2) Average number of cycles per second for members of the treated group:
[Figure: side-by-side box plots of average alpha-wave frequency (cycles per second) for the control and treated groups, and an enhanced normal probability plot of the standardized values.]
Assume these data are the values of independent random samples from normal distributions with a common variance.
10.1.3 Welch t methods
Welch t methods are used when X and Y have distinct unknown variances. The following theorem, proven by B. Welch in the 1930’s, says that the approximate standardization of the difference in sample means has an approximate Student t distribution.
Note that the statistic (Sx²/n) + (Sy²/m) is an unbiased estimator of the variance of the difference in sample means, and its square root is used in the denominator of the approximate standardization of X − Y.
Theorem 5 (Welch’s Theorem) Under the assumptions of this section, the statistic
\[
T = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}}}
\]
has an approximate Student t distribution with degrees of freedom as follows:
\[
df = \frac{\left(\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}\right)^2}{\dfrac{(S_x^2/n)^2}{n - 1} + \dfrac{(S_y^2/m)^2}{m - 1}}.
\]
This theorem allows us to develop approximate confidence procedures and tests for the difference in population means when the X and Y distributions have distinct unknown variances.
To apply the formula for df, you would round the expression on the right above to the closest whole number. The computed df satisfies the following inequality:
min(n, m) − 1 ≤ df ≤ n + m − 2.
A quick by-hand method is to use the lower bound for df instead of Welch’s formula.
Approximate confidence intervals for μx − μy. If the values of σx² and σy² are estimated from the data, then the theorem above can be used to demonstrate that
\[
(\bar{X} - \bar{Y}) \pm t_{df}(\alpha/2)\sqrt{\frac{S_x^2}{n} + \frac{S_y^2}{m}}
\]
is an approximate 100(1 − α)% confidence interval for μx − μy. In this formula, the cutoff tdf(α/2) is the 100(1 − α/2)% point on the Student t distribution with degrees of freedom computed using Welch’s formula.
Approximate tests of μx − μy = δo. If the values of σx² and σy² are estimated from the data, then the approximate standardization when μx − μy = δo,
\[
T = \frac{(\bar{X} - \bar{Y}) - \delta_o}{\sqrt{\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}}},
\]
can be used as the test statistic. The following table gives the rejection regions for approximate one sided and two sided 100α% tests:
Alternative Hypothesis     Rejection Region
μx − μy < δo               T ≤ −tdf(α)
μx − μy > δo               T ≥ tdf(α)
μx − μy ≠ δo               |T| ≥ tdf(α/2)
where tdf(p) is the 100(1 − p)% point of the Student t distribution with degrees of freedom computed using Welch’s formula.
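As an illustration of Welch's formula, the following sketch computes the approximate degrees of freedom and an approximate confidence interval from the summary statistics reported in Exercise 6 below; the 95% level is an arbitrary choice for the example.

(* Welch t interval from summary statistics (summaries as in Exercise 6) *)
n = 35; xbar = 600.943; sx = 157.103;
m = 35; ybar = 673.457; sy = 267.37;
vx = sx^2/n; vy = sy^2/m;
df = Round[(vx + vy)^2/(vx^2/(n - 1) + vy^2/(m - 1))];   (* Welch's formula, rounded to a whole number *)
se = Sqrt[vx + vy];
tcut = Quantile[StudentTDistribution[df], 0.975];
{xbar - ybar - tcut*se, xbar - ybar + tcut*se}           (* approximate 95% CI for mu_x - mu_y *)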
Exercise 6 (Stukel, 1998, FTP: lib.stat.cmu.edu/datasets/) Several studies have suggested that low levels of plasma retinol (vitamin A) are associated with increased risk of certain types of cancer.
As part of a study to investigate the relationship between personal characteristics and cancer incidence, data were gathered on 315 subjects. This exercise compares mean plasma levels of retinol in nanograms per milliliter (ng/ml) for 35 women and 35 men who participated in the study. Data summaries are as follows:
Women: n = 35, x̄ = 600.943, sx = 157.103
Men: m = 35, ȳ = 673.457, sy = 267.37
Side-by-side box plots of the samples are shown on the left below. An enhanced normal probability plot of the 70 standardized plasma retinol levels is shown on the right below.
[Figure: side-by-side box plots of plasma retinol (ng/ml) for women and men (left), and an enhanced normal probability plot of the 70 standardized values (right).]
Assume the information above summarizes independent random samples from normal distributions.
10.1.4 Some Mathematica commands
The commands MeanDifferenceCI and MeanDifferenceTest can be used to answer questions about the difference in means when variances are estimated.
For example, the following commands initialize samples from the X distribution (xvals) and the Y distribution (yvals), and return the 90% confidence interval for μx − μy using pooled t methods: {0.885916, 4.51992}.
xvals={14.87, 13.92, 11.01, 16.62, 12.83, 11.67};
yvals={11.85, 10.51, 9.57, 11.81, 11.11, 11.95, 12.34, 7.13};
MeanDifferenceCI[xvals,yvals,ConfidenceLevel→0.90,EqualVariances→True]
Similarly, the following command returns the p value for a two sided test of μx − μy = 1 versus μx − μy ≠ 1 using pooled t methods: TwoSidedPValue→0.120699.
MeanDifferenceTest[xvals,yvals,1,TwoSided→True,EqualVariances→True]
If the EqualVariances→True option is omitted, then Welch t methods are used instead of pooled t methods.
Let X1, X2, ..., Xn and Y1, Y2, ..., Ym be independent random samples, of sizes n and m, from normal distributions with parameters
μx = E(X), σx = SD(X), μy = E(Y), σy = SD(Y).
The ratio of sample variances, Sx²/Sy², is used to answer statistical questions about the ratio of model variances σx²/σy² when the means are estimated from the data.
Recall that the statistic
\[
F = \frac{S_x^2/S_y^2}{\sigma_x^2/\sigma_y^2}
\]
has an f ratio distribution with n − 1 and m − 1 df.
Tests based on the F statistic are approximate generalized likelihood ratio tests.
Confidence intervals for σx²/σy². If μx and μy are estimated from the data, then
\[
\left[\; \frac{S_x^2/S_y^2}{f_{n-1,m-1}(\alpha/2)},\;\; \frac{S_x^2/S_y^2}{f_{n-1,m-1}(1 - \alpha/2)} \;\right]
\]
is a 100(1 − α)% confidence interval for σx²/σy², where fn−1,m−1(p) is the 100(1 − p)% point of the f ratio distribution with (n − 1) and (m − 1) degrees of freedom.
Hypothesis tests of σx²/σy² = ro. If μx and μy are estimated from the data, then the ratio when σx²/σy² = ro,
\[
F = \frac{S_x^2/S_y^2}{r_o},
\]
can be used as the test statistic. The following table gives the rejection regions for one sided and two sided 100α% tests:
Alternative Hypothesis     Rejection Region
σx²/σy² < ro               F ≤ fn−1,m−1(1 − α)
σx²/σy² > ro               F ≥ fn−1,m−1(α)
σx²/σy² ≠ ro               F ≤ fn−1,m−1(1 − α/2) or F ≥ fn−1,m−1(α/2)
where fn−1,m−1(p) is the 100(1 − p)% point of the f ratio distribution with (n − 1) and (m − 1) degrees of freedom.
These tests are examples of f tests. An f test is a test based on a statistic with an f ratio distribution under the null hypothesis.
Example 7 F ratio methods are often used as a first step in an analysis of the difference in means. For example, in the alpha waves exercise, a confidence interval for the difference in means was constructed under the assumption that the population variances were equal.
To demonstrate that this assumption is justified, we test
\[
\sigma_x^2/\sigma_y^2 = 1 \quad\text{versus}\quad \sigma_x^2/\sigma_y^2 \ne 1
\]
at the 5% significance level.
The rejection region for the test is
\[
F \le f_{9,9}(0.975) = 1/4.03 = 0.25
\quad\text{or}\quad
F \ge f_{9,9}(0.025) = 4.03,
\]
and the observed value of the test statistic is sx²/sy² = 0.5897.
Since the observed value of the test statistic is in the acceptance region, the hypothesis of equal variances is accepted.
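The cutoffs used in this example can be obtained directly from the built-in f ratio distribution; a minimal sketch (the numerical comments are approximate):

(* Rejection region and decision for the equal-variance test in Example 7 *)
fLow  = Quantile[FRatioDistribution[9, 9], 0.025];   (* f_{9,9}(0.975), about 0.25 *)
fHigh = Quantile[FRatioDistribution[9, 9], 0.975];   (* f_{9,9}(0.025), about 4.03 *)
fObs = 0.5897;                                       (* observed sx^2/sy^2 *)
fObs <= fLow || fObs >= fHigh                        (* False: equal variances not rejected *)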
Exercise 8 Assume the following information summarizes the values of independent random samples from normal distributions:
n = 16, x̄ = 78.05, sx = 8.56, m = 13, ȳ = 69.13, sy = 4.33.
Construct a 95% confidence interval for the ratio of the population variances, σ^2 x/σ^2 y.
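A sketch of how the interval can be computed from these summaries, using the confidence interval formula of this section (the numerical endpoints are left for the reader to evaluate):

(* 95% confidence interval for sigma_x^2/sigma_y^2 from the Exercise 8 summaries *)
n = 16; sx = 8.56; m = 13; sy = 4.33;
ratio = sx^2/sy^2;
fUpper = Quantile[FRatioDistribution[n - 1, m - 1], 0.975];   (* f_{15,12}(0.025) *)
fLower = Quantile[FRatioDistribution[n - 1, m - 1], 0.025];   (* f_{15,12}(0.975) *)
{ratio/fUpper, ratio/fLower}                                  (* lower and upper endpoints *)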
10.2.1 Some Mathematica commands
The commands VarianceRatioCI and VarianceRatioTest can be used to answer questions about the ratio of variances when means are estimated.
For example, the following commands initialize samples from the X distribution (xvals) and the Y distribution (yvals), and return the 90% confidence interval for σx²/σy² based on these data: {3.47127, 87.1637}.
xvals={17.48, 12.09, 15.06, 16.33, 7.62, 14.52, 13.72, 13.46};
yvals={11.45, 11.07, 10.92, 12.55, 11.82};
VarianceRatioCI[xvals,yvals,ConfidenceLevel→0.90]
Similarly, the following command returns the p value for a two sided test of equality of variances (ratio of variances equals 1): TwoSidedPValue→0.0104248.
VarianceRatioTest[xvals,yvals,1,TwoSided→True]
There are several approaches for working with data generated from nonnormal distributions. In particular, researchers will
Since (i) distribution-specific procedures are often difficult to derive, (ii) normal theory methods are well-known, and (iii) transformations are easily done on the computer, many researchers work with transformed data.
Procedures applicable to a broad range of distributions are often called distribution-free or nonparametric procedures.
Normal theory methods applied to transformed data are often optimal on the transformed scale, but are not guaranteed to be optimal on the original scale.
Distribution-free methods, while not optimal in any situation, are often close to optimal.
10.3.1 Transformations
One popular approach for comparing samples from nonnormal distributions is to transform the data to approximate normality, and to use normal theory methods on the transformed data.
For example, the left plot below shows side-by-side box plots of samples taken from skewed positive distributions, and the right plot shows an enhanced normal probability plot of standardized values.
[Figure: side-by-side box plots of the two samples on the original scale (left), and an enhanced normal probability plot of the standardized values (right).]
Notice that the boxes are asymmetric, there are large outliers, and the normal probability plot has a pronounced curve.
By contrast, plots based on a log transformation of the data suggest that normal theory methods could be applied to the log-transformed data.
[Figure: side-by-side box plots of the log-transformed samples (left), and an enhanced normal probability plot of the standardized log values (right).]
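In terms of mechanics, the approach is simply to transform the data and then apply the normal theory commands to the transformed values. A minimal sketch, reusing the xvals and yvals samples from Section 10.1.4 purely to show the steps (they are not the skewed samples plotted above):

(* Pooled t interval computed on the log scale *)
xvals = {14.87, 13.92, 11.01, 16.62, 12.83, 11.67};
yvals = {11.85, 10.51, 9.57, 11.81, 11.11, 11.95, 12.34, 7.13};
logx = Log[xvals]; logy = Log[yvals];
MeanDifferenceCI[logx, logy, ConfidenceLevel→0.90, EqualVariances→True]
(* the interval is for the difference in means of the log-transformed variables *)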
Although the use of transformations is attractive, there are some drawbacks. For example, it may be difficult to
10.3.2 Nonparametric or distribution-free methods; parametric methods
Another popular approach for comparing samples from nonnormal distributions is to use distribution-free, or nonparametric, methods.
Statistical methods that require strong assumptions about the shapes of distributions (for example, uniform or exponential), and ask questions about parameter values are called parametric methods.
By contrast, nonparametric or distribution-free methods make mild assumptions, such as, “the distributions are continuous” or “the continuous distributions are symmetric around their centers.” The statistics used in nonparametric procedures are often related to order statistics, as we will see in the next section.
The quantile confidence interval method from the previous chapter of notes is an example of a nonparametric method. The method is valid for any continuous distribution with pth quantile θ.
Assume that X1, X2, ..., Xn and Y1, Y2, ..., Ym are independent random samples, of sizes n and m, from continuous distributions.
In the 1940’s, Wilcoxon and, independently, Mann and Whitney developed equivalent nonparametric methods for testing the null hypothesis that the X and Y distributions are equal versus alternatives that one distribution is stochastically larger than the other. In some situations, confidence procedures for the difference in population medians can be developed.
10.4.1 Stochastically larger; stochastically smaller
Let V and W be continuous random variables. V is stochastically larger than W (correspondingly, W is stochastically smaller than V) if
P (V ≥ x) ≥ P (W ≥ x) for all real numbers x,
with strict inequality (P (V ≥ x) > P (W ≥ x)) for at least one x.
The definition is illustrated in the following plots of PDFs (left) and CDFs (right), where the V distribution is shown in gray and the W distribution in black.
[Figure: density functions (left) and cumulative distribution functions (right) for the V and W distributions.]
Note, in particular, that if Fv and Fw are the CDFs of V and W , respectively, then
Fv (x) ≤ Fw(x) for all real numbers x.
10.4.2 Wilcoxon rank sum statistic; distribution of statistic
The Wilcoxon rank sum statistics for the X sample (R 1 ) and for the Y sample (R 2 ) are computed as follows:
Example 9 If n = 4, m = 6 and the data are as follows
then the sorted combined list of n + m = 10 observations is
The observed value of R 1 is
The observed value of R 2 is
Example 10 If n = 9, m = 5 and the data are as follows
then the sorted combined list of n + m = 14 observations is
The observed value of R 1 is
The observed value of R 2 is
Recall from calculus that the sum of the first N positive integers is N(N + 1)/2. Thus,
\[
R_1 + R_2 = \frac{(n + m)(n + m + 1)}{2},
\]
and statistical tests can use either R1 or R2. We will use the R1 statistic.
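For samples with no tied values, the rank sums can be computed by ranking the combined sample and adding the ranks belonging to each group. The sketch below uses two small hypothetical samples (the values are made up purely for illustration):

(* Rank sums R1 and R2 for two small hypothetical samples with distinct values *)
x = {1.3, 2.7, 0.9};  y = {2.1, 3.4};
combined = Join[x, y];
ranks = Ordering[Ordering[combined]];         (* rank of each observation in the combined sample *)
r1 = Total[ranks[[1 ;; Length[x]]]]           (* rank sum for the X sample *)
r2 = Total[ranks[[Length[x] + 1 ;;]]]         (* rank sum for the Y sample *)
r1 + r2 == (Length[x] + Length[y]) (Length[x] + Length[y] + 1)/2    (* True *)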
Theorem 11 Assume that the X and Y distributions are equal and let R 1 be the Wilcoxon rank sum statistic for the first sample. Then
\[
P(R_1 = x) = P(R_1 = n(n + m + 1) - x).
\]
The proof of this theorem uses the following fact: If the X and Y distributions are equal, then each ordering of the n + m random variables is equally likely.
This fact implies that each choice of the n ranks used to compute R 1 is equally likely, and that counting methods can be used to find the distribution of R 1.
Exercise 12 Let n = 2 and m = 4.
a. List all $\binom{6}{2} = 15$ subsets of size 2 from {1, 2, 3, 4, 5, 6}.
b. Use your answer to part a to completely specify the PDF of R1.
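The counting argument above is easy to automate: enumerate every equally likely choice of n ranks, compute the rank sum of each, and tabulate. The sketch below does this for the sample sizes of Example 13 below (n = 9, m = 5); the same code with n = 2 and m = 4 can be used to check Exercise 12.

(* Exact null distribution of R1 by enumerating the equally likely rank choices *)
n = 9; m = 5;
sums = Total /@ Subsets[Range[n + m], {n}];                           (* rank sums of all Binomial[14, 9] = 2002 choices *)
pdf = {First[#], N[Last[#]/Length[sums]]} & /@ Sort[Tally[sums]];     (* pairs {x, P(R1 = x)} *)
N[Mean[sums]]                                                         (* equals n(n + m + 1)/2 = 67.5 *)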
Using the distribution of R 1 under the null hypothesis,
i. Large values of R 1 support the alternative hypothesis that X is stochastically larger than Y. For this alternative, the observed significance level (p value) is
P (R 1 ≥ robs)
where robs is the observed value of the statistic.
ii. Small values of R 1 support the alternative hypothesis that X is stochastically smaller than Y. For this alternative, the observed significance level (p value) is
P (R 1 ≤ robs)
where robs is the observed value of the statistic.
Further, the p value for a two sided test is twice the smaller of the two one sided p values.
Example 13 Let n = 9 and m = 5. The following table gives the exact distribution of R1 under the null hypothesis. (Each choice of 9 ranks for the first sample is equally likely; there are a total of $\binom{14}{9}$ = 2002 choices.)
x    P(R1 = x)   P(R1 ≤ x)      x    P(R1 = x)   P(R1 ≤ x)      x    P(R1 = x)   P(R1 ≤ x)
45   0.0005      0.0005         61   0.0370      0.2188         77   0.0250      0.9051
46   0.0005      0.0010         62   0.0405      0.2592         78   0.0215      0.9266
47   0.0010      0.0020         63   0.0440      0.3032         79   0.0175      0.9441
48   0.0015      0.0035         64   0.0465      0.3497         80   0.0145      0.9585
49   0.0025      0.0060         65   0.0490      0.3986         81   0.0115      0.9700
50   0.0035      0.0095         66   0.0504      0.4491         82   0.0090      0.9790
51   0.0050      0.0145         67   0.0509      0.5000         83   0.0065      0.9855
52   0.0065      0.0210         68   0.0509      0.5509         84   0.0050      0.9905
53   0.0090      0.0300         69   0.0504      0.6014         85   0.0035      0.9940
54   0.0115      0.0415         70   0.0490      0.6503         86   0.0025      0.9965
55   0.0145      0.0559         71   0.0465      0.6968         87   0.0015      0.9980
56   0.0175      0.0734         72   0.0440      0.7408         88   0.0010      0.9990
57   0.0215      0.0949         73   0.0405      0.7812         89   0.0005      0.9995
58   0.0250      0.1199         74   0.0370      0.8182         90   0.0005      1.0000
59   0.0290      0.1489         75   0.0330      0.8511
60   0.0330      0.1818         76   0.0290      0.8801
Values of R1 range from n(n + 1)/2 = 45 to nm + n(n + 1)/2 = 90. Summaries are as follows:
\[
E(R_1) = \frac{n(n + m + 1)}{2} = 67.5
\quad\text{and}\quad
\mathrm{Var}(R_1) = \frac{nm(n + m + 1)}{12} = 56.25.
\]
the observed significance level is.
the observed significance level is.
the observed significance level is.
Example 14 (Rice, Duxbury Press, 1995, p. 396). “An experiment was performed to determine whether two forms of iron (Fe2+ and Fe3+) are retained differently. (If one form of iron were retained especially well, it would be the better dietary supplement.) The investigators divided 108 mice randomly into 6 groups of 18 each; three groups were given Fe2+ in three different concentrations, 10.2, 1.2, and 0.3 millimolar, and three groups were given Fe3+ at the same concentrations. The mice were given the iron orally; the iron was radioactively labeled so that a counter could be used to measure the initial amount given. At a later time, another count was taken for each mouse, and the percentage of iron retained was calculated.”
Results for the second concentration (1.2 millimolar) are reported below.
(1) X sample: 18 observations (percent retention) for mice given Fe2+.
(2) Y sample: 18 observations (percent retention) for mice given Fe3+.
The left plot below shows side-by-side box plots of the percent retention for each group, and the right plot is an enhanced normal probability plot of the 36 standardized values.
[Figure: side-by-side box plots of percent iron retention for the Fe2+ and Fe3+ groups (left), and an enhanced normal probability plot of the 36 standardized values (right).]
These plots suggest that the X and Y distributions are not normal.
The equality of distributions will be tested using the Wilcoxon rank sum test, a two-sided alternative, and 5% significance level.
The graph below shows the exact distribution of R1 under the null hypothesis of equality of distributions. (Each choice of 18 out of 36 ranks is equally likely; there are a total of $\binom{36}{18}$ = 9,075,135,300 choices.)
[Figure: exact null distribution of R1, plotted as probability versus r for r between about 200 and 450.]
The observed value of R1 is 362, and the p value is
\[
2\,P(R_1 \ge 362) = 0.371707.
\]
Thus, using the 5% significance level, the iron retention distributions (at the 1.2 millimolar concentration) are not significantly different.
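For samples this large, the exact calculation above requires specialized software, but a normal approximation built from the mean and variance formulas given earlier is easy to compute and should give a p value close to the exact one. A sketch, with a continuity correction since R1 is integer valued:

(* Normal approximation to the two sided rank sum p value in Example 14 *)
n = 18; m = 18; robs = 362;
mu  = n (n + m + 1)/2;                         (* E(R1) = 333 *)
var = n m (n + m + 1)/12;                      (* Var(R1) = 999 *)
z = (robs - 1/2 - mu)/Sqrt[var];               (* continuity-corrected standardization *)
2 (1 - CDF[NormalDistribution[0, 1], N[z]])    (* approximate two sided p value *)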