
Mathematical Statistics I, Notes 5
prepared by Professor Jenny Baglivo
© Copyright 2004 by Jenny A. Baglivo. All Rights Reserved.

Contents

10 Two sample analysis
   10.1 Normal distributions: difference in means
        10.1.1 Known variances
        10.1.2 Pooled t methods
        10.1.3 Welch t methods
        10.1.4 Some Mathematica commands
   10.2 Normal distributions: ratio of variances
        10.2.1 Some Mathematica commands
   10.3 Nonnormal distributions
        10.3.1 Transformations
        10.3.2 Nonparametric or distribution-free methods; parametric methods
   10.4 Rank sum test
        10.4.1 Stochastically larger; stochastically smaller
        10.4.2 Wilcoxon rank sum statistic; distribution of statistic
        10.4.3 Tied observations; midranks
        10.4.4 Mann-Whitney U statistic; distribution of statistic
        10.4.5 Shift model; shift parameter; Walsh differences; HL estimator
        10.4.6 Confidence interval procedure for shift parameter
        10.4.7 Some Mathematica commands
   10.5 Sampling models
        10.5.1 Population model
        10.5.2 Randomization model

10 Two sample analysis

In many statistical applications, interest focuses on comparing two probability distributions. For example, an education researcher might be interested in determining if the distributions of standardized test scores for students in public and private schools are equal, or a medical researcher might be interested in determining if mean blood pressure levels are the same in patients on two different treatment protocols. This chapter considers statistical methods for comparing independent random samples from two continuous distributions.

10.1 Normal distributions: difference in means

Let $X_1, X_2, \ldots, X_n$ and $Y_1, Y_2, \ldots, Y_m$ be independent random samples, of sizes $n$ and $m$, from normal distributions with parameters

\[ \mu_x = E(X), \quad \sigma_x = SD(X), \quad \mu_y = E(Y), \quad \sigma_y = SD(Y). \]

This section focuses on answering statistical questions about the difference in means, $\mu_x - \mu_y$. The difference in sample means, $\overline{X} - \overline{Y}$, can be used to estimate the difference in means. The statistic $\overline{X} - \overline{Y}$ is a normal random variable with

\[ E\left(\overline{X} - \overline{Y}\right) = \mu_x - \mu_y \quad \text{and} \quad Var\left(\overline{X} - \overline{Y}\right) = \frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}. \]

10.1.1 Known variances

Assume that $\sigma_x$ and $\sigma_y$ are known.

Confidence intervals for $\mu_x - \mu_y$. If $\sigma_x$ and $\sigma_y$ are known, then

\[ \left(\overline{X} - \overline{Y}\right) \pm z(\alpha/2)\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}} \]

is a $100(1-\alpha)\%$ confidence interval for $\mu_x - \mu_y$, where $z(\alpha/2)$ is the $100(1-\alpha/2)\%$ point of the standard normal distribution. The demonstration of this procedure uses the fact that the standardized difference in sample means ($Z$) is a standard normal random variable.
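The known-variance interval is straightforward to compute numerically. The following is a minimal Python sketch (an illustration, not the Mathematica commands used elsewhere in these notes); the data and the values of the known standard deviations are hypothetical placeholders.

```python
import numpy as np
from scipy import stats

def z_interval_diff_means(x, y, sigma_x, sigma_y, alpha=0.05):
    """100(1 - alpha)% CI for mu_x - mu_y when sigma_x and sigma_y are known."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    diff = x.mean() - y.mean()
    se = np.sqrt(sigma_x**2 / n + sigma_y**2 / m)   # SD of Xbar - Ybar
    z = stats.norm.ppf(1 - alpha / 2)               # z(alpha/2) cutoff
    return diff - z * se, diff + z * se

# Hypothetical example: both standard deviations assumed known to equal 2.
x = [10.1, 11.4, 9.8, 12.0, 10.7]
y = [9.2, 10.0, 8.8, 9.5]
print(z_interval_diff_means(x, y, sigma_x=2.0, sigma_y=2.0))
```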
10.1.2 Pooled t methods

Pooled t methods are used when $X$ and $Y$ have a common unknown variance: $\sigma^2 = \sigma_x^2 = \sigma_y^2$.

Pooling information. Let $S_x^2$ and $S_y^2$ be the sample variances of the $X$ and $Y$ samples, respectively, and let

\[ S_p^2 = \frac{(n-1)S_x^2 + (m-1)S_y^2}{n + m - 2}. \]

The statistic $S_p^2$ is the pooled estimate of $\sigma^2$. It is a weighted average of the separate estimates of $\sigma^2$. If $n = m$, then the weights are equal; otherwise, the estimate based on the larger sample is given the larger weight.

Note that

(1) $S_p^2$ is an unbiased estimator of $\sigma^2$, and

(2) the statistic $V = \dfrac{(n+m-2)S_p^2}{\sigma^2}$ has a chi-square distribution with $(n+m-2)$ degrees of freedom.

The first property follows from properties of expectation:

\[ E(S_p^2) = \frac{1}{n+m-2}\Big((n-1)E(S_x^2) + (m-1)E(S_y^2)\Big) = \frac{1}{n+m-2}\Big((n-1)\sigma^2 + (m-1)\sigma^2\Big) = \sigma^2. \]

To prove the second property, note first that

\[ V = \frac{(n+m-2)S_p^2}{\sigma^2} = \frac{(n-1)S_x^2}{\sigma^2} + \frac{(m-1)S_y^2}{\sigma^2} \]

is a sum of independent chi-square random variables with $(n-1)$ and $(m-1)$ df. The result follows from the fact that a sum of independent chi-square random variables has a chi-square distribution with df equal to the sum of the separate degrees of freedom.

Theorem 2 (Approximate Standardization). Under the assumptions of this section, the statistic

\[ T = \frac{(\overline{X} - \overline{Y}) - (\mu_x - \mu_y)}{\sqrt{S_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}} \]

has a Student t distribution with $(n+m-2)$ degrees of freedom.

This theorem allows us to develop confidence procedures and tests for the difference in population means when the $X$ and $Y$ distributions have a common unknown variance. Tests based on the pooled t statistic are generalized likelihood ratio tests.

Confidence intervals for $\mu_x - \mu_y$. If the value of $\sigma^2 = \sigma_x^2 = \sigma_y^2$ is estimated from the data, then the theorem above can be used to demonstrate that

\[ \left(\overline{X} - \overline{Y}\right) \pm t_{n+m-2}(\alpha/2)\sqrt{S_p^2\left(\frac{1}{n} + \frac{1}{m}\right)} \]

is a $100(1-\alpha)\%$ confidence interval for $\mu_x - \mu_y$, where $t_{n+m-2}(\alpha/2)$ is the $100(1-\alpha/2)\%$ point on the Student t distribution with $(n+m-2)$ df.

Hypothesis tests of $\mu_x - \mu_y = \delta_0$. If the value of $\sigma^2 = \sigma_x^2 = \sigma_y^2$ is estimated from the data, then the approximate standardization when $\mu_x - \mu_y = \delta_0$,

\[ T = \frac{(\overline{X} - \overline{Y}) - \delta_0}{\sqrt{S_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}}, \]

can be used as the test statistic. The following table gives the rejection regions for one sided and two sided $100\alpha\%$ tests, where $t_{n+m-2}(p)$ is the $100(1-p)\%$ point of the Student t distribution with $(n+m-2)$ df:

    Alternative hypothesis          Rejection region
    $\mu_x - \mu_y < \delta_0$      $T \le -t_{n+m-2}(\alpha)$
    $\mu_x - \mu_y > \delta_0$      $T \ge t_{n+m-2}(\alpha)$
    $\mu_x - \mu_y \ne \delta_0$    $|T| \ge t_{n+m-2}(\alpha/2)$
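For readers who want to check the pooled t computations numerically, here is a minimal Python sketch (the notes themselves use Mathematica commands); the function name is our own and the example data are placeholders.

```python
import numpy as np
from scipy import stats

def pooled_t_interval(x, y, alpha=0.05):
    """Pooled-variance 100(1 - alpha)% CI for mu_x - mu_y (common sigma^2)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
    se = np.sqrt(sp2 * (1 / n + 1 / m))
    t = stats.t.ppf(1 - alpha / 2, df=n + m - 2)
    diff = x.mean() - y.mean()
    return diff - t * se, diff + t * se

# Hypothetical data; the two-sided pooled t test itself is available directly
# in SciPy as stats.ttest_ind(x, y, equal_var=True).
x = [10.1, 11.4, 9.8, 12.0, 10.7]
y = [9.2, 10.0, 8.8, 9.5]
print(pooled_t_interval(x, y))
```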
Exercise 3 (Shoemaker, JSE, 1996, 4(2), 14 paragraphs). Normal body temperatures of 148 subjects were taken several times over two consecutive days. A total of 130 values are reported below.

(1) X sample: 65 temperatures (in degrees Fahrenheit) for women

    96.4 96.7 96.8 97.2 97.2 97.4 97.6 97.7 97.7 97.8 97.8 97.8 97.9 97.9 97.9 98.0 98.0 98.0
    98.0 98.0 98.1 98.2 98.2 98.2 98.2 98.2 98.2 98.3 98.3 98.3 98.4 98.4 98.4 98.4 98.4 98.5
    98.6 98.6 98.6 98.6 98.7 98.7 98.7 98.7 98.7 98.7 98.8 98.8 98.8 98.8 98.8 98.8 98.8 98.9
    99.0 99.0 99.1 99.1 99.2 99.2 99.3 99.4 99.9 100.0 100.8

Sample summaries: $n = 65$, $\overline{x} = 98.3938$, $s_x = 0.7435$

(2) Y sample: 65 temperatures (in degrees Fahrenheit) for men

    96.3 96.7 96.9 97.0 97.1 97.1 97.1 97.2 97.3 97.4 97.4 97.4 97.4 97.5 97.5 97.6 97.6 97.6
    97.7 97.8 97.8 97.8 97.8 97.9 97.9 98.0 98.0 98.0 98.0 98.0 98.0 98.1 98.1 98.2 98.2 98.2
    98.2 98.3 98.3 98.4 98.4 98.4 98.4 98.5 98.5 98.6 98.6 98.6 98.6 98.6 98.6 98.7 98.7 98.8
    98.8 98.8 98.9 99.0 99.0 99.0 99.1 99.2 99.3 99.4 99.5

Sample summaries: $m = 65$, $\overline{y} = 98.1046$, $s_y = 0.6988$

[Figure: side-by-side box plots of the two samples (left) and an enhanced normal probability plot of the 130 standardized temperatures (right).]

• The left plot shows side-by-side box plots of the two samples. The sample distributions are approximately symmetric.

• The right plot is a normal probability plot of standardized temperatures: each $x$ is replaced by $(x - \overline{x})/s_x$ and each $y$ is replaced by $(y - \overline{y})/s_y$; the 130 ordered standardized values (vertical axis; observed) are plotted against the $k/131$st quantiles of the standard normal distribution (horizontal axis; expected). The normal probability plot is enhanced by including the results of 100 simulations from the standard normal distribution. The plot suggests that normal theory methods are reasonable (but says nothing about whether or not the population variances are equal).

Assume these data are the values of independent random samples from normal distributions with a common variance.

• Test $\mu_x = \mu_y$ against $\mu_x \ne \mu_y$ at the 5% level.

• Construct a 95% confidence interval for $\mu_x - \mu_y$.

• Comment on the analyses.

10.1.3 Welch t methods

Welch t methods are used when $X$ and $Y$ have distinct unknown variances. The following theorem, proven by B. Welch in the 1930's, says that the approximate standardization of the difference in sample means has an approximate Student t distribution. Note that the statistic $(S_x^2/n) + (S_y^2/m)$ is an unbiased estimator of the variance of the difference in sample means; its square root is used in the denominator of the approximate standardization of $\overline{X} - \overline{Y}$.

Theorem 5 (Welch's Theorem). Under the assumptions of this section, the statistic

\[ T = \frac{(\overline{X} - \overline{Y}) - (\mu_x - \mu_y)}{\sqrt{\frac{S_x^2}{n} + \frac{S_y^2}{m}}} \]

has an approximate Student t distribution with degrees of freedom

\[ df = \frac{\left(\frac{S_x^2}{n} + \frac{S_y^2}{m}\right)^2}{\frac{(S_x^2/n)^2}{n-1} + \frac{(S_y^2/m)^2}{m-1}}. \]

This theorem allows us to develop approximate confidence procedures and tests for the difference in population means when the $X$ and $Y$ distributions have distinct unknown variances.

To apply the formula for df, round the expression on the right above to the closest whole number. The computed df satisfies the inequality

\[ \min(n, m) - 1 \le df \le n + m - 2. \]

A quick by-hand method is to use the lower bound for df instead of Welch's formula.

Approximate confidence intervals for $\mu_x - \mu_y$. If the values of $\sigma_x^2$ and $\sigma_y^2$ are estimated from the data, then the theorem above can be used to demonstrate that

\[ \left(\overline{X} - \overline{Y}\right) \pm t_{df}(\alpha/2)\sqrt{\frac{S_x^2}{n} + \frac{S_y^2}{m}} \]

is an approximate $100(1-\alpha)\%$ confidence interval for $\mu_x - \mu_y$. In this formula, the cutoff $t_{df}(\alpha/2)$ is the $100(1-\alpha/2)\%$ point on the Student t distribution with degrees of freedom computed using Welch's formula.
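A numerical check of the Welch procedure can be written in a few lines. The Python sketch below is an illustration (not the notes' Mathematica commands); the `welch_df` function implements the Welch-Satterthwaite form of the degrees-of-freedom approximation shown above, which is also the form SciPy uses in `stats.ttest_ind(..., equal_var=False)`. The example data are placeholders.

```python
import numpy as np
from scipy import stats

def welch_df(x, y):
    """Welch-Satterthwaite approximate degrees of freedom."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    cx, cy = x.var(ddof=1) / n, y.var(ddof=1) / m     # Sx^2/n and Sy^2/m
    return (cx + cy) ** 2 / (cx**2 / (n - 1) + cy**2 / (m - 1))

def welch_interval(x, y, alpha=0.05):
    """Approximate 100(1 - alpha)% Welch CI for mu_x - mu_y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    t = stats.t.ppf(1 - alpha / 2, df=welch_df(x, y))
    diff = x.mean() - y.mean()
    return diff - t * se, diff + t * se

# Hypothetical data; stats.ttest_ind(x, y, equal_var=False) gives the Welch test.
x = [12.1, 14.3, 11.8, 13.5, 12.9, 15.2]
y = [10.4, 11.1, 9.8, 12.6]
print(welch_df(x, y), welch_interval(x, y))
```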
Approximate tests of $\mu_x - \mu_y = \delta_0$. If the values of $\sigma_x^2$ and $\sigma_y^2$ are estimated from the data, then the approximate standardization when $\mu_x - \mu_y = \delta_0$,

\[ T = \frac{(\overline{X} - \overline{Y}) - \delta_0}{\sqrt{\frac{S_x^2}{n} + \frac{S_y^2}{m}}}, \]

can be used as the test statistic. The following table gives the rejection regions for approximate one sided and two sided $100\alpha\%$ tests, where $t_{df}(p)$ is the $100(1-p)\%$ point of the Student t distribution with degrees of freedom computed using Welch's formula:

    Alternative hypothesis          Rejection region
    $\mu_x - \mu_y < \delta_0$      $T \le -t_{df}(\alpha)$
    $\mu_x - \mu_y > \delta_0$      $T \ge t_{df}(\alpha)$
    $\mu_x - \mu_y \ne \delta_0$    $|T| \ge t_{df}(\alpha/2)$

Exercise 6 (Stukel, 1998, FTP: lib.stat.cmu.edu/datasets/). Several studies have suggested that low levels of plasma retinol (vitamin A) are associated with increased risk of certain types of cancer. As part of a study to investigate the relationship between personal characteristics and cancer incidence, data were gathered on 315 subjects. This exercise compares mean plasma levels of retinol in nanograms per milliliter (ng/ml) for 35 women and 35 men who participated in the study. Data summaries are as follows:

    Women: $n = 35$, $\overline{x} = 600.943$, $s_x = 157.103$
    Men:   $m = 35$, $\overline{y} = 673.457$, $s_y = 267.37$

Side-by-side box plots of the samples are shown on the left below. An enhanced normal probability plot of the 70 standardized plasma retinol levels is shown on the right below.

[Figure: side-by-side box plots of plasma retinol levels for women and men (left) and an enhanced normal probability plot of the 70 standardized values (right).]

10.2 Normal distributions: ratio of variances

Confidence intervals for $\sigma_x^2/\sigma_y^2$. If $\mu_x$ and $\mu_y$ are estimated from the data, then

\[ \left[\frac{S_x^2/S_y^2}{f_{n-1,m-1}(\alpha/2)},\; \frac{S_x^2/S_y^2}{f_{n-1,m-1}(1-\alpha/2)}\right] \]

is a $100(1-\alpha)\%$ confidence interval for $\sigma_x^2/\sigma_y^2$, where $f_{n-1,m-1}(p)$ is the $100(1-p)\%$ point of the f ratio distribution with $(n-1)$ and $(m-1)$ degrees of freedom.

Hypothesis tests of $\sigma_x^2/\sigma_y^2 = r_0$. If $\mu_x$ and $\mu_y$ are estimated from the data, then the ratio when $\sigma_x^2/\sigma_y^2 = r_0$,

\[ F = \frac{S_x^2/S_y^2}{r_0}, \]

can be used as the test statistic. The following table gives the rejection regions for one sided and two sided $100\alpha\%$ tests, where $f_{n-1,m-1}(p)$ is the $100(1-p)\%$ point of the f ratio distribution with $(n-1)$ and $(m-1)$ degrees of freedom:

    Alternative hypothesis              Rejection region
    $\sigma_x^2/\sigma_y^2 < r_0$       $F \le f_{n-1,m-1}(1-\alpha)$
    $\sigma_x^2/\sigma_y^2 > r_0$       $F \ge f_{n-1,m-1}(\alpha)$
    $\sigma_x^2/\sigma_y^2 \ne r_0$     $F \le f_{n-1,m-1}(1-\alpha/2)$ or $F \ge f_{n-1,m-1}(\alpha/2)$

These tests are examples of f tests. An f test is a test based on a statistic with an f ratio distribution under the null hypothesis.

Example 7. F ratio methods are often used as a first step in an analysis of the difference in means. For example, in the alpha waves exercise, a confidence interval for the difference in means was constructed under the assumption that the population variances were equal. To demonstrate that this assumption is justified, we test $\sigma_x^2/\sigma_y^2 = 1$ versus $\sigma_x^2/\sigma_y^2 \ne 1$ at the 5% significance level. The rejection region for the test is

\[ F \le f_{9,9}(0.975) = 1/4.03 = 0.25 \quad \text{or} \quad F \ge f_{9,9}(0.025) = 4.03, \]

and the observed value of the test statistic is $s_x^2/s_y^2 = 0.5897$. Since the observed value of the test statistic is in the acceptance region, the hypothesis of equal variances is accepted.

Exercise 8. Assume the following information summarizes the values of independent random samples from normal distributions:

    $n = 16$, $\overline{x} = 78.05$, $s_x = 8.56$;  $m = 13$, $\overline{y} = 69.13$, $s_y = 4.33$.

Construct a 95% confidence interval for the ratio of the population variances, $\sigma_x^2/\sigma_y^2$.
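The f ratio interval and test are easy to reproduce numerically. Below is a minimal Python sketch (an illustration, not the notes' Mathematica commands); with the convention that $f_{n-1,m-1}(p)$ denotes the $100(1-p)\%$ point, the helper function simply inverts SciPy's F distribution CDF.

```python
import numpy as np
from scipy import stats

def f_upper_point(p, df1, df2):
    """100(1 - p)% point of the F distribution, i.e. the upper-p critical value."""
    return stats.f.ppf(1 - p, df1, df2)

def variance_ratio_interval(x, y, alpha=0.05):
    """100(1 - alpha)% CI for sigma_x^2 / sigma_y^2 (normal samples)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    ratio = x.var(ddof=1) / y.var(ddof=1)
    lower = ratio / f_upper_point(alpha / 2, n - 1, m - 1)
    upper = ratio / f_upper_point(1 - alpha / 2, n - 1, m - 1)
    return lower, upper

# Quick check of the cutoffs used in Example 7: about 4.03 and 0.25.
print(f_upper_point(0.025, 9, 9), f_upper_point(0.975, 9, 9))
```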
10.2.1 Some Mathematica commands

The commands VarianceRatioCI and VarianceRatioTest can be used to answer questions about the ratio of variances when the means are estimated. For example, the following commands initialize samples from the X distribution (xvals) and the Y distribution (yvals), and return the 90% confidence interval for $\sigma_x^2/\sigma_y^2$ based on these data, {3.47127, 87.1637}:

    xvals = {17.48, 12.09, 15.06, 16.33, 7.62, 14.52, 13.72, 13.46};
    yvals = {11.45, 11.07, 10.92, 12.55, 11.82};
    VarianceRatioCI[xvals, yvals, ConfidenceLevel→0.90]

Similarly, the following command returns the p value for a two sided test of equality of variances (ratio of variances equals 1), TwoSidedPValue→0.0104248:

    VarianceRatioTest[xvals, yvals, 1, TwoSided→True]

10.3 Nonnormal distributions

There are several approaches for working with data generated from nonnormal distributions. In particular, researchers will

1. Use procedures that are applicable to specific distributions (for example, to uniform distributions or exponential distributions).

2. Transform their data to approximate normality and use normal theory methods on the transformed data.

3. Use procedures that are applicable to a broad range of distributions.

Since (i) distribution-specific procedures are often difficult to derive, (ii) normal theory methods are well-known, and (iii) transformations are easily done on the computer, many researchers work with transformed data.

Procedures applicable to a broad range of distributions are often called distribution-free or nonparametric procedures. Normal theory methods applied to transformed data are often optimal on the transformed scale, but are not guaranteed to be optimal on the original scale. Distribution-free methods, while not optimal in any situation, are often close to optimal.

10.4 Rank sum test

10.4.1 Stochastically larger; stochastically smaller

A random variable $V$ is said to be stochastically larger than a random variable $W$ if $P(V > x) \ge P(W > x)$ for all real numbers $x$, with strict inequality for at least one $x$; in this case, $W$ is said to be stochastically smaller than $V$. The definition is illustrated in the following plots of PDFs (left) and CDFs (right), where the V distribution is shown in gray and the W distribution in black.

[Figure: density functions (left) and cumulative distribution functions (right) of V (gray) and W (black).]

Note, in particular, that if $F_v$ and $F_w$ are the CDFs of V and W, respectively, then $F_v(x) \le F_w(x)$ for all real numbers $x$.

10.4.2 Wilcoxon rank sum statistic; distribution of statistic

The Wilcoxon rank sum statistics for the X sample ($R_1$) and for the Y sample ($R_2$) are computed as follows:

1. Pool and sort the $n + m$ observations.

2. Replace each observation by its rank (or position) in the sorted list.

3. Let $R_1$ equal the sum of the ranks for observations in the X sample, and $R_2$ equal the sum of the ranks for observations in the Y sample.
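As an illustration of steps 1 through 3, here is a short Python sketch (not the notes' Mathematica Ranks command), applied to the small data set used later in Section 10.4.7; for those data the notes report X-sample ranks {1, 2, 3, 4, 6, 8} and $R_1 = 24$. The examples that follow can be checked the same way.

```python
import numpy as np
from scipy.stats import rankdata

def rank_sums(x, y):
    """Wilcoxon rank sum statistics R1 (X sample) and R2 (Y sample).

    Ranks are assigned in the pooled sample; tied observations (if any)
    receive midranks, which is scipy's default 'average' method.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    ranks = rankdata(np.concatenate([x, y]))   # pool, sort, and rank
    r1 = ranks[: len(x)].sum()
    r2 = ranks[len(x):].sum()
    return r1, r2

# Data from Section 10.4.7 of these notes (R1 should equal 24).
xvals = [3.1, 7.9, 10.4, 10.6, 13.1, 16.1]
yvals = [12.7, 14.8, 16.3, 16.6, 18.4, 18.8, 19.5, 20.9]
print(rank_sums(xvals, yvals))
```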
Example 9. If $n = 4$, $m = 6$ and the data are

    1.1, 2.5, 3.2, 4.1    and    2.8, 3.6, 4.0, 5.2, 5.8, 7.2,

then the sorted combined list of $n + m = 10$ observations is

    1.1, 2.5, 2.8, 3.2, 3.6, 4.0, 4.1, 5.2, 5.8, 7.2.

The observed value of $R_1$ is _______. The observed value of $R_2$ is _______.

Example 10. If $n = 9$, $m = 5$ and the data are

    12.8, 15.6, 15.7, 17.3, 18.5, 22.9, 27.5, 29.7, 35.1    and    8.2, 12.6, 16.7, 21.6, 32.4,

then the sorted combined list of $n + m = 14$ observations is

    8.2, 12.6, 12.8, 15.6, 15.7, 16.7, 17.3, 18.5, 21.6, 22.9, 27.5, 29.7, 32.4, 35.1.

The observed value of $R_1$ is _______. The observed value of $R_2$ is _______.

Recall from calculus that the sum of the first $N$ positive integers is $N(N+1)/2$. Thus,

\[ R_1 + R_2 = \frac{(n+m)(n+m+1)}{2}, \]

and statistical tests can use either $R_1$ or $R_2$. We will use the $R_1$ statistic.

Theorem 11. Assume that the X and Y distributions are equal and let $R_1$ be the Wilcoxon rank sum statistic for the first sample. Then

1. The range of $R_1$ is $\frac{n(n+1)}{2}, \frac{n(n+1)}{2} + 1, \ldots, nm + \frac{n(n+1)}{2}$.

2. $E(R_1) = \frac{n(n+m+1)}{2}$ and $Var(R_1) = \frac{nm(n+m+1)}{12}$.

3. The distribution of $R_1$ is symmetric around its mean. In particular,

\[ P(R_1 = x) = P\big(R_1 = n(n+m+1) - x\big). \]

4. If $n$ and $m$ are large, then the distribution of $R_1$ is approximately normal. (If both are greater than 20, then the approximation is reasonably good.)

The proof of this theorem uses the following fact: if the X and Y distributions are equal, then each ordering of the $n + m$ random variables is equally likely. This fact implies that each choice of the $n$ ranks used to compute $R_1$ is equally likely, and that counting methods can be used to find the distribution of $R_1$.

Exercise 12. Let $n = 2$ and $m = 4$.

a. List all $\binom{6}{2} = 15$ subsets of size 2 from $\{1, 2, 3, 4, 5, 6\}$.

b. Use your answer to part a to completely specify the PDF of $R_1$.

The left plot below shows side-by-side box plots of the percent retention for each group, and the right plot is an enhanced normal probability plot of the 36 standardized values.

[Figure: side-by-side box plots of percent iron retention for the Fe2 and Fe3 groups (left) and an enhanced normal probability plot of the 36 standardized values (right).]

These plots suggest that the X and Y distributions are not normal. The equality of distributions will be tested using the Wilcoxon rank sum test, a two-sided alternative, and 5% significance level. The graph below shows the exact distribution of $R_1$ under the null hypothesis of equality of distributions. (Each choice of 18 out of 36 ranks is equally likely. There are a total of $\binom{36}{18} = 9{,}075{,}135{,}300$ choices.)

[Figure: exact null distribution of $R_1$ for $n = m = 18$.]

The observed value of $R_1$ is 362, and the p value is $2P(R_1 \ge 362) = 0.371707$. Thus, using the 5% significance level, the iron retention distributions (at the 1.2 millimolar concentration) are not significantly different.

10.4.3 Tied observations; midranks

Continuous data are often rounded to a fixed number of decimal places, causing two or more observations to be equal. Equal observations are said to be tied at a given value. If two or more observations are tied at a given value, then their average rank (or midrank) is used in computing the rank sum statistic. The method is illustrated in the following example.

Example 15 (Rice, Duxbury Press, 1995, p. 390). "Two methods, A and B, were used in a determination of the latent heat of fusion of ice (Natrella 1963). The investigators wished to find out by how much the methods differed. The following table gives the change in total heat from ice at −0.72°C to water at 0°C in calories per gram of mass."

(1) X sample: 13 observations (calories/gram) using Method A:

    79.97 79.98 80.00 80.02 80.02 80.02 80.03 80.03 80.03 80.04 80.04 80.04 80.05

(2) Y sample: 8 observations (calories/gram) using Method B:

    79.94 79.95 79.97 79.97 79.97 79.98 80.02 80.03

The following table shows the ordered values and corresponding midranks:

         Observation  Midrank        Observation  Midrank
     1      79.94       1.0     12      80.02      11.5
     2      79.95       2.0     13      80.02      11.5
     3      79.97       4.5     14      80.03      15.5
     4      79.97       4.5     15      80.03      15.5
     5      79.97       4.5     16      80.03      15.5
     6      79.97       4.5     17      80.03      15.5
     7      79.98       7.5     18      80.04      19.0
     8      79.98       7.5     19      80.04      19.0
     9      80.00       9.0     20      80.04      19.0
    10      80.02      11.5     21      80.05      21.0
    11      80.02      11.5

The observed value of $R_1$ is _______. The observed value of $R_2$ is _______.

The equality of distributions will be tested using the Wilcoxon rank sum test, a two-sided alternative, and 5% significance level. The graph below gives the exact distribution of $R_1$ under the null hypothesis.

[Figure: exact null distribution of $R_1$ for the fusion data ($n = 13$, $m = 8$, with ties).]

The observed value of $R_1$ is _______, and the p value is $2P(R_1 \ge$ _______$) = 0.00522876$. Thus, using the 5% significance level, there is a significant difference in the two methods of measurement. In fact, there is evidence that the values produced by Method A are generally higher than those produced by Method B.

To compute the exact distribution for the fusion data, imagine writing the 21 midranks on 21 slips of paper and placing the slips in an urn. A subset of 13 slips is chosen from the urn and the sum of the midranks is recorded. If each choice of subset is equally likely, then the graph above gives the distribution of the sum. There are $\binom{21}{13} = 203{,}490$ choices.
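The urn description translates directly into a brute-force computation. The Python sketch below (offered as an illustration, not the notes' Mathematica code) enumerates all rank subsets for the small data set of Section 10.4.7, where the notes report $R_1 = 24$ and an exact two-sided p value of 0.004662; the same idea, applied to the 21 fusion midranks with subsets of size 13, produces the exact distribution described above.

```python
from itertools import combinations
from scipy.stats import rankdata

def exact_rank_sum_pvalue(x, y):
    """Exact two-sided p value for the Wilcoxon rank sum test (small samples).

    Enumerates all C(n+m, n) equally likely choices of the n ranks (midranks
    are used if there are ties), so it is only practical for small n and m.
    """
    n = len(x)
    ranks = rankdata(list(x) + list(y))      # midranks of the pooled sample
    r1_obs = ranks[:n].sum()                 # observed R1
    sums = [sum(c) for c in combinations(ranks, n)]
    total = len(sums)
    p_low = sum(s <= r1_obs for s in sums) / total
    p_high = sum(s >= r1_obs for s in sums) / total
    return r1_obs, min(1.0, 2 * min(p_low, p_high))

# Data from Section 10.4.7; expected output: R1 = 24, p value about 0.004662.
xvals = [3.1, 7.9, 10.4, 10.6, 13.1, 16.1]
yvals = [12.7, 14.8, 16.3, 16.6, 18.4, 18.8, 19.5, 20.9]
print(exact_rank_sum_pvalue(xvals, yvals))
```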
There are striking differences in the graphs of the rank sum statistic in the last two examples:

• In the iron retention example, $n = m = 18$ and there are no ties in the data. The distribution of $R_1$ is approximately normal.

• In the fusion example, $n = 13$, $m = 8$, and there are many ties in the data. The distribution of $R_1$ is far from normal.

10.4.5 Shift model; shift parameter; Walsh differences; HL estimator

The random variables X and Y are said to satisfy a shift model if $X - \Delta$ and $Y$ have the same distribution, where $\Delta$ is the difference in medians:

\[ \Delta = \mathrm{Median}(X) - \mathrm{Median}(Y). \]

The parameter $\Delta$ is called the shift parameter. Assume that X and Y satisfy a shift model and $\Delta \ne 0$. If $\Delta > 0$, then X is stochastically larger than Y; otherwise, X is stochastically smaller than Y.

Assume that X and Y have finite means, $\mu_x$ and $\mu_y$. If X and Y satisfy a shift model with shift parameter $\Delta$, then $\Delta$ is also the difference in means: $\Delta = \mu_x - \mu_y$. For example,

• If X is a normal random variable with mean 3 and standard deviation 4, and Y is a normal random variable with mean 8 and standard deviation 4, then X and Y satisfy a shift model with shift parameter $\Delta = -5$. (See the left plot below, where the distribution of X is in gray and the distribution of Y is in black.)

• If X has a shifted exponential distribution with PDF $f(x) = \frac{1}{10}e^{-(x-8)/10}$ when $x > 8$ and 0 otherwise, and Y is an exponential random variable with parameter $1/10$, then X and Y satisfy a shift model with $\Delta = 8$. (See the right plot below, where the distribution of X is in gray and the distribution of Y is in black.)

[Figure: density functions for the normal example (left) and the exponential example (right), with the X distribution in gray and the Y distribution in black.]

Treatment effects. If X and Y satisfy a shift model, then their distributions differ in location only. In studies comparing a treatment group to a no treatment group, where the effect of the treatment is additive, the shift parameter is referred to as the treatment effect.

Estimating the shift parameter. If X and Y satisfy a shift model with shift parameter $\Delta$, then the Hodges-Lehmann estimator (or HL estimator) of $\Delta$ is the median of the list of $nm$ differences

\[ X_i - Y_j, \quad i = 1, 2, \ldots, n, \; j = 1, 2, \ldots, m. \]

The differences are often referred to as the Walsh differences.

Exercise 19. Assume that $n = 5$, $m = 7$, and the data are as follows:

    4.9, 7.3, 9.2, 11.0, 17.3    and    0.5, 0.7, 1.5, 2.7, 5.6, 8.7, 13.4.

The following 5 × 7 table gives the Walsh differences $X_i - Y_j$:

             0.5    0.7    1.5    2.7    5.6    8.7   13.4
     4.9     4.4    4.2    3.4    2.2   −0.7   −3.8   −8.5
     7.3     6.8    6.6    5.8    4.6    1.7   −1.4   −6.1
     9.2     8.7    8.5    7.7    6.5    3.6    0.5   −4.2
    11.0    10.5   10.3    9.5    8.3    5.4    2.3   −2.4
    17.3    16.8   16.6   15.8   14.6   11.7    8.6    3.9

Assume these data are the values of independent random samples from continuous distributions satisfying a shift model, with $\Delta = \mathrm{Median}(X) - \mathrm{Median}(Y)$. Find the HL estimate of the shift parameter, $\Delta$.
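Computing the HL estimate by hand means listing all $nm$ Walsh differences and taking their median, which is easy to automate. The Python sketch below is an illustration (the notes use the Mathematica command HodgesLehmannDelta); for the data of Section 10.4.7 the notes report an HL estimate of −6.65.

```python
import numpy as np

def hodges_lehmann(x, y):
    """HL estimate of the shift parameter: median of the n*m Walsh differences."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    walsh = np.subtract.outer(x, y).ravel()   # all differences x_i - y_j
    return np.median(walsh)

# Data from Section 10.4.7; the notes report an HL estimate of -6.65.
xvals = [3.1, 7.9, 10.4, 10.6, 13.1, 16.1]
yvals = [12.7, 14.8, 16.3, 16.6, 18.4, 18.8, 19.5, 20.9]
print(hodges_lehmann(xvals, yvals))
```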
10.4.6 Confidence interval procedure for shift parameter

The ordered Walsh differences $D_{(k)}$, for $k = 1, 2, \ldots, nm$, divide the real line into $nm + 1$ intervals

\[ (-\infty, D_{(1)}),\; (D_{(1)}, D_{(2)}),\; \ldots,\; (D_{(nm-1)}, D_{(nm)}),\; (D_{(nm)}, \infty) \]

(ignoring the endpoints). The following theorem relates the probability that $\Delta$ is in one of these intervals (or in a union of these intervals) to the null distribution of the Mann-Whitney U statistic for the first sample, $U_1$.

Theorem 20 (Shift confidence intervals). Under the assumptions above, if $k$ is chosen so that the null probability $P(U_1 \le k - 1) = \alpha/2$, then the interval

\[ \left[D_{(k)},\, D_{(nm-k+1)}\right] \]

is a $100(1-\alpha)\%$ confidence interval for the shift parameter, $\Delta$.

The procedure given in this theorem is an example of inverting an hypothesis test: a value $\delta_0$ is in a $100(1-\alpha)\%$ confidence interval if the two sided rank sum test of

    $H_0$: the distributions of $X - \delta_0$ and $Y$ are equal

is accepted at the $\alpha$ significance level.

An outline of the proof of the theorem is as follows:

(i) Since X and Y satisfy a shift model, the samples

    Sample 1: $X_1 - \Delta, X_2 - \Delta, \ldots, X_n - \Delta$
    Sample 2: $Y_1, Y_2, \ldots, Y_m$

are independent random samples from the same distribution. Thus, the distribution of

\[ U_1 = \#(X_i - \Delta > Y_j) = \#(X_i - Y_j > \Delta) \]

can be tabulated assuming that each assignment of $n$ values to the first sample is equally likely.

(ii) The following statements are equivalent:

• $D_{(k)} < \Delta < D_{(k+1)}$.

• Exactly $k$ differences of the form $X_i - Y_j$ are less than $\Delta$ and exactly $nm - k$ differences of the form $X_i - Y_j$ are greater than $\Delta$.

• $U_1 = nm - k$.
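For small samples, Theorem 20 can be implemented directly by tabulating the exact null distribution of $U_1$ by enumeration and then reading off order statistics of the Walsh differences. The Python sketch below is an illustration under that reading of the theorem, with $k$ chosen as the largest value whose tail probability does not exceed $\alpha/2$ (so the achieved confidence level is at least $1 - \alpha$); it is not the notes' RankSumCI command. For the Section 10.4.7 data it reproduces the interval {−10.5, −3.2} and exact level 91.8748% reported in the notes.

```python
from itertools import combinations
import numpy as np

def exact_u1_cdf(n, m):
    """Exact null distribution of U1 = #(X_i > Y_j): P(U1 <= u) for u = 0..nm."""
    counts = np.zeros(n * m + 1)
    min_r1 = n * (n + 1) // 2
    for ranks in combinations(range(1, n + m + 1), n):
        counts[sum(ranks) - min_r1] += 1        # U1 = R1 - n(n+1)/2 (no ties)
    return np.cumsum(counts) / counts.sum()

def shift_confidence_interval(x, y, alpha=0.10):
    """Confidence interval for the shift parameter based on Walsh differences.

    Assumes the samples are large enough that the chosen k is at least 1.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    walsh = np.sort(np.subtract.outer(x, y).ravel())
    cdf = exact_u1_cdf(n, m)
    # Largest k with P(U1 <= k - 1) <= alpha/2.
    k = int(np.searchsorted(cdf, alpha / 2, side="right"))
    level = 1 - 2 * cdf[k - 1]                  # achieved confidence level
    return (walsh[k - 1], walsh[n * m - k]), level   # [D(k), D(nm-k+1)]

xvals = [3.1, 7.9, 10.4, 10.6, 13.1, 16.1]
yvals = [12.7, 14.8, 16.3, 16.6, 18.4, 18.8, 19.5, 20.9]
print(shift_confidence_interval(xvals, yvals, alpha=0.10))
```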
10.4.7 Some Mathematica commands

The Ranks command can be used to return the ranks of the X and Y samples. For example, the following commands initialize samples from the X distribution (xvals) and the Y distribution (yvals), and return the list {{1, 2, 3, 4, 6, 8}, {5, 7, 9, 10, 11, 12, 13, 14}}:

    xvals = {3.1, 7.9, 10.4, 10.6, 13.1, 16.1};
    yvals = {12.7, 14.8, 16.3, 16.6, 18.4, 18.8, 19.5, 20.9};
    Ranks[{xvals, yvals}]

Note that each element of the input list {xvals, yvals} is converted to its rank in the combined list of 14 observations.

The RankSumTest command can be used to conduct one-sided or two-sided tests of the equality of the X and Y distributions using the rank sum statistic for the first sample. For example, the following command returns the results of a test of the null hypothesis of equality of distributions, versus the two-sided alternative that one distribution is stochastically larger than the other, using the exact distribution and the data above:

    RankSumTest[xvals, yvals, TwoSided→True, ExactDistribution→True]

Results include the observed value of the $R_1$ statistic, 24.0, and the two-sided p value based on the exact distribution, 0.004662.

If X and Y satisfy a shift model with $\Delta = \mathrm{Median}(X) - \mathrm{Median}(Y)$, then

• The HodgesLehmannDelta command can be used to compute the HL estimate of the shift parameter. For example, HodgesLehmannDelta[xvals, yvals] returns −6.65 for the data above.

• The RankSumCI command can be used to construct confidence intervals for the shift parameter. For example, the following command returns an exact confidence interval with confidence level ≥ 90% (as close as possible) for the data above:

      RankSumCI[xvals, yvals, ConfidenceLevel→0.90, ExactDistribution→True]

  Results include the interval, {−10.5, −3.2}, and the exact confidence level, 91.8748%.

The ExactDistribution→True option should be added to the test and confidence interval procedures when sample sizes are 20 or less; otherwise, this option should be omitted. When the option is omitted, normal approximations are used instead of exact distributions. Note that all procedures make appropriate adjustments for ties in the observed data.

10.5 Sampling models

The methods of this chapter assume that the measurements under study are the values of independent random samples from continuous distributions. In most applications, simple random samples of individuals are drawn from finite populations and measurements are made on these individuals. If population sizes are large enough, then the resulting measurements can be treated as if they were the values of independent random samples.

10.5.1 Population model

If simple random samples are drawn from sufficiently large populations of individuals, then sampling is said to be done under a population model. Under a population model, measurements can be treated as if they were the values of independent random samples.

When comparing two distributions, sampling can be done separately from two subpopulations or from a total population. For example, a researcher interested in comparing achievement test scores of girls and boys in the fifth grade might sample separately from the subpopulations of fifth-grade girls and fifth-grade boys, or might sample from the population of all fifth-graders and then split the sample into subsamples of girls and boys.

A third possibility in the two sample setting is sampling from a total population followed by randomization to one of two treatments under study. For example, a medical researcher interested in determining if a new treatment to reduce serum cholesterol levels is more effective than the standard treatment in a population of women with very high levels of cholesterol might do the following:

1. Choose a simple random sample of $n + m$ subjects from the population of women with very high levels of serum cholesterol.

2. Partition the $n + m$ subjects into distinguishable subsets (or groups) of sizes $n$ and $m$.

3. Administer the standard treatment to each subject in the first group for a fixed period of time, and the new treatment to each subject in the second group for the same fixed period of time.

By randomly assigning subjects to treatment groups, the effect is as if sampling was done from two subpopulations: the subpopulation of women with high cholesterol who have been treated with the standard treatment for a fixed period of time, and the subpopulation of women with high cholesterol who have been treated with the new treatment for a fixed period of time. Note that, by design, the subpopulations differ in treatment only.

10.5.2 Randomization model

The following is a common research scenario: a researcher is interested in comparing two treatments, and has $n + m$ subjects willing to participate in a study. The researcher randomly assigns $n$ subjects to receive the first treatment; the remaining $m$ subjects will receive the second treatment. Treatments could be competing drugs for reducing cholesterol (as above), or competing methods for teaching multivariable calculus.

If the $n + m$ subjects are not a simple random sample from the study population, but the assignment of subjects to treatments is one of $\binom{n+m}{n}$ equally likely assignments, then sampling is said to be done under a randomization model. Under a randomization model for the comparison of treatments, chance enters into the experiment only through the assignment of subjects to treatments.
The results of experiments conducted under a randomization model cannot be generalized to a larger population of interest, but may still be of interest to researchers.

The Wilcoxon rank sum test is an example of a method that can be used to analyze data sampled under either the population model or the randomization model. Additional methods will be discussed in later chapters of these notes.
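Under the randomization model, the reference distribution of the rank sum statistic comes from the $\binom{n+m}{n}$ equally likely assignments of subjects to groups. When full enumeration is impractical, this distribution can be approximated by Monte Carlo sampling of random assignments. The Python sketch below illustrates the idea; it is our illustration, not a procedure from these notes, and the data are hypothetical placeholders.

```python
import numpy as np
from scipy.stats import rankdata

def randomization_rank_sum_pvalue(x, y, n_resamples=100_000, seed=0):
    """Two-sided Monte Carlo p value for the rank sum test under the
    randomization model: group labels are repeatedly re-assigned at random."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([np.asarray(x, float), np.asarray(y, float)])
    ranks = rankdata(pooled)                    # midranks handle ties
    n = len(x)
    r1_obs = ranks[:n].sum()
    mean_r1 = n * (len(pooled) + 1) / 2         # null mean of R1
    stat_obs = abs(r1_obs - mean_r1)
    count = 0
    for _ in range(n_resamples):
        perm = rng.permutation(ranks)           # one random assignment
        if abs(perm[:n].sum() - mean_r1) >= stat_obs:
            count += 1
    return r1_obs, (count + 1) / (n_resamples + 1)

# Hypothetical treatment-vs-control responses.
treatment = [12.3, 15.1, 11.8, 14.6, 13.9]
control = [10.2, 11.0, 9.7, 12.1, 10.8, 11.5]
print(randomization_rank_sum_pvalue(treatment, control))
```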