Comparing Two Samples: Differences in Means and Variances - Prof. Jenny A. Baglivo, Study notes of Mathematical Statistics

Boston College (BC), Mathematical Statistics

Prof. Jenny A. Baglivo

Information on comparing two probability distributions, specifically focusing on the difference in means and variances. It includes details on exact methods for normal distributions, approximate methods, and transformations to normality. The document also discusses the sampling distribution of the ratio of sample variances and provides exact methods for the ratio of variances parameter for normal distributions.

Typology: Study notes

Pre 2010

Uploaded on 02/24/2010


MT427 Notebook 5

prepared by Professor Jenny Baglivo


© Copyright 2009 by Jenny A. Baglivo. All Rights Reserved.
- 5.1 Two Sample Analysis: Difference in Means
  - 5.1.1 Introduction: Notation and Model Summaries
  - 5.1.2 Exact Methods for Normal Distributions
  - 5.1.3 Approximate Methods
  - 5.1.4 Transformations to Normality
- 5.2 Two Sample Analysis: Ratio of Variances
  - 5.2.1 F Ratio Distribution
  - 5.2.2 Sampling Distribution of Ratio of Sample Variances
  - 5.2.3 Exact Methods for Normal Distributions
- 5.3 Nonparametric Methods for Two Sample Analysis
  - 5.3.1 Definitions
  - 5.3.2 Wilcoxon Rank Sum Statistic
  - 5.3.3 Wilcoxon Rank Sum Distribution and Methods
  - 5.3.4 Mann-Whitney U Statistic
  - 5.3.5 Mann-Whitney U Distribution and Methods
  - 5.3.6 Hodges-Lehmann (HL) Estimator of Shift Parameter
  - 5.3.7 Exact Confidence Interval Procedure for Shift Parameter
- 5.4 Sampling Models
  - 5.4.1 Population Model
  - 5.4.2 Randomization Model

5.1.2 Exact Methods for Normal Distributions

If X and Y are normal random variables and the samples are drawn independently, then the difference in sample means, $\bar{X} - \bar{Y}$, has a normal distribution.

There are two situations where this fact can be used to construct exact methods for analyzing the difference in means:

σx, σy Known: Statistical methods use the fact that the standardized difference

$$Z = \frac{(\bar{X} - \bar{Y}) - \delta}{\sqrt{\dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{m}}}$$

has a standard normal distribution.

σx = σy Estimated: Statistical methods use the fact that the approximately standardized difference

$$T = \frac{(\bar{X} - \bar{Y}) - \delta}{\sqrt{S_p^2\left(\dfrac{1}{n} + \dfrac{1}{m}\right)}}$$

has a Student t distribution with n + m − 2 df.

In this formula, $S_p^2$ is the pooled estimator of the common variance:

$$S_p^2 = \frac{(n - 1)S_x^2 + (m - 1)S_y^2}{n + m - 2}$$

where $S_x^2$ and $S_y^2$ are the sample variances for the X and Y samples.

Note that, in order to get an exact Student t distribution in the second situation, we need to assume that the unknown standard deviations are equal.

To illustrate the computation for estimating a common variance, suppose that n = 8, m = 6, $s_x^2 = 8.58$ and $s_y^2 = 12.35$ are observed. Then the estimate of the common variance is

$$s_p^2 = \frac{(7)(8.58) + (5)(12.35)}{12} = \frac{121.81}{12} \approx 10.15.$$
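The pooled estimate for this illustration can be checked numerically. The following is a minimal Python sketch; the function name `pooled_variance` is an illustrative choice and does not come from the notes.

```python
def pooled_variance(n, s2_x, m, s2_y):
    """Pooled estimate of a common variance from two sample variances.

    Each sample variance is weighted by its degrees of freedom
    (n - 1 for the X sample, m - 1 for the Y sample).
    """
    return ((n - 1) * s2_x + (m - 1) * s2_y) / (n + m - 2)


# Values from the illustration: n = 8, m = 6, s_x^2 = 8.58, s_y^2 = 12.35.
s2_p = pooled_variance(8, 8.58, 6, 12.35)
print(round(s2_p, 2))  # 10.15
```

Because the weights are the degrees of freedom, the pooled value always lies between the two sample variances and sits closer to the one from the larger sample.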

Exercise. Let $\sigma^2 = \sigma_x^2 = \sigma_y^2$ be the common variance of the X and Y distributions. Under the assumptions of this section, demonstrate that $S_p^2$ is an unbiased estimator of $\sigma^2$.

Confidence interval procedures. The following tables give 100(1−α)% confidence interval procedures for the difference in means parameter, δ = μx − μy.

σx, σy Known:

$$(\bar{X} - \bar{Y}) \pm z(\alpha/2)\sqrt{\frac{\sigma_x^2}{n} + \frac{\sigma_y^2}{m}}$$

where z(α/2) is the 100(1 − α/2)% point of the standard normal distribution.

σx = σy Estimated:

$$(\bar{X} - \bar{Y}) \pm t_{n+m-2}(\alpha/2)\sqrt{S_p^2\left(\frac{1}{n} + \frac{1}{m}\right)}$$

where $t_{n+m-2}(\alpha/2)$ is the 100(1 − α/2)% point of the Student t distribution with (n + m − 2) df.

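Hypothesis testing procedures. The following tables give size α tests of the null hypothesis that the difference in means parameter is a fixed value: Ho : δ = δo.

| | 1. σx, σy Known | 2. σx = σy Estimated |
|---|---|---|
| Test Statistic | $Z = \dfrac{(\bar{X} - \bar{Y}) - \delta_o}{\sqrt{\dfrac{\sigma_x^2}{n} + \dfrac{\sigma_y^2}{m}}}$ | $T = \dfrac{(\bar{X} - \bar{Y}) - \delta_o}{\sqrt{S_p^2\left(\dfrac{1}{n} + \dfrac{1}{m}\right)}}$ |
| RR for Ha : δ < δo | $Z \le -z(\alpha)$ | $T \le -t_{n+m-2}(\alpha)$ |
| RR for Ha : δ > δo | $Z \ge z(\alpha)$ | $T \ge t_{n+m-2}(\alpha)$ |
| RR for Ha : δ ≠ δo | $\lvert Z \rvert \ge z(\alpha/2)$ | $\lvert T \rvert \ge t_{n+m-2}(\alpha/2)$ |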
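The three rejection regions for the known-variance case can be expressed as a small decision function. This is a sketch only: the names `z_point` and `reject_null` and the observed statistic 1.80 are hypothetical, and the normal quantile comes from Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_point(alpha):
    """z(alpha): the 100(1 - alpha)% point of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha)


def reject_null(z, alpha, alternative):
    """Return True when the observed Z falls in the size-alpha rejection region.

    alternative is "less" (delta < delta_o), "greater" (delta > delta_o),
    or "two-sided" (delta != delta_o).
    """
    if alternative == "less":
        return z <= -z_point(alpha)
    if alternative == "greater":
        return z >= z_point(alpha)
    if alternative == "two-sided":
        return abs(z) >= z_point(alpha / 2)
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Hypothetical observed statistic, for illustration only.
z_obs = 1.80
print(reject_null(z_obs, 0.05, "greater"))    # True: 1.80 >= z(0.05), about 1.645
print(reject_null(z_obs, 0.05, "two-sided"))  # False: 1.80 < z(0.025), about 1.960
```

The same statistic can lead to different decisions under one-sided and two-sided alternatives, because the two-sided test splits α between the two tails.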

Exercise. Assume the following data are the values of independent random samples from normal distributions with common standard deviation 2.

X Sample (n = 8, $\bar{x}$ = 10.1):
6.07, 7.00, 9.49, 9.76, 11.19, 11.31, 12.96, 13.02
Y Sample (m = 12, $\bar{y}$ = 6.83):
3.86, 4.52, 5.14, 5.23, 5.33, 6.32, 7.21, 7.56, 7.94, 8.19, 9.07, 11.59
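The questions attached to this exercise are not shown in this excerpt. One natural analysis, sketched below in Python, applies the known-variance procedures: a 95% confidence interval for δ and a two-sided test of δ = 0. Two assumptions are made: the first value of each sample is taken to be 6.07 and 3.86 (the values implied by the stated sample means), and the null value δo = 0 is chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

# Sample values; the leading entries 6.07 and 3.86 are the values implied by
# the stated sample means (10.1 and 6.83).
x = [6.07, 7.00, 9.49, 9.76, 11.19, 11.31, 12.96, 13.02]
y = [3.86, 4.52, 5.14, 5.23, 5.33, 6.32, 7.21, 7.56, 7.94, 8.19, 9.07, 11.59]
sigma = 2.0  # common standard deviation, assumed known

n, m = len(x), len(y)
diff = mean(x) - mean(y)
se = sqrt(sigma**2 / n + sigma**2 / m)

# 95% confidence interval for delta = mu_x - mu_y.
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
ci = (diff - z_crit * se, diff + z_crit * se)

# Test statistic for Ho: delta = 0 (the choice delta_o = 0 is an assumption).
z = (diff - 0.0) / se
reject = abs(z) >= z_crit

print(f"difference = {diff:.2f}, standard error = {se:.4f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"Z = {z:.2f}, reject Ho at the 5% level: {reject}")
```

Under these assumptions the observed difference is 3.27, the interval is roughly (1.48, 5.06), and Z is about 3.58, so the two-sided test rejects at the 5% level.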

Exercise (Source: Shoemaker, JSE, 1996): Normal body temperatures of 148 subjects were taken several times over two consecutive days. A total of 130 values are reported below.

X Sample: 65 temperatures (in degrees Fahrenheit) for women
96.4 96.7 96.8 97.2 97.2 97.4 97.6 97.7 97.7 97.8 97.8 97.8 97.9
97.9 97.9 98.0 98.0 98.0 98.0 98.0 98.1 98.2 98.2 98.2 98.2 98.2
98.2 98.3 98.3 98.3 98.4 98.4 98.4 98.4 98.4 98.5 98.6 98.6 98.6
98.6 98.7 98.7 98.7 98.7 98.7 98.7 98.8 98.8 98.8 98.8 98.8 98.8
98.8 98.9 99.0 99.0 99.1 99.1 99.2 99.2 99.3 99.4 99.9 100.0 100.8

Sample summaries: n = 65, $\bar{x}$ = 98.3938, $s_x$ = 0.7435

Y Sample: 65 temperatures (in degrees Fahrenheit) for men
96.3 96.7 96.9 97.0 97.1 97.1 97.1 97.2 97.3 97.4 97.4 97.4 97.4
97.5 97.5 97.6 97.6 97.6 97.7 97.8 97.8 97.8 97.8 97.9 97.9 98.0
98.0 98.0 98.0 98.0 98.0 98.1 98.1 98.2 98.2 98.2 98.2 98.3 98.3
98.4 98.4 98.4 98.4 98.5 98.5 98.6 98.6 98.6 98.6 98.6 98.6 98.7
98.7 98.8 98.8 98.8 98.9 99.0 99.0 99.0 99.1 99.2 99.3 99.4 99.5

Sample summaries: m = 65, $\bar{y}$ = 98.1046, $s_y$ = 0.6988

Left Plot: Side-by-side box plots of the two samples are shown on the left. The sample distributions are approximately symmetric.
Right Plot: A normal probability plot of standardized temperatures is shown on the right, where

(a) each x value is replaced by $(x - \bar{x})/s_x$; (b) each y value is replaced by $(y - \bar{y})/s_y$; and (c) the 130 ordered standardized values (vertical axis; observed) are plotted against the k/131st quantiles of the standard normal distribution (horizontal axis; expected).

The normal probability plot has been enhanced to include the results of 100 simulations from the standard normal distribution: For each k = 1, 2, ..., 130, the minimum and maximum values of the 100 simulated kth order statistics are plotted.

Assume these data are the values of independent random samples from normal distributions with a common variance.

Test μx = μy versus μx ≠ μy at the 5% level.
Construct a 95% confidence interval for the difference in means, μx − μy.
Comment on the analyses.
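One way the pooled t procedures could be applied to the reported sample summaries is sketched below in Python. The critical value `t_crit = 1.979` is an assumption: it approximates the upper 2.5% point of the Student t distribution with 128 df and is supplied by hand rather than computed, since the standard library has no t quantile function.

```python
from math import sqrt

# Sample summaries reported above.
n, x_bar, s_x = 65, 98.3938, 0.7435   # women
m, y_bar, s_y = 65, 98.1046, 0.6988   # men

# Pooled estimate of the common variance and the resulting standard error.
s2_p = ((n - 1) * s_x**2 + (m - 1) * s_y**2) / (n + m - 2)
se = sqrt(s2_p * (1 / n + 1 / m))

# Pooled t statistic for Ho: mu_x = mu_y, that is, delta_o = 0.
t_stat = (x_bar - y_bar) / se
df = n + m - 2

# Assumed critical value: t_128(0.025) is approximately 1.979
# (a table or software lookup, not computed here).
t_crit = 1.979

reject = abs(t_stat) >= t_crit
ci = (x_bar - y_bar - t_crit * se, x_bar - y_bar + t_crit * se)

print(f"pooled variance = {s2_p:.4f}, df = {df}")
print(f"T = {t_stat:.3f}, reject at the 5% level: {reject}")
print(f"95% CI for mu_x - mu_y: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With these inputs T is about 2.29 and the interval is roughly (0.04, 0.54). The interval excludes 0, which agrees with rejecting equality of means at the 5% level; whether the normality and common-variance assumptions are reasonable is the part left to the comment requested in the exercise.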


5.1.3 Approximate Methods

In addition to the exact methods given in the last section, there are approximate methods we can use to answer questions about the difference in means parameter, δ = μx − μy.

σx ≠ σy Estimated, Normal Samples: Assume that X and Y are normal random variables, and that σx ≠ σy. Statistical methods use the fact that the approximate standardization

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}}}$$

has an approximate Student t distribution with degrees of freedom as follows:

$$\text{df} = \frac{\left((S_x^2/n) + (S_y^2/m)\right)^2}{(S_x^2/n)^2/(n - 1) + (S_y^2/m)^2/(m - 1)}$$

σx, σy Estimated, Large Samples: Assume that n and m are large. Statistical methods use the fact that the approximate standardization

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{S_x^2}{n} + \dfrac{S_y^2}{m}}}$$

has an approximate standard normal distribution.

Notes:

Pooled versus Welch t Methods: Exact methods for normal samples when σx = σy is estimated using pooled information are called pooled t methods. Approximate methods for normal samples where σx and σy are separately estimated are called Welch t methods, after the mathematician who proved (in the 1940s) that the sampling distribution is approximately Student t.
Computing the Degrees of Freedom: To apply the formula for df developed by Welch for the first situation above, you would round the expression on the right to the closest whole number. The computed df satisfies the following inequality:

min(n, m) − 1 ≤ df ≤ n + m − 2.

A quick by-hand method is to use the lower bound for df instead of Welch’s formula.

Central Limit Theorem: The central limit theorem can be used to demonstrate that the difference in sample means, X − Y , is approximately normally distributed when both n and m are large enough. Thus, the Z given in the second situation above has an approximately standard normal distribution when both n and m are large enough.
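The degrees-of-freedom computation and its bounds can be sketched in Python. The sketch uses the standard Welch-Satterthwaite form, with (n − 1) and (m − 1) dividing the squared terms in the denominator, which is the form consistent with the inequality stated in the notes. The function name `welch_df` is illustrative, and the inputs are hypothetical: they reuse n = 8, m = 6, $s_x^2 = 8.58$, $s_y^2 = 12.35$ from the pooled-variance illustration in Section 5.1.2.

```python
def welch_df(n, s2_x, m, s2_y):
    """Welch's approximate degrees of freedom (before rounding)."""
    a = s2_x / n
    b = s2_y / m
    return (a + b) ** 2 / (a**2 / (n - 1) + b**2 / (m - 1))


# Hypothetical summaries: n = 8, m = 6, s_x^2 = 8.58, s_y^2 = 12.35.
n, m = 8, 6
df = welch_df(n, 8.58, m, 12.35)

lower, upper = min(n, m) - 1, n + m - 2
print(f"Welch df = {df:.2f}, rounded = {round(df)}")
print(f"bounds: {lower} <= df <= {upper}: {lower <= df <= upper}")
```

Here the formula gives about 9.69, which rounds to 10, while the quick by-hand lower bound gives 5. Using the lower bound is therefore conservative: it yields a larger critical value and a wider interval.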

Assume the information on the previous page is a summary of independent random samples from normal distributions. Construct an approximate 95% confidence interval for the difference in means parameter, δ = μx − μy, and comment on your analysis.

5.1.4 Transformations to Normality

Methods based on sampling from normal distributions are popular and easy to apply.

For this reason, researchers often transform their data to achieve approximate normality, and then use normal theory methods on the transformed scale.

For example, the left plot below shows side-by-side box plots of samples taken from skewed positive distributions, and the right plot shows an enhanced normal probability plot of combined standardized values.

Notice that the boxes are asymmetric, there are large outliers, and the normal probability plot has a pronounced bend.

By contrast, plots based on a log transformation of the data suggest that normal theory methods could be applied to the log-transformed data.

Footnotes. Although the use of transformations is attractive, there are many drawbacks. For example, it may be difficult to find an appropriate transformation, or it may be difficult to interpret the results back on the original scale.

In Section 5.3 of these notes, we will study methods that can be used for a broad range of distributions.

Reciprocal: If F has an f ratio distribution with $n_1$ and $n_2$ degrees of freedom, then the reciprocal of F has an f ratio distribution with $n_2$ and $n_1$ degrees of freedom.
Quantiles: The notation $f_p$ is used to denote the pth quantile (100pth percentile) of the f ratio distribution. The Rice textbook includes tables for p = 0.90 (page A10), p = 0.95 (page A11), p = 0.975 (page A12), and p = 0.99 (page A13). The p = 0.10, 0.05, 0.025, 0.01 quantiles can be computed using reciprocals. Specifically,

$$f_p \text{ on } n_1, n_2 \text{ df} = \frac{1}{f_{1-p} \text{ on } n_2, n_1 \text{ df}}$$

To illustrate the use of the tables in the textbook, let $n_1 = 8$ and $n_2 = 10$. Then

When p = 0.90, 0.95, 0.975 and 0.99, the values are read from the tables: $f_{0.90} = 2.38$, $f_{0.95} = 3.07$, $f_{0.975} = 3.85$, $f_{0.99} = 5.06$.
When p = 0.10, 0.05, 0.025, and 0.01, the quantiles are computed using reciprocals. Specifically, since

$$P(F \le x) = P\left(\frac{1}{F} \ge \frac{1}{x}\right) \text{ for every } x > 0,$$

to obtain the 0.10, 0.05, 0.025, and 0.01 quantiles of the distribution with 8 degrees of freedom in the numerator and 10 degrees of freedom in the denominator, we use the reciprocals of the 0.90, 0.95, 0.975, and 0.99 quantiles of the f ratio distribution with 10 degrees of freedom in the numerator and 8 degrees of freedom in the denominator. Thus,

$$f_{0.10} = \frac{1}{2.54} = 0.39, \quad f_{0.05} = \frac{1}{3.35} = 0.30, \quad f_{0.025} = \frac{1}{4.30} = 0.23, \quad f_{0.01} = \frac{1}{5.81} = 0.17.$$
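The reciprocal rule is easy to check with a few lines of Python. This sketch takes the upper quantiles for 10 and 8 df to be 2.54, 3.35, 4.30 and 5.81, the standard tabulated values that reproduce the reciprocals quoted in the text; the dictionary names are illustrative.

```python
# Upper quantiles of the f ratio distribution with 10 and 8 df
# (table values, rounded to two decimals).
upper_10_8 = {0.90: 2.54, 0.95: 3.35, 0.975: 4.30, 0.99: 5.81}

# Lower quantiles with 8 and 10 df via f_p(8, 10) = 1 / f_{1-p}(10, 8).
lower_8_10 = {round(1 - p, 3): round(1 / q, 2) for p, q in upper_10_8.items()}

for p in sorted(lower_8_10, reverse=True):
    print(f"f_{p} on 8, 10 df = {lower_8_10[p]:.2f}")
```

Note that the degrees of freedom swap when the reciprocal is taken, so the lookup must use the table for 10 and 8 df, not the table for 8 and 10 df.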

5.2.2 Sampling Distribution of Ratio of Sample Variances

Let X be a normal random variable with mean μx and standard deviation σx, and let Y be a normal random variable with mean μy and standard deviation σy.

The following theorem tells us about the sampling distribution of the ratio of sample variances when samples are chosen independently from the X and Y distributions.

Theorem (Sampling Distribution). Let $S_x^2$ and $S_y^2$ be the sample variances of independent random samples of sizes n and m, respectively, from the X and Y distributions. Then

$$F = \frac{S_x^2/S_y^2}{\sigma_x^2/\sigma_y^2}$$

has an f ratio distribution with (n − 1) and (m − 1) degrees of freedom, where the numerator is the ratio of sample variances and the denominator is the ratio of model variances.

To demonstrate that the conclusion of the theorem is correct, first note that

$U = \dfrac{(n - 1)S_x^2}{\sigma_x^2}$ has a chi-square distribution with (n − 1) df, and
$V = \dfrac{(m - 1)S_y^2}{\sigma_y^2}$ has a chi-square distribution with (m − 1) df.

Now (please complete the demonstration),
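Separately from the demonstration, the theorem can be checked numerically. The Python sketch below is a simulation, not a proof: it draws normal samples, forms F as in the theorem, and compares the share of values at or below 3.07 (the tabulated 0.95 quantile for 8 and 10 df quoted in Section 5.2.1) with 0.95. The means, standard deviations, seed and number of repetitions are arbitrary hypothetical choices.

```python
import random


def sample_variance(values):
    """Unbiased sample variance (divisor len(values) - 1)."""
    k = len(values)
    center = sum(values) / k
    return sum((v - center) ** 2 for v in values) / (k - 1)


def simulate_f(n, m, mu_x, sigma_x, mu_y, sigma_y, reps, seed):
    """Simulate F = (S_x^2 / S_y^2) / (sigma_x^2 / sigma_y^2) for normal samples."""
    rng = random.Random(seed)
    ratio = sigma_x**2 / sigma_y**2
    results = []
    for _ in range(reps):
        xs = [rng.gauss(mu_x, sigma_x) for _ in range(n)]
        ys = [rng.gauss(mu_y, sigma_y) for _ in range(m)]
        results.append((sample_variance(xs) / sample_variance(ys)) / ratio)
    return results


# Hypothetical settings: n = 9 and m = 11 give 8 and 10 df.
f_values = simulate_f(n=9, m=11, mu_x=0.0, sigma_x=3.0, mu_y=5.0, sigma_y=2.0,
                      reps=20000, seed=427)
share = sum(f <= 3.07 for f in f_values) / len(f_values)
print(f"share of simulated F values <= 3.07: {share:.3f}")
```

The share should land close to 0.95 whatever means and standard deviations are chosen, which reflects the fact that the distribution of F depends only on the two sample sizes.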

5.2.3 Exact Methods for Normal Distributions

Let X and Y be normal random variables. Under the conditions of the last section, the following tables give exact confidence interval and hypothesis test methods for the ratio of variances parameter, $r = \sigma_x^2/\sigma_y^2$.

100(1 − α)% CI for $r = \sigma_x^2/\sigma_y^2$, when μx and μy are estimated:

$$\left[\frac{S_x^2/S_y^2}{f_{n-1,m-1}(\alpha/2)},\ \frac{S_x^2/S_y^2}{f_{n-1,m-1}(1 - \alpha/2)}\right]$$

where $f_{n-1,m-1}(p)$ is the 100(1 − p)% point of the f ratio distribution with (n − 1) and (m − 1) df.

100α% tests of Ho : r = ro, when μx and μy are estimated:

Test Statistic: $F = \dfrac{S_x^2/S_y^2}{r_o}$

RR for Ha : r < ro: $F \le f_{n-1,m-1}(1 - \alpha)$

RR for Ha : r > ro: $F \ge f_{n-1,m-1}(\alpha)$

RR for Ha : r ≠ ro: $F \le f_{n-1,m-1}(1 - \alpha/2)$ or $F \ge f_{n-1,m-1}(\alpha/2)$
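A minimal Python sketch of the interval and the two-sided test follows. The data are hypothetical (n = 9, m = 11, $s_x^2 = 6.0$, $s_y^2 = 4.0$), chosen so that the 8 and 10 df table values from Section 5.2.1 apply. Note the change of notation: in this section f(p) is an upper-tail point, so f(0.025) is the 0.975 quantile (3.85) and f(0.975) is the 0.025 quantile (0.23).

```python
def variance_ratio_ci(s2_x, s2_y, f_upper, f_lower):
    """Confidence interval for r = sigma_x^2 / sigma_y^2.

    f_upper is f_{n-1,m-1}(alpha/2) and f_lower is f_{n-1,m-1}(1 - alpha/2),
    using the upper-tail notation of this section.
    """
    ratio = s2_x / s2_y
    return ratio / f_upper, ratio / f_lower


# Hypothetical data: n = 9, m = 11 (8 and 10 df), s_x^2 = 6.0, s_y^2 = 4.0.
low, high = variance_ratio_ci(6.0, 4.0, f_upper=3.85, f_lower=0.23)
print(f"95% CI for r: ({low:.2f}, {high:.2f})")

# Two-sided 5% test of Ho: r = 1 with the same critical values.
f_stat = (6.0 / 4.0) / 1.0
reject = f_stat <= 0.23 or f_stat >= 3.85
print(f"F = {f_stat:.2f}, reject Ho: {reject}")
```

The larger critical value produces the lower endpoint and the smaller one produces the upper endpoint, because the critical values appear in the denominators. The interval here contains 1, which matches the test's failure to reject r = 1.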

5.3 Nonparametric Methods for Two Sample Analysis

This section focuses on broadly-applicable two sample analysis methods.

5.3.1 Definitions

Parametric/Nonparametric Methods: Statistical methods that require strong assumptions about the shapes of distributions (for example, uniform or exponential) and ask questions about parameter values are called parametric methods. By contrast, nonparametric methods (also known as distribution-free methods) make mild assumptions, such as, “the distributions are continuous” or “the continuous distributions are symmetric around their centers.”
Stochastically Larger/Smaller: Let V and W be continuous random variables. V is stochastically larger than W (correspondingly, W is stochastically smaller than V ) if

P (V ≥ x) ≥ P (W ≥ x) for all real numbers x,

with strict inequality (that is, where “>” replaces “≥”) for at least one x.

Shift Model: The random variables V and W are said to satisfy a shift model if

V − ∆ and W have the same distribution,

where ∆ is the difference in medians: ∆ = Median(V ) − Median(W ).

Shift Parameter: The parameter ∆ from above is called the shift parameter.

Example: Quantile confidence interval procedure. Most of the statistical methods we have worked with so far have been parametric methods. An example of a nonparametric method is the quantile confidence interval procedure from Section 4.2.3 of these notes:

Let X be a continuous random variable, θ be the pth quantile of the X distribution, for some proportion p ∈ (0, 1), and $X_{(k)}$ be the kth order statistic of a random sample of size n from the X distribution. Then

$$\left[X_{(k_1)},\ X_{(k_2)}\right]$$

is a 100(1 − α)% confidence interval for θ, where the indices $k_1$ and $k_2$ are chosen so that

$$P(\theta < X_{(k_1)}) = \sum_{j=0}^{k_1 - 1} \binom{n}{j} p^j (1 - p)^{n-j} = \alpha/2$$

$$P(X_{(k_1)} < \theta < X_{(k_2)}) = \sum_{j=k_1}^{k_2 - 1} \binom{n}{j} p^j (1 - p)^{n-j} = 1 - \alpha$$

$$P(\theta > X_{(k_2)}) = \sum_{j=k_2}^{n} \binom{n}{j} p^j (1 - p)^{n-j} = \alpha/2.$$
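The index selection can be sketched in Python using binomial probabilities. Because the binomial distribution is discrete, tail probabilities of exactly α/2 are rarely attainable; this sketch assumes the conservative convention of making each tail at most α/2, which is a choice made here and not something stated in the notes. The function names are illustrative.

```python
from math import comb


def binom_pmf(n, j, p):
    """P(exactly j of n observations fall below the pth quantile)."""
    return comb(n, j) * p**j * (1 - p) ** (n - j)


def quantile_ci_indices(n, p, alpha):
    """Choose k1, k2 so that each tail probability is at most alpha/2.

    Returns (k1, k2, coverage), where coverage is the middle sum
    over j = k1, ..., k2 - 1.
    """
    pmf = [binom_pmf(n, j, p) for j in range(n + 1)]

    # Largest k1 with sum_{j=0}^{k1-1} pmf[j] <= alpha/2.
    k1, lower_tail = 0, 0.0
    while k1 < n and lower_tail + pmf[k1] <= alpha / 2:
        lower_tail += pmf[k1]
        k1 += 1

    # Smallest k2 with sum_{j=k2}^{n} pmf[j] <= alpha/2.
    k2, upper_tail = n + 1, 0.0
    while k2 > 1 and upper_tail + pmf[k2 - 1] <= alpha / 2:
        upper_tail += pmf[k2 - 1]
        k2 -= 1

    if k1 < 1 or k2 > n:
        raise ValueError("sample too small for the requested confidence level")
    coverage = sum(pmf[k1:k2])
    return k1, k2, coverage


# Hypothetical case: median (p = 0.5) from a sample of size 20, alpha = 0.05.
k1, k2, coverage = quantile_ci_indices(20, 0.5, 0.05)
print(k1, k2, round(coverage, 4))  # 6 15 0.9586
```

For this case the interval is $[X_{(6)}, X_{(15)}]$, with actual coverage of about 95.9%, slightly above the nominal 95% because of the conservative rounding.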

Illustration: Stochastically larger/smaller random variables. To illustrate the definition of stochastically larger/smaller, consider the following plots of the PDFs (left plot), and the CDFs (right plot) of two random variables: V (solid blue) and W (dashed gray).

V is stochastically larger than W (correspondingly, W is stochastically smaller than V ).

Note that if V is stochastically larger than W , then their CDFs satisfy the inequality

FV (x) ≤ FW (x) for all x, with strict inequality for at least one x.

Example: Random variables satisfying shift models. If V and W satisfy a shift model with shift parameter ∆, then their distributions must have the same shape.

Here are two examples:

Normal Distribution, σ = 5: If V is a normal random variable with mean 10 and standard deviation 5, and W is a normal random variable with mean 3 and standard deviation 5, then V and W satisfy a shift model with shift parameter ∆ = 7.

Since ∆ > 0, V is stochastically larger than W.

Shifted Exponential Distribution, λ = 1/10: Let V be an exponential random variable with parameter 1/10, and let W be a shifted exponential random variable with PDF as follows:

$$f(x) = \frac{1}{10} e^{-(x - 8)/10} \text{ when } x > 8,$$

and 0 otherwise. Then V and W satisfy a shift model with shift parameter ∆ = −8.

Since ∆ < 0, W is stochastically larger than V.
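Both examples can be checked numerically through their CDFs. The Python sketch below verifies, on a grid of x values, that $F_V(x + \Delta) = F_W(x)$ (the shift model) and that the stochastically larger variable has the smaller CDF everywhere. The grid and the helper name `exp_cdf` are illustrative choices; a grid check is evidence, not a proof.

```python
from math import exp
from statistics import NormalDist

# Normal example: V ~ N(10, 5), W ~ N(3, 5), shift parameter 7.
v_norm, w_norm = NormalDist(10, 5), NormalDist(3, 5)
delta_norm = 7.0


def exp_cdf(x, rate=0.1, start=0.0):
    """CDF of an exponential distribution with the given rate, beginning at start."""
    return 1.0 - exp(-rate * (x - start)) if x > start else 0.0


# Exponential example: V begins at 0, W begins at 8, shift parameter -8.
delta_exp = -8.0

grid = [i / 2 for i in range(-40, 121)]  # x from -20 to 60 in steps of 0.5

# Shift model: P(V - delta <= x) = F_V(x + delta) must equal F_W(x).
normal_shift_ok = all(
    abs(v_norm.cdf(x + delta_norm) - w_norm.cdf(x)) < 1e-12 for x in grid
)
exp_shift_ok = all(
    abs(exp_cdf(x + delta_exp) - exp_cdf(x, start=8.0)) < 1e-12 for x in grid
)

# Stochastic ordering: the larger variable has the smaller CDF at every x.
normal_order_ok = all(v_norm.cdf(x) <= w_norm.cdf(x) for x in grid)  # V larger
exp_order_ok = all(exp_cdf(x, start=8.0) <= exp_cdf(x) for x in grid)  # W larger

print(normal_shift_ok, exp_shift_ok, normal_order_ok, exp_order_ok)
```

In both cases the sign of ∆ determines the ordering: a positive shift makes V stochastically larger, and a negative shift makes W stochastically larger.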
