Comparing Population Means: Two-Sample Inference in Statistics, Study notes of Statistics

The methods for inferring differences between the means of two populations based on samples. Large-sample and small-sample tests, as well as confidence intervals for the difference between means. The document also introduces the concept of pooled t-procedures and power calculations.

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010

koofers-user-6za
koofers-user-6za 🇺🇸

10 documents

1 / 10

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Statistics 431:
Statistical Inference
Lecture 7: Two-sample inference
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Comparing Population Means: Two-Sample Inference in Statistics and more Study notes Statistics in PDF only on Docsity!

Statistics 431:

Statistical Inference

Lecture 7: Two-sample inference

Two populations: basics

  • (^) Up to now, we have looked at samples from a single population, like

X 1 ,... , XnN (μ, σ 2 ). We asked

  • what are plausible values of μ? (confidence interval)
  • is it reasonable to say μ = μ 0 , or do the data strongly suggest μ 6 = μ 0? (hypothesis test)
  • (^) Often, we instead want to compare the properties of two different

population distrns.

  • distrn of years lived for people who (1) receive or (2) do not receive a clinical treatment (eg, compare means of distrns)
  • distrn of annual returns from investment vehicles (A) and (B) (eg, compare the variance-adjusted means)
  • distrn of party affiliation for American males and females (eg, compare proportion of Republicans in each group)
  • (^) We want to work out CI and testing ideas for comparative quantities like

μ 1 − μ 2.

Difference between means: large-sample test

  • (^) It’s natural to estimate μ 1 − μ 2 using X ¯ − ¯ Y.
  • (^) To conduct tests, we need to know the null distrn of X ¯ − ¯ Y.

To form CIs, we need to build a pivot based on X ¯ − ¯ Y.

  • (^) If m and n are large, then X ¯ ≈ N (μ 1 , σ 12 / m ) and Y ¯ ≈ N (μ 2 , σ 22 / n ).
  • (^) So X ¯ − ¯ YN (μ 1 − μ 2 , σ 12 / m + σ 22 / n ) (why?).
  • (^) Under H 0 : μ 1 − μ 2 = 10 , and substituting in sample variances, we get

T =

X ¯ − ¯ Y − 10

S 12 / m + S 22 / n

≈ N ( 0 , 1 ).

  • (^) To set critical values for significance level α:

1 for H 0 : μ 1 − μ 2 = 10 vs HA : μ 1 − μ 2 6 = 10 , reject when | T | > c (α) = z α/ 2 2 for H 0 : μ 1 − μ 2 ≤ 10 vs HA : μ 1 − μ 2 > 1 0 , reject when T > c (α) = z α 3 for H 0 : μ 1 − μ 2 ≥ 10 vs HA : μ 1 − μ 2 < 1 0 , reject when T < c (α) = − z α

  • (^) Example: Are Japanese cars more fuel-efficient than American cars?
  • (^) We gather a set of miles per gallon ratings X 1 ,... , Xm for m = 249

American models, and another set Y 1 ,... , Yn for n = 79 Japanese models.

  • (^) The sample statistics: x ¯ = 20. 14 mpg, s 1 = 6. 41 mpg; y ¯ = 30. 48 mpg,

s 2 = 6. 11 mpg.

  • (^) H 0 : μ 1 − μ 2 = 10 = 0 vs HA : μ 1 − μ 2 6 = 0. The test statistic:

T =

x ¯ − ¯ y − (^10) √ s 12 / m + s 22 / n

  • (^) This is a huge negative value. It will clearly lead to rejection at any usual

significance level, for a two-tailed or lower-tailed test: the p-value is 2 ( 1 − 8( 12. 95 )) ≈ 2 × 10 −^38 , where 8 is the cdf of the N ( 0 , 1 ) distrn.

  • (^) What are the two “populations” that we “sampled”?
  • (^) Is this evidence that American auto engineers are less talented than their

Japanese counterparts?

Difference between means: large-sample CI

  • (^) As always, to derive a CI we need a pivot: a quantity that
    • includes the data, X ¯ − ¯ Y
    • includes the unknown parameter, μ 1 − μ 2
    • has a known distrn
  • (^) For n and m large, we just discussed that TN ( 0 , 1 ), so T is a pivot. We

can write

P

− z α/ 2 <^

X ¯ − ¯ Y − (μ 1 − μ 2 ) √ S 12 / m + S 22 / n

< z α/ 2

 ≈^1 −^ α.

  • (^) Rearranging the event in the usual way, we get a 100 ( 1 − α)% CI for μ 1 − μ 2 :

X^ ¯ − ¯ Y ± z α/ 2

S 12

m

S 22

n

  • (^) Exactly the same idea as one-sample CI: sample mean ± normal quantile

times SE(sample mean).

Difference between means: small-sample CI

  • (^) Again, assume X 1 ,... , XmN (μ 1 , σ 12 ) and Y 1 ,... , YnN (μ 2 , σ 22 ).
  • (^) Suppose either m or n (or both) are small.
  • (^) Then T is once more a pivot, but its pivot distrn is t ν rather than N ( 0 , 1 ):

P

− t α/ 2 ;ν <^

X ¯ − ¯ Y − (μ 1 − μ 2 ) √ S 12 / m + S 22 / n

< t α/ 2 ;ν

 ≈^1 −^ α.

  • (^) This leads to a small-sample 100 ( 1 − α)% CI for μ 1 − μ 2 :

X^ ¯ − ¯ Y ± t α/ 2 ;ν

S 12

m

S 22

n

Power calculations

  • (^) Assume population distrns are N (μ 1 , σ 12 ) and N (μ 2 , σ 22 ).
  • (^) We conduct a test of H 0 : μ 1 − μ 2 ≤ 10 vs HA : μ 1 − μ 2 > 1 0 , using the test

statistic Z = ( X ¯ − ¯ Y − 10 )/σ (^) X ¯ − ¯ Y. (We define σ (^) X ¯ − ¯ Y =

σ 12 / m + σ 22 / n .)

  • (^) What is the power against the particular alternative μ 1 − μ 2 = 1 ′? (Here

1 ′^ ∈ HA , ie 1 ′^ > 1 0 .)

  • (^) Power = 1 − β(1′).

β(1′) = P 1 ′(accept H 0 ) = P 1 ′( X ¯ − ¯ Y < 1 0 + z α · σ (^) X ¯ − ¯ Y )

= 8

z α −

1 ′^ − 10

σ (^) X ¯ − ¯ Y

  • (^) Can derive similar expressions for lower-tailed and two-tailed tests (see

Devore).