
Statistics 431: Paired Samples and Two-Sample Inference, Study notes of Statistics

University of Pennsylvania (UPenn), Statistics

A portion of lecture notes from a Statistics 431 course on paired samples and two-sample inference. It covers the basics of paired samples, the difference between means for paired data, and the benefits of using paired data over unpaired data. It also introduces inference about two population proportions, including large-sample tests and confidence intervals.

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010

koofers-user-pq1 🇺🇸


Statistics 431:

Statistical Inference

Lecture 8: More on two-sample inference


Paired samples: basics

- Up to now, our two samples $X_1, \dots, X_m$ and $Y_1, \dots, Y_n$ have been unpaired: there is no correspondence between observations in the first sample and observations in the second. (In general, the number of $X_i$’s is not even the same as the number of $Y_i$’s.)
- In some situations where $m = n$, there is a natural matching of each $X_i$ with one $Y_i$, forming the $n$ pairs $(X_i, Y_i)$.
- Examples:
  - $n$ patients in a clinical trial are given a sedative. The drowsiness of each patient $i$ is measured one hour ($X_i$) and two hours ($Y_i$) after treatment.
  - $n$ houses are selected at random in Philadelphia. At each house $i$, the radon level is measured in the basement ($X_i$) and on the highest floor ($Y_i$).
  - For each of $n$ sets of identical twins, we measure the body mass index of the older ($X_i$) and younger ($Y_i$) twin.
- Instead of comparing the sampled $X_i$’s to the sampled $Y_i$’s as a whole, as for unpaired data, we should compare the two members of each pair $(X_i, Y_i)$ with each other.

Why look at pairs?

[Figure: “Include pairing”. Drowsiness at +1h and +2h after treatment, plotted separately for each patient (Time by Patient; one panel per patient, e.g. P19 and P20).]

Difference between means: paired test

- The data: $(X_1, Y_1), \dots, (X_n, Y_n) \sim p(x, y)$, IID. Unlike with unpaired data, we do not assume that the $X_i$’s are independent of the $Y_i$’s.
- $E X_i = \mu_1$, $E Y_i = \mu_2$.
- Define the within-pair differences $D_1 = X_1 - Y_1, \dots, D_n = X_n - Y_n$.
- Let $\mu_D = E D_i$ (we know $\mu_D = \mu_1 - \mu_2$), and let $\sigma_D^2 = \mathrm{Var}(D_i)$.
- Under $H_0: \mu_D = \Delta_0$ (the same null as with unpaired data), and substituting the sample variance $S_D^2$ of the differences for $\sigma_D^2$, we get

$$T = \frac{\bar{D} - \Delta_0}{S_D/\sqrt{n}} \approx N(0, 1)$$

for large samples. For small samples, $T \sim t_{n-1}$, once we add the assumption that each $D_i$ is normal.

- As before, with $c_\alpha$ denoting the upper-$\alpha$ critical value:
  - $H_A: \mu_D \neq \Delta_0$ ⇒ reject when $|T| > c_{\alpha/2}$
  - $H_A: \mu_D > \Delta_0$ ⇒ reject when $T > c_\alpha$
  - $H_A: \mu_D < \Delta_0$ ⇒ reject when $T < -c_\alpha$.
- As before, set $c_\alpha$ using $N(0, 1)$ in large samples and $t_{n-1}$ in small samples.
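The paired test reduces to a one-sample analysis of the differences $D_i$. Below is a minimal standard-library Python sketch of it; the function names `paired_t_statistic` and `large_sample_reject`, the parameter `delta0` (standing for the null value $\Delta_0$), and the data are all hypothetical. For small samples one would compare $T$ with a $t_{n-1}$ quantile instead (for instance from `scipy.stats.t.ppf`, which this sketch does not use).

```python
import math
from statistics import NormalDist, mean, stdev


def paired_t_statistic(x, y, delta0=0.0):
    """Return (T, df) for H0: mu_D = delta0, where D_i = X_i - Y_i."""
    if len(x) != len(y):
        raise ValueError("paired data need equally many X's and Y's")
    d = [xi - yi for xi, yi in zip(x, y)]  # within-pair differences D_i
    n = len(d)
    s_d = stdev(d)  # sample SD of the differences (divisor n - 1)
    t = (mean(d) - delta0) / (s_d / math.sqrt(n))
    return t, n - 1  # df of the small-sample t_{n-1} reference


def large_sample_reject(t, alpha=0.05, alternative="two-sided"):
    """Large-sample decision rule using N(0, 1) critical values."""
    q = NormalDist().inv_cdf  # standard normal quantile function
    if alternative == "two-sided":
        return abs(t) > q(1 - alpha / 2)
    if alternative == "greater":
        return t > q(1 - alpha)
    if alternative == "less":
        return t < -q(1 - alpha)
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


# Hypothetical drowsiness scores at +1h (x) and +2h (y) for four patients
x = [10, 12, 11, 13]
y = [9, 11, 11, 12]
t, df = paired_t_statistic(x, y)
print(t, df)  # 3.0 3
```

With only four pairs the $t_3$ reference applies. Its two-sided 5% critical value is about 3.18, so $T = 3.0$ would not reject, even though it exceeds the normal cutoff of 1.96.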

Collect paired or unpaired data?

- We can be more precise about the benefits of paired data.
- Just as $\mathrm{Var}(\bar{X}) = \sigma_X^2/m$ and $\mathrm{Var}(\bar{Y}) = \sigma_Y^2/n$, we have $\mathrm{Var}(\bar{D}) = \sigma_D^2/n$.
- But $\sigma_D^2 = \mathrm{Var}(D_i) = \mathrm{Var}(X_i - Y_i) = \sigma_X^2 + \sigma_Y^2 - 2\,\mathrm{Cov}(X_i, Y_i)$.
- So when $X_i$ and $Y_i$ are positively correlated, $\mathrm{Var}(\bar{D})$ is smaller than when they are uncorrelated.
- Very often the within-pair correlation is positive rather than negative. In a paired analysis, the positive correlation is reflected in $S_D^2$, which usually turns out smaller than $S_X^2 + S_Y^2$.
- However, with $n$ $X_i$’s and $n$ $Y_i$’s we have $2n$ unpaired observations but only $n$ pairs. So in a small sample the reference $t$ distribution has up to $2n - 2$ degrees of freedom for unpaired data, which is better than $n - 1$ for paired data.
- Despite this, for moderate $n$ the (usually) smaller SE of $\bar{D}$ leads to increased testing power via a larger $T$ statistic, and to narrower CIs.
- If $X_i$ and $Y_i$ are negatively correlated in a paired dataset, you still need to do a paired analysis, but you would have been better off without pairing!

Inference about two population proportions

- Recall the one-sample setup: $p$ is the proportion of “successes” in a population, i.e. the fraction of the population possessing some characteristic.
- Now we have two populations: the success fraction in population “A” is $p_1$, and in “B” it is $p_2$.
- We draw $m$ times independently from population “A”, and call $X$ the number of successes in the sample. Similarly, we make $n$ independent draws from population “B”, and call $Y$ the number of successes.
- Then $X \sim \mathrm{Bin}(m, p_1)$ and $Y \sim \mathrm{Bin}(n, p_2)$. We assume the two samples are drawn independently, so $X$ is independent of $Y$.
- Example: in 1954, a large-scale randomized controlled experiment was conducted to study Salk’s polio vaccine. Randomizing a child to the control (placebo) group is like drawing from a population having probability $p_1$ of “success” (contracting polio). Randomizing a child to the treatment (vaccination) group is like drawing from a population with probability $p_2$ of contracting polio.
- In 1954, people were very interested in $p_1 - p_2$.

- If $p_1 = p_2 = p$, then $X \sim \mathrm{Bin}(m, p)$ and, independently, $Y \sim \mathrm{Bin}(n, p)$. But then $X + Y \sim \mathrm{Bin}(m + n, p)$.
- So a natural estimator of $p$ under $H_0: p_1 = p_2 = p$ is just $\hat{p} = (X + Y)/(m + n)$.
- With the sample proportions $\hat{p}_1 = X/m$ and $\hat{p}_2 = Y/n$, this results in the large-sample test statistic

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/m + 1/n)}}$$

and the procedures
  - $H_A: p_1 - p_2 \neq 0$ ⇒ reject when $|Z| > z_{\alpha/2}$
  - $H_A: p_1 - p_2 > 0$ ⇒ reject when $Z > z_\alpha$
  - $H_A: p_1 - p_2 < 0$ ⇒ reject when $Z < -z_\alpha$.
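The pooled test can be sketched in a few lines of standard-library Python. The function name `two_proportion_z_test` and the counts in the usage line are hypothetical (they are not the Salk trial data), and the sketch assumes $0 < \hat{p} < 1$ so that the denominator is nonzero.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x, m, y, n, alpha=0.05, alternative="two-sided"):
    """Pooled large-sample test of H0: p1 = p2, with X ~ Bin(m, p1), Y ~ Bin(n, p2)."""
    p1_hat, p2_hat = x / m, y / n
    p_hat = (x + y) / (m + n)  # pooled estimate of the common p under H0
    z = (p1_hat - p2_hat) / math.sqrt(p_hat * (1 - p_hat) * (1 / m + 1 / n))
    q = NormalDist().inv_cdf  # standard normal quantile function
    if alternative == "two-sided":
        reject = abs(z) > q(1 - alpha / 2)
    elif alternative == "greater":
        reject = z > q(1 - alpha)
    elif alternative == "less":
        reject = z < -q(1 - alpha)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return z, reject


# Hypothetical counts: 30 successes in 100 draws from A, 15 in 100 from B
z, reject = two_proportion_z_test(30, 100, 15, 100)
print(round(z, 2), reject)  # 2.54 True
```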

Two population proportions: large-sample CIs

- With both $m$ and $n$ large, and substituting sample proportions in the denominator,

$$Z = \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\sqrt{\hat{p}_1(1 - \hat{p}_1)/m + \hat{p}_2(1 - \hat{p}_2)/n}} \approx N(0, 1).$$

- Try this at home: write down the $100(1 - \alpha)\%$ pivoting confidence statement for $p_1 - p_2$, using $Z$. Rearrange it to obtain the confidence interval

$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{m} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n}}.$$
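One way to carry out the rearrangement is sketched below, writing $\widehat{\mathrm{SE}} = \sqrt{\hat{p}_1(1 - \hat{p}_1)/m + \hat{p}_2(1 - \hat{p}_2)/n}$ (a shorthand introduced here, not in the notes) for the denominator of $Z$. Multiplying the inequalities through by $\widehat{\mathrm{SE}} > 0$ and isolating $p_1 - p_2$ gives

$$\begin{aligned}
1 - \alpha &\approx P\left(-z_{\alpha/2} \le \frac{\hat{p}_1 - \hat{p}_2 - (p_1 - p_2)}{\widehat{\mathrm{SE}}} \le z_{\alpha/2}\right) \\
&= P\left(-z_{\alpha/2}\,\widehat{\mathrm{SE}} \le \hat{p}_1 - \hat{p}_2 - (p_1 - p_2) \le z_{\alpha/2}\,\widehat{\mathrm{SE}}\right) \\
&= P\left(\hat{p}_1 - \hat{p}_2 - z_{\alpha/2}\,\widehat{\mathrm{SE}} \le p_1 - p_2 \le \hat{p}_1 - \hat{p}_2 + z_{\alpha/2}\,\widehat{\mathrm{SE}}\right),
\end{aligned}$$

so the random endpoints $\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\,\widehat{\mathrm{SE}}$ form an approximate $100(1 - \alpha)\%$ confidence interval.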

- A suggested correction when $m$ and $n$ are not huge: replace $\hat{p}_1 = X/m$ with $\tilde{p}_1 = (X + 1)/(m + 2)$, and $\hat{p}_2$ with the analogous $\tilde{p}_2 = (Y + 1)/(n + 2)$. This can improve the quality of the normal approximation, and hence the accuracy of the CI’s coverage.
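A minimal Python sketch of the large-sample interval, with an optional flag for the suggested $(X + 1)/(m + 2)$ correction, follows. The function name, the `plus_one` flag, and the counts are hypothetical. As an assumption, the sketch plugs $\tilde{p}_1, \tilde{p}_2$ into the same formula and keeps $m$ and $n$ in the SE; some presentations of this adjustment also replace $m$ and $n$ by $m + 2$ and $n + 2$ there.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x, m, y, n, conf=0.95, plus_one=False):
    """Large-sample CI for p1 - p2 from X successes in m draws and Y in n draws."""
    if plus_one:
        # Suggested correction: add one success and one failure to each sample
        p1, p2 = (x + 1) / (m + 2), (y + 1) / (n + 2)
    else:
        p1, p2 = x / m, y / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    # Assumption: m and n are kept as-is in the SE even when plus_one=True
    half_width = z * math.sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)
    return p1 - p2 - half_width, p1 - p2 + half_width


# Hypothetical counts: 30 of 100 successes in A, 15 of 100 in B
lo, hi = two_proportion_ci(30, 100, 15, 100)
print(round(lo, 3), round(hi, 3))  # 0.036 0.264
```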

- When $m$ or $n$ is quite small? A different approach is needed, which we won’t cover. (Take more statistics courses.)
