Hypothesis Testing on Proportions: Testing Equality of Two Proportions and Chi-Square Test, Exams of Biostatistics

University of Wisconsin (UW) - Madison Biostatistics

An explanation of hypothesis testing on proportions, focusing on testing the equality of two proportions and the chi-square test. It includes examples and calculations for testing the difference in proportions of people suffering head injuries with and without helmet use, as well as an extension to more than two categories. The document also discusses the assumptions and limitations of the chi-square test.

Typology: Exams

Pre 2010

Uploaded on 09/02/2009

koofers-user-7lw 🇺🇸


Recall: Hypothesis testing on Proportions

Let p be the true proportion of times that an event occurs in a population. Suppose we would like to test

$$H_0: p = p_0 \quad \text{against} \quad H_A: p \neq p_0.$$

We collect a sample of size $n$ from the population of interest. Under the null, the estimate of the standard error of $\hat{p}$ takes the form
$$\sqrt{\frac{p_0(1 - p_0)}{n}}.$$

The appropriate test statistic is
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}.$$
If the null is true ($p = p_0$), this statistic is approximately standard normal for large $n$ (we defined how large $n$ needs to be last time).

An approximate $100(1 - \alpha)\%$ confidence interval for $p$:
$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right)$$
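The one-sample statistic and the confidence interval can be sketched in Python using only the standard library. The function names `one_sample_z` and `wald_ci` are our own and do not come from any course software. Note that the test uses the null value $p_0$ in its standard error, while the interval uses $\hat{p}$. As an illustration, the interval is computed for the injury proportion 17/147 among helmet wearers in the bike-helmet example that follows.

```python
import math
from statistics import NormalDist


def one_sample_z(p_hat: float, p0: float, n: int) -> float:
    """Z statistic for H0: p = p0, using the null standard error sqrt(p0(1 - p0)/n)."""
    se_null = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se_null


def wald_ci(p_hat: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Approximate 100(1 - alpha)% interval for p, using the estimated SE sqrt(p_hat(1 - p_hat)/n)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 when alpha = 0.05
    half_width = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Helmet wearers in the bike-helmet example: 17 serious head injuries out of 147
lower, upper = wald_ci(17 / 147, 147)
print(f"approximate 95% CI: ({lower:.3f}, {upper:.3f})")
```

With these inputs the interval comes out at roughly (0.064, 0.167).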

Suppose we would like to test

$$H_0: p_1 = p_2 \quad \text{against} \quad H_A: p_1 \neq p_2.$$

We collect a sample of size $n_1$ from the first population and a sample of size $n_2$ from the second population. Under the null, the estimate of the standard error of the difference $\hat{p}_1 - \hat{p}_2$ takes the form
$$\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},$$
where $\hat{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$ is the pooled proportion. The appropriate test statistic is
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},$$
where $p_1 - p_2 = 0$ under the null. If $n_1$ and $n_2$ are large, this statistic is approximately standard normal.

A few lectures ago, we considered the effectiveness of bike helmets in preventing head injury. In particular, we considered two random samples: one of size 147 from a population of people that wear helmets and the other of size 646 from a population of people that do not wear helmets. We record that 17 of the 147 suffered a serious head injury and 218 of the 646 suffered a serious head injury. We wanted to know if the proportion of serious head injuries was the same in the two populations.

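Recall the evaluated test statistic was as follows, with $\hat{p}_1 = 17/147 = 0.116$, $\hat{p}_2 = 218/646 = 0.337$ and pooled $\hat{p} = 235/793 = 0.296$:
$$z = \frac{0.116 - 0.337}{\sqrt{0.296(1 - 0.296)\left(\frac{1}{147} + \frac{1}{646}\right)}} = -5.3$$
p-value $= P(Z \le -5.3) + P(Z \ge 5.3) = 5.8 \cdot 10^{-8} + 5.8 \cdot 10^{-8} = 1.16 \cdot 10^{-7}$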

The null was rejected at significance level α = 0.01.
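Below is a minimal sketch of the pooled two-proportion z test in Python. The function name `two_proportion_z` is hypothetical. It computes the pooled proportion from the raw counts. It then gets the two-sided p-value from the standard normal tail via `math.erfc`, using $P(|Z| \ge |z|) = \operatorname{erfc}(|z|/\sqrt{2})$.

```python
import math


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-sample z test of H0: p1 = p2. Returns (z, two-sided p-value)."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # equals (n1*p1_hat + n2*p2_hat) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se  # hypothesised difference p1 - p2 is 0 under H0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for standard normal Z
    return z, p_value


# Helmet wearers: 17 of 147 injured; non-wearers: 218 of 646 injured
z, p = two_proportion_z(17, 147, 218, 646)
print(f"z = {z:.2f}, p-value = {p:.2e}")
```

Working from the unrounded proportions, this gives $z \approx -5.32$ and a p-value of about $1.1 \cdot 10^{-7}$. That agrees with the hand calculation up to rounding.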

Another way to approach the same question is to consider a random sample of 793 bike riders and classify the riders using two questions:

1. Do you wear a helmet?
2. Have you suffered a serious head injury?

                 Wearing Helmet
Head Injury      (Y)     (N)   total
(Y)               17     218     235
(N)              130     428     558
total            147     646     793

What would we expect this table to look like if the null were true?

                 Wearing Helmet
Head Injury      (Y)     (N)   total
(Y)               NA      NA     235
(N)               NA      NA     558
total            147     646     793

If the null were true, then the proportion of people suffering head injuries would be the same in the two populations (those who wear helmets and those who do not). The overall proportion of people suffering head injuries is 235/793 = 0.296, and the proportion of people not suffering head injuries is 558/793 = 0.704.

As a result, if the null is true, then for the 147 people wearing helmets we would expect 29.6% (43.6) of them to suffer a head injury and 70.4% (103.4) of them to be free of head injury. The same reasoning applies to the 646 people not wearing helmets (191.4 and 454.6). So, if the null is true, we'd expect the table to look like the one below:

                 Wearing Helmet
Head Injury      (Y)     (N)   total
(Y)             43.6   191.4     235
(N)            103.4   454.6     558
total            147     646     793
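The reasoning above can be written as a short Python sketch: apply the overall injury proportion to each group. `expected_2x2` is a hypothetical helper name, and it assumes the first row of the table is the injured row.

```python
def expected_2x2(injured_total: int, group_sizes: list[int]) -> list[list[float]]:
    """Expected counts under H0: every group has the overall injury proportion."""
    n = sum(group_sizes)
    p_injury = injured_total / n  # 235 / 793, about 0.296, in the helmet example
    injured_row = [p_injury * size for size in group_sizes]
    uninjured_row = [(1 - p_injury) * size for size in group_sizes]
    return [injured_row, uninjured_row]


# Columns: helmet (147 riders), no helmet (646 riders); 235 injuries in total
for row in expected_2x2(235, [147, 646]):
    print([round(e, 1) for e in row])
```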

How far is this expected table from the observed table?

Expected:

                 Wearing Helmet
Head Injury      (Y)     (N)   total
(Y)             43.6   191.4     235
(N)            103.4   454.6     558
total            147     646     793

Observed:

                 Wearing Helmet
Head Injury      (Y)     (N)   total
(Y)               17     218     235
(N)              130     428     558
total            147     646     793

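You could think about summing the squared differences between the four cells in the two tables:
$$(17 - 43.6)^2 + (218 - 191.4)^2 + (130 - 103.4)^2 + (428 - 454.6)^2.$$
Pearson's chi-square statistic also divides each squared difference by the corresponding expected count. As a result, a given discrepancy weighs more in a cell where few observations are expected. Under the null,
$$X^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}$$
is approximately chi-square ($\chi^2$) distributed with $(2 - 1)\cdot(2 - 1) = 1$ degree of freedom. Here $O_i$ and $E_i$ are the observed and expected counts in cell $i$.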

For this example, the value of the test statistic is
$$x^2 = \frac{(17 - 43.6)^2}{43.6} + \frac{(218 - 191.4)^2}{191.4} + \frac{(130 - 103.4)^2}{103.4} + \frac{(428 - 454.6)^2}{454.6} = 28.32$$
p-value: $P(\chi^2_1 \ge 28.32) = 1.028 \cdot 10^{-7}$

The null is rejected and we conclude that there is an association between helmet wearing and suffering of a serious head injury.
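Pearson's statistic for a 2 × 2 table can be sketched as follows; `pearson_chi2_2x2` is our own name. For one degree of freedom, the chi-square upper tail equals a two-sided normal tail, $P(\chi^2_1 \ge x) = P(|Z| \ge \sqrt{x})$. That means the standard library's `math.erfc` is enough for the p-value.

```python
import math


def pearson_chi2_2x2(table: list[list[int]]) -> tuple[float, float]:
    """Pearson X^2 for a 2x2 table and its p-value from chi-square with 1 df."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # unrounded expected count
            x2 += (table[i][j] - expected) ** 2 / expected
    # With 1 df: P(chi2_1 >= x2) = P(|Z| >= sqrt(x2)) = erfc(sqrt(x2 / 2))
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value


# Rows: head injury yes / no; columns: helmet yes / no
x2, p = pearson_chi2_2x2([[17, 218], [130, 428]])
print(f"X^2 = {x2:.2f}, p-value = {p:.2e}")
```

Because it uses unrounded expected counts, the sketch reports $X^2 \approx 28.26$ rather than the hand-computed 28.32. For a 2 × 2 table, Pearson's $X^2$ equals the square of the pooled two-proportion $z$ statistic ($5.316^2 \approx 28.26$). So the chi-square test and the two-sided z test give the same p-value; the two p-values quoted in these notes differ only because of rounding.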

Note: Since we are approximating the distribution of a statistic computed from discrete counts with a continuous distribution, a continuity correction can be applied, which may make the approximation a little better. Yates proposed such a correction. Under the null,
$$X^2 = \sum_{i=1}^{4} \frac{(|O_i - E_i| - 0.5)^2}{E_i}$$
is approximately chi-square ($\chi^2$) distributed with $(2 - 1)\cdot(2 - 1)$ degrees of freedom. In the example above, the value of this corrected statistic is 27.27 and the p-value is $1.769 \cdot 10^{-7}$. In practice, you will often see this correction applied to 2 × 2 tables.
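The Yates-corrected statistic changes only one term per cell. This sketch uses a hypothetical function name, `yates_chi2_2x2`. It applies the correction formula exactly as written, with no special handling of cells where $|O_i - E_i| < 0.5$.

```python
def yates_chi2_2x2(table: list[list[int]]) -> float:
    """Yates-corrected X^2: sum over the four cells of (|O - E| - 0.5)^2 / E."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            # Formula applied as written; no guard for |O - E| < 0.5
            total += (abs(table[i][j] - expected) - 0.5) ** 2 / expected
    return total


print(f"{yates_chi2_2x2([[17, 218], [130, 428]]):.2f}")
```

With unrounded expected counts this gives about 27.20. That is slightly below the value obtained from expected counts rounded to one decimal place.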

The proportion of people with breast cancer is 3220/13,465 = 0.2391, and the proportion of people with no breast cancer is 10,245/13,465 = 0.7609. Suppose age at first birth has no impact on the proportions. Then, for the 1742 people with first birth under the age of 20, we expect approximately 24% ($1742 \cdot 3220/13{,}465 = 416.6$) to have breast cancer. We expect 76% of them ($1742 \cdot 10{,}245/13{,}465 = 1325.4$) to be free of breast cancer.

In general,

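The expected value in the (1,1) cell:
$$\frac{\text{first row total} \times \text{first column total}}{\text{grand total}} = \frac{3220 \times 1742}{13{,}465} = 416.6$$

The expected value in the (1,2) cell:
$$\frac{\text{first row total} \times \text{second column total}}{\text{grand total}} = \frac{3220 \times 5638}{13{,}465} = 1348.3$$

... verify the next few

The expected value in the (2,1) cell:
$$\frac{\text{second row total} \times \text{first column total}}{\text{grand total}} = \frac{10{,}245 \times 1742}{13{,}465} = 1325.4$$

... verify the next few

The expected value in the (2,5) cell:
$$\frac{\text{second row total} \times \text{fifth column total}}{\text{grand total}} = \frac{10{,}245 \times 626}{13{,}465} = 476.3$$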

The expected table is:

                              Age at first birth
Disease status          < 20    20-24    25-29    30-34     ≥ 35     total
(Breast Cancer +)      416.6   1348.3    933.6    371.9    149.7      3220
(Breast Cancer -)     1325.4   4289.7   2970.4   1183.1    476.3    10,245
total                   1742     5638     3904     1555      626    13,465
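The row-total × column-total rule can be checked with a few lines of Python. `expected_from_margins` is a hypothetical helper that needs only the margins, which is all the notes show here for the breast-cancer data.

```python
def expected_from_margins(row_totals: list[int], col_totals: list[int]) -> list[list[float]]:
    """E[i][j] = (row i total) * (column j total) / (grand total)."""
    grand = sum(row_totals)
    if grand != sum(col_totals):
        raise ValueError("row and column totals must have the same sum")
    return [[r * c / grand for c in col_totals] for r in row_totals]


rows = [3220, 10245]                  # breast cancer +, breast cancer -
cols = [1742, 5638, 3904, 1555, 626]  # age at first birth: <20, 20-24, 25-29, 30-34, >=35
for row in expected_from_margins(rows, cols):
    print([round(e, 1) for e in row])
```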

As before, we need to figure out if the observed table is “unusual” compared to this expected table.

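Under the null,
$$X^2 = \sum_{i=1}^{r \cdot c} \frac{(O_i - E_i)^2}{E_i}$$
is approximately chi-square ($\chi^2$) distributed with $(r - 1)\cdot(c - 1)$ degrees of freedom, where $r$ is the number of rows and $c$ is the number of columns.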
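The following is a general sketch for an $r \times c$ table; the function name `pearson_chi2` is our own. It builds each expected count from the margins and returns the statistic together with $(r - 1)(c - 1)$. A p-value for more than one degree of freedom needs a chi-square tail function, which the Python standard library does not provide (SciPy's `scipy.stats.chi2.sf` is one option). So the sketch stops at the statistic. The full observed breast-cancer table is not reproduced in these notes, so the usage line reuses the helmet table. For the 2 × 5 breast-cancer table, the degrees of freedom would be $(2 - 1)(5 - 1) = 4$.

```python
def pearson_chi2(table: list[list[int]]) -> tuple[float, int]:
    """Pearson X^2 for an r x c table, with (r - 1)(c - 1) degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for row, r_total in zip(table, row_totals):
        for observed, c_total in zip(row, col_totals):
            expected = r_total * c_total / n
            x2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return x2, df


# Helmet table: rows head injury yes / no, columns helmet yes / no
print(pearson_chi2([[17, 218], [130, 428]]))
```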

Here,
$$\sum_{i=1}^{r \cdot c} \frac{(O_i - E_i)^2}{E_i} = \frac{(320 - 416.6)^2}{416.6} + \frac{(1206 - 1348.3)^2}{1348.3} + \cdots + \frac{(406 - 476.3)^2}{476.3}$$

Consider another example (2 × 2). We want to figure out whether electronic fetal monitoring during labor affects the frequency of caesarean sections. A random sample of 5824 infants is taken; out of these, 2850 were monitored and 2974 were not. Results of C-sections are below.

                 Monitored
C-section      (Yes)     (No)    total
(Yes)            358      229      587
(No)            2492     2745     5237
total           2850     2974     5824

The expected table is

                 Monitored
C-section      (Yes)     (No)    total
(Yes)         287.25   299.75      587
(No)         2562.75  2674.25     5237
total           2850     2974     5824

and the value of the test statistic is
$$x^2 = \frac{(358 - 287.25)^2}{287.25} + \frac{(229 - 299.75)^2}{299.75} + \frac{(2492 - 2562.75)^2}{2562.75} + \frac{(2745 - 2674.25)^2}{2674.25}$$

The null is rejected with $p < 0.001$, and we conclude that there is an association between monitoring and C-sections.
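For a 2 × 2 table with cells $a, b$ (first row) and $c, d$ (second row), Pearson's statistic also has the closed form $X^2 = n(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]$. The sketch below evaluates the C-section data both ways as a check; the function names are hypothetical. By this arithmetic the statistic is about 37.95, consistent with $p < 0.001$.

```python
import math


def chi2_shortcut(a: int, b: int, c: int, d: int) -> float:
    """Pearson X^2 for [[a, b], [c, d]] via n(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d))."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_cellwise(a: int, b: int, c: int, d: int) -> float:
    """The same statistic computed cell by cell as the sum of (O - E)^2 / E."""
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total


# C-section table: rows C-section yes / no, columns monitored yes / no
x2 = chi2_shortcut(358, 229, 2492, 2745)
p_value = math.erfc(math.sqrt(x2 / 2))  # upper tail of chi-square with 1 df
print(f"X^2 = {x2:.2f}, p-value = {p_value:.1e}")
```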

We don't know (using tests on proportions) how strong the association is. Recall the odds ratio. If an event takes place with probability $p$, the odds in favor of the event are $\frac{p}{1 - p}$ to 1: $p = \frac{1}{2}$ implies 1 to 1 odds, and $p = \frac{2}{3}$ implies 2 to 1 odds. In this class, the odds ratio (OR) is the odds of disease among exposed individuals divided by the odds of disease among unexposed individuals.

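$$OR = \frac{P(\text{disease} \mid \text{exposed}) \,/\, \left(1 - P(\text{disease} \mid \text{exposed})\right)}{P(\text{disease} \mid \text{unexposed}) \,/\, \left(1 - P(\text{disease} \mid \text{unexposed})\right)}$$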

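Note that the OR is sometimes defined alternatively as
$$OR_{alt} = \frac{P(\text{exposure} \mid \text{disease}) \,/\, \left(1 - P(\text{exposure} \mid \text{disease})\right)}{P(\text{exposure} \mid \text{nondiseased}) \,/\, \left(1 - P(\text{exposure} \mid \text{nondiseased})\right)}$$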

Note that these definitions are equivalent (we showed that in an earlier lecture).

                 Exposure
Disease        (Yes)    (No)    total
(Yes)            a       b       a+b
(No)             c       d       c+d
total           a+c     b+d       n

We also showed that the odds ratio (if the table above was obtained from $n$ independent observations) could be estimated by $\dfrac{ad}{bc}$.
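The odds and both odds-ratio definitions can be compared numerically. In this sketch (all names hypothetical), the cells follow the layout of the table above: rows are disease yes/no and columns are exposure yes/no. So $a$ = diseased and exposed, $b$ = diseased and unexposed, $c$ = nondiseased and exposed, and $d$ = nondiseased and unexposed. Using the C-section counts, the three routes agree at about 1.72.

```python
def odds(p: float) -> float:
    """Odds in favor of an event with probability p, i.e. p / (1 - p) to 1."""
    return p / (1 - p)


def or_disease_odds(a: int, b: int, c: int, d: int) -> float:
    """Odds of disease among exposed divided by odds of disease among unexposed."""
    p_exposed = a / (a + c)    # P(disease | exposed)
    p_unexposed = b / (b + d)  # P(disease | unexposed)
    return odds(p_exposed) / odds(p_unexposed)


def or_exposure_odds(a: int, b: int, c: int, d: int) -> float:
    """Alternative definition: odds of exposure among diseased over nondiseased."""
    p_diseased = a / (a + b)     # P(exposure | disease)
    p_nondiseased = c / (c + d)  # P(exposure | nondiseased)
    return odds(p_diseased) / odds(p_nondiseased)


# C-section table: rows C-section yes / no, columns monitored yes / no
a, b, c, d = 358, 229, 2492, 2745
print(or_disease_odds(a, b, c, d), or_exposure_odds(a, b, c, d), a * d / (b * c))
```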

Back to the C-section example...

                 Monitored
C-section      (Yes)     (No)    total
(Yes)            358      229      587
(No)            2492     2745     5237
total           2850     2974     5824

The odds ratio here is
$$\frac{(358)(2745)}{(229)(2492)} = 1.72$$
The odds of being delivered by C-section are 1.72 times greater for fetuses that are being monitored. Consider two (unrelated) questions:

1. Does this imply that monitoring causes a condition which requires C-sections more often?
2. Is 1.72 significantly different from 1?
