An explanation of how to compare two population proportions using hypothesis testing and confidence intervals. It includes examples of calculating z-statistics, p-values, and confidence intervals for two different populations. The document also discusses the assumptions and requirements for using these methods.
Due March 17th: 18:1-4, 14, 15, 18, 19, 24, 35, 36. 19: 8, 9, 12, 15 Excel assignment. 19: 20a, 21.
In a certain population, we expect 46% of people to get a cold in a 3-month period. We give 264 volunteers 1000 mg of vitamin C per day for 3 months. At the end of the period, 119 people (45%) have gotten colds, so the proportion of our sample that has gotten colds is $\hat{p} = .45$.
Let $p$ be the proportion of people being treated with vitamin C who will get colds. We would like to know if there is evidence that $p$ is smaller than $.46$. That is, we test $H_0: p = p_0 = .46$ against $H_a: p < .46$.
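$$ z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{.45 - .46}{\sqrt{.46(.54)/264}} \approx -0.33 $$
The corresponding P-value is $P(Z \le -0.33) \approx .37$. This is the probability (if vitamin C has no effect on the probability of getting a cold) that we would see this number of colds or fewer in our sample.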
In English: if vitamin C has no effect on the probability of getting a cold, we would see this number of colds or fewer among our volunteers 37% of the time. Since that is a high probability, we have no evidence that vitamin C affects the probability of getting a cold.
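The one-sample calculation above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names `normal_cdf` and `one_proportion_z` are made up for illustration, and the sample proportion is rounded to .45 as in the text.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def one_proportion_z(p_hat, p0, n):
    """z statistic for H0: p = p0, using the standard deviation under H0."""
    return (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)

# Values from the vitamin C example; p_hat is rounded to .45 as in the notes.
z = one_proportion_z(p_hat=0.45, p0=0.46, n=264)

# Lower-tail P-value, because the alternative is p < .46.
p_value = normal_cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.2f}")
```

Using the unrounded proportion 119/264 instead of .45 moves z slightly toward zero, but the conclusion is the same.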
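Let X be a random variable on a large population which takes the values SUCCESS and FAILURE. Take an SRS of size $n$ from our population, and let $\hat{p}$ be the proportion of the sample which is “SUCCESS.” So
$$ \hat{p} = \frac{\text{number of successes in the sample}}{\text{total size of the sample}} $$
If $p$ is the proportion of successes in the population, the sampling distribution of $\hat{p}$ has standard deviation
$$ \sigma = \sqrt{\frac{p(1 - p)}{n}} $$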
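One way to make the standard deviation formula concrete is to simulate many samples and compare the spread of the resulting sample proportions with $\sqrt{p(1-p)/n}$. This is only an illustrative sketch: the helper `simulate_sample_proportions`, the number of repetitions, and the seed are all assumptions chosen for the demonstration, and the values $p = .46$, $n = 264$ are borrowed from the vitamin C example.

```python
import random
from math import sqrt
from statistics import pstdev

def simulate_sample_proportions(p, n, reps, seed=0):
    """Draw `reps` samples of size n and return each sample's proportion of successes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.46, 264
p_hats = simulate_sample_proportions(p, n, reps=2000)

theoretical_sd = sqrt(p * (1 - p) / n)   # the formula from the notes
simulated_sd = pstdev(p_hats)            # spread observed in the simulation

print(f"theoretical: {theoretical_sd:.4f}, simulated: {simulated_sd:.4f}")
```

The two numbers should agree to about two decimal places; more repetitions bring them closer.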
We consider a situation where we wish to compare two proportions. Typically we would like to compare the effect of two different treatments, either in an experiment or in an observation.
We assume we have two populations:
Population   Proportion   Sample size   Sample proportion
1            $p_1$        $n_1$         $\hat{p}_1$
2            $p_2$        $n_2$         $\hat{p}_2$
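We’d like to use $\hat{p}_1 - \hat{p}_2$ to estimate $p_1 - p_2$. The critical observations are the following: for independent samples, $\hat{p}_1 - \hat{p}_2$ has mean $p_1 - p_2$ and standard deviation
$$ \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} $$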
To estimate confidence intervals, we could proceed as usual, using for our error term
$$ SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} $$
This is the large-sample standard error. We can use it if each population is at least 10 times as large as its sample, and each sample is large enough to contain at least 10 successes and 10 failures.
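The success/failure condition and the large-sample standard error can be sketched in a few lines. The function names here are hypothetical. The Gallup counts are taken from the gun-ownership example later in these notes; the count of 7 successes in 135 is a made-up illustration of a sample that fails the check. The condition on population size cannot be checked from the counts alone, so it is left out.

```python
from math import sqrt

def enough_successes_and_failures(successes, n, minimum=10):
    """True if the sample holds at least `minimum` successes and `minimum` failures."""
    return successes >= minimum and (n - successes) >= minimum

def large_sample_se(x1, n1, x2, n2):
    """Large-sample standard error of p_hat_1 - p_hat_2 from success counts x1, x2."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Gallup counts from the gun-ownership example in these notes.
gallup_ok = (enough_successes_and_failures(408, 1134)
             and enough_successes_and_failures(431, 1134))
gallup_se = large_sample_se(408, 1134, 431, 1134)

# Illustrative (assumed) small sample: only 7 successes, so the check fails.
small_ok = enough_successes_and_failures(7, 135)

print(gallup_ok, round(gallup_se, 4), small_ok)
```

When the check fails, the large-sample interval is not trustworthy, which is one motivation for an adjusted interval.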
It turns out that we get more accurate results by using the “plus four” confidence interval, which adds 4 imaginary observations: 2 to our first sample and 2 to our second sample. So we replace $n_1$ above with $n_1 + 2$ and $n_2$ with $n_2 + 2$. We replace $\hat{p}_1$ with $\tilde{p}_1$, the proportion of successes after adding two observations (one success and one failure) to the first sample, and similarly $\hat{p}_2$ with $\tilde{p}_2$. So we get
$$ SE = \sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}} $$
Then if $z^*$ is the critical value for confidence level $C$, we get that $p_1 - p_2$ is
$$ \tilde{p}_1 - \tilde{p}_2 \pm z^* \, SE $$
with confidence $C$.
Example: Problem 19.3. Note that a “success” in this problem is a positive drug test result.
Population          Proportion   Sample size   Sample proportion (plus four)
Tested athletes     $p_1$        135           $\tilde{p}_1 = 8/137 = .0584$
Untested athletes   $p_2$        141           $\tilde{p}_2 = 28/143 = .196$
Then $p_1 - p_2$ is
$$ \tilde{p}_1 - \tilde{p}_2 \pm 1.96 \sqrt{\frac{.0584(1 - .0584)}{137} + \frac{.196(1 - .196)}{143}} = -.1374 \pm 1.96(.0388) = -.1374 \pm .076 $$
with 95% confidence.
So if these samples were random, $p_1 - p_2$ is between $-.2134$ and $-.0614$ with 95% confidence.
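The plus four interval for this example can be reproduced with a short function. This is a sketch under stated assumptions: `plus_four_ci` is a hypothetical name, and the raw counts (7 positives among 135 tested athletes, 27 among 141 untested) are not stated in the notes but inferred by undoing the plus four adjustment in the fractions 8/137 and 28/143.

```python
from math import sqrt

def plus_four_ci(x1, n1, x2, n2, z_star=1.96):
    """Plus four confidence interval for p1 - p2 from success counts x1, x2.

    One imaginary success and one imaginary failure are added to each sample.
    """
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    se = sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    diff = p1 - p2
    margin = z_star * se
    return diff - margin, diff + margin

# Assumed raw counts, inferred from the plus four fractions 8/137 and 28/143.
low, high = plus_four_ci(x1=7, n1=135, x2=27, n2=141)

print(f"95% interval for p1 - p2: ({low:.4f}, {high:.4f})")
```

The default `z_star=1.96` is the 95% critical value used in the example; another confidence level needs a different critical value.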
Example: Hypothesis testing for two-sample proportions
A study in Britain (published in The Lancet) examined 120,633 women pregnant with their second child. Of the 17,754 who had delivered their first child by Caesarean section, there were 68 stillbirths before labor. Of the remaining 102,879, there were 244 stillbirths before labor.
We wish to know if this is evidence that a Caesarean section increases the likelihood of a stillbirth in the next pregnancy.
Population     Proportion   Sample size   Sample proportion
C-section      $p_1$        17,754        $\hat{p}_1 = 68/17{,}754 = .00383$
No C-section   $p_2$        102,879       $\hat{p}_2 = 244/102{,}879 = .00237$
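In addition to the numbers above, we’ll need a number called the pooled sample proportion. This is
$$ \hat{p} = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}} = \frac{68 + 244}{17{,}754 + 102{,}879} = .00259 $$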
$p_1$ is the proportion of pregnancies preceded by a C-section which result in stillbirths.
$p_2$ is the proportion of pregnancies not preceded by a C-section which result in stillbirths. We test $H_0: p_1 = p_2$ against $H_a: p_1 > p_2$.
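$$ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{.00383 - .00237}{\sqrt{.00259(1 - .00259)\left(\frac{1}{17{,}754} + \frac{1}{102{,}879}\right)}} \approx 3.5 $$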
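The pooled z statistic for the stillbirth data can be computed directly from the counts. This is a minimal sketch with hypothetical function names. The one-sided upper-tail P-value is an assumption about how the test is finished, chosen because the question asks whether a C-section increases the risk.

```python
from math import erfc, sqrt

def pooled_two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2 using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

def upper_tail(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * erfc(z / sqrt(2.0))

# Counts from the stillbirth study described in the notes.
z = pooled_two_proportion_z(x1=68, n1=17754, x2=244, n2=102879)

# Upper tail, because the alternative is p1 > p2.
p_value = upper_tail(z)

print(f"z = {z:.2f}, one-sided P-value = {p_value:.5f}")
```

Working from the raw counts rather than the rounded proportions avoids rounding error in the third decimal place of z.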
NOTE: this method is appropriate if
February 1999: Gallup polls 1134 adults and finds that 408 own a gun. October 2004: Gallup polls 1134 adults and finds that 431 own a gun.
Is a newspaper correct to run the headline “Percentage of gun owners rising”?
Let $p_1$ be the percentage of the population owning a gun in 1999 and $p_2$ the percentage in 2004. We wish to test the hypothesis that $p_2 > p_1$.
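$$ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{1134} + \frac{1}{1134}\right)}} = \frac{\frac{408}{1134} - \frac{431}{1134}}{\sqrt{.37 \cdot .63 \cdot \frac{2}{1134}}} \approx -1.0 $$
where $\hat{p} = (408 + 431)/(1134 + 1134) \approx .37$ is the pooled sample proportion.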
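The arithmetic for the Gallup comparison can be carried out step by step. This sketch is an illustration, not the notes' own working: the variable names are made up, and the choice of a lower-tail P-value is an assumption that follows from the sign convention above, where evidence for $p_2 > p_1$ shows up as a negative z.

```python
from math import erf, sqrt

n = 1134                                  # both polls have the same sample size
owners_1999, owners_2004 = 408, 431       # counts from the notes

p_hat_1 = owners_1999 / n
p_hat_2 = owners_2004 / n
pooled = (owners_1999 + owners_2004) / (2 * n)   # pooled sample proportion

# Pooled z statistic; 1/n + 1/n simplifies to 2/n for equal sample sizes.
z = (p_hat_1 - p_hat_2) / sqrt(pooled * (1 - pooled) * (2 / n))

# Lower-tail probability P(Z <= z), since z uses p_hat_1 - p_hat_2
# and the alternative is p2 > p1.
p_value = 0.5 * (1 + erf(z / sqrt(2)))

print(f"pooled = {pooled:.3f}, z = {z:.2f}, P-value = {p_value:.3f}")
```

A P-value well above a cutoff such as .05 would mean the two polls do not give strong evidence of a rise, so the headline would not be supported by these data alone.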