
Comparing Two Population Proportions: Hypothesis Testing and Confidence Intervals - Prof. , Study notes of Probability and Statistics

An explanation of how to compare two population proportions using hypothesis testing and confidence intervals. It includes examples of calculating z-statistics, p-values, and confidence intervals for two different populations. The document also discusses the assumptions and requirements for using these methods.

Typology: Study notes (Pre 2010). Uploaded on 07/29/2009 by koofers-user-mti.

March 13, 2006, §19: Comparing two population proportions

Due March 17th: 18:1-4, 14, 15, 18, 19, 24, 35, 36. 19: 8, 9, 12, 15 Excel assignment. 19: 20a, 21.

Review: Hypothesis testing for one-sample proportions.

In a certain population, we expect 46% of people to get a cold in a 3-month period. We give 264 volunteers 1000 mg of vitamin C per day for 3 months. At the end of the period, 119 people (45%) have gotten colds, so the proportion of our sample that has gotten colds is $\hat p = 119/264 \approx 0.45$.

Let $p$ be the proportion of people taking vitamin C who will get colds. We would like to know whether there is evidence that $p$ is smaller than 0.46.

  1. Our null hypothesis is $H_0: p = p_0 = 0.46$; in other words, the percentage of people who get colds is unaffected by taking vitamin C. Our alternative hypothesis is $H_a: p < p_0 = 0.46$.
  2. Our z-statistic is

$$z = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.45 - 0.46}{\sqrt{0.46(0.54)/264}} = -0.326.$$

  3. Our P-value is $P(Z \le -0.326) = 0.3722$. This is the probability (if vitamin C has no effect on the probability of getting a cold) that we would see this number of colds or fewer in our sample.
  4. Conclusion: We do not have enough evidence to reject the null hypothesis.

In plain English: if vitamin C has no effect on the probability of getting a cold, we would see this number of colds or fewer among our volunteers 37% of the time. Since that probability is high, the data give no evidence that vitamin C affects the probability of getting a cold.
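The four steps above can be checked with a few lines of Python. This is a minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `one_prop_z_test` is hypothetical, and it only handles the one-sided alternative $H_a: p < p_0$ used in this example.

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(p_hat, p0, n):
    """One-sample z test for a proportion against Ha: p < p0.

    Returns the z-statistic and the lower-tail P-value P(Z <= z).
    """
    se = sqrt(p0 * (1 - p0) / n)   # standard deviation of p-hat if H0 is true
    z = (p_hat - p0) / se
    p_value = NormalDist().cdf(z)  # lower tail, because Ha is p < p0
    return z, p_value


# Vitamin C example with p-hat rounded to 0.45, as in the notes.
z, p = one_prop_z_test(0.45, 0.46, 264)
print(f"z = {z:.3f}, P = {p:.4f}")
```

This reproduces $z \approx -0.326$ and $P \approx 0.372$. Using the unrounded $\hat p = 119/264 \approx 0.4508$ instead gives $z \approx -0.30$ and $P \approx 0.38$; the conclusion is the same.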

Let $X$ be a variable defined on a large population that takes the values SUCCESS and FAILURE. Take an SRS of size $n$ from our population, and let $\hat p$ be the proportion of the sample that is "SUCCESS." So

$$\hat p = \frac{\text{number of successes in the sample}}{\text{total size of the sample}}.$$

  • For large $n$, the sampling distribution of $\hat p$ is approximately normal, $N(p, \sigma)$, where
    • $p$ is the proportion of the entire population that is "SUCCESS," and
    • $\sigma = \sqrt{p(1-p)/n}$.
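A quick simulation makes this normal approximation concrete. This sketch assumes the population is so large that each sampled person is a SUCCESS independently with probability $p$ (so it draws Bernoulli trials rather than sampling without replacement); the helper `simulate_p_hat` and the choice $p = 0.46$, $n = 264$ (borrowed from the vitamin C example) are illustrative.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_p_hat(p, n, reps=5000, seed=0):
    """Draw `reps` samples of size n, each observation a SUCCESS with
    probability p, and return the mean and standard deviation of p-hat."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(reps):
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return mean(p_hats), stdev(p_hats)


p, n = 0.46, 264
m, s = simulate_p_hat(p, n)
print(f"simulated: mean {m:.4f}, sd {s:.4f}")
print(f"theory:    mean {p:.4f}, sd {sqrt(p * (1 - p) / n):.4f}")
```

The simulated mean and standard deviation should land close to $p$ and $\sqrt{p(1-p)/n} \approx 0.0307$.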

§19: Comparing two proportions

We consider a situation where we wish to compare two proportions. Typically we would like to compare the effects of two different treatments, either in an experiment or in an observational study.

We assume we have two populations:

| Population | Population proportion | Sample size | Sample proportion |
|---|---|---|---|
| 1 | $p_1$ | $n_1$ | $\hat p_1$ |
| 2 | $p_2$ | $n_2$ | $\hat p_2$ |

We'd like to use $\hat p_1 - \hat p_2$ to estimate $p_1 - p_2$. The critical observations are the following:

  • if $n_1$ and $n_2$ are large enough, the distribution of $\hat p_1 - \hat p_2$ is approximately normal,
  • with mean $p_1 - p_2$
  • and standard deviation $\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$.
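The same kind of simulation can check the mean and standard deviation of $\hat p_1 - \hat p_2$. This is a sketch under assumptions: the two samples are independent, each observation is an independent Bernoulli draw, and the values $p_1 = 0.30$, $n_1 = 150$, $p_2 = 0.20$, $n_2 = 200$ are hypothetical, chosen only for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_diff(p1, n1, p2, n2, reps=4000, seed=0):
    """Simulate p1-hat minus p2-hat for two independent samples and return
    the mean and standard deviation of the simulated differences."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return mean(diffs), stdev(diffs)


# Hypothetical population proportions and sample sizes, for illustration only.
p1, n1, p2, n2 = 0.30, 150, 0.20, 200
m, s = simulate_diff(p1, n1, p2, n2)
theory_sd = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(f"simulated: mean {m:.4f}, sd {s:.4f}")
print(f"theory:    mean {p1 - p2:.4f}, sd {theory_sd:.4f}")
```

With these values the formula gives a standard deviation of about 0.0469, which the simulated value should approximate.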

To estimate confidence intervals, we could proceed as usual, giving the interval $(\hat p_1 - \hat p_2) \pm z^* SE$, where the standard error is

$$SE = \sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}.$$

This is the large-sample standard error. We can use it if each population is at least 10 times as large as its sample, and each sample contains at least 10 successes and 10 failures.

It turns out that we get more accurate results by using the "plus four" confidence interval (adding 4 imaginary observations: 2 to our first sample and 2 to our second sample). So we replace $n_1$ above with $n_1 + 2$ and $n_2$ with $n_2 + 2$; we replace $\hat p_1$ with $\tilde p_1$ (the sample proportion after adding two observations, one success and one failure), and similarly $\hat p_2$ with $\tilde p_2$. If sample $i$ has $X_i$ successes, then $\tilde p_i = (X_i + 1)/(n_i + 2)$. So we get

$$SE = \sqrt{\frac{\tilde p_1(1-\tilde p_1)}{n_1+2} + \frac{\tilde p_2(1-\tilde p_2)}{n_2+2}}.$$

Then if $z^*$ is the critical value for confidence level $C$, $p_1 - p_2$ lies in

$$(\tilde p_1 - \tilde p_2) \pm z^* SE$$

with confidence $C$.
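Both intervals can be wrapped in one small function. This is a sketch, not a library routine: the name `two_prop_ci` and its `plus_four` flag are hypothetical, and it assumes Python 3.8+ for `statistics.NormalDist`, whose `inv_cdf` supplies $z^*$ for any confidence level instead of a table lookup.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ci(x1, n1, x2, n2, conf=0.95, plus_four=True):
    """Confidence interval for p1 - p2 from success counts x1 of n1, x2 of n2.

    plus_four=True adds one success and one failure to each sample
    ("plus four"); plus_four=False gives the large-sample interval.
    """
    if plus_four:
        x1, n1, x2, n2 = x1 + 1, n1 + 2, x2 + 1, n2 + 2
    q1, q2 = x1 / n1, x2 / n2
    se = sqrt(q1 * (1 - q1) / n1 + q2 * (1 - q2) / n2)
    z_star = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.96 for conf = 0.95
    diff = q1 - q2
    return diff - z_star * se, diff + z_star * se


# Counts inferred from the plus-four fractions 8/137 and 28/143 in Problem 19.3.
print(two_prop_ci(7, 135, 27, 141))
```

For those counts (7 positives out of 135 and 27 out of 141, obtained by removing the added success from 8/137 and 28/143), the function returns roughly $(-0.213, -0.061)$.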

Example: Problem 19.3. Note that a “success” in this problem is a positive drug test result.

| Population | Population proportion | Sample size | Sample proportion (plus four) |
|---|---|---|---|
| Tested athletes | $p_1$ | 135 | $\tilde p_1 = 8/137 = 0.0584$ |
| Untested athletes | $p_2$ | 141 | $\tilde p_2 = 28/143 = 0.196$ |

Then $p_1 - p_2$ is

$$\frac{8}{137} - \frac{28}{143} \pm 1.96\sqrt{\frac{0.0584(1 - 0.0584)}{137} + \frac{0.196(1 - 0.196)}{143}} = -0.1374 \pm 1.96(0.0388) = -0.1374 \pm 0.076$$

with 95% confidence.

So, if these samples were random, $p_1 - p_2$ is between $-0.2134$ and $-0.0614$ with 95% confidence.
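The arithmetic in this example can be reproduced directly from the plus-four proportions in the table. A minimal Python sketch, hard-coding $z^* = 1.96$ for 95% confidence as the notes do:

```python
from math import sqrt

# Plus-four proportions from the table: 8/137 (tested) and 28/143 (untested).
q1, n1 = 8 / 137, 137
q2, n2 = 28 / 143, 143

se = sqrt(q1 * (1 - q1) / n1 + q2 * (1 - q2) / n2)
margin = 1.96 * se          # z* for 95% confidence
diff = q1 - q2

print(f"{diff:.4f} +/- {margin:.4f}")
print(f"({diff - margin:.4f}, {diff + margin:.4f})")
```

It prints a difference of about $-0.1374$ and a margin of about $0.076$, matching the hand calculation.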

Example: Hypothesis testing for two-sample proportions

A study in Britain (published in The Lancet) examined 120,633 women pregnant with their second child. Of the 17,754 who had delivered their first child by Caesarean section, there were 68 stillbirths before labor. Of the remaining 102,879, there were 244 stillbirths before labor.

We wish to know whether this is evidence that a Caesarean section increases the likelihood of a stillbirth in the next pregnancy.

| Population | Population proportion | Sample size | Sample proportion |
|---|---|---|---|
| C-section | $p_1$ | 17,754 | $\hat p_1 = 68/17{,}754 = 0.00383$ |
| No C-section | $p_2$ | 102,879 | $\hat p_2 = 244/102{,}879 = 0.00237$ |

In addition to the numbers above, we'll need a number called the pooled sample proportion. This is

$$\hat p = \frac{68 + 244}{17{,}754 + 102{,}879} = 0.00259.$$

$p_1$ is the proportion of pregnancies preceded by a C-section that result in stillbirths.

$p_2$ is the proportion of pregnancies not preceded by a C-section that result in stillbirths.

  1. Our null hypothesis is that having a C-section does not affect the later probability of stillbirth, so $H_0: p_1 = p_2$. Our alternative hypothesis is $H_a: p_1 > p_2$ (that C-sections are associated with higher rates of stillbirth in the subsequent pregnancy).
  2. We calculate a two-sample z-statistic as follows:

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.00383 - 0.00237}{\sqrt{0.00259(1 - 0.00259)\left(\frac{1}{17{,}754} + \frac{1}{102{,}879}\right)}} = 3.535.$$

  3. We now calculate our P-value, $P(Z \ge 3.535)$. This is the probability (if $H_0$ is true) that we would see at least this large an excess of stillbirths in the C-section group. $P(Z \ge 3.535) = 0.0002$.
  4. Conclusion: If $H_0$ is true, then the chance of results like the ones we see is 0.0002. This is strong evidence against $H_0$. In words: if C-sections are not associated with higher rates of stillbirth, then the probability of a result like the one we've seen is 0.0002. We consider this strong evidence that C-sections are associated with higher rates of stillbirth.

NOTE: this method is appropriate if

  • The populations are at least 10 times as large as the samples.
  • There are at least 5 failures and at least 5 successes in each sample.
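The four steps of the two-sample test, including the pooled proportion, can be sketched as follows. This is an illustration under stated assumptions: `two_prop_z_test` is a hypothetical name, it requires Python 3.8+ for `statistics.NormalDist`, it checks only the 5-successes/5-failures condition (the population-size condition cannot be checked from the counts), and it supports only the one-sided alternatives used in these notes.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="greater"):
    """Two-sample z test of H0: p1 = p2 using the pooled proportion.

    alternative="greater" tests Ha: p1 > p2; "less" tests Ha: p1 < p2.
    """
    # Condition from the NOTE: at least 5 successes and 5 failures per sample.
    for x, n in ((x1, n1), (x2, n2)):
        if min(x, n - x) < 5:
            raise ValueError("need at least 5 successes and 5 failures in each sample")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # pooled sample proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    if alternative == "greater":
        p_value = 1 - NormalDist().cdf(z)   # upper tail
    elif alternative == "less":
        p_value = NormalDist().cdf(z)       # lower tail
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return z, p_value


# C-section example: 68 of 17,754 versus 244 of 102,879.
z, p = two_prop_z_test(68, 17_754, 244, 102_879, alternative="greater")
print(f"z = {z:.3f}, P = {p:.4f}")
```

For the C-section data it gives $z \approx 3.53$ and $P \approx 0.0002$; the small difference from 3.535 comes from rounding $\hat p_1$, $\hat p_2$, and $\hat p$ in the hand calculation.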

Is gun ownership rising?

February 1999: Gallup polled 1134 adults and found that 408 own a gun. October 2004: Gallup polled 1134 adults and found that 431 own a gun.

Is a newspaper correct to run the headline "Percentage of gun owners rising"?

Let $p_1$ be the proportion of the population owning a gun in 1999 and $p_2$ the proportion in 2004. We wish to test the hypothesis that $p_2 > p_1$.

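  • $H_0: p_1 = p_2$, $H_a: p_2 > p_1$.
  • Test statistic:
    • the pooled proportion is $\hat p = \frac{408 + 431}{2 \cdot 1134} = 0.370$;
    • $\hat p_1 = 408/1134 = 0.360$;
    • $\hat p_2 = 431/1134 = 0.380$.

$$z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{1134} + \frac{1}{1134}\right)}} = \frac{0.36 - 0.38}{\sqrt{0.37 \cdot 0.63 \cdot \frac{2}{1134}}} = -0.986.$$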

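  • Because $H_a$ says $p_2 > p_1$, negative values of $z$ count against $H_0$, so our P-value is $P(Z \le -0.986) \approx 0.162$. This means that if the rate of gun ownership is unchanged, we would see an increase in our poll results at least this big about 16% of the time by chance.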
  • So we do not reject H 0 , and the newspaper headline is not supported by the evidence.
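The gun-ownership numbers make a good check on how much rounding matters. This sketch (the helper `z_stat` is hypothetical; Python 3.8+ assumed for `statistics.NormalDist`) computes the z-statistic once from the three-decimal proportions used above and once from the exact counts.

```python
from math import sqrt
from statistics import NormalDist

x1, x2, n = 408, 431, 1134   # gun owners in 1999 and 2004, out of 1134 each


def z_stat(p1_hat, p2_hat, pooled, n1, n2):
    """Pooled two-sample z-statistic for p1-hat minus p2-hat."""
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# As in the notes: proportions rounded to three decimals.
z_rounded = z_stat(0.360, 0.380, 0.370, n, n)
# Same statistic computed from the exact counts.
z_exact = z_stat(x1 / n, x2 / n, (x1 + x2) / (2 * n), n, n)

for label, z in (("rounded", z_rounded), ("exact", z_exact)):
    # Ha: p2 > p1, so the P-value is the lower tail P(Z <= z).
    print(f"{label}: z = {z:.3f}, P = {NormalDist().cdf(z):.3f}")
```

The rounded inputs give $z \approx -0.986$ and $P \approx 0.162$; the exact counts give $z \approx -1.00$ and $P \approx 0.159$. Either way the P-value is far above any usual significance level, so the conclusion does not change.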