An explanation of how to calculate confidence intervals and perform hypothesis testing for population proportions using examples. It covers the concepts of sampling to estimate a proportion, confidence intervals for the difference between two proportions, and hypothesis testing for population proportions. The document also discusses the sources of error in polls and the importance of the logic of hypothesis testing.
Warm-up: let’s survey our class to produce a 95% confidence interval for the proportion of people who are left-handed.
Reminder:
\[
\hat p \pm z^* \sqrt{\frac{\hat p(1-\hat p)}{n}}
\]
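The warm-up interval can be computed in a few lines. This is a minimal Python sketch; the class numbers (4 left-handers out of 30 students) are hypothetical, and the function name is ours:

```python
import math

def prop_ci(successes, n, z_star=1.960):
    """Large-sample confidence interval p-hat +/- z* sqrt(p-hat(1-p-hat)/n)."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical class survey: 4 left-handers out of 30 students.
lo, hi = prop_ci(4, 30)
print(f"({lo:.3f}, {hi:.3f})")
```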
Such procedures are known as sampling to estimate a proportion.
The most widely-known venue for such procedures is probably political polling.
What kinds of confidence intervals does one obtain by a random survey of five hundred “likely voters”? What about one thousand?
Further discussion: what are the “real” sources of error in polls one sees? What language would be more appropriate to describe such polling data?
Sometimes we would like to start with a desired level of error and then “work backwards” to design a survey or other study.
How many participants in a survey would be needed to yield a margin of error of ±0.5%?
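Working backwards: solving $z^* \sqrt{p(1-p)/n} \le m$ for $n$ gives $n \ge (z^*/m)^2\, p(1-p)$, and the conservative guess $p = 0.5$ maximizes $p(1-p)$. A sketch (the function name is ours):

```python
import math

def required_n(margin, z_star=1.960, p_guess=0.5):
    """Smallest n with z* sqrt(p(1-p)/n) <= margin; p_guess = 0.5 is worst case."""
    # The small epsilon guards against floating-point round-up when the
    # exact answer is an integer.
    return math.ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess) - 1e-9)

print(required_n(0.005))  # margin of +/- 0.5% at 95% confidence: 38416
print(required_n(0.03))   # the familiar +/- 3% poll: 1068
```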
A large poll conducted in 2007 showed that 39% of Americans favored universal health care. Recently a poll of 400 people found that 50% favored universal health care. Determine, based on this, whether more people now favor universal health care. Let $p$ be the (true) proportion of people who favor universal health care. (We’ll assume the 2007 poll is accurate, so that the proportion of people then favoring universal health care was $p_0 = 39\%$.) So $H_0$ is “$p = p_0$”, and $H_a$ is “$p > p_0$”. Our “null hypothesis” is that 39% of people now support universal health care. Our “test” is to determine whether it is likely that more than 39% of people now support universal health care.
We pretend that $H_0$ is true (that is, that $p = p_0$), and we calculate our z-statistic based on that:
\[
z = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.50 - 0.39}{\sqrt{0.39(1-0.39)/400}} \approx 4.51.
\]
We calculate our P-value:
\[
P(Z \ge 4.51) \approx 0.00000324.
\]
This P-value is the probability of picking a sample of 400 people in which at least 200 favor universal health care, if in the population only 39% favor it.
Conclusion: If only 39% of people now support universal health care, then the probability of getting 50% in favor of universal health care by taking a random sample of 400 people would be only about 0.00000324. This is strong evidence that $H_0$ is false, and that more than 39% of people now support universal health care. So we reject the null hypothesis at almost any sensible significance level.
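The whole computation above can be checked in a few lines of Python, with the normal tail probability obtained from `math.erfc` (the function name is ours):

```python
import math

def one_prop_z_test(successes, n, p0):
    """z statistic and one-sided (greater-than) P-value for H0: p = p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return z, p_value

# Health care example: 200 of 400 in favor, testing against p0 = 0.39.
z, p_value = one_prop_z_test(200, 400, 0.39)
print(f"z = {z:.2f}, P-value = {p_value:.8f}")
```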
The logic of hypothesis testing has not changed from when you were using samples to measure a mean (or other statistic).
For an example in plain English, if a great hitter has a season-long slump, he probably isn’t a great hitter any more. (Translate this into hypothesis-testing language.)
Similarly, if a store claims to have the “best prices across the board”, but you shop for the exact same things there and elsewhere and find it cheaper elsewhere, then they probably aren’t cheapest.
Think of everyday hypothesis testing you might do during our mini-break.
We have two independent populations. We let $p_1$ and $p_2$ be the true population proportions, $n_1$ and $n_2$ the sample sizes, and $\hat p_1$ and $\hat p_2$ the sample proportions.
We use the statistic $\hat p_1 - \hat p_2$.
The mean of its sampling distribution is $p_1 - p_2$, so it is an unbiased estimator. (As in one-sample problems, for confidence intervals a particular biased estimator is better.)
The standard deviation of its sampling distribution is
\[
\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}.
\]
The distribution is approximately normal for large samples.
Again, we don’t know the standard deviation, so we use a standard error:
\[
\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}.
\]
The level C confidence interval for $p_1 - p_2$ is then
\[
\hat p_1 - \hat p_2 \pm z^* \sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}.
\]
Here $z^*$ is the appropriate critical value, taken for example from Table C.
Both samples must have at least 10 successes and at least 10 failures.
In a SRS of 400 widgets from last month’s production at Wang’s Widgets Inc., 8 were found to be defective. In a SRS of 500 widgets from this month’s production, 6 were found to be defective. Did the production process improve?
Neither sample has at least 10 successes, so the large sample confidence interval can’t be used.
There is also a “plus four” method for confidence intervals for the difference between two proportions. It can be used under the following conditions:
- The population is much larger than the sample size.
- The sample sizes are at least 5.
There is no restriction on the numbers of successes and failures in the samples.
For the “plus four” method, simply add four imaginary observations in total: one success and one failure to the observed results for each sample. We write $\tilde p_1$ and $\tilde p_2$ for the sample proportions after these imaginary observations are added, and we use them everywhere that $\hat p_1$ and $\hat p_2$ are used in the large sample method.
As for one sample problems, in exam and homework problems, you must specify whether or not you are using the “plus four” method.
The book notes: The “plus four” confidence interval may be conservative (that is, give higher confidence than asked for) if the sample size is very small and the true population proportions are close to 0 or 1. However, it is much more accurate than the large sample method for small samples.
As with the one sample version, note that it uses a biased estimator.
In a SRS of 400 widgets from last month’s production at Wang’s Widgets Inc., 8 were found to be defective. In a SRS of 500 widgets from this month’s production, 6 were found to be defective. Let’s find a 95% confidence interval for the change in the proportion of defective widgets from last month to this month.
We aren’t told how many widgets per month are manufactured, but let’s assume it is many more than 500 per month. The sample sizes are certainly both at least 5. Also, we are told that both samples are SRSs. Therefore the “plus four” two sample confidence interval should be safe to use.
Last month, 8 out of 400 widgets were defective, and this month, 6 out of 500 widgets were defective. We want a 95% confidence interval for the change.
We get
\[
\tilde p_1 = \frac{8+1}{400+2} = \frac{9}{402} \approx 0.0223881
\quad\text{and}\quad
\tilde p_2 = \frac{6+1}{500+2} = \frac{7}{502} \approx 0.0139442.
\]
The sampling standard error, computed including the imaginary observations, is
\[
\sqrt{\frac{\tilde p_1(1-\tilde p_1)}{n_1} + \frac{\tilde p_2(1-\tilde p_2)}{n_2}} \approx 0.0090673.
\]
We take $z^* = 1.960$ (from the “$z^*$” row of Table C). The confidence interval for the decrease in the proportion of defective widgets is
\[
\tilde p_1 - \tilde p_2 \pm z^*(\text{sampling standard error}) \approx 0.0223881 - 0.0139442 \pm (1.960)(0.0090673) \approx 0.0084439 \pm 0.0177719.
\]
In interval form, this is about
\[
(-0.0093281,\ 0.0262158).
\]
We conclude, with 95% confidence, that the change in the proportion of defective widgets is somewhere between an increase of about 0.93% of widget production and a decrease of about 2.62% of widget production.
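The plus-four interval is easy to script. This sketch follows these notes’ convention of substituting the adjusted proportions into the large-sample standard error; the function name is ours:

```python
import math

def plus_four_ci(x1, n1, x2, n2, z_star=1.960):
    """'Plus four' CI for p1 - p2: one success and one failure added per sample."""
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    # Adjusted proportions substituted into the large-sample standard error,
    # keeping n1 and n2 in the denominators, as in the notes.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Widget example: 8 defective of 400 last month, 6 defective of 500 this month.
lo, hi = plus_four_ci(8, 400, 6, 500)
print(f"({lo:.7f}, {hi:.7f})")
```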
As before, we have two independent populations, and we use the same notation: $p_1$ and $p_2$ are the true population proportions, $n_1$ and $n_2$ are the sample sizes, and $\hat p_1$ and $\hat p_2$ are the sample proportions.
The null hypothesis is $H_0$: $p_1 = p_2$.
The alternate hypothesis is one of: $H_a$: $p_1 \neq p_2$; $H_a$: $p_1 < p_2$; $H_a$: $p_1 > p_2$.
You are working for Senator Snort’s opponent’s campaign. You choose a SRS of 250 registered voters. Of these, 149 say they intend to vote for Senator Snort. Then you run a series of negative ads featuring Senator Snort’s recent conviction for drunken driving. Now you take a new poll, with an independent sample of size 300, and find that 151 people in this new sample say they intend to vote for Senator Snort. Is Senator Snort’s support now smaller?
Let $p_1$ be Senator Snort’s support before the ads, and let $p_2$ be his support after the ads.
$H_0$: $p_1 = p_2$.
$H_a$: $p_1 > p_2$.
We will use a standardized version of $\hat p_1 - \hat p_2$. (Important note: there is no “plus four” method here.)
What about the standard error? Recall that everything we calculate in a hypothesis test assumes that the null hypothesis is true. In this case, that means that both samples are effectively from the same population. Let $p$ be the true proportion in this combined population. The null hypothesis says that $p_1 = p_2 = p$.
In this case, we get a better estimate of $p$ by combining both samples. Accordingly, let
\[
\hat p = \frac{\text{number of successes in both samples combined}}{\text{number of individuals in both samples combined}}.
\]
This is called the pooled sample proportion. (Do not confuse it with the pooled sample method for comparing two means!)
Since we are assuming that the null hypothesis is true, we use $\hat p$ in place of both $\hat p_1$ and $\hat p_2$ in the formula for the sampling standard error. This gives
\[
\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.
\]
Accordingly, our test statistic is
\[
z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}.
\]
The distributions are sufficiently close to normal that we can use this test statistic whenever both samples have at least 5 successes and at least 5 failures.
Before your ads, 149 out of a sample of 250 said they would vote for Senator Snort. After your ads, 151 out of a sample of 300 said they would vote for Senator Snort. $H_0$: $p_1 = p_2$; $H_a$: $p_1 > p_2$.
Let’s test at the significance level $\alpha = 0.10$.
Both samples have at least 5 successes and at least 5 failures. We discussed the other conditions previously.
We have
\[
\hat p_1 = \frac{149}{250} = 0.596,
\quad
\hat p_2 = \frac{151}{300} \approx 0.50333,
\quad\text{and}\quad
\hat p = \frac{149 + 151}{250 + 300} = \frac{300}{550} \approx 0.54545.
\]
The sampling standard error is
\[
\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{(0.54545)(0.45455)\left(\frac{1}{250} + \frac{1}{300}\right)} \approx 0.0426401.
\]
The test statistic is
\[
z = \frac{\hat p_1 - \hat p_2}{\text{sampling standard error}} = \frac{0.596 - 0.50333}{0.0426401} \approx 2.17323.
\]
We got $z \approx 2.17323$.
Since we are doing a one-sided test, our P-value is the probability that $Z > 2.17323$. To find this, we can look up $-2.17$ in Table A, getting $P(Z < -2.17) \approx 0.0150$. By symmetry, this is the P-value for our test. Since $0.0150 < 0.10$, we reject the null hypothesis. We conclude at significance level $\alpha = 0.10$ that there is good evidence that Senator Snort’s support decreased.
As discussed before, this is not necessarily evidence that the negative ads are responsible for the decrease.
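The pooled test above can be sketched in Python; the exact tail probability comes out near the table value 0.0150, differing only through rounding of $z$. The function name is ours:

```python
import math

def two_prop_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: p1 = p2."""
    p_pool = (x1 + x2) / (n1 + n2)  # pooled sample proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

# Snort example: 149 of 250 before the ads, 151 of 300 after.
z = two_prop_z(149, 250, 151, 300)
p_value = 0.5 * math.erfc(z / math.sqrt(2))  # one-sided: P(Z >= z)
print(f"z = {z:.5f}, P-value = {p_value:.4f}")
```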
Last month, 8 out of 400 widgets were defective. This month, it was 6 out of 500. $H_0$: $p_1 = p_2$; $H_a$: $p_1 \neq p_2$.
Let’s test at the significance level $\alpha = 0.02$.
Both samples have at least 5 successes and at least 5 failures. We discussed the other conditions previously.
We have
\[
\hat p_1 = \frac{8}{400} = 0.02,
\quad
\hat p_2 = \frac{6}{500} = 0.012,
\quad\text{and}\quad
\hat p = \frac{8 + 6}{400 + 500} = \frac{14}{900} \approx 0.015556.
\]
In a SRS of 400 widgets from last month’s production at Wang’s Widgets Inc., 8 were found to be defective. In a SRS of 500 widgets from this month’s production, 6 were found to be defective. Did the production process change?
Let $p_1$ be the proportion of defective widgets among those manufactured last month, and let $p_2$ be the proportion of defective widgets among those manufactured this month.
$H_0$: $p_1 = p_2$.
$H_a$: $p_1 \neq p_2$.
The sampling standard error is
\[
\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{(0.015556)(0.984444)\left(\frac{1}{400} + \frac{1}{500}\right)} \approx 0.0083013.
\]
The test statistic is
\[
z = \frac{\hat p_1 - \hat p_2}{\text{sampling standard error}} = \frac{0.02 - 0.012}{0.0083013} \approx 0.963708.
\]
We got z ≈ 0 .963708.
Since we are doing a two-sided test, by symmetry our P-value is twice the probability that $Z > 0.963708$. To find this, we can look up $-0.96$ in Table A, getting $P(Z < -0.96) \approx 0.1685$. So the P-value for our test is about $2(0.1685) = 0.3370$.
Since $0.3370 > 0.02$, we fail to reject the null hypothesis. There is insufficient evidence to conclude at significance level $\alpha = 0.02$ that the proportion of defective widgets changed between last month and this month.
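The same computation can be sketched for the two-sided case; the exact P-value (about 0.335) is slightly smaller than the table-based 0.3370 only because the table rounds $z$ to 0.96:

```python
import math

# Two-sided pooled z test for the widget comparison.
x1, n1, x2, n2 = 8, 400, 6, 500
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (x1 / n1 - x2 / n2) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided: 2 * P(Z >= |z|)
print(f"z = {z:.6f}, P-value = {p_value:.4f}")
```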