
Math 243, Summer 2009

Final Practice Problem Solutions

Instructor: Dan Westerman

August 9, 2009

Contents

Practice Problems

Ch. 20 – Inference about a Population Proportion

Ch. 21 – Comparing Two Proportions

Ch. 4 – Scatterplots and Correlation

Ch. 5 – Regression

Practice Problems

Ch. 20 – Inference about a Population Proportion

  1. A 2008 Gallup poll of 2,682 registered voters found that 46% would vote for Barack Obama, and 42% would vote for John McCain.

(i) As we know, these cannot be exact values (why not?). Using Gallup’s result as the sample proportion, find a 99% confidence interval for the proportion of registered voters who would vote for Barack Obama. (For this problem do not worry about using the plus 4 method. In general, always use the plus 4 method unless you are directed not to.)

Solution. First, the reason that these cannot be exact values is that they did not actually talk to the entire population (all registered voters). The best they can do is generate a confidence interval, which will have some margin of error. Let $p$ be the proportion of registered voters who would vote for Barack Obama. We have $\hat{p} = 0.46$, $n = 2682$, and $z^*_{99\%} \approx 2.576$, so the confidence interval is

$$\hat{p} \pm z^*_{99\%} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \approx 0.46 \pm 2.576 \sqrt{\frac{0.46(1 - 0.46)}{2682}} \approx 0.46 \pm 0.0248.$$

Therefore we are 99% sure that the proportion of registered voters who would vote for Barack Obama is between 0.4352 and 0.4848.
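The interval above can be checked numerically. The following is a minimal Python sketch and not part of the original solutions: the function name `proportion_ci` is made up for illustration, and z* comes from the standard library's `statistics.NormalDist` rather than from Table C, so it is slightly more precise than the rounded 2.576.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, level):
    """Large-sample (non plus-four) confidence interval for a proportion."""
    # z* leaves (1 - level) / 2 of the area in each tail of the normal curve.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Gallup figures from the problem: p-hat = 0.46, n = 2682, 99% confidence.
low, high = proportion_ci(0.46, 2682, 0.99)
print(f"{low:.4f} to {high:.4f}")
```

Rounded to four decimal places, the endpoints agree with the interval found by hand.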


(ii) On their web site, Gallup (who uses a different method for their calculations than the basic method we know) states: “For results based on this sample of 2,682 registered voters, the maximum margin of sampling error is ±2 percentage points.” What do you think they mean by “sampling error”? Are there other errors we should worry about? What other piece of information is missing?

Solution. By “sampling error” they mean the inherent mathematical imprecision from using the rules of probability and sampling. Other errors we should worry about involve the way the sample was taken (by phone, which has undercoverage problems as previously discussed) as well as nonresponse and any other possible sources of bias. In fact, on their site Gallup states: “In addition to sampling error, question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of public opinion polls.” The other crucial piece of information missing here is a confidence level. We do not know how sure of this interval they are. Is there a 1% chance they are wrong, or 5%, or perhaps 10%? It may be smaller or it may be larger, but we have no way of knowing.


  2. I wish to estimate the proportion of UO students who work at least 20 hours per week. I desire a confidence level of 98% and a margin of error of no more than 0.025. Find the sample size required in each of the following situations.

(i) No previous data is available.

Solution. With no previous data available we must use $p^* = 0.5$ as our guess of the sample proportion. This gives us

$$n = \left(\frac{z^*_{98\%}}{m}\right)^2 p^*(1 - p^*) \approx \left(\frac{2.326}{0.025}\right)^2 0.5(1 - 0.5) = 2164.1104.$$

Thus we need a sample of at least 2165 UO students.

(ii) Previous data suggests that this proportion may be close to 35%.

Solution. This previous data suggests that we should guess $p^* = 0.35$, yielding

$$n = \left(\frac{z^*_{98\%}}{m}\right)^2 p^*(1 - p^*) \approx \left(\frac{2.326}{0.025}\right)^2 0.35(1 - 0.35) \approx 1969.3405.$$

Thus we need a sample of at least 1970 UO students.
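Both sample-size calculations in parts (i) and (ii) follow the same recipe, sketched below in Python. This is an illustration only: the helper name `required_sample_size` is hypothetical, and the critical value 2.326 is the rounded table value for 98% confidence, hard-coded so the arithmetic matches the hand calculation.

```python
from math import ceil

Z_STAR_98 = 2.326  # rounded table value of z* for 98% confidence


def required_sample_size(p_guess, margin, z_star=Z_STAR_98):
    """Smallest sample size whose margin of error is at most `margin`."""
    n_exact = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    # Always round up: rounding down would give a margin slightly too large.
    return ceil(n_exact)


print(required_sample_size(0.5, 0.025))   # no previous data
print(required_sample_size(0.35, 0.025))  # previous data suggests 35%
```

Because p*(1 − p*) is largest at p* = 0.5, any other guess gives a smaller required sample.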


Also answer the following question.

(iii) If it is possible to find a sample size that will give me the results I want without having any previous data, why is it desirable to have the previous data?

Solution. As illustrated above, a guess other than 0.5 reduces the sample size necessary. This would reduce the cost and effort required to actually take the sample and gather the data.


  3. I take a simple random sample of 20 Eugene firefighters, and find that 14 of them are certified paramedics. Find a 99% confidence interval for the proportion of Eugene firefighters that are certified paramedics.

Solution. Let $p$ be the proportion of Eugene firefighters who are certified paramedics. The sample proportion is $\hat{p} = 14/20 = 0.7$, but it is not this proportion that we need. We really need the “+4” proportion, obtained by tacking on two successes and two failures to our sample, yielding $\tilde{p} = 16/24 \approx 0.6667$ (in the calculation below I will leave this as a fraction to reduce error caused by rounding multiple times). Since $z^*_{99\%} \approx 2.576$, the desired confidence interval is

$$\tilde{p} \pm z^*_{99\%} \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}} = \frac{16}{24} \pm 2.576 \sqrt{\frac{\frac{16}{24}\left(1 - \frac{16}{24}\right)}{24}} \approx 0.6667 \pm 0.2479.$$

Therefore we are 99% sure that the proportion of Eugene firefighters who are certified paramedics is between 0.4188 and 0.9146.
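As a check on the plus-four calculation, here is a short Python sketch. The function name `plus_four_ci` is hypothetical, and z* is computed with `statistics.NormalDist` instead of being read from a table, so the last digit may differ slightly from hand arithmetic done with rounded values.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_ci(successes, n, level):
    """Plus-four confidence interval for a single proportion."""
    # Add two successes and two failures before forming the proportion.
    p_tilde = (successes + 2) / (n + 4)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_star * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
    return p_tilde, margin


# 14 certified paramedics in a sample of 20 firefighters, 99% confidence.
p_tilde, margin = plus_four_ci(14, 20, 0.99)
print(f"{p_tilde:.4f} +/- {margin:.4f}")
```

The center and margin agree with the values above to four decimal places. The endpoints of the interval agree to about three decimal places, since the hand calculation adds two values that were already rounded.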


Ch. 21 – Comparing Two Proportions

  1. An SRS of 60 guitarists in rock bands contains 40 people with blue eyes. An SRS of 100 drummers in rock bands contains 45 people with blue eyes. Does this provide significant evidence that a greater proportion of guitarists than drummers in rock bands have blue eyes?

Solution. Let $p_1$ be the proportion of guitarists in rock bands that have blue eyes, and let $p_2$ be the proportion of drummers in rock bands that have blue eyes. The hypotheses are

$$H_0: p_1 = p_2 \qquad H_a: p_1 > p_2.$$

We have $\hat{p}_1 = 40/60 = 2/3$ and $\hat{p}_2 = 45/100 = 0.45$. (I avoided rounding the first number as a decimal to cut down on rounding error. The fraction 2/3 cannot be expressed exactly as a finite decimal, but 45/100 and the pooled sample proportion 85/160 below can. Each time you round you introduce a little bit of error, and if this is done multiple times throughout a problem the overall result can be a sizeable error.) The pooled sample proportion is

$$\hat{p} = \frac{40 + 45}{60 + 100} = \frac{85}{160} = 0.53125.$$

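Thus the test statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{\frac{2}{3} - 0.45}{\sqrt{0.53125(1 - 0.53125)\left(\frac{1}{60} + \frac{1}{100}\right)}} \approx 2.6588.$$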

The P-value is P(Z > 2.6588) = normalcdf(2.6588,10^99,0,1) ≈ 0.0039. Since P ≈ 0.0039 < 0.005, we have significant evidence at the 0.005 level that a greater proportion of guitarists than drummers in rock bands have blue eyes.
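The test statistic and P-value can be reproduced without a calculator. The Python sketch below is an illustration, with the hypothetical function name `two_proportion_z`. It uses `statistics.NormalDist().cdf` in place of the calculator's normalcdf and works from the raw counts so that nothing is rounded along the way.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic and one-sided P-value for H0: p1 = p2 vs Ha: p1 > p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 both samples share one proportion, so pool the counts.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 1 - NormalDist().cdf(z)  # area to the right of z
    return z, p_value


# 40 of 60 guitarists and 45 of 100 drummers have blue eyes.
z, p_value = two_proportion_z(40, 60, 45, 100)
print(f"z = {z:.4f}, P = {p_value:.4f}")
```

Passing counts rather than decimal proportions keeps 2/3 exact until the final division, which is the same precaution the solution takes by hand.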


  2. A 2008 Gallup poll showed that John McCain is favored by 35% of 18- to 29-year-old registered voters, and is favored by 46% of registered voters 65 years and older. Suppose that these are based on simple random samples of sizes 500 and 300, respectively. Find a 96% confidence interval for the difference in support for McCain between these two age groups.

Solution. Let $p_1$ be the proportion of 18- to 29-year-old registered voters who favor McCain, and let $p_2$ be the proportion of registered voters 65 years and older who favor McCain. We are given $\hat{p}_1 = 0.35$ and $\hat{p}_2 = 0.46$. To use the plus 4 method we will need to know the actual numbers of successes, $x_1$ and $x_2$. These are $x_1 = \hat{p}_1 n_1 = (0.35)(500) = 175$, and $x_2 = \hat{p}_2 n_2 = (0.46)(300) = 138$. To use the plus 4 method we add one success and one failure to each sample, so

$$\tilde{p}_1 = \frac{175 + 1}{500 + 2} = \frac{176}{502}, \quad \text{and} \quad \tilde{p}_2 = \frac{138 + 1}{300 + 2} = \frac{139}{302}.$$

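From Table C we get $z^*_{96\%} \approx 2.054$. Thus the desired confidence interval is

$$(\tilde{p}_1 - \tilde{p}_2) \pm z^*_{96\%} \sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}} = \left(\frac{176}{502} - \frac{139}{302}\right) \pm 2.054 \sqrt{\frac{\frac{176}{502}\left(1 - \frac{176}{502}\right)}{502} + \frac{\frac{139}{302}\left(1 - \frac{139}{302}\right)}{302}} \approx -0.1097 \pm 0.0734.$$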

Therefore we are 96% sure that the proportion of 18- to 29-year-old registered voters who favor McCain is between 0.0363 and 0.1831 less than the proportion of registered voters 65 years and older who favor McCain.
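The plus-four interval for a difference of two proportions can be sketched the same way in Python. The function name `plus_four_diff_ci` is hypothetical. z* is computed rather than taken from Table C, so expect agreement with the hand calculation to roughly four decimal places rather than exactly.

```python
from math import sqrt
from statistics import NormalDist


def plus_four_diff_ci(x1, n1, x2, n2, level):
    """Plus-four confidence interval for p1 - p2."""
    # Add one success and one failure to each sample.
    p1 = (x1 + 1) / (n1 + 2)
    p2 = (x2 + 1) / (n2 + 2)
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    return p1 - p2, z_star * se


# 175 of 500 younger voters and 138 of 300 older voters favor McCain.
diff, margin = plus_four_diff_ci(175, 500, 138, 300, 0.96)
print(f"{diff:.4f} +/- {margin:.4f}")
```

The negative center reflects the order of subtraction: the younger group's support is lower, which is why the conclusion is phrased as "less than."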


Ch. 4 – Scatterplots and Correlation

  1. Exercise 4.7 from the textbook. Also do this: (c) Calculate the correlation for each set of data. What is your interpretation of these correlations?

(a) Solution. Here is the scatterplot: [Figure: icicle length versus time for Runs 8903 and 8905.]

Note: From this scatterplot it is relatively clear that both sets of data are positively associated and very close to linear.


(b) Solution. The scatterplot clearly shows that icicles grow at a rate that is very close to linear. Since the lengths for Run 8905 are less than those for Run 8903, we can also observe that a slower rate of water flow results in increased icicle growth. However, we can only draw this conclusion for rates of water flow between the two used in the runs. We cannot say what would happen at rates slower than 11.9 mg/s or faster than 29.6 mg/s. Similarly, we cannot predict what the growth pattern would be between 0 and 10 minutes or after 240 minutes. Predictions such as these would be extrapolation, which generally cannot be trusted.

(c) Solution. For Run 8903 the correlation is $r_{8903} \approx 0.9958$ (obtained by using LinReg(ax+b) on a calculator). This confirms that the data for Run 8903 is positively associated and closely follows a linear relationship. For Run 8905 we have $r_{8905} \approx 0.9982$. As above, this confirms that the data for Run 8905 is positively associated and closely follows a linear relationship.


  2. Exercise 4.11 from the textbook.

Solution. 〈This was problem 3 of our Chapter 4 worksheet.〉


Ch. 5 – Regression

  1. This problem refers to Exercise 4.7.

(i) Find the least-squares regression lines for the two sets of data.

Solution. For Run 8903 we compute (using LinReg(ax+b) on a calculator)

$$\hat{y}_{8903} \approx 0.1585x - 2.3948.$$

For Run 8905 we get

$$\hat{y}_{8905} \approx 0.0911x - 1.4533.$$
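For readers without a calculator, the slope, intercept, and correlation that LinReg(ax+b) reports can be computed from the standard least-squares formulas. The Python sketch below shows the computation only. The icicle data from Exercise 4.7 are not reproduced in these solutions, so the four points used here are made up and lie exactly on the line y = 2x + 1; the function name `least_squares` is also hypothetical.

```python
from math import sqrt


def least_squares(xs, ys):
    """Slope, intercept, and correlation r for paired data (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Hypothetical illustration data, NOT the icicle measurements:
# four points lying exactly on y = 2x + 1.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
slope, intercept, r = least_squares(xs, ys)
print(slope, intercept, r)
```

Replacing `xs` and `ys` with the times and lengths from the textbook's data table should reproduce the two regression lines above, up to rounding.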

(ii) Use these lines to predict the icicle length 55 minutes into each run.

Solution. We simply plug $x = 55$ into each equation. When $x = 55$ we get

$$\hat{y}_{8903} \approx 0.1585(55) - 2.3948 \approx 6.3227, \quad \text{and} \quad \hat{y}_{8905} \approx 0.0911(55) - 1.4533 \approx 3.5572.$$

Thus we predict that 55 minutes into Run 8903 the icicle length is approximately 6.3227 centimeters, and 55 minutes into Run 8905 the icicle length is approximately 3.5572 centimeters.


(iii) Use these lines to predict the icicle length 350 minutes into each run.

Solution. When $x = 350$ we get

$$\hat{y}_{8903} \approx 0.1585(350) - 2.3948 \approx 53.0802, \quad \text{and} \quad \hat{y}_{8905} \approx 0.0911(350) - 1.4533 \approx 30.4317.$$

Thus we predict that 350 minutes into Run 8903 the icicle length is approximately 53.0802 centimeters, and 350 minutes into Run 8905 the icicle length is approximately 30.4317 centimeters.


(iv) What is the difference between your predictions in parts (ii) and (iii)? Do you trust one prediction more than the other? If so, which one?

Solution. The predictions in part (ii) are interpolations, because they lie within the x range of the data. In contrast, the predictions in part (iii) are extrapolations, because they lie outside the x range of the data. We should trust the predictions of part (ii) more, because interpolations are generally more reliable than extrapolations. When we do not have data in a certain range, we cannot really predict whether an observed pattern, such as the close to straight line relationships observed here, will continue. Certainly at some point the icicles must stop growing, as they will eventually run out of room.
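The four predictions, and the interpolation versus extrapolation distinction, can be collected in one short Python sketch. The names `LINES`, `OBSERVED_RANGE`, `predict`, and `kind_of_prediction` are all hypothetical. The observed time range of 10 to 240 minutes is an assumption read off the remark in Exercise 4.7(b); the actual range for each run should be checked against the data table.

```python
# Fitted lines from part (i) as (slope, intercept);
# x is time in minutes, y-hat is length in centimeters.
LINES = {
    "8903": (0.1585, -2.3948),
    "8905": (0.0911, -1.4533),
}

# ASSUMPTION: observed times span roughly 10 to 240 minutes, as suggested
# by the solution to Exercise 4.7(b). Verify against the data for each run.
OBSERVED_RANGE = (10, 240)


def predict(run, minutes):
    """Predicted icicle length from the least-squares line for `run`."""
    slope, intercept = LINES[run]
    return slope * minutes + intercept


def kind_of_prediction(minutes, observed=OBSERVED_RANGE):
    """Label a prediction by whether x lies inside the observed x range."""
    low, high = observed
    return "interpolation" if low <= minutes <= high else "extrapolation"


for minutes in (55, 350):
    for run in LINES:
        length = predict(run, minutes)
        print(run, minutes, round(length, 4), kind_of_prediction(minutes))
```

Under the assumed range, x = 55 is labeled an interpolation and x = 350 an extrapolation, matching the reasoning in part (iv).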
