Solutions to practice problems related to calculating confidence intervals and testing hypotheses about population proportions and the difference between two proportions in statistics. The problems involve using z-scores and the plus-four method to find confidence intervals and determining if there is significant evidence that one proportion is greater than another. The document also discusses the concept of sampling error and the importance of having previous data.
Practice Problems 1
Ch. 20 – Inference about a Population Proportion
Ch. 21 – Comparing Two Proportions
Ch. 4 – Scatterplots and Correlation
Ch. 5 – Regression
(i) As we know, these cannot be exact values (why not?). Using Gallup’s result as the sample proportion, find a 99% confidence interval for the proportion of registered voters who would vote for Barack Obama. (For this problem do not worry about using the plus 4 method. In general, always use the plus 4 method unless you are directed not to.)
Solution. First, the reason that these cannot be exact values is that they did not actually talk to the entire population (all registered voters). The best they can do is generate a confidence interval, which will have some margin of error. Let $p$ be the proportion of registered voters who would vote for Barack Obama. We have $\hat{p} = 0.46$, $n = 2682$, and $z^*_{99\%} \approx 2.576$, so the confidence interval is
$$\hat{p} \pm z^*_{99\%}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.46 \pm 2.576\sqrt{\frac{(0.46)(0.54)}{2682}} \approx 0.46 \pm 0.0248.$$
Therefore we are 99% sure that the proportion of registered voters who would vote for Barack Obama is between 0.4352 and 0.4848.
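The interval above can be checked numerically. The following is a minimal Python sketch, not part of the original solutions; the function name `proportion_ci` is hypothetical, and it takes $z^*$ from the standard normal distribution rather than from a printed table, so the last digits may differ slightly from a hand calculation with 2.576.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(p_hat, n, confidence):
    """Large-sample z interval for a population proportion (no plus-four adjustment)."""
    # Critical value z* that leaves (1 - confidence)/2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Values from the Gallup problem: p-hat = 0.46, n = 2682, 99% confidence.
low, high = proportion_ci(0.46, 2682, 0.99)
print(f"{low:.4f} to {high:.4f}")
```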
(ii) On their web site, Gallup (who uses a different method for their calculations than the basic method we know) states: “For results based on this sample of 2,682 registered voters, the maximum margin of sampling error is ±2 percentage points.” What do you think they mean by “sampling error”? Are there other errors we should worry about? What other piece of information is missing?
Solution. By “sampling error” they mean the inherent mathematical imprecision from using the rules of probability and sampling. Other errors we should worry about involve the way the sample was taken (by phone, which has undercoverage problems as previously discussed) as well as nonresponse and any other possible sources of bias. In fact, on their site Gallup states: “In addition to sampling error, question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of public opinion polls.” The other crucial piece of information missing here is a confidence level. We do not know how sure of this interval they are. Is there a 1% chance they are wrong, or 5%, or perhaps 10%? It may be smaller or it may be larger, but we have no way of knowing.
(i) No previous data is available.
Solution. With no previous data available we must use $p^* = 0.5$ as our guess of the sample proportion. With $m$ denoting the desired margin of error, the required sample size is
$$n = \left(\frac{z^*_{98\%}}{m}\right)^2 p^*(1-p^*).$$
Thus we need a sample of at least 2165 UO students.
(ii) Previous data suggests that this proportion may be close to 35%.
Solution. This previous data suggests that we should guess $p^* = 0.35$, yielding
$$n = \left(\frac{z^*_{98\%}}{m}\right)^2 p^*(1-p^*).$$
Thus we need a sample of at least 1970 UO students.
Also answer the following question.
(iii) If it is possible to find a sample size that will give me the results I want without having any previous data, why is it desirable to have the previous data?
Solution. As illustrated above, a guess other than 0.5 reduces the sample size necessary. This would reduce the cost and effort required to actually take the sample and gather the data.
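The effect of the guess $p^*$ on the sample size can be illustrated with a short Python sketch. The margin of error is not stated in the text above, so the value `MARGIN = 0.025` below is an assumption, chosen because together with the table value $z^*_{98\%} \approx 2.326$ it reproduces both sample sizes quoted in parts (i) and (ii). The function name `sample_size` is hypothetical.

```python
from math import ceil


def sample_size(z_star, margin, p_guess):
    """Smallest n with z* * sqrt(p(1-p)/n) <= margin, for a guessed proportion p."""
    return ceil((z_star / margin) ** 2 * p_guess * (1 - p_guess))


Z_98 = 2.326    # standard table value for 98% confidence
MARGIN = 0.025  # ASSUMPTION: not given in the text; consistent with 2165 and 1970

print(sample_size(Z_98, MARGIN, 0.5))   # no previous data, p* = 0.5
print(sample_size(Z_98, MARGIN, 0.35))  # previous data suggests p* = 0.35
```

Because $p^*(1-p^*)$ is largest at $p^* = 0.5$, any other guess gives a smaller required $n$ for the same margin and confidence level.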
Solution. Let $p$ be the proportion of Eugene firefighters who are registered paramedics. The sample proportion is $\hat{p} = 14/20 = 0.7$, but it is not this proportion that we need. We really need the “+4” proportion, obtained by tacking on two successes and two failures to our sample, yielding $\tilde{p} = 16/24 \approx 0.6667$ (in the calculation below I will leave this as a fraction to reduce error caused by rounding multiple times). Since $z^*_{99\%} \approx 2.576$, the desired confidence interval is
$$\tilde{p} \pm z^*_{99\%}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{n+4}} = \frac{16}{24} \pm 2.576\sqrt{\frac{\frac{16}{24}\left(1-\frac{16}{24}\right)}{24}} \approx 0.6667 \pm 0.2479.$$
Therefore we are 99% sure that the proportion of Eugene firefighters who are certified paramedics is between 0.4188 and 0.9146.
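As a rough check of the plus-four calculation, here is a minimal Python sketch. The function name `plus_four_ci` is hypothetical; the inputs (14 successes out of 20, $z^* = 2.576$) come from the solution above. The proportion is held as an exact fraction until the final step, which mirrors the advice about avoiding repeated rounding.

```python
from fractions import Fraction
from math import sqrt


def plus_four_ci(successes, n, z_star):
    """Plus-four interval: add two successes and two failures, then use the z formula."""
    p_tilde = Fraction(successes + 2, n + 4)  # exact, no rounding yet
    variance = p_tilde * (1 - p_tilde) / (n + 4)
    margin = z_star * sqrt(float(variance))
    return float(p_tilde) - margin, float(p_tilde) + margin


low, high = plus_four_ci(14, 20, 2.576)
print(f"{low:.4f} to {high:.4f}")
```

The upper endpoint can differ in the fourth decimal place from a hand calculation, depending on where intermediate values are rounded.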
Solution. Let $p_1$ be the proportion of guitarists in rock bands that have blue eyes, and let $p_2$ be the proportion of drummers in rock bands that have blue eyes. The hypotheses are
$$H_0: p_1 = p_2 \qquad H_a: p_1 > p_2.$$
We have $\hat{p}_1 = 40/60 = 2/3$ and $\hat{p}_2 = 45/100 = 0.45$. (I avoided rounding the first number as a decimal to cut down on rounding error. The fraction 2/3 cannot be expressed exactly as a finite decimal, but 45/100 and the pooled sample proportion 85/160 below can. Each time you round you introduce a little bit of error, and if this is done multiple times throughout a problem the overall result can be a sizeable error.) The pooled sample proportion is
$$\hat{p} = \frac{40 + 45}{60 + 100} = \frac{85}{160} = 0.53125.$$
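Thus the test statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} = \frac{\frac{2}{3} - 0.45}{\sqrt{0.53125(1-0.53125)\left(\frac{1}{60}+\frac{1}{100}\right)}} \approx 2.6588.$$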
The P-value is P(Z > 2.6588) = normalcdf(2.6588,10^99,0,1) ≈ 0.0039. Since P ≈ 0.0039 < 0.005, we have significant evidence at the 0.005 level that a greater proportion of guitarists than drummers in rock bands have blue eyes.
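The same test can be sketched in Python as a cross-check on the calculator work. This is an illustration under the stated counts (40 of 60 guitarists, 45 of 100 drummers), and the function name `two_proportion_z_test` is hypothetical. The standard library's `NormalDist().cdf` plays the role of `normalcdf`.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided pooled z test of H0: p1 = p2 against Ha: p1 > p2."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled sample proportion
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 1 - NormalDist().cdf(z)  # upper-tail area P(Z > z)
    return z, p_value


z, p_value = two_proportion_z_test(40, 60, 45, 100)
print(f"z = {z:.4f}, P-value = {p_value:.4f}")
```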
Solution. Let $p_1$ be the proportion of 18- to 29-year-old registered voters who favor McCain, and let $p_2$ be the proportion of registered voters 65 years and older who favor McCain. We are given $\hat{p}_1 = 0.35$ and $\hat{p}_2 = 0.46$. To use the plus 4 method we will need to know the actual numbers of successes, $x_1$ and $x_2$. These are $x_1 = \hat{p}_1 n_1 = (0.35)(500) = 175$, and $x_2 = \hat{p}_2 n_2 = (0.46)(300) = 138$. To use the plus 4 method we add one success and one failure to each sample, so
$$\tilde{p}_1 = \frac{175+1}{500+2} = \frac{176}{502} \quad\text{and}\quad \tilde{p}_2 = \frac{138+1}{300+2} = \frac{139}{302}.$$
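From Table C we get $z^*_{96\%} \approx 2.054$. Thus the desired confidence interval is
$$(\tilde{p}_1 - \tilde{p}_2) \pm z^*_{96\%}\sqrt{\frac{\tilde{p}_1(1-\tilde{p}_1)}{n_1+2} + \frac{\tilde{p}_2(1-\tilde{p}_2)}{n_2+2}} = \left(\frac{176}{502} - \frac{139}{302}\right) \pm 2.054\sqrt{\frac{\frac{176}{502}\left(1-\frac{176}{502}\right)}{502} + \frac{\frac{139}{302}\left(1-\frac{139}{302}\right)}{302}} \approx -0.1097 \pm 0.0734.$$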
Therefore we are 96% sure that the proportion of 18- to 29-year-old registered voters who favor McCain is between 0.0363 and 0.1831 less than the proportion of registered voters 65 years and older who favor McCain.
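A minimal Python sketch of the two-sample plus-four interval follows, using the counts and the Table C value from the solution above. The function name `plus_four_diff_ci` is hypothetical. The endpoints come out negative because the interval is for $p_1 - p_2$, and the younger group's proportion is the smaller one.

```python
from math import sqrt


def plus_four_diff_ci(x1, n1, x2, n2, z_star):
    """Plus-four interval for p1 - p2: add one success and one failure to each sample."""
    p1_tilde = (x1 + 1) / (n1 + 2)
    p2_tilde = (x2 + 1) / (n2 + 2)
    se = sqrt(p1_tilde * (1 - p1_tilde) / (n1 + 2)
              + p2_tilde * (1 - p2_tilde) / (n2 + 2))
    diff = p1_tilde - p2_tilde
    return diff - z_star * se, diff + z_star * se


low, high = plus_four_diff_ci(175, 500, 138, 300, 2.054)
print(f"{low:.4f} to {high:.4f}")
```

The endpoint farther from zero may differ in the fourth decimal place from the hand calculation, depending on where intermediate values are rounded.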
(a) Solution. Here is the scatterplot:
Note: From this scatterplot it is relatively clear that both sets of data are positively associated and very close to linear.
(b) Solution. The scatterplot clearly shows that icicles grow at a rate that is very close to linear. Since the lengths for Run 8905 are less than those for Run 8903, we can also observe that a slower rate of water flow results in increased icicle growth. However, we can only draw this conclusion for rates of water flow between the two used in the runs. We cannot say what would happen at rates slower than 11.9 mg/s or faster than 29.6 mg/s. Similarly, we cannot predict what the growth pattern would be between 0 and 10 minutes or after 240 minutes. Predictions such as these would be extrapolation, which generally cannot be trusted.
(c) Solution. For Run 8903 the correlation is $r_{8903} \approx 0.9958$ (obtained by using LinReg(ax+b) on a calculator). This confirms that the data for Run 8903 is positively associated and closely follows a linear relationship. For Run 8905 we have $r_{8905} \approx 0.9982$. As above, this confirms that the data for Run 8905 is positively associated and closely follows a linear relationship.
Solution. 〈This was problem 3 of our Chapter 4 worksheet.〉
(i) Find the least-squares regression lines for the two sets of data.
Solution. For Run 8903 we compute (using LinReg(ax+b) on a calculator)
$$\hat{y}_{8903} \approx 0.1585x - 2.3948.$$
For Run 8905 we get
$$\hat{y}_{8905} \approx 0.0911x - 1.4533.$$
(ii) Use these lines to predict the icicle length 55 minutes into each run.
Solution. We simply plug x = 55 into each equation. When x = 55 we get
$$\hat{y}_{8903} \approx 0.1585(55) - 2.3948 \approx 6.3227 \quad\text{and}\quad \hat{y}_{8905} \approx 0.0911(55) - 1.4533 \approx 3.5572.$$
Thus we predict that 55 minutes into Run 8903 the icicle length is approximately 6.3227 centimeters, and 55 minutes into Run 8905 the icicle length is approximately 3.5572 centimeters.
(iii) Use these lines to predict the icicle length 350 minutes into each run.
Solution. When x = 350 we get
$$\hat{y}_{8903} \approx 0.1585(350) - 2.3948 \approx 53.0802 \quad\text{and}\quad \hat{y}_{8905} \approx 0.0911(350) - 1.4533 \approx 30.4317.$$
Thus we predict that 350 minutes into Run 8903 the icicle length is approximately 53.0802 centimeters, and 350 minutes into Run 8905 the icicle length is approximately 30.4317 centimeters.
(iv) What is the difference between your predictions in parts (ii) and (iii)? Do you trust one prediction more than the other? If so, which one?
Solution. The predictions in part (ii) are interpolations, because they lie within the x range of the data. In contrast, the predictions in part (iii) are extrapolations, because they lie outside the x range of the data. We should trust the predictions of part (ii) more, because interpolations are generally more reliable than extrapolations. When we do not have data in a certain range, we cannot really predict whether an observed pattern, such as the close to straight line relationships observed here, will continue. Certainly at some point the icicles must stop growing, as they will eventually run out of room.
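The distinction can be made concrete with a small Python sketch that evaluates both fitted lines and labels each prediction. The slopes and intercepts are the ones found in part (i). The observed time range of 10 to 240 minutes is an assumption taken from the discussion of the scatterplot, since the raw data are not reproduced here, and the names `LINES`, `X_MIN`, `X_MAX`, and `predict` are hypothetical.

```python
# Fitted least-squares lines from part (i): run -> (slope, intercept).
LINES = {
    "8903": (0.1585, -2.3948),
    "8905": (0.0911, -1.4533),
}

# ASSUMPTION: observed times span 10 to 240 minutes (raw data not shown here).
X_MIN, X_MAX = 10, 240


def predict(run, minutes):
    """Return (predicted length in cm, 'interpolation' or 'extrapolation')."""
    slope, intercept = LINES[run]
    length = slope * minutes + intercept
    kind = "interpolation" if X_MIN <= minutes <= X_MAX else "extrapolation"
    return length, kind


for run in LINES:
    for minutes in (55, 350):
        length, kind = predict(run, minutes)
        print(f"Run {run}, x = {minutes}: {length:.4f} cm ({kind})")
```

The line itself produces a number for any $x$; the label only records whether that $x$ falls inside the range where the linear pattern was actually observed.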