PUAF 610–1 QUANTITATIVE METHODS Fall 1995 FINAL EXAM SOLUTIONS

1. Randomized response is sometimes used to guarantee confidentiality. For example, to investigate what proportion of doctors have practiced euthanasia, doctors could be asked to go into their office, flip a coin, and
(a) if tails, answer the question “Have you ever practiced euthanasia?”
(b) if heads, flip it again and answer “Did it come up heads twice?”
If the doctor answers “Yes,” the interviewer cannot know whether the doctor practiced euthanasia or whether he just flipped two heads.

A. Suppose that 20 percent of doctors practice euthanasia. What proportion would you expect to answer “Yes”? (5 points)

P(yes) = P(yes|heads)⋅P(heads) + P(yes|tails)⋅P(tails)
= P(heads)⋅P(heads) + P(practice euthanasia)⋅P(tails)
= [P(H)]² + P(E)⋅[1 – P(H)]
= (0.5)² + (0.2)(0.5) = 0.25 + 0.1 = 0.35

B. In an actual survey of 100 doctors, 40 answer “Yes.” What is your best estimate of the proportion that practice euthanasia? (5 points)

$$P(E) = \frac{P(Y) - [P(H)]^2}{1 - P(H)} = \frac{0.4 - 0.25}{0.5} = \frac{0.15}{0.5} = 0.3$$

C. As noted above, some doctors will answer “Yes” because they will flip two heads. If 100 doctors are interviewed, give the 95-percent confidence interval for the number who will flip two heads. (5 points)

µ = Np = 100⋅[P(H)]² = 100⋅(0.5)² = 25

$$\sigma = \sqrt{Np(1 - p)} = \sqrt{100(0.25)(0.75)} = 4.33$$

µ ± 1.96σ = 25 ± 1.96(4.33) = 25 ± 8.5

2. A study of 3,224 soldiers who had observed atmospheric nuclear tests noted that 122 had died from cancer. The probability of cancer death for U.S. men with similar ages and smoking habits is 3.65 percent. Are observers of nuclear tests more likely to die of cancer? (5 points)

Ho: p = po; HA: p ≠ po (two-tailed); Np > 5, so normal approximation is ok

$$\hat{p} = \frac{122}{3224} = 0.03784$$

$$\sigma_{\hat{p}} = \sqrt{\frac{p_0(1 - p_0)}{N}} = \sqrt{\frac{(0.0365)(0.9635)}{3224}} = 0.00330$$

$$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{0.0378 - 0.0365}{0.0033} = \frac{0.0013}{0.0033} = 0.40$$

This data does not represent statistical evidence of a link between observing nuclear tests and cancer: fail to reject the null hypothesis.

3. M.V. Lee Badgett examined the incomes of 47 gay/bisexual and 901 heterosexual men. The mean income of the former group was $26,321 (s = $16,937), while that of the latter group was $28,312 (s = $16,842).

A. Formulate null and alternative hypotheses, and test the null hypothesis. State your conclusion in plain English. (7 points)

Ho: µ1 = µ2; HA: µ1 ≠ µ2 (two-tailed); N1, N2 > 30, so normal approx. is ok

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}} = \frac{26321 - 28312}{\sqrt{\dfrac{(16937)^2}{47} + \dfrac{(16842)^2}{901}}} = \frac{-1991}{2533} = -0.786$$

There is no evidence that gays have higher or lower incomes than heterosexuals: fail to reject the null hypothesis.

B. Why is this analysis inadequate to answer the question of whether gay men have higher or lower incomes than heterosexual men? (3 points)

The analysis ignores many variables that we know are correlated with income, and which may be correlated with sexual preference, including age, education, work experience, occupation, geography, etc. After accounting for such factors in a multiple regression analysis, Badgett concluded that the income difference was statistically significant.

4. A large study of child care used samples from the Current Population Survey, which has three categories of child-care workers: private household, non-household, and preschool teacher.
The following table gives the number of blacks in a sample of women workers:

EXPECTED

| Poll | Bush | Clinton | Perot | undecided | total |
|---|---|---|---|---|---|
| ABC News | 334 | 385 | 139 | 54 | 912 |
| USA Today/CNN | 590 | 680 | 246 | 95 | 1610 |
| New York Times/CBS | 701 | 807 | 292 | 112 | 1912 |
| total | 1625 | 1872 | 676 | 261 | 4434 |

(O – E)²/E

| Poll | Bush | Clinton | Perot | undecided | total |
|---|---|---|---|---|---|
| ABC News | 0.46 | 0.32 | 4.51 | 12.88 | 18.17 |
| USA Today/CNN | 2.45 | 0.02 | 1.66 | 2.13 | 6.26 |
| New York Times/CBS | 3.63 | 0.27 | 0.08 | 14.58 | 18.56 |
| total | 6.55 | 0.62 | 6.25 | 29.59 | 43.00 |

The very large chi-square value of 43 (p ≈ 10⁻⁷) means that the differences between the polls are not due to random sampling variations. The critical value of chi-square for 6 degrees of freedom and α = 0.05 is 12.59. Note that most of the contribution to chi-square comes from the undecided category. Obviously the different pollsters used different techniques to try to persuade people to express their preference, with ABC being the most successful by far, as indicated by the accuracy test.

6. The following table gives the mass (in metric tonnes) and unit flyaway cost (in millions of FY87 dollars) of seven U.S. missiles. Also given are the squared differences between the mass and the mean mass (SSm) and between the cost and the mean cost (SSc), and the product of the two differences (SSmc).

| Missile | Mass (te) | Unit Flyaway Cost (M$) | SSm | SSc | SSmc |
|---|---|---|---|---|---|
| Minuteman III | 35.0 | 7.80 | 0.1 | 7.3 | 1.0 |
| Peacekeeper | 88.0 | 22.00 | 2768.3 | 132.1 | 604.6 |
| Poseidon C3 | 29.0 | 5.00 | 40.8 | 30.3 | 35.2 |
| Trident C4 | 30.0 | 8.10 | 29.0 | 5.8 | 13.0 |
| Trident D5 | 57.0 | 28.00 | 467.2 | 306.0 | 378.1 |
| Lance | 1.3 | 0.16 | 1161.8 | 107.1 | 352.7 |
| Pershing II | 7.4 | 2.50 | 783.2 | 64.1 | 224.1 |
| mean/sum | 35.4 | 10.51 | 5250.4 | 652.7 | 1608.7 |

A. Which is the “independent” variable? Why? Write an equation for a linear relationship between the mass M and the cost C. (4 points)

C = a + bM; M is the independent variable, because cost depends on mass.

B. Compute estimates of the slope and intercept. Briefly explain the physical significance of each.
(4 points)

$$b = \frac{SS_{xy}}{SS_x} = \frac{SS_{mc}}{SS_m} = \frac{1608.7}{5250.4} = 0.3064$$

$$a = \bar{C} - b\bar{M} = 10.51 - 0.3064(35.4) = -0.336$$

b: if mass increases by 1 te, cost increases by about $310,000.
a: if mass is zero, cost is –$336,000 (makes no physical sense).

C. Compute the coefficient of determination (r²) and explain its physical significance. (4 points)

$$r^2 = \frac{(SS_{mc})^2}{(SS_m)(SS_c)} = \frac{(1608.7)^2}{(5250.4)(652.7)} = 0.755$$

76 percent of the variation in the cost of U.S. missiles can be explained by a linear relationship with mass.

D. Is there a relationship between the mass and the cost of missiles? Test the null hypothesis that there is no correlation. (4 points)

$$t = \frac{r}{s_r} = \frac{r}{\sqrt{\dfrac{1 - r^2}{N - 2}}} = \frac{\sqrt{0.755}}{\sqrt{\dfrac{1 - 0.755}{7 - 2}}} = \frac{0.869}{0.221} = 3.93$$

The critical value of t for 5 degrees of freedom and α = 0.05 is 2.57, so we reject the null hypothesis of no correlation between mass and cost.

E. The Soviet-built Scud-B missile has a mass of 4.9 tonnes. Estimate its cost, and explain why the estimate might not be accurate. (2 points)

C = a + bM = –0.336 + 0.3064(4.9) = $1.16 million

This might not be accurate primarily because the linear relationship was developed for U.S. missiles, not Soviet missiles.

F. The Scud sells for about $1 million. Is this consistent with the linear relationship derived above? (2 points)

The short answer is “yes”: the $1 million price is well within the uncertainty of the $1.16 million estimate. You could have calculated the standard error of the prediction, sy (≈ 6.5, in M$), but that wasn’t necessary. This was a free two points.

G. Below is a scatterplot and a plot of the residuals. Label the axes appropriately. Do you notice anything interesting: outliers, heteroscedasticity, or autocorrelation? (5 points)

[Figure: scatterplot of Missile Cost (M$) versus Missile Mass (te), and plot of Residuals versus Missile Mass (te)]

The data point for the Trident D5 missile is clearly an outlier. If this is removed, the regression improves dramatically. There are no other problems.

7.
Data for the following variables were collected for 117 metropolitan areas:

M = mortality rate (deaths per 10,000 per year);
P = average suspended particulate concentration (µg/m³);
S = average sulfate concentration (µg/m³);
B = percentage of population that is black;
E = percentage of population over 65 years of age;
N = location of city (1 = north, 0 = south).

A regression analysis yielded the following results:

| Regression Statistics | |
|---|---|
| Multiple R | 0.982 |
| R Square | 0.964 |
| Adjusted R Square | 0.943 |
| Standard Error | 1.189 |
| Observations | 117 |

| | Coefficient | Std Error | t-statistic | P-value |
|---|---|---|---|---|
| Intercept | 19.6 | 2.126 | 9.22 | 5E-09 |
| P | 0.041 | 0.013 | 3.18 | 0.004 |
| S | 0.71 | 0.280 | 2.54 | 0.019 |
| B | 0.41 | 0.070 | 5.82 | 7E-06 |
| E | 6.87 | 0.363 | 18.90 | 4E-15 |
| N | 3.04 | 2.596 | 1.17 | 0.255 |

A. Explain the meaning of each number in the column labeled “coefficient.” Do the numerical values make sense? (5 points)
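
As a starting point for part A, each coefficient can be read as the change in predicted mortality M for a one-unit increase in that variable with the others held fixed, and each t-statistic as the coefficient divided by its standard error. The Python sketch below is illustrative only and is not the official solution. It re-derives the t-statistics from the rounded table values (so small discrepancies from the printed column are expected) and evaluates the fitted equation for a hypothetical city whose inputs are invented for the example. The 0.05 significance level is an assumption carried over from the earlier questions.

```python
# Values copied from the regression table in question 7:
# name -> (coefficient, standard error, reported t-statistic, reported p-value)
TABLE = {
    "Intercept": (19.6, 2.126, 9.22, 5e-09),
    "P": (0.041, 0.013, 3.18, 0.004),
    "S": (0.71, 0.280, 2.54, 0.019),
    "B": (0.41, 0.070, 5.82, 7e-06),
    "E": (6.87, 0.363, 18.90, 4e-15),
    "N": (3.04, 2.596, 1.17, 0.255),
}

ALPHA = 0.05  # assumed significance level, as used in the earlier questions


def t_statistic(coef, std_err):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coef / std_err


def predict_mortality(P, S, B, E, N):
    """Predicted deaths per 10,000 per year from the fitted linear equation."""
    inputs = {"P": P, "S": S, "B": B, "E": E, "N": N}
    total = TABLE["Intercept"][0]
    for name, value in inputs.items():
        total += TABLE[name][0] * value
    return total


def significant_terms(alpha=ALPHA):
    """Names of the terms whose reported p-value is below alpha."""
    return [name for name, row in TABLE.items() if row[3] < alpha]


if __name__ == "__main__":
    for name, (coef, se, t_reported, p) in TABLE.items():
        t = t_statistic(coef, se)
        print(f"{name:9s} t = {t:6.2f} (table: {t_reported:.2f}), p = {p:g}")

    # Hypothetical city, invented for illustration only.
    city = dict(P=100, S=10, B=10, E=10, N=1)
    print("Predicted M:", round(predict_mortality(**city), 2))
    print("Significant at alpha = 0.05:", significant_terms())
```

Changing N from 0 to 1 in `predict_mortality` shifts the prediction by exactly the N coefficient, which is what "holding the other variables fixed" means here. N is also the only term whose reported p-value exceeds 0.05, so the north/south difference cannot be distinguished from zero in this fit.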