









This document presents the results of a regression analysis examining the relationship between API scores, class size, year-round schools, and the percentage of parents as high school graduates. It includes the regression statistics, coefficients, and summary output. The analysis reveals that class size, year-round schools, and the percentage of parents as high school graduates are significant predictors of API scores.










Ecn 102 - Analysis of Economic Data
University of California - Davis
March 17, 2010
Instructor: John Parman
You have until 12:30pm to complete this exam. Please remember to put your name, section and ID number on both your scantron sheet and the exam. Fill in test form A on the scantron sheet. Answer all multiple choice questions on your scantron sheet. Choose the single best answer for each multiple choice question. Answer the long answer questions directly on the exam. Keep your answers complete but concise. For the long answer questions, you must show your work where appropriate for full credit.
Name: ID Number: Section:
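$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
$CV = \frac{s}{\bar{x}}$
$skew = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$
$kurt = \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$
$\mu = E(X)$
$z^* = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
$t^* = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
$t^* = \frac{b_j - \beta_j}{s_{b_j}}$
$Pr[T_{n-k} > t_{\alpha, n-k}] = \alpha$
$Pr[|T_{n-k}| > t_{\alpha/2, n-k}] = \alpha$
$\sum_{i=1}^{n} a = na$
$\sum_{i=1}^{n} (a x_i) = a \sum_{i=1}^{n} x_i$
$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
$s^2 = \bar{x}(1 - \bar{x})$ for proportions data
$t_{\alpha, n-k} = TINV(2\alpha, n-k)$
$Pr(|T_{n-k}| \geq |t^*|) = TDIST(|t^*|, n-k, 2)$
$Pr(T_{n-k} > t^*) = TDIST(t^*, n-k, 1)$
$s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
$r_{xy} = \frac{s_{xy}}{\sqrt{s_{xx} \cdot s_{yy}}}$
$b_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r_{xy} \sqrt{\frac{s_{yy}}{s_{xx}}}$
$b_1 = \bar{y} - b_2 \bar{x}$
$\hat{y}_i = b_1 + \sum_{j=2}^{k} b_j x_{j,i}$
$s_e^2 = \frac{1}{n-k} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$
$ESS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$R^2 = 1 - \frac{ESS}{TSS}$
$s_{b_2} = \sqrt{\frac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
$\bar{R}^2 = 1 - \frac{n-1}{n-k} \cdot \frac{ESS}{TSS}$
$F^* = \frac{n-k}{k-1} \cdot \frac{R^2}{1 - R^2}$
$F^* = \frac{n-k}{k-g} \cdot \frac{ESS_r - ESS_u}{ESS_u} = \frac{n-k}{k-g} \cdot \frac{R_u^2 - R_r^2}{1 - R_u^2}$
$Pr(F_{k-g, n-k} > F^*) = FDIST(F^*, k-g, n-k)$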
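As a rough illustration of the descriptive statistics formulas above, the following Python sketch computes the mean, variance, coefficient of variation, skewness and kurtosis for a small made-up sample. The function name `describe` and the sample data are hypothetical and are not part of the exam.

```python
import math

def describe(xs):
    """Apply the formula-sheet definitions to a list of numbers (needs n >= 4)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)   # s^2 with n - 1 in the denominator
    s = math.sqrt(var)
    cv = s / mean
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
            * sum(((x - mean) / s) ** 4 for x in xs)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"mean": mean, "var": var, "cv": cv, "skew": skew, "kurt": kurt}

# Hypothetical sample, chosen to be symmetric so the skewness is zero.
stats = describe([1.0, 2.0, 3.0, 4.0, 5.0])
print(stats)
```

For this symmetric sample the skewness is zero and the kurtosis is negative, which is what we would expect for data that are flatter than a normal distribution.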
SECTION I: MULTIPLE CHOICE (60 points)
(b) Measurement error in X would bias the slope coefficient toward zero. Omitting the variable Z would bias the coefficient in a direction that depends on the signs of the correlations between Z and X and between Z and Y. Measurement error in Y would not bias the coefficient; it would just lead to a larger standard error for the coefficient.
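The two sources of bias in this answer can be written out using standard large-sample results. This is a sketch under stated assumptions, not something taken from the exam: the true model is assumed to be $Y = \beta_1 + \beta_2 X + \beta_3 Z + \varepsilon$, and the measurement error $u$ in X is assumed to be random and uncorrelated with X and $\varepsilon$. The symbols $\beta_3$, $u$, $\sigma_X^2$ and $\sigma_u^2$ are introduced here for illustration.

```latex
% Omitting Z: the sign of the bias is the sign of beta_3 (link between Z and Y)
% times the sign of Cov(X, Z) (link between Z and X).
\operatorname{plim} b_2 = \beta_2 + \beta_3 \, \frac{\operatorname{Cov}(X, Z)}{\operatorname{Var}(X)}

% Random measurement error u in X: the factor multiplying beta_2 lies
% between 0 and 1, so the estimate is pulled toward zero.
\operatorname{plim} b_2 = \beta_2 \, \frac{\sigma_X^2}{\sigma_X^2 + \sigma_u^2}
```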
(c) A histogram shows the distribution of a single variable.
(d) The estimator is biased because its expected value is not equal to $\beta_j$. It is not consistent because, as the sample size increases, the standard error of the estimator approaches zero but the value of the estimator does not approach $\beta_j$.
[Figure: scatter plot of final exam score (vertical axis) against hours of study (horizontal axis).]
Use the figure above to answer questions 7 through 10. The figure is a scatter plot with hours of study on the horizontal axis and final exam score on the vertical axis for 21 students in an ECN 100 class. The round data points correspond to economics majors. The square data points correspond to nonmajors. Suppose we use this data to estimate the following model:
$SCORE = \beta_1 + \beta_2 MAJOR + \beta_3 HOURS + \beta_4 MAJOR \cdot HOURS + \varepsilon$
where SCORE is a student’s final exam score, MAJOR is a dummy variable equal to one if the student is an economics major and zero otherwise, HOURS is the number of hours the student studies for the final and $\varepsilon$ is a random error term that satisfies all of our assumptions.
(a) Positive. (b) Negative. (c) Larger for economics majors than nonmajors. (d) Larger for nonmajors than economics majors.
(a) $\beta_2$ will be the difference in exam scores between economics majors and nonmajors when hours of study are zero. On the graph, this is the difference between the vertical intercept for majors and the vertical intercept for nonmajors. From the graph, it is clear that the vertical intercept for majors is much larger than the vertical intercept for nonmajors, so we would expect $\beta_2$ to be positive.
(a) Positive. (b) Negative. (c) Larger for economics majors than nonmajors. (d) Larger for nonmajors than economics majors.
(b) $\beta_4$ is the difference between majors and nonmajors in the change in exam score from an extra hour of studying. On the graph, this is the difference in the slopes of the lines passing through the majors data points and the nonmajors data points. From the graph, it is clear that the slope is flatter for the majors than for the nonmajors. So we would expect $\beta_4$ to be negative.
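To see how the dummy variable and the interaction term translate into two separate lines, the sketch below computes the intercept and slope implied by the model for each group. The coefficient values are hypothetical, picked only to match the signs argued for above (a positive coefficient on MAJOR and a negative coefficient on the interaction); they are not estimates from the exam's data, and `group_lines` is an illustrative name.

```python
def group_lines(b1, b2, b3, b4):
    """Return (intercept, slope) for nonmajors and for majors.

    Nonmajors have MAJOR = 0, so SCORE = b1 + b3 * HOURS.
    Majors have MAJOR = 1, so SCORE = (b1 + b2) + (b3 + b4) * HOURS.
    """
    nonmajor = (b1, b3)
    major = (b1 + b2, b3 + b4)
    return nonmajor, major

# Hypothetical coefficients: b2 > 0 (higher intercept for majors),
# b4 < 0 (flatter slope for majors).
nonmajor, major = group_lines(b1=40.0, b2=30.0, b3=5.0, b4=-3.0)
print("nonmajors: intercept", nonmajor[0], "slope", nonmajor[1])
print("majors:    intercept", major[0], "slope", major[1])
```

The gap between the two intercepts is the coefficient on MAJOR and the gap between the two slopes is the coefficient on the interaction term, which is why both signs can be read directly off the graph.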
(c) Since the p-value is smaller than 0.10, we would conclude that the coefficient is statistically significant at a 10% significance level. Without knowing the magnitude of the coefficient and what the coefficient is measuring, we have no way of saying whether the coefficient is economically significant.
(a) The R^2. (b) The adjusted R^2. (c) The standard errors of the coefficients for the other variables. (d) The total sum of squares.
(b) The R^2 will not decrease (it will likely stay the same since the new variable will not explain any of the variation in the dependent variable). The adjusted R^2 will tend to decrease since it takes into account not only the R^2 but also the number of variables used. The standard errors of the other coefficients will tend to increase when you add an irrelevant variable to the regression.
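The penalty for an extra variable can be checked with the adjusted R^2 formula from the formula sheet, using the fact that ESS/TSS = 1 - R^2. The numbers below (R^2 of 0.5 and 21 observations) are hypothetical and serve only to show the direction of the change when an irrelevant variable leaves R^2 unchanged.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n - 1) / (n - k) * (1 - R^2), where k counts the intercept."""
    return 1 - (n - 1) / (n - k) * (1 - r2)

# Hypothetical case: R^2 stays at 0.5 while k rises from 2 to 3.
before = adjusted_r2(r2=0.5, n=21, k=2)
after = adjusted_r2(r2=0.5, n=21, k=3)
print(before, after)
```

With R^2 held fixed, raising k makes the ratio (n - 1)/(n - k) larger, so the adjusted R^2 falls.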
(a) A slope coefficient equal to one. (b) A slope coefficient equal to either one or negative one. (c) An R^2 equal to one. (d) An intercept equal to zero.
(c) Knowing that X and Y are perfectly correlated tells us that the data points will all lie along a straight line, giving us an R^2 value of one. However, it does not tell us what the slope of that line is or whether it has a nonzero intercept.
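A quick numerical check of this point: the sketch below builds two made-up exact linear relationships with different slopes and intercepts and computes the correlation coefficient from the formula sheet. In a regression with one explanatory variable, R^2 equals the squared correlation. The data and the helper name `corr` are illustrative assumptions.

```python
import math

def corr(xs, ys):
    """Sample correlation coefficient r_xy."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
line_a = [3.0 + 5.0 * x for x in xs]    # slope 5, intercept 3
line_b = [10.0 - 0.5 * x for x in xs]   # slope -0.5, intercept 10

r_a = corr(xs, line_a)
r_b = corr(xs, line_b)
print(r_a ** 2, r_b ** 2)
```

Both squared correlations equal one even though neither line has a slope of one, a slope of negative one, or a zero intercept.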
(a) Be centered at zero. (b) Be centered at the population mean of X. (c) Have a variance equal to the variance of X. (d) Both (b) and (c).
(b) The distribution of the sample mean will always be centered at the true population mean of X. The variance of the distribution of the sample mean will depend on both the variance of X and on the sample size.
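Both statements follow from a short derivation, sketched here under the assumption that the observations $x_1, \dots, x_n$ are independent draws with mean $\mu$ and variance $\sigma^2$ (the symbol $\sigma^2$ for the variance of X is introduced here for illustration).

```latex
E(\bar{x}) = E\left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)
           = \frac{1}{n} \sum_{i=1}^{n} E(x_i)
           = \frac{1}{n} \cdot n\mu = \mu

Var(\bar{x}) = Var\left( \frac{1}{n} \sum_{i=1}^{n} x_i \right)
             = \frac{1}{n^2} \sum_{i=1}^{n} Var(x_i)
             = \frac{1}{n^2} \cdot n\sigma^2 = \frac{\sigma^2}{n}
```

So the variance of the sample mean equals the variance of X only when n = 1, which rules out (c).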
$Y = \beta_1 + \beta_2 X + \varepsilon$
SECTION II: SHORT ANSWER (40 points)
Here the problem is the measurement of hours of exercise. Since the student does not remember the exact value for each week, this variable will be measured with some random error. Random measurement error in the independent variable will bias the coefficient on that variable toward zero. If the effect of exercise on quiz score is positive, this would mean there is a downward bias. If the effect of exercise on quiz score is negative, it would mean there is an upward bias.
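The bias toward zero can be seen in a small simulation. This is a minimal sketch with invented numbers, not the student's data: the true slope is set to 2, and the recorded explanatory variable equals the true value plus random noise with the same variance as the true value, which should cut the estimated slope roughly in half. The names `ols_slope`, `x_true` and `x_observed` are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Bivariate OLS slope b_2 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

random.seed(0)            # fixed seed so the run is repeatable
n = 20000
true_slope = 2.0          # assumed true effect, for illustration only

x_true = [random.gauss(0, 1) for _ in range(n)]
y = [1.0 + true_slope * x + random.gauss(0, 1) for x in x_true]
# Recorded values: the true value plus random recall error.
x_observed = [x + random.gauss(0, 1) for x in x_true]

slope_clean = ols_slope(x_true, y)
slope_noisy = ols_slope(x_observed, y)
print("slope using true x:    ", round(slope_clean, 3))
print("slope using observed x:", round(slope_noisy, 3))
```

The slope estimated from the mismeasured variable is closer to zero than the slope estimated from the true variable. If the true slope were negative, the same shrinkage toward zero would show up as an upward bias.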
$\beta_1$ is the electricity used by a household with no income ($I = 0$) at a temperature of zero degrees Celsius ($T_c = 0$). This certainly will not be a negative number (the household cannot use negative amounts of electricity). Most likely, it will be a positive number since even if a family doesn’t have a current source of income they will likely still need to use some electricity.
The signs of $\beta_2$ and $\beta_3$ need to give us the U-shape originally described in the model. Since the U-shape faces upward, $\beta_3$ must be positive. Another way to see this is that the slope of the curve is equal to $\beta_2 + 2\beta_3 T_c$ and the slope gets more positive as $T_c$ gets larger, so $\beta_3$ must be positive.
Note that the minimum of this curve will likely occur at a temperature well above zero degrees since it starts rising again because of air conditioner usage. The minimum of the curve is where the slope is zero, implying that $\beta_2 = -2\beta_3 T_c$. Since the minimum occurs at a positive $T_c$ and we just determined that $\beta_3$ is positive, this implies that $\beta_2$ should be negative.
Finally, we are told in part (b) that higher income families use more electricity, so $\beta_4$ should be positive.
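The sign argument can be checked numerically. The sketch below assumes the model is quadratic in temperature, which is inferred from the slope expression in the answer, and uses hypothetical values for the coefficients on temperature and temperature squared (-5 and 0.125). The function names are illustrative.

```python
def slope_at(tc, b2, b3):
    """Slope of electricity use with respect to temperature: b2 + 2 * b3 * Tc."""
    return b2 + 2 * b3 * tc

def minimizing_temperature(b2, b3):
    """Temperature where the slope is zero; a minimum only when b3 > 0."""
    if b3 <= 0:
        raise ValueError("b3 must be positive for an upward-facing U-shape")
    return -b2 / (2 * b3)

# Hypothetical coefficients: b2 negative, b3 positive.
b2, b3 = -5.0, 0.125
t_min = minimizing_temperature(b2, b3)
print("minimum at Tc =", t_min)
print("slope at 10 degrees:", slope_at(10.0, b2, b3))
print("slope at 30 degrees:", slope_at(30.0, b2, b3))
```

With a negative coefficient on temperature and a positive coefficient on temperature squared, the minimum lands at a positive temperature, with electricity use falling below it and rising above it.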
SUMMARY OUTPUT: API score as dependent variable

Regression Statistics
Multiple R           0.
R Square             0.
Adjusted R Square    0.
Standard Error       76.
Observations         6426

             Coefficients   Standard Error   t Stat    P-value     Lower 95%   Upper 95%
Intercept    688.49         5.93             116.09    0           676.86      700.
CLASSSIZE    3.91           0.21             18.77     1.41E-76    3.50        4.
YEARROUND    -21.82         4.90             -4.46     8.5E-06     -31.43      -12.
NONHSGRAD    -88.76         3.56             -24.94    4.1E-131    -95.73      -81.
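The columns of this output are linked by the formula sheet: under the null hypothesis that a coefficient is zero, the t statistic is the coefficient divided by its standard error, and the lower end of the 95% confidence interval is the coefficient minus the critical value times the standard error. The sketch below recomputes both from the rounded values printed above. The critical value of 1.96 is an assumption (the large-sample approximation), so the results match the printed columns only up to rounding.

```python
# (coefficient, standard error, printed t stat, printed lower 95% bound)
rows = {
    "Intercept": (688.49, 5.93, 116.09, 676.86),
    "CLASSSIZE": (3.91, 0.21, 18.77, 3.50),
    "YEARROUND": (-21.82, 4.90, -4.46, -31.43),
    "NONHSGRAD": (-88.76, 3.56, -24.94, -95.73),
}

CRIT = 1.96  # assumed large-sample critical value for a 95% interval

def t_stat(coef, se):
    """t* = (b_j - 0) / s_bj for the null hypothesis that the coefficient is zero."""
    return coef / se

def lower_95(coef, se, crit=CRIT):
    """Lower end of the 95% confidence interval."""
    return coef - crit * se

recomputed = {
    name: (t_stat(coef, se), lower_95(coef, se))
    for name, (coef, se, _, _) in rows.items()
}
for name, (t, lo) in recomputed.items():
    print(f"{name:10s} t = {t:8.2f}   lower 95% = {lo:8.2f}")
```

The small gaps between the recomputed and printed t statistics arise because the printed coefficients and standard errors are themselves rounded to two decimals.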
SUMMARY OUTPUT: API score as dependent variable

Regression Statistics
Multiple R           0.
R Square             0.
Adjusted R Square    0.
Standard Error       80.
Observations         6426

             Coefficients   Standard Error   t Stat    P-value     Lower 95%   Upper 95%
Intercept    671.78         6.19             108.53    0           659.65      683.
CLASSSIZE    4.22           0.22             19.33     5.42E-81    3.80        4.
SUMMARY OUTPUT: API score as dependent variable

Regression Statistics
Multiple R           0.
R Square             0.
Adjusted R Square    0.
Standard Error       79.
Observations         6426

             Coefficients   Standard Error   t Stat    P-value     Lower 95%   Upper 95%
Intercept    798.17         1.04             765.51    0           796.13      800.
YEARROUND    -23.39         5.03             -4.65     3.38E-06    -33.25      -13.
NONHSGRAD    -92.40         3.65             -25.32    7.2E-135    -99.55      -85.
Summary Statistics for the Regression Sample

Variable     Mean       Minimum   Maximum   Standard Deviation
API          789.854    310       998       83.
CLASSSIZE    27.949     1         50        4.
YEARROUND    0.040      0         1         0.
NONHSGRAD    0.080      0         1         0.
API scores is:
$\widehat{API}_A - \widehat{API}_B = 783.97 - 677.93 = 106.04$
So the predicted API score in school district A is 106.04 points higher than the predicted API score in district B.
(b) Suppose you wanted to use an F test to determine whether the YEARROUND and NONHSGRAD variables are jointly significant in the regression that includes CLASSSIZE, YEARROUND and NONHSGRAD. In other words, you want to test the following set of hypotheses:
$H_0: \beta_{YEARROUND} = \beta_{NONHSGRAD} = 0$
$H_a$: at least one of $\beta_{YEARROUND}$ and $\beta_{NONHSGRAD}$ is not equal to zero
Calculate the F statistic you would use to do this test. You should calculate an exact numerical value for the F statistic.
We are testing the joint significance of YEARROUND and NONHSGRAD in the regression that contains all three variables. So our unrestricted model is:
$API = \beta_1 + \beta_2 CLASSSIZE + \beta_3 YEARROUND + \beta_4 NONHSGRAD + \varepsilon$
Notice that there are four coefficients in this equation (including the intercept), so k is equal to four. Our restricted model excludes the variables whose joint significance we are testing:
$API = \beta_1 + \beta_2 CLASSSIZE + \varepsilon$
Notice that there are only two coefficients left in the equation, so g is equal to two. The regression results for the unrestricted model are given in the first set of results on the previous page. The regression results for the restricted model are given in the second set of results. We do not need the third set of regression results for this particular question. Given the regression results, the calculation of the test statistic is straightforward:
$F^* = \frac{n-k}{k-g} \cdot \frac{R_u^2 - R_r^2}{1 - R_u^2}$
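The calculation can be sketched as a small function. The values of n, k and g below come from the problem (6426 observations, k = 4, g = 2). The two R^2 values are hypothetical placeholders, because the R Square entries in the regression output are not legible in this copy; they must be replaced with the actual values from the first (unrestricted) and second (restricted) sets of results to obtain the exam's answer.

```python
def f_statistic(r2_u, r2_r, n, k, g):
    """F* = (n - k) / (k - g) * (R2_u - R2_r) / (1 - R2_u)."""
    return (n - k) / (k - g) * (r2_u - r2_r) / (1 - r2_u)

n, k, g = 6426, 4, 2   # from the problem statement

# HYPOTHETICAL R^2 values, for illustration only; not the exam's numbers.
r2_u_example = 0.50
r2_r_example = 0.25

f_example = f_statistic(r2_u_example, r2_r_example, n, k, g)
print(f_example)
```

Because n - k is large relative to k - g, even a modest gain in R^2 from adding the two variables produces a large F statistic.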