Predicting API Scores with Class Size, Year Round Schools, and Parental Graduation Rate, Lecture notes of Economics

This document presents the results of a regression analysis examining the relationship between API scores, class size, year round schools, and parental education (the NONHSGRAD variable, for parents who are not high school graduates), including the regression statistics, coefficients, and summary output. The analysis finds that class size, year round schooling, and parental education are significant predictors of API scores.

Typology: Lecture notes

2021/2022

Uploaded on 08/05/2022

jacqueline_nel
Ecn 102 - Analysis of Economic Data
University of California - Davis, March 17, 2010
Instructor: John Parman
Final Exam - Solutions

You have until 12:30pm to complete this exam. Please remember to put your name, section and ID number on both your scantron sheet and the exam. Fill in test form A on the scantron sheet. Answer all multiple choice questions on your scantron sheet. Choose the single best answer for each multiple choice question. Answer the long answer questions directly on the exam. Keep your answers complete but concise. For the long answer questions, you must show your work where appropriate for full credit.

Name: ID Number: Section:
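(POTENTIALLY) USEFUL FORMULAS

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
$CV = \frac{s}{\bar{x}}$
$skew = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3$
$kurt = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$
$\mu = E(X)$
$z^* = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
$t^* = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
$t^* = \frac{b_j - \beta_j}{s_{b_j}}$
$Pr[T_{n-k} > t_{\alpha,n-k}] = \alpha$
$Pr[|T_{n-k}| > t_{\alpha/2,n-k}] = \alpha$
$\sum_{i=1}^{n} a = na$
$\sum_{i=1}^{n} (a x_i) = a\sum_{i=1}^{n} x_i$
$\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
$s^2 = \bar{x}(1 - \bar{x})$ for proportions data
$t_{\alpha,n-k} = TINV(2\alpha, n-k)$
$Pr(|T_{n-k}| \geq |t^*|) = TDIST(|t^*|, n-k, 2)$
$Pr(T_{n-k} > t^*) = TDIST(t^*, n-k, 1)$
$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}$
$r_{xy} = \frac{s_{xy}}{\sqrt{s_{xx} \cdot s_{yy}}}$
$b_2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = r_{xy}\sqrt{\frac{s_{yy}}{s_{xx}}}$
$b_1 = \bar{y} - b_2\bar{x}$
$\hat{y}_i = b_1 + \sum_{j=2}^{k} b_j x_{j,i}$
$s_e^2 = \frac{1}{n-k}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$
$ESS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
$R^2 = 1 - \frac{ESS}{TSS}$
$s_{b_2} = \sqrt{\frac{s_e^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
$\bar{R}^2 = 1 - \frac{n-1}{n-k}\cdot\frac{ESS}{TSS}$
$F^* = \frac{n-k}{k-1}\cdot\frac{R^2}{1 - R^2}$
$F^* = \frac{n-k}{k-g}\cdot\frac{ESS_r - ESS_u}{ESS_u} = \frac{n-k}{k-g}\cdot\frac{R^2_u - R^2_r}{1 - R^2_u}$
$Pr(F_{k-g,n-k} > F^*) = FDIST(F^*, k-g, n-k)$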
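As a study aid, the bivariate formulas above can be checked numerically. The following is a minimal Python sketch (standard library only) of $b_1$, $b_2$, $r_{xy}$, $s_e^2$, $s_{b_2}$, $R^2$ and $\bar{R}^2$ for the case k = 2; the function name `ols_bivariate` and the toy data are illustrative assumptions, not part of the exam. As on the formula sheet, ESS here is the error (residual) sum of squares.

```python
import math


def ols_bivariate(x, y):
    """Bivariate OLS computed with the formula-sheet expressions (k = 2 coefficients)."""
    n, k = len(x), 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xx = sum((xi - x_bar) ** 2 for xi in x)                        # sum of (x_i - x_bar)^2
    sum_yy = sum((yi - y_bar) ** 2 for yi in y)                        # TSS
    sum_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sum_xy / sum_xx
    b1 = y_bar - b2 * x_bar
    y_hat = [b1 + b2 * xi for xi in x]
    ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))              # error sum of squares
    s2_e = ess / (n - k)                                               # s_e^2
    s_b2 = math.sqrt(s2_e / sum_xx)                                    # standard error of b2
    return {
        "b1": b1,
        "b2": b2,
        "r_xy": sum_xy / math.sqrt(sum_xx * sum_yy),
        "s_b2": s_b2,
        "t_b2": b2 / s_b2,                                             # t* for H0: beta_2 = 0
        "R2": 1 - ess / sum_yy,
        "R2_adj": 1 - (n - 1) / (n - k) * ess / sum_yy,
    }


results = ols_bivariate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(results)
```

In a bivariate regression the returned values should also satisfy $R^2 = r_{xy}^2$ and $b_2 = r_{xy}\sqrt{s_{yy}/s_{xx}}$, which makes a quick self-check on hand calculations.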

SECTION I: MULTIPLE CHOICE (60 points)

  1. Suppose that we regress Y on X. In which of the following scenarios would the estimated slope coefficient not be biased? (a) There is random measurement error in X. (b) There is random measurement error in Y. (c) There is an omitted variable Z that is correlated with both X and Y. (d) Both (b) and (c).

(b) Measurement error in X would bias the slope coefficient toward zero. Omitting the variable Z would bias the coefficient in a direction that depends on the signs of the correlations between Z and X and between Z and Y. Measurement error in Y would not bias the coefficient; it would just lead to a larger standard error for the coefficient.
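The answer can be checked with a small simulation. This Python sketch assumes a true slope of 2, a standard normal regressor and error, and random measurement error with variance 1 (all hypothetical values). Under those assumptions, error in X should shrink the slope toward 2 × 1/(1 + 1) = 1, while the same error in Y should leave it near 2.

```python
import random


def ols_slope(x, y):
    """Bivariate OLS slope: sum of cross deviations over sum of squared x deviations."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


def measurement_error_demo(n=20000, beta2=2.0, error_sd=1.0, seed=0):
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]                     # true regressor
    y = [1.0 + beta2 * xi + rng.gauss(0, 1) for xi in x]        # true model (intercept 1 assumed)
    x_mismeasured = [xi + rng.gauss(0, error_sd) for xi in x]   # random error in X only
    y_mismeasured = [yi + rng.gauss(0, error_sd) for yi in y]   # random error in Y only
    return {
        "no_error": ols_slope(x, y),
        "error_in_x": ols_slope(x_mismeasured, y),
        "error_in_y": ols_slope(x, y_mismeasured),
    }


print(measurement_error_demo())
```

Because the simulated error in Y is independent of X, it only adds noise to the residual, which is why it widens the standard error rather than shifting the slope.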

  2. Which of the following would be the best graph for showing the distribution of heights in a sample of one thousand students? (a) Scatter plot. (b) Line chart. (c) Histogram. (d) Bubble chart.

(c) A histogram shows the distribution of a single variable.

  3. Suppose $\tilde{b}_j$ is an estimator for the slope coefficient $\beta_j$. The expected value of $\tilde{b}_j$ is equal to $\beta_j$ plus ten. As the sample size gets larger and larger, the standard error of $\tilde{b}_j$ gets closer and closer to zero while the expected value of $\tilde{b}_j$ remains the same. Which of the following statements is true? (a) $\tilde{b}_j$ is an unbiased, consistent estimator of $\beta_j$. (b) $\tilde{b}_j$ is a consistent but biased estimator of $\beta_j$. (c) $\tilde{b}_j$ is an unbiased estimator for $\beta_j$ but is not consistent. (d) $\tilde{b}_j$ is a biased estimator of $\beta_j$ and is not consistent.

(d) The estimator is biased because its expected value is not equal to $\beta_j$. It is not consistent because, although its standard error approaches zero as the sample size increases, the estimator concentrates around $\beta_j + 10$ rather than around $\beta_j$.

  4. Suppose that on average, an extra inch in height is associated with an extra five pounds in weight. We have a dataset containing height and weight information in which height is rounded to the nearest inch and weight is rounded to the nearest pound. If we regress weight on height, the expected value of the estimated height coefficient would be: (a) Equal to 5. (b) Equal to 15. (c) Less than 5. (d) Greater than 5.

[Figure: scatter plot of final exam score (vertical axis) against hours of study (horizontal axis)]

Use the figure above to answer questions 7 through 10. The figure is a scatter plot with hours of study on the horizontal axis and final exam score on the vertical axis for 21 students in an ECN 100 class. The round data points correspond to economics majors. The square data points correspond to nonmajors. Suppose we use this data to estimate the following model:

$SCORE = \beta_1 + \beta_2 MAJOR + \beta_3 HOURS + \beta_4 MAJOR \cdot HOURS + \varepsilon$

where SCORE is a student's final exam score, MAJOR is a dummy variable equal to one if the student is an economics major and zero otherwise, HOURS is the number of hours the student studies for the final, and $\varepsilon$ is a random error term that satisfies all of our assumptions.

  7. Based on the scatterplot, we would expect our estimated value of $\beta_2$ to be:

(a) Positive. (b) Negative. (c) Larger for economics majors than nonmajors. (d) Larger for nonmajors than economics majors.

(a) $\beta_2$ is the difference in exam scores between economics majors and nonmajors when hours of study are zero. On the graph, this is the difference between the vertical intercept for majors and the vertical intercept for nonmajors. From the graph, it is clear that the vertical intercept for majors is much larger than the vertical intercept for nonmajors, so we would expect the estimate of $\beta_2$ to be positive.

  8. Based on the scatterplot, we would expect our estimated value of $\beta_4$ to be:

(a) Positive. (b) Negative. (c) Larger for economics majors than nonmajors. (d) Larger for nonmajors than economics majors.

(b) $\beta_4$ is the difference between majors and nonmajors in the change in exam score from an extra hour of studying. On the graph, this is the difference in the slopes of the lines passing through the majors data points and the nonmajors data points.

From the graph, it is clear that the slope is flatter for the majors than for the nonmajors, so we would expect the estimate of $\beta_4$ to be negative.
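The figure questions all follow from the fact that the interaction model gives each group its own intercept and slope: nonmajors get $(b_1, b_3)$ and majors get $(b_1 + b_2, b_3 + b_4)$. The sketch below uses hypothetical coefficient values chosen only to match the signs discussed above ($b_2 > 0$, $b_4 < 0$); they are not estimates from the figure, and the function names are illustrative.

```python
# Hypothetical estimates (b1, b2, b3, b4) for
# SCORE = b1 + b2*MAJOR + b3*HOURS + b4*MAJOR*HOURS, with b2 > 0 and b4 < 0.
B = (40.0, 30.0, 4.0, -2.0)


def predicted_score(hours, major, b=B):
    b1, b2, b3, b4 = b
    return b1 + b2 * major + b3 * hours + b4 * major * hours


def group_line(major, b=B):
    """Return (intercept, slope in HOURS) for nonmajors (major=0) or majors (major=1)."""
    b1, b2, b3, b4 = b
    return (b1 + b2 * major, b3 + b4 * major)


def crossing_hours(b=B):
    """Hours at which the two fitted lines meet: b2 + b4*HOURS = 0."""
    _, b2, _, b4 = b
    return -b2 / b4


for hours in (5, 15, 20):
    print(hours, predicted_score(hours, 1), predicted_score(hours, 0))
```

With these made-up values the lines cross at 15 hours, so a major is predicted to outscore a nonmajor at 10 hours but not at 20. Which case applies depends on where the actual crossing point falls, which is the reasoning behind answer (d) to the predicted-score question. The per-hour gain for majors is $b_3 + b_4$, the slope returned by `group_line(1)`.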

  9. The predicted score for an economics major who studies ten hours will be: (a) Greater than the predicted score for a nonmajor who studies ten hours. (b) Less than the predicted score for a nonmajor who studies ten hours. (c) Equal to the predicted score for a nonmajor who studies ten hours. (d) Not enough information.

(d) Notice that if you were to draw a line through the major data points and a line through the nonmajor data points, the lines would intersect at a positive number of hours of studying. To the left of this point, the predicted score for majors would be greater than the predicted score for nonmajors. To the right of this point, the predicted score for majors would be less than the predicted score for nonmajors. We would need to know whether ten hours is to the left or the right of this point to be able to answer the question.

  10. The predicted increase in score for an economics major associated with one extra hour of studying will be: (a) Equal to $b_3$, where $b_3$ is our estimated value of $\beta_3$. (b) Equal to $b_3 + b_4$, where $b_3$ and $b_4$ are our estimated values of $\beta_3$ and $\beta_4$, respectively. (c) Equal to $b_3 + b_4 \cdot HOURS$, where $b_3$ and $b_4$ are our estimated values of $\beta_3$ and $\beta_4$, respectively. (d) Equal to $b_4$, where $b_4$ is our estimated value of $\beta_4$.

(b) For every extra hour of studying, the score for an economics major goes up by $b_3$ (the component of the return to studying common to both majors and nonmajors) and by $b_4$ (the additional return to an hour of studying for an economics major).

  11. Which of the following would lead to a smaller standard error for the slope coefficient in a bivariate regression? (a) A smaller variance of the independent variable. (b) A smaller average size of the residuals. (c) A larger error sum of squares. (d) A smaller sample size.

(b) A smaller average size of the residuals means a smaller standard error of the regression, which implies a smaller standard error for the slope coefficient (the standard error of the slope coefficient is proportional to the standard error of the regression).

  12. Which of the following would make us less likely to reject the null hypothesis that the true population mean is 150? (a) Getting a sample mean that is farther from 150. (b) Getting a larger t statistic. (c) Switching to a larger value for the significance level α. (d) None of the above.
  1. Which of the following is not an assumption we make when doing bivariate statistical inference? (a) The expected value of the error term is equal to zero. (b) The errors are uncorrelated with the regressor. (c) The errors are uncorrelated with the dependent variable. (d) We assume all of the above.

(c) The errors will always be correlated with the dependent variable (if the error term is larger by one unit, Y will be larger by one unit).

  2. Suppose we are testing whether high school GPA and college GPA are jointly significant in a regression with log wage as the dependent variable. Which of the following statements is true? (a) We will reject the null hypothesis that both coefficients are equal to zero at a 5% significance level if and only if both of the variables are individually significant at the 5% significance level. (b) We will reject the null hypothesis that both coefficients are equal to zero at a 5% significance level if and only if at least one of the two variables is individually significant at the 5% significance level. (c) We will reject the null hypothesis that both coefficients are equal to zero at a 5% significance level if one of the two variables is individually significant at the 5% significance level. (d) None of the above.

(c) If we can show that one of the variables is significant at a 5% significance level, then we can clearly say that at least one of the two variables has a coefficient different from zero (in other words, the two variables are jointly significant). However, it is not necessary that at least one of the variables be individually significant for the variables to be jointly significant. Particularly in the case of highly correlated variables (such as high school and college GPA), it is possible that the two variables could be jointly significant while neither variable is individually significant.

  3. Suppose that we run a regression with the log of pounds of rice purchased from a store as the dependent variable and the log of the price of a pound of rice as the independent variable. A slope coefficient of 0.3 would be interpreted as: (a) A one dollar increase in the price of a pound of rice is associated with a 0.3 pound increase in the amount of rice purchased. (b) A one percent increase in the price of a pound of rice is associated with a 30 percent increase in the amount of rice purchased. (c) A one percent increase in the price of a pound of rice is associated with a 0.3 percent increase in the amount of rice purchased. (d) A one dollar increase in the price of a pound of rice is associated with a 30 percent increase in the amount of rice purchased.

(c) Since both variables are in logs, the slope coefficient can be interpreted as the percent change in the dependent variable associated with a one percent change in the independent variable.
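In a log-log model the slope is an elasticity. This minimal sketch (the function name is hypothetical) computes the exact percent change in quantity implied by $\ln Q = b_1 + 0.3 \ln P$ when price rises by 1%, and confirms that it is very close to 0.3 percent.

```python
def pct_change_in_quantity(slope, pct_change_in_price):
    """Exact percent change in Q implied by ln(Q) = b1 + slope*ln(P)."""
    price_ratio = 1 + pct_change_in_price / 100
    quantity_ratio = price_ratio ** slope       # Q1/Q0 = (P1/P0)^slope
    return (quantity_ratio - 1) * 100


print(round(pct_change_in_quantity(0.3, 1.0), 4))
```

For larger price changes the exact figure and the "0.3 times the percent change" reading start to diverge, so the elasticity interpretation is a small-change approximation.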
  1. Suppose that the slope coefficient for a particular regressor Xj has a p-value of 0.02. We would conclude that the coefficient is: (a) Economically significant at a 5% significance level. (b) Economically significant at a 10% significance level. (c) Statistically significant at a 10% significance level. (d) All of the above.

(c) Since the p-value is smaller than 0.10, we would conclude that the coefficient is statistically significant at a 10% significance level. Without knowing the magnitude of the coefficient and what the coefficient is measuring, we have no way of saying whether the coefficient is economically significant.

  1. Adding an irrelevant variable to a regression will tend to lower:

(a) The R^2. (b) The adjusted R^2. (c) The standard errors of the coefficients for the other variables. (d) The total sum of squares.

(b) The R^2 will not decrease (it will stay about the same, or rise only slightly, since the new variable explains essentially none of the variation in the dependent variable). The adjusted R^2 will tend to decrease since it takes into account not only the R^2 but also the number of variables used. The standard errors of the other coefficients will tend to increase when you add an irrelevant variable to the regression.
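The adjusted R^2 formula from the formula sheet, rewritten with ESS/TSS = 1 - R^2, makes the trade-off explicit. The numbers below are hypothetical: adding an irrelevant regressor nudges R^2 up from 0.500 to 0.502 in a sample of 30, yet adjusted R^2 falls because k rises from 3 to 4.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n - 1)/(n - k) * (1 - R^2); k counts all coefficients, intercept included."""
    return 1 - (n - 1) / (n - k) * (1 - r2)


before = adjusted_r2(0.500, n=30, k=3)
after = adjusted_r2(0.502, n=30, k=4)   # hypothetical: one irrelevant variable added
print(round(before, 4), round(after, 4))
```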

  1. If X and Y are perfectly correlated, when we regress Y on X we would get:

(a) A slope coefficient equal to one. (b) A slope coefficient equal to either one or negative one. (c) An R^2 equal to one. (d) An intercept equal to zero.

(c) Knowing that X and Y are perfectly correlated tells us that the data points will all lie along a straight line, giving us an R^2 value of one. However, it does not tell us what the slope of that line is or whether it has a nonzero intercept.
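Two perfectly correlated toy datasets (hypothetical) make the point: both give $|r_{xy}| = 1$ and $R^2 = 1$, but the slopes and intercepts differ, and one slope is negative. The helper name `fit_line` is illustrative; it uses the formula-sheet expressions for $b_1$, $b_2$ and $r_{xy}$.

```python
import math


def fit_line(x, y):
    """Return (b1, b2, r_xy, R^2) for a bivariate regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    b2 = sxy / sxx
    b1 = y_bar - b2 * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return b1, b2, r, r * r          # in a bivariate regression, R^2 = r_xy^2


x = [0, 1, 2, 3, 4]
print(fit_line(x, [3 + 2 * xi for xi in x]))       # perfect positive correlation
print(fit_line(x, [-1 - 0.5 * xi for xi in x]))    # perfect negative correlation
```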

  1. The distribution of the sample mean of X will:

(a) Be centered at zero. (b) Be centered at the population mean of X. (c) Have a variance equal to the variance of X. (d) Both (b) and (c).

(b) The distribution of the sample mean will always be centered at the true population mean of X. The variance of the distribution of the sample mean equals the variance of X divided by the sample size, so it depends on both the variance of X and the sample size.
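A quick simulation illustrates this. The population is assumed to be normal with mean 50 and standard deviation 10 (hypothetical values), and each sample has n = 25 draws; the sample means should be centered near 50 with variance near 10^2/25 = 4, far below the population variance of 100.

```python
import random
import statistics


def simulate_sample_means(mu=50.0, sigma=10.0, n=25, reps=4000, seed=1):
    """Draw `reps` samples of size n from N(mu, sigma^2) and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]


means = simulate_sample_means()
print(statistics.fmean(means), statistics.variance(means))
```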

  1. Suppose that the true relationship between Y and X is given by:

$Y = \beta_1 + \beta_2 X + \varepsilon$

SECTION II: SHORT ANSWER (40 points)

  1. (18 points) For each scenario below, a researcher is attempting to estimate a slope coefficient of particular interest. Explain whether the researcher will get an unbiased estimate of the slope coefficient and, if not, what direction the bias will be in. Note that there may be multiple correct answers for each question. If you can properly justify your answer, you will receive full credit.

(a) A researcher wants to estimate the effect of winter on happiness for the population of the United States. To do this, the researcher asks 1,000 people from Southern California how happy they are on a scale of 0 to 100 (with 100 being the happiest). The researcher creates a dummy variable that equals one if a person was asked this question in the winter months and zero otherwise. To estimate the effect of winter on happiness, the researcher regresses the happiness number on this dummy variable for winter.

There is a potential problem with sample selection bias in this scenario. The researcher wants to study the relationship between winter and happiness for the entire US population. The relationship between winter and happiness for Southern Californians is likely very different from the relationship for people in other parts of the country, particularly parts of the country that have more severe winter weather. If you think that the cold and snow associated with winter decrease people's happiness, you would expect a negative coefficient on the winter dummy, and you would expect the magnitude of the effect to be larger for people from colder parts of the country. So using a sample of Southern Californians, who experience mild winters, will lead to an underestimate of the magnitude of the effect of winter on happiness. Since the true coefficient is negative, underestimating its magnitude means the estimate is too close to zero, which implies an upward bias.

(b) A high school wants to know if assigning more reading leads to higher test scores. To determine this, the school takes a random sample of one hundred students and regresses their test scores on the number of pages they were assigned to read. Teachers who assign more reading also tend to spend more time preparing their lectures and answering their students' questions.

There is an omitted variable bias resulting from not controlling for teacher quality. The coefficient on reading will pick up the direct effect of extra reading on test scores but also the indirect effect of more reading being associated with better teachers and better teachers being associated with higher test scores. Since the correlation between assigned reading and teacher quality is positive and the correlation between teacher quality and test scores is likely to be positive, the overall sign of the bias will be positive. So we have an upward bias on the assigned reading coefficient.

(c) A student wants to know the effect of hours of exercise per week on her weekly quiz scores. To do this, the student looked up all of her quiz scores for the past year and tried to remember how many hours she exercised each week for the past year. She knows the exact quiz scores but can only remember roughly (not exactly) how many hours she exercised each week. To determine the effect of exercise on quiz performance, she regresses quiz score on hours of exercise per week. She uses 52 data points for this regression (one for every week in the past year).

Here the problem is the measurement of hours of exercise. Since the student does not remember the exact value for each week, this variable will be measured with some random error. Random measurement error in the independent variable will bias the coefficient on that variable toward zero. If the effect of exercise on quiz score is positive, this would mean there is a downward bias. If the effect of exercise on quiz score is negative, it would mean there is an upward bias.
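The sign reasoning in (b) and (c) can be written as two small helper functions. Both are sketches built on textbook results stated here as assumptions: for an omitted variable Z, the bias on X's coefficient is (effect of Z on Y) × (slope from regressing Z on X), so its sign is the product of the two signs; for classical random measurement error in X, the slope converges to λβ with λ = var(X*)/(var(X*) + var(u)), where X* is the true regressor and u the error. The function names and numeric inputs are hypothetical.

```python
def omitted_variable_bias_direction(effect_of_z_on_y, corr_x_z):
    """Direction of bias on X's coefficient when Z is omitted (sign of the product)."""
    product = effect_of_z_on_y * corr_x_z
    if product > 0:
        return "upward"
    if product < 0:
        return "downward"
    return "none"


def attenuated_slope(beta, var_true_x, var_error):
    """Probability limit of the OLS slope under classical measurement error in X."""
    reliability = var_true_x / (var_true_x + var_error)   # lambda, between 0 and 1
    return reliability * beta


# (b): teacher quality raises scores (+) and is positively correlated with reading (+).
print(omitted_variable_bias_direction(+1.0, +0.5))
# (c): with a hypothetical negative true effect, attenuation pulls the slope up toward zero.
print(attenuated_slope(-2.0, var_true_x=4.0, var_error=1.0))
```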

$\beta_1$ is the electricity used by a household with no income ($I = 0$) at a temperature of zero degrees Celsius ($T_c = 0$). This certainly will not be a negative number (the household cannot use negative amounts of electricity). Most likely, it will be a positive number since even if a family doesn't have a current source of income, they will likely still need to use some electricity.

The signs of $\beta_2$ and $\beta_3$ need to give us the U-shape originally described in the model. Since the U-shape opens upward, $\beta_3$ must be positive. Another way to see this is that the slope of the curve is equal to $\beta_2 + 2\beta_3 T_c$, and the slope gets more positive as $T_c$ gets larger, so $\beta_3$ must be positive.

Note that the minimum of this curve will likely occur at a temperature well above zero degrees, since usage starts rising again because of air conditioner use. The minimum of the curve is where the slope is zero, implying that $\beta_2 = -2\beta_3 T_c$. Since the minimum occurs at a positive $T_c$ and we just determined that $\beta_3$ is positive, this implies that $\beta_2$ should be negative.

Finally, we are told in part (b) that higher income families use more electricity, so $\beta_4$ should be positive.
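The sign argument can be checked numerically. This sketch assumes the model is quadratic in temperature with an income term, $E = \beta_1 + \beta_2 T_c + \beta_3 T_c^2 + \beta_4 I + \varepsilon$, which is consistent with the slope $\beta_2 + 2\beta_3 T_c$ used above; the coefficient values and function names are hypothetical, picked only to have the signs derived in the answer.

```python
# Hypothetical coefficients (beta1, beta2, beta3, beta4) with the derived signs:
# beta1 > 0, beta2 < 0, beta3 > 0, beta4 > 0.
BETA = (20.0, -1.0, 0.05, 0.3)


def electricity_use(tc, income, b=BETA):
    b1, b2, b3, b4 = b
    return b1 + b2 * tc + b3 * tc ** 2 + b4 * income


def slope_in_temperature(tc, b=BETA):
    """Derivative with respect to Tc: beta2 + 2*beta3*Tc."""
    return b[1] + 2 * b[2] * tc


def temperature_at_minimum(b=BETA):
    """Set the slope to zero: Tc* = -beta2 / (2*beta3)."""
    return -b[1] / (2 * b[2])


tc_star = temperature_at_minimum()
print(tc_star, slope_in_temperature(tc_star))
```

With $\beta_2 < 0$ and $\beta_3 > 0$, the minimum $-\beta_2/(2\beta_3)$ is automatically at a positive temperature, matching the argument above.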

SUMMARY OUTPUT: API score as dependent variable

Regression Statistics
Multiple R            0.…
R Square              0.…
Adjusted R Square     0.…
Standard Error        76.…
Observations          6426

             Coefficients  Standard Error  t Stat   P-value    Lower 95%  Upper 95%
Intercept    688.49        5.93            116.09   0          676.86     700.…
CLASSSIZE    3.91          0.21            18.77    1.41E-76   3.50       4.…
YEARROUND    -21.82        4.90            -4.46    8.5E-06    -31.43     -12.…
NONHSGRAD    -88.76        3.56            -24.94   4.1E-131   -95.73     -81.…

SUMMARY OUTPUT: API score as dependent variable

Regression Statistics
Multiple R            0.…
R Square              0.…
Adjusted R Square     0.…
Standard Error        80.…
Observations          6426

             Coefficients  Standard Error  t Stat   P-value    Lower 95%  Upper 95%
Intercept    671.78        6.19            108.53   0          659.65     683.…
CLASSSIZE    4.22          0.22            19.33    5.42E-81   3.80       4.…

SUMMARY OUTPUT: API score as dependent variable

Regression Statistics
Multiple R            0.…
R Square              0.…
Adjusted R Square     0.…
Standard Error        79.…
Observations          6426

             Coefficients  Standard Error  t Stat   P-value    Lower 95%  Upper 95%
Intercept    798.17        1.04            765.51   0          796.13     800.…
YEARROUND    -23.39        5.03            -4.65    3.38E-06   -33.25     -13.…
NONHSGRAD    -92.40        3.65            -25.32   7.2E-135   -99.55     -85.…

Summary Statistics for the Regression Sample

Variable     Mean      Minimum  Maximum  Standard Deviation
API          789.854   310      998      83.…
CLASSSIZE    27.949    1        50       4.…
YEARROUND    0.040     0        1        0.…
NONHSGRAD    0.080     0        1        0.…
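As a consistency check on the first regression table, each t Stat should be roughly the coefficient divided by its standard error, and each 95% interval roughly the coefficient plus or minus 1.96 standard errors (with n - k = 6422 degrees of freedom, the t critical value is essentially the normal one, which this sketch assumes). Small discrepancies, such as 18.62 versus 18.77 for CLASSSIZE, come from the rounding of the displayed standard errors. The names `TABLE_1` and `t_and_ci` are illustrative.

```python
from statistics import NormalDist

# (coefficient, standard error) as displayed in the first regression table
TABLE_1 = {
    "Intercept": (688.49, 5.93),
    "CLASSSIZE": (3.91, 0.21),
    "YEARROUND": (-21.82, 4.90),
    "NONHSGRAD": (-88.76, 3.56),
}

Z_975 = NormalDist().inv_cdf(0.975)   # about 1.96; large-sample stand-in for t_{0.025, 6422}


def t_and_ci(coef, se, crit=Z_975):
    """Return the t statistic for H0: beta = 0 and an approximate 95% confidence interval."""
    return coef / se, (coef - crit * se, coef + crit * se)


for name, (coef, se) in TABLE_1.items():
    t_stat, (lo, hi) = t_and_ci(coef, se)
    print(f"{name:10s} t = {t_stat:8.2f}  95% CI = ({lo:8.2f}, {hi:8.2f})")
```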

API scores is:

$\widehat{API}_A - \widehat{API}_B = 783.97 - 677.93 = 106.04$

So the predicted API score in school district A is 106.04 points higher than the predicted API score in district B.

(b) Suppose you wanted to use an F test to determine whether the YEARROUND and NONHSGRAD variables are jointly significant in the regression that includes CLASSSIZE, YEARROUND and NONHSGRAD. In other words, you want to test the following set of hypotheses:

$H_0: \beta_{YEARROUND} = \beta_{NONHSGRAD} = 0$
$H_a$: at least one of $\beta_{YEARROUND}$ and $\beta_{NONHSGRAD}$ is not equal to zero

Calculate the F statistic you would use to do this test. You should calculate an exact numerical value for the F statistic.

We are testing the joint significance of YEARROUND and NONHSGRAD in the regression that contains all three variables. So our unrestricted model is:

$API = \beta_1 + \beta_2 CLASSSIZE + \beta_3 YEARROUND + \beta_4 NONHSGRAD + \varepsilon$

Notice that there are four coefficients in this equation (including the intercept), so k is equal to four. Our restricted model excludes the variables whose joint significance we are testing:

$API = \beta_1 + \beta_2 CLASSSIZE + \varepsilon$

Notice that there are only two coefficients left in the equation, so g is equal to two. The regression results for the unrestricted model are given in the first set of results above. The regression results for the restricted model are given in the second set of results. We do not need the third set of regression results for this particular question. Given the regression results, the calculation of the test statistic is straightforward:

$F^* = \frac{n-k}{k-g}\cdot\frac{R^2_u - R^2_r}{1 - R^2_u}$

$F^* = \frac{6426 - 4}{4 - 2}\cdot\frac{R^2_u - R^2_r}{1 - R^2_u}$

$F^* = 333.85$
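A minimal sketch of the two equivalent forms of the F statistic from the formula sheet. Because the R^2 values in the tables above are truncated, the agreement check uses hypothetical values (n = 100, R^2_u = 0.5, R^2_r = 0.4, TSS = 1000) rather than the exam data; for the exam itself, only the multiplier (n - k)/(k - g) = 6422/2 = 3211 can be recomputed from what is shown. Since k - g = 2 here, the upper-tail probability FDIST(F*, 2, n - k) has the closed form (1 + 2F*/(n - k))^(-(n - k)/2), which the sketch also includes; that special case is an assumption of this sketch and does not hold for other numerator degrees of freedom. Function names are illustrative.

```python
def f_from_r2(r2_u, r2_r, n, k, g):
    """F* = (n - k)/(k - g) * (R2_u - R2_r)/(1 - R2_u)."""
    return (n - k) / (k - g) * (r2_u - r2_r) / (1 - r2_u)


def f_from_ess(ess_u, ess_r, n, k, g):
    """F* = (n - k)/(k - g) * (ESS_r - ESS_u)/ESS_u, with ESS the error sum of squares."""
    return (n - k) / (k - g) * (ess_r - ess_u) / ess_u


def f_upper_tail_two_restrictions(f_stat, df_denom):
    """Pr(F_{2, df_denom} > f_stat); closed form valid only when k - g = 2."""
    return (1 + 2 * f_stat / df_denom) ** (-df_denom / 2)


# Hypothetical check that the two forms agree (TSS = 1000, so ESS = 1000 * (1 - R2)).
print(f_from_r2(0.5, 0.4, n=100, k=4, g=2), f_from_ess(500.0, 600.0, n=100, k=4, g=2))
# Exam values: n = 6426, k = 4, g = 2, reported F* = 333.85.
print((6426 - 4) / (4 - 2), f_upper_tail_two_restrictions(333.85, 6426 - 4))
```

Under these assumptions the p-value for the reported F* = 333.85 is vanishingly small, which suggests YEARROUND and NONHSGRAD are jointly significant at any conventional level.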