Statistics 512: Review Problems for First Midterm Exam
Keep for First Exam Review (October 6)
1. Short answer questions. Unless stated otherwise, each part is unrelated.
(a) A polynomial regression model $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon$ fit to a set of data gives
$b_0 = 2$, $b_1 = 4$, and $b_2 = 3$. Find the predicted value of the response variable when the
explanatory variable is equal to 4.
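As a quick check of the arithmetic in part (a), the sketch below plugs $X = 4$ into the fitted quadratic. The variable names are illustrative only.

```python
# Fitted coefficients from part (a)
b0, b1, b2 = 2, 4, 3
x = 4

# Predicted response: b0 + b1*x + b2*x^2
y_hat = b0 + b1 * x + b2 * x ** 2
print(y_hat)  # 66
```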
(b) The MSE for a multiple regression is 40. What is the estimate of the standard deviation
of the error term in the model?
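The error standard deviation $\sigma$ is estimated by the square root of the mean squared error. A minimal sketch:

```python
import math

mse = 40.0
# The error standard deviation is estimated by the root MSE
s = math.sqrt(mse)
print(round(s, 4))  # 6.3246
```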
(c) For a simple linear regression, the estimate of the slope is 7 with a standard error of 2.
Give an estimate of the change in the response variable that you would expect if the
explanatory variable increased by 4.
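Because the slope is the expected change in $Y$ per one-unit increase in $X$, a change of $\Delta X = 4$ scales the slope estimate (and its standard error, which part (d) needs) by 4. A worked sketch:

```latex
\widehat{\Delta Y} = b_1\,\Delta X = 7 \times 4 = 28,
\qquad
s\{4 b_1\} = 4\,s\{b_1\} = 4 \times 2 = 8
```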
(d) Refer to the previous problem. Assume that the sample size is 36. Give a 95% confidence
interval for your estimate.
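A sketch of the part (d) interval, assuming the usual simple linear regression setup with $n - 2 = 34$ error degrees of freedom. To stay within the Python standard library, the $t$ critical value is found by numerically integrating the $t$ density (Simpson's rule) and bisecting; in practice one would read it from a table or call a statistics package. The helpers `t_pdf`, `t_cdf`, and `t_quantile` are hypothetical names written for this example.

```python
import math


def t_pdf(x, df):
    """Student t density; lgamma keeps the constant stable for larger df."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, n=2000):
    """CDF as 0.5 + integral of the density from 0 to x (Simpson's rule, n even)."""
    h = x / n
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Bisection for the t value whose CDF equals p (assumes 0.5 < p < 1)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n = 36
b1, se_b1, delta_x = 7.0, 2.0, 4.0
df = n - 2                       # error df in simple linear regression

estimate = delta_x * b1          # 28
se = delta_x * se_b1             # 8
t_crit = t_quantile(0.975, df)   # about 2.032
lower, upper = estimate - t_crit * se, estimate + t_crit * se
print(round(lower, 2), round(upper, 2))  # 11.74 44.26
```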
(e) A multiple regression is run with 40 cases and eight explanatory variables. Give the
degrees of freedom for the F-statistic that tests the null hypothesis that the coefficients
for the first three explanatory variables are equal to zero.
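The general linear test compares the full model with the reduced model obtained by setting the tested coefficients to zero: the numerator df is the number of restrictions, and the denominator df is the error df of the full model. A minimal sketch for part (e):

```python
n = 40  # number of cases
k = 8   # explanatory variables in the full model
q = 3   # coefficients set to zero under H0

# The full model has k + 1 parameters (including the intercept)
df_numerator = q
df_denominator = n - (k + 1)
print(df_numerator, df_denominator)  # 3 31
```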
(f) In a simple linear regression, the following two intervals are computed at X = 5: (20, 30) and
(15, 35). Which is the prediction interval and which is the confidence interval for the
mean response? Explain your answer.
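For reference, the standard simple linear regression variance estimates at $X_h$ show why one interval must be wider: the prediction variance adds $MSE$ for the scatter of a new observation around the mean response. Both intervals in part (f) are centered at the same fitted value, 25.

```latex
s^2\{\hat{Y}_h\} = MSE\left[\frac{1}{n} + \frac{(X_h - \bar{X})^2}{\sum_i (X_i - \bar{X})^2}\right],
\qquad
s^2\{\mathrm{pred}\} = MSE + s^2\{\hat{Y}_h\} > s^2\{\hat{Y}_h\}
```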
(g) The correlation between two variables U and V is -0.5. What percent of the variation
in U can be explained by V using a simple linear regression?
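In simple linear regression, $R^2$ equals the squared sample correlation, so the sign of $r$ does not matter. A minimal sketch:

```python
r = -0.5
# R^2 = r^2 in simple linear regression
r_squared = r ** 2
print(f"{r_squared:.0%}")  # 25%
```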
(h) There are numerous ways to estimate a regression line. Describe the method of least
squares (with a picture and/or words) and explain the effect of this approach on the
sum of squares error, SSE.
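A small sketch of least squares for simple linear regression: the closed-form estimates $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{Y} - b_1\bar{X}$ minimize SSE, so nudging either coefficient away from them can only increase SSE. The data below are hypothetical toy values, and the helper names are illustrative.

```python
def least_squares(xs, ys):
    """Closed-form least squares estimates (b0, b1) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(b0, b1, xs, ys):
    """Error sum of squares: sum of squared vertical distances to the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares(xs, ys)
best = sse(b0, b1, xs, ys)

# Moving either coefficient away from the least squares values increases SSE
for db0, db1 in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sse(b0 + db0, b1 + db1, xs, ys) > best
print(round(b0, 2), round(b1, 2))  # 0.05 1.99
```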
(i) Suppose the estimated regression equation is $\hat{Y} = 3 + 5X$. Give the estimated regression
function if the variable $U = (X - 4)/10$ were used in place of $X$.
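Solving the definition of $U$ for $X$ and substituting into the fitted equation gives the following worked sketch of the algebra:

```latex
U = \frac{X - 4}{10} \;\Longrightarrow\; X = 10U + 4,
\qquad
\hat{Y} = 3 + 5(10U + 4) = 23 + 50U
```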
(j) Explain how a 98% confidence interval for the slope can be used to test $H_0: \beta_1 = 5$,
and state the significance level $\alpha$ of this test.
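A minimal sketch of the duality between a two-sided confidence interval and a two-sided test: reject $H_0: \beta_1 = \beta_{10}$ exactly when $\beta_{10}$ falls outside the interval, at level $\alpha = 1 - \text{confidence}$. The function name `ci_test` and the example intervals are hypothetical.

```python
def ci_test(lower, upper, beta_10, confidence=0.98):
    """Two-sided test of H0: beta_1 = beta_10 using a confidence interval.

    Reject H0 exactly when beta_10 lies outside [lower, upper]; the test
    has significance level alpha = 1 - confidence.
    """
    alpha = 1 - confidence
    reject = not (lower <= beta_10 <= upper)
    return reject, alpha


# Hypothetical 98% intervals for the slope, for illustration only
print(ci_test(5.6, 8.4, 5))  # 5 lies outside the interval: reject H0
print(ci_test(3.1, 6.9, 5))  # 5 lies inside the interval: do not reject H0
```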
(k) Rob Poorman Auto Sales has decided to use $R^2$ to select the best model in predicting
car demand. Explain when this is and when this is not a reasonable approach.
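Because $R^2 = 1 - SSE/SSTO$ can never decrease when a variable is added, picking the largest $R^2$ always favors the largest candidate model; comparing models with the same number of parameters avoids this. One standard alternative, adjusted $R^2$, divides each sum of squares by its degrees of freedom. The sketch below applies both to the two models in the Problem 2 output (taking $SSTO = SSR + SSE$ from the first model); the helper name is illustrative.

```python
def r2_and_adjusted(sse, ssto, n, p):
    """R^2 and adjusted R^2 for a model with p parameters (including the intercept)."""
    r2 = 1 - sse / ssto
    adj = 1 - (sse / (n - p)) / (ssto / (n - 1))
    return r2, adj


# Sums of squares from the Problem 2 output (n = 78)
ssto = 136.31881 + 203.10809
r2_1, adj_1 = r2_and_adjusted(203.10809, ssto, 78, 2)  # IQ only
r2_2, adj_2 = r2_and_adjusted(184.00205, ssto, 78, 4)  # IQ, GENDER, IQGEN
print(round(r2_1, 4), round(adj_1, 4))
print(round(r2_2, 4), round(adj_2, 4))
```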
2. Refer to the SAS output on the last pages (marked OUTPUT FOR PROBLEM 2). The
data are from a study of 78 seventh grade students. The goal is to predict GRADE (average
school grade on a scale of 0 to 11) from variables which include IQ (score on an I.Q. test)
and GENDER (0 = female, 1 = male).
(a) Using the output for the simple linear regression, does there appear to be a linear
relationship between GRADE and IQ? Give a test statistic with degrees of freedom and
p-value to support your answer (you may use other evidence as well).
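A quick consistency check on the simple linear regression output: the slope t statistic is the estimate divided by its standard error, and with a single predictor the ANOVA F statistic equals $t^2$ up to rounding. $R^2$ is $SSR/SSTO$. The values are copied from the Problem 2 SAS output.

```python
# Simple linear regression of GRADE on IQ (values from the Problem 2 output)
b1, se_b1 = 0.10102, 0.01414
ssr, sse = 136.31881, 203.10809

t_stat = b1 / se_b1            # t for H0: beta_1 = 0, with 76 df
r_squared = ssr / (ssr + sse)  # proportion of variation in GRADE explained

# With one predictor, the ANOVA F statistic equals t^2 (up to rounding)
print(round(t_stat, 2), round(t_stat ** 2, 2), round(r_squared, 4))
```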
(b) Individual 51 has GRADE = 0.53 and IQ = 103. What value of GRADE is predicted
for this individual by the estimated simple linear regression model? The studentized
residual (residual divided by its standard error) for this individual is equal to -3..
On that basis, do you consider this observation to be an outlier? Explain.
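The predicted value and raw residual for individual 51 follow directly from the fitted simple linear regression; the studentized residual then rescales this raw residual by its estimated standard error. A minimal sketch using the Problem 2 output:

```python
b0, b1 = -3.55706, 0.10102    # fitted intercept and slope (Problem 2 output)
iq, grade = 103, 0.53         # individual 51

grade_hat = b0 + b1 * iq      # predicted GRADE
residual = grade - grade_hat  # observed minus predicted
print(round(grade_hat, 3), round(residual, 3))  # 6.848 -6.318
```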
(c) The variable IQGEN is the product of IQ and GENDER. Examine the output for the
model involving the three explanatory variables IQ, GENDER, and IQGEN. Write down the estimated regression equation for
this model. Also write down the two separate fitted lines for female and male students.
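Because GENDER is a 0/1 indicator, setting GENDER = 0 removes both the GENDER and IQGEN terms, while GENDER = 1 makes IQGEN equal to IQ, so both the intercept and the slope shift. A sketch of that bookkeeping, with coefficients copied from the Problem 2 output:

```python
# Coefficients of the interaction model (Problem 2 output)
b0, b_iq, b_gender, b_iqgen = -2.25235, 0.09400, -3.84266, 0.02656

# Female (GENDER = 0): the GENDER and IQGEN terms drop out
female = (b0, b_iq)
# Male (GENDER = 1): IQGEN = IQ, so intercept and slope both shift
male = (b0 + b_gender, b_iq + b_iqgen)

for label, (a, b) in [("female", female), ("male", male)]:
    print(f"{label}: predicted GRADE = {a:.5f} + {b:.5f} IQ")
```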
(d) Examine the results of the t-tests for the coefficients of IQ, GENDER, and IQGEN as well as the result
of the (general linear) F-test labeled “SAMELINE”. The results of this general linear
test were produced with the SAS input line “test gender, iqgen;”.
State the null hypothesis tested by each of these four tests and whether that hypothesis
is rejected. What apparent conflict do you see between the results of these tests?
Explain why such a conflict might arise and suggest one possible action that might be
used to eliminate this conflict.
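The SAMELINE F statistic can be reproduced from the two models' error sums of squares in the Problem 2 output: the numerator is the extra sum of squares divided by the number of restrictions, and the denominator is the full-model MSE. For a numerator df of exactly 2, the upper-tail probability of the F distribution has the closed form $(1 + 2f/\nu)^{-\nu/2}$, which the sketch uses to avoid a statistics library; for other numerator df this shortcut does not apply.

```python
# Extra-sum-of-squares (general linear) F test of H0: beta_gender = beta_iqgen = 0
sse_reduced, df_reduced = 203.10809, 76  # IQ only
sse_full, df_full = 184.00205, 74        # IQ, GENDER, IQGEN

num_df = df_reduced - df_full            # 2 restrictions
numerator_ms = (sse_reduced - sse_full) / num_df
mse_full = sse_full / df_full
f_stat = numerator_ms / mse_full

# Upper tail of F(2, v) has the closed form (1 + 2f/v)^(-v/2)
p_value = (1 + 2 * f_stat / df_full) ** (-df_full / 2)
print(round(numerator_ms, 5), round(f_stat, 2), round(p_value, 3))
```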
OUTPUT FOR PROBLEM 2
The REG Procedure
Model: MODEL
Dependent Variable: grade

Analysis of Variance
                          Sum of         Mean
Source            DF     Squares       Square    F Value    Pr > F
Model              1   136.31881    136.31881      51.01    <.…
Error             76   203.10809    2.…
Corrected Total   77   339.…

Root MSE          1.63477    R-Square    0.…
Dependent Mean    7.44654    Adj R-Sq    0.…
Coeff Var         21.…

Parameter Estimates
                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|   95% Confidence Limits
Intercept    1    -3.55706    1.55176     -2.29     0.0247   -6.64766   -0.…
iq           1     0.10102    0.01414      7.14     <.0001    0.07285    0.…
The REG Procedure
Model: MODEL
Dependent Variable: grade

Analysis of Variance
                          Sum of         Mean
Source            DF     Squares       Square    F Value    Pr > F
Model              3   155.42484     51.80828      20.84    <.…
Error             74   184.00205    2.…
Corrected Total   77   339.…

Root MSE          1.57687    R-Square    0.…
Dependent Mean    7.44654    Adj R-Sq    0.…
Coeff Var         21.…

Parameter Estimates
                 Parameter   Standard
Variable    DF    Estimate      Error   t Value   Pr > |t|
Intercept    1    -2.25235    2.15377     -1.05     0.…
iq           1     0.09400    0.02017      4.66     <.…
gender       1    -3.84266    3.03670     -1.27     0.…
iqgen        1     0.02656    0.02784      0.95     0.…
Test sameline Results for Dependent Variable grade

                       Mean
Source        DF     Square    F Value    Pr > F
Numerator      2    9.55302       3.84    0.…
Denominator   74    2.…