



The results of a regression analysis for various models with different independent variables (x1, x2, x3, x4, x5) to predict y. The analysis includes stem-and-leaf plots, regression equations, analysis of variance tables, and f-tests to determine the significance of each variable and compare different models. The document also discusses the importance of looking at data graphs before analysis and the use of confidence intervals.




Question #1:
(a) The least-squares estimate of the regression line when Y is regressed on $X_1$ is:
$\hat{Y} = -70.42020 + 227.09370\,X_1$
Based on the computer output on pages 1 and 2, we have $R^2 = 0.9194$ and $r_{YX_1} = 0.95884$. Therefore, $r^2_{YX_1} = (0.95884)^2 = 0.9194 = R^2$. Yes, they are equal, as they should be, since in the simple linear regression model $R^2 = r^2_{YX_1}$.
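The equality in part (a) can be checked numerically. The short Python sketch below squares the reported correlation and compares it with the reported $R^2$; the variable names are illustrative and are not part of the computer output.

```python
# Values reported in the computer output for part (a).
r_yx1 = 0.95884      # sample correlation between Y and X1
r_squared = 0.9194   # R^2 of the simple linear regression of Y on X1

# In simple linear regression, R^2 equals the squared correlation.
r_yx1_sq = r_yx1 ** 2

print(round(r_yx1_sq, 4))                    # squared correlation, 4 decimals
print(abs(r_yx1_sq - r_squared) < 5e-5)      # agreement at the reported precision
```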
(b) The least-squares estimate of the regression line when Y is regressed on $X_1$ and $X_2$ is: $\hat{Y} = -8.08481 + 68.25068\,X_1 + 2.29387\,X_2$
Based on the computer output on pages 3 and 4, we have $R^2 = 0.9664$, $r_{YX_1} = 0.95884$, and $r_{YX_2} = 0.97907$. Therefore, $r^2_{YX_1} = (0.95884)^2 = 0.9194 \neq R^2$ and $r^2_{YX_2} = (0.97907)^2 = 0.9586 \neq R^2$. No, they are not equal, and they should not be, since in the multiple linear regression model $R^2 \neq r^2_{YX_1}$ and $R^2 \neq r^2_{YX_2}$. We can use an F-test to compare the following two models:
Full Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + E$
Reduced Model: $Y = \beta_0 + \beta_1 X_1 + E$
The F-test statistic is
$TS = \dfrac{[\mathrm{SSR}(\text{Full}) - \mathrm{SSR}(\text{Reduced})]/1}{\mathrm{MS}_{Res}(\text{Full})}$
Since $TS = 11.22 > F_{0.05,1,8} = 5.32$, we reject $H_0: \beta_2 = 0$. Therefore, adding $X_2$ to the model is useful for predicting Y. We can also use a t-test for testing $H_0: \beta_2 = 0$. Based on the computer output on page 3, the t-statistic for testing $H_0: \beta_2 = 0$ is $TS = 3.35$, with p-value $= 0.0101 < \alpha = 0.05$; therefore, we reject $H_0$. (Note that $(3.35)^2 = 11.22$, the F-value above.)
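As a rough cross-check, the partial F statistic can be rewritten in terms of $R^2$, because SST cancels between numerator and denominator. The sketch below is an illustration, not the method used in the computer output: the function name `partial_f_from_r2` is hypothetical, and since the inputs are the rounded $R^2$ values from parts (a) and (b), the result only approximates the reported statistic.

```python
def partial_f_from_r2(r2_full, r2_reduced, num_restrictions, df_error_full):
    """Partial F statistic expressed through R^2.

    SSR = R^2 * SST and SSRes(Full) = (1 - R^2_full) * SST,
    so SST cancels in the ratio.
    """
    numerator = (r2_full - r2_reduced) / num_restrictions
    denominator = (1.0 - r2_full) / df_error_full
    return numerator / denominator

# Reported values: R^2 = 0.9664 (X1, X2), R^2 = 0.9194 (X1 only), error df = 8.
f_stat = partial_f_from_r2(0.9664, 0.9194, num_restrictions=1, df_error_full=8)
t_stat = 3.35  # t statistic for beta_2 from the output on page 3

print(round(f_stat, 2))       # near the reported 11.22; the R^2 inputs are rounded
print(round(t_stat ** 2, 2))  # t^2 matches F when a single coefficient is tested
```

The small gap between the two printed values comes from rounding in the reported $R^2$ figures.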
(c) The least-squares estimate of the regression line when Y is regressed on $X_1$, $X_2$, and $X_3$, together with its $R^2$, is:
$\hat{Y} = -1.87932 + 77.32578\,X_1 + 1.55910\,X_2 - 23.90378\,X_3, \qquad R^2 = 0.9769$
Based on parts (a)-(c), we have

Number   Model          R^2      Adjusted R^2   MSRes
1        X1             0.9194   0.9104         6.
2        X1, X2         0.9664   0.9580         2.
3        X1, X2, X3     0.9769   0.9670         2.
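If we use only $R^2$, we choose model number 3, which has the largest $R^2$. (If we look at $\mathrm{MS}_{Res}$, we also choose model 3, since it has the largest adjusted $R^2$. Note that
$R^2_{Adj} = 1 - \dfrac{n-1}{\mathrm{SST}}\,\mathrm{MS}_{Res}$.
Therefore, the criteria of minimum $\mathrm{MS}_{Res}$ and maximum adjusted $R^2$ are equivalent.)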
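The adjusted $R^2$ column can be reproduced from the $R^2$ column alone. The sketch below assumes $n = 11$ observations, inferred from the total degrees of freedom (10) in the analysis of variance table of part (e); the helper name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, num_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - num_predictors - 1)

n = 11  # assumption: total df = 10 in the ANOVA table of part (e)

# Model label -> (reported R^2, number of predictors)
models = {"X1": (0.9194, 1), "X1, X2": (0.9664, 2), "X1, X2, X3": (0.9769, 3)}

adjusted = {name: adjusted_r2(r2, n, p) for name, (r2, p) in models.items()}
for name, value in adjusted.items():
    print(name, round(value, 4))
```

This form follows from the identity above by substituting $\mathrm{MS}_{Res} = (1 - R^2)\,\mathrm{SST}/(n - p - 1)$.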
(d) Based on the computer output on page 6, we have

Plot   Dependent Variable (y)   Predicted Value (ŷ)   Residual (e = y − ŷ)
1      27.1000                  26.2837               0.
5      40.2000                  40.3619               -0.
7      55.5000                  53.3486               2.
Note that the predicted response values $\hat{y}$ can also provide a measure of model validity. Unrealistic predicted values, such as negative predictions of a positive quantity or predictions that fall outside the actual range of the response, indicate poorly estimated coefficients or an incorrect model form. In this case, there is no unusual predicted value.
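The residual column is simply $e = y - \hat{y}$, so it can be recomputed from the observed and predicted values listed in part (d). The following sketch does that and adds one of the validity checks described above; treating the response as a positive quantity is an assumption made only for this illustration.

```python
# (plot number, observed y, predicted y-hat) as listed in part (d).
rows = [
    (1, 27.1000, 26.2837),
    (5, 40.2000, 40.3619),
    (7, 55.5000, 53.3486),
]

# Residual e = y - y_hat, rounded to the four decimals used in the output.
residuals = {obs: round(y - y_hat, 4) for obs, y, y_hat in rows}
for obs, e in residuals.items():
    print(obs, e)

# Validity check (assumes the response cannot be negative):
# collect any plot numbers whose prediction is below zero.
negative_predictions = [obs for obs, _, y_hat in rows if y_hat < 0]
print(negative_predictions)
```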
(e)

Source   df   Sum of squares   Mean square   F-value   p-value
Model    3    680.49122        226.          98.66     < 0.0001
Error    7    16.09423         2.
Total    10   696.
The p-value is less than 0.0001; therefore, we reject $H_0: \beta_1 = \beta_2 = \beta_3 = 0$. This means that at least one of the variables $X_1$, $X_2$, and $X_3$ is useful for predicting Y. (We might need other variables too.)
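The overall F-value in the analysis of variance table for Question 1(e) is the ratio of the model mean square to the error mean square. A minimal sketch using only the sums of squares and degrees of freedom shown in that table (variable names are illustrative):

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_model, df_model = 680.49122, 3
ss_error, df_error = 16.09423, 7

ms_model = ss_model / df_model   # mean square for the model
ms_error = ss_error / df_error   # mean square for error (MSRes)
f_value = ms_model / ms_error    # overall F statistic
ss_total = ss_model + ss_error   # total sum of squares
df_total = df_model + df_error   # total degrees of freedom

print(round(f_value, 2))
print(round(ss_total, 5), df_total)
```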
Since the p-values for Temperature and Particle Size are less than $\alpha = 0.05$, these two variables are useful for predicting Y. The other variables may not be useful for predicting Y, since their p-values are greater than $\alpha = 0.05$. We might want to delete them from the model.
(e) Let

Full Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + E$
Reduced Model: $Y = \beta_0 + \beta_2 X_2 + \beta_5 X_5 + E$

Then

Model     R^2      R^2 Adj   MSRes
Full      0.9372   0.9058    65.
Reduced   0.9149   0.9018    67.
By comparing $R^2$ and $R^2_{Adj}$, we conclude that adding the variables $X_1$, $X_3$, and $X_4$ does not increase $R^2$ and $R^2_{Adj}$ significantly. Therefore, we should not include those variables in the model. To compare the two models, the test statistic is:
$TS = \dfrac{[\mathrm{SSR}(\text{Full}) - \mathrm{SSR}(\text{Reduced})]/3}{\mathrm{MS}_{Res}(\text{Full})}$
Since $TS = 1.136 < F_{0.05,3,10} = 3.71$, we fail to reject $H_0: \beta_1 = \beta_3 = \beta_4 = 0$. Therefore, adding $X_1$, $X_3$, and $X_4$ to the model is not useful for predicting Y. Note that we cannot use a t-test for this null hypothesis, since it involves more than one coefficient.
(f) Using the computer output on pages 22 and 23, a 95% confidence interval for $\beta_2$ is (0.15378, 0.41051) for the full model, and a 95% confidence interval for $\beta_2$ is (0.15506, 0.40923) for the reduced model. These two confidence intervals are very similar; therefore, deleting the variables $X_1$, $X_3$, and $X_4$ does not change our result for Pressure.
(g) A 95% confidence interval for $\mu_Y$ given $x_0 = (1, 500, 95, 15, 40, 4)'$ is
$\hat{y}_0 \pm t_{\alpha/2}\,\sqrt{\hat{\sigma}^2\, x_0'(X'X)^{-1}x_0}$
where
$\hat{y}_0 = 52.07905 + 0.05556\,(500) + 0.28214\,(95)$
$t_{0.025,\,df=10} = 2.228$, and $x_0'(X'X)^{-1}x_0 = 0.3122698$. The 95% confidence interval for $\mu_Y$ given $x_0$ follows by substituting these values into the formula above.
Using R, a 95% confidence interval when $X_1 = 500$, $X_2 = 95$, $X_3 = 15$, $X_4 = 40$, and $X_5 = 4$ is (34.23322, 54.31772).
(h) A 95% confidence interval for $\mu_Y$ given $x_0 = (1, 95, 4)'$ is
$\hat{y}_0 \pm t_{\alpha/2}\,\sqrt{\hat{\sigma}^2\, x_0'(X'X)^{-1}x_0}$
where $\hat{y}_0 = 80.13461 + 0.28214\,(95) - 16.06498\,(4) = 42.678$, $t_{0.025,\,df=13} = 2.160$, and $x_0'(X'X)^{-1}x_0 = 0.1830688$. The 95% confidence interval for $\mu_Y$ given $x_0$ follows by substituting these values into the formula above.
Using R, a 95% confidence interval when $X_2 = 95$ and $X_5 = 4$ is (35.06559, 50.2909). This confidence interval is shorter than the confidence interval in part (g). Therefore, the model that includes only $X_2$ and $X_5$ gives a shorter confidence interval and is better.
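The comparison between parts (g) and (h) can be made concrete by computing the midpoint and half-width of each interval reported by R. The sketch below also checks that the midpoint of the part (h) interval agrees with the point prediction $\hat{y}_0$ from the reduced model; the helper name `midpoint_and_half_width` is illustrative.

```python
def midpoint_and_half_width(lower, upper):
    """Return the center and half-width of an interval (lower, upper)."""
    return (lower + upper) / 2.0, (upper - lower) / 2.0

ci_full = (34.23322, 54.31772)     # part (g): model with X1, ..., X5
ci_reduced = (35.06559, 50.2909)   # part (h): model with X2 and X5

center_full, half_full = midpoint_and_half_width(*ci_full)
center_reduced, half_reduced = midpoint_and_half_width(*ci_reduced)

# Point prediction from the reduced model at X2 = 95, X5 = 4.
y0_reduced = 80.13461 + 0.28214 * 95 - 16.06498 * 4

print(round(half_full, 4), round(half_reduced, 4))   # half-widths of (g) and (h)
print(round(center_reduced, 3), round(y0_reduced, 3))
print(half_reduced < half_full)                      # True if (h) is narrower
```

A symmetric t-interval is centered at the point prediction, so the midpoint check is a quick way to confirm that an interval and a fitted equation belong to the same model.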
(i) The least-squares estimate of the regression line when Y is regressed on $X_5$ is:
$\hat{Y} = 97.06318 - 16.06498\,X_5$
Let
Full Model: $Y = \beta_0 + \beta_2 X_2 + \beta_5 X_5 + E$
Reduced Model: $Y = \beta_0 + \beta_5 X_5 + E$
Therefore, the F-test statistic for testing the contribution of temperature to the model is
$TS = \dfrac{[\mathrm{SSR}(\text{Full}) - \mathrm{SSR}(\text{Reduced})]/1}{\mathrm{MS}_{Res}(\text{Full})}$
Since $TS = 23.04 > F_{0.05,1,13} = 4.67$, we reject $H_0: \beta_2 = 0$. Therefore, adding $X_2$ to the model is useful for predicting Y. We can also use a t-test for testing $H_0: \beta_2 = 0$. Based on the computer output on page 23, the t-statistic for testing $H_0: \beta_2 = 0$ is $TS = 0.28214/0.05883 = 4.79585$ (or 4.80), and p-value =
Let

Full Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1^2 + \beta_5 X_2^2 + \beta_6 X_1 X_2 + E$

Reduced Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_6 X_1 X_2 + E$
Therefore, the F-test statistic for testing $H_0: \beta_3 = \beta_4 = \beta_5 = 0$ is
$TS = \dfrac{[\mathrm{SSR}(\text{Full}) - \mathrm{SSR}(\text{Reduced})]/3}{\mathrm{MS}_{Res}(\text{Full})}$
Since $TS = 5.84 > F_{0.05,3,17} = 3.20$, we reject $H_0: \beta_3 = \beta_4 = \beta_5 = 0$. Therefore, adding $X_3$, $X_1^2$, and $X_2^2$ to the model does lead to a large improvement over the model with $X_1$, $X_2$, and $X_1 X_2$ as independent variables. Thus, the full model is superior. On the other hand,

Model     R^2      R^2 Adj   MSRes
Full      0.9191   0.8905    3.
Reduced   0.8357   0.8110    5.

$R^2$ and $R^2_{Adj}$ did not change significantly. Based on these results alone, we might say that the reduced model is superior. Note, however, that $R^2$ is not always a good criterion for selecting a model. Since $3.28058/5.66290 = 0.58$, moving to the full model reduces $\mathrm{MS}_{Res}$ by about 42%. Therefore, we should choose the full model.
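The ratio of the two $\mathrm{MS}_{Res}$ values can be computed directly and cross-checked against the adjusted $R^2$ column: because $R^2_{Adj} = 1 - \frac{n-1}{\mathrm{SST}}\,\mathrm{MS}_{Res}$, the ratio $(1 - R^2_{Adj,\text{Full}})/(1 - R^2_{Adj,\text{Reduced}})$ should match the $\mathrm{MS}_{Res}$ ratio up to rounding. A minimal sketch with illustrative variable names:

```python
# MSRes values quoted in the text for the full and reduced models.
ms_res_full = 3.28058
ms_res_reduced = 5.66290

ratio = ms_res_full / ms_res_reduced   # MSRes(Full) relative to MSRes(Reduced)
reduction = 1.0 - ratio                # fractional decrease in MSRes

# Cross-check from the adjusted R^2 column: (n - 1)/SST cancels in the ratio.
adj_r2_full, adj_r2_reduced = 0.8905, 0.8110
ratio_from_adj_r2 = (1.0 - adj_r2_full) / (1.0 - adj_r2_reduced)

print(round(ratio, 4), round(reduction, 4))
print(round(ratio_from_adj_r2, 4))
```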