Regression Analysis: Comparing Simple and Multiple Linear Regression Models, Exercises of Mathematical Statistics

This document presents the results of a regression analysis of several models with different independent variables (x1, x2, x3, x4, x5) used to predict y. The analysis includes stem-and-leaf plots, regression equations, analysis of variance tables, and F-tests to determine the significance of each variable and to compare models. The document also discusses the importance of looking at graphs of the data before the analysis and the use of confidence intervals.

Typology: Exercises

2012/2013

Uploaded on 01/11/2013

m-alam


Question #1:

(a) The least-squares estimate of the regression line when $Y$ is regressed on $X_1$ is:

$$\hat{Y} = -70.42020 + 227.09370\,X_1$$

Based on the computer output on pages 1 and 2, we have $R^2 = 0.9194$ and $r_{YX_1} = 0.95884$. Therefore, $r^2_{YX_1} = (0.95884)^2 = 0.9194 = R^2$. Yes, they are equal, as they should be, since in the simple linear regression model $R^2 = r^2_{YX_1}$.
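The equality in part (a) can be checked numerically. This is a minimal sketch, assuming SST = 680.49122 + 16.09423 (the model and error sums of squares from the ANOVA table of part (e); SST depends only on Y, so every model shares it) and SSR = 640.42489 for the regression on $X_1$ alone, the reduced-model value used in part (b).

```python
# Check for part (a): in simple linear regression, R^2 equals the squared
# sample correlation between Y and X1. All inputs are numbers quoted in the text.

# Total sum of squares, taken as SSR + SSRes from the ANOVA table in part (e).
SST = 680.49122 + 16.09423

SSR_X1 = 640.42489   # regression sum of squares for Y on X1 alone
r_YX1 = 0.95884      # sample correlation between Y and X1

r_squared = r_YX1 ** 2      # squared correlation
R_squared = SSR_X1 / SST    # coefficient of determination

print(f"r^2 = {r_squared:.4f}")
print(f"R^2 = {R_squared:.4f}")
```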

(b) The least-squares estimate of the regression line when $Y$ is regressed on $X_1$ and $X_2$ is:

$$\hat{Y} = -8.08481 + 68.25068\,X_1 + 2.29387\,X_2$$

Based on the computer output on pages 3 and 4, we have $R^2 = 0.9664$, $r_{YX_1} = 0.95884$, and $r_{YX_2} = 0.97907$. Therefore, $r^2_{YX_1} = (0.95884)^2 = 0.9194 \neq R^2$ and $r^2_{YX_2} = (0.97907)^2 = 0.9586 \neq R^2$. No, they are not equal, nor should they be, since in the multiple linear regression model $R^2 \neq r^2_{YX_1}$ and $R^2 \neq r^2_{YX_2}$ in general. We can use an F-test to compare the following two models:

Full Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + E$

Reduced Model: $Y = \beta_0 + \beta_1 X_1 + E$

The F-test statistic is

$$TS = \frac{[SSR(\text{Full}) - SSR(\text{Reduced})]/1}{MS_{Res}(\text{Full})} = \frac{[673.20680 - 640.42489]/1}{23.37865/8} = 11.22$$

Since $TS = 11.22 > F_{0.05,1,8} = 5.32$, we reject $H_0: \beta_2 = 0$. Therefore, adding $X_2$ to the model is useful for predicting $Y$. We can also use a t-test for $H_0: \beta_2 = 0$. Based on the computer output on page 3, the t statistic is $TS = 3.35$ with p-value $= 0.0101 < \alpha = 0.05$; therefore, we reject $H_0$. (Note that $(3.35)^2 = 11.22$, the F-value above.)
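The extra-sum-of-squares F statistic is a one-line formula. The sketch below is a minimal illustration, assuming $MS_{Res}(\text{Full}) = 23.37865/8$ (the full model's residual sum of squares over its 8 degrees of freedom) and taking the t statistic 3.35 from the output quoted above; `partial_f` is a hypothetical helper name, not part of any library.

```python
def partial_f(ssr_full, ssr_reduced, q, ms_res_full):
    """Partial (extra-sum-of-squares) F statistic for dropping q predictors:
    [SSR(Full) - SSR(Reduced)] / q, divided by MSRes(Full)."""
    if q <= 0:
        raise ValueError("q must be a positive number of dropped predictors")
    return (ssr_full - ssr_reduced) / q / ms_res_full


# Part (b): Full = X1 + X2, Reduced = X1, one coefficient tested (q = 1).
ts = partial_f(673.20680, 640.42489, q=1, ms_res_full=23.37865 / 8)
t_stat = 3.35  # t statistic for beta_2 from the computer output

print(f"F   = {ts:.2f}")
print(f"t^2 = {t_stat ** 2:.2f}")  # with one restriction, F equals t^2 up to rounding
```

With a single restriction the two tests are equivalent, which is why the F-value and the squared t statistic agree to two decimals.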

(c) The least-squares estimate of the regression line when $Y$ is regressed on $X_1$, $X_2$, and $X_3$, and its $R^2$, are:

$$\hat{Y} = -1.87932 + 77.32578\,X_1 + 1.55910\,X_2 - 23.90378\,X_3, \qquad R^2 = 0.9769$$

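Based on parts (a) through (c), we have

| Number | Model | $R^2$ | Adjusted $R^2$ | $MS_{Res}$ |
|---|---|---|---|---|
| 1 | $X_1$ | 0.9194 | 0.9104 | 6.2401 |
| 2 | $X_1, X_2$ | 0.9664 | 0.9580 | 2.9223 |
| 3 | $X_1, X_2, X_3$ | 0.9769 | 0.9670 | 2.2992 |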

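If we use only $R^2$, we choose model 3; $R^2$ never decreases when a variable is added, so it always favors the largest model. Model 3 also has the smallest $MS_{Res}$ and the largest adjusted $R^2$, so those criteria choose model 3 as well. Note that

$$R^2_{\text{adj}} = 1 - \frac{n-1}{SST}\,MS_{Res}.$$

Because $n$ and $SST$ are the same for every model, the criteria of minimum $MS_{Res}$ and maximum adjusted $R^2$ are equivalent.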
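The equivalence between minimum $MS_{Res}$ and maximum adjusted $R^2$ can be seen by computing both for the three models. A minimal sketch, assuming $n = 11$ (inferred from the total of 10 degrees of freedom in the ANOVA table of part (e)) and SST = 680.49122 + 16.09423; the SSR values are the ones quoted in parts (b) and (e).

```python
# Sketch: R^2, adjusted R^2, and MSRes for the three models of part (c).
SST = 680.49122 + 16.09423   # total sum of squares (ANOVA table, part (e))
n = 11                       # assumption: total df = 10, so n = 11

# (label, number of predictors p, regression sum of squares SSR)
models = [
    ("X1", 1, 640.42489),
    ("X1, X2", 2, 673.20680),
    ("X1, X2, X3", 3, 680.49122),
]

results = []
for label, p, ssr in models:
    ms_res = (SST - ssr) / (n - p - 1)       # residual mean square
    r2 = ssr / SST                           # never decreases as p grows
    r2_adj = 1 - (n - 1) / SST * ms_res      # adjusted R^2 written via MSRes
    results.append((label, r2, r2_adj, ms_res))
    print(f"{label:<11} R^2={r2:.4f}  adj R^2={r2_adj:.4f}  MSRes={ms_res:.4f}")

# Smallest MSRes and largest adjusted R^2 pick the same model.
best_by_ms_res = min(results, key=lambda row: row[3])[0]
best_by_adj_r2 = max(results, key=lambda row: row[2])[0]
print("best by MSRes:", best_by_ms_res, "| best by adjusted R^2:", best_by_adj_r2)
```

Since $(n-1)/SST$ is a positive constant shared by all models, adjusted $R^2$ is a decreasing linear function of $MS_{Res}$, so the two rankings always coincide.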

(d) Based on the computer output on page 6, we have

| Obs. | Dependent variable ($y$) | Predicted value ($\hat{y}$) | Residual ($e = y - \hat{y}$) |
|---|---|---|---|
| 1 | 27.1000 | 26.2837 | 0.8163 |
| 5 | 40.2000 | 40.3619 | -0.1619 |
| 7 | 55.5000 | 53.3486 | 2.1514 |

Note that the predicted response values $\hat{y}$ can also provide a measure of model validity. Unrealistic predicted values, such as negative predictions of a positive quantity or predictions that fall outside the actual range of the response, indicate poorly estimated coefficients or an incorrect model form. In this case, there are no unusual predicted values.
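The residual column is simple arithmetic, and one of the plausibility checks mentioned in the text (no negative predictions of a positive quantity) is easy to automate. A minimal sketch, assuming the leading numbers 1, 5, and 7 in the output are observation indices.

```python
# Residuals e = y - y_hat for the observations listed in part (d).
# Assumption: the leading numbers 1, 5, 7 are observation indices.
observations = {
    1: (27.1000, 26.2837),   # (observed y, predicted y_hat)
    5: (40.2000, 40.3619),
    7: (55.5000, 53.3486),
}

residuals = {obs: round(y - y_hat, 4) for obs, (y, y_hat) in observations.items()}
for obs, e in residuals.items():
    print(f"obs {obs}: e = {e:+.4f}")

# Plausibility check from the text: no negative predictions of a positive response.
no_negative_predictions = all(y_hat >= 0 for _, y_hat in observations.values())
print("no negative predictions:", no_negative_predictions)
```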

(e)

| Source | df | Sum of squares | Mean square | F-value | p-value |
|---|---|---|---|---|---|
| Model | 3 | 680.49122 | 226.83041 | 98.66 | < 0.0001 |
| Error | 7 | 16.09423 | 2.29918 | | |
| Total | 10 | 696.58545 | | | |

Since the p-value is less than 0.0001, we reject $H_0: \beta_1 = \beta_2 = \beta_3 = 0$. This means that at least one of the variables $X_1$, $X_2$, and $X_3$ is useful for predicting $Y$. (We might need other variables too.)
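The mean squares and the overall F-value in the ANOVA table follow directly from the sums of squares and their degrees of freedom. A minimal, dependency-free sketch; `anova_f` is a hypothetical helper. The p-value would additionally need an F-distribution tail probability (for example `scipy.stats.f.sf(f_value, 3, 7)` if SciPy is available), which is left out here.

```python
def anova_f(ss_model, df_model, ss_error, df_error):
    """Overall regression F statistic: MS(Model) / MS(Error)."""
    ms_model = ss_model / df_model
    ms_error = ss_error / df_error
    return ms_model, ms_error, ms_model / ms_error


# Sums of squares and degrees of freedom from the ANOVA table in part (e).
ms_model, ms_error, f_value = anova_f(680.49122, 3, 16.09423, 7)
print(f"MS(Model) = {ms_model:.5f}, MS(Error) = {ms_error:.5f}, F = {f_value:.2f}")
```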

Since the p-values for Temperature and Particle Size are less than $\alpha = 0.05$, these two variables are useful for predicting $Y$. The other variables may not be useful for predicting $Y$, since their p-values are greater than $\alpha = 0.05$; we might want to delete them from the model.

(e) Let

Full Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + E$

Reduced Model: $Y = \beta_0 + \beta_2 X_2 + \beta_5 X_5 + E$

Then

| Model | $R^2$ | $R^2_{\text{adj}}$ | $MS_{Res}$ |
|---|---|---|---|
| Full | 0.9372 | 0.9058 | 65. |
| Reduced | 0.9149 | 0.9018 | 67. |

Comparing $R^2$ and $R^2_{\text{adj}}$, we conclude that adding the variables $X_1$, $X_3$, and $X_4$ does not increase $R^2$ or $R^2_{\text{adj}}$ appreciably. Therefore, we should not include those variables in the model. To compare the two models, the test statistic is:

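$$TS = \frac{[SSR(\text{Full}) - SSR(\text{Reduced})]/3}{MS_{Res}(\text{Full})} = \frac{[9712.50 - 9481.25]/3}{MS_{Res}(\text{Full})}$$

Here $MS_{Res}(\text{Full})$ has $n - 6 = 10$ residual degrees of freedom ($n = 16$, consistent with the $t_{0.025,\,df=10}$ quantile used in part (g)). Since $SSR = R^2 \cdot SST$ and $MS_{Res}(\text{Full}) = (1 - R^2_{\text{Full}})\,SST/10$, the same statistic can be computed from the table above:

$$TS = \frac{(0.9372 - 0.9149)/3}{(1 - 0.9372)/10} \approx 1.18$$

Since $TS \approx 1.18 < F_{0.05,3,10} = 3.71$, we fail to reject $H_0: \beta_1 = \beta_3 = \beta_4 = 0$. Therefore, adding $X_1$, $X_3$, and $X_4$ to the model is not useful for predicting $Y$. Note that we cannot use a single t-test for this null hypothesis, because it restricts three coefficients jointly.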
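Because SSR and $MS_{Res}$ share the same SST, the partial F statistic can also be written purely in terms of $R^2$ values. A minimal sketch, assuming $n = 16$ (implied by the t quantiles with 10 and 13 degrees of freedom in parts (g) and (h)), so the full five-predictor model has $16 - 6 = 10$ residual degrees of freedom; the rounded $R^2$ values limit the result to about two decimals.

```python
def partial_f_from_r2(r2_full, r2_reduced, q, df_res_full):
    """Partial F statistic expressed with R^2.

    Equivalent to [SSR(Full) - SSR(Reduced)]/q / MSRes(Full), since
    SSR = R^2 * SST and MSRes(Full) = (1 - R^2_full) * SST / df_res_full.
    """
    return ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_res_full)


n, p_full = 16, 5   # assumption: n = 16; the full model has 5 predictors
ts = partial_f_from_r2(0.9372, 0.9149, q=3, df_res_full=n - p_full - 1)
print(f"TS = {ts:.2f}")
```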

(f) Using the computer output on pages 22 and 23, a 95% confidence interval for $\beta_2$ is (0.15378, 0.41051) for the full model and (0.15506, 0.40923) for the reduced model. These two confidence intervals are very similar; therefore, deleting the variables $X_1$, $X_3$, and $X_4$ does not change our result for $X_2$.
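A coefficient interval is $\hat{\beta} \pm t_{\alpha/2}\,SE(\hat{\beta})$. As a minimal check of the reduced-model interval, the sketch below uses $\hat{\beta}_2 = 0.28214$ and $SE = 0.05883$ (the values quoted in part (i)) and $t_{0.025,13} = 2.160$ (quoted in part (h)); `coef_ci` is a hypothetical helper.

```python
def coef_ci(estimate, std_error, t_crit):
    """Two-sided confidence interval: estimate +/- t_crit * SE."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


# Reduced model (X2, X5): estimate and standard error of beta_2, t_{0.025,13}.
lo, hi = coef_ci(0.28214, 0.05883, 2.160)
print(f"({lo:.5f}, {hi:.5f})")
```

The small differences from the printed interval come from rounding the t quantile to three decimals.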

(g) A 95% confidence interval for $\mu_Y$ given $x_0 = (1, 500, 95, 15, 40, 4)'$ is

$$\hat{y}_0 \pm t_{\alpha/2}\sqrt{\hat{\sigma}^2\, x_0'(X'X)^{-1}x_0}$$

where

$$\hat{y}_0 = 52.07905 + 0.05556(500) + 0.28214(95) + 0.12500(15) + 4.56594\times 10^{-15}(40) - 16.06498(4) = 44.27,$$

$t_{0.025,\,df=10} = 2.228$ and $x_0'(X'X)^{-1}x_0 = 0.3122698$. Therefore, a 95% confidence interval for $\mu_Y$ given $x_0$ is

$$44.27 \pm 2.228\sqrt{\hat{\sigma}^2\,(0.3122698)}$$

Using R, a 95% confidence interval when $X_1 = 500$, $X_2 = 95$, $X_3 = 15$, $X_4 = 40$, and $X_5 = 4$ is (34.23322, 54.31772).
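The interval for a mean response combines $\hat{y}_0 = x_0'\hat{\beta}$ with the scalar $x_0'(X'X)^{-1}x_0$. A minimal sketch of that formula; `mean_response_ci` is a hypothetical helper. Two assumptions are labeled: the $X_3$ term is taken with a "+" sign, which reproduces $\hat{y}_0 \approx 44.27$, and since $\hat{\sigma}^2$ is not legible in the text, the usage backs out the value implied by R's interval. That is only a consistency check: the implied value, about 65.06, agrees with the full-model $MS_{Res}$ entry in part (e), which begins with 65.

```python
import math


def mean_response_ci(coefs, x0, sigma2_hat, h00, t_crit):
    """CI for E[Y | x0]: y0_hat +/- t_crit * sqrt(sigma2_hat * h00).

    h00 is the scalar x0'(X'X)^{-1} x0; coefs and x0 both start with the intercept.
    """
    y0_hat = sum(b * x for b, x in zip(coefs, x0))
    half_width = t_crit * math.sqrt(sigma2_hat * h00)
    return y0_hat, (y0_hat - half_width, y0_hat + half_width)


# Full-model coefficients from part (g): intercept, X1, ..., X5.
# Assumption: the X3 term enters with a "+" sign.
coefs = [52.07905, 0.05556, 0.28214, 0.12500, 4.56594e-15, -16.06498]
x0 = [1, 500, 95, 15, 40, 4]
h00, t_crit = 0.3122698, 2.228

# Hypothetical sigma^2: backed out from R's interval (34.23322, 54.31772).
lower_r, upper_r = 34.23322, 54.31772
sigma2_implied = ((upper_r - lower_r) / 2 / t_crit) ** 2 / h00

y0_hat, (lower, upper) = mean_response_ci(coefs, x0, sigma2_implied, h00, t_crit)
print(f"y0_hat = {y0_hat:.2f}, implied sigma^2 = {sigma2_implied:.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```

The rounded coefficients give $\hat{y}_0 \approx 44.28$, which matches the text's 44.27 up to rounding.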

(h) A 95% confidence interval for $\mu_Y$ given $x_0 = (1, 95, 4)'$ is

$$\hat{y}_0 \pm t_{\alpha/2}\sqrt{\hat{\sigma}^2\, x_0'(X'X)^{-1}x_0}$$

where $\hat{y}_0 = 80.13461 + 0.28214(95) - 16.06498(4) = 42.678$,

$t_{0.025,\,df=13} = 2.160$ and $x_0'(X'X)^{-1}x_0 = 0.1830688$. Therefore, a 95% confidence interval for $\mu_Y$ given $x_0$ is

$$42.678 \pm 2.160\sqrt{\hat{\sigma}^2\,(0.1830688)}$$

Using R, a 95% confidence interval when $X_2 = 95$ and $X_5 = 4$ is (35.06559, 50.2909). This interval (width about 15.23) is shorter than the interval in part (g) (width about 20.08). Therefore, the model that includes only $X_2$ and $X_5$ gives a shorter confidence interval, and it is better.
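The reduced-model prediction and the width comparison are quick to verify. A minimal sketch using only numbers quoted in parts (g) and (h).

```python
# Reduced model (X2, X5) from part (h): intercept, X2, X5.
coefs_reduced = [80.13461, 0.28214, -16.06498]
x0_reduced = [1, 95, 4]
y0_hat = sum(b * x for b, x in zip(coefs_reduced, x0_reduced))

# Widths of the two R intervals quoted in parts (g) and (h).
width_full = 54.31772 - 34.23322
width_reduced = 50.2909 - 35.06559

print(f"y0_hat = {y0_hat:.3f}")
print(f"width (g) = {width_full:.2f}, width (h) = {width_reduced:.2f}")
```

The narrower interval reflects both a smaller t quantile (13 versus 10 residual degrees of freedom) and a smaller $x_0'(X'X)^{-1}x_0$ (0.1830688 versus 0.3122698).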

(i) The least-squares estimate of the regression line when $Y$ is regressed on $X_5$ is:

$$\hat{Y} = 97.06318 - 16.06498\,X_5$$

Let

Full Model: $Y = \beta_0 + \beta_2 X_2 + \beta_5 X_5 + E$

Reduced Model: $Y = \beta_0 + \beta_5 X_5 + E$

Therefore, the F-test statistic for testing the contribution of temperature to the model is

$$TS = \frac{[SSR(\text{Full}) - SSR(\text{Reduced})]/1}{MS_{Res}(\text{Full})} = \frac{[9481.25 - 7921.00]/1}{MS_{Res}(\text{Full})}$$

Since $TS \approx 23.00 > F_{0.05,1,13} = 4.67$, we reject $H_0: \beta_2 = 0$. Therefore, adding $X_2$ to the model is useful for predicting $Y$. We can also use a t-test for $H_0: \beta_2 = 0$. Based on the computer output on page 23, the t statistic is $TS = 0.28214/0.05883 = 4.79585$ (or 4.80), with p-value $= 0.0003 < \alpha = 0.05$; therefore, we reject $H_0$. (Note that $(4.79585)^2 = 23.00$, the F-value above.)
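With one restriction, the partial F statistic equals the squared t statistic $\hat{\beta}_2/SE(\hat{\beta}_2)$. The sketch below checks this and, because the printed $MS_{Res}(\text{Full})$ is not legible here, solves $F = t^2$ for the implied residual mean square as a hypothetical consistency check; it lands near 67.8, in line with the reduced-model $MS_{Res}$ entry in part (e), which begins with 67.

```python
# Part (i): t statistic for H0: beta_2 = 0 in the model with X2 and X5.
beta2_hat, se_beta2 = 0.28214, 0.05883
t_stat = beta2_hat / se_beta2

extra_ss = 9481.25 - 7921.00   # SSR(Full) - SSR(Reduced), 1 numerator df

# Hypothetical: solve F = extra_ss / MSRes = t^2 for the implied MSRes.
ms_res_implied = extra_ss / t_stat ** 2

print(f"t = {t_stat:.5f}, t^2 = {t_stat ** 2:.2f}")
print(f"implied MSRes(Full) = {ms_res_implied:.2f}")
```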

Let

Full Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1^2 + \beta_5 X_2^2 + \beta_6 X_1 X_2 + E$

Reduced Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_6 X_1 X_2 + E$

Therefore, the F-test statistic for testing $H_0: \beta_3 = \beta_4 = \beta_5 = 0$ is

$$TS = \frac{[SSR(\text{Full}) - SSR(\text{Reduced})]/3}{MS_{Res}(\text{Full})} = \frac{[633.4901190 - 576.0019463]/3}{3.28058} = 5.84$$

Since $TS = 5.84 > F_{0.05,3,17} = 3.20$, we reject $H_0: \beta_3 = \beta_4 = \beta_5 = 0$. Therefore, adding $X_3$, $X_1^2$, and $X_2^2$ to the model with $X_1$, $X_2$, and $X_1X_2$ as independent variables leads to a significant improvement, so the full model is superior. On the other hand,

| Model | $R^2$ | $R^2_{\text{adj}}$ | $MS_{Res}$ |
|---|---|---|---|
| Full | 0.9191 | 0.8905 | 3.28058 |
| Reduced | 0.8357 | 0.8110 | 5.66290 |

$R^2$ and $R^2_{\text{adj}}$ did not change very much, so based on these measures alone we might say that the reduced model is superior. Note, however, that $R^2$ is not always a good criterion for selecting a model. Since $3.28058/5.66290 \approx 0.58$, $MS_{Res}$ is reduced by about 42%, so we should choose the full model. Therefore, the full model is superior.
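The last comparison combines a three-coefficient partial F-test with the relative drop in $MS_{Res}$. A minimal sketch using the SSR values and residual mean squares quoted above; `partial_f` is a hypothetical helper.

```python
def partial_f(ssr_full, ssr_reduced, q, ms_res_full):
    """[SSR(Full) - SSR(Reduced)] / q, divided by MSRes(Full)."""
    return (ssr_full - ssr_reduced) / q / ms_res_full


ms_full, ms_reduced = 3.28058, 5.66290
ts = partial_f(633.4901190, 576.0019463, q=3, ms_res_full=ms_full)

ratio = ms_full / ms_reduced   # fraction of the reduced model's MSRes that remains
print(f"TS = {ts:.2f}")
print(f"MSRes ratio = {ratio:.2f}, reduction = {1 - ratio:.0%}")
```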