STAT 645 HOMEWORK SOLUTION FOR CHAPTER 6
SPRING 2006

Problem 6.1

(a)
$$
X = \begin{bmatrix} 1 & X_{11} & X_{11}X_{12} \\ 1 & X_{21} & X_{21}X_{22} \\ 1 & X_{31} & X_{31}X_{32} \\ 1 & X_{41} & X_{41}X_{42} \end{bmatrix},
\qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}
$$

(b)
$$
X = \begin{bmatrix} 1 & X_{11} & X_{12} \\ 1 & X_{21} & X_{22} \\ 1 & X_{31} & X_{32} \\ 1 & X_{41} & X_{42} \end{bmatrix},
\qquad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}
$$

Problem 6.5

(a) The correlation matrix, with rows and columns ordered Y, X1, X2, is
$$
\begin{bmatrix} 1.00000 & 0.89239 & 0.39458 \\ 0.89239 & 1.00000 & 0.00000 \\ 0.39458 & 0.00000 & 1.00000 \end{bmatrix}
$$

[Figure: Matrix Plot of Y, X1, X2]

(b) Regression Analysis: Y versus X1, X2

The regression equation is
Y = 37.7 + 4.42 X1 + 4.38 X2

Predictor     Coef  SE Coef      T      P
Constant    37.650    2.996  12.57  0.000
X1          4.4250   0.3011  14.70  0.000
X2          4.3750   0.6733   6.50  0.000

S = 2.69330   R-Sq = 95.2%   R-Sq(adj) = 94.5%

b1 = 4.4250 is the estimated change in the mean response per unit increase in X1 (moisture content) when X2 (sweetness) is held constant.

(c) [Figure: Boxplot of RESI1]

This box plot shows that the residuals are distributed symmetrically around 0, with the central box lying roughly between -2 and 2.

(d) Following are the residual plots against various variables.

[Figure: Normal Probability Plot of the Residuals (response is Y)]

The above normal probability plot shows that the residuals are approximately normally distributed.

(e) H0: γ1 = γ2 = 0 vs Ha: at least one of γ1 and γ2 is not 0.
With n = 16, SSE = 94.3 and SSR* = 72.41,
$$
X^2_{BP} = \frac{SSR^*/2}{(SSE/n)^2} = \frac{72.41/2}{(94.3/16)^2} = 1.04 < \chi^2(0.99;\,2) = 9.21,
$$
so we conclude H0, i.e. the error variance is constant.
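The Breusch-Pagan arithmetic in part (e) can be re-checked with a short script. This is only a sketch: n = 16 is inferred from the total degrees of freedom (15) in the ANOVA table, the variable and function names are illustrative, and the chi-square quantile uses the closed form that holds only for 2 degrees of freedom.

```python
import math

# Values quoted in Problem 6.5(e); n = 16 is inferred from Total DF = 15.
n = 16
sse = 94.3        # SSE from regressing Y on X1 and X2
ssr_star = 72.41  # SSR* from regressing the squared residuals on X1 and X2


def breusch_pagan_stat(ssr_star, sse, n):
    """X2_BP = (SSR*/2) / (SSE/n)^2; the divisor 2 is fixed by the formula."""
    return (ssr_star / 2.0) / (sse / n) ** 2


def chi2_quantile_df2(prob):
    """Quantile of the chi-square distribution with exactly 2 degrees of
    freedom, using the closed form -2*ln(1 - prob)."""
    return -2.0 * math.log(1.0 - prob)


bp_stat = breusch_pagan_stat(ssr_star, sse, n)
critical = chi2_quantile_df2(0.99)
conclude_h0 = bp_stat <= critical  # True means constant error variance

print(f"X2_BP = {bp_stat:.2f}, chi2(0.99; 2) = {critical:.2f}, conclude H0: {conclude_h0}")
```

For other degrees of freedom the closed form does not apply, and a statistical table or library would be needed for the critical value.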
(f) H0: E{Y} = β0 + β1X1 + β2X2
Ha: E{Y} ≠ β0 + β1X1 + β2X2

From MINITAB, we have

Analysis of Variance
Source            DF       SS      MS       F      P
Regression         2  1872.70  936.35  129.08  0.000
Residual Error    13    94.30    7.25
  Lack of Fit      5    37.30    7.46    1.05  0.453
  Pure Error       8    57.00    7.13
Total             15  1967.00

There are c = 4*2 = 8 distinct combinations of X1 and X2 and p = 3 parameters, so the test has c - p = 5 and n - c = 8 degrees of freedom. F* = 1.05 < F(0.99; 5, 8) = 6.63, so we conclude H0.

Problem 6.6

(a) H0: β1 = β2 = 0 against H1: not both β1 and β2 equal 0. From MINITAB we obtain

Analysis of Variance
Source            DF       SS      MS       F      P
Regression         2  1872.70  936.35  129.08  0.000
Residual Error    13    94.30    7.25
Total             15  1967.00

Note that MSE = 94.30/13 = 7.254, so F* = MSR/MSE = 936.35/7.254 = 129.08 > F(0.99; 2, 13) = 6.70. We therefore conclude H1, which implies that at least one of β1 and β2 is not 0.

(b) The P-value is 0.000+.

(c) B = t(1 - 0.01/4; 13) = t(0.9975; 13) = 3.372, s{b1} = 0.3011, s{b2} = 0.6733.
The joint Bonferroni intervals with a 99% family confidence coefficient are:
For β1: (4.425 - 3.372*0.3011, 4.425 + 3.372*0.3011) = (3.4097, 5.4403)
For β2: (4.375 - 3.372*0.6733, 4.375 + 3.372*0.6733) = (2.1046, 6.6454)

Problem 6.7

(a) R^2 = SSR/SSTO = 1872.70/1967.00 = 0.9521, which measures the proportionate reduction of total variation in Y associated with the use of the set of X variables X1 and X2.

(b) Yes, they are the same.

Problem 6.8 (You can get the results directly through MINITAB)

(a) We define Xh = [1, 5, 4]', so Yh^ = 77.275; also note t(0.995; 13) = 3.012 and
$$
(X'X)^{-1} = \begin{bmatrix} 1.2375 & -0.0875 & -0.1875 \\ -0.0875 & 0.0125 & 0.0000 \\ -0.1875 & 0.0000 & 0.0625 \end{bmatrix},
$$
so s^2{Yh^} = MSE * Xh'(X'X)^-1 Xh = 7.25 * [1, 5, 4] (X'X)^-1 [1, 5, 4]' = 7.25 * 0.175 = 1.2688, and s{Yh^} = 1.1264.
The 99% CI for E{Yh} is (77.275 - 1.1264*3.012, 77.275 + 1.1264*3.012) = (73.882, 80.668), which means we are 99% confident that the mean response E{Yh} at X1 = 5, X2 = 4 lies in this interval.

(b) s^2{pred} = MSE + s^2{Yh^} = 7.25 + 1.2688 = 8.5188, so s{pred} = 2.9187, and the 99% PI is (77.275 - 2.9187*3.012, 77.275 + 2.9187*3.012) = (68.4839, 86.0661).

Problem 6.25

Since β2 is known to be 4, the new model is Y'i = Yi - 4Xi2 = β0 + β1Xi1 + β3Xi3 + εi.
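Returning to Problem 6.8, the interval arithmetic can be reproduced from the quantities quoted in the solution. The sketch below assumes the fitted coefficients from Problem 6.5(b), the (X'X)^-1 matrix given in Problem 6.8(a), and the tabulated value t(0.995; 13) = 3.012; all variable names are illustrative.

```python
import math

# Quantities quoted in the solution (not recomputed from the raw data).
mse = 7.25                 # MSE from the ANOVA table
t_crit = 3.012             # t(0.995; 13) as quoted in Problem 6.8(a)
b = [37.650, 4.4250, 4.3750]   # b0, b1, b2 from Problem 6.5(b)
x_h = [1.0, 5.0, 4.0]          # Xh = [1, X_h1, X_h2]'
xtx_inv = [
    [1.2375, -0.0875, -0.1875],
    [-0.0875, 0.0125, 0.0000],
    [-0.1875, 0.0000, 0.0625],
]


def quad_form(x, m):
    """Return x' M x for a vector x and a square matrix M (lists of floats)."""
    k = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(k) for j in range(k))


# Point estimate of the mean response at Xh.
y_hat = sum(bi * xi for bi, xi in zip(b, x_h))

# Part (a): confidence interval for E{Yh}.
s2_fit = mse * quad_form(x_h, xtx_inv)
s_fit = math.sqrt(s2_fit)
conf_int = (y_hat - t_crit * s_fit, y_hat + t_crit * s_fit)

# Part (b): prediction interval for a single new observation.
s2_pred = mse + s2_fit
s_pred = math.sqrt(s2_pred)
pred_int = (y_hat - t_crit * s_pred, y_hat + t_crit * s_pred)

print(f"Yh^ = {y_hat:.3f}, s{{Yh^}} = {s_fit:.4f}, s{{pred}} = {s_pred:.4f}")
print(f"99% CI: ({conf_int[0]:.3f}, {conf_int[1]:.3f})")
print(f"99% PI: ({pred_int[0]:.4f}, {pred_int[1]:.4f})")
```

The prediction interval is wider than the confidence interval because s^2{pred} adds the variance of a single new observation (estimated by MSE) to s^2{Yh^}.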
Problem 6.30

(a) Following are the stem-and-leaf plots for each predictor variable.

Stem-and-Leaf Display: Age

Stem-and-leaf of Age   N = 113
Leaf Unit = 1.0

LO 38, 42
   3   4  3
   8   4  45555
  10   4  77
  21   4  88899999999
  42   5  000000001111111111111
 (24)  5  222222222222233333333333
  47   5  4444444444455555
  31   5  666666666666677777
  13   5  8889999
   6   6  01
   4   6  23
   2   6  4
HI 65

Stem-and-Leaf Display: risk

Stem-and-leaf of risk   N = 113
Leaf Unit = 0.10

LO 13, 13, 14
   6   1  678
  10   2  0013
  20   2  5677899999

[Figure: Matrix Plot of length, bed, risk, service]

Correlation matrix, with rows and columns ordered Y, X1, X2, X3 (Y = length, X1 = bed, X2 = risk, X3 = service):
$$
\begin{bmatrix} 1.00000 & 0.40927 & 0.53344 & 0.35554 \\ 0.40927 & 1.00000 & 0.35977 & 0.79452 \\ 0.53344 & 0.35977 & 1.00000 & 0.41260 \\ 0.35554 & 0.79452 & 0.41260 & 1.00000 \end{bmatrix}
$$

We see from the above correlation plots that Y has a higher linear correlation with "number of beds" in model II than with "age" in model I.

(c) For model I, Y^ = 1.39 + 0.0837 X1 + 0.658 X2 + 0.0217 X3
For model II, Y^ = 6.47 + 0.00302 X1 + 0.648 X2 - 0.0093 X3

Regression Analysis: length versus Age, risk, service

The regression equation is
length = 1.39 + 0.0837 Age + 0.658 risk + 0.0217 service

Predictor      Coef  SE Coef     T      P
Constant      1.386    1.866  0.74  0.459
Age         0.08371  0.03325  2.52  0.013
risk         0.6584   0.1214  5.43  0.000
service     0.02174  0.01071  2.03  0.045

S = 1.56840   R-Sq = 34.5%   R-Sq(adj) = 32.7%

Regression Analysis: length versus bed, risk, service

The regression equation is
length = 6.47 + 0.00302 bed + 0.648 risk - 0.0093 service

Predictor       Coef   SE Coef      T      P
Constant      6.4674    0.6152  10.51  0.000
bed         0.003018  0.001272   2.37  0.019
risk          0.6477    0.1219   5.31  0.000
service     -0.00929   0.01652  -0.56  0.575

S = 1.57322   R-Sq = 34.1%   R-Sq(adj) = 32.3%

(d) For model I, R^2 = 0.345; for model II, R^2 = 0.341. Model I is slightly better than model II in terms of this measure.
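As a cross-check on part (d), the R-Sq(adj) values in the MINITAB output can be recovered from the reported R^2 values. The sketch below uses the standard adjusted R^2 definition, which the solution itself does not state; n = 113 is taken from the stem-and-leaf displays, and p = 4 assumes an intercept plus three predictors in each model.

```python
def adjusted_r2(r2, n, p):
    """Standard adjusted R^2: 1 - ((n - 1) / (n - p)) * (1 - R^2),
    where p counts all regression parameters including the intercept."""
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2)


n = 113   # number of cases, from the stem-and-leaf displays
p = 4     # assumed: intercept + three predictors in each model

r2 = {"Model I": 0.345, "Model II": 0.341}   # R-Sq values from MINITAB
adj = {name: adjusted_r2(value, n, p) for name, value in r2.items()}

for name in r2:
    print(f"{name}: R^2 = {r2[name]:.3f}, adjusted R^2 = {adj[name]:.3f}")
```

Because both models have the same n and p, the adjustment shrinks both R^2 values by the same factor, so the ranking of the two models is unchanged.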
(e) For model I, the plots are:

[Figure: Residuals Versus the Fitted Values (response is length)]
[Figure: Residuals Versus Age (response is length)]
[Figure: Residuals Versus risk (response is length)]
[Figure: Normal Probability Plot of the Residuals (response is length)]

For model II, the plots are:

[Figure: Residuals Versus the Fitted Values (response is length)]
[Figure: Residuals Versus bed (response is length)]
[Figure: Residuals Versus risk (response is length)]
[Figure: Residuals Versus service (response is length)]
[Figure: Residuals Versus X1X2 (response is length)]