Material Type: Assignment; Class: Applied Regression Analysis; Subject: Statistics; University: Ohio State University - Main Campus; Term: Spring 2006;

STAT 645 HOMEWORK SOLUTION FOR CHAPTER 6
SPRING 2006

Problem 6.1

(a)
$$X = \begin{bmatrix} 1 & X_{11} & X_{11}X_{12} \\ 1 & X_{21} & X_{21}X_{22} \\ 1 & X_{31} & X_{31}X_{32} \\ 1 & X_{41} & X_{41}X_{42} \end{bmatrix}, \qquad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}$$

(b)
$$X = \begin{bmatrix} 1 & X_{11} & X_{12} \\ 1 & X_{21} & X_{22} \\ 1 & X_{31} & X_{32} \\ 1 & X_{41} & X_{42} \end{bmatrix}, \qquad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}$$

Problem 6.5

(a) The correlation matrix (rows and columns in the order Y, X1, X2) is
$$\begin{bmatrix} 1.00000 & 0.89239 & 0.39458 \\ 0.89239 & 1.00000 & 0.00000 \\ 0.39458 & 0.00000 & 1.00000 \end{bmatrix}$$

[Figure: Matrix Plot of Y, X1, X2]

(b) Regression Analysis: Y versus X1, X2

The regression equation is
Y = 37.7 + 4.42 X1 + 4.38 X2

Predictor    Coef     SE Coef   T       P
Constant     37.650   2.996     12.57   0.000
X1           4.4250   0.3011    14.70   0.000
X2           4.3750   0.6733    6.50    0.000

S = 2.69330   R-Sq = 95.2%   R-Sq(adj) = 94.5%

b1 = 4.4250 is the estimated change in the mean response per unit increase in X1 (moisture content) when X2 (sweetness) is held constant.

(c) [Figure: Boxplot of RESI1]

This box plot shows that the residuals are symmetrically distributed around 0 and are mostly located between -2 and 2.

(d) Following are the residual plots against various variables.

[Figure: Normal Probability Plot of the Residuals (response is Y)]

The above normal probability plot shows that the residuals have an approximately normal distribution.

(e) H0: γ1 = γ2 = 0 vs Ha: at least one of γ1 and γ2 is not 0.
SSE = 94.3 and SSR* = 72.41, so
$$X^2_{BP} = \frac{SSR^*/2}{(SSE/n)^2} = 1.04 < \chi^2(0.99;\, 2) = 9.21,$$
so we conclude H0, i.e., the error variance is constant.
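The Breusch-Pagan statistic in Problem 6.5(e) can be reproduced with a few lines of arithmetic. The sketch below is only a check of the numbers quoted in the solution: SSE, SSR* and the critical value 9.21 are copied from the text, while n = 16 is an assumption inferred from the total degrees of freedom (15) in the MINITAB ANOVA table. The function name is illustrative and not part of MINITAB or any library.

```python
def breusch_pagan_statistic(ssr_star, sse, n):
    """Return X^2_BP = (SSR*/2) / (SSE/n)^2.

    ssr_star: regression sum of squares from regressing e_i^2 on X1, X2
    sse:      error sum of squares of the original fit
    n:        number of observations
    """
    return (ssr_star / 2.0) / (sse / n) ** 2


# Values quoted in the solution; n = 16 is inferred from Total DF = 15.
x2_bp = breusch_pagan_statistic(ssr_star=72.41, sse=94.3, n=16)

# Critical value chi-square(0.99; 2) as quoted in the solution.
chi2_crit = 9.21

conclude_h0 = x2_bp < chi2_crit
print(round(x2_bp, 2), conclude_h0)
```

Since the statistic is far below the critical value, the conclusion of constant error variance does not depend on the rounding of SSE or SSR*.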
(f) H0: E{Y} = β0 + β1X1 + β2X2
Ha: E{Y} ≠ β0 + β1X1 + β2X2

From MINITAB, we have

Analysis of Variance
Source           DF   SS        MS       F        P
Regression       2    1872.70   936.35   129.08   0.000
Residual Error   13   94.30     7.25
  Lack of Fit    5    37.30     7.46     1.05     0.453
  Pure Error     8    57.00     7.13
Total            15   1967.00

There are c = 4*2 = 8 distinct combinations of (X1, X2) and p = 3 parameters, so the lack-of-fit test has c - p = 5 and n - c = 8 degrees of freedom. F* = 1.05 < F(0.99; 5, 8) = 6.63, so we conclude H0.

Problem 6.6

(a) H0: β1 = β2 = 0 against H1: not both β1 and β2 equal 0. From MINITAB we obtain

Analysis of Variance
Source           DF   SS        MS       F        P
Regression       2    1872.70   936.35   129.08   0.000
Residual Error   13   94.30     7.25
Total            15   1967.00

Note F* = MSR/MSE = 936.35/7.254 = 129.08 (using the unrounded MSE = 94.30/13) > F(0.99; 2, 13) = 6.70, so we conclude H1, which implies that at least one of β1 and β2 is not 0.

(b) The P-value is 0.000+.

(c) B = t(1 - 0.01/4; 13) = 3.372, s{b1} = 0.3011, s{b2} = 0.6733.
With a 99% family confidence coefficient, the Bonferroni interval for β1 is
(4.425 - 3.372*0.3011, 4.425 + 3.372*0.3011) = (3.4097, 5.4403)
and the interval for β2 is
(4.375 - 3.372*0.6733, 4.375 + 3.372*0.6733) = (2.1046, 6.6454).

Problem 6.7

(a) R^2 = SSR/SSTO = 1872.70/1967.00 = 0.9521, which measures the proportionate reduction of total variation in Y associated with the use of the set of X variables X1 and X2.

(b) Yes, they are the same.

Problem 6.8 (You can get the results directly through MINITAB)

(a) We define Xh = [1, 5, 4]', so Ŷh = 77.275. Also note t(0.995; 13) = 3.012 and
$$(X'X)^{-1} = \begin{bmatrix} 1.2375 & -0.0875 & -0.1875 \\ -0.0875 & 0.0125 & 0.0000 \\ -0.1875 & 0.0000 & 0.0625 \end{bmatrix},$$
so
$$s^2\{\hat{Y}_h\} = MSE \cdot X_h'(X'X)^{-1}X_h = 7.25 \times 0.175 = 1.2688,$$
and s{Ŷh} = 1.1264. The 99% CI for E{Yh} is
(77.275 - 1.1264*3.012, 77.275 + 1.1264*3.012) = (73.882, 80.668),
which means we are 99% confident that the mean response E{Yh} lies in this interval.

(b) s^2{pred} = 7.25 + 1.2688 = 8.5188, so s{pred} = 2.9187, and the 99% PI is
(77.275 - 2.9187*3.012, 77.275 + 2.9187*3.012) = (68.4839, 86.0661).

Problem 6.25

Since β2 is known to be 4, the new model is
Y'i = Yi - 4Xi2 = β0 + β1Xi1 + β3Xi3 + εi.
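The interval calculations in Problem 6.8 can be checked by hand or with a short script. The following is a minimal sketch in plain Python, assuming the numbers quoted in the solution: MSE = 7.25, t(0.995; 13) = 3.012, the fitted coefficients from the MINITAB output in Problem 6.5(b), and the (X'X)^-1 matrix from Problem 6.8(a) read with the intercept row first. The helper name quad_form is illustrative only.

```python
import math

# Quantities quoted in the solution (assumed correct as printed).
MSE = 7.25                      # mean squared error from the ANOVA table
T_CRIT = 3.012                  # t(0.995; 13)
B = [37.650, 4.4250, 4.3750]    # b0, b1, b2 from the MINITAB fit
XTX_INV = [
    [1.2375, -0.0875, -0.1875],
    [-0.0875, 0.0125, 0.0000],
    [-0.1875, 0.0000, 0.0625],
]
x_h = [1.0, 5.0, 4.0]           # Xh = [1, 5, 4]'


def quad_form(x, m):
    """Return x' m x for a vector x and a square matrix m."""
    k = len(x)
    return sum(x[i] * m[i][j] * x[j] for i in range(k) for j in range(k))


# Fitted mean response at Xh.
y_hat = sum(b * x for b, x in zip(B, x_h))

# Estimated variance of the fitted mean and of a new prediction.
var_mean = MSE * quad_form(x_h, XTX_INV)
var_pred = MSE + var_mean

half_ci = T_CRIT * math.sqrt(var_mean)
half_pi = T_CRIT * math.sqrt(var_pred)
ci = (y_hat - half_ci, y_hat + half_ci)   # 99% CI for E{Yh}
pi = (y_hat - half_pi, y_hat + half_pi)   # 99% PI for a new observation

print(round(y_hat, 3), round(var_mean, 4), round(var_pred, 4))
print(tuple(round(v, 3) for v in ci), tuple(round(v, 4) for v in pi))
```

The prediction interval is wider than the confidence interval because s^2{pred} adds the error variance MSE to the variance of the fitted mean.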
Problem 6.30

(a) Following are the stem-and-leaf plots for each predictor variable.

Stem-and-Leaf Display: Age
Stem-and-leaf of Age  N = 113
Leaf Unit = 1.0

LO 38, 42
   3   4  3
   8   4  45555
  10   4  77
  21   4  88899999999
  42   5  000000001111111111111
 (24)  5  222222222222233333333333
  47   5  4444444444455555
  31   5  666666666666677777
  13   5  8889999
   6   6  01
   4   6  23
   2   6  4
HI 65

Stem-and-Leaf Display: risk
Stem-and-leaf of risk  N = 113
Leaf Unit = 0.10

LO 13, 13, 14
   6   1  678
  10   2  0013
  20   2  5677899999
  ...

[Figure: Matrix Plot of length, bed, risk, service]

The correlation matrix (rows and columns in the order Y, X1, X2, X3) is
$$\begin{bmatrix} 1.00000 & 0.40927 & 0.53344 & 0.35554 \\ 0.40927 & 1.00000 & 0.35977 & 0.79452 \\ 0.53344 & 0.35977 & 1.00000 & 0.41260 \\ 0.35554 & 0.79452 & 0.41260 & 1.00000 \end{bmatrix}$$

We see from the above correlation plots that Y has a higher linear correlation with "number of beds" in model II than with "age" in model I.

(c) For model I, Y = 1.39 + 0.0837 X1 + 0.658 X2 + 0.0217 X3
For model II, Y = 6.47 + 0.00302 X1 + 0.648 X2 - 0.0093 X3

Regression Analysis: length versus Age, risk, service

The regression equation is
length = 1.39 + 0.0837 Age + 0.658 risk + 0.0217 service

Predictor    Coef       SE Coef    T       P
Constant     1.386      1.866      0.74    0.459
Age          0.08371    0.03325    2.52    0.013
risk         0.6584     0.1214     5.43    0.000
service      0.02174    0.01071    2.03    0.045

S = 1.56840   R-Sq = 34.5%   R-Sq(adj) = 32.7%

Regression Analysis: length versus bed, risk, service

The regression equation is
length = 6.47 + 0.00302 bed + 0.648 risk - 0.0093 service

Predictor    Coef        SE Coef     T       P
Constant     6.4674      0.6152      10.51   0.000
bed          0.003018    0.001272    2.37    0.019
risk         0.6477      0.1219      5.31    0.000
service      -0.00929    0.01652     -0.56   0.575

S = 1.57322   R-Sq = 34.1%   R-Sq(adj) = 32.3%

(d) For model I, R^2 = 0.345; for model II, R^2 = 0.341. Model I is slightly better than model II in terms of this measure.
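The R^2 comparison in part (d) can be cross-checked against the adjusted values printed by MINITAB, using the standard relation R^2(adj) = 1 - ((n - 1)/(n - p))(1 - R^2). The sketch below assumes n = 113 (from the stem-and-leaf displays) and p = 4 parameters per model (intercept plus three predictors); the function name is illustrative.

```python
def adjusted_r2(r2, n, p):
    """Adjusted coefficient of multiple determination.

    r2: ordinary R^2
    n:  number of observations
    p:  number of regression parameters, including the intercept
    """
    return 1.0 - (n - 1) / (n - p) * (1.0 - r2)


N_OBS = 113   # N in the stem-and-leaf displays
N_PAR = 4     # assumed: intercept + three predictors in each model

adj_model_1 = adjusted_r2(0.345, N_OBS, N_PAR)   # Age, risk, service
adj_model_2 = adjusted_r2(0.341, N_OBS, N_PAR)   # bed, risk, service

print(round(adj_model_1, 3), round(adj_model_2, 3))
```

Because both models have the same n and p, the ranking by adjusted R^2 necessarily agrees with the ranking by R^2; the adjustment only matters when comparing models with different numbers of predictors.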
(e) For model I, the plots are:

[Figure: Residuals Versus the Fitted Values (response is length)]
[Figure: Residuals Versus Age (response is length)]
[Figure: Residuals Versus risk (response is length)]
[Figure: Normal Probability Plot of the Residuals (response is length)]

For model II, the plots are:

[Figure: Residuals Versus the Fitted Values (response is length)]
[Figure: Residuals Versus bed (response is length)]
[Figure: Residuals Versus risk (response is length)]
[Figure: Residuals Versus service (response is length)]
[Figure: Residuals Versus X1X2 (response is length)]