Stat/For/Hort 572: Midterm I Solutions - Spring 2006, Exams of Data Analysis & Statistical Methods

Solutions to the Stat/For/Hort 572 Midterm I exam held in Spring 2006. The exam covers model assumptions, lack-of-fit tests, confidence intervals, hypothesis testing, and regression analysis. The solutions refer to R output and explain the reasoning behind each conclusion.

Uploaded on 09/02/2009

koofers-user-u7q
Stat/For/Hort 572 Midterm I, Spring 2006 Solutions
1. (a) Consider the four model assumptions: correct model, independence, equal variance, and normal distribution. The straight-line relationship appears inadequate, and the equal variance assumption may not be satisfied.
(b) $H_0$: no lack of fit (LOF) versus $H_A$: LOF of the simple linear regression (SLR) model. From the R output, the observed $f = 9.0348$. Compared to the $F$ distribution with df = (12, 14), the p-value is less than 0.001. Thus reject $H_0$ at the 5% level; there is very strong evidence of a lack of fit of the SLR model.
(c) From the R output, for $x^* = 18$, $\hat{y}_{est} = 6.76$ and $\mathrm{s.e.}(\hat{y}_{est}) = 0.7084$. Since $t_{0.025,26} = 2.056$, the 95% confidence interval is $\hat{y}_{est} \pm t_{0.025,26} \times \mathrm{s.e.}(\hat{y}_{est})$, which is $6.76 \pm 1.46$ or $[5.30, 8.22]$.
(d) $H_0$: no LOF versus $H_A$: LOF of the quadratic regression model. Since SS Pure Error is 18.601 on 14 df and SS Error is 126.64 on 25 df, SS LOF is $126.64 - 18.601 = 108.039$ on $25 - 14 = 11$ df. By the additional sum of squares principle, the observed $f = \frac{108.039/11}{18.601/14} = 7.39$. Compared to the $F$ distribution with df = (11, 14), the p-value is less than 0.001. Thus reject $H_0$ at the 5% level; there is very strong evidence of a lack of fit of the quadratic regression model.
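The arithmetic in parts 1(c) and 1(d) can be checked with a short script. This is a minimal sketch in Python rather than the R used for the exam; the function names `t_interval` and `lack_of_fit_f` are illustrative and not part of the original solutions. All numbers, including the critical value 2.056, are copied from the text above rather than recomputed from data.

```python
# Check of the Problem 1 arithmetic. Inputs are taken from the solutions text;
# the helper names are hypothetical and chosen only for this illustration.

def t_interval(estimate, std_err, t_crit):
    """Return (margin, lower, upper) for estimate +/- t_crit * std_err."""
    margin = t_crit * std_err
    return margin, estimate - margin, estimate + margin


def lack_of_fit_f(ss_error, df_error, ss_pure, df_pure):
    """Return (SS LOF, df LOF, F), where F = MS LOF / MS pure error."""
    ss_lof = ss_error - ss_pure
    df_lof = df_error - df_pure
    f_stat = (ss_lof / df_lof) / (ss_pure / df_pure)
    return ss_lof, df_lof, f_stat


# 1(c): 95% confidence interval at x* = 18, using t_{0.025,26} = 2.056.
margin, lower, upper = t_interval(6.76, 0.7084, 2.056)
print(f"6.76 +/- {margin:.2f} -> [{lower:.2f}, {upper:.2f}]")

# 1(d): lack-of-fit test for the quadratic model.
ss_lof, df_lof, f_stat = lack_of_fit_f(126.64, 25, 18.601, 14)
print(f"SS LOF = {ss_lof:.3f} on {df_lof} df, f = {f_stat:.2f}")
```

The p-values are not computed here because the standard library has no $F$ distribution; a statistics package would be needed for that step.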
2. (a) The parameter $b_3$ is the difference in slope between the two regression lines for farms A and B. To test $H_0: b_3 = 0$, use either the observed $t = -0.608$ on 6 df or $f = 0.3694$ on (1, 6) df from the R output. The p-value is 0.5656. Do not reject $H_0$ at the 5% level; there is no evidence of a nonzero $b_3$.
(b) The parameter $b_0$ is the intercept of the regression line for farm A, which represents the expected weight gain for a zero level of diet supplement. Since $\hat{b}_0 = 1.888$, $\mathrm{s.e.}(\hat{b}_0) = 0.3299$, and $t_{0.05,6} = 1.943$, the 90% confidence interval is $\hat{b}_0 \pm t_{0.05,6} \times \mathrm{s.e.}(\hat{b}_0)$, which is $1.89 \pm 0.64$ or $[1.25, 2.53]$.
(c) The test of interest is $H_0: [b_2 = b_3 = 0 \mid b_0, b_1]$ versus $H_A$: not $H_0$. The additional sum of squares is $10.04 + 0.0106 = 10.0506$ on 2 df and the SSE of the full model is 0.1718 on 6 df. By the additional sum of squares principle, the observed $f = \frac{10.0506/2}{0.1718/6} = 175.51$. Compared to the $F$ distribution with df = (2, 6), the p-value is less than 0.001. Thus reject $H_0$ at the 5% level; there is very strong evidence that the two regression lines are not equal.
(d) The full model is $y = b_0 + b_1 w_1 + b_2 w_2 + e$, which has SSE $= 0.0106 + 0.1718 = 0.1824$ on 7 df. The reduced model is $y = b_0 + b_1 w_1 + e$; adding $w_2$ to it gives an additional sum of squares of 10.04 on 1 df. By the additional sum of squares principle, the observed $f = \frac{10.04/1}{0.1824/7} = 385.31$. Compared to the $F$ distribution with df = (1, 7), the p-value is less than 0.001. Thus reject $H_0$ at the 5% level; there is very strong evidence of a nonzero $b_2$. That is, although there is no evidence of a slope difference, there is strong evidence of an intercept difference between the two regression lines for farms A and B.
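The two $F$ statistics in parts 2(c) and 2(d) both follow the additional sum of squares principle, and the $t$ and $f$ values quoted in 2(a) should agree up to rounding because $f = t^2$ for a single coefficient. The sketch below checks this in Python; the helper name `extra_ss_f` is an illustrative assumption, and every input is a value quoted in the solutions rather than something recomputed from the data.

```python
# Check of the Problem 2 F statistics. The helper name is hypothetical;
# all sums of squares and df are copied from the solutions text.

def extra_ss_f(extra_ss, extra_df, sse_full, df_full):
    """F = (additional SS / its df) / (SSE of full model / its df)."""
    return (extra_ss / extra_df) / (sse_full / df_full)


# 2(a): for one coefficient, the squared t statistic equals the F statistic.
t_b3 = -0.608
f_b3_from_t = t_b3 ** 2  # about 0.3697, versus 0.3694 reported (t is rounded)

# 2(c): H0: b2 = b3 = 0; full model has SSE 0.1718 on 6 df.
f_c = extra_ss_f(10.04 + 0.0106, 2, 0.1718, 6)

# 2(d): H0: b2 = 0 in the model without b3; SSE 0.0106 + 0.1718 on 7 df.
f_d = extra_ss_f(10.04, 1, 0.0106 + 0.1718, 7)

print(f"t^2 = {f_b3_from_t:.4f}")
print(f"2(c) f = {f_c:.2f}, 2(d) f = {f_d:.2f}")
```

Note that the SSE in 2(d) is larger than in 2(c) because dropping $b_3$ moves its sum of squares (0.0106) and its 1 df into the error term.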
3. (a) From the R output, the correlation between $y$ and each individual $x$ is the highest for $x_3$. This implies that the third model $y = b_0 + b_1 x_3 + e$ has the largest $R^2$ and thus the smallest SSE. The df of SSE is the same for all three models. Thus the third model has the smallest MSE.
(b) $H_0$: the third observation is not an outlier versus $H_A$: not $H_0$. Because of the way $x_4$ is coded, use the observed t-value 2.983 on 7 df. The p-value is 0.02043 for one comparison and thus the experiment-wise p-value is $12 \times 0.02043 = 0.2452$. Do not reject $H_0$ at the 5% level; there is no evidence that the third observation is an outlier.
(c) According to the full model fit, the t-value for $b_1$ (1.885) is the smallest and is less than 2. Thus the first step is to eliminate $x_1$. Now fit the model with $x_2$ and $x_3$. Since the smallest t-value is 16.905 and is more than 2, stop. The model selected by backward elimination is
$y = b_0 + b_2 x_2 + b_3 x_3 + e$
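The experiment-wise adjustment in 3(b) and the stopping rule in 3(c) are simple enough to sketch in code. The Python below is an illustration only: `bonferroni` and `should_drop` are hypothetical helper names, the cutoff of 2 is the rule of thumb used in the solution, and only the t-values and p-value actually quoted in the text are used, since the full R output is not reproduced here.

```python
# Sketch of the Problem 3 decisions, using only numbers quoted in the text.
# Helper names are hypothetical.

def bonferroni(p_single, n_comparisons):
    """Experiment-wise p-value: single-comparison p times the number of
    comparisons, capped at 1."""
    return min(1.0, n_comparisons * p_single)


def should_drop(smallest_abs_t, cutoff=2.0):
    """Backward elimination step: drop the weakest term if its |t| < cutoff."""
    return smallest_abs_t < cutoff


# 3(b): outlier test with 12 comparisons, as in the solution.
p_adj = bonferroni(0.02043, 12)
print(f"experiment-wise p = {p_adj:.4f}, reject at 5%: {p_adj < 0.05}")

# 3(c): full model, weakest term is x1 with t = 1.885, so it is dropped.
print("drop x1:", should_drop(1.885))

# 3(c): refit with x2 and x3; smallest t is 16.905, so elimination stops.
print("drop another term:", should_drop(16.905))
```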
Grade Distribution
Score    Count
100      2
90-99    15
80-89    18
70-79    17
60-69    3
<60      10
mean = 78, median = 80