



Key points of this solved assignment in Applied Regression Analysis: Straight Line Model, Output, Transformation, Appropriateness of the Straight Line, Least-Squares Estimate, Computer Output, Residuals Versus Predicted Values, Observations, Normality Assumption, Serious Problem
Typology: Exercises




Question #1:
(a) The plot of $Y$ versus $X$ is given on page 1 of the SAS output. A straight-line model is clearly not adequate. The graph suggests that the transformation $1/X$ or $-\ln(X)$ is more appropriate.
(b) The plot of $Y_1 = \sqrt{Y}$ versus $X$ is given on page 3 of the SAS output. In addition, for the model $Y_1 = \beta_0 + \beta_1 X$, we have $R^2 = 0.9687$ and $MS_{Res} = 0.044474$. Recall that $R^2$ is not a measure of the appropriateness of the straight-line model. The graph shows that a straight-line model is still not adequate. In fact, it still seems that the transformation $1/X$ or $-\ln(X)$ is necessary.
(c) Using the computer output on page 2, the least-squares estimate of the regression line when $Y_1$ is regressed on $X$ is given by
$$\hat{Y}_1 = 6.37974 - 0.02777X$$
The plot of residuals versus predicted values is given on page 4 of the SAS output. The plot implies that the variance of the residuals is not constant and, in fact, that the variable $X^2$ should be added to the model. We will add $X^2$ to the model in part (d).
(d) Based on the computer output on page 5, the least-squares estimate of the regression model is:
$$\hat{Y}_1 = 7.00263 - 0.05069X + 0.00014861X^2$$
(i) The plot of residuals versus predicted values is given on page 8 of the SAS output. The variance of the residuals seems to be constant, especially if we delete two observations. One of these observations seems to be an outlier.
(ii) The Q-Q plot of the residuals is given on page 9 of the SAS output. There seems to be no serious problem with the normality assumption, especially if we delete the outlier.
(iii) The R-student residual for observation 17 is -3.1179. Since $|-3.1179| > 3$, observation 17 is an outlier. The first observation also has a large R-student residual, 2.4488. Observations 5 and 12 are leverage points since they have
$$h_{ii} > 2p/n = 2(3)/24 = 0.25.$$
Note that the R-student residuals for these observations are less than 2. In addition,
$$1 + 3p/n = 1 + 3(3)/24 = 1.375 \quad \text{and} \quad 1 - 3p/n = 1 - 3(3)/24 = 0.625.$$
Therefore, any observation with a covariance ratio less than 0.625 or greater than 1.375 is considered an influential observation. The covariance ratio for observation 17 (the outlier) is 0.4063, so this observation is influential based on the covariance ratio. The covariance ratios for observations 5 and 12 are 1.5674 and 1.7504, respectively, so these observations are also influential by this criterion.
Any observation with $|DFFITS| > 2\sqrt{p/n} = 2\sqrt{3/24} = 0.707$ is considered an influential observation. Therefore, observations 1 and 17 are considered influential observations based on DFFITS.
Any observation with $|DFBETAS| > 2/\sqrt{n} = 2/\sqrt{24} = 0.408$ is considered an influential observation. The absolute values of DFBETAS for the first observation for the intercept, $X$, and $X^2$ are all greater than 0.408. Therefore, this observation is an influential observation. The absolute values of DFBETAS for observation 17 for the intercept and $X$ are greater than 0.408. Therefore, this observation is an influential observation too. Note that the absolute value of DFBETAS for $X^2$ for observation 4 is also greater than 0.408. However, this observation is not influential according to the other influence statistics, and its R-student residual and $h_{ii}$ are not large. Therefore, we do not consider this observation an influential observation.
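The cutoffs used in part (iii) can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming the rules of thumb quoted above with p = 3 parameters and n = 24 observations; the function name `influence_cutoffs` is hypothetical and is not part of the SAS output.

```python
import math

def influence_cutoffs(p, n):
    """Rule-of-thumb cutoffs for a model with p parameters and n observations."""
    return {
        "leverage": 2 * p / n,            # h_ii > 2p/n
        "covratio_low": 1 - 3 * p / n,    # COVRATIO < 1 - 3p/n
        "covratio_high": 1 + 3 * p / n,   # COVRATIO > 1 + 3p/n
        "dffits": 2 * math.sqrt(p / n),   # |DFFITS| > 2*sqrt(p/n)
        "dfbetas": 2 / math.sqrt(n),      # |DFBETAS| > 2/sqrt(n)
    }

# Values for the quadratic model in Question 1 (p = 3, n = 24)
cutoffs = influence_cutoffs(p=3, n=24)
for name, value in cutoffs.items():
    print(f"{name}: {value:.4f}")
```

Calling the same function with a different p and n gives the cutoffs for another model.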
In summary, observation 17 is an outlier and is definitely an influential observation. Observation 1 has a large R-student residual and can be considered influential by several influence statistics. Observations 4 and 12 do not seem to be influential observations, although they are leverage points. Thus observation 17, and probably the first observation, seem bothersome.
(iv) The collinearity diagnostics are given on page 5 of the SAS output. Since all condition indexes are less than 30, there is no collinearity problem. (Note that the variance proportions for the intercept, $X$, and $X^2$ are greater than 0.5, but since the condition index is 27.00782 < 30, we do not use the variance proportions in this case. It is a good idea to find the variance inflation factors (VIF) too. Recall that VIF < 10 means no collinearity problem.)
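To illustrate the rule in part (iv), the sketch below computes condition indexes as they are commonly defined, namely the square root of the largest eigenvalue divided by each eigenvalue, and flags any above 30. The eigenvalues are hypothetical, chosen only for illustration; they are not taken from the SAS output.

```python
import math

def condition_indices(eigenvalues):
    """Condition index for each eigenvalue: sqrt(largest / eigenvalue)."""
    largest = max(eigenvalues)
    return [math.sqrt(largest / ev) for ev in eigenvalues]

# Hypothetical eigenvalues (illustration only, not from the assignment)
example_indices = condition_indices([4.0, 1.0, 0.25])
flagged = [ci for ci in example_indices if ci > 30]

print(example_indices)
print("collinearity suspected" if flagged else "no condition index above 30")
```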
Question #2: Since there are only a few females in this problem, the model is not of full rank. In fact, there is a perfect collinearity problem. SAS deleted the last variable, FEMAGE, and solved the problem for the other variables. Some statistical packages delete the first variable instead. Therefore, you might get a different result.
terms. This is confirmed by the VIF, since the variance inflation factors are greater than 10 for female and its interaction terms. By part (d), if we delete the intercept, there is no collinearity for height, but there is still a collinearity problem between female and its interaction terms.
(f) The residuals and influence statistics are given on page 14 of the SAS output. All studentized residuals are less than 3 in absolute value. Therefore, no observation is an outlier. The studentized residual for observation 14 is -2.263, which is greater than 2 in absolute value. For the influence statistics, we should use p = 7 since there are seven parameters in the model (note that Female*Age is deleted from the model by SAS). Complete this part!
(g) Yes. There is a collinearity problem between Female and its interaction terms. There is also a collinearity problem between height and the intercept when the intercept is included in the model.
(h) The results for the regression model when female is deleted are given on pages 17 to 20. The variance inflation factors for height, weight, and age are 1.10811, 1.08024, and 1.13485, respectively. Since all of them are less than 10, there is no collinearity problem for these variables. The value of VIF for the intercept is not known (it is given as 0 in the table, but the value clearly is not zero). Hence, we have to use the condition indexes.
The condition index for the last row is 76.57184 > 30. Therefore, there is a moderate collinearity problem. Since the variance proportions for the intercept and height are greater than 0.5 for the last row (0.9617 and 0.97021), there is a collinearity problem between the intercept and height. In the next table, all condition numbers are less than 30 when the intercept is deleted. Therefore, there would be no collinearity problem if we deleted the intercept too.
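As a side note on the VIF values in part (h): since $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors, each reported VIF implies an $R_j^2$. The following is a minimal sketch using the three VIF values quoted above; the helper name `implied_r_squared` is hypothetical.

```python
def implied_r_squared(vif):
    """Invert VIF = 1 / (1 - R^2) to recover the R^2 of a predictor on the others."""
    return 1 - 1 / vif

# VIF values quoted in part (h)
vifs = {"height": 1.10811, "weight": 1.08024, "age": 1.13485}

for name, vif in vifs.items():
    # Print the implied R^2 and whether the VIF < 10 rule is satisfied
    print(name, round(implied_r_squared(vif), 4), vif < 10)
```

All three implied $R_j^2$ values are small, which agrees with the conclusion that these predictors are not collinear with each other.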
Question #3:
(a) The least-squares estimate of the regression model is:
$$\hat{Y} = -27.90029 + 5.21614X_1 + 5.62214X_2 - 0.29286X_1^2 - 0.13857X_2^2 - 0.00550X_1X_2$$
(b) The plot of $Y$ versus the predicted value is given on page 22 of the SAS output. If the model is good, each predicted value should be close to the observed $Y$. Therefore, a straight line through the origin should fit the points on the plot of $Y$ versus predicted value very well. The plot on page 22 implies that this is true for the fitted model, except for one observation. Hence, the model fits the data very well. Note that $R^2$ for this model is 0.9428. This means that 94.28 percent of the variation in $Y$ is explained by the model.
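(c) Let
Full Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \beta_{12} X_1 X_2 + E$
Reduced Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + E$
The test statistic is:
$$F = \frac{SS_R(X_1X_2 \mid X_1, X_2, X_1^2, X_2^2)/1}{MS_{Res}(X_1, X_2, X_1^2, X_2^2, X_1X_2)} = 0.017$$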
Since $0.017 < F_{0.05,1,19} = 4.38$, we fail to reject $H_0: \beta_{12} = 0$. The test statistic value and its P-value are given on page 24 of the SAS output. You can use type I or type III SS. Both are the same, and the F-values calculated by SAS are the correct F-values in this case, since the full model is also the largest model. Using the output, we have P-value $= P(TS > 0.02) = 0.8987$. Since the P-value is greater than 0.05, we fail to reject $H_0: \beta_{12} = 0$. In addition, for testing $H_0: \beta_{12} = 0$, we can also use a t-test. The test statistic is
$$t = \frac{\hat{\beta}_{12} - 0}{S.E.(\hat{\beta}_{12})} = -0.13$$
Since $|-0.13| < t_{0.025,\,df=19} = 2.093$, we fail to reject $H_0$. Again, the test value and its P-value are given on page 21 of the SAS output. The P-value $= 2P(TS > 0.13) = 0.8987$. Since the P-value is greater than 0.05, we fail to reject $H_0$. (Note that the square of the t-value is the F-value.)
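The closing remark, that the square of the t-value equals the F-value for a single-coefficient test, can be checked numerically with the rounded values reported in part (c). This is only a sketch: the inputs are rounded, so the comparison uses a tolerance, and the variable names are illustrative.

```python
# Rounded values quoted in part (c)
t_value = -0.13      # t statistic for H0: beta_12 = 0
f_value = 0.017      # partial F statistic for the same hypothesis
t_critical = 2.093   # t_{0.025, 19}
f_critical = 4.38    # F_{0.05, 1, 19}

tolerance = 0.005    # allows for rounding in the reported values

# t^2 should match F, and t_crit^2 should match F_crit, up to rounding
statistics_agree = abs(t_value ** 2 - f_value) < tolerance
critical_values_agree = abs(t_critical ** 2 - f_critical) < tolerance

# Both tests lead to the same decision
fail_to_reject_t = abs(t_value) < t_critical
fail_to_reject_f = f_value < f_critical

print(statistics_agree, critical_values_agree, fail_to_reject_t, fail_to_reject_f)
```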
(d) For comparing
Full Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + E$
Reduced Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + E$
the test statistic is:
$$F = \frac{SS_R(X_2^2 \mid X_1, X_2, X_1^2)/1}{MS_{Res}(X_1, X_2, X_1^2, X_2^2)} = 1.94$$
Since $1.94 < F_{0.05,1,20} = 4.35$, we fail to reject $H_0: \beta_{22} = 0$. The test statistic value and its P-value are given on page 27 of the SAS output. Again you can
observation, based on covariance ratio. The covariance ratios for observations 1 and 5 are 1.5742 and 1.5412, respectively; therefore, these observations are influential observations.
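The covariance-ratio check for observations 1 and 5 can be sketched as follows. The sketch assumes the same rule as in Question 1 (flag a covariance ratio outside $1 \pm 3p/n$) and assumes p = 4 and n = 25, the values that appear in the DFFITS cutoff for this model. The function name `covratio_flags` is hypothetical.

```python
def covratio_flags(covratios, p, n):
    """Flag observations whose covariance ratio falls outside 1 +/- 3p/n."""
    low, high = 1 - 3 * p / n, 1 + 3 * p / n
    return {obs: (value < low or value > high) for obs, value in covratios.items()}

# Covariance ratios quoted in the text; p and n are assumptions (see lead-in)
flags = covratio_flags({1: 1.5742, 5: 1.5412}, p=4, n=25)
print(flags)
```

Under these assumptions the upper bound is 1 + 12/25 = 1.48, so both observations exceed it.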
Any observation with $|DFFITS| > 2\sqrt{p/n} = 2\sqrt{4/25} = 0.8$ is considered an influential observation. Therefore, observations 4 and 24 are considered influential observations based on DFFITS.
Any observation with $|DFBETAS| > 2/\sqrt{n} = 2/\sqrt{25} = 0.4$ is considered an influential observation. The absolute values of DFBETAS for observation 24 for $X_1$, $X_2$, and $X_1^2$ are all greater than 0.4. Therefore, this observation is an influential observation. The absolute values of DFBETAS for observation 4 for $X_1$ and $X_1^2$ are greater than 0.4. Therefore, this observation is an influential observation too. Note that the absolute value of DFBETAS for the intercept for observation 2 is also greater than 0.4. However, this observation is not influential according to the other influence statistics, and its R-student residual and $h_{ii}$ are not large. Therefore, we do not consider this observation an influential observation.
In summary, observation 24 is an outlier and is definitely an influential observation. Observation 4 has a large R-student residual and can be considered influential by several influence statistics. Observations 1 and 5 do not seem to be influential observations. Hence observations 4 and 24 seem bothersome.