Straight Line Model - Applied Regression Analysis - Solved Assignment, Exercises of Mathematical Statistics

Anna University Mathematical Statistics

Key points of this solved assignment in Applied Regression Analysis: Straight Line Model, Output, Transformation, Appropriateness of the Straight Line, Least-Squares Estimate, Computer Output, Residuals Versus Predicted Values, Observations, Normality Assumption, Serious Problem

Typology: Exercises

2012/2013

Uploaded on 01/11/2013

m-alam 🇮🇳



Question #1:

(a) The plot of $Y$ versus $X$ is given on page 1 of the SAS output. It is clear that a straight line model is not adequate. By looking at the graph, it seems that the transformation $1/X$ or $-\ln(X)$ is more appropriate.

(b) The plot of $Y_1 = \sqrt{Y}$ versus $X$ is given on page 3 of the SAS output. In addition, for the model $Y_1 = \beta_0 + \beta_1 X$, we have $R^2 = 0.9687$ and $MS_{Res} = 0.044474$. Recall that $R^2$ is not a measure of the appropriateness of the straight line model. By looking at the graph, it is clear that a straight line model is not adequate. In fact, it still seems that the transformation $1/X$ or $-\ln(X)$ is necessary.

(c) Using the computer output on page 2, the least-squares estimate of the regression line when $Y_1$ is regressed on $X$ is given by

$$\hat{Y}_1 = 6.37974 - 0.02777X$$

The plot of residuals versus predicted values is given on page 4 of the SAS output. The plot implies that the variance of the residuals is not constant and, in fact, that the variable $X^2$ should be added to the model. We will add $X^2$ to the model in part (d).

(d) Based on the computer output on page 5, the least-squares estimate of the regression model is:

$$\hat{Y}_1 = 7.00263 - 0.05069X + 0.00014861X^2$$
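As a rough illustration of how the fitted equation in part (d) could be used, the sketch below evaluates the predicted $\sqrt{Y}$ and back-transforms it by squaring. The coefficients are the ones reported above. The function names and the input value X = 50 are hypothetical, since neither the data nor the range of X is shown here.

```python
# Coefficients of the fitted quadratic model for Y1 = sqrt(Y), as reported in part (d).
B0, B1, B2 = 7.00263, -0.05069, 0.00014861


def predict_sqrt_y(x):
    """Predicted value of Y1 = sqrt(Y) at a given X."""
    return B0 + B1 * x + B2 * x ** 2


def predict_y(x):
    """Simple back-transform to the original scale: square the predicted sqrt(Y)."""
    return predict_sqrt_y(x) ** 2


x_example = 50.0  # hypothetical X value, chosen only for illustration
sqrt_y_hat = predict_sqrt_y(x_example)
y_hat = predict_y(x_example)
print(f"X = {x_example}: predicted sqrt(Y) = {sqrt_y_hat:.4f}, predicted Y = {y_hat:.4f}")
```

Squaring the prediction is only the simplest back-transform; it recovers a value on the original scale but is not an unbiased estimate of the mean of Y.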

(i) The plot of residuals versus predicted values is given on page 8 of the SAS output. It seems that the variance of the residuals is constant, especially if we delete two observations. One of these observations seems to be an outlier.

(ii) The Q-Q plot of the residuals is given on page 9 of the SAS output. It seems that there is no serious problem with the normality assumption, especially if we delete the outlier.

(iii) The R-student residual for observation 17 is −3.1179. Since $|-3.1179| > 3$, observation 17 is an outlier. The first observation also has a large R-student residual, 2.4488. Observations 5 and 12 are leverage points since they have

$$h_{ii} > 2p/n = 2(3)/24 = 0.25.$$

Note that the R-student residuals for these observations are less than 2. In addition,

$$1 + 3p/n = 1 + 3(3)/24 = 1.375 \quad \text{and} \quad 1 - 3p/n = 1 - 3(3)/24 = 0.625.$$

Therefore, any observation with a covariance ratio less than 0.625 or greater than 1.375 is considered an influential observation. The covariance ratio for observation 17 (the outlier) is 0.4063, so this observation is influential based on the covariance ratio. The covariance ratios for observations 5 and 12 are 1.5674 and 1.7504, respectively, so these observations are also influential by this criterion.

Any observation with $|DFFITS| > 2\sqrt{p/n} = 2\sqrt{3/24} = 0.707$ is considered an influential observation. Therefore, observations 1 and 17 are considered influential observations based on DFFITS.

Any observation with $|DFBETAS| > 2/\sqrt{n} = 2/\sqrt{24} = 0.408$ is considered an influential observation. The absolute values of DFBETAS for the first observation for the intercept, $X$, and $X^2$ are all greater than 0.408. Therefore, this observation is an influential observation. The absolute values of DFBETAS for observation 17 for the intercept and $X$ are greater than 0.408. Therefore, this observation is an influential observation too. Note that the absolute value of DFBETAS for $X^2$ for observation 4 is also greater than 0.408. However, this observation is not influential based on the other influence statistics. In addition, the R-student residual and $h_{ii}$ are not large for this observation. Therefore, we do not consider this observation an influential observation.

In summary, observation 17 is an outlier and is definitely an influential observation. Observation 1 has a large R-student residual and can be considered influential by several influence statistics. Observations 5 and 12 do not seem to be influential observations, although they are leverage points. Hence observation 17, and probably the first observation, seem bothersome.
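The rule-of-thumb cutoffs used in part (iii) depend only on the number of observations n and the number of parameters p, so they can be checked with a few lines of code. The sketch below is a minimal illustration; the function name `influence_cutoffs` is hypothetical, and the covariance ratios are the three values quoted above.

```python
import math


def influence_cutoffs(n, p):
    """Rule-of-thumb cutoffs for n observations and p model parameters."""
    return {
        "leverage": 2 * p / n,            # h_ii > 2p/n
        "covratio_low": 1 - 3 * p / n,    # COVRATIO < 1 - 3p/n
        "covratio_high": 1 + 3 * p / n,   # COVRATIO > 1 + 3p/n
        "dffits": 2 * math.sqrt(p / n),   # |DFFITS| > 2*sqrt(p/n)
        "dfbetas": 2 / math.sqrt(n),      # |DFBETAS| > 2/sqrt(n)
    }


# Question 1: n = 24 observations, p = 3 parameters (intercept, X, X^2).
q1 = influence_cutoffs(n=24, p=3)
for name, value in q1.items():
    print(f"{name}: {value:.4f}")

# Covariance ratios quoted in the text, keyed by observation number.
covratio = {17: 0.4063, 5: 1.5674, 12: 1.7504}
flagged = sorted(
    obs for obs, cr in covratio.items()
    if cr < q1["covratio_low"] or cr > q1["covratio_high"]
)
print("Flagged by covariance ratio:", flagged)
```

Calling `influence_cutoffs(n=25, p=4)` gives the DFFITS and DFBETAS cutoffs 0.8 and 0.4 that appear later in Question 3.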

(iv) The collinearity diagnostics are given on page 5 of the SAS output. Since all condition indexes are less than 30, there is no collinearity problem. (Note that the variance proportions for the intercept, $X$, and $X^2$ are greater than 0.5, but since the condition index is 27.00782 < 30, we do not use the variance proportions in this case. It is a good idea to find the variance inflation factor (VIF) too. Recall that VIF < 10 means no collinearity problem.)

Question #2: Since there are only a few females in this problem, the model is not full rank. In fact, there is a perfect collinearity problem. SAS deleted the last variable, FEMAGE, and solved the problem for the other variables. Some statistical packages delete the first variable instead and solve the problem. Therefore, you might get a different result.
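One way a model of this kind can lose full rank is sketched below. This is only an assumed mechanism with made-up data, not the assignment's data set: if every female in the sample has the same age, the interaction column Female*Age is an exact multiple of the Female column. The helper `matrix_rank` and the `people` records are hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix by Gaussian elimination with exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical (female, age) records: only two females, both aged 30.
people = [(0, 25), (0, 40), (1, 30), (0, 35), (1, 30)]

# Design matrix columns: intercept, female, age, female*age.
design = [[1, f, age, f * age] for f, age in people]
full_rank = matrix_rank(design)

# Dropping the interaction column restores full column rank.
reduced = [row[:3] for row in design]
reduced_rank = matrix_rank(reduced)

print(f"columns: 4, rank: {full_rank}")
print(f"columns: 3, rank: {reduced_rank}")
```

Here the four-column design matrix has rank 3, which mirrors what a package does when it drops one redundant variable and fits the rest.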

terms. This is confirmed by the VIF, since the variance inflation factors are greater than 10 for female and its interaction terms. By part (d), if we delete the intercept, there is no collinearity for height, but there is still a collinearity problem between female and its interaction terms.

(f) The residuals and influence statistics are given on page 14 of the SAS output. All studentized residuals are less than 3 in absolute value. Therefore, no observation is an outlier. The studentized residual for observation 14 is −2.263, which is greater than 2 in absolute value. For the influence statistics, we should use p = 7 since there are seven parameters in the model (note that Female*Age is deleted from the model by SAS). Complete this part!!!

(g) Yes. There is a collinearity problem between Female and its interaction terms. There is also a collinearity problem between height and the intercept when the intercept is included in the model.

(h) The results for the regression model when female is deleted are given on pages 17 to 20. The variance inflation factors for height, weight, and age are 1.10811, 1.08024, and 1.13485, respectively. Since all of them are less than 10, there is no collinearity problem for these variables. The value of VIF for the intercept is not known (it is given as 0 in the table, but the value clearly is not zero). Hence, we have to use the condition indexes.

The condition index for the last row is 76.57184 > 30. Therefore, there is a moderate collinearity problem. Since the variance proportions for the intercept and height are greater than 0.5 for the last row (0.9617 and 0.97021), there is a collinearity problem between the intercept and height. In the next table, all condition indexes are less than 30 when the intercept is deleted. Therefore, there would be no collinearity problem if we deleted the intercept too.
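Because $VIF_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the other predictors, the VIF values quoted in part (h) can be turned back into implied $R_j^2$ values. The sketch below does this under that standard definition; the names `implied_r_squared` and `VIF_LIMIT` are hypothetical.

```python
def implied_r_squared(vif):
    """R_j^2 implied by a variance inflation factor, using VIF = 1 / (1 - R_j^2)."""
    return 1.0 - 1.0 / vif


VIF_LIMIT = 10.0  # rule of thumb used in the text: VIF < 10 means no collinearity problem

# VIF values quoted in part (h).
vifs = {"height": 1.10811, "weight": 1.08024, "age": 1.13485}

implied = {name: implied_r_squared(v) for name, v in vifs.items()}
no_problem = all(v < VIF_LIMIT for v in vifs.values())

for name, v in vifs.items():
    print(f"{name}: VIF = {v:.5f}, implied R_j^2 = {implied[name]:.4f}")
print("All VIF below the limit:", no_problem)
```

Each implied $R_j^2$ is well below the 0.9 that corresponds to VIF = 10, which agrees with the conclusion that these three predictors are not collinear with one another.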

Question #3:

(a) The least-square estimate of the regression model is:

$$\hat{Y} = -27.90029 + 5.21614X_1 + 5.62214X_2 - 0.29286X_1^2 - 0.13857X_2^2 - 0.00550X_1X_2$$

(b) The plot of $Y$ versus the predicted value is given on page 22 of the SAS output. If the model is good, the predicted values should be close to the observed values of $Y$. Therefore, a straight line through the origin with slope 1 should fit the points on the plot of $Y$ versus predicted value very well. The plot on page 22 implies that this is true for the fitted model, except for one observation. Hence, the model fits the data very well. Note that $R^2$ for this model is 0.9428. This means that 94.28 percent of the variation in $Y$ is explained by the model.
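To make the interpretation of $R^2$ concrete, the sketch below computes $R^2 = 1 - SS_{Res}/SS_{Total}$ from observed and predicted values. The four data points are made up for illustration and are not the assignment's data; the function name `r_squared` is hypothetical.

```python
def r_squared(observed, predicted):
    """Proportion of variation in Y explained: 1 - SS_Res / SS_Total."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted))
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - ss_res / ss_total


# Hypothetical observed and predicted values, close to the line Y = predicted.
y_obs = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(y_obs, y_hat)
print(f"R^2 = {r2:.4f}")
```

When the points on a plot of Y versus predicted value lie close to the line through the origin with slope 1, the residual sum of squares is small relative to the total, and $R^2$ is close to 1.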

(c) Let

Full Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + \beta_{12} X_1 X_2 + E$

Reduced Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + E$

The test statistic is:

$$TS = \frac{SSR(X_1, X_2, X_1^2, X_2^2, X_1X_2) - SSR(X_1, X_2, X_1^2, X_2^2)}{MS_{Res}(X_1, X_2, X_1^2, X_2^2, X_1X_2)}$$

Since $0.017 < F_{0.05,1,19} = 4.38$, we fail to reject $H_0: \beta_{12} = 0$. The test statistic value and its P-value are given on page 24 of the SAS output. You can use type I or type III SS. Both are the same, and the F-values calculated by SAS are the correct F-values in this case, since the full model is also the largest model. Using the output, we have P-value $= P(TS > 0.02) = 0.8987$. Since the P-value is greater than 0.05, we fail to reject $H_0: \beta_{12} = 0$. In addition, for testing $H_0: \beta_{12} = 0$, we can also use a t-test. The test statistic is

$$TS = \frac{\hat{\beta}_{12} - 0}{S.E.(\hat{\beta}_{12})}$$

Since $|-0.13| < t_{0.025,\,df=19} = 2.093$, we fail to reject $H_0$. Again, the test value and its P-value are given in the SAS output on page 21. The P-value is $2P(TS > 0.13) = 0.8987$. Since the P-value is greater than 0.05, we fail to reject $H_0$. (Note that the square of the t-value is the F-value.)
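The remark that the square of the t-value equals the F-value can be checked numerically with the figures quoted in part (c). The sketch below also includes a small helper for the partial F statistic. The helper name `partial_f` and the sums of squares passed to it are hypothetical, since the SAS sums of squares are not reproduced here; the t-value and the critical values are the ones given in the text.

```python
def partial_f(ssr_full, ssr_reduced, ms_res_full, df_num=1):
    """Partial F statistic for the extra terms in the full model."""
    return (ssr_full - ssr_reduced) / df_num / ms_res_full


# Values quoted in the text for testing H0: beta_12 = 0.
t_value = -0.13
F_CRIT = 4.38    # F(0.05; 1, 19)
T_CRIT = 2.093   # t(0.025; df = 19)

f_from_t = t_value ** 2  # squared t-value, about 0.017
reject_by_f = f_from_t > F_CRIT
reject_by_t = abs(t_value) > T_CRIT

print(f"t^2 = {f_from_t:.4f}, reject H0 by F-test: {reject_by_f}")
print(f"|t| = {abs(t_value):.2f}, reject H0 by t-test: {reject_by_t}")
print(f"t_crit^2 = {T_CRIT ** 2:.4f} (compare with F_crit = {F_CRIT})")

# Hypothetical sums of squares, only to show how the helper is called.
f_demo = partial_f(ssr_full=100.0, ssr_reduced=99.0, ms_res_full=2.0)
print(f"demo partial F = {f_demo}")
```

The two tests always agree for a single coefficient because the squared critical t-value equals the critical F-value with 1 numerator degree of freedom.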

(d) For comparing

Full Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + \beta_{22} X_2^2 + E$

Reduced Model: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_{11} X_1^2 + E$

the test statistic is:

$$TS = \frac{SSR(X_1, X_2, X_1^2, X_2^2) - SSR(X_1, X_2, X_1^2)}{MS_{Res}(X_1, X_2, X_1^2, X_2^2)}$$

Since $1.94 < F_{0.05,1,20} = 4.35$, we fail to reject $H_0: \beta_{22} = 0$. The test statistic value and its P-value are given on page 27 of the SAS output. Again you can

observation, based on covariance ratio. The covariance ratios for observations 1 and 5 are 1.5742 and 1.5412, respectively; therefore, these observations are influential observations.

Any observation with $|DFFITS| > 2\sqrt{p/n} = 2\sqrt{4/25} = 0.8$ is considered an influential observation. Therefore, observations 4 and 24 are considered influential observations based on DFFITS.

Any observation with $|DFBETAS| > 2/\sqrt{n} = 2/\sqrt{25} = 0.4$ is considered an influential observation. The absolute values of DFBETAS for observation 24 for $X_1$, $X_2$, and $X_1^2$ are all greater than 0.4. Therefore, this observation is an influential observation. The absolute values of DFBETAS for observation 4 for $X_1$ and $X_1^2$ are greater than 0.4. Therefore, this observation is an influential observation too. Note that the absolute value of DFBETAS for the intercept for observation 2 is also greater than 0.4. However, this observation is not influential based on the other influence statistics. In addition, the R-student residual and $h_{ii}$ are not large for this observation. Therefore, we do not consider this observation an influential observation.

In summary, observation 24 is an outlier and is definitely an influential observation. Observation 4 has a large R-student residual and can be considered influential by several influence statistics. Observations 1 and 5 do not seem to be influential observations. Hence observations 4 and 24 seem bothersome.
