Model Adequacy Checking - Applied Regression Analysis - Lecture Notes, Study notes of Mathematical Statistics

Anna University Mathematical Statistics

The key points of these Applied Regression Analysis lecture notes are: Model Adequacy Checking, Residual Analysis, Residual Plots, Detection and Treatment of Outliers, PRESS Statistic, Testing for Lack of Fit, Major Assumptions, Regression Analysis, Zero Mean, Normally Distributed

Typology: Study notes

2012/2013

Uploaded on 01/11/2013

m-alam 🇮🇳

Chapter 4: Model Adequacy Checking

In this chapter, we discuss some introductory aspects of model adequacy checking, including:

• Residual analysis,
• Residual plots,
• Detection and treatment of outliers,
• The PRESS statistic,
• Testing for lack of fit.

The major assumptions that we have made in regression analysis are:

1. The relationship between the response $Y$ and the regressors is linear, at least approximately.
2. The error term $\varepsilon$ has zero mean.
3. The error term $\varepsilon$ has constant variance $\sigma^2$.
4. The errors are uncorrelated.
5. The errors are normally distributed.

Assumptions 4 and 5 together imply that the errors are independent. Recall that assumption 5 is required for hypothesis testing and interval estimation.

Residual Analysis: The residuals $e_1, e_2, \ldots, e_n$ have the following important properties:

(a) The mean of the $e_i$ is 0.

(b) The estimate of the population variance computed from the $n$ residuals is:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (e_i - \bar{e})^2}{n - p} = \frac{\sum_{i=1}^{n} e_i^2}{n - p} = \frac{SS_{Res}}{n - p} = MS_{Res}$$

(c) Since the sum of the $e_i$ is zero, they are not independent. However, if the number of residuals ($n$) is large relative to the number of parameters ($p$), the dependency effect can be ignored in an analysis of residuals.
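Properties (a) to (c) can be checked numerically. The sketch below is only an illustration on simulated data: the sample size, coefficients and noise level are assumptions, not values from these notes. It fits a model with an intercept by least squares and confirms that the residuals average to zero, that both forms of the variance estimate in (b) agree, and that the residuals satisfy the $p$ linear constraints $X'e = 0$ behind the dependency mentioned in (c).

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 2                      # hypothetical sample size and number of regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # design matrix with intercept
p = X.shape[1]                    # number of parameters (intercept plus k slopes)
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=1.5, size=n)  # simulated response

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares estimate
e = y - X @ beta_hat                              # residuals

mean_e = e.mean()                 # property (a): zero up to rounding error
ss_res = np.sum(e**2)             # residual sum of squares SS_Res
ms_res = ss_res / (n - p)         # property (b): MS_Res estimates sigma^2
constraints = X.T @ e             # property (c): the p constraints X'e = 0

print(f"mean of residuals = {mean_e:.2e}, MS_Res = {ms_res:.3f}")
```

Because the residuals obey $p$ linear constraints, only $n - p$ of them are free, which is why the divisor in (b) is $n - p$.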

Standardized Residual: The quantity

$$d_i = \frac{e_i}{\sqrt{MS_{Res}}}, \quad i = 1, 2, \ldots, n,$$

is called the standardized residual. The standardized residuals have mean zero and approximately unit variance. A large standardized residual ($|d_i| > 3$) potentially indicates an outlier.

Recall that

$$e = (I - H)Y = (I - H)(X\beta + \varepsilon) = (I - H)\varepsilon$$

Therefore,

$$\mathrm{Var}(e) = \mathrm{Var}[(I - H)\varepsilon] = (I - H)\,\mathrm{Var}(\varepsilon)\,(I - H)' = \sigma^2 (I - H)$$
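The last equality relies on $I - H$ being symmetric and idempotent. As a rough numerical illustration, assuming simulated data with made-up coefficients, the sketch below builds the hat matrix, checks those two properties, and computes the standardized residuals $d_i$ together with the factors $1 - h_{ii}$, which give $\mathrm{Var}(e_i)/\sigma^2$ on the diagonal of $I - H$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 25                                            # hypothetical sample size
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
p = X.shape[1]
y = X @ np.array([3.0, 1.2]) + rng.normal(size=n)  # simulated response

H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix H = X (X'X)^{-1} X'
M = np.eye(n) - H                       # I - H
e = M @ y                               # residuals e = (I - H) y
ms_res = e @ e / (n - p)                # residual mean square

d = e / np.sqrt(ms_res)                 # standardized residuals d_i
h = np.diag(H)                          # leverages h_ii
var_factor = 1.0 - h                    # Var(e_i) / sigma^2 = 1 - h_ii
flagged = np.flatnonzero(np.abs(d) > 3) # indices with |d_i| > 3 (possible outliers)

print("largest |d_i|:", np.abs(d).max(), "flagged:", flagged)
```

Since $1 - h_{ii} < 1$, each residual has variance smaller than $\sigma^2$, and high-leverage points have the smallest residual variance.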

R-student Residual: The quantity

$$r_i = \frac{e_i}{\sqrt{S_{(-i)}^2 (1 - h_{ii})}}, \quad i = 1, 2, \ldots, n,$$

is called the R-student residual or jackknife residual, where the quantity $S_{(-i)}^2$ is the residual variance computed with the $i$th observation removed. It can be shown that

$$S_{(-i)}^2 = \frac{(n - p)\,MS_{Res} - e_i^2 / (1 - h_{ii})}{n - p - 1}$$

If the usual assumptions in regression analysis are met, the jackknife residual follows exactly a $t$-distribution with $n - p - 1$ degrees of freedom.
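The closed-form expression for the deleted variance avoids refitting the model $n$ times. The following sketch, on simulated data with assumed coefficients, computes the R-student residuals from that formula and compares the deleted variances with a brute-force refit that actually drops each observation. The function names are illustrative only.

```python
import numpy as np

def r_student(X, y):
    """R-student residuals and deleted variances from the closed-form formula."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix
    h = np.diag(H)
    e = y - H @ y                                   # ordinary residuals
    ms_res = e @ e / (n - p)
    s2_del = ((n - p) * ms_res - e**2 / (1 - h)) / (n - p - 1)
    return e / np.sqrt(s2_del * (1 - h)), s2_del

def s2_by_refit(X, y, i):
    """Residual variance from a fit with observation i removed (brute force)."""
    keep = np.arange(len(y)) != i
    Xi, yi = X[keep], y[keep]
    b, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
    r = yi - Xi @ b
    return r @ r / (len(yi) - X.shape[1])

rng = np.random.default_rng(2)
n = 12                                              # hypothetical sample size
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

r, s2_del = r_student(X, y)
s2_check = np.array([s2_by_refit(X, y, i) for i in range(n)])
print("max difference between formula and refit:", np.abs(s2_del - s2_check).max())
```

An observation with a large residual lowers $S_{(-i)}^2$ when it is removed, so its R-student residual is larger in magnitude than its standardized residual.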

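Example 1: Consider the following data:

[Data table for $x_1$, $x_2$ and $y$, and the matrices $X'X$ and $(X'X)^{-1}$]

The diagonal elements of the hat matrix $H = X(X'X)^{-1}X'$ are

$$h_{11} = 0.9252, \quad h_{22} = 0.3832, \quad h_{33} = 0.7030, \quad h_{44} = 0.6096, \quad h_{55} = 0.3790$$

With $n = 5$, $p = 3$ and $MS_{Res} = 6.97$, the deleted residual variances are

$$S_{(-1)}^2 = \frac{(n - p)\,MS_{Res} - e_1^2/(1 - h_{11})}{n - p - 1} = \frac{(5 - 3)(6.97) - (0.84)^2/(1 - 0.9252)}{5 - 3 - 1}$$

$$S_{(-2)}^2 = \frac{(n - p)\,MS_{Res} - e_2^2/(1 - h_{22})}{n - p - 1} = \frac{(5 - 3)(6.97) - (-0.45)^2/(1 - 0.3832)}{5 - 3 - 1}$$

$$S_{(-3)}^2 = \frac{(n - p)\,MS_{Res} - e_3^2/(1 - h_{33})}{n - p - 1} = \frac{(5 - 3)(6.97) - (0.16)^2/(1 - 0.7030)}{5 - 3 - 1}$$

$$S_{(-4)}^2 = \frac{(n - p)\,MS_{Res} - e_4^2/(1 - h_{44})}{n - p - 1} = \frac{(5 - 3)(6.97) - (2.26)^2/(1 - 0.6096)}{5 - 3 - 1}$$

$$S_{(-5)}^2 = \frac{(n - p)\,MS_{Res} - e_5^2/(1 - h_{55})}{n - p - 1} = \frac{(5 - 3)(6.97) - (-2.81)^2/(1 - 0.3790)}{5 - 3 - 1}$$

The R-student residuals then follow from $r_i = e_i / \sqrt{S_{(-i)}^2 (1 - h_{ii})}$, $i = 1, \ldots, 5$.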

SAS Output: Residuals, Studentized Residuals and R-student Residuals

Obs   Residuals    student    Rstudent
1      0.84112     1.16423     1.
2     -0.44860    -0.21618    -0.
3      0.15888     0.11034     0.
4      2.26168     1.36988     3.
5     -2.81308    -1.35107    -3.
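The arithmetic of Example 1 can be retraced from the quantities that survive in the text: the residuals in the SAS table, the leverages $h_{ii}$, and $MS_{Res} = 6.97$ with $n = 5$, $p = 3$. The sketch below is a reconstruction under two assumptions: that the "student" column is the internally studentized residual $e_i / \sqrt{MS_{Res}(1 - h_{ii})}$ (a standard definition that these notes do not state), and that the rounded $MS_{Res}$ is accurate enough for a comparison.

```python
import math

n, p = 5, 3
ms_res = 6.97                                            # rounded value used in the example
e = [0.84112, -0.44860, 0.15888, 2.26168, -2.81308]      # residuals from the SAS table
h = [0.9252, 0.3832, 0.7030, 0.6096, 0.3790]             # leverages h_ii from the example

rows = []
for e_i, h_ii in zip(e, h):
    # Deleted variance S^2_(-i) from the closed-form formula.
    s2 = ((n - p) * ms_res - e_i**2 / (1 - h_ii)) / (n - p - 1)
    # Assumed definition of the "student" column (not stated in the notes).
    student = e_i / math.sqrt(ms_res * (1 - h_ii))
    # R-student residual.
    rstudent = e_i / math.sqrt(s2 * (1 - h_ii))
    rows.append((s2, student, rstudent))

for obs, (s2, student, rstudent) in enumerate(rows, start=1):
    print(f"{obs}  S2(-i) = {s2:8.4f}  student = {student:8.4f}  Rstudent = {rstudent:8.4f}")
```

Under these assumptions the computed studentized values agree closely with the "student" column, and observations 4 and 5 stand out in the R-student column even though their studentized residuals look moderate.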

[Figure: Scatter plot of X2 versus X1]

(b) Plot of Residuals versus the Fitted values: A plot of the residuals $e_i$ (or the scaled residuals $d_i$, $t_i$ or $r_i$) versus the corresponding fitted values $\hat{y}_i$ is useful for detecting several common types of model inadequacies.

If the plot of residuals versus the fitted values can be contained in a horizontal band, then there are no obvious model defects.

The outward-opening funnel pattern implies that the variance of $\varepsilon$ is an increasing function of $Y$. An inward-opening funnel indicates that the variance of $\varepsilon$ decreases as $Y$ increases. The double-bow often occurs when $Y$ is a proportion between zero and one. The usual approach for dealing with inequality of variance is to apply a suitable transformation to either the regressor or the response variable.
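A funnel can also be summarized numerically by comparing the spread of the residuals at low and high fitted values. The sketch below is a hypothetical illustration: the data are simulated with a multiplicative error (an assumption chosen so that a log transformation of the response is appropriate), and the ratio of residual standard deviations in the top and bottom thirds of the fitted values is an ad hoc summary, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(1, 10, size=n)
# Simulated response with multiplicative error: its spread grows with its mean.
y = np.exp(0.5 + 0.3 * x + rng.normal(scale=0.3, size=n))

def spread_ratio(x, response):
    """Residual std in the top third of fitted values divided by the bottom third."""
    X = np.column_stack([np.ones(len(x)), x])
    b, *_ = np.linalg.lstsq(X, response, rcond=None)
    fitted = X @ b
    e = response - fitted
    order = np.argsort(fitted)              # sort observations by fitted value
    third = len(x) // 3
    low, high = e[order[:third]], e[order[-third:]]
    return high.std() / low.std()

ratio_raw = spread_ratio(x, y)              # well above 1: outward-opening funnel
ratio_log = spread_ratio(x, np.log(y))      # near 1: roughly constant spread
print(f"spread ratio, raw response: {ratio_raw:.2f}; log response: {ratio_log:.2f}")
```

A ratio near 1 corresponds to the horizontal band; a ratio well above 1 corresponds to the outward-opening funnel.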

A curved plot indicates nonlinearity. This could mean that other regressor variables are needed in the model; for example, a squared term may be necessary. Transformations of the regressor and/or the response variable may be helpful in these cases.

A plot of residuals versus the predicted values may also reveal one or more unusually large residuals. These points are potential outliers. An extreme predicted value with a large residual could also indicate that either the variance is not constant or the true relationship between $Y$ and $X$ is not linear. These possibilities should be investigated before the points are considered outliers.

(c) Plot of Residuals versus the Regressors: Plotting the residuals versus the corresponding values of each regressor variable can also be helpful. Once again, a horizontal band containing the residuals is desirable. The funnel and double-bow patterns indicate nonconstant variance. A curved band, or a nonlinear pattern in general, indicates that the assumed relationship between $Y$ and the regressor $X_j$ is not correct. Thus, either higher-order terms in $X_j$ (such as $X_j^2$) or a transformation should be considered.
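The effect of a missing higher-order term can be seen without drawing the plot. In this sketch the data are simulated from an assumed quadratic relationship; the residuals of a straight-line fit remain correlated with the centered square of the regressor, and adding the squared term removes that pattern and reduces the residual sum of squares. The correlation used here is only a rough numeric stand-in for the curved band.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.uniform(0, 5, size=n)
# Hypothetical curved truth: a squared term is really present.
y = 1.0 + 2.0 * x + 0.8 * x**2 + rng.normal(scale=1.0, size=n)

def fit(X, y):
    """Least-squares fit returning coefficients, residuals and SS_Res."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return b, e, e @ e

X_lin = np.column_stack([np.ones(n), x])            # straight-line model
X_quad = np.column_stack([np.ones(n), x, x**2])     # model with the squared term
_, e_lin, ss_lin = fit(X_lin, y)
b_quad, e_quad, ss_quad = fit(X_quad, y)

curvature = (x - x.mean())**2
curv_lin = np.corrcoef(e_lin, curvature)[0, 1]      # large: curved residual band
curv_quad = np.corrcoef(e_quad, curvature)[0, 1]    # zero up to rounding error
print(f"corr with curvature: linear fit {curv_lin:.2f}, quadratic fit {curv_quad:.2e}")
print(f"SS_Res: linear fit {ss_lin:.1f}, quadratic fit {ss_quad:.1f}")
```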

Note that in the simple linear regression it is not necessary to plot residuals versus both predicted values and the regressor variable since the predicted values are linear combinations of the regressor values.

(d) Plot of Residuals in Time sequence: It is a good idea to plot the residuals against time order, if the time sequence in which the data were collected is known. If a horizontal band encloses all of the residuals and the residuals fluctuate in a more or less random fashion within this band, then there is no obvious autocorrelation.
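A numeric companion to the time-sequence plot is the lag-1 autocorrelation of the residuals, or the closely related Durbin-Watson statistic. Neither is introduced in these notes; they are brought in here as standard diagnostics. The sketch below compares simulated independent errors with simulated AR(1) errors, where the AR coefficient of 0.7 is an arbitrary assumption.

```python
import numpy as np

def lag1_autocorr(e):
    """Sample lag-1 autocorrelation of a residual series in time order."""
    e = e - e.mean()
    return (e[1:] @ e[:-1]) / (e @ e)

def durbin_watson(e):
    """Durbin-Watson statistic: near 2 when there is no lag-1 autocorrelation."""
    return np.sum(np.diff(e)**2) / np.sum(e**2)

def residuals(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(0, 10, size=n)
X = np.column_stack([np.ones(n), x])

u = rng.normal(size=n)                  # independent errors
eps = np.zeros(n)                       # AR(1) errors built from the same shocks
for i in range(1, n):
    eps[i] = 0.7 * eps[i - 1] + u[i]    # hypothetical AR coefficient 0.7

e_iid = residuals(X, 2.0 + 0.5 * x + u)
e_ar = residuals(X, 2.0 + 0.5 * x + eps)

print("independent errors:", lag1_autocorr(e_iid), durbin_watson(e_iid))
print("AR(1) errors:      ", lag1_autocorr(e_ar), durbin_watson(e_ar))
```

Positively autocorrelated residuals show up in the plot as long runs on one side of zero, and numerically as a Durbin-Watson value well below 2.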

(f) Partial Residual plots: Suppose that the model contains the regressors $X_1, X_2, \ldots, X_k$. The partial residuals for regressor $X_j$ are defined as

$$e_i^*(Y \mid x_j) = e_i + \hat{\beta}_j x_{ij}, \quad i = 1, 2, \ldots, n,$$

where the $e_i$ are the residuals from the model with all $k$ regressors included. The partial residuals are plotted versus $x_{ij}$, and the interpretation of the partial residual plot is very similar to that of the partial regression plot.
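One useful property of the partial residual plot is that a least-squares line through the points $(x_{ij}, e_i + \hat{\beta}_j x_{ij})$ has slope exactly $\hat{\beta}_j$, because the residuals are orthogonal to every regressor and to the intercept. The sketch below checks this on simulated data; the coefficients and ranges are assumptions, and the names `pr1` and `pr2` merely echo the labels that appear in the SAS plots.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 60
x1 = rng.uniform(0, 10, size=n)
x2 = rng.uniform(0, 5, size=n)
y = 4.0 + 1.5 * x1 - 2.0 * x2 + rng.normal(size=n)   # simulated response

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat                                  # residuals from the full model

def partial_residuals(j):
    """Partial residuals e_i + beta_hat_j * x_ij for column j of X."""
    return e + beta_hat[j] * X[:, j]

pr1 = partial_residuals(1)
pr2 = partial_residuals(2)

# Slope of the least-squares line through each partial residual plot.
slope1 = np.polyfit(x1, pr1, 1)[0]
slope2 = np.polyfit(x2, pr2, 1)[0]
print(f"slope1 = {slope1:.4f} vs beta_hat_1 = {beta_hat[1]:.4f}")
print(f"slope2 = {slope2:.4f} vs beta_hat_2 = {beta_hat[2]:.4f}")
```

Curvature around that line in the plot, rather than the slope itself, is what suggests a transformation or a higher-order term in $X_j$.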

Example 2 (Delivery Time Data): A soft drink bottler is analyzing the vending machine service routes in his distribution system. He is interested in predicting the amount of time required by the route driver to service the vending machines in an outlet. This service activity includes stocking the machine with beverage products and minor maintenance or housekeeping. The industrial engineer responsible for the study has suggested that the two most important variables affecting the delivery time ($Y$) are the number of cases of product stocked ($X_1$) and the distance walked by the route driver ($X_2$). The engineer has collected 40 observations on delivery time.

SAS Output: Regression model of Y on X1 and X2

Fitted model: y = 2.4123 + 1.6392 x1 + 0.0136 x2 (N = 20, Rsq = 0.9525, Adj Rsq = 0.9469, RMSE = …7303)

[Figure: CDF of RSTUDENT]

[Figure: Q-Q plot of Rstudent residuals versus normal quantiles]

[Figure: Plot against predicted value for the regression model of Y on X1 and X2]

[Figure: Partial residual plot, pr1]

[Figure: Partial residual plot, pr2]

PRESS Statistic: PRESS residuals are defined by $e_{(-i)} = y_i - \hat{y}_{(-i)}$, where $\hat{y}_{(-i)}$ is the predicted value of the $i$th observed response based on a fit to the remaining $n - 1$ sample points. Large PRESS residuals are potentially useful in identifying observations where the model does not fit the data well or observations for which the model is likely to provide poor future predictions. The PRESS statistic is defined by

$$PRESS = \sum_{i=1}^{n} \left( y_i - \hat{y}_{(-i)} \right)^2 = \sum_{i=1}^{n} \left( \frac{e_i}{1 - h_{ii}} \right)^2$$

PRESS is generally regarded as a measure of how well a regression model will perform in predicting new data. One very important use of the PRESS statistic is in comparing regression models. Generally, a model with a small value of PRESS is desired. The PRESS statistic can also be used to compute an $R^2$-like statistic for prediction, say

$$R^2_{Prediction} = 1 - \frac{PRESS}{SS_T}$$

This statistic gives some indication of the predictive capability of the regression model.
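The PRESS residuals do not require $n$ separate fits, since $e_i / (1 - h_{ii})$ reproduces each leave-one-out prediction error. The sketch below verifies this against an explicit leave-one-out loop and then computes $R^2$ and $R^2_{Prediction}$. The data are simulated to loosely resemble the delivery time setting; they are not the data of Example 2, and every coefficient and range is an assumption.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 20
x1 = rng.uniform(2, 30, size=n)          # hypothetical number of cases
x2 = rng.uniform(50, 1500, size=n)       # hypothetical distance walked
y = 2.0 + 1.6 * x1 + 0.014 * x2 + rng.normal(scale=3.0, size=n)
X = np.column_stack([np.ones(n), x1, x2])

H = X @ np.linalg.solve(X.T @ X, X.T)    # hat matrix
h = np.diag(H)
e = y - H @ y                            # ordinary residuals
press_resid = e / (1 - h)                # PRESS residuals via the shortcut
press = np.sum(press_resid**2)

# Explicit leave-one-out prediction errors, for comparison only.
loo = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    loo[i] = y[i] - X[i] @ b

ss_t = np.sum((y - y.mean())**2)         # total sum of squares SS_T
ss_res = e @ e                           # residual sum of squares SS_Res
r2 = 1 - ss_res / ss_t
r2_pred = 1 - press / ss_t
print(f"R^2 = {r2:.4f}, R^2 for prediction = {r2_pred:.4f}")
```

Because $0 \le h_{ii} < 1$, each PRESS residual is at least as large in magnitude as the ordinary residual, so $PRESS \ge SS_{Res}$ and $R^2_{Prediction}$ can never exceed $R^2$.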

Example 2 (Cont.):

$$R^2 = 1 - \frac{SS_{Res}}{SS_T} = 0.9525$$

$$R^2_{Prediction} = 1 - \frac{PRESS}{SS_T} = 0.8903$$

Therefore, we could expect this model to “explain” about 89.03% of the variation in predicting new observations, as compared to approximately 95.25% of the variability in the original data explained by the least-squares fit.

Lack of Fit of the Regression Model:
