
Homoscedasticity and Independence of Residuals in Linear Regression, Exercises of Design


This paper examines the properties of residuals in linear regression, focusing on their homoscedasticity and independence. It also explores the effect of monotonic heteroscedasticity on residuals and the use of transformed residuals for testing against alternative hypotheses. The paper was originally presented at The Georgia Institute of Technology in 1967.


INDEPENDENT STEPWISE RESIDUALS FOR TESTING HOMOSCEDASTICITY¹

A. Hedayat and D. S. Robson², Cornell University

ABSTRACT

Regression models which specify independent, homoscedastic and normally distributed errors may be analyzed in a stepwise manner to produce calculated residuals having this same property. If the n'th residual is calculated as the deviation of the n'th observation from its predicted value based on a least squares fit to only the first n observations, then the residuals in the resulting sequence, appropriately normalized, are not only mutually independent and homoscedastic but also independent of all of the calculated regression functions. If error variance is a monotonic function of the mean then, under certain regularity conditions, the calculated stepwise residuals are likewise monotonically heteroscedastic. Simple linear regression with equally spaced values of the independent variable constitutes one such regular case, and a Monte Carlo study of the "peak-test" of homoscedasticity in this instance shows that for small samples the stepwise residuals are substantially more sensitive to monotonic heteroscedasticity than conventional, untransformed residuals.


1. INTRODUCTION AND SUMMARY

Consider the fixed effects general linear model

$$Y = X\beta + \epsilon \qquad (1)$$

where $Y$ is an $N$-vector of responses, $X$ is an $N \times p$ matrix with rank $r \le p$ having either fixed known coefficients or coefficients that are stochastically independent of the error term, $\beta$ is a $p$-vector of unknown parameters, and $\epsilon$ is an $N$-vector of unknown stochastic components with mean zero, usually called the error (residual or disturbance) vector. Linear models dealt with in practice usually include in their basic structure the assumption that the covariance matrix of $\epsilon$ is $\sigma^2 I_N$, where $\sigma^2$ is a scalar and $I_N$ denotes the identity matrix of order $N$. Specifically, it is often assumed that $\epsilon \sim N(0, \sigma^2 I_N)$. A diagnosis of the validity of these conditions imposed on the linear model residuals is impeded by the fact that, under the usual hypothesis of independent and identically distributed errors, deviations from the least squares fit are neither independent nor, in general, identically distributed. Calculated residuals $e = Y - X\hat\beta$ are linear functions of the true errors $\epsilon = Y - X\beta$ at the $N$ points of the experimental design and are subject to linear constraints equal

¹ This paper was originally presented under the title "Homoscedasticity in Linear Regression Analysis with Equally Spaced X's" on April 4, 1967, at the one hundred fourteenth meeting of the Institute of Mathematical Statistics at the Georgia Institute of Technology, Atlanta, Georgia.

² Research supported in part by Grant Number GB-4502 from the National Science Foundation. Paper No. BU-135 in the Biometrics Unit and No. 524 in the Department of Plant Breeding and Biometry, Cornell University, Ithaca, N.Y.

residuals and their expected values with respect to any specified heteroscedastic model, or the "peak-test" of heteroscedasticity as developed by Goldfeld and Quandt [1, 2]. For further discussion of the use of transformed residuals see [5], [6], [7], and [8]. As shown in Section 2, an uncorrelated set of residuals retaining essentially the same intuitive appeal as the original residuals may be obtained by a stepwise fitting of the linear model to successively more observations. Thus, if $x_n\hat\beta_{(n)}$ is the predicted value of $Y_n$ calculated by fitting only the first $n$ observations $Y_1, \ldots, Y_n$ to the linear model $Y = X\beta + \epsilon$, then, excluding all $n$ for which $x_n\hat\beta_{(n)} = Y_n$, the residuals in the sequence

$$f_n = Y_n - x_n\hat\beta_{(n)}$$

are linearly uncorrelated if the components of $\epsilon$ are uncorrelated and homoscedastic. The degenerate case $Y_n = x_n\hat\beta_{(n)}$ arises when inclusion of the n'th observation increases the rank of the design matrix (by unity).* Normalizing scalars $c_n = \sigma_\epsilon / \sigma_{f_n}$ are known constants, and the residuals $d_n = c_n f_n$ are then linearly uncorrelated with common variance $\sigma^2_{y\cdot x} = \sigma^2_\epsilon$; if the $\epsilon$-distribution is normal, then so is the $d$-distribution.
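The stepwise residuals $f_n$ and their normalized versions $d_n = c_n f_n$ can be sketched numerically. This is a minimal illustration, not the authors' procedure. It assumes NumPy and a design matrix `X` whose rows are already in the chosen fitting order. The names `ginv_psd` and `stepwise_residuals` are hypothetical, and `ginv_psd` is one choice of generalized inverse (the Moore-Penrose inverse, with an explicit eigenvalue cutoff). A step is treated as degenerate when the leverage $h_n$ of the n'th point within the first-$n$ fit equals 1, since $f_n$ then vanishes identically. Because $d$ is linear in $Y$, feeding in unit vectors recovers the linear map $D$ with $d = DY$. With $\sigma = 1$ one should then see $DD' = I$ (uncorrelated, unit variance) and $DX = 0$ (no dependence on $\beta$).

```python
import numpy as np


def ginv_psd(M, rel_tol=1e-10):
    """Moore-Penrose inverse of a symmetric PSD matrix (one choice of generalized inverse)."""
    w, V = np.linalg.eigh(M)
    keep = w > rel_tol * w.max()          # treat tiny eigenvalues as exact zeros
    inv_w = np.zeros_like(w)
    inv_w[keep] = 1.0 / w[keep]
    return (V * inv_w) @ V.T


def stepwise_residuals(X, y, tol=1e-10):
    """Normalized stepwise residuals d_n = f_n / sqrt(1 - h_n) and their 1-based indices n."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d, index = [], []
    for n in range(1, len(y) + 1):
        Xn = X[:n]
        G = ginv_psd(Xn.T @ Xn)
        h = X[n - 1] @ G @ X[n - 1]          # leverage of the n'th point in the first-n fit
        if 1.0 - h < tol:                    # degenerate step: row n raised the rank, f_n = 0
            continue
        f_n = y[n - 1] - X[n - 1] @ (G @ (Xn.T @ y[:n]))   # Y_n - x_n beta_hat_(n)
        d.append(f_n / np.sqrt(1.0 - h))     # c_n f_n, with c_n = sigma_eps / sigma_{f_n}
        index.append(n)
    return np.array(d), index


# Hypothetical equally spaced design with an intercept, N = 10.
x = np.arange(1.0, 11.0)
X = np.column_stack([np.ones_like(x), x])
# Column j of D is the stepwise-residual vector of the j'th unit vector.
D = np.column_stack([stepwise_residuals(X, e)[0] for e in np.eye(len(x))])
print(np.allclose(D @ D.T, np.eye(D.shape[0])))   # uncorrelated with unit variance
print(np.allclose(D @ X, 0.0))                    # unaffected by beta
```

For simple linear regression with distinct $x$ values, the first two steps are the degenerate ones, so the sketch returns $d_3, \ldots, d_N$.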

*If the $r$ degeneracies occur at $Y_1, \ldots, Y_r$, then $f_n$ could be defined as $Y_n - x_n\hat\beta_{(n-1)}$ for $n > r$ in order to give more weight to $\epsilon_n$; in any other case, however, the $f_n$ so defined would depend on $\beta$ as well as $\epsilon$.

The set of numbers $\{d_n\}$ obtained in this manner depends upon the ordering imposed on the set of $N$ observations; for a given set of $N$ observations there are $N!$ possible sets $\{d_n\}$. The choice of a particular set, again, will depend upon the statistician's objective in analyzing residuals. Fortunately, this choice may be made to depend upon calculated values of the regression functions, $X\hat\beta_{(n)}$, for any $n$, without affecting the probability distribution of $\{d_n\}$ under the homoscedastic normal hypothesis. Since the residuals are statistically independent of the estimated regression functions, in constructing a set of residuals $d_{r+1}, \ldots, d_N$ to test for monotonic heteroscedasticity, for example, $d_N$ may be chosen as the normalized residual associated with the largest of the $N$ predicted values $X\hat\beta$. Similarly, $Y_{N-1}$ may be defined as the observed $Y$ at the design point corresponding to the second largest of the values $X\hat\beta$, and so on. In the simplest and, in the present context, degenerate case where $Y_1, \ldots, Y_N$ are assumed to be identically distributed, say $Y_i = \alpha + \epsilon_i$, the $N$ predicted values $X\hat\beta$ are identically $\hat\alpha$. For any given ordering of the observations, specified by some external consideration, the sequence $d_2, \ldots, d_N$ becomes the Helmert statistics as employed by Hogg, for example, in his heuristic method of iterated tests for equality of means [4]. We note that his iterative scheme may in general be applied to the sequence of test statistics

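$$F_{1,i-r-1} = \frac{(i-r-1)\,d_i^2}{d_{r+1}^2 + \cdots + d_{i-1}^2}, \qquad i = r+2, \ldots, N,$$

to test the sequence of nested hypotheses

$$H_{r+2}: \sigma^2_{\epsilon_{r+1}} = \sigma^2_{\epsilon_{r+2}}, \quad H_{r+3}: \sigma^2_{\epsilon_{r+1}} = \sigma^2_{\epsilon_{r+2}} = \sigma^2_{\epsilon_{r+3}}, \quad \ldots, \quad H_N: \sigma^2_{\epsilon_{r+1}} = \cdots = \sigma^2_{\epsilon_N}.$$

If $H_N$ is true (and the $\epsilon$'s are normally distributed) then $F_{1,1}, \ldots, F_{1,N-r-1}$ are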

correct ordering of the observations, there are design configurations for which monotonicity of $\sigma^2_{y\cdot x}$ is not sufficient to guarantee monotonicity of the sequence $\{\sigma^2_{d_n}\}$. Counterexamples violating this property are easily constructed with simple linear regression models; in the important special case of simple linear regression with equally spaced values of $x$, however, the monotonicity is preserved.

This fact is demonstrated in Section 3.

The utility of independent residuals is illustrated in Section 4, where the "peak-test" developed by Goldfeld and Quandt [1] is applied to simulated residuals from a simple linear regression with equally spaced values of $x$. This test was devised to detect monotonic trends in a sequence of random variables, and the distribution of the peak-test statistic was tabulated for the case of independent and identically distributed (continuous) random variables. Monte Carlo computations are given here comparing the properties of the peak-test applied to $|d_3|, \ldots, |d_N|$ and applied (as in Goldfeld and Quandt) to the original residuals $|e_1|, \ldots, |e_N|$.

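2. ZERO CORRELATION BETWEEN OLD AND NEW RESIDUALS WHEN ADDITIONAL OBSERVATIONS ARE INCORPORATED INTO A LINEAR (MULTIPLE) REGRESSION ANALYSIS

Let us rewrite the model (1) in the following form

$$\begin{bmatrix} Y_{(1)} \\ Y_{(2)} \end{bmatrix} = \begin{bmatrix} X_{(1)} \\ X_{(2)} \end{bmatrix}\beta + \begin{bmatrix} \epsilon_{(1)} \\ \epsilon_{(2)} \end{bmatrix} \qquad (2)$$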

where $Y_{(1)}$ and $Y_{(2)}$ contain $n$ and $N - n$ observations, respectively. Now suppose that we ignore the $Y_{(2)}$ observations and fit only the $n$ observations $Y_{(1)}$; viz.,

$$Y_{(1)} = X_{(1)}\beta + \epsilon_{(1)}. \qquad (3)$$

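With $H$ a generalized inverse of $X'_{(1)}X_{(1)}$, where $X'_{(1)}$ denotes the transpose of $X_{(1)}$, the least squares estimate $f_{(1)}$ of $\epsilon_{(1)}$ from model (3) will be

$$f_{(1)} = Y_{(1)} - X_{(1)}HX'_{(1)}Y_{(1)}. \qquad (4)$$

If we now fit the entire $N$ observations to the model, and if $G$ is a generalized inverse of $X'X$, then the least squares estimate $e_{(2)}$ of $\epsilon_{(2)}$ will be

$$e_{(2)} = Y_{(2)} - X_{(2)}GX'_{(1)}Y_{(1)} - X_{(2)}GX'_{(2)}Y_{(2)}.$$

Note that while $f_{(1)}$ is a function of $Y_{(1)}$ only, $e_{(2)}$ is a function of both $Y_{(1)}$ and $Y_{(2)}$.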

We now prove the following theorem.

THEOREM 2.1. $f_{(1)}$ and $e_{(2)}$ are linearly uncorrelated (independent) if the components of $\epsilon$ are independent and identically (normally) distributed.

To prove the theorem we need the following well-known lemma.

LEMMA 2.1. Let $W$ be a $p \times q$ matrix. Then if $K$ is a generalized inverse of $W'W$, $WKW'W = W$.

Proof of Theorem. $f_{(1)}$ and $e_{(2)}$ can be expressed as follows:

$$f_{(1)} = Y_{(1)} - X_{(1)}HX'_{(1)}Y_{(1)} = (I_n - X_{(1)}HX'_{(1)})\,\epsilon_{(1)},$$

$$e_{(2)} = Y_{(2)} - X_{(2)}GX'_{(1)}Y_{(1)} - X_{(2)}GX'_{(2)}Y_{(2)} = (I_{N-n} - X_{(2)}GX'_{(2)})\,\epsilon_{(2)} - X_{(2)}GX'_{(1)}\,\epsilon_{(1)}.$$

Now if the components of $\epsilon$ are independent and identically distributed, then the covariance between $f_{(1)}$ and $e_{(2)}$ is
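The covariance the proof is computing can be checked numerically. This sketch is only an illustration and assumes NumPy. It builds a deliberately rank-deficient design and uses a hypothetical helper `ginv_psd` (the Moore-Penrose inverse with an explicit eigenvalue cutoff) as the generalized inverses $H$ and $G$. It then confirms Lemma 2.1 and the two representations of $f_{(1)}$ and $e_{(2)}$. Finally it evaluates $\operatorname{Cov}(f_{(1)}, e_{(2)}) = -\sigma^2 (I_n - X_{(1)}HX'_{(1)})X_{(1)}GX'_{(2)}$ with $\sigma^2 = 1$. The sizes $N = 9$, $n = 5$, $p = 4$ are arbitrary choices.

```python
import numpy as np


def ginv_psd(M, rel_tol=1e-10):
    """Moore-Penrose inverse of a symmetric PSD matrix (one choice of generalized inverse)."""
    w, V = np.linalg.eigh(M)
    keep = w > rel_tol * w.max()
    inv_w = np.zeros_like(w)
    inv_w[keep] = 1.0 / w[keep]
    return (V * inv_w) @ V.T


rng = np.random.default_rng(1)
N, n, p = 9, 5, 4                            # arbitrary sizes for the illustration
X = rng.normal(size=(N, p))
X[:, 3] = X[:, 0] + X[:, 1]                  # rank 3 < p, so only generalized inverses exist
X1, X2 = X[:n], X[n:]

H = ginv_psd(X1.T @ X1)                      # a generalized inverse of X_(1)' X_(1)
G = ginv_psd(X.T @ X)                        # a generalized inverse of X'X

# Lemma 2.1 (W K W'W = W) for both fits
assert np.allclose(X1 @ H @ X1.T @ X1, X1)
assert np.allclose(X @ G @ X.T @ X, X)

beta = rng.normal(size=p)
eps = rng.normal(size=N)
Y = X @ beta + eps

A = np.eye(n) - X1 @ H @ X1.T                # f_(1) = A eps_(1)
B = np.eye(N - n) - X2 @ G @ X2.T            # e_(2) = B eps_(2) - C eps_(1)
C = X2 @ G @ X1.T

f1 = Y[:n] - X1 @ H @ X1.T @ Y[:n]           # residuals from fitting the first n rows only
e2 = (Y - X @ G @ X.T @ Y)[n:]               # last N - n residuals from the full fit
assert np.allclose(f1, A @ eps[:n])
assert np.allclose(e2, B @ eps[n:] - C @ eps[:n])

# Var(eps) = I: Cov(A eps1, B eps2 - C eps1) = -A C', since eps1 and eps2 are uncorrelated
cov = -A @ C.T
print(np.abs(cov).max())                     # zero up to rounding error
```

The vanishing covariance comes from $(I_n - X_{(1)}HX'_{(1)})X_{(1)} = 0$, which is Lemma 2.1 applied with $W = X_{(1)}$.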

3. THE MONOTONICITY PROPERTY OF THE VARIANCE OF $d_n$ IN HETEROSCEDASTIC SIMPLE REGRESSION MODELS

If the errors $\epsilon_i$ in the simple linear regression model $Y_i = \alpha + \beta x_i + \epsilon_i$ are normally and independently distributed with mean 0 and variance $\sigma^2_{y\cdot x_i} = \sigma_i^2$, then the transformed residuals $f_3, \ldots, f_N$ are likewise normal with mean 0 and variances

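$$\sigma^2_{f_n} = \sum_{i=1}^{n-1}\left[\frac{1}{n} + \frac{(x_i - \bar x_{(n)})(x_n - \bar x_{(n)})}{\sum_{j=1}^{n}(x_j - \bar x_{(n)})^2}\right]^2\sigma_i^2 + \left[1 - \frac{1}{n} - \frac{(x_n - \bar x_{(n)})^2}{\sum_{j=1}^{n}(x_j - \bar x_{(n)})^2}\right]^2\sigma_n^2,$$

where $\bar x_{(n)}$ is the mean of $x_1, \ldots, x_n$.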
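The variance formula can be cross-checked against a direct hat-matrix computation. This is a minimal sketch assuming NumPy. The helper names `var_f_formula` and `var_f_direct`, the design points and the variances are all hypothetical. The direct version relies on the fact that $f_n$ is the n'th row of $I_n - P_n$ applied to $(\epsilon_1, \ldots, \epsilon_n)$, where $P_n$ is the hat matrix of the first $n$ points.

```python
import numpy as np


def var_f_formula(x, sigma2, n):
    """sigma^2_{f_n} from the displayed formula (n is 1-based, n >= 3)."""
    x = np.asarray(x[:n], dtype=float)
    s = np.asarray(sigma2[:n], dtype=float)
    xbar = x.mean()                                         # x-bar_(n)
    sxx = np.sum((x - xbar) ** 2)
    w = 1.0 / n + (x[:-1] - xbar) * (x[-1] - xbar) / sxx    # weights on eps_1, ..., eps_{n-1}
    last = 1.0 - 1.0 / n - (x[-1] - xbar) ** 2 / sxx        # weight on eps_n
    return float(np.sum(w ** 2 * s[:-1]) + last ** 2 * s[-1])


def var_f_direct(x, sigma2, n):
    """Same variance from the hat matrix P_n of the first n points."""
    Xn = np.column_stack([np.ones(n), np.asarray(x[:n], dtype=float)])
    P = Xn @ np.linalg.solve(Xn.T @ Xn, Xn.T)
    row = np.eye(n)[-1] - P[-1]                             # f_n = row @ eps_(n)
    return float(np.sum(row ** 2 * np.asarray(sigma2[:n], dtype=float)))


x = np.array([0.3, 1.1, 1.7, 2.9, 3.4, 4.8])                # hypothetical increasing x's
sigma2 = 0.5 + x                                            # hypothetical increasing variances
for n in range(3, len(x) + 1):
    print(n, var_f_formula(x, sigma2, n), var_f_direct(x, sigma2, n))
```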

When the error variance $\sigma^2_{y\cdot x}$ is an increasing function of $x$, the condition

$x_1 < \cdots < x_n$, which implies $\sigma_1^2 \le \cdots \le \sigma_n^2$, is not sufficient to ensure that

the normalized residuals

will have increasing variances; however, if $x_i = a + bi$, $b > 0$, then the following theorem obtains: $\{\sigma^2_{d_n}\}$ is an increasing sequence.

Proof. The variance $\sigma^2_{d_n}$ in this case becomes

$$\sigma^2_{d_n} = \sum_{i=1}^{n-1}\frac{(6i-2-2n)^2}{n(n^2-1)(n-2)}\,\sigma_i^2 + \frac{(n-2)(n-1)}{n(n+1)}\,\sigma_n^2$$

and

$$\sigma^2_{d_{n+1}} - \sigma^2_{d_n} = \sum_{i=1}^{n-1}\frac{-144i^2 + i(72n+144) - 4(n+2)(2n+5)}{n(n^2-1)(n^2-4)}\,\sigma_i^2 + \frac{(n-1)(20-n^2)}{n(n+1)(n+2)}\,\sigma_n^2 + \frac{n(n-1)}{(n+1)(n+2)}\,\sigma_{n+1}^2 = \sum_{i=1}^{n+1}\delta_{ni}\,\sigma_i^2 \quad \text{(say)}.$$
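The two closed forms for equally spaced $x_i = a + bi$ can be checked in exact rational arithmetic. This sketch assumes Python's standard `fractions` module. `var_d_closed` and `delta` are hypothetical names for the two displays, and the variance sequence used is arbitrary. It checks three things: that $\sigma^2_{d_n} = 1$ when every $\sigma_i^2 = 1$ (the normalization), that the $\delta_{ni}$ sum to zero, and that the difference of two consecutive closed forms reproduces $\sum_i \delta_{ni}\sigma_i^2$.

```python
from fractions import Fraction as Fr


def var_d_closed(n, s2):
    """sigma^2_{d_n} for x_i = a + b i; s2[i - 1] holds sigma_i^2 (n >= 3)."""
    den = n * (n * n - 1) * (n - 2)
    total = sum(Fr((6 * i - 2 - 2 * n) ** 2, den) * s2[i - 1] for i in range(1, n))
    return total + Fr((n - 2) * (n - 1), n * (n + 1)) * s2[n - 1]


def delta(n):
    """[delta_{n,1}, ..., delta_{n,n+1}] from the second display (n >= 3)."""
    den = n * (n * n - 1) * (n * n - 4)
    d = [Fr(-144 * i * i + i * (72 * n + 144) - 4 * (n + 2) * (2 * n + 5), den)
         for i in range(1, n)]
    d.append(Fr((n - 1) * (20 - n * n), n * (n + 1) * (n + 2)))   # coefficient of sigma_n^2
    d.append(Fr(n * (n - 1), (n + 1) * (n + 2)))                  # coefficient of sigma_{n+1}^2
    return d


for n in range(3, 30):
    assert var_d_closed(n, [1] * n) == 1                          # normalization
    assert sum(delta(n)) == 0
    s2 = [i * i % 7 + i for i in range(1, n + 2)]                 # arbitrary variances
    diff = var_d_closed(n + 1, s2) - var_d_closed(n, s2)
    assert diff == sum(d * s for d, s in zip(delta(n), s2))
print("closed forms consistent for n = 3, ..., 29")
```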

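Note that $\sum_{i=1}^{n+1}\delta_{ni} = 0$, because of the normalization, and that for $n \ge 5$

$$\delta_{ni} > 0 \quad \text{for } \left[\tfrac{n+3}{6}\right] < i \le \left[\tfrac{n+1}{3}\right] \text{ and for } i = n + 1,$$

and $\delta_{ni} < 0$ otherwise (where $[x]$ denotes the integer part of $x$). This information concerning the signs of the $\delta_{ni}$ implies that

$$\sum_{i=1}^{k}\delta_{ni} \le \sum_{i=1}^{[(n+1)/3]}\delta_{ni} \quad \text{for } \left[\tfrac{n+1}{3}\right] < k \le n,$$

where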

also satisfies the conditions of the lemma, but in this case

$\sum_{i=1}^{n}(\sigma_i^2 - \sigma_{i+1}^2)D_i$ …, in contradiction to the assumption …. Calculation of the numerical values of $\delta_{ni}$ for $n = 3$ and $n = 4$ reveals that the conditions of the lemma are also satisfied in these cases. Thus, the monotonicity-preserving property holds for all $n$ when $x_i = a + bi$, $b > 0$, and the

correct direction of monotonicity is preserved provided only that $\operatorname{sgn}(\hat\beta) = \operatorname{sgn}(\beta)$.
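The sign pattern of the $\delta_{ni}$ and the resulting monotonicity can also be checked exactly for a range of $n$. This is a sketch under stated assumptions. `delta` repeats the closed form of the $\delta_{ni}$ from the proof. The code checks that, for $n \ge 5$, $\delta_{ni} > 0$ exactly when $[(n+3)/6] < i \le [(n+1)/3]$ or $i = n + 1$. The monotonicity check uses Abel summation, $\sum_{i=1}^{n+1}\delta_{ni}\sigma_i^2 = \sum_{k=1}^{n} D_k(\sigma_k^2 - \sigma_{k+1}^2)$ with $D_k = \sum_{i \le k}\delta_{ni}$. This sum is nonnegative whenever every $D_k \le 0$ and the $\sigma_i^2$ are nondecreasing. Whether this argument matches the lemma the authors invoke cannot be confirmed from this excerpt, and the helper names are hypothetical.

```python
import random
from fractions import Fraction as Fr


def delta(n):
    """[delta_{n,1}, ..., delta_{n,n+1}] for x_i = a + b i (n >= 3)."""
    den = n * (n * n - 1) * (n * n - 4)
    d = [Fr(-144 * i * i + i * (72 * n + 144) - 4 * (n + 2) * (2 * n + 5), den)
         for i in range(1, n)]
    d.append(Fr((n - 1) * (20 - n * n), n * (n + 1) * (n + 2)))
    d.append(Fr(n * (n - 1), (n + 1) * (n + 2)))
    return d


def sign_pattern_holds(n):
    """delta_{ni} > 0 exactly for [(n+3)/6] < i <= [(n+1)/3] and for i = n + 1 (n >= 5)."""
    lo, hi = (n + 3) // 6, (n + 1) // 3
    return all((v > 0) == (lo < i <= hi or i == n + 1)
               for i, v in enumerate(delta(n), start=1))


def partial_sums_nonpositive(n):
    """D_k = delta_{n,1} + ... + delta_{n,k} <= 0 for k = 1, ..., n (and D_{n+1} = 0)."""
    total = Fr(0)
    for v in delta(n)[:-1]:
        total += v
        if total > 0:
            return False
    return True


rng = random.Random(0)
for n in range(3, 60):
    if n >= 5:
        assert sign_pattern_holds(n)
    assert partial_sums_nonpositive(n)
    # nondecreasing error variances should give sigma^2_{d_{n+1}} >= sigma^2_{d_n}
    s2 = sorted(rng.randint(1, 100) for _ in range(n + 1))
    assert sum(d * s for d, s in zip(delta(n), s2)) >= 0
print("sign pattern and monotonicity checks passed for n = 3, ..., 59")
```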

4. SIMULATION OF THE "PEAK-TEST" OF HOMOSCEDASTICITY IN SIMPLE LINEAR REGRESSION

Goldfeld and Quandt [1] discuss the problem of testing homoscedasticity against a monotone heteroscedastic alternative hypothesis, and present tabulated critical values for a so-called "peak-test" of the residuals. A "peak" residual is said to occur at $\hat Y_j = \hat\alpha + \hat\beta x_j$, $\hat Y_1 < \cdots < \hat Y_N$, if and only if $|Y_i - \hat Y_i| < |Y_j - \hat Y_j|$ for all $i < j$, and the peak-test statistic is then defined as the number of peaks occurring among $|Y_2 - \hat Y_2|, \ldots, |Y_N - \hat Y_N|$. Critical values are obtained from the tabulated distribution of the number of peaks occurring in a random sample of size $N$ from an absolutely continuous distribution. In their original application of the peak-test to simple linear regression residuals, Goldfeld and Quandt [1] failed to take into account both the dependence which exists between residuals and the fact that the distribution of $Y_i - \hat Y_i$ is a function of $x_i$; under the homoscedastic hypothesis the stochastically largest absolute residual occurs with the $x_i$ nearest to $\bar x$. If sample size is large then these shortcomings of their procedure are minor, as the authors later pointed out [2]; however, as sample size increases, the mechanics of performing the peak-test

become unduly time consuming, and a computationally simpler procedure such as the F-test described by Goldfeld and Quandt becomes more expedient. For small samples, the peak-test applied to the untransformed residuals is clearly invalid with respect to the size of the test, and also has poor power characteristics.
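The peak count defined above is a record count on absolute residuals taken in order of increasing fitted value. A minimal sketch in plain Python follows. `count_peaks` is a hypothetical name, and the input is assumed to be already sorted by $\hat Y$ (equivalently by $x$ when $\hat\beta > 0$).

```python
def count_peaks(residuals):
    """Number of j >= 2 such that |r_j| exceeds |r_i| for every i < j.

    `residuals` must already be ordered by increasing fitted value.
    """
    peaks = 0
    running_max = abs(residuals[0])
    for r in residuals[1:]:
        if abs(r) > running_max:       # new record among the absolute residuals
            peaks += 1
            running_max = abs(r)
    return peaks


print(count_peaks([-1.0, 0.5, -2.0, 3.0]))   # peaks at the third and fourth positions
```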

Table 1 illustrates these points for sample size N = 10, and also indicates how they are overcome by applying the peak-test to normalized, stepwise residuals. The cumulative distribution of the number of peaks for selected, monotone heteroscedastic alternative hypotheses was estimated by generating 1000 samples of size N = 10 from the standard normal distribution and transforming to heteroscedastic errors by appropriate scale changes. After the scale changes, the least squares residuals and stepwise least squares residuals were constructed as appropriate linear functions of the errors; each sample of size N = 10 was thus used in all eight columns of observed values in Table 1. The columns labeled "$H_0$ Nominal c.d.f." and "$H_0$ Exact c.d.f." were calculated from recursion formulae presented by Goldfeld and Quandt for the exact probability distribution of the number of peaks in random samples of size 10 and 8, respectively. Note that the homoscedastic exact and observed distributions of peaks in normalized stepwise residuals stand in close agreement, as expected, providing a crude guide to the amount of precision inherent in the other columns of observed probabilities.
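An exact c.d.f. of this kind can be computed from a standard fact about records. In an i.i.d. sample from a continuous distribution, the events "position $k$ is a peak", $k = 2, \ldots, N$, are independent with probabilities $1/k$. The sketch below is a hypothetical helper built on that fact with Python's `fractions` module. It is not the Goldfeld and Quandt recursion itself. For $N = 10$ it gives $P(\text{3 or fewer peaks}) \approx .9055$, the nominal value used in the text.

```python
from fractions import Fraction


def peak_cdf(N):
    """Exact c.d.f. of the number of peaks among N i.i.d. continuous variables.

    Uses the record-value fact that the peak indicators at positions k = 2, ..., N
    are independent Bernoulli(1/k) variables.
    """
    pmf = [Fraction(1)]                      # pmf[m] = P(m peaks among the first k values)
    for k in range(2, N + 1):
        p = Fraction(1, k)
        nxt = [Fraction(0)] * (len(pmf) + 1)
        for m, q in enumerate(pmf):
            nxt[m] += q * (1 - p)            # position k is not a peak
            nxt[m + 1] += q * p              # position k is a peak
        pmf = nxt
    cdf, acc = [], Fraction(0)
    for q in pmf:
        acc += q
        cdf.append(acc)
    return cdf


F10 = peak_cdf(10)
print(float(F10[3]))                         # P(3 or fewer peaks) when N = 10
print(float(1 - F10[3]))                     # nominal size of the "4 or more peaks" region
```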

The "peak-test" which treats the least squares residuals as if they were independent and identically distributed errs substantially in the size of the test. Thus, taking 4 or more peaks among the 10 residuals as the critical region gives a nominal significance level $\alpha_4 = 1 - .9055 = .0945$, while the actual size of the test is approximately $1 - .9710 \approx .03$; and if any of the three heteroscedastic models obtained, then the probability of rejecting homoscedasticity would be at best approximately .05 (less than the nominal size of the test). Applied to normalized stepwise residuals, the critical region of 4 or more peaks has size $1 - .9385 = .0615$, and the probability of detecting the alternative $\sigma^2_{y\cdot x} = 2x$ is approximately $1 - .709 \approx .29$.

Since only the errors $\epsilon$ were simulated in this Monte Carlo operation, the estimated distributions under the heteroscedastic models in Table 1 must be regarded as estimates of conditional probabilities, the condition being that $\operatorname{sgn}(\hat\beta) = \operatorname{sgn}(\beta)$. With independent, normally distributed heteroscedastic errors

where $\Phi(\cdot)$ denotes the standard cumulative normal distribution.
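The display giving this probability is missing from the excerpt. Under the stated assumptions (independent normal errors with variances $\sigma_i^2$, the usual least squares slope, and $\beta \ne 0$), one form it can take is the following. This is a reconstruction, not the authors' display:

$$\hat\beta = \frac{\sum_{i=1}^{N}(x_i-\bar x)Y_i}{\sum_{i=1}^{N}(x_i-\bar x)^2} \sim N\!\left(\beta,\; \frac{\sum_{i=1}^{N}(x_i-\bar x)^2\sigma_i^2}{\left[\sum_{i=1}^{N}(x_i-\bar x)^2\right]^2}\right), \qquad P\{\operatorname{sgn}(\hat\beta)=\operatorname{sgn}(\beta)\} = \Phi\!\left(\frac{|\beta|\sum_{i=1}^{N}(x_i-\bar x)^2}{\sqrt{\sum_{i=1}^{N}(x_i-\bar x)^2\sigma_i^2}}\right).$$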

ACKNOWLEDGMENT

The authors are grateful to the referees, associate editor, and editor for several helpful suggestions.

References

[1] Goldfeld, S. M. and Quandt, R. E. "Some tests for homoscedasticity," Journal of the American Statistical Association, 60(1965)539-47.

[2] Goldfeld, S. M. and Quandt, R. E. Corrigenda, Journal of the American Statistical Association, 62(1967)1518.

[3] Hogg, R. V. "On the resolution of statistical hypotheses," Journal of the American Statistical Association, 56(1961)978-89.

[4] Hogg, R. V. "Iterated tests of the equality of several distributions," Journal of the American Statistical Association, 57(1962)579-85.

[5] Koerts, J. "Some further notes on disturbance estimates in regression analysis," Journal of the American Statistical Association, 62(1967)169-83.

[6] Putter, J. "Orthonormal bases of error spaces and their use for investigating the normality and variances of residuals," Journal of the American Statistical Association, 62(1967)1022-36.

[7] Theil, H. "The analysis of disturbances in regression analysis," Journal of the American Statistical Association, 60(1965)1067-79.

[8] Theil, H. "A simplification of the BLUS procedure for analyzing regression disturbances," Journal of the American Statistical Association, 63(1968)242-51.
