Regression Problem - Linear Regression Analysis | M 374G, Study notes of Mathematics

Material Type: Notes; Class: LINEAR REGRESSION ANALYSIS; Subject: Mathematics; University: University of Texas - Austin; Term: Fall 2004;

Typology: Study notes

Pre 2010

Uploaded on 08/30/2009

koofers-user-kdq-1
M 374G/384G, Fall 2004
SELECTING TERMS (Supplement to Section 11.5)
Consider a regression problem where $E(Y \mid \mathbf{x}) = \boldsymbol{\eta}^T\mathbf{u}$ is the correct model for the mean function. Often such a model has too many terms to be usable. Can some terms be deleted without important loss of information?
One problem that might result from dropping terms is that the resulting mean estimator might be biased. For example, if the correct model is

$$E(Y \mid \mathbf{x}) = \eta_0 + \eta_1 u_1 + \eta_2 u_2 + \dots + \eta_{k-1} u_{k-1},$$

where $\eta_{k-1} \neq 0$, and we fit the model

$$E(Y \mid \mathbf{x}) = \gamma_0 + \gamma_1 u_1 + \gamma_2 u_2 + \dots + \gamma_{k-2} u_{k-2}$$

by least squares to get fitted values $\hat{y}_i$, then (since the least squares estimates are unbiased for the model used)

$$E(\hat{y}_i) = \gamma_0 + \gamma_1 u_{i1} + \gamma_2 u_{i2} + \dots + \gamma_{k-2} u_{i,k-2},$$

which might not be the same as

$$\eta_0 + \eta_1 u_{i1} + \eta_2 u_{i2} + \dots + \eta_{k-1} u_{i,k-1} = E(Y \mid \mathbf{x}_i).$$

The difference between the expected value of the estimate and the parameter being estimated is called the bias of the estimator:

$$\operatorname{bias}(\hat{y}_i) = E(\hat{y}_i) - E(Y \mid \mathbf{x}_i).$$
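The bias of a submodel's fitted values can be computed exactly when the true mean is known. The sketch below is illustrative only: the design matrix, the coefficients, and the helper name `fitted_value_bias` are assumptions, not part of the notes. It uses the fact that least squares fitted values are a projection of the responses, so their expectation is the projection of the true means.

```python
import numpy as np

def fitted_value_bias(U_full, eta, keep):
    """Exact bias of submodel fitted values: E(y_hat_i) - E(Y | x_i).

    U_full : (n, k) matrix of terms for the correct model (first column all ones)
    eta    : (k,) true coefficients of the correct model
    keep   : indices of the columns retained in the submodel
    """
    mu = U_full @ eta                    # true conditional means E(Y | x_i)
    U_sub = U_full[:, keep]
    # Projection (hat) matrix of the submodel: y_hat = H y, hence E(y_hat) = H mu
    H = U_sub @ np.linalg.pinv(U_sub)
    return H @ mu - mu

# Hypothetical data, for illustration only
rng = np.random.default_rng(0)
n = 50
u1 = rng.normal(size=n)
u2 = 0.8 * u1 + rng.normal(size=n)       # u2 is correlated with u1
U = np.column_stack([np.ones(n), u1, u2])
eta = np.array([1.0, 2.0, -1.5])         # the coefficient of u2 is nonzero

bias_sub = fitted_value_bias(U, eta, keep=[0, 1])      # u2 dropped
bias_full = fitted_value_bias(U, eta, keep=[0, 1, 2])  # correct model

print("largest |bias|, submodel:  ", np.abs(bias_sub).max())
print("largest |bias|, full model:", np.abs(bias_full).max())
```

Under these assumptions the correct model has zero bias up to rounding, while the submodel's fitted values are biased at most points.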
However, dropping terms might also reduce the variance. Sometimes having biased estimates is the lesser of two evils. One way to address this problem is to evaluate the model by a measure that includes both bias and variance: the mean squared error. The mean squared error of a fitted value is the expected value of the squared error between the fitted value (for the submodel) and the true conditional mean at $\mathbf{x}_i$:

$$\operatorname{MSE}(\hat{y}_i) = E\big([\hat{y}_i - E(Y \mid \mathbf{x}_i)]^2\big).$$

Please note: Do not confuse this with another use of MSE, which denotes RSS/df = Mean Square for Residuals (on the regression ANOVA table).
We would like $\operatorname{MSE}(\hat{y}_i)$ to be small. To understand MSE better, we will examine, for fixed $i$, the variance of $\hat{y}_i - E(Y \mid \mathbf{x}_i)$:

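$$\begin{aligned}
\operatorname{Var}\big(\hat{y}_i - E(Y \mid \mathbf{x}_i)\big) &= E\big([\hat{y}_i - E(Y \mid \mathbf{x}_i)]^2\big) - \big[E\big(\hat{y}_i - E(Y \mid \mathbf{x}_i)\big)\big]^2 \\
&= \operatorname{MSE}(\hat{y}_i) - \big[E(\hat{y}_i) - E(Y \mid \mathbf{x}_i)\big]^2 \\
&= \operatorname{MSE}(\hat{y}_i) - [\operatorname{bias}(\hat{y}_i)]^2.
\end{aligned}$$

Also, since $E(Y \mid \mathbf{x}_i)$ is constant,

$$\operatorname{Var}\big(\hat{y}_i - E(Y \mid \mathbf{x}_i)\big) = \operatorname{Var}(\hat{y}_i).$$

Thus,

$$\operatorname{MSE}(\hat{y}_i) = \operatorname{Var}(\hat{y}_i) + [\operatorname{bias}(\hat{y}_i)]^2.$$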
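The decomposition of MSE into variance plus squared bias can be checked numerically. In the sketch below, the target value and the distribution of the estimator are made-up assumptions; the point is only that the sample versions of the three quantities satisfy the same identity.

```python
import numpy as np

rng = np.random.default_rng(1)
target = 3.0   # plays the role of the constant E(Y | x_i)

# Hypothetical draws of a biased estimator, centered at 3.4 instead of 3.0
draws = rng.normal(loc=3.4, scale=0.5, size=10_000)

mse = np.mean((draws - target) ** 2)   # average squared error about the target
var = np.var(draws)                    # variance with divisor n (ddof=0)
bias = np.mean(draws) - target         # average estimate minus the target

print("MSE          :", mse)
print("Var + bias^2 :", var + bias ** 2)
```

With the divisor-$n$ variance the two printed numbers agree up to floating-point rounding, since the identity is algebraic rather than approximate.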

So MSE really is a combined measure of variance and bias. Now (see Section 10.1.5)

$$\operatorname{Var}(\hat{\eta}_j) = \frac{\sigma^2}{SU_jU_j} \cdot \frac{1}{1 - R_j^2},$$

where $SU_jU_j$ is defined like $SXX$, and $R_j^2$ is the coefficient of multiple determination for the regression of $u_j$ on the other terms in the model. Notice that the first factor is independent of the other terms. Adding a term usually increases $R_j^2$; deleting one usually decreases $R_j^2$. Thus adding a term usually increases $\operatorname{Var}(\hat{\eta}_j)$; deleting a term usually decreases $\operatorname{Var}(\hat{\eta}_j)$ (i.e., gives a more precise estimate $\hat{\eta}_j$). Since $\hat{y}_i$ is a linear combination of the $\hat{\eta}_j$'s, the effect will be the same for $\operatorname{Var}(\hat{y}_i)$.
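The factored variance formula can be compared with the direct matrix expression $\sigma^2[(U^TU)^{-1}]_{jj}$. The following is a minimal sketch under assumed data. The function name `coef_variance_two_ways` and the simulated terms are hypothetical, and $\sigma^2$ is set to 1 for convenience.

```python
import numpy as np

def coef_variance_two_ways(U, j, sigma2=1.0):
    """Variance of the j-th coefficient estimate, computed two ways.

    U : (n, k) matrix of terms with the intercept column first
    j : index (>= 1) of a non-intercept term
    """
    # Direct: sigma^2 times the j-th diagonal entry of (U^T U)^{-1}
    direct = sigma2 * np.linalg.inv(U.T @ U)[j, j]

    # Factored: regress u_j on the other terms to get R_j^2
    u_j = U[:, j]
    others = np.delete(U, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, u_j, rcond=None)
    resid = u_j - others @ coef
    su_ju_j = np.sum((u_j - u_j.mean()) ** 2)       # defined like SXX
    r2_j = 1.0 - (resid @ resid) / su_ju_j
    factored = (sigma2 / su_ju_j) * (1.0 / (1.0 - r2_j))
    return direct, factored, r2_j

# Hypothetical terms: u2 is strongly correlated with u1
rng = np.random.default_rng(2)
n = 40
u1 = rng.normal(size=n)
u2 = 0.9 * u1 + 0.3 * rng.normal(size=n)
U_full = np.column_stack([np.ones(n), u1, u2])
U_sub = U_full[:, [0, 1]]                           # u2 deleted

d_full, f_full, r2_full = coef_variance_two_ways(U_full, j=1)
d_sub, f_sub, r2_sub = coef_variance_two_ways(U_sub, j=1)

print("with u2:    Var =", d_full, " R_1^2 =", r2_full)
print("without u2: Var =", d_sub, " R_1^2 =", r2_sub)
```

In this setup deleting `u2` drops $R_1^2$ to zero, so the variance of the coefficient of `u1` falls to the first factor $\sigma^2/SU_1U_1$ alone.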

Summarizing: Deleting a term typically decreases $\operatorname{Var}(\hat{y}_i)$ but increases bias. So we want to play these effects off against each other by minimizing $\operatorname{MSE}(\hat{y}_i)$. But we need to do this minimization for all $i$'s, so we consider the total mean squared error

$$J = \sum_{i=1}^{n} \operatorname{MSE}(\hat{y}_i) = \sum_{i=1}^{n} \big\{\operatorname{Var}(\hat{y}_i) + [\operatorname{bias}(\hat{y}_i)]^2\big\}.$$
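When the true mean function is known, $J$ can be evaluated exactly for any submodel, which makes the trade-off concrete. This sketch assumes independent errors with constant variance $\sigma^2$, so that $\operatorname{Var}(\hat{y}_i) = \sigma^2 h_{ii}$ with $h_{ii}$ the diagonal of the submodel's hat matrix. The data, coefficients, and the name `total_mse` are illustrative assumptions.

```python
import numpy as np

def total_mse(U_full, eta, keep, sigma2):
    """Total mean squared error J of the submodel that keeps columns `keep`."""
    mu = U_full @ eta                       # true means E(Y | x_i)
    U_sub = U_full[:, keep]
    H = U_sub @ np.linalg.pinv(U_sub)       # hat matrix of the submodel
    var_i = sigma2 * np.diag(H)             # Var(y_hat_i) = sigma^2 * h_ii
    bias_i = H @ mu - mu                    # E(y_hat_i) - E(Y | x_i)
    return float(np.sum(var_i + bias_i ** 2))

# Hypothetical design with an intercept and two terms
rng = np.random.default_rng(4)
n, sigma2 = 30, 1.0
U = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])

# Case 1: the coefficient of u2 is small, so dropping u2 costs little bias
eta_small = np.array([1.0, 2.0, 0.1])
J_full_small = total_mse(U, eta_small, [0, 1, 2], sigma2)
J_sub_small = total_mse(U, eta_small, [0, 1], sigma2)

# Case 2: the coefficient of u2 is large, so dropping u2 costs a lot of bias
eta_big = np.array([1.0, 2.0, 1.0])
J_full_big = total_mse(U, eta_big, [0, 1, 2], sigma2)
J_sub_big = total_mse(U, eta_big, [0, 1], sigma2)

print("small coefficient: J full =", J_full_small, " J sub =", J_sub_small)
print("large coefficient: J full =", J_full_big, " J sub =", J_sub_big)
```

For the correct model the bias is zero and $J$ reduces to $\sigma^2$ times the number of terms, so the submodel wins only when its added squared bias is smaller than the variance it saves.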

We want this to be small. Since it's a parameter, we need to estimate it. It works better to estimate the total normed mean squared error

$$\gamma \ (\text{or } \Gamma) = J/\sigma^2.$$

$$C_I = k_I + (n - k_I)\frac{\hat{\sigma}_I^2}{\hat{\sigma}^2} - (n - k_I) = \frac{RSS_I}{\hat{\sigma}^2} + 2k_I - n.$$

Thus we can use Mallows' statistic to help identify good candidates for submodels by looking for submodels where $C_I$ is both

(i) small (suggesting small total error)

and

(ii) $\le k_I$ (suggesting small bias).
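The screening rule can be sketched in a few lines. The code below is not the routine used by Arc or Minitab. It is a hand-rolled illustration on simulated data, and it assumes the usual convention that $\hat{\sigma}^2$ is the residual mean square of the full model. The helper names `rss` and `mallows_c` are hypothetical.

```python
from itertools import combinations
import numpy as np

def rss(U, y):
    """Residual sum of squares from the least squares fit of y on the columns of U."""
    coef, *_ = np.linalg.lstsq(U, y, rcond=None)
    resid = y - U @ coef
    return float(resid @ resid)

def mallows_c(U_full, y, keep):
    """C_I = RSS_I / sigma_hat^2 + 2 k_I - n for the submodel using columns `keep`."""
    n, k = U_full.shape
    sigma2_hat = rss(U_full, y) / (n - k)   # assumption: estimated from the full model
    return rss(U_full[:, keep], y) / sigma2_hat + 2 * len(keep) - n

# Hypothetical data: the third term has no effect on the mean
rng = np.random.default_rng(3)
n = 60
U = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = U @ np.array([1.0, 2.0, -1.0, 0.0]) + rng.normal(size=n)

# Evaluate every submodel that keeps the intercept (column 0)
results = {}
for size in range(0, 4):
    for extra in combinations([1, 2, 3], size):
        keep = [0, *extra]
        results[tuple(keep)] = mallows_c(U, y, keep)

# List submodels from smallest to largest C_I, alongside k_I
for keep, c in sorted(results.items(), key=lambda kv: kv[1]):
    print(keep, "k_I =", len(keep), "C_I =", round(c, 2))
```

With this convention the full model always has $C_I$ equal to its number of terms, so it is the submodels that carry the information. Candidates are those near the top of the printed list whose $C_I$ does not exceed $k_I$.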

Comments:

  1. Mallows' statistic is provided by many software packages in some model-selection routine. Arc gives it in both Forward selection and Backward elimination. Other software (e.g., Minitab) may use different procedures for Forward and Backward selection/elimination, but give Mallows' statistic in another routine.

  2. Since $C_I$ is a statistic, it will have sampling variability. It might happen, for example, that $C_I$ is negative, which would suggest small bias. It also might happen that $C_I$ is larger than $k_I$ even when the model is unbiased, but there is no way to distinguish this situation from a case where there is bias but $C_I$ happens to be less than $\gamma_I$.