


Material Type: Notes; Class: LINEAR REGRESSION ANALYSIS; Subject: Mathematics; University: University of Texas - Austin; Term: Fall 2004;



M 374G/384G, Fall 2004
SELECTING TERMS (Supplement to Section 11.5)
Consider a regression problem where $E(Y \mid \mathbf{x}) = \boldsymbol{\eta}^T \mathbf{u}$ is the correct model for the mean function. Often such a model has too many terms to be usable. Can some terms be deleted without important loss of information?
One problem that might result from dropping terms is that the resulting estimator might be biased. For example, if the correct mean model is

$$E(Y \mid \mathbf{x}) = \eta_0 + \eta_1 u_1 + \eta_2 u_2 + \dots + \eta_{k-1} u_{k-1}, \quad \text{where } \eta_{k-1} \neq 0,$$

and we fit the model

$$E(Y \mid \mathbf{x}) = \gamma_0 + \gamma_1 u_1 + \gamma_2 u_2 + \dots + \gamma_{k-2} u_{k-2}$$

by least squares to get fitted values $\hat{y}_i$, then (since the least squares estimates are unbiased for the model used),

$$E(\hat{y}_i) = \gamma_0 + \gamma_1 u_{i1} + \gamma_2 u_{i2} + \dots + \gamma_{k-2} u_{i,k-2},$$

which might not be the same as

$$\eta_0 + \eta_1 u_{i1} + \eta_2 u_{i2} + \dots + \eta_{k-1} u_{i,k-1} = E(Y \mid \mathbf{x}_i).$$
The difference between the expected value of the estimate and the parameter being estimated is called the bias of the estimator:

$$\mathrm{bias}(\hat{y}_i) = E(\hat{y}_i) - E(Y \mid \mathbf{x}_i).$$
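The bias of a fitted value can be computed exactly when the true mean is known, because $\hat{y}$ is a linear function of the responses. The sketch below is only an illustration: the design, the coefficient values, the seed, and the helper name `fitted_value_bias` are all made up for the example and do not come from the notes.

```python
import numpy as np

def fitted_value_bias(U_fit, mean_true):
    """Exact bias E(y_hat_i) - E(Y | x_i) when the columns of U_fit are fit by least squares."""
    # Hat matrix of the fitted model; E(y_hat) = H @ E(Y) because y_hat = H @ y.
    H = U_fit @ np.linalg.pinv(U_fit)
    return H @ mean_true - mean_true

rng = np.random.default_rng(0)                # arbitrary seed, illustration only
n = 50
u1 = rng.normal(size=n)
u2 = 0.5 * u1 + rng.normal(size=n)            # hypothetical second term, correlated with u1
U_full = np.column_stack([np.ones(n), u1, u2])
eta = np.array([1.0, 2.0, 3.0])               # hypothetical true coefficients, last one nonzero
mean_true = U_full @ eta                      # E(Y | x_i) under the correct model

bias_full = fitted_value_bias(U_full, mean_true)         # fitting the correct model
bias_sub = fitted_value_bias(U_full[:, :2], mean_true)   # fitting with the last term dropped

print("largest |bias|, full model:", np.max(np.abs(bias_full)))
print("largest |bias|, submodel:  ", np.max(np.abs(bias_sub)))
```

Fitting the correct model gives bias that is zero up to rounding, while dropping the term with a nonzero coefficient leaves a bias that differs from case to case.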
However, dropping terms might also reduce the variance. Sometimes having biased estimates is the lesser of two evils. One way to address this problem is to evaluate the model by a measure that includes both bias and variance. This is the mean squared error: the expected value of the square of the error between the fitted value (for the submodel) and the true conditional mean at $\mathbf{x}_i$:

$$\mathrm{MSE}(\hat{y}_i) = E\bigl([\hat{y}_i - E(Y \mid \mathbf{x}_i)]^2\bigr).$$

Please note: Do not confuse this with another use of MSE -- to denote RSS/df = Mean Square for Residuals (on the regression ANOVA table).
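We would like $\mathrm{MSE}(\hat{y}_i)$ to be small. To understand MSE better, we will examine, for fixed $i$, the variance of $\hat{y}_i - E(Y \mid \mathbf{x}_i)$:

$$\begin{aligned}
\mathrm{Var}\bigl(\hat{y}_i - E(Y \mid \mathbf{x}_i)\bigr)
&= E\bigl([\hat{y}_i - E(Y \mid \mathbf{x}_i)]^2\bigr) - \bigl[E\bigl(\hat{y}_i - E(Y \mid \mathbf{x}_i)\bigr)\bigr]^2 \\
&= \mathrm{MSE}(\hat{y}_i) - \bigl[E(\hat{y}_i) - E(Y \mid \mathbf{x}_i)\bigr]^2 \\
&= \mathrm{MSE}(\hat{y}_i) - [\mathrm{bias}(\hat{y}_i)]^2.
\end{aligned}$$

Also, since $E(Y \mid \mathbf{x}_i)$ is constant,

$$\mathrm{Var}\bigl(\hat{y}_i - E(Y \mid \mathbf{x}_i)\bigr) = \mathrm{Var}(\hat{y}_i).$$

Thus,

$$\mathrm{MSE}(\hat{y}_i) = \mathrm{Var}(\hat{y}_i) + [\mathrm{bias}(\hat{y}_i)]^2.$$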
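The decomposition of MSE into variance plus squared bias can be checked by simulation. The following is a minimal sketch under made-up assumptions (normal errors with $\sigma = 1$, a hypothetical three-term true model, an arbitrary seed). It fits a two-term submodel to many simulated response vectors and compares the two sides case by case.

```python
import numpy as np

rng = np.random.default_rng(1)                 # arbitrary seed, illustration only
n, reps, sigma = 30, 2000, 1.0                 # hypothetical sample size and error SD
u1 = rng.normal(size=n)
u2 = 0.5 * u1 + rng.normal(size=n)
U_full = np.column_stack([np.ones(n), u1, u2])
U_sub = U_full[:, :2]                          # submodel: intercept and u1 only
mean_true = U_full @ np.array([1.0, 2.0, 0.5]) # hypothetical E(Y | x_i)

H_sub = U_sub @ np.linalg.pinv(U_sub)          # hat matrix of the submodel
Y = mean_true + sigma * rng.normal(size=(reps, n))  # each row is one simulated sample
fits = Y @ H_sub.T                             # row r holds the fitted values for sample r

# Simulation versions of MSE, Var, and bias for each case i (columns).
mse = np.mean((fits - mean_true) ** 2, axis=0)
var = np.var(fits, axis=0)                     # divides by reps, matching the mean above
bias = np.mean(fits, axis=0) - mean_true

print("max |MSE - (Var + bias^2)|:", np.max(np.abs(mse - (var + bias ** 2))))
```

With the variance computed using divisor `reps`, the identity holds for the simulated moments up to rounding error, not just approximately.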
So MSE really is a combined measure of variance and bias. Now (see Section 10.1.5)

$$\mathrm{Var}(\hat{\eta}_j) = \frac{\sigma^2}{SU_jU_j} \cdot \frac{1}{1 - R_j^2},$$

where $SU_jU_j$ is defined like $SXX$, and $R_j^2$ is the coefficient of multiple determination for the regression of $u_j$ on the other terms in the model. Notice that the first factor is independent of the other terms. Adding a term usually increases $R_j^2$; deleting one usually decreases $R_j^2$. Thus adding a term usually increases $\mathrm{Var}(\hat{\eta}_j)$; deleting a term usually decreases $\mathrm{Var}(\hat{\eta}_j)$ (i.e., gives a more precise estimate of $\eta_j$). Since $\hat{y}_i$ is a linear combination of the $\hat{\eta}_j$'s, the effect will be the same for $\mathrm{Var}(\hat{y}_i)$.
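The effect of an added term on the variance of a coefficient estimate can be seen numerically. The sketch below is illustrative only: the data are simulated, and the helper name `coef_variance` is made up. It evaluates $\sigma^2 / (SU_jU_j (1 - R_j^2))$ with and without a second, correlated term and compares the result with the corresponding diagonal entry of $\sigma^2 (U^T U)^{-1}$.

```python
import numpy as np

def coef_variance(U, j, sigma2=1.0):
    """Variance of the j-th coefficient estimate as sigma^2 / (SU_jU_j * (1 - R_j^2)).

    Assumes column 0 of U is the intercept column and j >= 1.
    """
    uj = U[:, j]
    others = np.delete(U, j, axis=1)
    # Regress u_j on the other terms to get R_j^2.
    coef, *_ = np.linalg.lstsq(others, uj, rcond=None)
    resid = uj - others @ coef
    suu = np.sum((uj - uj.mean()) ** 2)        # SU_jU_j, defined like SXX
    r2 = 1.0 - (resid @ resid) / suu
    return sigma2 / (suu * (1.0 - r2))

rng = np.random.default_rng(2)                 # arbitrary seed, illustration only
n = 40
u1 = rng.normal(size=n)
u2 = 0.8 * u1 + 0.6 * rng.normal(size=n)       # hypothetical term correlated with u1
U_small = np.column_stack([np.ones(n), u1])
U_big = np.column_stack([np.ones(n), u1, u2])

var_small = coef_variance(U_small, 1)          # u1 alone: R_1^2 = 0
var_big = coef_variance(U_big, 1)              # u1 with u2 also in the model
direct_big = np.linalg.inv(U_big.T @ U_big)[1, 1]   # same quantity from (U'U)^{-1}, sigma^2 = 1

print(var_small, var_big, direct_big)
```

In this example the first factor $\sigma^2 / SU_1U_1$ is the same in both fits; only $R_1^2$ changes when $u_2$ enters the model.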
Summarizing: Deleting a term typically decreases $\mathrm{Var}(\hat{y}_i)$ but increases bias. So we want to play these effects off against each other by minimizing $\mathrm{MSE}(\hat{y}_i)$. But we need to do this minimization for all $i$'s, so we consider the total mean squared error for the submodel $I$:

$$J_I = \sum_{i=1}^{n} \mathrm{MSE}(\hat{y}_i) = \sum_{i=1}^{n} \bigl\{\mathrm{Var}(\hat{y}_i) + [\mathrm{bias}(\hat{y}_i)]^2\bigr\}.$$
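We want this to be small. Since it's a parameter, we need to estimate it. It works better to estimate the total normed mean squared error

$$\gamma_I = \frac{J_I}{\sigma^2} = k_I + \frac{1}{\sigma^2} \sum_{i=1}^{n} [\mathrm{bias}(\hat{y}_i)]^2,$$

where $J_I$ is the total mean squared error for the submodel $I$ and $k_I$ is the number of terms in that submodel. The estimate of $\gamma_I$ is Mallows' statistic

$$C_I = \frac{RSS_I}{\hat{\sigma}^2} + 2k_I - n = k_I + \frac{(n - k_I)(\hat{\sigma}_I^2 - \hat{\sigma}^2)}{\hat{\sigma}^2},$$

where $RSS_I$ and $\hat{\sigma}_I^2$ are the residual sum of squares and the estimate of $\sigma^2$ from the submodel, and $\hat{\sigma}^2$ is the estimate of $\sigma^2$ from the full model.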
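One way to see where $k_I$ enters the total normed mean squared error is sketched below. It uses notation that does not appear in the notes and is introduced here only for the derivation: $H_I$ is the hat matrix of the submodel, so that $\hat{\mathbf{y}} = H_I \mathbf{y}$, and the errors are assumed uncorrelated with common variance $\sigma^2$.

```latex
\begin{align*}
\sum_{i=1}^{n} \operatorname{Var}(\hat{y}_i)
  &= \operatorname{tr}\bigl(\operatorname{Var}(H_I \mathbf{y})\bigr)
   = \sigma^2 \operatorname{tr}\bigl(H_I H_I^T\bigr)
   = \sigma^2 \operatorname{tr}(H_I)
   = k_I \sigma^2, \\
\frac{1}{\sigma^2} \sum_{i=1}^{n} \operatorname{MSE}(\hat{y}_i)
  &= k_I + \frac{1}{\sigma^2} \sum_{i=1}^{n} \bigl[\operatorname{bias}(\hat{y}_i)\bigr]^2 .
\end{align*}
```

The second line equals $k_I$ exactly when every fitted value is unbiased, and exceeds $k_I$ otherwise.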
Thus we can use Mallows' statistic to help identify good candidates for submodels by looking for submodels where $C_I$ is both
(i) small (suggesting small total error)
and
(ii) $\le k_I$ (suggesting small bias).
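The two criteria can be applied by computing the statistic for every candidate submodel. The sketch below is an illustration, not the procedure used by Arc or Minitab. It assumes the common form $C_I = RSS_I/\hat{\sigma}^2 + 2k_I - n$ with $\hat{\sigma}^2$ taken from the full model, and it uses simulated data with made-up coefficients. The helper names `rss` and `mallows_c` are hypothetical.

```python
import itertools
import numpy as np

def rss(U, y):
    """Residual sum of squares from the least squares fit of y on the columns of U."""
    coef, *_ = np.linalg.lstsq(U, y, rcond=None)
    r = y - U @ coef
    return float(r @ r)

def mallows_c(U_full, y, cols):
    """C_I = RSS_I / sigma_hat^2 + 2 k_I - n, with sigma_hat^2 from the full model."""
    n, k = U_full.shape
    sigma2_hat = rss(U_full, y) / (n - k)
    k_I = len(cols)
    return rss(U_full[:, list(cols)], y) / sigma2_hat + 2 * k_I - n

rng = np.random.default_rng(3)                 # arbitrary seed, illustration only
n = 100
u1, u2, u3 = rng.normal(size=(3, n))
U_full = np.column_stack([np.ones(n), u1, u2, u3])
# Hypothetical truth: u3 has coefficient 0, so submodels without u3 are unbiased.
y = 1.0 + 2.0 * u1 + 3.0 * u2 + rng.normal(size=n)

results = {}
for size in range(0, 4):
    for extra in itertools.combinations([1, 2, 3], size):
        cols = (0,) + extra                    # always keep the intercept (column 0)
        results[cols] = mallows_c(U_full, y, cols)

# List submodels from smallest to largest C_I, with k_I alongside for comparison.
for cols, c in sorted(results.items(), key=lambda kv: kv[1]):
    print(cols, "k_I =", len(cols), "C_I =", round(c, 2))
```

For the full model the statistic equals the number of terms exactly, so criterion (ii) is only informative for proper submodels.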
Comments:
routine. Arc gives it in both Forward selection and Backward elimination. Other software (e.g., Minitab) may use different procedures for Forward and Backward selection/elimination, but give Mallows' statistic in another routine.
Since $C_I$ is a statistic, it will have sampling variability. It might happen, for example, that $C_I$ is negative, which would suggest small bias. It also might happen that $C_I$ is larger than $k_I$ even when the model is unbiased, but there is no way to distinguish this situation from a case where there is bias but $C_I$ happens to be less than $\gamma_I$.