

Material Type: Assignment; Professor: Liang; Class: Statistical Learning; Subject: Statistics; University: University of Illinois - Urbana-Champaign; Term: Fall 2008;


STAT 542 Fall 2008
Due Tuesday, September 16
Assume $\sigma^2$ is known. Show that the AIC criterion and Mallows' $C_p$ are equivalent. Recall that
$$\text{AIC} = -2\,(\text{log likelihood evaluated at MLE}) + 2\,(\text{dim of the model}),$$
$$C_p = \text{residual sum of squares} + 2\sigma^2\,(\text{dim of the model}).$$
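The equivalence can be checked numerically before proving it. The sketch below assumes a Gaussian linear model with known error variance and uses synthetic data; the names `aic_known_sigma`, `mallows_cp` and the nested candidate models are illustrative choices, not part of the assignment. Under these assumptions, $\sigma^2 \cdot \text{AIC}$ and $C_p$ differ only by a constant that does not depend on the model, so both criteria pick the same model.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def aic_known_sigma(X, y, sigma2):
    """AIC = -2 * (Gaussian log likelihood at the MLE) + 2 * dim, sigma^2 known."""
    n, d = X.shape
    loglik = -0.5 * n * np.log(2 * np.pi * sigma2) - rss(X, y) / (2 * sigma2)
    return -2 * loglik + 2 * d

def mallows_cp(X, y, sigma2):
    """Cp = RSS + 2 * sigma^2 * dim, in the form recalled above."""
    return rss(X, y) + 2 * sigma2 * X.shape[1]

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, p, sigma2 = 50, 6, 1.5
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=np.sqrt(sigma2), size=n)

# Nested candidate models: the first d columns of X, d = 1..p.
aic = np.array([aic_known_sigma(X[:, :d], y, sigma2) for d in range(1, p + 1)])
cp = np.array([mallows_cp(X[:, :d], y, sigma2) for d in range(1, p + 1)])

# sigma^2 * AIC and Cp differ by the same constant for every model.
const = n * sigma2 * np.log(2 * np.pi * sigma2)
print(np.allclose(sigma2 * aic - const, cp))
print(int(np.argmin(aic)) == int(np.argmin(cp)))
```

A numerical check of this kind is not a proof, but it makes clear which terms have to cancel in the written argument.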
Consider data $(y_1, x_1), \dots, (y_n, x_n)$, where $y_i \in \mathbb{R}$ and $x_i \in \mathbb{R}^p$. The least squares estimate of the p-dimensional regression coefficient is given by
$$\hat\beta = \arg\min_{\beta \in \mathbb{R}^p} \sum_{j=1}^{n} (y_j - x_j^t \beta)^2.$$
A basic tool for examining the fit is the leave-one-out residual: consider fitting the model
omitting the ith observation, with the corresponding LS estimate defined as
$$\hat\beta_{[i]} = \arg\min_{\beta \in \mathbb{R}^p} \sum_{j:\, j \neq i} (y_j - x_j^t \beta)^2,$$
and get a prediction for the omitted observation, $\hat y_{[i]} = x_i^t \hat\beta_{[i]}$. The leave-one-out residual
for the ith observation is then $(y_i - \hat y_{[i]})$.
We will show that
$$y_i - \hat y_{[i]} = \frac{y_i - x_i^t \hat\beta}{1 - H_{ii}},$$
where $H_{ii}$ denotes the (i, i)th entry of the projection matrix H. That is, the leave-one-out
residual is a rescaling of the original residual, hence it is not necessary to re-fit the
model each time an observation is omitted.
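The identity is easy to check numerically. The following sketch uses synthetic data and assumes the usual projection matrix $H = X (X^t X)^{-1} X^t$, where the rows of $X$ are the $x_i^t$; the variable names are illustrative. It compares the rescaled residuals against a brute-force refit that omits each observation in turn.

```python
import numpy as np

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(1)
n, p = 30, 4
X = rng.normal(size=(n, p))          # rows are x_i^t; full column rank with probability one
y = X @ rng.normal(size=p) + rng.normal(size=n)

# Full-data fit and projection (hat) matrix H = X (X^t X)^{-1} X^t.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
H = X @ np.linalg.solve(X.T @ X, X.T)
shortcut = (y - X @ beta_hat) / (1.0 - np.diag(H))

# Brute force: refit n times, each time omitting observation i.
brute = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    brute[i] = y[i] - X[i] @ beta_i

# Largest discrepancy between the two computations.
print(np.max(np.abs(brute - shortcut)))
```

The shortcut needs a single fit, while the brute-force loop needs n fits, which is the practical point of the result.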
We will prove this result via the following steps. Throughout, we assume all the design
matrices are of full rank, and all the LS estimates exist and are unique.
(a) Consider a new data set of n observations where $y_1^* = \hat y_{[1]} = x_1^t \hat\beta_{[1]}$ and the others
are the same as the original data. Let $\hat\beta^*$ denote the corresponding LS estimate
based on the new data. Show that $\hat\beta^* = \hat\beta_{[1]}$.
Hint: Show that $\hat\beta_{[1]}$ minimizes
$$(y_1^* - x_1^t \beta)^2 + \sum_{j=2}^{n} (y_j - x_j^t \beta)^2.$$
(b) Show that
$$\hat y_{[1]} = H_{11}\, \hat y_{[1]} + \sum_{j=2}^{n} H_{1j}\, y_j.$$
(c) Show that
$$y_1 - \hat y_{[1]} = \frac{y_1 - x_1^t \hat\beta}{1 - H_{11}}.$$
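Each of the three steps can be sanity-checked on synthetic data before it is proved. In the sketch below, the helper `ls` and the variable names are illustrative, and observation 1 of the text corresponds to index 0 in the arrays.

```python
import numpy as np

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(2)
n, p = 25, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

def ls(X, y):
    """Least squares estimate of the regression coefficient."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_1 = ls(X[1:], y[1:])      # LS estimate omitting the first observation
y_hat_1 = X[0] @ beta_1        # leave-one-out prediction for observation 1

# Step (a): replace y_1 by its leave-one-out prediction and refit on all n points.
y_star = y.copy()
y_star[0] = y_hat_1
beta_star = ls(X, y_star)
step_a = np.allclose(beta_star, beta_1)

# Step (b): the first fitted value of the new fit is the first row of H times y_star.
H = X @ np.linalg.solve(X.T @ X, X.T)
step_b = np.isclose(y_hat_1, H[0, 0] * y_hat_1 + H[0, 1:] @ y[1:])

# Step (c): rearranging (b) gives the rescaled residual.
beta_hat = ls(X, y)
step_c = np.isclose(y[0] - y_hat_1, (y[0] - X[0] @ beta_hat) / (1.0 - H[0, 0]))
print(step_a, step_b, step_c)
```

Step (a) is the key observation: the replaced first observation contributes zero squared error at $\hat\beta_{[1]}$, so it does not move the minimizer.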
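(a) Load the data into R, S-PLUS or MATLAB. Apply the following transformations to some
of the variables:
$$X_1^* = \log(X_1), \quad X_2^* = [\ldots], \quad X_3^* = \log(X_3), \quad X_5^* = \log(X_5), \quad X_6^* = \log(X_6),$$
$$X_7^* = [\ldots], \quad X_8^* = \log(X_8), \quad X_9^* = \log(X_9), \quad X_{10}^* = \log(X_{10}),$$
$$X_{11}^* = e^{[\ldots]}\,[\ldots], \quad X_{12}^* = [\ldots], \quad X_{13}^* = [\ldots], \quad X_{14}^* = \log(X_{14}).$$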
(b) Fit a multiple linear regression model to predict the last variable (log of the median
value of owner-occupied homes in $1000’s) from the other variables. Summarize
your analysis.
(c) Search through all the models and select the best sub-model using $C_p$, AIC and
BIC. Compare the results.
(d) Use the Zheng-Loh model selection method and compare to (c).
(e) Report the 10-fold cross-validation errors for the full model, and the stepwise search
procedure using AIC and BIC.
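For the exhaustive search in part (c), a minimal sketch in Python (rather than R, S-PLUS or MATLAB) might look as follows. It runs on synthetic stand-in data, not the assignment's data set, and makes several assumptions that should be adapted: every model includes an intercept, $C_p$ uses the error variance estimated from the full model, and AIC and BIC use the Gaussian forms $n \log(\text{RSS}/n) + 2d$ and $n \log(\text{RSS}/n) + d \log n$, which drop model-independent constants.

```python
import itertools
import numpy as np

# Synthetic stand-in data (an assumption for illustration only).
rng = np.random.default_rng(3)
n, p = 100, 6
X = rng.normal(size=(n, p))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

def rss(cols):
    """RSS of the LS fit on an intercept plus the predictors in cols."""
    Z = np.column_stack([np.ones(n), X[:, list(cols)]])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    return float(r @ r)

# Error variance estimated from the full model, used in Cp.
sigma2_hat = rss(range(p)) / (n - p - 1)

# Score all 2^p subsets of predictors.
scores = {}
for k in range(p + 1):
    for cols in itertools.combinations(range(p), k):
        d = k + 1   # model dimension counts the intercept
        r = rss(cols)
        scores[cols] = {
            "Cp": r + 2 * sigma2_hat * d,
            "AIC": n * np.log(r / n) + 2 * d,
            "BIC": n * np.log(r / n) + np.log(n) * d,
        }

# Best subset under each criterion.
best = {c: min(scores, key=lambda s: scores[s][c]) for c in ("Cp", "AIC", "BIC")}
for c, cols in best.items():
    print(c, cols)
```

Exhaustive search costs $2^p$ fits, which is feasible for the thirteen predictors here but grows quickly; that cost is the usual motivation for stepwise procedures.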
In addition to your answers, please hand in a printout of necessary code.
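For the 10-fold cross-validation in part (e), one possible structure is sketched below on synthetic data. The functions `forward_stepwise` and `cv_error` are hypothetical helpers, and the search is a greedy forward-only procedure, which is an assumption: the assignment does not say which stepwise variant is intended. Model selection is repeated inside every fold so that the reported error reflects the whole procedure.

```python
import numpy as np

def design(X, cols):
    """Design matrix with an intercept and the selected predictor columns."""
    return np.column_stack([np.ones(X.shape[0]), X[:, np.asarray(cols, dtype=int)]])

def fit(X, y, cols):
    beta, *_ = np.linalg.lstsq(design(X, cols), y, rcond=None)
    return beta

def forward_stepwise(X, y, penalty):
    """Greedy forward search minimizing n*log(RSS/n) + penalty * dim."""
    n, p = X.shape
    def crit(cols):
        r = y - design(X, cols) @ fit(X, y, cols)
        return n * np.log(r @ r / n) + penalty * (len(cols) + 1)
    chosen, best = [], crit([])
    while len(chosen) < p:
        score, j = min((crit(chosen + [j]), j) for j in range(p) if j not in chosen)
        if score >= best:      # stop when no single addition improves the criterion
            break
        chosen, best = chosen + [j], score
    return chosen

def cv_error(X, y, select, k=10, seed=0):
    """Mean squared prediction error over k folds; selection is redone per fold."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    sq_err = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        cols = select(X[train], y[train])
        beta = fit(X[train], y[train], cols)
        sq_err += np.sum((y[test] - design(X[test], cols) @ beta) ** 2)
    return sq_err / n

# Synthetic stand-in data (an assumption for illustration only).
rng = np.random.default_rng(4)
n, p = 200, 8
X = rng.normal(size=(n, p))
y = 0.5 + 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(size=n)

selectors = {
    "full": lambda Xt, yt: list(range(Xt.shape[1])),
    "AIC": lambda Xt, yt: forward_stepwise(Xt, yt, penalty=2.0),
    "BIC": lambda Xt, yt: forward_stepwise(Xt, yt, penalty=np.log(len(yt))),
}
errors = {name: cv_error(X, y, sel) for name, sel in selectors.items()}
for name, err in errors.items():
    print(name, err)
```

Using the same seed for all three procedures keeps the fold assignment identical, so the errors are compared on the same splits.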