Statistical Learning - Assignment 1 | STAT 542, Assignments of Statistics

Material Type: Assignment; Professor: Liang; Class: Statistical Learning; Subject: Statistics; University: University of Illinois - Urbana-Champaign; Term: Unknown 1989;

Uploaded on 03/16/2009

koofers-user-vx5

STAT 542: HOMEWORK 1

Problem 1

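Since $y_i \sim N(x_i^t\beta, \sigma^2)$, $i \in (1:n)$, the log-likelihood is:

\[
l(\hat\beta) = n \log\frac{1}{\sigma\sqrt{2\pi}} - \frac{\sum_{i=1}^{n} (y_i - x_i^t\hat\beta)^2}{2\sigma^2}.
\]

Then,

\[
\begin{aligned}
AIC &= -2\,l(\hat\beta) + 2p \\
    &= -2n \log\frac{1}{\sigma\sqrt{2\pi}} + \frac{\sum_{i=1}^{n} (y_i - x_i^t\hat\beta)^2}{\sigma^2} + 2p \\
    &= \frac{1}{\sigma^2}\left(\text{constant} + RSS + 2\sigma^2 p\right).
\end{aligned}
\]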

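Given $C_p = RSS + 2\sigma^2 p$ with $\sigma$ known, AIC and $C_p$ differ only by the positive factor $1/\sigma^2$ and an additive constant that does not depend on the model. A model therefore minimizes AIC if and only if it minimizes $C_p$, so the AIC criterion and Mallows' $C_p$ are equivalent.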
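The equivalence can be checked numerically. The sketch below is only an illustration: the sample size, the value of sigma, and the (p, RSS) pairs are hypothetical numbers, not data from the assignment, and the function names are made up for this example. It evaluates both criteria and confirms that AIC minus C_p / sigma^2 is the same for every candidate, so both criteria pick the same model.

```python
import math

def aic_known_sigma(rss, n, p, sigma):
    """AIC = -2 * log-likelihood + 2p for a Gaussian linear model with known sigma."""
    loglik = n * math.log(1.0 / (sigma * math.sqrt(2.0 * math.pi))) - rss / (2.0 * sigma**2)
    return -2.0 * loglik + 2.0 * p

def mallows_cp(rss, p, sigma):
    """C_p in the form used in the solution: RSS + 2 * sigma^2 * p."""
    return rss + 2.0 * sigma**2 * p

# Hypothetical candidate models: (number of parameters p, residual sum of squares).
n, sigma = 50, 1.5
candidates = [(1, 160.0), (2, 121.0), (3, 118.5), (4, 118.1)]

aic_values = [aic_known_sigma(rss, n, p, sigma) for p, rss in candidates]
cp_values = [mallows_cp(rss, p, sigma) for p, rss in candidates]

# AIC - C_p / sigma^2 should be the same model-independent constant for every candidate.
offsets = [a - c / sigma**2 for a, c in zip(aic_values, cp_values)]

# Index of the model selected by each criterion.
best_by_aic = min(range(len(candidates)), key=lambda k: aic_values[k])
best_by_cp = min(range(len(candidates)), key=lambda k: cp_values[k])

print(offsets)
print(best_by_aic == best_by_cp)
```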

Problem 2

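(a) By the definition of $\hat\beta^*$, we have:

\[
\hat\beta^* = \arg\min_{\beta} \; (y_1^* - x_1^t\beta)^2 + \sum_{j=2}^{n} (y_j - x_j^t\beta)^2.
\]

In order to prove $\hat\beta^* = \hat\beta_{[1]}$, we just need to show:

\[
\hat\beta_{[1]} = \arg\min_{\beta} \; (y_1^* - x_1^t\beta)^2 + \sum_{j=2}^{n} (y_j - x_j^t\beta)^2.
\]

As known,

\[
\hat\beta_{[1]} = \arg\min_{\beta} \sum_{j=2}^{n} (y_j - x_j^t\beta)^2.
\]

Plus, since $y_1^* = x_1^t\hat\beta_{[1]}$,

\[
(y_1^* - x_1^t\hat\beta_{[1]})^2 = 0.
\]

Both terms of the objective are nonnegative, and $\hat\beta_{[1]}$ minimizes the second term while making the first term zero, so it minimizes their sum. Then,

\[
\hat\beta_{[1]} = \arg\min_{\beta} \; (y_1^* - x_1^t\beta)^2 + \sum_{j=2}^{n} (y_j - x_j^t\beta)^2.
\]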
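The result in (a) can be illustrated with a small numerical check. The sketch below assumes a simple linear regression (intercept plus one slope) and uses made-up data; `fit_line` is a helper written for this example, not part of the assignment. It fits the model without the first observation, replaces y_1 by the resulting prediction, refits on all n points, and compares the two coefficient vectors.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y ~ b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.7]

# Leave-one-out fit: drop the first observation.
b0_loo, b1_loo = fit_line(xs[1:], ys[1:])

# Modified data: replace y_1 by its leave-one-out prediction y_1^*.
y1_star = b0_loo + b1_loo * xs[0]
b0_star, b1_star = fit_line(xs, [y1_star] + ys[1:])

# The two fits should agree up to floating-point error.
print(abs(b0_star - b0_loo), abs(b1_star - b1_loo))
```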

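(b) Let $Y^* = (\hat y_{[1]}, y_2, \cdots, y_n)^T$ and $\hat Y^* = (\hat y_1^*, \hat y_2^*, \cdots, \hat y_n^*)^T$. Since $\hat Y^* = X\hat\beta^*$ and $\hat\beta^* = \hat\beta_{[1]}$ by (a), we have

\[
\hat y_1^* = x_1^t\hat\beta^* = x_1^t\hat\beta_{[1]} = \hat y_{[1]}.
\]

Additionally, since the modified data have the same $X$ as the original data, the projection matrix $H$ stays the same, that is, $\hat Y^* = HY^*$. Then, we have

\[
\hat y_1^* = H_{11}\,\hat y_{[1]} + \sum_{j=2}^{n} H_{1j}\,y_j.
\]

Thus,

\[
\hat y_{[1]} = H_{11}\,\hat y_{[1]} + \sum_{j=2}^{n} H_{1j}\,y_j.
\]

Therefore,

\[
(1 - H_{11})\,\hat y_{[1]} = \sum_{j=2}^{n} H_{1j}\,y_j.
\]

And finally,

\[
\hat y_{[1]} = \frac{\sum_{j=2}^{n} H_{1j}\,y_j}{1 - H_{11}}.
\]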
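As a rough check of the formula in (b), the sketch below again assumes a simple linear regression with made-up data. For the design matrix with columns (1, x), the hat matrix has the standard closed form H_ij = 1/n + (x_i - x_bar)(x_j - x_bar)/S_xx, which `hat_row` uses; both helper functions are hypothetical names introduced for this example.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y ~ b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def hat_row(xs, i):
    """Row i of the hat matrix H for the design matrix with columns (1, x)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1.0 / n + (xs[i] - x_bar) * (x - x_bar) / sxx for x in xs]

# Hypothetical data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.7]

# Formula from (b): sum over j >= 2 of H_1j * y_j, divided by 1 - H_11.
h1 = hat_row(xs, 0)
y1_loo_formula = sum(h * y for h, y in zip(h1[1:], ys[1:])) / (1.0 - h1[0])

# Direct computation: refit without the first observation and predict at x_1.
b0_loo, b1_loo = fit_line(xs[1:], ys[1:])
y1_loo_direct = b0_loo + b1_loo * xs[0]

print(y1_loo_formula, y1_loo_direct)
```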

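(c) By (b), we have

\[
\hat y_{[1]} = \frac{\sum_{j=2}^{n} H_{1j}\,y_j}{1 - H_{11}}.
\]

And then,

\[
\begin{aligned}
y_1 - \hat y_{[1]} &= \frac{y_1 - H_{11}\,y_1 - \sum_{j=2}^{n} H_{1j}\,y_j}{1 - H_{11}} \\
                   &= \frac{y_1 - \sum_{j=1}^{n} H_{1j}\,y_j}{1 - H_{11}} \\
                   &= \frac{y_1 - x_1^t\hat\beta}{1 - H_{11}}.
\end{aligned}
\]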
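The identity in (c) means a leave-one-out residual can be obtained from a single full fit. The sketch below applies it to every observation rather than only the first, which assumes the same argument holds for each index by symmetry. It also assumes a simple linear regression with made-up data, and the function names are hypothetical. It compares the shortcut with explicit refitting.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y ~ b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def loo_residuals_shortcut(xs, ys):
    """Leave-one-out residuals as (y_i - fitted_i) / (1 - H_ii), from one full fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b0, b1 = fit_line(xs, ys)
    residuals = []
    for x, y in zip(xs, ys):
        h_ii = 1.0 / n + (x - x_bar) ** 2 / sxx  # diagonal of H for the design (1, x)
        residuals.append((y - (b0 + b1 * x)) / (1.0 - h_ii))
    return residuals

def loo_residuals_refit(xs, ys):
    """Leave-one-out residuals by refitting n times, each time dropping one point."""
    residuals = []
    for i in range(len(xs)):
        b0, b1 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(ys[i] - (b0 + b1 * xs[i]))
    return residuals

# Hypothetical data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 5.7]

shortcut = loo_residuals_shortcut(xs, ys)
refit = loo_residuals_refit(xs, ys)
print(max(abs(a - b) for a, b in zip(shortcut, refit)))
```

The shortcut needs one fit instead of n, which is why this identity is commonly used to compute leave-one-out cross-validation error for linear models.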