
Stat210B: Theoretical Statistics Lecture Date: May 3, 2007

Lecture 29: Continuation of Bootstrap Discussion

Lecturer: Michael I. Jordan Scribe: Mike Higgins

1 Theory of the Bootstrap

Oftentimes, we will have a parameter of the form φn(F) instead of φ(F), and we will want to estimate performance measures of an estimator θ̂n in this setting. Examples of this include:

• CDF: λn(F) = P_F(√n(θ̂n − φ(F)) ≤ a)
• Bias: λn(F) = E_F(θ̂n) − φn(F)
• Variance: λn(F) = n E_F(θ̂n − φn(F))²

The basic idea of the bootstrap method is to replace F with F̂n.

Example 1. Suppose λn(F) = P_F(√n(θ̂n − φ(F)) ≤ a). Replace F with F̂n throughout; thus θ̂n becomes a function of "data" X₁, X₂, ..., Xn sampled from F̂n. So λn(F̂n) = P_{F̂n}(√n(θ̂n* − φ(F̂n)) ≤ a), where θ̂n* denotes the statistic computed from the resampled data.
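A minimal Monte Carlo sketch of this plug-in computation (not from the lecture; the sample mean is used for θ̂n, so that φ(F̂n) is the mean of F̂n, and all names are illustrative):

import numpy as np

rng = np.random.default_rng(0)

def lambda_n_hat(x, a, n_boot=10_000):
    # Monte Carlo estimate of lambda_n(F_hat) = P_{F_hat}(sqrt(n)(theta*_n - phi(F_hat)) <= a),
    # taking the sample mean as theta_hat_n, so that phi(F_hat) = x.mean().
    n = len(x)
    phi_hat = x.mean()
    # "Data" X_1, ..., X_n sampled from F_hat: draw with replacement from x.
    idx = rng.integers(0, n, size=(n_boot, n))
    theta_star = x[idx].mean(axis=1)          # theta*_n for each bootstrap sample
    return np.mean(np.sqrt(n) * (theta_star - phi_hat) <= a)

x = rng.exponential(size=200)                 # observed data from some F
print(lambda_n_hat(x, a=1.0))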

Example 2 (U-Statistic). Let θ̂n = (2/(n(n−1))) Σ_{i<j} ψ(Xi, Xj). We have shown that

λn(F) = (4(n−2)/(n−1)) γ₁² + (2/(n−1)) γ₂²,

where γ₁² = E(ψ(X₁, X₂)ψ(X₁, X₃)) and γ₂² = E(ψ(X₁, X₂)²), and so λn(F) → λ(F) = 4γ₁². On the other hand, we have that

λn(F̂n) = (4(n−2)/(n−1)) γ₁*² + (2/(n−1)) γ₂*²,

where γ₁*² = (1/n³) Σ_i Σ_j Σ_k ψ(Xi, Xj)ψ(Xi, Xk) and γ₂*² = (1/n²) Σ_i Σ_j ψ(Xi, Xj)². Let γ₃² = E(ψ(X₁, X₁)²). If γ₁², γ₂², and γ₃² are all finite, then we have consistency: λn(F̂n) → λ(F) = 4γ₁². However, we will show that if γ₃² = ∞, we may not have consistency.

Let Xi be i.i.d. Uniform(0, 1) variables, and define ψ so that when i ≠ j, |ψ(Xi, Xj)| ≤ M for some real number M < ∞, and ψ(Xi, Xi) = exp(1/Xi). For divergence of λn(F̂n), we need P((1/n²) Σ_i exp(1/Xi) > A) → 1 for all A > 0. Since Σ_i exp(1/Xi) ≥ max_i exp(1/Xi), we can prove divergence by showing P(max_i exp(1/Xi) ≤ An²) = (P(exp(1/X₁) ≤ An²))ⁿ → 0. To show this, note P(exp(1/Xi) ≤ An²) = P(Xi ≥ 1/log(An²)) = 1 − 1/log(An²). Since 1/log(An²) ≥ 1/√n for sufficiently large n, and (1 − 1/√n)ⁿ → 0, it follows that P(max_i exp(1/Xi) ≤ An²) → 0, and we have divergence of the bootstrap estimator.
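A small simulation is consistent with this divergence argument (a sketch, not from the lecture; it estimates P((1/n²) Σ_i exp(1/Xi) > A) by repeated sampling):

import numpy as np

rng = np.random.default_rng(1)

def prob_exceeds(n, A, reps=2000):
    # Monte Carlo estimate of P((1/n^2) * sum_i exp(1/X_i) > A) for X_i ~ Uniform(0, 1).
    x = rng.uniform(size=(reps, n))
    with np.errstate(over="ignore", divide="ignore"):   # exp(1/x) overflows to inf for tiny x; inf > A still counts
        s = np.exp(1.0 / x).sum(axis=1)
    return np.mean(s / n**2 > A)

for n in (10, 100, 1000):
    print(n, prob_exceeds(n, A=100.0))                  # the probability increases toward 1 as n grows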

1.1 Comparing weak convergence-based approximations and the bootstrap

Suppose λn(F) →d λ, which is independent of F. We can use λ as an approximation to λn(F), or we can use λn(F̂n). If we suppose λn(F) = λ + α(F)/n + o(n^{-1}), where α is a coefficient depending on the distribution, then λn(F̂n) = λ + α(F̂n)/n + o(n^{-1}). Additionally, if we suppose that √n(α(F̂n) − α(F)) is tight, then we have α(F̂n) = α(F) + op(1), and so λn(F̂n) = λn(F) + op(n^{-1}). This is better than our Op(n^{-1}) result obtained from using λ.


If, on the other hand, λ is not independent of F, we get λn(F̂n) = λ(F̂n) + α(F̂n)/n + o(n^{-1}), which implies

λn(F̂n) − λn(F) = λ(F̂n) − λ(F) + (1/n)(α(F̂n) − α(F)) + o(n^{-1}) = Op(n^{-1/2}),

since λ(F̂n) − λ(F) is Op(n^{-1/2}).

Example 3. Suppose φ(F) = σ². Then φ(F̂n) = (1/n) Σ_i (Xi − X̄n)² =: M₂, where Mi is the ith central sample moment.

  1. Let λn(F) = Var(√n M₂) = (μ₄ − μ₂²) − 2(μ₄ − 2μ₂²)/n + (μ₄ − 3μ₂²)/n², where μi is the ith central moment. The classical estimator is λ(F̂n) = M₄ − M₂², but the bootstrap estimator is λn(F̂n) = (M₄ − M₂²) − 2(M₄ − 2M₂²)/n + (M₄ − 3M₂²)/n². For both estimators, the error is (M₄ − M₂²) − (μ₄ − μ₂²) + O(n^{-1}), which is O(n^{-1/2}) because Mi = μi + O(n^{-1/2}). (Both items are checked numerically in the sketch after this list.)

  2. Note that E(M₂) = ((n−1)/n) σ², and let λn(F) be the bias of M₂; that is, λn(F) = ((n−1)/n)σ² − σ² = −σ²/n. We have λn(F) → λ = 0, which is independent of F, and so it is possible that the bootstrap estimator will converge faster than the classical estimator. We will now show that this is the case. Note that the bootstrap estimator is λn(F̂n) = −(1/n)M₂ = −(1/n)(μ₂ + O(n^{-1/2})), which implies λn(F̂n) − λn(F) = O(n^{-3/2}), which beats the O(n^{-1}) rate of the classical estimator!
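A brief numerical sketch consistent with both items (not from the lecture; normal data and all variable names are arbitrary choices): the first part checks the plug-in formula for λn(F̂n) against a brute-force resampling estimate of Var(√n M₂) under F̂n, and the second compares the error of the classical estimator λ = 0 with that of the bootstrap estimator −M₂/n.

import numpy as np

rng = np.random.default_rng(2)

# --- Item 1: plug-in variance of sqrt(n) * M_2 versus brute-force bootstrap ---
x = rng.normal(size=500)
n = len(x)
c = x - x.mean()
M2, M4 = np.mean(c**2), np.mean(c**4)
classical = M4 - M2**2                                              # lambda(F_hat)
plug_in = (M4 - M2**2) - 2*(M4 - 2*M2**2)/n + (M4 - 3*M2**2)/n**2   # lambda_n(F_hat)
idx = rng.integers(0, n, size=(20_000, n))
xb = x[idx]
m2_star = np.mean((xb - xb.mean(axis=1, keepdims=True))**2, axis=1)
print(classical, plug_in, n * m2_star.var())                        # last two should agree closely

# --- Item 2: error of the classical (lambda = 0) vs. bootstrap (-M2/n) bias estimators ---
for n in (10, 100, 1000):
    xs = rng.normal(size=(5000, n))              # sigma^2 = 1, so lambda_n(F) = -1/n
    M2s = xs.var(axis=1)                         # M_2 for each replication
    true = -1.0 / n
    print(n, abs(0.0 - true), np.mean(np.abs(-M2s / n - true)))     # O(1/n) vs. O(n^{-3/2})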

1.2 Bootstrap Confidence Intervals

Define a root Rn(Xn, θ(P)) as a quantity that can be inverted to obtain a confidence interval. The classical example of a root is Rn(Xn, θ(P)) = (θ̂n − θ(P))/sn, where sn is some estimate of the standard deviation. To obtain confidence intervals based on Rn, we need the distribution of Rn, which we will call λn(P). That is, λn(P, t) = P(Rn(Xn, θ(P)) ≤ t). The simplest case occurs when λn is independent of P, in which case Rn is called a pivot.

Example 4. Suppose Xi ~ i.i.d. N(θ, σ²). Then Rn = (X̄ − θ)/(sn/√n) ~ t_{n−1}, which is independent of θ and σ². In this instance, Rn is a pivot.

In general, if Rn is a pivot, and there is a t such that P(|(θ̂n − θ(P))/(sn/√n)| ≤ t) = 1 − α for all P, then (θ̂n − t·sn/√n, θ̂n + t·sn/√n) is a (1 − α) confidence interval for θ(P), independent of P.
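For instance, a minimal sketch of this exact pivot interval (not from the lecture; SciPy's t quantile is used, and the data are an arbitrary choice):

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(loc=3.0, scale=2.0, size=25)
n, xbar, s = len(x), x.mean(), x.std(ddof=1)
t = stats.t.ppf(0.975, df=n - 1)                              # P(|T| <= t) = 0.95 for T ~ t_{n-1}
print(xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n))   # exact 95% interval for theta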

In the case of the bootstrap, we approximate λn(P) by λn(P̂n), and we consider the set

Bn(1 − α, Xn) := {θ ∈ Θ : λn^{-1}(α/2, P̂n) ≤ Rn(Xn, θ) ≤ λn^{-1}(1 − α/2, P̂n)}.

We can use a Monte Carlo method to estimate λn^{-1}(·, P̂n).
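A minimal sketch of this Monte Carlo inversion (not from the lecture; the unstudentized root √n(θ̂n − θ) for the mean is assumed for simplicity):

import numpy as np

rng = np.random.default_rng(4)

def boot_root_ci(x, alpha=0.05, n_boot=10_000):
    # Approximate the alpha/2 and 1 - alpha/2 quantiles of lambda_n(P_hat),
    # the distribution of the root R_n = sqrt(n)(theta_hat - theta), by resampling.
    n, theta_hat = len(x), x.mean()
    idx = rng.integers(0, n, size=(n_boot, n))
    roots = np.sqrt(n) * (x[idx].mean(axis=1) - theta_hat)
    lo, hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    # Invert lo <= sqrt(n)(theta_hat - theta) <= hi for theta.
    return theta_hat - hi / np.sqrt(n), theta_hat - lo / np.sqrt(n)

x = rng.exponential(size=100)
print(boot_root_ci(x))                           # approximate 95% interval for the mean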

Lemma 5. (van der Vaart, 1998, Lemma 23.3): Assume (θ̂n − θ)/σ̂n →d T and (θ̂n* − θ̂n)/σ̂n* →d T conditionally on the data. Then the bootstrap confidence intervals are asymptotically consistent.

Theorem 6 (Sample means). (van der Vaart, 1998, Theorem 23.4): Suppose Xi are i.i.d. with E(Xi) = μ and Cov(Xi) = Σ. Then, conditionally on X₁, X₂, ..., Xn, we have √n(X̄n* − X̄n) →d N(0, Σ) for almost every sequence X₁, X₂, ....

Theorem 7 (Delta method for bootstrap). (van der Vaart, 1998, Theorem 23.5): Let φ be differentiable in a neighborhood of θ, let θ̂n →a.s. θ, and let √n(θ̂n − θ) →d T and √n(θ̂n* − θ̂n) →d T. Then √n(φ(θ̂n) − φ(θ)) →d φ′θ(T) and √n(φ(θ̂n*) − φ(θ̂n)) →d φ′θ(T) conditionally almost surely.
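To illustrate Theorem 7, here is a sketch (not from the lecture; φ(θ) = exp(θ) applied to a sample mean is an arbitrary smooth choice) comparing the bootstrap distribution of √n(φ(θ̂n*) − φ(θ̂n)) with the scale of the delta-method limit φ′θ(T):

import numpy as np

rng = np.random.default_rng(5)

n, n_boot = 2000, 5000
x = rng.normal(loc=1.0, scale=2.0, size=n)
theta_hat = x.mean()
idx = rng.integers(0, n, size=(n_boot, n))
theta_star = x[idx].mean(axis=1)
boot = np.sqrt(n) * (np.exp(theta_star) - np.exp(theta_hat))
print("bootstrap sd:   ", boot.std())
print("delta-method sd:", np.exp(theta_hat) * x.std())   # |phi'(theta_hat)| * sigma_hat; the two should agree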