Stat210B: Theoretical Statistics
Lecture 29: Continuation of Bootstrap Discussion
Lecture Date: May 3, 2007
Lecturer: Michael I. Jordan
Scribe: Mike Higgins
Oftentimes, we will have a statistic in the form of $\phi_n(F)$ instead of $\phi(F)$, and we will want to estimate performance measures $\lambda_n(F)$ in this setting. Examples of this include:

- $\lambda_n(F) = P_F(\sqrt{n}(\hat\theta_n - \phi(F)) \le a)$
- $E_F(\hat\theta_n) - \phi_n(F)$
- $n \, E_F(\hat\theta_n - \phi_n(F))^2$
The basic idea of the bootstrap method is to replace $F$ with $\hat F_n$.
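Concretely, sampling from $\hat F_n$ just means resampling the observed data with replacement. A minimal Python sketch (the data and variable names are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)   # observed sample X_1, ..., X_n (illustrative data)

# Drawing from the empirical distribution F_n-hat is sampling the observed
# X_i uniformly with replacement.
x_star = rng.choice(x, size=x.size, replace=True)
```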
Example 1. Suppose $\lambda_n(F) = P_F(\sqrt{n}(\hat\theta_n - \phi(F)) \le a)$. Replace $F$ with $\hat F_n$ throughout; thus $\hat\theta_n$ becomes a function of "data" $X_1^*, X_2^*, \ldots, X_n^*$ sampled from $\hat F_n$. So $\lambda_n(\hat F_n) = P_{\hat F_n}(\sqrt{n}(\hat\theta_n^* - \phi(\hat F_n)) \le a)$.
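In practice $\lambda_n(\hat F_n)$ is usually approximated by Monte Carlo over bootstrap resamples. A hedged sketch for the case $\hat\theta_n = \bar X_n$ and $\phi(F) = E_F(X)$ (the statistic, the threshold $a$, and the number of resamples $B$ are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(size=100)      # observed data; the true F here is illustrative
n = x.size
a = 0.5                            # the threshold in lambda_n(F); arbitrary choice
B = 5000                           # number of bootstrap resamples

theta_hat = x.mean()               # theta_hat_n, which also equals phi(F_n-hat) for the mean
roots = np.empty(B)
for b in range(B):
    x_star = rng.choice(x, size=n, replace=True)     # draw "data" from F_n-hat
    roots[b] = np.sqrt(n) * (x_star.mean() - theta_hat)

# Monte Carlo estimate of P_{F_n-hat}( sqrt(n) (theta*_n - phi(F_n-hat)) <= a )
lambda_hat = np.mean(roots <= a)
print(lambda_hat)
```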
Example 2 (U-Statistic). Let $\hat\theta_n = \frac{2}{n(n-1)} \sum_{i<j} \psi(X_i, X_j)$. We have shown that
$$\lambda_n(F) = \frac{4(n-2)}{n-1}\,\gamma_1^2 + \frac{2}{n-1}\,\gamma_2^2,$$
where $\gamma_1^2 = E(\psi(X_1, X_2)\psi(X_1, X_3))$ and $\gamma_2^2 = E(\psi(X_1, X_2)^2)$, and so $\lambda_n(F) \to \lambda(F) = 4\gamma_1^2$. On the other hand, we have that
$$\lambda_n(\hat F_n) = \frac{4(n-2)}{n-1}\,\gamma_1^{*2} + \frac{2}{n-1}\,\gamma_2^{*2},$$
where $\gamma_1^{*2} = \frac{1}{n^3}\sum_i\sum_j\sum_k \psi(X_i, X_j)\psi(X_i, X_k)$ and $\gamma_2^{*2} = \frac{1}{n^2}\sum_i\sum_j \psi(X_i, X_j)^2$. Let $\gamma_3^2 = E(\psi(X_i, X_i)^2)$. If $\gamma_1^2$, $\gamma_2^2$, and $\gamma_3^2$ are all finite, then $\gamma_1^{*2}$ and $\gamma_2^{*2}$ are consistent and we have consistency of the bootstrap estimator: $\lambda_n(\hat F_n) \to \lambda(F) = 4\gamma_1^2$. However, we will show that if $\gamma_3^2 = \infty$, we may not have consistency.
Let $X_i$ be i.i.d. Uniform$(0,1)$ variables, and define $\psi$ so that when $i \neq j$, $|\psi(X_i, X_j)| \le M$ for some real number $M < \infty$, and $\psi(X_i, X_i) = \exp(1/X_i)$. For divergence of $\lambda_n(\hat F_n)$, we need $P\bigl(\frac{1}{n^2}\sum_i e^{1/X_i} > A\bigr) \to 1$ for all $A > 0$. Since $\sum_i e^{1/X_i} \ge \max_i e^{1/X_i}$, we can prove divergence by showing $P(\max_i e^{1/X_i} \le An^2) = \bigl(P(e^{1/X_1} \le An^2)\bigr)^n \to 0$. To show this, note $P(e^{1/X_i} \le An^2) = P\bigl(X_i \ge \frac{1}{\log(An^2)}\bigr) = 1 - \frac{1}{\log(An^2)}$. Since $\frac{1}{\log(An^2)} \ge \frac{1}{\sqrt n}$ for sufficiently large $n$, and $(1 - \frac{1}{\sqrt n})^n \to 0$, it follows that $P(\max_i e^{1/X_i} \le An^2) \to 0$, and we have divergence of the bootstrap estimator.
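A small simulation makes the failure visible: with $\psi(x, x) = e^{1/x}$ on Uniform$(0,1)$ data, the diagonal terms entering $\gamma_2^{*2}$ blow up, so $\lambda_n(\hat F_n)$ does not settle down as $n$ grows. This sketch is only illustrative; the bounded off-diagonal kernel $\psi(x, y) = \min(xy, M)$ is an arbitrary choice satisfying the stated conditions.

```python
import numpy as np

rng = np.random.default_rng(2)
M = 1.0   # bound on the off-diagonal kernel (arbitrary choice)

def bootstrap_variance_estimate(x):
    """Plug-in estimate lambda_n(F_n-hat) for the U-statistic variance,
    built from gamma*_1^2 and gamma*_2^2 over all (i, j) pairs, diagonal included."""
    n = x.size
    psi = np.minimum(np.outer(x, x), M)           # bounded kernel for i != j
    np.fill_diagonal(psi, np.exp(1.0 / x))        # psi(X_i, X_i) = exp(1/X_i)
    row_sums = psi.sum(axis=1)
    gamma1_star2 = np.sum(row_sums ** 2) / n**3   # (1/n^3) sum_i sum_j sum_k psi_ij psi_ik
    gamma2_star2 = np.sum(psi ** 2) / n**2        # (1/n^2) sum_i sum_j psi_ij^2
    return 4 * (n - 2) / (n - 1) * gamma1_star2 + 2 / (n - 1) * gamma2_star2

with np.errstate(over="ignore"):      # exp(1/X_i) may overflow to inf, which only
    for n in [50, 100, 200, 400]:     # underlines the divergence being illustrated
        x = rng.uniform(size=n)
        print(n, bootstrap_variance_estimate(x))
```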
Suppose $\lambda_n(F) \stackrel{d}{\to} \lambda$, which is independent of $F$. We can use $\lambda$ as an approximation to $\lambda_n(F)$, or we can use $\lambda_n(\hat F_n)$. If we suppose $\lambda_n(F) = \lambda + \frac{\alpha(F)}{n} + o(n^{-1})$, where $\alpha$ is a coefficient depending on the distribution, then $\lambda_n(\hat F_n) = \lambda + \frac{\alpha(\hat F_n)}{n} + o(n^{-1})$. Additionally, if we suppose that $\sqrt{n}(\alpha(\hat F_n) - \alpha(F))$ is tight, then we have $\alpha(\hat F_n) = \alpha(F) + o_p(1)$, and so $\lambda_n(\hat F_n) = \lambda_n(F) + o_p(n^{-1})$. This is better than our $O_p(n^{-1})$ result obtained from using $\lambda$.
If, on the other hand, $\lambda$ is not independent of $F$, we get $\lambda_n(\hat F_n) = \lambda(\hat F_n) + \frac{\alpha(\hat F_n)}{n} + o(n^{-1})$, which implies
$$\lambda_n(\hat F_n) - \lambda_n(F) = \lambda(\hat F_n) - \lambda(F) + \frac{1}{n}\bigl(\alpha(\hat F_n) - \alpha(F)\bigr) + o(n^{-1}) = O(n^{-1/2}),$$
since $\lambda(\hat F_n) - \lambda(F)$ is $O(n^{-1/2})$.
Example 3. Suppose $\phi(F) = \sigma^2$. Then $\phi(\hat F_n) = \frac{1}{n}\sum_i (X_i - \bar X_n)^2 =: M_2$, where $M_i$ is the $i$th central sample moment. We have
$$\lambda_n(F) = \mathrm{Var}(\sqrt{n}\, M_2) = (\mu_4 - \mu_2^2) - \frac{2(\mu_4 - 2\mu_2^2)}{n} + \frac{\mu_4 - 3\mu_2^2}{n^2},$$
where $\mu_i$ is the $i$th central moment. The classical estimator is $\lambda(\hat F_n) = M_4 - M_2^2$, but the bootstrap estimator is
$$\lambda_n(\hat F_n) = (M_4 - M_2^2) - \frac{2(M_4 - 2M_2^2)}{n} + \frac{M_4 - 3M_2^2}{n^2}.$$
For both estimators, the error is $(M_4 - M_2^2) - (\mu_4 - \mu_2^2) + O(n^{-1})$, which is $O(n^{-1/2})$ because $M_i = \mu_i + O(n^{-1/2})$.
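To see the two estimators side by side, the following sketch computes both from simulated data and compares them with the exact $\lambda_n(F)$. The choice of an Exp$(1)$ population (for which $\mu_2 = 1$ and $\mu_4 = 9$) is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x = rng.exponential(size=n)        # illustrative population: Exp(1), mu_2 = 1, mu_4 = 9

xc = x - x.mean()
M2, M4 = np.mean(xc**2), np.mean(xc**4)   # central sample moments

classical = M4 - M2**2                                                # lambda(F_n-hat)
bootstrap = (M4 - M2**2) - 2*(M4 - 2*M2**2)/n + (M4 - 3*M2**2)/n**2   # lambda_n(F_n-hat)
exact = (9 - 1) - 2*(9 - 2*1)/n + (9 - 3*1)/n**2                      # lambda_n(F) for Exp(1)

print(classical, bootstrap, exact)
```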
Now recall that $E_F(M_2) = \frac{n-1}{n}\sigma^2$, and let $\lambda_n(F)$ be the bias of $M_2$; that is, $\lambda_n(F) = \frac{n-1}{n}\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}$. We have $\lambda_n(F) \to \lambda = 0$, which is independent of $F$, and so it is possible that the bootstrap estimator will converge faster than the classical estimator. We will now show that this is the case. Note that the bootstrap estimator is $\lambda_n(\hat F_n) = -\frac{1}{n} M_2 = -\frac{1}{n}\bigl(\sigma^2 + O(n^{-1/2})\bigr)$, which implies $\lambda_n(\hat F_n) - \lambda_n(F) = O(n^{-3/2})$, which beats the $O(n^{-1})$ rate of the classical estimator!
Define a root $R_n(X_n, \theta(P))$ as a quantity that can be inverted to obtain a confidence interval. The classical example of a root is $R_n(X_n, \theta(P)) = \frac{\hat\theta_n - \theta(P)}{s_n}$, where $s_n$ is some estimate of the standard deviation. To obtain confidence intervals based on $R_n$, we need the distribution of $R_n$, which we will call $\lambda_n(P)$. That is, $\lambda_n(P, t) = P(R_n(X_n, \theta(P)) \le t)$. The simplest case occurs when $\lambda_n$ is independent of $P$, in which case $R_n$ is called a pivot.
Example 4. Suppose $X_i \stackrel{\text{i.i.d.}}{\sim} N(\theta, \sigma^2)$. Then $R_n = \frac{\bar X - \theta}{s_n/\sqrt{n}} \sim t_{n-1}$, which is independent of $\theta$ and $\sigma^2$. In this instance, $R_n$ is a pivot.
In general, if $R_n$ is a pivot, and there is a $t$ such that $P\left(\left|\frac{\hat\theta_n - \theta(P)}{s_n/\sqrt{n}}\right| \le t\right) = 1 - \alpha$ for all $P$, then $\left(\hat\theta_n - t\frac{s_n}{\sqrt{n}},\ \hat\theta_n + t\frac{s_n}{\sqrt{n}}\right)$ is a $(1 - \alpha)$ confidence interval for $\theta(P)$ independent of $P$.
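For the pivot in Example 4, this interval can be computed directly from the $t_{n-1}$ quantiles; a minimal sketch using SciPy (the data and the level $\alpha$ are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=3.0, scale=2.0, size=25)    # illustrative normal sample
n, alpha = x.size, 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # quantile of t_{n-1}
s_n = x.std(ddof=1)                            # usual standard deviation estimate
half_width = t_crit * s_n / np.sqrt(n)

print(x.mean() - half_width, x.mean() + half_width)   # (1 - alpha) interval for theta
```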
In the case of the bootstrap, we approximate λn(P ) by λn(
Pn), and we consider the set Bn(1 − α, Xn) :=
{θ ∈ Θ : λ
− 1
n
α
2
Pn) ≤ Rn(Xn, θ) ≤ λ
− 1
n
α
2
Pn)}. We can use a Monte Carlo method to estimate
λ
− 1
n
Pn).
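A Monte Carlo version of $B_n(1 - \alpha, X_n)$ for the studentized mean: resample, recompute the root with $\hat\theta_n$ playing the role of $\theta(P)$, take empirical quantiles of the bootstrap roots as $\lambda_n^{-1}(\cdot, \hat P_n)$, and invert. The sketch below assumes the root $R_n = (\hat\theta_n - \theta)/(s_n/\sqrt{n})$ (a bootstrap-$t$ construction); the data and tuning constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(size=80)          # observed data (illustrative)
n, alpha, B = x.size, 0.05, 5000

theta_hat = x.mean()
s_n = x.std(ddof=1)

# Bootstrap distribution of the root R_n, with P replaced by P_n-hat
roots = np.empty(B)
for b in range(B):
    xs = rng.choice(x, size=n, replace=True)
    roots[b] = (xs.mean() - theta_hat) / (xs.std(ddof=1) / np.sqrt(n))

lo_q, hi_q = np.quantile(roots, [alpha / 2, 1 - alpha / 2])   # lambda_n^{-1}(., P_n-hat)

# Invert lo_q <= (theta_hat - theta)/(s_n/sqrt(n)) <= hi_q over theta:
ci = (theta_hat - hi_q * s_n / np.sqrt(n), theta_hat - lo_q * s_n / np.sqrt(n))
print(ci)
```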
Lemma 5. (van der Vaart, 1998, Lemma 23.3): Assume $\frac{\hat\theta_n - \theta}{\hat\sigma_n} \stackrel{d}{\to} T$ and $\frac{\hat\theta_n^* - \hat\theta_n}{\hat\sigma_n^*} \stackrel{d}{\to} T$ conditionally on the data. Then the bootstrap confidence intervals are asymptotically consistent.
Theorem 6 (Sample means). (van der Vaart, 1998, Theorem 23.4): Suppose $X_i$ are i.i.d. with $E(X_i) = \mu$ and $\mathrm{Cov}(X_i) = \Sigma$. Then, conditionally on $X_1, X_2, \ldots$, we have $\sqrt{n}(\bar X_n^* - \bar X_n) \stackrel{d}{\to} N(0, \Sigma)$ for almost every sequence $X_1, X_2, \ldots$
Theorem 7 (Delta method for bootstrap). (van der Vaart, 1998, Theorem 23.5): Let $\phi$ be differentiable in a neighborhood of $\theta$, let $\hat\theta_n \stackrel{a.s.}{\to} \theta$, and let $\sqrt{n}(\hat\theta_n - \theta) \stackrel{d}{\to} T$ and $\sqrt{n}(\theta_n^* - \hat\theta_n) \stackrel{d}{\to} T$. Then $\sqrt{n}(\phi(\hat\theta_n) - \phi(\theta)) \stackrel{d}{\to} \phi_\theta'(T)$ and $\sqrt{n}(\phi(\theta_n^*) - \phi(\hat\theta_n)) \stackrel{d}{\to} \phi_\theta'(T)$ conditionally almost surely.
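As an illustration of Theorem 7, the bootstrap distribution of $\sqrt{n}(\phi(\theta_n^*) - \phi(\hat\theta_n))$ for a smooth $\phi$ can be compared with the limiting normal implied by the delta method. The choice $\phi(\theta) = \theta^2$ with $\hat\theta_n = \bar X_n$ and gamma data is an assumption of this sketch, not part of the notes.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.gamma(shape=2.0, size=300)     # illustrative data; theta = E(X) = 2, Var(X) = 2
n, B = x.size, 5000

def phi(t):
    return t ** 2                      # smooth phi with phi'(theta) = 2 * theta

theta_hat = x.mean()

# Bootstrap draws of sqrt(n) * (phi(theta*_n) - phi(theta_hat_n))
draws = np.empty(B)
for b in range(B):
    xs = rng.choice(x, size=n, replace=True)
    draws[b] = np.sqrt(n) * (phi(xs.mean()) - phi(theta_hat))

# Delta-method limit: N(0, (phi'(theta))^2 Var(X)), estimated by plug-in
limit_sd = abs(2 * theta_hat) * x.std(ddof=1)
print(draws.std(), limit_sd)           # the two should be close for large n
```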