Chapter 11 The Bootstrap, Study notes of Statistics

University of Wollongong (UOW)Statistics

The bootstrap is a method for estimating the variance of an estimator and for finding approximate confidence intervals for parameters.

Typology: Study notes

2021/2022

Uploaded on 07/05/2022

allan.dev 🇦🇺


Chapter 11

The Bootstrap

This chapter covers the following topics:

• What is the Bootstrap?
• Why Does it Work?
• Examples of the Bootstrap.

11.1 Introduction

Most of this volume is devoted to parametric inference. In this chapter we depart from the parametric framework and discuss a nonparametric technique called the bootstrap. The bootstrap is a method for estimating the variance of an estimator and for finding approximate confidence intervals for parameters. Although the method is nonparametric, it can be used for inference about parameters in parametric and nonparametric models, which is why we include it in this volume.

11.2 A More General Notion of “Parameter”

We begin by broadening what we mean by a parameter. Let us begin with a few examples.

1. Let $X_1, \ldots, X_n \sim P$ where $P \in (P_\theta : \theta \in \Theta)$. Let $\hat\theta_n$ be the maximum likelihood estimator of $\theta$. We would like to estimate the variance of $\hat\theta_n$ and we want a $1 - \alpha$ confidence interval for $\theta$.


2. Let $X_1, \ldots, X_n \sim P$ and let $\theta = T(P)$ denote the mean of $P$. Hence, $\theta = E[X_i] = \int x \, dP(x)$. Let $\hat\theta_n = n^{-1} \sum_{i=1}^n X_i$. Again, we would like to estimate the variance of $\hat\theta_n$ and we want a $1 - \alpha$ confidence interval for $\theta$.

3. Let $X_1, \ldots, X_n \sim P$ and let $\theta = T(P)$ denote the median of $P$. Hence, $P(X_i \le \theta) = P(X_i \ge \theta) = 1/2$. Let $\hat\theta_n$ denote the sample median. Yet again, we would like to estimate the variance of $\hat\theta_n$ and we want a $1 - \alpha$ confidence interval for $\theta$.

In the first example, $\theta$ denotes the parameter of a parametric model. In the second and third examples, we are in a nonparametric situation; in these cases we think of a “parameter” as a function of the distribution $P$ and we write $\theta = T(P)$. The bootstrap can be used in both the parametric and nonparametric settings.

Let $P_n$ be the empirical distribution. This is the discrete distribution that puts mass $1/n$ at each datapoint $X_i$. Hence,

$$P_n(A) = \frac{1}{n} \sum_{i=1}^n I(X_i \in A). \tag{11.1}$$

In the nonparametric case, we will estimate the parameter $\theta = T(P)$ by $\hat\theta_n = T(P_n)$, which is called the plug-in estimator. For example, when $\theta = T(P) = \int x \, dP(x)$ is the mean, the plug-in estimator is

$$\hat\theta_n = T(P_n) = \int x \, dP_n(x) = \frac{1}{n} \sum_{i=1}^n X_i,$$

which is the sample mean.

A sample of size $n$ drawn from $P_n$ is called a bootstrap sample, denoted by

$$X_1^*, \ldots, X_n^* \sim P_n.$$

Bootstrap samples play an important role in what follows. Note that drawing an iid sample $X_1^*, \ldots, X_n^*$ from $P_n$ is equivalent to drawing $n$ observations, with replacement, from the original data $\{X_1, \ldots, X_n\}$. Thus, bootstrap sampling is often described as “resampling the data.” This can be a bit confusing and we think it is much clearer to think of a bootstrap sample $X_1^*, \ldots, X_n^*$ as $n$ draws from the empirical distribution $P_n$.
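As a minimal sketch of the equivalence just described, the Python snippet below draws a bootstrap sample by choosing indices uniformly with replacement, which amounts to drawing $n$ iid observations from the empirical distribution. The function name `bootstrap_sample` and the toy data are illustrative assumptions, not part of the notes.

```python
import numpy as np

def bootstrap_sample(x, rng):
    """Draw n observations iid from the empirical distribution P_n of x."""
    x = np.asarray(x)
    n = len(x)
    # P_n puts mass 1/n on each data point, so one draw from P_n is a
    # uniformly chosen index into the original data (with replacement).
    idx = rng.integers(0, n, size=n)
    return x[idx]

rng = np.random.default_rng(0)
x = np.array([2.1, 3.4, 1.9, 5.0, 4.2])  # toy data (hypothetical)
x_star = bootstrap_sample(x, rng)
print(x_star)
```

Every value in `x_star` is one of the original data points, and some points typically appear more than once while others are left out.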

11.3 The Bootstrap

Now we give the bootstrap algorithms for estimating the variance of $\hat\theta_n$ and for constructing confidence intervals. The explanation of why (and when) the bootstrap gives valid estimates is deferred until Section 11.5. Let $\hat\theta_n = g(X_1, \ldots, X_n)$ denote some estimator.

Figure 11.1: 50 points drawn from the model $Y_i = -1 + 2X_i - X_i^2 + \epsilon_i$ where $X_i \sim \mathrm{Uniform}(0, 2)$ and $\epsilon_i \sim N(0, .2^2)$. In this case, the maximum of the polynomial occurs at $\theta = 1$. The true and estimated curves are shown in the figure. At the bottom of the plot we show the 95 percent bootstrap confidence interval based on $B = 1{,}000$.

Theorem 139. Under appropriate regularity conditions,

$$P(\theta \in C_n) = 1 - \alpha - O\left(\frac{1}{\sqrt{n}}\right)$$

as $n \to \infty$.

11.4 Examples

Example 140. Consider the polynomial regression model $Y = g(X) + \epsilon$ where $X, Y \in \mathbb{R}$ and $g(x) = \beta_0 + \beta_1 x + \beta_2 x^2$. Given data $(X_1, Y_1), \ldots, (X_n, Y_n)$ we can estimate $\beta = (\beta_0, \beta_1, \beta_2)$ with the least squares estimator $\hat\beta$. Suppose that $g(x)$ is concave and we are interested in the location at which $g(x)$ is maximized. It is easy to see that the maximum occurs at $x = \theta$ where $\theta = -(1/2)\beta_1/\beta_2$. A point estimate of $\theta$ is $\hat\theta = -(1/2)\hat\beta_1/\hat\beta_2$. Now we use the bootstrap to get a confidence interval for $\theta$. Figure 11.1 shows 50 points drawn from the above model with $\beta_0 = -1$, $\beta_1 = 2$, $\beta_2 = -1$. The $X_i$'s were sampled uniformly on $[0, 2]$ and we took $\epsilon_i \sim N(0, .2^2)$. In this case, $\theta = 1$. The true and estimated curves are shown in the figure. At the bottom of the plot we show the 95 percent bootstrap confidence interval based on $B = 1{,}000$.
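The example can be imitated with a short simulation. The sketch below is only one possible reading: it assumes the coefficients $\beta_0 = -1$, $\beta_1 = 2$, $\beta_2 = -1$ (so that $\theta = 1$), resamples the pairs $(X_i, Y_i)$, and reports a percentile interval. The pairs resampling scheme and the percentile interval are assumptions made here, and the helper name `argmax_of_quadratic_fit` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate from the quadratic model of the example (assumed coefficients).
n = 50
x = rng.uniform(0.0, 2.0, size=n)
y = -1.0 + 2.0 * x - x**2 + rng.normal(0.0, 0.2, size=n)

def argmax_of_quadratic_fit(x, y):
    """Least squares quadratic fit; returns -(1/2) * b1 / b2."""
    b2, b1, b0 = np.polyfit(x, y, deg=2)  # coefficients, highest degree first
    return -0.5 * b1 / b2

theta_hat = argmax_of_quadratic_fit(x, y)

# Pairs bootstrap: resample (X_i, Y_i) jointly from the empirical distribution.
B = 1000
theta_star = np.empty(B)
for j in range(B):
    idx = rng.integers(0, n, size=n)
    theta_star[j] = argmax_of_quadratic_fit(x[idx], y[idx])

# Percentile interval (an assumed choice of interval type).
lo, hi = np.quantile(theta_star, [0.025, 0.975])
print(f"theta_hat = {theta_hat:.3f}, 95% interval = [{lo:.3f}, {hi:.3f}]")
```

Because $\hat\theta$ is a ratio of coefficients, its sampling distribution has no simple closed form, which is the kind of situation where a bootstrap interval is convenient.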


Example 141. Let $(X_1, Y_1, Z_1), \ldots, (X_n, Y_n, Z_n) \sim P$ where $X_i \in \mathbb{R}$, $Y_i \in \mathbb{R}$, $Z_i \in \mathbb{R}^d$. The partial correlation of $X$ and $Y$ given $Z$ is

$$\theta = -\frac{\Omega_{12}}{\sqrt{\Omega_{11}\Omega_{22}}}$$

where $\Omega = \Sigma^{-1}$ and $\Sigma$ is the covariance matrix of $W = (X, Y, Z)^T$. The partial correlation measures the linear dependence between $X$ and $Y$ after removing the effect of $Z$. For illustration, suppose we generate the data as follows: we take $Z \sim N(0, 1)$, $X = 10Z + \epsilon$ and $Y = 10Z + \delta$ where $\epsilon, \delta \sim N(0, 1)$. The correlation between $X$ and $Y$ is very large. But the partial correlation is 0. We generated $n = 100$ data points from this model. The sample correlation was 0.99. However, the estimated partial correlation was -0.16, which is much closer to 0. The 95 percent bootstrap confidence interval is [-.33, .02], which includes the true value, namely, 0.
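A rough simulation of this example is sketched below. It uses the standard identity that the partial correlation equals $-\Omega_{12}/\sqrt{\Omega_{11}\Omega_{22}}$ with $\Omega$ the inverse covariance matrix, resamples the rows $(X_i, Y_i, Z_i)$, and forms a percentile interval (the interval type is an assumption made here). The function name `partial_corr_xy` is hypothetical, and the numbers will differ from those quoted in the text because the random draws differ.

```python
import numpy as np

def partial_corr_xy(w):
    """Partial correlation of columns 0 and 1 given the remaining columns."""
    omega = np.linalg.inv(np.cov(w, rowvar=False))  # estimated precision matrix
    return -omega[0, 1] / np.sqrt(omega[0, 0] * omega[1, 1])

rng = np.random.default_rng(2)
n = 100
z = rng.normal(size=n)
x = 10 * z + rng.normal(size=n)
y = 10 * z + rng.normal(size=n)
w = np.column_stack([x, y, z])

corr_xy = np.corrcoef(x, y)[0, 1]   # ordinary sample correlation
theta_hat = partial_corr_xy(w)      # plug-in partial correlation

B = 1000
theta_star = np.empty(B)
for j in range(B):
    idx = rng.integers(0, n, size=n)  # resample whole rows (X_i, Y_i, Z_i)
    theta_star[j] = partial_corr_xy(w[idx])

lo, hi = np.quantile(theta_star, [0.025, 0.975])
print(f"corr = {corr_xy:.2f}, partial corr = {theta_hat:.2f}, "
      f"95% interval = [{lo:.2f}, {hi:.2f}]")
```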

11.5 Why Does the Bootstrap Work?

To explain why the bootstrap works, let us begin with a heuristic. Let

$$F_n(t) = P(\sqrt{n}(\hat\theta_n - \theta) \le t)$$

and let

$$\hat F_n(t) = P(\sqrt{n}(\hat\theta_n^* - \hat\theta_n) \le t \mid X_1, \ldots, X_n)$$

be the bootstrap approximation to $F_n$. We do not know $F_n$ but we do know $\hat F_n$ in the sense that it depends only on the observed data. Usually, $F_n$ will be close to some limiting distribution $L$. Similarly, $\hat F_n$ will be close to some limiting distribution $\hat L$. Moreover, $L$ and $\hat L$ will be close, which implies that $F_n$ and $\hat F_n$ are close. In practice, we usually approximate $\hat F_n$ by its Monte Carlo version

$$\bar F(t) = \frac{1}{B} \sum_{j=1}^B I(\sqrt{n}(\hat\theta_j^* - \hat\theta_n) \le t).$$

But $\bar F$ is close to $\hat F_n$ as long as we take $B$ large. See Figure 11.2.
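The Monte Carlo version of the bootstrap distribution can be computed directly. The following is a minimal sketch, assuming a real-valued estimator passed in as a function; the names `bootstrap_cdf` and `F_bar` are illustrative and do not come from the notes.

```python
import numpy as np

def bootstrap_cdf(x, estimator, B, rng):
    """Return t -> (1/B) * sum_j I(sqrt(n) * (theta*_j - theta_hat) <= t)."""
    x = np.asarray(x)
    n = len(x)
    theta_hat = estimator(x)
    roots = np.empty(B)
    for j in range(B):
        x_star = x[rng.integers(0, n, size=n)]  # bootstrap sample from P_n
        roots[j] = np.sqrt(n) * (estimator(x_star) - theta_hat)
    roots.sort()

    def F_bar(t):
        # fraction of the B bootstrap roots that are <= t
        return np.searchsorted(roots, t, side="right") / B

    return F_bar

rng = np.random.default_rng(3)
x = rng.exponential(size=200)
F_bar = bootstrap_cdf(x, np.mean, B=2000, rng=rng)
print(F_bar(-1.0), F_bar(0.0), F_bar(1.0))
```

Increasing `B` only reduces the Monte Carlo error between this step function and the bootstrap distribution; it does not change how close the bootstrap distribution is to the true sampling distribution, which depends on $n$.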

Now we will give more detail in a simple, special case. Suppose that $X_1, \ldots, X_n \sim P$ where $X_i$ has mean $\mu$ and variance $\sigma^2$. Suppose we want to construct a confidence interval for $\mu$. Let $\hat\mu_n = n^{-1} \sum_{i=1}^n X_i$ and define

$$F_n(t) = P(\sqrt{n}(\hat\mu_n - \mu) \le t). \tag{11.3}$$

(Diagram relating $F_n$, $L$, $\hat L$, $\hat F_n$ and $\bar F$, with the gaps between them labeled $O(1/\sqrt{n})$ and $O(1/\sqrt{B})$.)

Figure 11.2: The distribution $F_n(t) = P(\sqrt{n}(\hat\theta_n - \theta) \le t)$ is close to some limit distribution $L$. Similarly, the bootstrap distribution $\hat F_n(t) = P(\sqrt{n}(\hat\theta_n^* - \hat\theta_n) \le t \mid X_1, \ldots, X_n)$ is close to some limit distribution $\hat L$. Since $\hat L$ and $L$ are close, it follows that $F_n$ and $\hat F_n$ are close. In practice, we approximate $\hat F_n$ with its Monte Carlo version $\bar F$, which we can make as close to $\hat F_n$ as we like by taking $B$ large.


To prove this result, let us recall the Berry-Esseen Theorem from Chapter 2. For convenience, we repeat the theorem here.

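Theorem 143 (Berry-Esseen Theorem). Let $X_1, \ldots, X_n$ be i.i.d. with mean $\mu$ and variance $\sigma^2$. Let $\mu_3 = E[|X_i - \mu|^3] < \infty$. Let $\bar X_n = n^{-1} \sum_{i=1}^n X_i$ be the sample mean and let $\Phi$ be the cdf of a $N(0, 1)$ random variable. Let $Z_n = \sqrt{n}(\bar X_n - \mu)/\sigma$. Then

$$\sup_z \left| P(Z_n \le z) - \Phi(z) \right| \le \frac{33}{4} \frac{\mu_3}{\sigma^3 \sqrt{n}}.$$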
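To get a feel for the theorem, the bound can be compared with the exact sup-distance in a case where the distribution of $Z_n$ is known. The sketch below does this for Bernoulli($p$) data, where $n \bar X_n$ is Binomial($n$, $p$). It assumes the constant $33/4$, a commonly quoted value for the Berry-Esseen constant; the function name `berry_esseen_check` is hypothetical. Only the Python standard library is used.

```python
import math

def berry_esseen_check(n, p):
    """Exact sup-distance between the cdf of Z_n and Phi for Bernoulli(p) data,
    together with the bound (33/4) * mu3 / (sigma^3 * sqrt(n))."""
    sigma = math.sqrt(p * (1 - p))
    mu3 = p * (1 - p) ** 3 + (1 - p) * p ** 3  # E|X - p|^3 for Bernoulli(p)

    def phi(z):
        # standard Normal cdf via the error function
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    sup_dist, cdf_left = 0.0, 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * p**k * (1 - p) ** (n - k)
        cdf_right = cdf_left + pmf
        z = (k - n * p) / (sigma * math.sqrt(n))  # location of the k-th jump of Z_n
        # The cdf of Z_n is a step function, so the supremum is approached
        # at a jump, either from the left or from the right.
        sup_dist = max(sup_dist, abs(cdf_left - phi(z)), abs(cdf_right - phi(z)))
        cdf_left = cdf_right

    bound = (33 / 4) * mu3 / (sigma**3 * math.sqrt(n))
    return sup_dist, bound

d, b = berry_esseen_check(400, 0.3)
print(d, b)
```

In this setting the exact distance is far below the bound, which illustrates that the theorem gives a guaranteed rate rather than a sharp approximation error. The direct binomial formula is only numerically safe for moderate `n`.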

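Proof of the Bootstrap Theorem. Let $\Phi_\sigma(t)$ denote the cdf of a Normal with mean 0 and variance $\sigma^2$. Let $\hat\sigma^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \hat\mu_n)^2$. Thus, $\hat\sigma^2 = \mathrm{Var}(\sqrt{n}(\hat\mu_n^* - \hat\mu_n) \mid X_1, \ldots, X_n)$. Now, by the triangle inequality,

$$\sup_t |\hat F_n(t) - F_n(t)| \le \sup_t |F_n(t) - \Phi_\sigma(t)| + \sup_t |\Phi_\sigma(t) - \Phi_{\hat\sigma}(t)| + \sup_t |\hat F_n(t) - \Phi_{\hat\sigma}(t)| = \mathrm{I} + \mathrm{II} + \mathrm{III}.$$

Let $Z \sim N(0, 1)$. Then, $\sigma Z \sim N(0, \sigma^2)$ and from the Berry-Esseen theorem,

$$\mathrm{I} = \sup_t |F_n(t) - \Phi_\sigma(t)| = \sup_t \left| P(\sqrt{n}(\hat\mu_n - \mu) \le t) - P(\sigma Z \le t) \right|$$

$$= \sup_t \left| P\left( \frac{\sqrt{n}(\hat\mu_n - \mu)}{\sigma} \le \frac{t}{\sigma} \right) - P\left( Z \le \frac{t}{\sigma} \right) \right| \le \frac{33}{4} \frac{\mu_3}{\sigma^3 \sqrt{n}}.$$

Using the same argument on the third term, we have that

$$\mathrm{III} = \sup_t |\hat F_n(t) - \Phi_{\hat\sigma}(t)| \le \frac{33}{4} \frac{\hat\mu_3}{\hat\sigma^3 \sqrt{n}}$$

where $\hat\mu_3 = \frac{1}{n} \sum_{i=1}^n |X_i - \hat\mu_n|^3$ is the empirical third moment. By the strong law of large numbers, $\hat\mu_3$ converges almost surely to $\mu_3$. So, almost surely, for all large $n$, $\hat\mu_3 \le 2\mu_3$ and so $\mathrm{III} \le \frac{33}{4} \frac{2\mu_3}{\hat\sigma^3 \sqrt{n}}$. From the fact that $\hat\sigma - \sigma = O_P(\sqrt{1/n})$ it may be shown that $\mathrm{II} = \sup_t |\Phi_\sigma(t) - \Phi_{\hat\sigma}(t)| = O_P(\sqrt{1/n})$. (This may be seen by Taylor expanding $\Phi_{\hat\sigma}(t)$ around $\sigma$.) This completes the proof. $\square$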

We have shown that $\sup_t |\hat F_n(t) - F_n(t)| = O_P(1/\sqrt{n})$. From this, it may be shown that, for each $0 < \beta < 1$, $t_\beta - \hat t_\beta = O_P(1/\sqrt{n})$. From this, one can prove Theorem 139.

So far we have focused on the mean. Similar theorems may be proved for more general

parameters. The details are complex so we will not discuss them here. We give a little more

information in the appendix. For a thorough treatment, we refer the reader to Chapter 23

of van der Vaart (1998).


that the distribution of $X_i$ is sub-Gaussian, although this is stronger than needed. This means that $E(e^{t^T X}) \le e^{c\|t\|^2}$ for some $c > 0$.

Let $\mu = E[X_i] \in \mathbb{R}^d$. Here is a bootstrap algorithm for constructing a confidence set for $\mu$.

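High Dimensional Bootstrap

1. Draw a bootstrap sample $X_1^*, \ldots, X_n^* \sim P_n$. Compute $\hat\mu^* = \frac{1}{n} \sum_{i=1}^n X_i^*$.

2. Repeat the previous step, $B$ times, yielding estimators $\hat\mu^*_{n,1}, \ldots, \hat\mu^*_{n,B}$.

3. Let $\hat F_n(t) = \frac{1}{B} \sum_{j=1}^B I(\sqrt{n}\,\|\hat\mu^*_{n,j} - \hat\mu_n\|_\infty \le t)$.

4. Let $C_n = \left\{ a \in \mathbb{R}^d : \|a - \hat\mu_n\|_\infty \le \frac{t_\alpha}{\sqrt{n}} \right\}$ where $t_\alpha = \hat F_n^{-1}(1 - \alpha)$.

5. Output $C_n$.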
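The steps above can be sketched in a few lines of NumPy. This is an illustrative sketch rather than a reference implementation: the function names `high_dim_bootstrap` and `in_confidence_set` are hypothetical, and the quantile $t_\alpha$ is taken with `np.quantile`, whose default interpolation only approximates the exact inverse of the step function.

```python
import numpy as np

def high_dim_bootstrap(x, alpha, B, rng):
    """Return (mu_hat, radius) for the sup-norm confidence set
    C_n = {a : ||a - mu_hat||_inf <= radius}, with radius = t_alpha / sqrt(n)."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    mu_hat = x.mean(axis=0)
    stats = np.empty(B)
    for j in range(B):
        x_star = x[rng.integers(0, n, size=n)]  # bootstrap sample of rows
        stats[j] = np.sqrt(n) * np.max(np.abs(x_star.mean(axis=0) - mu_hat))
    # t_alpha: the (1 - alpha) quantile of the bootstrap sup-norm statistics
    t_alpha = np.quantile(stats, 1 - alpha)
    return mu_hat, t_alpha / np.sqrt(n)

def in_confidence_set(a, mu_hat, radius):
    """Check whether the point a lies in the sup-norm ball around mu_hat."""
    return bool(np.max(np.abs(np.asarray(a) - mu_hat)) <= radius)

rng = np.random.default_rng(4)
x = rng.normal(size=(200, 50))  # n = 200 observations in d = 50 dimensions
mu_hat, radius = high_dim_bootstrap(x, alpha=0.05, B=500, rng=rng)
print(radius, in_confidence_set(np.zeros(50), mu_hat, radius))
```

The confidence set is a hypercube centered at the sample mean, so it is fully described by the center and a single half-width.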

Theorem 144 (Chernozhukov, Chetverikov and Kato, 2014). Suppose that $d = o(e^{n^{1/8}})$. Then

$$P(\mu \in C_n) \ge 1 - \alpha - \frac{c \log d}{n^{1/8}}$$

for some $c > 0$.

Under the stated conditions, the same result applies to higher-order moments. If $\theta = g(\mu)$ for some function $g$ then we can get a confidence set for $\theta$ by applying $g$ to $C_n$. We call this the projected confidence set. That is, if we define $A_n = \{g(\mu) : \mu \in C_n\}$ then it follows that

$$P(\theta \in A_n) \ge 1 - \alpha - \frac{c \log d}{n^{1/8}}.$$

Alternatively, we can apply the bootstrap to $\sqrt{n}(g(\hat\mu) - g(\mu))$. However, we do not automatically get the same coverage guarantee that the projected set has.

Example 145. Let us consider constructing a confidence set for a high-dimensional covariance matrix. Let $X_1, \ldots, X_n \in \mathbb{R}^k$ be a random sample and let $\Sigma = \mathrm{Var}(X)$, which is a $k \times k$ matrix. There are $d = O(k^2)$ parameters here. Let $\hat\Sigma = (1/n) \sum_{i=1}^n (X_i - \bar X_n)(X_i - \bar X_n)^T$. Also, let $\delta = \mathrm{vec}(\Sigma)$ and $\hat\delta = \mathrm{vec}(\hat\Sigma)$, where vec takes a matrix and converts it into a vector by stacking the columns. We can then apply the bootstrap algorithm above to $\sqrt{n}(\hat\delta - \delta)$ to get the bootstrap quantile $t_\alpha$. Let $\ell_n = \hat\delta - t_\alpha/\sqrt{n}$ and $u_n = \hat\delta + t_\alpha/\sqrt{n}$. We can then unstack $\ell_n$ and $u_n$ into $k \times k$ matrices $L_n$ and $U_n$. It then follows that

$$P(L_n \le \Sigma \le U_n) \ge 1 - \alpha - \frac{c \log d}{n^{1/8}}$$

where $A \le B$ means that $A_{jk} \le B_{jk}$ for all $(j, k)$.
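A possible sketch of this construction is given below. It stacks the plug-in covariance matrix column by column, bootstraps the sup-norm of the difference, and unstacks the lower and upper vectors into matrices. The function name `covariance_rectangle` is hypothetical, and the quantile is again taken with `np.quantile` as an approximation.

```python
import numpy as np

def covariance_rectangle(x, alpha, B, rng):
    """Entrywise bootstrap confidence rectangle (L_n, U_n) for Var(X)."""
    x = np.asarray(x, dtype=float)
    n, k = x.shape

    def vec_cov(data):
        # plug-in covariance (divisor n), stacked column by column
        return np.cov(data, rowvar=False, bias=True).flatten(order="F")

    delta_hat = vec_cov(x)
    stats = np.empty(B)
    for j in range(B):
        x_star = x[rng.integers(0, n, size=n)]  # bootstrap sample of rows
        stats[j] = np.sqrt(n) * np.max(np.abs(vec_cov(x_star) - delta_hat))
    half_width = np.quantile(stats, 1 - alpha) / np.sqrt(n)

    # unstack the lower and upper vectors into k x k matrices
    lower = (delta_hat - half_width).reshape((k, k), order="F")
    upper = (delta_hat + half_width).reshape((k, k), order="F")
    return lower, upper

rng = np.random.default_rng(7)
x = rng.normal(size=(300, 4))  # hypothetical data with k = 4
L_n, U_n = covariance_rectangle(x, alpha=0.05, B=500, rng=rng)
print(np.round(L_n, 2))
print(np.round(U_n, 2))
```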

11.8 Subsampling

11.9 Finite Sample Methods

11.9.1 The Permutation Test

In this section we discuss a nonparametric hypothesis testing method. The test is not based

on the bootstrap but we include it here because it is similar in spirit to the bootstrap. Let

$$X_1, \ldots, X_n \sim F, \qquad Y_1, \ldots, Y_m \sim G$$

be two independent samples and suppose we want to test the hypothesis

$$H_0 : F = G \quad \text{versus} \quad H_1 : F \ne G. \tag{11.7}$$

The permutation test gives an exact (nonasymptotic), nonparametric method for testing this hypothesis. Let $Z = (X, Y)$ where $X = (X_1, \ldots, X_n)^T$ and $Y = (Y_1, \ldots, Y_m)^T$. Define a vector $W$ of length $N = n + m$ that indicates which group $Z_i$ is from. Thus, $W_i = 1$ if $i \le n$ and $W_i = 2$ if $i > n$. The data look like this:

$$\begin{array}{l|ccc|ccc}
(X, Y)^T & X_1 & \cdots & X_n & Y_1 & \cdots & Y_m \\
Z & Z_1 & \cdots & Z_n & Z_{n+1} & \cdots & Z_{n+m} \\
W & 1 & \cdots & 1 & 2 & \cdots & 2
\end{array}$$

Let $T = T(Z, W)$ be any test statistic. For example, consider $T = |\bar X - \bar Y|$. We can write $T$ as a function of $Z$ and $W$ as follows. Define $X(Z, W) = \{Z_i : W_i = 1\}$ and $Y(Z, W) = \{Z_i : W_i = 2\}$ and then $T = |\bar X - \bar Y| = |\bar X(Z, W) - \bar Y(Z, W)|$.

Let $T^* = T(Z, W^*)$ where $W^*$ denotes a random permutation of $W$. Define the permutation p-value

$$p = P(T^* \ge t) \tag{11.8}$$

where $t = T(Z, W)$ is the observed value of the test statistic. This p-value defines an exact test. The steps of the algorithm are as follows:

Figure 11.3: Top left: $X_1, \ldots, X_n$. Top right: $Y_1, \ldots, Y_m$. Bottom left: values of the test statistic from 1,000 permutations.

It would be difficult to find a useful expression for the distribution of the test statistic $T$ under the null hypothesis $H_0 : F = G$. However, we can compute the p-value easily using the permutation test. Figure 11.3 shows an example. The top left plot shows $n = 10$ observations from $F$ and the top right plot shows $n = 10$ observations from $G$. (We took $F$ to be bivariate normal and $G$ to be a mixture of two normals.) The test statistic is 0.45 and the p-value, based on $B = 1{,}000$, is 0.006, suggesting that we should reject $H_0$. The bottom left shows a histogram of the values of $T$ from the 1,000 permutations. The vertical line is the observed value of $T$. The p-value is the fraction of statistics greater than $T$.
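A Monte Carlo version of the permutation p-value can be sketched as follows. The sketch assumes univariate data and the statistic $T = |\bar X - \bar Y|$ (the figure in the text uses bivariate data, which would need a different statistic), and it approximates the probability over all permutations by $B$ random permutations of the group labels. The function name `permutation_pvalue` and the simulated samples are hypothetical.

```python
import numpy as np

def permutation_pvalue(x, y, B, rng):
    """Monte Carlo approximation to p = P(T* >= t) for T = |mean(X) - mean(Y)|."""
    z = np.concatenate([x, y])
    # group labels: 1 for the first sample, 2 for the second
    w = np.concatenate([np.ones(len(x), dtype=int),
                        2 * np.ones(len(y), dtype=int)])

    def T(z, w):
        return abs(z[w == 1].mean() - z[w == 2].mean())

    t_obs = T(z, w)
    # recompute the statistic under B random permutations of the labels
    t_star = np.array([T(z, rng.permutation(w)) for _ in range(B)])
    return t_obs, float(np.mean(t_star >= t_obs))

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=10)
y = rng.normal(3.0, 1.0, size=10)  # hypothetical, clearly shifted sample
t_obs, p = permutation_pvalue(x, y, B=1000, rng=rng)
print(t_obs, p)
```

Permuting the labels while keeping the pooled values fixed mimics the null hypothesis, under which the group labels carry no information about the observations.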

11.9.2 Confidence Rectangles for Quantiles

11.9.3 Confidence Rectangles for Means

11.9.4 Conformal Methods

11.10 Summary

The bootstrap provides nonparametric standard errors and confidence intervals. To draw a bootstrap sample we draw $n$ observations $X_1^*, \ldots, X_n^*$ from the empirical distribution $P_n$. This is equivalent to drawing $n$ observations with replacement from the original data $X_1, \ldots, X_n$. We then compute the estimator $\hat\theta^* = g(X_1^*, \ldots, X_n^*)$. If we repeat this whole process $B$ times we get $\hat\theta_1^*, \ldots, \hat\theta_B^*$. The standard deviation of these values approximates the standard error of $\hat\theta_n = g(X_1, \ldots, X_n)$.
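The procedure summarized above fits in a few lines. The following is a minimal sketch, assuming the estimator $g$ is passed as a Python function of a one-dimensional array; the name `bootstrap_se` is illustrative.

```python
import numpy as np

def bootstrap_se(x, g, B, rng):
    """Bootstrap estimate of the standard error of g(X_1, ..., X_n)."""
    x = np.asarray(x)
    n = len(x)
    # recompute the estimator on B bootstrap samples drawn from P_n
    theta_star = np.array([g(x[rng.integers(0, n, size=n)]) for _ in range(B)])
    return theta_star.std(ddof=1)

rng = np.random.default_rng(6)
x = rng.normal(size=100)
se_mean = bootstrap_se(x, np.mean, B=1000, rng=rng)
se_median = bootstrap_se(x, np.median, B=1000, rng=rng)
print(se_mean, x.std(ddof=1) / np.sqrt(len(x)))  # should be close for the mean
print(se_median)
```

For the sample mean the bootstrap standard error can be checked against the usual formula $s/\sqrt{n}$; for the median no such simple formula is available, and the same function applies unchanged.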

11.11 Bibliographic Remarks

Further details on statistical functionals can be found in [51], [13], [52], [23] and [59].

The jackknife was invented by [47] and [58]. The bootstrap was invented by [20]. There

are several books on these topics including [22], [13], [29] and [52]. Also, see Section

3.6 of [60].

Appendix

More on Plug-in Estimators. Let $\theta = T(P)$. The plug-in estimator of $\theta$ is $\hat\theta_n = T(P_n)$ where $P_n$ is the empirical distribution that puts mass $1/n$ at each $X_i$. For example, suppose that $T(P) = \int x \, dP(x)$ is the mean. Then $T(P_n) = \int x \, dP_n(x) = n^{-1} \sum_{i=1}^n X_i$ since integrating with respect to $P_n$ corresponds to summing over the discrete measure with mass $1/n$ at $X_i$.

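As another example, suppose that $\theta = T(P)$ is the variance of $X$. Let $\mu$ denote the mean. Then

$$\theta = E(X - \mu)^2 = \int (x - \mu)^2 \, dP(x) = \int x^2 \, dP(x) - \left( \int x \, dP(x) \right)^2.$$

Thus, the plug-in estimator is

$$\hat\theta_n = \int x^2 \, dP_n(x) - \left( \int x \, dP_n(x) \right)^2 = \frac{1}{n} \sum_{i=1}^n X_i^2 - \left( \frac{1}{n} \sum_{i=1}^n X_i \right)^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \bar X_n)^2.$$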

For one more example, let $\theta$ be the $\alpha$ quantile of $X$. Here it is convenient to work with the cdf $F(x) = P(X \le x)$. Thus $\theta = T(P) = T(F) = F^{-1}(\alpha)$ where $F^{-1}(y) = \inf\{x : F(x) \ge y\}$. The empirical cdf is $F_n(x) = n^{-1} \sum_{i=1}^n I(X_i \le x)$ and

$$\hat\theta_n = T(F_n) = \inf\{x : F_n(x) \ge \alpha\}.$$

In other words, $\hat\theta_n$ is just the corresponding sample quantile.
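The variance and quantile plug-in estimators can be written out explicitly. The sketch below is illustrative only: the function names `plugin_variance` and `plugin_quantile` are hypothetical, and the quantile is computed as the smallest order statistic at which the empirical cdf reaches $\alpha$, for $0 < \alpha \le 1$.

```python
import numpy as np

def plugin_variance(x):
    """Plug-in variance: second moment under P_n minus squared mean under P_n."""
    x = np.asarray(x, dtype=float)
    return np.mean(x**2) - np.mean(x) ** 2

def plugin_quantile(x, alpha):
    """Plug-in alpha quantile: inf{x : F_n(x) >= alpha}, for 0 < alpha <= 1."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    # the empirical cdf reaches at least k/n at the k-th order statistic
    cdf_at_order_stats = np.arange(1, n + 1) / n
    # index of the first order statistic whose cdf value is >= alpha
    k = np.searchsorted(cdf_at_order_stats, alpha, side="left")
    return xs[k]

x = np.array([3.0, 1.0, 4.0, 1.5, 9.0])  # toy data (hypothetical)
print(plugin_variance(x), plugin_quantile(x, 0.5))
```

Note that the plug-in variance uses the divisor $n$, so it agrees with `np.var(x)` under NumPy's default settings rather than with the unbiased estimator that divides by $n - 1$.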

Hadamard Differentiability. The key condition needed for the bootstrap is Hadamard differentiability. Let $\mathcal{P}$ denote all distributions on the real line and let $\mathcal{D}$ denote the linear space generated by $\mathcal{P}$. Write $T((1 - \epsilon)P + \epsilon Q) = T(P + \epsilon D)$ where $D = Q - P \in \mathcal{D}$. The
