STAT/Q SCI 403: Introduction to Resampling Methods Spring 2017
Lecture 5: Bootstrap
Instructor: Yen-Chi Chen
Question 1: error of sample median? We start with a simple example: what is the error of the sample median? Just as the sample mean is an estimate of the population mean, the sample median is an estimate of the population median. Because it is an estimator, we can define the bias, variance, and mean square error (MSE) of the sample median. But what are these quantities?

Question 2: confidence interval of sample median? Moreover, how can we construct a confidence interval for the population median? We know that given a random sample $X_1, \cdots, X_n \sim F$, a $1-\alpha$ confidence interval of the population mean is

$$\bar{X}_n \pm z_{1-\alpha/2} \cdot \frac{\hat{\sigma}_n}{\sqrt{n}},$$

where $\bar{X}_n$ and $\hat{\sigma}_n$ are the sample mean and sample standard deviation, and $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution. Can we do the same thing (construct a confidence interval) for the median?

In this lecture, we will address these problems for the median and many other statistics using a well-known approach: the bootstrap.
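As a point of reference before moving to the median, the confidence interval for the mean given above can be computed as in the following minimal sketch. The function name `mean_confidence_interval`, the synthetic data, and the seed are illustrative assumptions, not part of the lecture.

```python
import math
from statistics import NormalDist

import numpy as np


def mean_confidence_interval(x, alpha=0.05):
    """Normal-approximation 1 - alpha confidence interval for the population mean."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x_bar = x.mean()                         # sample mean
    sigma_hat = x.std(ddof=1)                # sample standard deviation
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    half_width = z * sigma_hat / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Synthetic data purely for illustration (seed chosen arbitrarily).
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.0, size=200)
lower, upper = mean_confidence_interval(sample)
print(lower, upper)
```

No comparably simple plug-in formula for the standard error of the sample median is available in this form, which is what motivates the bootstrap.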

5.1 Empirical Bootstrap

Here is how we can estimate the error of the sample median and construct the corresponding confidence interval. Assume we are given the data points $X_1, \cdots, X_n$. Let $M_n = \mathrm{median}\{X_1, \cdots, X_n\}$. First, we sample with replacement from these $n$ points, leading to a set of new observations denoted as $X_1^{*(1)}, \cdots, X_n^{*(1)}$. We then repeat the sampling procedure, generating a new sample from the original dataset $X_1, \cdots, X_n$ by sampling with replacement, leading to another set of new observations $X_1^{*(2)}, \cdots, X_n^{*(2)}$. If we keep repeating this process of generating new sets of observations, after $B$ rounds we will obtain

$$
\begin{aligned}
& X_1^{*(1)}, \cdots, X_n^{*(1)} \\
& X_1^{*(2)}, \cdots, X_n^{*(2)} \\
& \qquad \vdots \\
& X_1^{*(B)}, \cdots, X_n^{*(B)}.
\end{aligned}
$$

So in total, we will have $B$ sets of data points. Each set of data points, say $X_1^{*(1)}, \cdots, X_n^{*(1)}$, is called a bootstrap sample. This sampling approach (sampling with replacement from the original dataset) is called the empirical bootstrap, invented by Bradley Efron (sometimes this approach is also called Efron's bootstrap or the nonparametric bootstrap).^1

^1 For more details, see Wikipedia: https://en.wikipedia.org/wiki/Bootstrapping_(statistics)

Now for each set of data, we compute its sample median. This leads to $B$ sample medians, called bootstrap medians:

$$
\begin{aligned}
M_n^{*(1)} &= \mathrm{median}\{X_1^{*(1)}, \cdots, X_n^{*(1)}\} \\
M_n^{*(2)} &= \mathrm{median}\{X_1^{*(2)}, \cdots, X_n^{*(2)}\} \\
&\;\;\vdots \\
M_n^{*(B)} &= \mathrm{median}\{X_1^{*(B)}, \cdots, X_n^{*(B)}\}.
\end{aligned}
$$

Now here are some really cool things.

  • Bootstrap estimate of the variance. We will use the sample variance of $M_n^{*(1)}, \cdots, M_n^{*(B)}$ as an estimate of the variance of the sample median $M_n$. Namely, we will use

$$\widehat{\mathrm{Var}}_B(M_n) = \frac{1}{B-1} \sum_{\ell=1}^{B} \left( M_n^{*(\ell)} - \bar{M}_B^{*} \right)^2, \qquad \bar{M}_B^{*} = \frac{1}{B} \sum_{\ell=1}^{B} M_n^{*(\ell)},$$

as an estimate of $\mathrm{Var}(M_n)$.

  • Bootstrap estimate of the MSE. Moreover, we can estimate the MSE by

$$\widehat{\mathrm{MSE}}(M_n) = \frac{1}{B} \sum_{\ell=1}^{B} \left( M_n^{*(\ell)} - M_n \right)^2.$$

  • Bootstrap confidence interval. In addition, we can construct a $1-\alpha$ confidence interval of the population median via

$$M_n \pm z_{1-\alpha/2} \cdot \sqrt{\widehat{\mathrm{Var}}_B(M_n)}.$$
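The three bootstrap quantities above can be computed together. The following is a minimal sketch, assuming NumPy is available. The function name `bootstrap_median`, the default `B=2000`, the seeds, and the synthetic data are illustrative choices that do not come from the lecture.

```python
from statistics import NormalDist

import numpy as np


def bootstrap_median(x, B=2000, alpha=0.05, seed=0):
    """Empirical bootstrap for the sample median: variance, MSE, and normal-type CI."""
    x = np.asarray(x, dtype=float)
    n = x.size
    rng = np.random.default_rng(seed)
    m_n = np.median(x)                                # M_n, the original sample median

    # Each row of idx indexes one bootstrap sample: n draws with replacement.
    idx = rng.integers(0, n, size=(B, n))
    boot_medians = np.median(x[idx], axis=1)          # M_n^{*(1)}, ..., M_n^{*(B)}

    var_b = boot_medians.var(ddof=1)                  # divides by B - 1
    mse_b = np.mean((boot_medians - m_n) ** 2)        # centered at M_n, divides by B
    z = NormalDist().inv_cdf(1 - alpha / 2)           # z_{1 - alpha/2}
    half_width = z * np.sqrt(var_b)
    return {"median": m_n, "var": var_b, "mse": mse_b,
            "ci": (m_n - half_width, m_n + half_width)}


# Synthetic data purely for illustration.
data = np.random.default_rng(42).normal(size=101)
result = bootstrap_median(data)
print(result)
```

Drawing an index matrix of shape `(B, n)` generates all bootstrap samples at once. A loop over `B` rounds would match the description in the text more literally and gives the same kind of result.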

Well... this sounds a bit weird: we generate new data points by sampling from the existing data points. However, under some conditions, this approach does work! Here is a brief explanation of why.

Let $X_1, \cdots, X_n \sim F$. Recall from Lecture 1 that a statistic $S(X_1, \cdots, X_n)$ is a function of random variables, so its distribution depends on the CDF $F$ and the sample size $n$. Thus, the distribution of the median $M_n$, denoted as $F_{M_n}$, is also determined by the CDF $F$ and the sample size $n$. Namely, we may write the CDF of the median as

$$F_{M_n}(x) = \Psi(x; F, n), \tag{5.1}$$

where $\Psi$ is some complicated function that depends on the CDF $F$ of each observation and the sample size $n$.

When we sample with replacement from $X_1, \cdots, X_n$, what is the distribution we are sampling from? Let $\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} I(X_i \leq x)$ be the EDF of these data points. The EDF is a step function that jumps at each data point. We know that for a discrete random variable, each jump point in its CDF corresponds to a possible value of the random variable, and the size of the jump is the probability of selecting that value.

Therefore, if we generate a random variable $Z$ from $\hat{F}_n$, then $Z$ has the following probability distribution:

$$P(Z = X_i) = \frac{1}{n}, \quad \text{for each } i = 1, 2, \cdots, n.$$

If we generate IID $Z_1, \cdots, Z_n \sim \hat{F}_n$, then the distribution of each $Z_\ell$ is

$$P(Z_\ell = X_i) = \frac{1}{n}, \quad \text{for each } i = 1, 2, \cdots, n, \text{ and for all } \ell = 1, \cdots, n.$$
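The claim that sampling with replacement is the same as sampling from the EDF can be checked numerically. The sketch below uses a toy dataset with distinct values. The helper `edf`, the data values, and the number of draws are assumptions made only for illustration.

```python
import numpy as np


def edf(data, x):
    """Empirical distribution function: fraction of data points that are <= x."""
    data = np.asarray(data, dtype=float)
    return np.mean(data <= x)


rng = np.random.default_rng(1)
data = np.array([2.3, 0.7, 5.1, 3.8])  # toy data with distinct values
n = data.size

# Sampling with replacement from the data = drawing IID values from the EDF.
draws = rng.choice(data, size=100_000, replace=True)

# Observed frequency of each original data point among the draws.
freqs = np.array([np.mean(draws == value) for value in data])
print(freqs)           # each entry should be close to 1/n
print(edf(data, 3.8))  # three of the four points are <= 3.8
```

With distinct data points, every jump of the EDF has size $1/n$, which matches the equal selection frequencies observed in `freqs`.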


Failure of the bootstrap. However, the bootstrap may fail for some statistics. One example is the minimum value of a distribution. Here is an illustration of why the bootstrap fails. Let $X_1, \cdots, X_n \sim \mathrm{Uni}[0, 1]$ and let $M_n = \min\{X_1, \cdots, X_n\}$ be the minimum value of the sample. Then it is known that

$$n \cdot M_n \overset{D}{\to} \mathrm{Exp}(1).$$

♠ : Think about why it converges to an exponential distribution.
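One way to see the limit, offered here as a hint rather than as the lecture's own argument, is to compute the survival function of $n \cdot M_n$ directly. The minimum exceeds a threshold exactly when every observation does, and the observations are independent uniforms:

```latex
\begin{aligned}
P(n \cdot M_n > t)
  &= P\left(M_n > \tfrac{t}{n}\right)
   = P\left(X_1 > \tfrac{t}{n}, \cdots, X_n > \tfrac{t}{n}\right) \\
  &= \left(1 - \frac{t}{n}\right)^{n}
   \;\longrightarrow\; e^{-t}
   \quad \text{as } n \to \infty, \text{ for each fixed } t \geq 0 \ (\text{taking } n \geq t).
\end{aligned}
```

Since $e^{-t}$ is the survival function of $\mathrm{Exp}(1)$, this gives the stated convergence in distribution.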

Thus, $M_n$ has a continuous distribution. Assume we generate a bootstrap sample $X_1^{*}, \cdots, X_n^{*}$ from the original observations. Now let $M_n^{*} = \min\{X_1^{*}, \cdots, X_n^{*}\}$ be the minimum value of the bootstrap sample. Because each $X_\ell^{*}$ has an equal probability ($\frac{1}{n}$) of selecting each of $X_1, \cdots, X_n$, this implies

$$P(X_\ell^{*} = M_n) = \frac{1}{n}.$$

Namely, for each observation in the bootstrap sample, we have a probability of $1/n$ of selecting the minimum value of the original sample. Thus, the probability that we do not select $M_n$ in the bootstrap sample is

$$P(\text{none of } X_1^{*}, \cdots, X_n^{*} \text{ select } M_n) = \left(1 - \frac{1}{n}\right)^{n} \approx e^{-1}.$$

This implies that with probability approximately $1 - e^{-1}$, at least one of the observations in the bootstrap sample will select the minimum value $M_n$ of the original sample. Namely,

$$P(M_n^{*} = M_n) = 1 - \left(1 - \frac{1}{n}\right)^{n} \approx 1 - e^{-1}.$$

Thus, $M_n^{*}$ has a huge probability mass (about 0.632) at the value $M_n$, meaning that the distribution of $M_n^{*}$ will not be close to an exponential distribution.
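The point mass at $M_n$ is easy to observe in a small simulation. The sketch below is only an illustration: the sample size `n = 200`, the number of bootstrap rounds `B = 5000`, and the seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, B = 200, 5000

x = rng.uniform(0.0, 1.0, size=n)  # original sample from Uni[0, 1]
m_n = x.min()                      # M_n, the sample minimum

# B bootstrap samples (rows), each of size n, drawn with replacement.
idx = rng.integers(0, n, size=(B, n))
boot_mins = x[idx].min(axis=1)     # M_n^* for each bootstrap sample

# Fraction of bootstrap minima that coincide exactly with M_n.
frac_equal = np.mean(boot_mins == m_n)
theory = 1 - (1 - 1 / n) ** n      # exact probability for this n

print(frac_equal, theory, 1 - np.exp(-1))
```

The exact comparison `boot_mins == m_n` is safe here because bootstrap values are copies of the original observations, not recomputed floating-point numbers.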

5.2 Parametric Bootstrap

When we assume the data come from a parametric model (e.g., a normal distribution, an exponential distribution, etc.), we can use the parametric bootstrap to assess the uncertainty (variance, mean square error, confidence intervals) of the estimated parameter. Here is an illustration using the variance of a normal distribution.

Example: normal distribution. Let $X_1, \cdots, X_n \sim N(0, \sigma^2)$, where $\sigma^2$ is an unknown number. A natural way to estimate $\sigma^2$ is via the sample variance $S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X}_n)^2$. Because the sample variance is an estimator, it is a random quantity. How do we estimate the variance of the sample variance? How do we estimate the MSE of the sample variance? How do we construct a $1-\alpha$ confidence interval for $\sigma^2$?^3

Here is what we are going to do. Because we know that the sample variance is a good estimator of $\sigma^2$, we can use it to replace $\sigma^2$, leading to a new distribution $N(0, S_n^2)$. We know how to sample from this new distribution, so we just generate bootstrap samples from it. Assume we generate $B$ sets of samples:

$$
\begin{aligned}
X_1^{*(1)}, \cdots, X_n^{*(1)} &\sim N(0, S_n^2) \\
X_1^{*(2)}, \cdots, X_n^{*(2)} &\sim N(0, S_n^2) \\
&\;\;\vdots \\
X_1^{*(B)}, \cdots, X_n^{*(B)} &\sim N(0, S_n^2).
\end{aligned}
$$

^3 Some of you might have learned an approach via inverting the $\chi^2$ distribution. That is a viable approach as well.


To estimate the variability of $S_n^2$, we use the sample variance of each bootstrap sample. Let $S_n^{2*(1)}, \cdots, S_n^{2*(B)}$ be the sample variances of the bootstrap samples ($S_n^{2*(\ell)}$ is the sample variance of $X_1^{*(\ell)}, \cdots, X_n^{*(\ell)}$).

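We then use

$$\widehat{\mathrm{Var}}_B(S_n^2) = \frac{1}{B-1} \sum_{\ell=1}^{B} \left( S_n^{2*(\ell)} - \bar{S}_B^{2*} \right)^2, \qquad \bar{S}_B^{2*} = \frac{1}{B} \sum_{\ell=1}^{B} S_n^{2*(\ell)},$$

as an estimator of the variance of the original sample variance, i.e., $\mathrm{Var}(S_n^2)$. Similarly, the MSE can be estimated by

$$\widehat{\mathrm{MSE}}_B(S_n^2) = \frac{1}{B} \sum_{\ell=1}^{B} \left( S_n^{2*(\ell)} - S_n^2 \right)^2.$$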

And a confidence interval of $\sigma^2$ can be constructed using

$$S_n^2 \pm z_{1-\alpha/2} \cdot \sqrt{\widehat{\mathrm{Var}}_B(S_n^2)}.$$

This approach, sampling from the distribution formed by plugging in the estimated parameters, is called the parametric bootstrap.
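The normal example can be sketched in code as follows. This is a minimal illustration assuming NumPy. The function name `parametric_bootstrap_variance`, the default `B=2000`, the seeds, and the synthetic data (true $\sigma^2 = 4$) are assumptions and not part of the lecture.

```python
from statistics import NormalDist

import numpy as np


def parametric_bootstrap_variance(x, B=2000, alpha=0.05, seed=0):
    """Parametric bootstrap for the sample variance under the model N(0, sigma^2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    rng = np.random.default_rng(seed)
    s2 = x.var(ddof=1)                               # S_n^2 from the original data

    # B bootstrap samples (rows) drawn from the fitted model N(0, S_n^2).
    boot = rng.normal(loc=0.0, scale=np.sqrt(s2), size=(B, n))
    s2_star = boot.var(axis=1, ddof=1)               # S_n^{2*(1)}, ..., S_n^{2*(B)}

    var_b = s2_star.var(ddof=1)                      # bootstrap variance estimate
    mse_b = np.mean((s2_star - s2) ** 2)             # bootstrap MSE estimate
    z = NormalDist().inv_cdf(1 - alpha / 2)          # z_{1 - alpha/2}
    half_width = z * np.sqrt(var_b)
    return s2, var_b, mse_b, (s2 - half_width, s2 + half_width)


# Synthetic data purely for illustration: N(0, 4), so sigma^2 = 4.
x = np.random.default_rng(7).normal(loc=0.0, scale=2.0, size=100)
s2, var_b, mse_b, ci = parametric_bootstrap_variance(x)
print(s2, var_b, mse_b, ci)
```

For normal data the variance of the sample variance is $2\sigma^4/(n-1)$, so `var_b` can be sanity-checked against `2 * s2**2 / (n - 1)`.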

Example: exponential distribution. A similar approach applies to many other models. For example, assume the data $X_1, \cdots, X_n \sim \mathrm{Exp}(\lambda)$, where $\lambda$ is an unknown quantity, and that we estimate $\lambda$ by an estimator such as the MLE $\hat{\lambda}_n = \frac{1}{\bar{X}_n}$. To assess the quality of $\hat{\lambda}_n$, say its MSE, we first generate $B$ bootstrap samples:

$$
\begin{aligned}
X_1^{*(1)}, \cdots, X_n^{*(1)} &\sim \mathrm{Exp}(\hat{\lambda}_n) \\
X_1^{*(2)}, \cdots, X_n^{*(2)} &\sim \mathrm{Exp}(\hat{\lambda}_n) \\
&\;\;\vdots \\
X_1^{*(B)}, \cdots, X_n^{*(B)} &\sim \mathrm{Exp}(\hat{\lambda}_n).
\end{aligned}
$$

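Then using each sample, we obtain a bootstrap estimate of $\lambda$:

$$\hat{\lambda}_n^{*(1)}, \cdots, \hat{\lambda}_n^{*(B)}, \quad \text{where } \hat{\lambda}_n^{*(\ell)} = \frac{1}{\bar{X}_n^{*(\ell)}}, \qquad \bar{X}_n^{*(\ell)} = \frac{X_1^{*(\ell)} + \cdots + X_n^{*(\ell)}}{n}.$$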

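Then the MSE of $\hat{\lambda}_n$ can be estimated by

$$\widehat{\mathrm{MSE}}_B(\hat{\lambda}_n) = \frac{1}{B} \sum_{\ell=1}^{B} \left( \hat{\lambda}_n^{*(\ell)} - \hat{\lambda}_n \right)^2.$$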
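The exponential example can be sketched the same way. The function name `parametric_bootstrap_exp_mse`, the default `B=2000`, the seeds, and the synthetic data (true rate $\lambda = 2$) are illustrative assumptions. Note that NumPy parameterizes the exponential distribution by its scale $1/\lambda$ rather than by the rate $\lambda$.

```python
import numpy as np


def parametric_bootstrap_exp_mse(x, B=2000, seed=0):
    """Parametric bootstrap estimate of the MSE of the MLE of an exponential rate."""
    x = np.asarray(x, dtype=float)
    n = x.size
    rng = np.random.default_rng(seed)
    lam_hat = 1.0 / x.mean()                          # MLE: 1 / sample mean

    # B bootstrap samples (rows) from Exp(lam_hat); NumPy takes scale = 1 / rate.
    boot = rng.exponential(scale=1.0 / lam_hat, size=(B, n))
    lam_star = 1.0 / boot.mean(axis=1)                # bootstrap estimates of lambda

    mse_b = np.mean((lam_star - lam_hat) ** 2)        # centered at lam_hat, divides by B
    return lam_hat, mse_b


# Synthetic data purely for illustration: rate 2, i.e. scale 0.5.
x = np.random.default_rng(11).exponential(scale=0.5, size=100)
lam_hat, mse_b = parametric_bootstrap_exp_mse(x)
print(lam_hat, mse_b)
```

Because the sum of $n$ IID exponential variables is Gamma distributed, the exact MSE of the MLE is $\lambda^2 (n+2) / ((n-1)(n-2))$ for $n > 2$, which gives a reference value for checking `mse_b` with $\lambda$ replaced by `lam_hat`.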

5.3 ♦ :Remark on the Bootstrap

There are some variants of the bootstrap, such as the Jackknife^4 (leave one observation out each time) and subsampling (draw a subsample of only $m$ out of the $n$ observations).

Moreover, when the data are dependent, as in time series or spatial datasets, the bootstrap can also be applied; in these cases, one will use the block bootstrap^5 or the spatial bootstrap.

^4 https://en.wikipedia.org/wiki/Jackknife_resampling
^5 https://en.wikipedia.org/wiki/Bootstrapping_(statistics)#Block_bootstrap