Understanding the Bootstrap Resampling Method for Statistical Inference | Lecture notes Applied Statistics

A Resampling Method Called the Bootstrap

Monte Carlo and bootstrap methods are both computer-intensive methods used frequently in applied statistics. The bootstrap is a type of Monte Carlo method based on the observed data (Efron and Tibshirani 1993, Mooney and Duval 1993). The bootstrap was described by Bradley Efron (1979), and he has written much about the method and its generalizations since then. Thousands of papers have been written on the bootstrap in the past two decades, and it has found very wide use in applied problems. The bootstrap can be used for several purposes; here we focus on robust estimation of sampling variances or standard errors and of (asymmetrical) confidence intervals. It has also found use in estimating model selection frequencies and in a variety of other applications.

The bootstrap has enormous potential for the biologist with programming skills; however, its computer-intensive nature will continue to hinder its use. We believe that at least 1,000 bootstrap reps are needed in many applications, and some aspects of model selection often require 10,000 reps. In extreme cases, applying the bootstrap to a complex data analysis could take days of computer time to give reliable results.
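One way to see why a few hundred reps can be too few is to repeat the same bootstrap several times on one fixed data set and watch how much the resulting standard error varies from run to run. The data never change, so all of that variation is Monte Carlo noise from using a finite number of reps, B.

The sketch below makes several assumptions. The 25 measurements, the seed, and the helper names `bootstrap_se` and `se_spread` are all hypothetical. The statistic is the sample mean, chosen only for illustration.

```python
import random
import statistics

# Hypothetical sample of n = 25 measurements (illustrative values only).
DATA = [12.1, 9.8, 14.3, 11.0, 10.6, 13.7, 8.9, 12.8, 15.2, 10.1,
        11.9, 9.4, 13.1, 12.4, 10.9, 16.0, 11.3, 9.9, 12.0, 14.8,
        10.4, 13.5, 11.6, 12.7, 9.1]


def sample_mean(xs):
    """The statistic being bootstrapped (hypothetically, the mean)."""
    return sum(xs) / len(xs)


def bootstrap_se(data, statistic, B, rng):
    """Bootstrap standard error of `statistic` from B resamples of `data`."""
    n = len(data)
    # Each rep is a simple random sample of size n drawn with replacement.
    reps = [statistic(rng.choices(data, k=n)) for _ in range(B)]
    return statistics.stdev(reps)


def se_spread(data, B, n_runs=30, seed=2024):
    """Run-to-run standard deviation of the bootstrap SE over n_runs runs.

    The data are held fixed, so this spread reflects only the Monte Carlo
    error that comes from using a finite number of reps, B.
    """
    rng = random.Random(seed)
    ses = [bootstrap_se(data, sample_mean, B, rng) for _ in range(n_runs)]
    return statistics.stdev(ses)


spread = {B: se_spread(DATA, B) for B in (100, 1000)}
for B, s in spread.items():
    print(f"B = {B:5d}: run-to-run SD of the bootstrap SE = {s:.4f}")
```

The run-to-run spread shrinks roughly in proportion to 1/√B. Going from 100 to 1,000 reps should therefore cut it by a factor of about three.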

The fundamental idea of the model-based sampling theory approach to statistical inference is that the data arise as a sample from some conceptual probability distribution, f. Uncertainties of our inferences can be measured if we can estimate f. There are ways to construct, in essence, a nonparametric estimator of f, f̂, from the sample data. The most fundamental idea of the bootstrap method is that we compute measures of our inference uncertainty from that estimated sampling distribution, f̂.

In practical application, however, the bootstrap means using some form of resampling with replacement from the actual data, x, to generate B bootstrap samples, x*. Often the data (sample) consist of n independent units. It then suffices to take a simple random sample of size n, with replacement, from the n units of data to get one bootstrap sample (i.e., one "rep"). For more complex data structures, however, the correct bootstrap resampling scheme can itself be more complex.
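For the simple case of n independent units, generating the B bootstrap samples takes only a few lines. The sketch below assumes independent units and does not cover the more complex data structures just mentioned. The function name `bootstrap_samples` and the eight data values are hypothetical.

```python
import random


def bootstrap_samples(data, B, seed=None):
    """Return B bootstrap samples ("reps") drawn from `data`.

    Assumes the n units in `data` are independent, so each rep is a
    simple random sample of size n taken with replacement from the n units.
    """
    rng = random.Random(seed)
    n = len(data)
    return [rng.choices(data, k=n) for _ in range(B)]


# Hypothetical data: n = 8 independent units (illustrative values only).
x = [3.2, 4.1, 2.7, 5.0, 3.8, 4.4, 2.9, 3.5]
x_star = bootstrap_samples(x, B=5, seed=1)
for b, rep in enumerate(x_star, start=1):
    print(b, rep)
```

Because sampling is with replacement, some units appear more than once in a rep and others not at all. For large n, a rep contains on average about 63% of the distinct original units, since 1 − (1 − 1/n)^n approaches 1 − 1/e.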

The set of B bootstrap samples is a proxy for a set of B independent real samples from f (in reality we have only one actual sample of data). Properties expected from replicate real samples are inferred from the bootstrap samples by analyzing each bootstrap sample exactly as we first analyzed the real data sample. From the set of B results (in effect, a sample of size B) we measure our inference uncertainties from sample to (conceptual) population (see figure). The bootstrap can work well for large sample sizes n, but may not be reliable for small n (say 5, 10 or even 20), regardless of how many bootstrap samples, B, are used.
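The sketch below shows the key step of analyzing every bootstrap sample exactly as the real data were analyzed. It then turns the B results into a standard error and a simple percentile confidence interval. The percentile interval need not be symmetric about the estimate.

The sketch rests on several assumptions. The skewed data set, the seed, and the names `estimator` and `bootstrap_inference` are hypothetical. The estimator is the sample mean. The percentile interval is only one of several bootstrap interval methods.

```python
import math
import random
import statistics

# Hypothetical right-skewed sample, n = 15 (illustrative values only).
DATA = [0.8, 1.1, 1.3, 1.7, 2.0, 2.2, 2.6, 3.1, 3.4, 4.0,
        4.9, 5.6, 7.2, 9.5, 13.8]


def estimator(sample):
    """The analysis applied to the real data AND to every bootstrap sample.

    Here, hypothetically, the sample mean.
    """
    return sum(sample) / len(sample)


def bootstrap_inference(data, estimator, B=1000, alpha=0.05, seed=None):
    """Return (estimate, bootstrap SE, percentile interval)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = estimator(data)  # analysis of the actual sample
    # Re-run the identical analysis on each of the B bootstrap samples.
    reps = sorted(estimator(rng.choices(data, k=n)) for _ in range(B))
    se = statistics.stdev(reps)  # SD of the B results estimates the SE
    # Percentile interval: empirical alpha/2 and 1 - alpha/2 points of the
    # sorted results (one simple quantile convention among several).
    k = int(round(B * alpha / 2))
    return theta_hat, se, (reps[k], reps[B - 1 - k])


theta_hat, se, (lo, hi) = bootstrap_inference(DATA, estimator, B=1000, seed=7)
classical_se = statistics.stdev(DATA) / math.sqrt(len(DATA))
print(f"estimate = {theta_hat:.3f}, bootstrap SE = {se:.3f}, "
      f"classical SE = {classical_se:.3f}")
print(f"95% percentile interval: ({lo:.3f}, {hi:.3f})")
```

For the sample mean, the bootstrap SE should land close to the familiar s/√n. With a large B it is slightly smaller, by a factor of about √((n − 1)/n).

The value of the bootstrap lies in cases where no such formula is at hand. In those cases only `estimator` needs to change.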

Estimation of the Sampling Variance

In many cases one can derive an estimator of the sampling variance of an estimator from general likelihood theory. In other cases, such a variance estimator may be difficult to derive or may not exist in closed form. For example, the finite rate of population change (λ) can be derived as the dominant eigenvalue of a Leslie population projection matrix. That matrix is a function of age-specific fecundity and age-specific, conditional survival probabilities. The bootstrap is handy for variance estimation in such cases.
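As a sketch of the λ example, consider the finite rate of population change λ. The code estimates age-specific fecundities and survival probabilities from individual-level data and builds the Leslie matrix. It computes λ as the matrix's dominant eigenvalue by power iteration. It then bootstraps by resampling individuals with replacement within each data set and recomputing λ each time.

Everything here is an illustrative assumption rather than the notes' own method:

- The three age classes and all data values are hypothetical.
- Every animal is treated as an independent unit.
- The mean offspring count is used directly as the fecundity entry, which is a simplification.
- The helper names are hypothetical.

```python
import random
import statistics

# Hypothetical data for a 3-age-class population (illustrative only).
# Fecundity: offspring counts for sampled females in each age class.
FECUNDITY_DATA = [
    [0, 1, 0, 2, 1, 0, 1, 1, 0, 1],  # age class 1
    [2, 1, 3, 2, 2, 1, 2, 3, 2, 1],  # age class 2
    [2, 3, 2, 1, 2, 3, 2, 2, 1, 2],  # age class 3
]
# Survival of marked animals: 1 = survived to the next class, 0 = died.
SURVIVAL_DATA = [
    [1, 0] * 10,         # class 1 -> 2 (hypothetical: 10 of 20 survive)
    [1] * 14 + [0] * 6,  # class 2 -> 3 (hypothetical: 14 of 20 survive)
]


def mean(xs):
    return sum(xs) / len(xs)


def leslie_matrix(fec, surv):
    """First row holds fecundities; the sub-diagonal holds survivals."""
    k = len(fec)
    A = [[0.0] * k for _ in range(k)]
    A[0] = list(fec)
    for i, s in enumerate(surv):
        A[i + 1][i] = s  # survival from age class i to class i + 1
    return A


def dominant_eigenvalue(A, tol=1e-12, max_iter=10_000):
    """Power iteration; converges to lambda for a primitive Leslie matrix."""
    k = len(A)
    v = [1.0 / k] * k  # positive start vector that sums to 1
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(A[i][j] * v[j] for j in range(k)) for i in range(k)]
        new_lam = sum(w)  # because sum(v) == 1, sum(A v) estimates lambda
        v = [x / new_lam for x in w]  # renormalize to sum to 1
        if abs(new_lam - lam) < tol:
            break
        lam = new_lam
    return new_lam


def estimate_lambda(fec_data, surv_data):
    fec = [mean(d) for d in fec_data]
    surv = [mean(d) for d in surv_data]
    return dominant_eigenvalue(leslie_matrix(fec, surv))


def bootstrap_lambda_se(fec_data, surv_data, B=1000, seed=None):
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        # Resample individuals with replacement within each data set.
        fec_star = [rng.choices(d, k=len(d)) for d in fec_data]
        surv_star = [rng.choices(d, k=len(d)) for d in surv_data]
        # Analyze each rep exactly as the real data were analyzed.
        reps.append(estimate_lambda(fec_star, surv_star))
    return statistics.stdev(reps)


lam_hat = estimate_lambda(FECUNDITY_DATA, SURVIVAL_DATA)
se_hat = bootstrap_lambda_se(FECUNDITY_DATA, SURVIVAL_DATA, B=1000, seed=11)
print(f"lambda-hat = {lam_hat:.4f}, bootstrap SE = {se_hat:.4f}")
```

No closed-form variance formula for λ̂ is needed here. The bootstrap simply reuses `estimate_lambda` on every rep.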
