




An introduction to the nonparametric bootstrap, a statistical technique for estimating population parameters without assuming a specific distributional model. The notes cover the concept of bootstrapping, the role of the empirical distribution function, and the steps of a nonparametric bootstrap analysis. They also include an example using city population data and R code implementing the bootstrap.





Introduction to the Bootstrap
Lecture 8 September 18, 2006
Kate Cowles 374 SH, 335- [email protected]
Resources
Review concepts
Two classes of statistical methods
The empirical distribution
Example for the nonparametric bootstrap: City population data
elements of the vector being sampled.
> x <- seq(1:25)
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
> sample(x, 25)
 [1]  2 20  3  9  6  8 15 10 23  1 19 25 12 21 14  4 13 24 17  5 11 18  7 22 16
> sample(x, 25, replace = TRUE)
 [1]  4  6 16 11 21 17  6 12  5  8 15 19 23 16 15 20 18 19 21  5 25  7  8 20  3
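Sampling with replace = TRUE is the step that generates a nonparametric bootstrap sample. A minimal sketch of that use, assuming a made-up data vector y and an arbitrary B = 200 (neither comes from the lecture):

```r
# Hypothetical data; any numeric vector would do.
y <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9)
n <- length(y)
B <- 200                      # number of bootstrap samples (arbitrary choice)

set.seed(1)                   # only so the sketch is reproducible
boot.means <- numeric(B)
for (b in 1:B) {
    ystar <- sample(y, n, replace = TRUE)   # one bootstrap sample of size n
    boot.means[b] <- mean(ystar)            # statistic recomputed on the resample
}

# The spread of the replicates estimates the standard error of mean(y)
boot.se <- sd(boot.means)
```

Because each resample is drawn from y itself, every replicate mean lies between min(y) and max(y).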
Bias correction using the bootstrap
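Bias correction continued
$$\widehat{\mathrm{bias}}_{boot} = \frac{1}{B}\sum_{b=1}^{B} \hat\theta^{*b} - \hat\theta = \hat\theta^{*\cdot} - \hat\theta$$
$$\hat\theta - \left(\hat\theta^{*\cdot} - \hat\theta\right) = 2\hat\theta - \hat\theta^{*\cdot}$$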
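The bootstrap bias estimate and the bias-corrected estimate take only a few lines of R. This is a sketch under stated assumptions: the data vector, the choice of statistic (the plug-in variance, which is known to be biased downward) and B = 2000 are invented for illustration.

```r
# Hypothetical data and statistic, chosen only to illustrate the formulas.
y <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9)
n <- length(y)
plugin.var <- function(v) mean((v - mean(v))^2)   # divides by n, not n - 1

theta.hat <- plugin.var(y)                        # estimate from the original data

set.seed(2)
B <- 2000
theta.star <- replicate(B, plugin.var(sample(y, n, replace = TRUE)))

theta.star.bar <- mean(theta.star)                # average of the bootstrap replicates
bias.boot <- theta.star.bar - theta.hat           # bootstrap estimate of bias
theta.bc <- theta.hat - bias.boot                 # equals 2 * theta.hat - theta.star.bar
```

For this statistic the estimated bias should come out negative, so the corrected estimate is pushed upward.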
Percentile method for confidence intervals
$$\widehat{CDF}(t) \approx \frac{\#(\hat\theta^{*b} \le t)}{B}$$
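The percentile method reads the interval endpoints off this estimated CDF of the bootstrap replicates. A minimal sketch, assuming made-up data, the sample mean as the statistic and B = 1000; ecdf() and quantile() are base R functions, but their use here is an illustration rather than the lecture's own code.

```r
# Hypothetical data; the statistic here is the sample mean.
y <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9)

set.seed(3)
B <- 1000
theta.star <- replicate(B, mean(sample(y, replace = TRUE)))

# Estimated CDF of the replicates: cdf.hat(t) = #(theta.star <= t) / B
cdf.hat <- ecdf(theta.star)
t0 <- mean(y)
cdf.at.t0 <- cdf.hat(t0)

# 95% percentile interval: the 2.5% and 97.5% quantiles of the replicates
perc.ci <- unname(quantile(theta.star, c(0.025, 0.975)))
```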
R code for the City Data
Main driver program
drive.bootstrap <- function(mdata, mlefunc, bootfunc, B) {
    mle <- mlefunc(mdata)          # compute mle
    print(c("mle", mle))
    boots <- bootfunc(mdata, B)    # generate bootstrap samples
    print("quantiles")
    print(quantile(boots, c(0.005, 0.025, 0.5, 0.975, 0.995)))
    meanb <- mean(boots)
    print(c("mean", meanb))
    stderr <- sqrt(var(boots))     # bootstrap standard error
    print(c("stderr", stderr))
    biasc <- 2 * mle - meanb       # bias-corrected point estimate
    print("bias-corrected point estimate")
    biasc
}
Function defining computation of θ̂
meanratio <- function(mdata) {
    mean(mdata$x) / mean(mdata$u)    # ratio of the two column means
}
Function carrying out desired type of bootstrap
nonparboot.ratio <- function(mydat, B) {
    if (ncol(mydat) != 2) {
        stop("input matrix must have 2 columns of numeric data")
    }
    n <- nrow(mydat)                          # number of observations
    bootratio <- numeric(B)                   # one ratio per bootstrap sample
    for (i in 1:B) {
        index1 <- sample(n, replace = TRUE)   # resample row indices with replacement
        boot1 <- mydat[index1, ]
        bootratio[i] <- mean(boot1$x) / mean(boot1$u)
    }
    bootratio
}
Function call and results
> library(boot)
> data(city)
> drive.bootstrap(city, meanratio, nonparboot.ratio, 81)
[1] "mle" "1.5203125"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
1.229048 1.276364 1.523894 2.277978 2.
[1] "mean" "1.58087829542998"
[1] "stderr" "0.268075889571327"
[1] "bias-corrected point estimate"
[1] 1.

> drive.bootstrap(city, meanratio, nonparboot.ratio, 1000)
[1] "mle" "1.5203125"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
1.195834 1.254945 1.529401 2.118204 2.
[1] "mean" "1.57122730135862"
[1] "stderr" "0.239441130366229"
[1] "bias-corrected point estimate"
[1] 1.
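The boot package that provides the city data also includes a general-purpose boot() function. As a rough cross-check on the hand-written functions, the same ratio-of-means bootstrap might be set up as follows; the helper name ratio.stat and the choice R = 999 are illustrative assumptions, not part of the lecture code.

```r
library(boot)    # recommended package distributed with R; provides city and boot()
data(city)

# boot() calls the statistic with the data and a vector of resampled row indices.
ratio.stat <- function(d, i) mean(d$x[i]) / mean(d$u[i])

set.seed(5)
city.boot <- boot(city, ratio.stat, R = 999)

est <- city.boot$t0               # statistic on the original data
boot.se <- sd(city.boot$t[, 1])   # standard deviation of the 999 replicates
```

Here city.boot$t0 is the ratio computed on the original data, so it should agree with the "mle" value printed by drive.bootstrap.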
R code for parametric bootstrap for the air conditioning data
parboot.logexpmean <- function(mdata, B) {
    if (ncol(mdata) != 1) {
        stop("input data must have 1 column of numeric data")
    }
    mle <- logexpmean(mdata)
    n <- nrow(mdata)                    # number of observations
    bootlogexpmean <- numeric(B)        # one log-mean per bootstrap sample
    for (i in 1:B) {
        boot1 <- rexp(n, exp(-mle))     # exponential sample, rate = 1 / fitted mean
        bootlogexpmean[i] <- logexpmean(as.matrix(boot1))
    }
    bootlogexpmean
}

logexpmean <- function(mdata) {
    if (ncol(mdata) != 1) {
        stop("input data must have 1 column of numeric data")
    }
    log(mean(mdata[, 1]))               # [, 1] handles a data frame or a matrix
}
drive.bootstrap2 <- function(mdata, mlefunc, bootfunc, B) {
    mle <- mlefunc(mdata)          # compute mle
    print(c("mle", mle))
    boots <- bootfunc(mdata, B)    # generate bootstrap samples
    print("quantiles")
    print(quantile(boots, c(0.005, 0.025, 0.5, 0.975, 0.995)))
    meanb <- mean(boots)
    print(c("mean", meanb))
    stderr <- sqrt(var(boots))
    print(c("stderr", stderr))
    biasc <- 2 * mle - meanb
    print("bias-corrected point estimate")
    print(biasc)

    cof <- length(boots[boots <= mle]) / B    # proportion of replicates <= mle
    print(c("cof", cof))
    z0 <- qnorm(cof)                          # corresponding normal quantile
    z.alpha <- qnorm(0.975)
    p.low.end <- pnorm(2 * z0 - z.alpha)
    p.high.end <- pnorm(2 * z0 + z.alpha)
    print(c(p.low.end, p.high.end))
    boots <- sort(boots)
    print("bias corrected C.I.")
    # fractional indices are truncated toward zero by R
    print(c(boots[B * p.low.end], boots[B * p.high.end]))
}
> drive.bootstrap2(aircondit, logexpmean, parboot.logexpmean, 100)
[1] "mle" "4.68290253452844"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
3.834368 4.094229 4.710413 5.181885 5.
[1] "mean" "4.67889044080901"
[1] "stderr" "0.286036072515565"
[1] "bias-corrected point estimate"
[1] 4.
[1] "cof" "0.45"
[1] 0.01350800 0.
[1] "bias corrected C.I."
[1] 3.684830 5.

> drive.bootstrap2(aircondit, logexpmean, parboot.logexpmean, 100)
[1] "mle" "4.68290253452844"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
4.046429 4.096043 4.684174 5.134347 5.
[1] "mean" "4.66888582075081"
[1] "stderr" "0.274983035251789"
[1] "bias-corrected point estimate"
[1] 4.
[1] "cof" "0.5"
[1] 0.025 0.
[1] "bias corrected C.I."
[1] 4.081475 5.

> drive.bootstrap2(aircondit, logexpmean, parboot.logexpmean, 1000)
[1] "mle" "4.68290253452844"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
3.760392 3.997390 4.640964 5.166151 5.
[1] "mean" "4.62143415974039"
[1] "stderr" "0.29366260048462"
[1] "bias-corrected point estimate"
[1] 4.
[1] "cof" "0.563"
[1] 0.05021169 0.
[1] "bias corrected C.I."
[1] 4.113786 5.

> drive.bootstrap2(aircondit, logexpmean, parboot.logexpmean, 25000)
[1] "mle" "4.68290253452844"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
3.814595 4.033918 4.654331 5.180473 5.
[1] "mean" "4.64203157705189"
[1] "stderr" "0.292952308908569"
[1] "bias-corrected point estimate"
[1] 4.
[1] "cof" "0.53676"
[1] 0.03791469 0.
[1] "bias corrected C.I."
[1] 4.098616 5.

> drive.bootstrap2(aircondit, logexpmean, parboot.logexpmean, 25000)
[1] "mle" "4.68290253452844"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
3.795942 4.014882 4.654935 5.179212 5.
[1] "mean" "4.64014458348023"
[1] "stderr" "0.295191848042946"
[1] "bias-corrected point estimate"
[1] 4.
[1] "cof" "0.53592"
[1] 0.03756714 0.
[1] "bias corrected C.I."
[1] 4.082338 5.
Bias-correcting bootstrap confidence intervals
$$= \frac{\#\{\hat\theta^{b} \le q\}}{B}$$
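The bias-corrected interval can also be written as a small standalone function. This is a sketch only: the name bc.interval, its arguments, and the use of quantile() in place of indexing the sorted replicates are assumptions made for illustration, and the replicates below are simulated stand-ins rather than output from the air conditioning data.

```r
# Hypothetical helper: bias-corrected percentile interval from bootstrap replicates.
bc.interval <- function(boots, est, level = 0.95) {
    B <- length(boots)
    prop <- sum(boots <= est) / B            # proportion of replicates <= estimate
    z0 <- qnorm(prop)                        # infinite if prop is exactly 0 or 1
    z.alpha <- qnorm(1 - (1 - level) / 2)    # e.g. qnorm(0.975) for a 95% interval
    p.low <- pnorm(2 * z0 - z.alpha)         # adjusted lower percentile
    p.high <- pnorm(2 * z0 + z.alpha)        # adjusted upper percentile
    unname(quantile(boots, c(p.low, p.high)))
}

set.seed(4)
boots <- rnorm(1000, mean = 5, sd = 1)       # simulated stand-in for bootstrap replicates
est <- median(boots)                         # exactly half the replicates fall at or below this
ci <- bc.interval(boots, est)
```

When exactly half of the replicates fall at or below the estimate, z0 is zero and the adjusted percentiles reduce to 0.025 and 0.975, so the interval coincides with the ordinary percentile interval.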