Nonparametric Bootstrap: Estimating Population Parameters without Assuming a Model - Prof., Study notes of Statistics

University of Iowa (UI), Statistics

Prof. Mary Kathryn Cowles

An introduction to the nonparametric bootstrap, a statistical technique for estimating population parameters without assuming a specific distribution model. It covers the concept of bootstrapping, the role of the empirical distribution function, and the steps of a nonparametric bootstrap analysis, and includes an example using city population data with R code implementing the bootstrap method.

Typology: Study notes

Pre 2010

Uploaded on 09/17/2009


22S:166

Introduction to the Bootstrap

Lecture 8

September 18, 2006

Kate Cowles

374 SH, 335-0727

[email protected]

Resources

Efron, B. (1982) The Jackknife, the Bootstrap, and Other Resampling Plans. Number 38 in CBMS-NSF Regional Conference Series in Applied Mathematics. Philadelphia: SIAM.
Efron, B. and Tibshirani, R.J. (1993) An Introduction to the Bootstrap. New York: Chapman & Hall.
Davison, A.C. and Hinkley, D.V. (1997) Bootstrap Methods and their Application. New York: Cambridge University Press.

Review concepts

suppose we have one sample of $n$ data values: $y_1, \dots, y_n$
sample values considered outcomes of i.i.d. random variables $Y_1, \dots, Y_n$
probability density function (pdf) or probability mass function (pmf) $f$
cumulative distribution function (cdf) $F$
sample will be used to make inference
- about population characteristic $\theta$
- using statistic $T$ whose value in sample is $t$
questions of interest regarding $T$
- bias?
- standard error?
- quantiles?
- how to compute confidence limits for $\theta$?
- likely values under a null hypothesis of interest?


Two classes of statistical methods

parametric
- particular mathematical model for behavior of random variables $Y_j$
- pdf or pmf $f$ is completely determined by values of unknown parameters $\psi$
- quantity of interest in statistical analysis $\theta$ is a component or function of $\psi$
nonparametric
- uses only the fact that the $Y_j$s are i.i.d.
- no mathematical model for their distribution
- (may be useful to do a nonparametric analysis even if a reasonable parametric model exists)
  ∗ to assess sensitivity of conclusions to assumptions of parametric model

The empirical distribution

puts probability mass $1/n$ at each sample value $y_j$
empirical distribution function (edf) or $\hat{F}$
- nonparametric mle of $F$
- sample proportion $\hat{F}(y) = \#\{y_j \le y\} / n$
  ∗ where # denotes the number of items in a set
edf plays role of fitted model when no mathematical form is assumed for $F$
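
As a minimal sketch of the edf definition above, the sample proportion can be computed by hand and compared with R's built-in ecdf(). The sample y and the helper name edf_at are made up for illustration and are not part of the lecture.

```r
# Hypothetical sample; any numeric vector works
y <- c(3.1, 0.4, 2.2, 2.2, 5.0)

# edf by definition: proportion of sample values <= t
edf_at <- function(t, y) sum(y <= t) / length(y)

# Built-in equivalent: ecdf() returns a step function
Fhat <- ecdf(y)

edf_at(2.2, y)   # 3 of the 5 values are <= 2.2, so 0.6
Fhat(2.2)        # same value from the built-in step function
```

Both calls put mass 1/n on each observation, so tied values (the two 2.2s here) simply contribute 2/n at that point.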

Example for the nonparametric bootstrap: City population data

for each of $n = 49$ U.S. cities, two data values
- $u_j$ = population in 1920 (in 1000s)
- $x_j$ = population in 1930 (in 1000s)
population of interest is all U.S. cities
the 49 cities are assumed to be a simple random sample from this population
define $(U, X)$ as pair of population values for a randomly selected city
then if we knew $\theta = E(X)/E(U)$ and the total 1920 population for the U.S., we could estimate the total 1930 population of the U.S.
want to estimate $\theta$ without assuming any parametric model for $X$ and $U$
sample-based statistic is $T = \bar{X}/\bar{U}$
observations 1 to 10 of this dataset are included with the boot package for R
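
A small sketch of the statistic T on the ten observations shipped with the boot package. To keep it self-contained the ten (u, x) pairs are typed in by hand; they are an assumption about the contents of that dataset and should be checked against data(city, package = "boot"). The name city10 is hypothetical.

```r
# Ten (u, x) pairs, assumed to match boot::city (verify before relying on them)
city10 <- data.frame(
  u = c(138, 93, 61, 179, 48, 37, 29, 23, 30, 2),    # 1920 population (1000s)
  x = c(143, 104, 69, 260, 75, 63, 50, 48, 111, 50)  # 1930 population (1000s)
)

# Plug-in estimate of theta = E(X) / E(U): ratio of the sample means
t_obs <- mean(city10$x) / mean(city10$u)
t_obs
```

Note that T is a ratio of means, not a mean of the per-city ratios x/u; the two generally differ.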

elements of the vector being sampled.

> x <- seq(1:25)
> x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
> sample(x, 25)
 [1]  2 20  3  9  6  8 15 10 23  1 19 25 12 21 14  4 13 24 17  5 11 18  7 22 16
> sample(x, 25, replace = TRUE)
 [1]  4  6 16 11 21 17  6 12  5  8 15 19 23 16 15 20 18 19 21  5 25  7  8 20  3
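
The difference between the two sample() calls can be checked directly. This is only an illustrative sketch; the seed and the variable names perm and resample are arbitrary choices, not part of the lecture.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
x <- 1:25

perm <- sample(x, 25)              # without replacement: a permutation of x
resample <- sample(x, 25, replace = TRUE)  # with replacement: a bootstrap resample

all(sort(perm) == x)               # TRUE for any seed: every value appears once
anyDuplicated(resample) > 0        # almost always TRUE: repeats are expected
length(unique(resample))           # on average about 1 - (1 - 1/25)^25, roughly 64%, of 25
```

Sampling with replacement is what makes each bootstrap sample a draw of size n from the empirical distribution rather than a reshuffle of the data.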

Bias correction using the bootstrap

notation
- $\theta$: true and unknown population quantity value
- $\hat{\theta}$: estimate of $\theta$ based on sample data
- $\hat{\theta}^{*b}$: estimate of $\theta$ from $b$-th bootstrap sample

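Bias correction continued

So in a sense:
- the $\hat{\theta}^{*}$s are to $\hat{\theta}$ as $\hat{\theta}$ is to $\theta$
bootstrap estimate of bias
- Note: bias $= E_F(\hat{\theta} - \theta)$

$$\widehat{\text{bias}}_{\text{boot}} = \frac{1}{B} \sum_{b=1}^{B} \left( \hat{\theta}^{*b} - \hat{\theta} \right) = \hat{\theta}^{*\cdot} - \hat{\theta}$$

So bias-corrected point estimate is

$$\tilde{\theta} = \hat{\theta} - \left( \hat{\theta}^{*\cdot} - \hat{\theta} \right) = 2\hat{\theta} - \hat{\theta}^{*\cdot}$$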
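
The bias estimate and the corrected point estimate can be sketched in a few lines. The simulated sample y, the statistic stat, the seed, and B = 2000 are all assumptions chosen for illustration only.

```r
set.seed(166)                        # arbitrary seed for reproducibility
y <- rexp(30, rate = 1 / 5)          # hypothetical sample
stat <- function(v) log(mean(v))     # example statistic (an assumption here)

theta_hat <- stat(y)                 # estimate from the original sample
B <- 2000
# B nonparametric bootstrap estimates: resample y with replacement, recompute
theta_star <- replicate(B, stat(sample(y, replace = TRUE)))

bias_boot <- mean(theta_star) - theta_hat   # mean of bootstrap estimates minus theta_hat
theta_tilde <- theta_hat - bias_boot        # same as 2 * theta_hat - mean(theta_star)
c(estimate = theta_hat, bias = bias_boot, corrected = theta_tilde)
```

The correction subtracts an estimated bias, so it also adds simulation noise; with small B the corrected estimate can be more variable than the uncorrected one.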

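Percentile method for confidence intervals

denote cdf of bootstrap distribution of $\hat{\theta}^{*}$ as $\widehat{CDF}(t) = \mathrm{Pr}^{*}(\hat{\theta}^{*} \le t)$
If bootstrap distribution is obtained by simulation then

$$\widehat{CDF}(t) \approx \frac{\#\{\hat{\theta}^{*b} \le t\}}{B}$$

define confidence interval as interval between appropriate quantiles (for a $1 - \alpha$ interval, the $\alpha/2$ and $1 - \alpha/2$ quantiles of the bootstrap distribution)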
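
A minimal sketch of the percentile method, assuming a simulated sample and the sample mean as the statistic (both chosen only for illustration; cdf_hat is a hypothetical helper name).

```r
set.seed(8)                              # arbitrary seed
y <- rexp(40, rate = 1 / 5)              # hypothetical sample
B <- 4000
theta_star <- replicate(B, mean(sample(y, replace = TRUE)))

# Simulation estimate of the bootstrap cdf at a point t
cdf_hat <- function(t) sum(theta_star <= t) / B

# 95% percentile interval: 2.5% and 97.5% quantiles of the bootstrap estimates
alpha <- 0.05
ci <- quantile(theta_star, c(alpha / 2, 1 - alpha / 2))
ci
cdf_hat(ci[[2]])                         # close to 1 - alpha / 2
```

quantile() interpolates between order statistics by default, so the endpoints need not coincide exactly with observed bootstrap estimates.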

R code for the City Data

Main driver program

drive.bootstrap <- function(mdata, mlefunc, bootfunc, B) {
  mle <- mlefunc(mdata)          # compute mle
  print(c("mle", mle))
  boots <- bootfunc(mdata, B)    # generate bootstrap samples
  print("quantiles")
  print(quantile(boots, c(0.005, 0.025, 0.5, 0.975, 0.995)))
  meanb <- mean(boots)
  print(c("mean", meanb))
  stderr <- sqrt(var(boots))     # bootstrap standard error
  print(c("stderr", stderr))
  biasc <- 2 * mle - meanb
  print("bias-corrected point estimate")
  biasc
}

Function defining computation of $\hat{\theta}$

meanratio <- function(mdata) {
  mean(mdata$x) / mean(mdata$u)
}

Function carrying out desired type of bootstrap

nonparboot.ratio <- function(mydat, B) {
  # nonparametric bootstrap for ratio of means
  # data object must contain 2 columns of data
  # returns B bootstrap estimates of ratio of means
  if (ncol(mydat) != 2) {
    print("input matrix must have 2 columns of numeric data")
  } else {
    bootratio <- numeric()
    n <- nrow(mydat)    # number of observations
    for (i in 1:B) {
      # sample of size n from integers 1:n with replacement
      index1 <- sample(n, replace = TRUE)
      # bootstrap sample; rows of data corresponding to index
      boot1 <- mydat[index1, ]
      bootratio <- c(bootratio, mean(boot1$x) / mean(boot1$u))
    }
    bootratio
  }
}

Function call and results

> library(boot)
> data(city)
> drive.bootstrap(city, meanratio, nonparboot.ratio, 81)
[1] "mle" "1.5203125"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
1.229048 1.276364 1.523894 2.277978 2.
[1] "mean" "1.58087829542998"
[1] "stderr" "0.268075889571327"
[1] "bias-corrected point estimate"
[1] 1.

> drive.bootstrap(city, meanratio, nonparboot.ratio, 1000)
[1] "mle" "1.5203125"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
1.195834 1.254945 1.529401 2.118204 2.
[1] "mean" "1.57122730135862"
[1] "stderr" "0.239441130366229"
[1] "bias-corrected point estimate"
[1] 1.

R code for parametric bootstrap for the air conditioning data

parboot.logexpmean <- function(mdata, B) {
  # parametric bootstrap for log of exponential mean parm
  # data object must contain 1 column of data
  # returns B bootstrap estimates of log exponential mean parm
  if (ncol(mdata) != 1) {
    print("input data must have 1 column of numeric data")
  } else {
    mle <- logexpmean(mdata)
    bootlogexpmean <- numeric()
    n <- nrow(mdata)    # number of observations
    for (i in 1:B) {
      # bootstrap sample; n random draws from exponential distribution
      # with mle from mdata as parameter (rate = 1 / fitted mean)
      boot1 <- rexp(n, exp(-mle))
      bootlogexpmean <- c(bootlogexpmean, logexpmean(as.matrix(boot1)))
    }
    bootlogexpmean
  }
}

logexpmean <- function(mdata) {
  if (ncol(mdata) != 1) {
    print("input data must have 1 column of numeric data")
  } else {
    # mdata[, 1] works for a one-column matrix or data frame
    log(mean(mdata[, 1]))
  }
}
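
The same parametric bootstrap can be sketched without an explicit loop. The twelve failure times are typed in by hand and are an assumption about the contents of boot::aircondit (check against data(aircondit, package = "boot")); the seed, B, and the names hours, sims, and boot_logmean are illustrative.

```r
# Failure times, assumed to match boot::aircondit$hours (verify before relying on them)
hours <- c(3, 5, 7, 18, 43, 85, 91, 98, 100, 130, 230, 487)
n <- length(hours)

mle <- log(mean(hours))              # log of the fitted exponential mean

set.seed(27)                         # arbitrary seed
B <- 1000
# Each column is one parametric bootstrap sample from Exp(rate = 1 / fitted mean)
sims <- matrix(rexp(n * B, rate = exp(-mle)), nrow = n)
boot_logmean <- log(colMeans(sims))  # B bootstrap estimates of the log mean

c(mle = mle, se = sd(boot_logmean))
```

Unlike the nonparametric version, the resamples here come from the fitted exponential model rather than from the data values themselves, so the results are only as good as that model.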

drive.bootstrap2 <- function(mdata, mlefunc, bootfunc, B) {
  mle <- mlefunc(mdata)          # compute mle
  print(c("mle", mle))
  boots <- bootfunc(mdata, B)    # generate bootstrap samples
  print("quantiles")
  print(quantile(boots, c(0.005, 0.025, 0.5, 0.975, 0.995)))
  meanb <- mean(boots)
  print(c("mean", meanb))
  stderr <- sqrt(var(boots))     # bootstrap standard error
  print(c("stderr", stderr))
  biasc <- 2 * mle - meanb
  print("bias-corrected point estimate")
  print(biasc)
  # now bias-corrected percentile method of C.I.
  # first get Pr(a bootstrap estimate is <= the mle)
  cof <- length(boots[boots <= mle]) / B
  print(c("cof", cof))
  z0 <- qnorm(cof)               # corresponding normal quantile
  z.alpha <- qnorm(0.975)
  p.low.end <- pnorm(2 * z0 - z.alpha)
  p.high.end <- pnorm(2 * z0 + z.alpha)
  print(c(p.low.end, p.high.end))
  boots <- sort(boots)
  print("bias corrected C.I.")
  # fractional indices are truncated toward zero by R
  print(c(boots[B * p.low.end], boots[B * p.high.end]))
}

> drive.bootstrap2(aircondit, logexpmean, parboot.logexpmean, 100)
[1] "mle" "4.68290253452844"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
3.834368 4.094229 4.710413 5.181885 5.
[1] "mean" "4.67889044080901"
[1] "stderr" "0.286036072515565"
[1] "bias-corrected point estimate"
[1] 4.
[1] "cof" "0.45"
[1] 0.01350800 0.
[1] "bias corrected C.I."
[1] 3.684830 5.

> drive.bootstrap2(aircondit, logexpmean, parboot.logexpmean, 100)
[1] "mle" "4.68290253452844"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
4.046429 4.096043 4.684174 5.134347 5.
[1] "mean" "4.66888582075081"
[1] "stderr" "0.274983035251789"
[1] "bias-corrected point estimate"
[1] 4.
[1] "cof" "0.5"
[1] 0.025 0.
[1] "bias corrected C.I."
[1] 4.081475 5.

> drive.bootstrap2(aircondit, logexpmean, parboot.logexpmean, 1000)
[1] "mle" "4.68290253452844"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
3.760392 3.997390 4.640964 5.166151 5.
[1] "mean" "4.62143415974039"
[1] "stderr" "0.29366260048462"
[1] "bias-corrected point estimate"
[1] 4.
[1] "cof" "0.563"
[1] 0.05021169 0.
[1] "bias corrected C.I."
[1] 4.113786 5.

> drive.bootstrap2(aircondit, logexpmean, parboot.logexpmean, 25000)
[1] "mle" "4.68290253452844"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
3.814595 4.033918 4.654331 5.180473 5.
[1] "mean" "4.64203157705189"
[1] "stderr" "0.292952308908569"
[1] "bias-corrected point estimate"
[1] 4.
[1] "cof" "0.53676"
[1] 0.03791469 0.
[1] "bias corrected C.I."
[1] 4.098616 5.

> drive.bootstrap2(aircondit, logexpmean, parboot.logexpmean, 25000)
[1] "mle" "4.68290253452844"
[1] "quantiles"
    0.5%     2.5%      50%    97.5%    99.5%
3.795942 4.014882 4.654935 5.179212 5.
[1] "mean" "4.64014458348023"
[1] "stderr" "0.295191848042946"
[1] "bias-corrected point estimate"
[1] 4.
[1] "cof" "0.53592"
[1] 0.03756714 0.
[1] "bias corrected C.I."
[1] 4.082338 5.

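Bias-correcting bootstrap confidence intervals

recall:

$$\widehat{CDF}(q) = \mathrm{Pr}^{*}(\hat{\theta}^{*} \le q) = \frac{\#\{\hat{\theta}^{*b} \le q\}}{B}$$

if $\widehat{CDF}(\hat{\theta}) \ne .5$, then bias correction to percentile method c.i. may be in order
let $z_0 = \Phi^{-1}(\widehat{CDF}(\hat{\theta}))$
what Splus/R function evaluates $\Phi^{-1}$?
then bias-corrected $1 - \alpha$ c.i. is

$$\left( \widehat{CDF}^{-1}\big(\Phi(2z_0 - z_{\alpha/2})\big),\ \widehat{CDF}^{-1}\big(\Phi(2z_0 + z_{\alpha/2})\big) \right)$$

here $z_{\alpha/2}$ is upper $\alpha/2$ point of standard normal: $\Phi(z_{\alpha/2}) = 1 - \alpha/2$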
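
A compact sketch of the bias-corrected interval, using qnorm() for the inverse normal cdf and pnorm() for the normal cdf. The helper name bc_interval and the stand-in vector of bootstrap estimates are hypothetical; unlike the driver code earlier, this version takes the interval endpoints from quantile() rather than indexing the sorted estimates, which is a design choice of this sketch.

```r
# Bias-corrected (BC) percentile interval from a vector of bootstrap estimates.
# bc_interval is a hypothetical helper name, not part of any package.
bc_interval <- function(theta_star, theta_hat, alpha = 0.05) {
  z0 <- qnorm(mean(theta_star <= theta_hat))     # inverse normal cdf of CDF-hat(theta_hat)
  z_alpha2 <- qnorm(1 - alpha / 2)               # upper alpha/2 point of standard normal
  probs <- pnorm(2 * z0 + c(-1, 1) * z_alpha2)   # adjusted percentile levels
  quantile(theta_star, probs, names = FALSE)
}

set.seed(31)                                     # arbitrary seed
theta_star <- rnorm(5000, mean = 4.64, sd = 0.29)  # stand-in bootstrap estimates

bc_interval(theta_star, theta_hat = 4.68)
# With theta_hat at the bootstrap median, z0 = 0 and BC reduces to the percentile interval
bc_interval(theta_star, theta_hat = median(theta_star))
```

If every bootstrap estimate falls on one side of theta_hat, the proportion is 0 or 1 and qnorm() returns an infinite z0, so the sketch assumes theta_hat lies inside the range of the bootstrap estimates.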