Nonparametric Curve Estimation: Pointwise vs. Simultaneous Confidence & Bias Correction - , Study notes of Data Analysis & Statistical Methods

University of California - Davis Data Analysis & Statistical Methods

Prof. Peter G. Hall

Nonparametric curve estimation methods, focusing on pointwise and simultaneous confidence regions. It covers the impact of bandwidth on bias and variance, and the challenges of bias correction in nonparametric problems using the bootstrap. The document also includes an exercise on mean and variance of the estimator.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009


METHODOLOGY AND THEORY

FOR THE BOOTSTRAP

(Seventh set of two lectures)

Main topic of these lectures: Bootstrap methods for nonparametric curve estimation

Pointwise versus simultaneous confidence regions

We shall dispose of this topic first, so that we can focus subsequently on other issues.

Suppose we have an estimator $\hat g$ of a function $g$ on an interval $\mathcal{I}$, and, for a given level $1 - \alpha$ of probability, have constructed a confidence region, or “tube,” for $g$, consisting of a boundary above and a boundary below the curve represented by the formula $y = \hat g(x)$, for $x \in \mathcal{I}$.

Pointwise versus simultaneous confidence regions (cont. 1)

The region can be interpreted as the union of intervals $(\hat g_1(x), \hat g_2(x))$, for $x \in \mathcal{I}$. Of course, $\hat g_1$ and $\hat g_2$ are constructed from data, and satisfy $\hat g_1 \le \hat g_2$.

Such a region is commonly referred to as a “$(1 - \alpha)$-level confidence region for $g$ on the interval $\mathcal{I}$.”

We can interpret the statement in two ways. Either (i) the interval $(\hat g_1(x), \hat g_2(x))$ covers $g(x)$ with probability approximately $1 - \alpha$, for each $x \in \mathcal{I}$; or (ii) the probability that the graph represented by the equation $y = g(x)$ lies within the tube converges to $1 - \alpha$ as $n$ increases.

Pointwise versus simultaneous confidence regions (cont. 3)

This close relationship vanishes in nonparametric cases, however. There, simultaneous confidence regions are generally an order of magnitude wider than their parametric counterparts.

Although the factor by which the width increases is proportional only to $(\log n)^{1/2}$, in asymptotic terms, the increase is generally substantial, and this alone causes simultaneous bands to be unpopular.

When coupled with the relative lack of interest in predicting the value of $E(Y \mid X = x_0)$ simultaneously for many values of $x_0$, this means that the pointwise interpretation is the obvious choice in at least the setting of nonparametric regression. We shall adopt it in the density estimation context, too.

Our treatment of confidence regions in the setting of nonparametric curve estimation will address only the case of nonparametric density estimation. Nonparametric regression is broadly similar.

Local estimation

Nonparametric curve estimators (usually, estimators of densities or regression means) work without imposing structural assumptions.

Typically the only conditions required are that the function in question have sufficiently many bounded derivatives. That is, it should be sufficiently smooth.

Since only smoothness is assumed, the value of the estimator at a particular point, $x$ say, is based largely, if not wholly, on data values close to $x$.

In density estimation, where we have a sample $X_1, \ldots, X_n$ from the distribution with density $f$, this means that to estimate $f$ at $x$ we use only those $X_i$'s that are close to $x$.

In regression, where estimators are based on data $(x_1, Y_1), \ldots, (x_n, Y_n)$ and we wish to estimate $g(x) = E(Y \mid X = x)$, we use only those pairs $(x_i, Y_i)$ for which $x_i$ is close to $x$.

Example: Nonparametric density estimation

Suppose we sample independent and identically distributed data $\mathcal{X} = \{X_1, \ldots, X_n\}$ from a distribution with density $f$. We wish to estimate this function.

Let $K$ be a bounded, compactly supported, symmetric probability density, let $h > 0$ denote a “bandwidth,” and put

$$\hat f(x) = \frac{1}{nh} \sum_{i=1}^n K\left(\frac{x - X_i}{h}\right).$$

This is our estimator of $f(x)$.

The estimator $\hat f$ is itself a density; it is nonnegative and it integrates to 1, since $K$ has both those properties.
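The definition above translates almost line by line into code. The following is a minimal sketch, assuming NumPy and using the biweight kernel as the compactly supported $K$; the function names `biweight` and `kde`, the simulated N(0,1) sample, the seed and the bandwidth `h=0.5` are illustrative choices, not part of the lectures.

```python
import numpy as np

def biweight(u):
    """Biweight kernel: (15/16)(1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)

def kde(x, data, h, kernel=biweight):
    """Evaluate f_hat(x) = (nh)^{-1} sum_i K((x - X_i)/h) at each point of x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    u = (x[:, None] - data[None, :]) / h      # shape (len(x), n)
    return kernel(u).sum(axis=1) / (data.size * h)

rng = np.random.default_rng(0)                # arbitrary seed (assumption)
sample = rng.normal(size=200)                 # illustrative N(0,1) sample
grid = np.linspace(-6.0, 6.0, 4001)
f_hat = kde(grid, sample, h=0.5)

step = grid[1] - grid[0]
total_mass = f_hat.sum() * step               # Riemann-sum approximation of the integral
print(f_hat.min() >= 0.0, round(total_mass, 4))
```

The two printed quantities check numerically the properties claimed in the text: the estimate is nonnegative and its integral over the grid is close to 1.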

Reliance of $\hat f$ on bandwidth

Note that

$$\psi_i(x) = h^{-1} K\left(\frac{x - X_i}{h}\right)$$

is itself a density, for each fixed $X_i$: $\psi_i \ge 0$, $\int \psi_i = 1$.

The density $\psi_i$ gets narrower and taller as $h$ decreases. Our estimator $\hat f$ is obtained by simply averaging the values of the $\psi_i$'s. Clearly, adjusting the bandwidth affects the shape, and hence the properties, of $\hat f$.
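To see the "narrower and taller" behaviour numerically, the sketch below evaluates a single $\psi_i$ (centred at $X_i = 0$) for several bandwidths. It assumes NumPy and a standard normal kernel; the names `gauss` and `psi`, the grid and the bandwidth values are illustrative only.

```python
import numpy as np

def gauss(u):
    """Standard normal density, used here as the kernel K."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def psi(x, xi, h, kernel=gauss):
    """psi_i(x) = h^{-1} K((x - X_i)/h) for a single data point xi."""
    return kernel((x - xi) / h) / h

grid = np.linspace(-10.0, 10.0, 20001)
step = grid[1] - grid[0]

summary = {}
for h in (2.0, 1.0, 0.5, 0.25):
    vals = psi(grid, 0.0, h)
    # (peak height, numerical integral): the peak equals K(0)/h, the area stays 1
    summary[h] = (float(vals.max()), float(vals.sum() * step))

for h, (peak, area) in summary.items():
    print(f"h={h:<5} peak={peak:.4f} area={area:.4f}")
```

Halving $h$ doubles the peak height $K(0)/h$ while the area under $\psi_i$ stays equal to 1, which is why small bandwidths produce spiky estimates and large ones produce flat estimates.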

Exercise. Prove from these results, and elementary calculus, that if $h = h(n) \to 0$ as $n \to \infty$, in such a manner that $nh \to \infty$; and if $f$ has two continuous derivatives in a neighbourhood of $x$; then

$$E\{\hat f(x)\} = f(x) + \tfrac12 \kappa_2 h^2 f''(x) + o(h^2),$$
$$\mathrm{var}\{\hat f(x)\} = (nh)^{-1} \kappa f(x) + o\{(nh)^{-1}\},$$

where $\kappa = \int K^2$, $\kappa_2 = \int u^2 K(u)\,du$.

The first result here implies that $\hat f(x)$ is asymptotically unbiased for $f(x)$. That is, as $n$ increases, the difference between $E\{\hat f(x)\}$ and the quantity $f(x)$ that $\hat f(x)$ is estimating converges to zero.

The second result implies that the variance of $\hat f(x)$ converges to zero as $n$ increases.
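The two expansions can be checked against a case where the mean and variance are available in closed form. The sketch below assumes that $f$ is the N(0,1) density and that $K$ is the standard normal kernel (not compactly supported, so this is an illustration rather than an instance of the exact conditions above); then $\kappa_2 = 1$ and $\kappa = 1/(2\sqrt{\pi})$. The point `x0` and the values of `n` and `h` are arbitrary choices.

```python
import numpy as np

def normal_pdf(x, sd=1.0):
    """Density of N(0, sd^2) at x."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def exact_mean_var(x, n, h):
    """Exact E and var of f_hat(x) when f = N(0,1) and K = N(0,1) density."""
    mean = normal_pdf(x, sd=np.sqrt(1.0 + h**2))              # E{K_h(x - X)}
    second = (normal_pdf(x, sd=np.sqrt(1.0 + h**2 / 2.0))
              / (2.0 * h * np.sqrt(np.pi)))                   # E{K_h(x - X)^2}
    return mean, (second - mean**2) / n

x0, n, h = 0.5, 10_000, 0.05
f = normal_pdf(x0)
f2 = (x0**2 - 1.0) * f                      # f''(x) for the N(0,1) density
kappa, kappa2 = 1.0 / (2.0 * np.sqrt(np.pi)), 1.0

mean, var = exact_mean_var(x0, n, h)
bias_ratio = (mean - f) / (0.5 * kappa2 * h**2 * f2)   # should be close to 1
var_ratio = var / (kappa * f / (n * h))                # close to 1, error of order h
print(round(float(bias_ratio), 4), round(float(var_ratio), 4))
```

Both ratios approach 1 as $h \to 0$; the variance ratio converges more slowly because the neglected term is of relative order $h$.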

Mean squared error

Therefore, the mean squared error of $\hat f(x)$ is given by

$$E\{\hat f(x) - f(x)\}^2 = \mathrm{var}\{\hat f(x)\} + \{E \hat f(x) - f(x)\}^2 = (nh)^{-1} C_1 + C_2 h^4 + o\{(nh)^{-1} + h^4\},$$

where the constants $C_1 = \kappa f(x)$ and $C_2 = \tfrac14 \kappa_2^2 f''(x)^2$ depend on $x$.

(Proof: Use the results of the Exercise.)

It follows that the optimal choice of $h$, for the purpose of minimising mean squared error, is of size $n^{-1/5}$.

This order of magnitude of bandwidth brings the variance and squared-bias terms, of respective orders $(nh)^{-1}$ and $h^4$, into balance.
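The $n^{-1/5}$ rate can be made explicit by minimising the leading terms of the mean squared error. The following worked step is a sketch that ignores the $o(\cdot)$ remainder and assumes $f''(x) \neq 0$, so that $C_2 > 0$; the symbol $h_{\mathrm{opt}}$ is introduced here for illustration.

```latex
\frac{d}{dh}\left\{ (nh)^{-1} C_1 + C_2 h^4 \right\}
  = -\frac{C_1}{n h^2} + 4 C_2 h^3 = 0
\quad\Longrightarrow\quad
h_{\mathrm{opt}} = \left( \frac{C_1}{4 C_2} \right)^{1/5} n^{-1/5},
\qquad
(n h_{\mathrm{opt}})^{-1} C_1 + C_2 h_{\mathrm{opt}}^4
  = \tfrac{5}{4}\, C_1^{4/5} (4 C_2)^{1/5}\, n^{-4/5}.
```

At this bandwidth the variance term is exactly four times the squared-bias term, and the mean squared error is of order $n^{-4/5}$.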

Choice of kernel

Common choices of $K$ are the standard normal density,

$$K(u) = (2\pi)^{-1/2} \exp\left(-\tfrac12 u^2\right),$$

and the $k$-weight kernel,

$$K(u) = c_k (1 - u^2)^k$$

for $|u| \le 1$ (and $K(u) = 0$ otherwise), where $k \ge 1$ is an integer and the constant $c_k$ is chosen so that $K$ integrates to 1. The case $k = 2$ is popular; then $K$ is called the “biweight kernel.”
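The normalising constant $c_k$ can be computed exactly by expanding $(1 - u^2)^k$ with the binomial theorem and integrating term by term over $[-1, 1]$. The sketch below does this with exact rational arithmetic from the Python standard library; the function name `k_weight_constant` is illustrative.

```python
from fractions import Fraction
from math import comb

def k_weight_constant(k):
    """c_k = 1 / integral_{-1}^{1} (1 - u^2)^k du, computed exactly.

    Uses (1 - u^2)^k = sum_j C(k, j) (-1)^j u^(2j) and
    integral_{-1}^{1} u^(2j) du = 2 / (2j + 1).
    """
    integral = sum(Fraction((-1) ** j * comb(k, j) * 2, 2 * j + 1)
                   for j in range(k + 1))
    return 1 / integral

constants = {k: k_weight_constant(k) for k in (1, 2, 3)}
for k, c in constants.items():
    print(k, c)
```

For the biweight case $k = 2$ this gives $c_2 = 15/16$, so that $K(u) = \tfrac{15}{16}(1 - u^2)^2$ on $[-1, 1]$.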

High-order kernels

More generally, we can take $K$ to satisfy

$$\int_{-\infty}^{\infty} u^j K(u)\,du \;\begin{cases} = 1 & \text{if } j = 0, \\ = 0 & \text{if } 1 \le j \le r - 1, \\ \neq 0 & \text{if } j = r, \end{cases} \tag{1}$$

for a given integer $r \ge 1$.

When using a kernel of this type the variance of $\hat f$ remains of size $(nh)^{-1}$, but the order of bias changes from $h^2$ to $h^r$. Therefore, by choosing $r > 2$ we can improve, at least in theory, the mean-square performance of $\hat f$. In particular, the order of mean squared error can be reduced to $n^{-2r/(2r+1)}$ by choosing $h$ of size $n^{-1/(2r+1)}$.

Note, however, that if $r > 2$ then if (1) is to hold, $K$ must take negative values, and as a result, $\hat f$ is no longer guaranteed to be nonnegative.
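As a concrete illustration of a high-order kernel, the sketch below checks the moment conditions numerically for $K(u) = \tfrac12 (3 - u^2)\,\phi(u)$, where $\phi$ is the standard normal density. This particular kernel is an example chosen here, not one taken from the lectures; it satisfies the conditions with $r = 4$. The grid limits and spacing are arbitrary, and the integrals are approximated by Riemann sums.

```python
import numpy as np

def fourth_order_gaussian(u):
    """K(u) = 0.5 * (3 - u^2) * phi(u), a fourth-order kernel (r = 4)."""
    phi = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return 0.5 * (3.0 - u**2) * phi

u = np.linspace(-12.0, 12.0, 240001)
du = u[1] - u[0]
K = fourth_order_gaussian(u)

# Riemann-sum approximations of the moments  integral u^j K(u) du, j = 0..4
moments = [float(np.sum(u**j * K) * du) for j in range(5)]
min_value = float(K.min())

print([round(m, 6) for m in moments])   # approximately [1, 0, 0, 0, -3]
print(min_value < 0.0)                  # the kernel takes negative values
```

The zeroth moment is 1, the first three vanish, and the fourth is nonzero, while the minimum of $K$ is negative (it dips below zero for $|u| > \sqrt{3}$), in line with the remark that such kernels cannot be densities.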

We shall take r = 2 in all the arguments below.

Nonparametric and semiparametric problems (cont.)

The effective number of parameters that are being fitted, when constructing the density estimator $\hat f$ in a given interval, equals the number of bandwidths that can be fitted into the interval.

Therefore the number of fitted parameters diverges at rate $n^{1/5}$ as sample size grows.

In comparison, estimation of “global” characteristics, such as mean, variance and other moments, is a semiparametric problem. Although in such cases estimation involves a potentially infinite number of unknowns, conventional convergence rates can be attained.

Implications for the bootstrap

We have not, so far, encountered cases where the effective number of parameters grew unboundedly as sample size increased.

The implications for the bootstrap are manifested in at least two ways: difficulties with bias, and a worsening of overall convergence rate (including the order of magnitude of coverage error).

Both these difficulties are manifested to some extent in more conventional, parametric problems, where the number of parameters is large, although fixed (as sample size increases).

In such instances, difficulties with bias and accuracy arise frequently, although in a theoretical treatment they do not result in an actual deterioration of convergence rate.

Bias in semiparametric problems (cont.)

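Moreover, the bootstrap estimator of bias,

$$\widehat{\mathrm{bias}} = E(\hat\theta^* \mid \mathcal{X}) - \hat\theta,$$

accurately approximates bias. Indeed,

$$\mathrm{bias} = \widehat{\mathrm{bias}} + O_p(n^{-3/2}),$$

and this high degree of precision led us to suggest $\hat\theta - \widehat{\mathrm{bias}}$ as a bias-corrected estimator of $\theta$:

$$\hat\theta_{\mathrm{bc}} = \hat\theta - \widehat{\mathrm{bias}} = \hat\theta - \{E(\hat\theta^* \mid \mathcal{X}) - \hat\theta\} = 2\hat\theta - E(\hat\theta^* \mid \mathcal{X}).$$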
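A small simulation can illustrate the bias-corrected estimator. The sketch below takes, as an assumed example not drawn from the lectures, $\theta = \mu^2$ estimated by $\hat\theta = \bar X^2$; in that case $E(\hat\theta^* \mid \mathcal{X}) = \bar X^2 + \hat\sigma^2 / n$ with $\hat\sigma^2$ the sample variance using divisor $n$, so the Monte Carlo bootstrap bias estimate can be compared with its exact value. The sample distribution, seed, `n` and number of resamples `B` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)           # arbitrary seed (assumption)
n, B = 50, 40_000                        # sample size and number of resamples
x = rng.normal(loc=1.0, scale=1.0, size=n)

theta_hat = x.mean() ** 2                # estimator of theta = mu^2

# Draw B resamples of size n with replacement and recompute the estimator
idx = rng.integers(0, n, size=(B, n))
theta_star = x[idx].mean(axis=1) ** 2

bias_hat_mc = theta_star.mean() - theta_hat    # Monte Carlo E(theta* | X) - theta_hat
bias_hat_exact = x.var() / n                   # np.var uses divisor n by default
theta_bc = 2.0 * theta_hat - theta_star.mean() # bias-corrected estimator

print(round(float(bias_hat_mc), 4), round(float(bias_hat_exact), 4))
print(round(float(theta_hat), 4), round(float(theta_bc), 4))
```

In this example the true bias is $\sigma^2 / n$ and the bootstrap estimate is $\hat\sigma^2 / n$, so they differ by $(\hat\sigma^2 - \sigma^2)/n = O_p(n^{-3/2})$, consistent with the displayed approximation.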

Bias in nonparametric problems

Reflecting the infinite-parameter nature of nonparametric density estimation, the bootstrap fails rather spectacularly to approximate bias.

To appreciate this point, let $\mathcal{X}^* = \{X_1^*, \ldots, X_n^*\}$ denote a resample drawn by sampling randomly, with replacement, from $\mathcal{X}$. Then the standard bootstrap form of $\hat f$ is $\hat f^*$, defined by

$$\hat f^*(x) = \frac{1}{nh} \sum_{i=1}^n K\left(\frac{x - X_i^*}{h}\right).$$

Now,

$$E\left\{ K\left(\frac{x - X_i^*}{h}\right) \,\Big|\, \mathcal{X} \right\} = \frac{1}{n} \sum_{j=1}^n K\left(\frac{x - X_j}{h}\right) = h \hat f(x).$$

Therefore,

$$E\{\hat f^*(x) \mid \mathcal{X}\} = \hat f(x),$$

implying that the bootstrap estimator of bias is

$$\widehat{\mathrm{bias}} = E\{\hat f^*(x) \mid \mathcal{X}\} - \hat f(x) = 0.$$
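The identity $E\{\hat f^*(x) \mid \mathcal{X}\} = \hat f(x)$ can be verified exactly for a tiny sample by enumerating every possible resample. The sketch below assumes NumPy, a biweight kernel, and a made-up sample of size $n = 4$ (so there are $4^4 = 256$ equally likely ordered resamples); the names `biweight` and `kde_at` and the values of `x0` and `h` are illustrative.

```python
import itertools
import numpy as np

def biweight(u):
    """Biweight kernel: (15/16)(1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, (15.0 / 16.0) * (1.0 - u**2) ** 2, 0.0)

def kde_at(x, data, h):
    """f_hat(x) = (nh)^{-1} sum_i K((x - X_i)/h) at a single point x."""
    return float(biweight((x - data) / h).sum() / (data.size * h))

data = np.array([-0.8, -0.1, 0.3, 1.2])   # toy sample (assumption), n = 4
x0, h = 0.2, 0.9
n = data.size

f_hat = kde_at(x0, data, h)

# Every ordered resample of size n is equally likely under the bootstrap,
# so the conditional expectation is the plain average over all n^n of them.
resample_values = [kde_at(x0, data[list(idx)], h)
                   for idx in itertools.product(range(n), repeat=n)]
boot_mean = float(np.mean(resample_values))
bias_hat = boot_mean - f_hat

print(len(resample_values), abs(bias_hat) < 1e-12)
```

Up to floating-point rounding the bootstrap bias estimate is zero, whatever the true bias $\tfrac12 \kappa_2 h^2 f''(x)$ happens to be.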
