Econometrics Lecture: Variance Estimation & Bootstrap (ARE213, UC Berkeley, Fall 04), Study notes of Introduction to Econometrics

These lecture notes from the University of California, Berkeley cover variance estimation and the bootstrap in econometrics. They continue the treatment of ordinary least squares (Ordinary Least Squares II): they review Assumption 1 and its distributional result, then introduce Assumption 2 and the heteroskedasticity-consistent variance. The lecture also discusses the bootstrap as an alternative method for estimating the variance of least squares estimators.

Imbens, Lecture Notes 2, ARE213 Fall ’04

ARE213 Econometrics Fall 2004 UC Berkeley Department of Agricultural and Resource Economics

Ordinary Least Squares II: Variance Estimation and the Bootstrap (W 4.2.3, 12.8.2)

In the first lecture we considered the standard linear model

$$Y_i = \beta' X_i + \varepsilon_i. \tag{1}$$

We looked at estimating β and functions of β under the following assumption:

Assumption 1 $\varepsilon_i \mid X_i \sim N(0, \sigma^2)$.

Assuming also that the observations are drawn randomly from some population the following distributional result was stated for the least squares estimator:

$$\sqrt{N}(\hat\beta - \beta) \xrightarrow{d} N\left(0, \sigma^2 \cdot (E[XX'])^{-1}\right).$$

In fact, for this result it is sufficient that $\varepsilon_i$ is independent of $X_i$; one does not need normality of the $\varepsilon_i$. We estimated the asymptotic variance as

$$\hat\sigma^2_{ml} \cdot \left(\frac{1}{N}\sum_{i=1}^{N} X_i X_i'\right)^{-1},$$

where

$$\hat\sigma^2_{ml} = \frac{1}{N}\sum_{i=1}^{N}\left(Y_i - \hat\beta' X_i\right)^2.$$

In this lecture I want to explore alternative ways of estimating the variance, and relate them to alternative assumptions about the distribution and properties of the residuals.

First we consider the distribution of $\hat\beta$ under much weaker assumptions. Instead of independence and normality of the $\varepsilon$, we make the following assumption:

Assumption 2 $E[\varepsilon_i \cdot X_i] = 0$.

This essentially defines the true value of β to be the best linear predictor:

$$\beta = (E[XX'])^{-1} \cdot E[XY].$$
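As a small illustration of why Assumption 2 is so weak, the sketch below computes the sample analogue of the best linear predictor for data whose conditional mean is deliberately nonlinear. The simulated design (uniform `x`, quadratic mean, noise scale) is an assumption made up for this example and is not taken from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
x = rng.uniform(-1, 2, size=N)
y = x**2 + rng.normal(scale=0.5, size=N)   # conditional mean is NOT linear in x
X = np.column_stack([np.ones(N), x])       # constant and regressor

# Sample analogue of beta = (E[XX'])^{-1} E[XY]
Sxx = X.T @ X / N
Sxy = X.T @ y / N
beta_hat = np.linalg.solve(Sxx, Sxy)

# Sample analogue of E[eps * X]: zero by construction of the linear predictor
eps_hat = y - X @ beta_hat
moment = X.T @ eps_hat / N
print(beta_hat, moment)
```

The orthogonality condition holds in the sample up to rounding error even though the linear model is misspecified, which is the sense in which Assumption 2 defines $\beta$ rather than restricting the data.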

Under this assumption and independent sampling, we still have normality for the least squares estimator, but now with a different variance:

$$\sqrt{N}(\hat\beta - \beta) \xrightarrow{d} N\left(0, (E[XX'])^{-1}\,(E[\varepsilon^2 XX'])\,(E[XX'])^{-1}\right).$$

Let the asymptotic variance be denoted by

$$V = (E[XX'])^{-1}\,(E[\varepsilon^2 XX'])\,(E[XX'])^{-1}.$$

This is known as the heteroskedasticity-consistent variance, or the robust variance. To see where this variance comes from, write the least squares estimator minus the truth as

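$$\hat\beta - \beta = \left(\frac{1}{N}\sum_{i=1}^{N} X_i X_i'\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^{N} X_i Y_i\right) - \beta$$

$$= \left(\frac{1}{N}\sum_{i=1}^{N} X_i X_i'\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^{N} X_i X_i' \beta\right) + \left(\frac{1}{N}\sum_{i=1}^{N} X_i X_i'\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^{N} X_i \varepsilon_i\right) - \beta$$

$$= \left(\frac{1}{N}\sum_{i=1}^{N} X_i X_i'\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^{N} X_i \varepsilon_i\right).$$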

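The variance of the second factor is

$$E\left[\left(\frac{1}{N}\sum_{i=1}^{N} X_i \varepsilon_i\right)\left(\frac{1}{N}\sum_{i=1}^{N} X_i \varepsilon_i\right)'\right] = \frac{1}{N^2}\sum_{i=1}^{N} E\left[\varepsilon_i^2 X_i X_i'\right] = \frac{1}{N} \cdot E[\varepsilon^2 XX'].$$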
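To make the two variance formulas concrete, here is a minimal NumPy sketch that replaces each expectation with a sample average and each $\varepsilon_i$ with the least squares residual. The function name `ols_variances` and the simulated heteroskedastic data are assumptions chosen for illustration; the plug-in form shown is the simplest sample analogue, not necessarily the exact estimator used later in the course.

```python
import numpy as np

def ols_variances(X, y):
    """Return beta_hat and two estimates of the asymptotic variance of
    sqrt(N) * (beta_hat - beta): conventional and robust (sandwich)."""
    N = X.shape[0]
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    Sxx_inv = np.linalg.inv(X.T @ X / N)            # (1/N sum x_i x_i')^{-1}
    sigma2_ml = resid @ resid / N                   # 1/N sum e_i^2
    conventional = sigma2_ml * Sxx_inv
    meat = (X * resid[:, None] ** 2).T @ X / N      # 1/N sum e_i^2 x_i x_i'
    robust = Sxx_inv @ meat @ Sxx_inv
    return beta_hat, conventional, robust

# Simulated data (an assumption): the error standard deviation grows with x
rng = np.random.default_rng(0)
N = 20_000
x = rng.uniform(0, 2, size=N)
X = np.column_stack([np.ones(N), x])
y = 1.0 + 0.5 * x + rng.normal(size=N) * x

beta_hat, V_conv, V_rob = ols_variances(X, y)
se_conv = np.sqrt(np.diag(V_conv) / N)   # divide by N to get Var(beta_hat)
se_rob = np.sqrt(np.diag(V_rob) / N)
print(beta_hat, se_conv, se_rob)
```

In this design the error variance is correlated with the squared regressor, so the two sets of standard errors differ; under homoskedasticity they would agree in large samples.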

If we use the empirical distribution function instead of the actual distribution function in expressions (2) and (3), the expected value is

$$\tilde\mu_X = \int x \, d\hat F_X(x) = \frac{1}{N}\sum_{i=1}^{N} x_i = \bar x.$$

The variance is

$$\tilde\sigma^2_X = \int (x - \tilde\mu_X)^2 \, d\hat F_X(x) = \int (x - \bar x)^2 \, d\hat F_X(x) = \sum_{i=1}^{N} (x_i - \bar x)^2 / N = S_X^2 (N-1)/N.$$

Hence we would end up estimating the variance of the sample average as

$$\hat V(\bar x) = S_X^2 (N-1)/N^2,$$

which is pretty close to the standard estimate of $S_X^2/N$.
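The closed form above can be checked by brute force on a tiny sample: enumerate every one of the $N^N$ equally likely resamples from the empirical distribution and compute the variance of the resulting sample means. The sample values and the helper name `exact_bootstrap_variance_of_mean` below are made up for this sketch, and the enumeration is only feasible for very small $N$.

```python
import itertools
import numpy as np

def exact_bootstrap_variance_of_mean(x):
    """Variance of the sample mean under the empirical distribution,
    obtained by enumerating all N**N equally likely resamples."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    means = np.array([np.mean(s) for s in itertools.product(x, repeat=N)])
    return np.mean((means - means.mean()) ** 2)

x = np.array([2.0, 5.0, 1.0, 4.0])       # arbitrary illustrative sample
N = len(x)
S2 = x.var(ddof=1)                       # usual sample variance S_X^2

exact = exact_bootstrap_variance_of_mean(x)
closed_form = S2 * (N - 1) / N**2        # formula from the text
standard = S2 / N                        # conventional estimate
print(exact, closed_form, standard)
```

The enumerated value and the closed form agree, and both are smaller than the conventional estimate by the factor $(N-1)/N$.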

Now this calculation is more complicated than it need be. In practice we do not need the exact bootstrap variance $\hat V(\bar x)$, which is not equal to the exact variance of $\bar x$ anyway. But if all we are interested in is an approximation, we can make the calculation a lot simpler. If we want the distribution of some statistic $W(X)$ such as the sample average, according to the empirical distribution function, we can draw from the empirical distribution. So, consider the discrete distribution with support $\{x_1, x_2, \ldots, x_N\}$, and probabilities $\Pr(X = x_i) = 1/N$ for all $i$. Draw a random sample from this distribution, of size $N$, and calculate the statistic $W(X)$. Repeat this many times and calculate the average and sample variance of $W(X)$ over these random samples. That will give us, by the law of large numbers, the population mean and variance of $W(X)$ according to the empirical distribution function.

Let us make this a little more specific. Suppose our sample is $x_1 = 0$, $x_2 = 3$ and $x_3 = 1$. The sample average is $4/3$. We are interested in the variance of this sample average. The empirical distribution function is a discrete distribution with probability mass function

$$\hat f(x) = 1/3,$$

for $x = 0, 1, 3$ and zero elsewhere. One random sample from this distribution could be

$$(0, 1, 0).$$

The value of the statistic for this sample is $w_1 = 1/3$. The next sample could be

$$(0, 3, 3),$$

with a sample average of $w_2 = 2$. After doing this many times, say $M$ times, we can use the statistics $w_1, w_2, \ldots, w_M$ to approximate the expected value and variance of $W$ under the empirical distribution function as

$$\hat E[W] \approx \frac{1}{M}\sum_{i=1}^{M} w_i,$$

and

$$\hat V[W] \approx \frac{1}{M}\sum_{i=1}^{M} (w_i - \bar w)^2.$$

We then use this variance estimator $\hat V(W)$ as an estimate of the variance of $\bar X$.
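The three-point example can be simulated directly. The sketch below draws $M$ resamples of size 3 from the sample (0, 3, 1), computes the average of each, and compares the simulated moments with the exact bootstrap values $4/3$ and $S_X^2(N-1)/N^2 = 14/27$. The seed and the choice $M = 100{,}000$ are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(213)
x = np.array([0.0, 3.0, 1.0])
N, M = len(x), 100_000

# Each row is one bootstrap sample: N draws with replacement, Pr(X = x_i) = 1/N
draws = rng.choice(x, size=(M, N), replace=True)
w = draws.mean(axis=1)                    # w_1, ..., w_M

E_hat = w.mean()                          # approximates E[W] under F-hat
V_hat = np.mean((w - w.mean()) ** 2)      # approximates V[W] under F-hat

exact = x.var(ddof=1) * (N - 1) / N**2    # S_X^2 (N-1)/N^2
print(E_hat, V_hat, exact)
```

With this many replications the simulated variance is close to the exact bootstrap variance, so the remaining error comes almost entirely from using the empirical distribution in place of the true one.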

We can do this in much more complex settings. Suppose we are interested in some regression parameters $\beta$, defined as

$$\beta = E[XX']^{-1} E[XY].$$

Given a sample of size $N$, $\{(x_i, y_i)\}_{i=1}^{N}$, we can resample the pairs $(y_i, x_i)$ to get a new sample $\{(x_{l_j}, y_{l_j})\}_{j=1}^{N}$, where for each $l$ the random variable $l_j$ is an integer between 1 and $N$ with $\Pr(l_j = k) = 1/N$, and $l_j$ is independent of $l_k$ for $j \neq k$. Then calculate for each data set the regression estimate

$$\hat\beta_l = \left(\sum_{j=1}^{N} x_{l_j} x_{l_j}'\right)^{-1}\left(\sum_{j=1}^{N} x_{l_j} y_{l_j}\right).$$
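The resampling of pairs can be sketched in a few lines. The helper names `ols` and `pairs_bootstrap`, the simulated data, and $M = 2000$ are assumptions for illustration; taking the standard deviation of the $\hat\beta_l$ across replications is one common way to turn them into standard errors.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients (sum x x')^{-1} (sum x y)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

def pairs_bootstrap(X, y, M, rng):
    """Resample (x_i, y_i) pairs with replacement M times and
    return the M bootstrap estimates of beta, one per row."""
    N = X.shape[0]
    betas = np.empty((M, X.shape[1]))
    for l in range(M):
        idx = rng.integers(0, N, size=N)   # indices l_j, uniform and independent
        betas[l] = ols(X[idx], y[idx])
    return betas

# Simulated data (an assumption, not from the notes)
rng = np.random.default_rng(1)
N = 200
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = 2.0 + 1.0 * x + rng.normal(size=N)

betas = pairs_bootstrap(X, y, M=2000, rng=rng)
boot_se = betas.std(axis=0, ddof=1)        # bootstrap standard errors
print(ols(X, y), boot_se)
```

Because the regressors are resampled together with the outcomes, this version does not rely on the disturbances being independent of the regressors.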

For $l = 1, \ldots, M$, resample $N$ residuals by drawing $N$ numbers from the set of integers from 1 to $N$, $l_j \in \{1, 2, \ldots, N\}$, for $j = 1, \ldots, N$. Then construct the $l$th bootstrap sample $(\tilde Y_{lj}, X_j)$ using

$$\tilde Y_{lj} = X_j' \hat\beta + \hat\varepsilon_{l_j}.$$

Then proceed as before. If the disturbances are really independent of the X’s, this works better than the nonparametric bootstrap (and in fact can give exact results), but if not, the nonparametric bootstrap is to be preferred.
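A minimal sketch of this residual-resampling scheme follows. The function name `residual_bootstrap` and the simulated homoskedastic data are assumptions; the covariates are held fixed and only the fitted residuals are redrawn.

```python
import numpy as np

def residual_bootstrap(X, y, M, rng):
    """Keep X fixed, resample least squares residuals with replacement,
    rebuild outcomes as X beta_hat + resampled residual, and re-estimate."""
    N = X.shape[0]
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    proj = np.linalg.solve(X.T @ X, X.T)    # (X'X)^{-1} X', same in every replication
    betas = np.empty((M, X.shape[1]))
    for l in range(M):
        idx = rng.integers(0, N, size=N)    # indices l_j for the residuals
        y_tilde = X @ beta_hat + resid[idx]
        betas[l] = proj @ y_tilde
    return beta_hat, betas

# Simulated data with disturbances independent of x (an assumption)
rng = np.random.default_rng(2)
N = 200
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
y = 2.0 + 1.0 * x + rng.normal(size=N)

beta_hat, betas = residual_bootstrap(X, y, M=2000, rng=rng)
boot_se = betas.std(axis=0, ddof=1)
print(beta_hat, boot_se)
```

Since every residual can be attached to every covariate value, this scheme imposes independence between disturbances and regressors, which is why it is unreliable under heteroskedasticity.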

The second issue is an alternative to the bootstrap, the jackknife. Consider the original example of estimating the population mean. The sample average is $\bar x$, and we are interested in its variance. The jackknife estimate of the variance calculates for each $i$ the estimate based on leaving out the $i$th observation:

$$\bar x_{(i)} = \frac{1}{N-1}\sum_{j \neq i} x_j.$$

Given these $N$ estimates of the mean, which clearly average out to $\bar x$, the variance of $\bar x$ is estimated as

$$\hat V(\bar x) = \sum_{i=1}^{N} (\bar x_{(i)} - \bar x)^2.$$

To see why this works, consider the difference

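$$\bar x_{(i)} - \bar x = \sum_{j \neq i} \frac{1}{N(N-1)} x_j - \frac{1}{N} x_i.$$

The expectation of this difference is obviously zero. The variance is

$$V(\bar x_{(i)} - \bar x) = E\left[\sum_{j \neq i} \frac{1}{N(N-1)}(x_j - \mu) - \frac{1}{N}(x_i - \mu)\right]^2 = \sum_{j \neq i} \frac{1}{N^2 (N-1)^2}\sigma^2 + \frac{1}{N^2}\sigma^2$$

$$= \sigma^2 \cdot \left(\frac{1}{N^2 (N-1)} + \frac{1}{N^2}\right) = \frac{\sigma^2}{N(N-1)} \approx \sigma^2/N^2.$$

Summing this over all $N$ observations gives $\sigma^2/(N-1)$, which is approximately $\sigma^2/N$, the variance of $\bar x$.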
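The leave-one-out construction is easy to verify numerically. The sketch below implements the estimator as a plain sum of squared deviations of the leave-one-out means, and checks the algebraic identity $\sum_i (\bar x_{(i)} - \bar x)^2 = S^2/(N-1)$, which follows from $\bar x_{(i)} - \bar x = (\bar x - x_i)/(N-1)$. The function name and the sample values are made up for this example.

```python
import numpy as np

def jackknife_variance_of_mean(x):
    """Sum over i of (xbar_(i) - xbar)^2, where xbar_(i) is the mean
    of the sample with observation i left out."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    loo_means = (x.sum() - x) / (N - 1)     # all N leave-one-out means at once
    return np.sum((loo_means - x.mean()) ** 2)

x = np.array([0.0, 3.0, 1.0, 4.0, 2.0, 5.0])   # arbitrary illustrative sample
N = len(x)
S2 = x.var(ddof=1)

v_jack = jackknife_variance_of_mean(x)
print(v_jack, S2 / (N - 1), S2 / N)
```

For the mean, the jackknife therefore differs from the conventional estimate $S^2/N$ only by the factor $N/(N-1)$, which vanishes as the sample grows.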

The final concept is that of improved variance estimates. Instead of calculating the variance this way, we could bootstrap other statistics such as t-statistics. Suppose we wish to get a confidence interval for $E[X]$. A simple way to do this is to calculate the sample average $\bar x$, the sample variance $S^2$, and estimate the 95% confidence interval as

$$\left[\bar x - 1.96 \times S/\sqrt{N},\ \bar x + 1.96 \times S/\sqrt{N}\right].$$

A bootstrapping version works as follows. Draw for $l = 1, \ldots, M$ a bootstrap sample of size $N$ from the empirical distribution function, $x_{lj}$, $j = 1, \ldots, N$. For each bootstrap sample calculate the sample mean, variance and t-statistic:

$$\bar x_l = \frac{1}{N}\sum_{j=1}^{N} x_{lj},$$

$$S_l^2 = \frac{1}{N-1}\sum_{j=1}^{N} (x_{lj} - \bar x_l)^2,$$

and

$$t_l = (\bar x_l - \bar x)/(S_l/\sqrt{N}).$$

Calculate the 0.025 and 0.975 sample quantiles from the $M$ t-statistics and denote them by $\hat t_{0.025}$ and $\hat t_{0.975}$. The 95% confidence interval is the set of all values of $x$ such that

$$\hat t_{0.025} < (\bar x - x)/(S/\sqrt{N}) < \hat t_{0.975}.$$

This can lead to confidence intervals with better coverage properties. See Hall (1992) for details.
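The steps above can be sketched as follows. Solving the two inequalities for $x$ gives the interval $(\bar x - \hat t_{0.975} S/\sqrt{N},\ \bar x - \hat t_{0.025} S/\sqrt{N})$, so the upper quantile sets the lower endpoint. The function name `bootstrap_t_interval`, the skewed simulated sample, and $M = 5000$ are assumptions for illustration.

```python
import numpy as np

def bootstrap_t_interval(x, M, rng, level=0.95):
    """Confidence interval for the mean based on bootstrapped t-statistics."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    xbar, S = x.mean(), x.std(ddof=1)
    draws = rng.choice(x, size=(M, N), replace=True)   # M bootstrap samples
    xbar_l = draws.mean(axis=1)
    S_l = draws.std(axis=1, ddof=1)
    t_l = (xbar_l - xbar) / (S_l / np.sqrt(N))
    alpha = 1.0 - level
    t_lo, t_hi = np.quantile(t_l, [alpha / 2, 1 - alpha / 2])
    # t_lo < (xbar - x)/(S/sqrt(N)) < t_hi, solved for x
    return xbar - t_hi * S / np.sqrt(N), xbar - t_lo * S / np.sqrt(N)

# Skewed simulated data (an assumption), where the normal interval tends to struggle
rng = np.random.default_rng(3)
x = rng.exponential(size=50)

lo, hi = bootstrap_t_interval(x, M=5000, rng=rng)
half = 1.96 * x.std(ddof=1) / np.sqrt(len(x))
print((lo, hi), (x.mean() - half, x.mean() + half))
```

Unlike the normal-approximation interval, the bootstrap-t interval need not be symmetric around $\bar x$, which is how it can adapt to skewness in the data.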

with now a slightly bigger difference (about 5% difference in standard errors).

Let us go back to the model in logs and consider bootstrap standard errors. We consider two versions. First, we bootstrap the residuals, keeping the covariates the same (the parametric bootstrap). Second, we do the nonparametric bootstrap. The results for the first are in round brackets, for the second in square brackets, both based on 100,000 bootstrap replications:

$$\begin{array}{rcccl}
\widehat{\log(\text{earnings})}_i & = & 5.0455 & + & 0.0667 \cdot \text{educ}_i \\
 & & (0.0850) & & (0.0062) \\
 & & [0.0861] & & [0.0064].
\end{array}$$

Again the standard errors are very similar to those based on the conventional calculations.

Let us now see how well the various confidence intervals work in practice. I carried out the following experiment. I took the census data used by Angrist and Krueger in their returns to schooling paper (QJE, 1991). This has observations for 329,509 individuals on (among other things) wages and education. I ran a linear regression of log wages on a constant and years of education, with the following result:

$$\begin{array}{rcccl}
\widehat{\log(\text{earnings})}_i & = & 4.9952 & + & 0.0709 \cdot \text{educ}_i \\
 & & (0.00045) & & (0.0003).
\end{array}$$

Next, I take this sample of 329,509 individuals as the population. Repeatedly I draw 5,000 random samples (with replacement, although this does not matter at all given the size of the population), of size $n$ (for $n = 20$, $n = 100$, and $n = 500$). In each case I estimate the same linear regression and calculate the standard errors in four different ways: (i) conventional OLS standard errors, (ii) robust OLS standard errors, (iii) parametric bootstrap, and (iv) nonparametric bootstrap. Given a confidence interval (at both the 90% and 95% levels) for the coefficient on years of education, I check whether the “true value” (0.0709) is in there. I calculate how often that happens over the 5,000 replications. The results are as follows. For each sample size, the first row of the table reports the coverage probabilities, and the second row reports the t-statistic for the null hypothesis that the actual coverage rate is equal to the nominal one (0.95 or 0.90).

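Table 1: Actual versus Nominal Coverage Rates

| n | row | convent (95%) | robust (95%) | par boot (95%) | nonpar boot (95%) | convent (90%) | robust (90%) | par boot (90%) | nonpar boot (90%) |
|---|---|---|---|---|---|---|---|---|---|
| 20 | coverage | 0.9072 | 0.8819 | 0.8898 | 0.9353 | 0.8466 | 0.8173 | 0.8274 | 0. |
| 20 | t-statistic | -17.2 | -27.3 | -24.1 | -5.9 | -15.6 | -24.1 | -21.2 | -4. |
| 100 | coverage | 0.9155 | 0.9378 | 0.9140 | 0.9437 | 0.8562 | 0.8808 | 0.8523 | 0. |
| 100 | t-statistic | -12.3 | -4.3 | -12.8 | -2.3 | -11.3 | -4.9 | -12.3 | -2. |
| 500 | coverage | 0.9284 | 0.9502 | 0.9274 | 0.9510 | 0.8693 | 0.9051 | 0.8681 | 0. |
| 500 | t-statistic | -6.9 | 0.1 | -7.2 | 0.3 | -7.1 | 1.2 | -7.4 | 1. |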

With 500 observations the robust and nonparametric bootstrap based intervals are very accurate, in contrast to the conventional and parametric bootstrap based intervals. With smaller sample sizes all intervals deteriorate. The conventional intervals end up being superior to the robust intervals for small sample sizes.
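A scaled-down version of this kind of coverage experiment can be sketched with a synthetic population, since the census extract itself is not reproduced here. Everything about the population below (its size, the education range, the coefficients, the heteroskedastic noise) is an assumption invented for illustration, only the conventional and robust 95% intervals are compared, and 2,000 replications are used instead of 5,000, so the numbers it prints are not comparable to Table 1.

```python
import numpy as np

def coverage_rates(pop_X, pop_y, beta_true, n, reps, rng, z=1.96):
    """Share of replications in which the nominal 95% interval for the slope
    covers beta_true, using (conventional, robust) standard errors."""
    hits = np.zeros(2)
    for _ in range(reps):
        idx = rng.integers(0, pop_X.shape[0], size=n)   # draw with replacement
        X, y = pop_X[idx], pop_y[idx]
        XtX_inv = np.linalg.inv(X.T @ X)
        b = XtX_inv @ X.T @ y
        e = y - X @ b
        V_conv = (e @ e / n) * XtX_inv
        V_rob = XtX_inv @ ((X * e[:, None] ** 2).T @ X) @ XtX_inv
        for k, V in enumerate((V_conv, V_rob)):
            hits[k] += abs(b[1] - beta_true) <= z * np.sqrt(V[1, 1])
    return hits / reps

# Synthetic "population" (an assumption): noise spread rises with education
rng = np.random.default_rng(1991)
P = 100_000
educ = rng.integers(8, 21, size=P).astype(float)
pop_X = np.column_stack([np.ones(P), educ])
pop_y = 5.0 + 0.07 * educ + rng.normal(size=P) * (0.2 + 0.05 * (educ - 8))

# The "true value" is the regression coefficient in the full population
beta_pop = np.linalg.solve(pop_X.T @ pop_X, pop_X.T @ pop_y)

results = {n: coverage_rates(pop_X, pop_y, beta_pop[1], n, reps=2000, rng=rng)
           for n in (20, 100, 500)}
for n, (conv, rob) in results.items():
    print(n, conv, rob)
```

As in the experiment described above, the population regression coefficient plays the role of the true value, so coverage is assessed without assuming the linear model is correctly specified.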