Multiple Regression Analysis: Estimation and Properties of OLS Estimators, Study notes of Economics

An in-depth analysis of multiple regression, focusing on estimation and the properties of ordinary least squares (OLS) estimators. The notes cover the extension of the simple regression model to include multiple explanatory factors, the key assumptions for the model, the calculation of the ordinary least squares criterion, and the interpretation of fitted values and residuals. They also discuss the statistical properties of the OLS estimators, including their unbiasedness and efficiency under certain assumptions.

Typology: Study notes

Pre 2010

Uploaded on 08/27/2009

koofers-user-9wj

Wooldridge, Introductory Econometrics, 3d ed.

Chapter 3: Multiple regression analysis: Estimation

In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility that there are additional explanatory factors that have a systematic effect on the dependent variable. The simplest extension is the “three-variable” model, in which a second explanatory variable is added:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \tag{1}$$

where each of the slope coefficients is now the partial derivative of $y$ with respect to the $x$ variable which it multiplies: that is, holding $x_2$ fixed, $\beta_1 = \partial y/\partial x_1$. This extension also allows us to consider nonlinear relationships, such as a polynomial in $z$, where $x_1 = z$ and $x_2 = z^2$. Then, the regression is linear in $x_1$ and $x_2$, but nonlinear in $z$: $\partial y/\partial z = \beta_1 + 2\beta_2 z$.

The key assumption for this model, analogous to that which we specified for the simple regression model, involves the (mean) independence of the error process $u$ from both regressors, or explanatory variables:

$$E(u \mid x_1, x_2) = 0. \tag{2}$$

This assumption of a zero conditional mean for the error process implies that it does not systematically vary with the $x$'s nor with any linear combination of the $x$'s; the mean of $u$ is the same whatever values the $x$'s take. This mean independence is weaker than full statistical independence of $u$ from the distributions of the $x$'s.

The model may now be generalized to the case of k regressors:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u \tag{3}$$

where the $\beta$ coefficients have the same interpretation: each is the partial derivative of $y$

and we may define the ordinary least squares criterion in terms of the OLS residuals, calculated from a sample of size $n$, from this expression:

$$\min S = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_{i1} - b_2 x_{i2} \right)^2 \tag{6}$$

where the minimization of this expression is performed with respect to each of the three parameters, $\{b_0, b_1, b_2\}$. In the case of $k$ regressors, these expressions include terms in $b_k$, and the minimization is performed with respect to the $(k + 1)$ parameters $\{b_0, b_1, b_2, \dots, b_k\}$. For this to be feasible, $n > (k + 1)$: that is, we must have a sample larger than the number of parameters to be estimated from that sample. The minimization is carried out by differentiating the scalar $S$ with respect to each of the $b$'s in turn, and setting the resulting first-order condition to zero. This gives rise to $(k + 1)$ simultaneous equations in $(k + 1)$ unknowns, the regression parameters, which are known as the least squares normal equations. The normal equations are expressions in the sums of squares and cross products of the $y$ and the regressors, including a first “regressor” which is a column of 1's, multiplying the constant term. For the “three-variable” regression model, we can write out the normal equations as:

$$
\begin{aligned}
\sum y &= n b_0 + b_1 \sum x_1 + b_2 \sum x_2 \\
\sum x_1 y &= b_0 \sum x_1 + b_1 \sum x_1^2 + b_2 \sum x_1 x_2 \\
\sum x_2 y &= b_0 \sum x_2 + b_1 \sum x_1 x_2 + b_2 \sum x_2^2
\end{aligned}
\tag{7}
$$

Just as in the “two-variable” case, the first normal equation can be interpreted as stating that the regression surface (in 3-space) passes through the multivariate point of means $\{\bar{x}_1, \bar{x}_2, \bar{y}\}$. These three equations may be uniquely solved, by normal algebraic techniques or linear algebra, for the estimated least squares parameters.
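As a minimal sketch of solving the normal equations numerically, the following builds the three-equation system from sums of squares and cross products and solves it with NumPy. The simulated data, the coefficient values, and all variable names are illustrative assumptions, not part of the notes.

```python
import numpy as np

# Simulated sample; the "population" values 1.0, 2.0, -1.5 are hypothetical.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

# Left-hand matrix and right-hand vector of the three normal equations,
# written out term by term from sums of squares and cross products.
A = np.array([
    [n,        x1.sum(),        x2.sum()],
    [x1.sum(), (x1 ** 2).sum(), (x1 * x2).sum()],
    [x2.sum(), (x1 * x2).sum(), (x2 ** 2).sum()],
])
c = np.array([y.sum(), (x1 * y).sum(), (x2 * y).sum()])
b = np.linalg.solve(A, c)  # (b0, b1, b2)

# Cross-check against a generic least squares solver.
X = np.column_stack([np.ones(n), x1, x2])
b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# First normal equation: the fitted surface passes through the point of means.
y_at_means = b[0] + b[1] * x1.mean() + b[2] * x2.mean()
print(b, b_lstsq, y_at_means, y.mean())
```

Solving the small system directly mirrors the algebra in the text; for larger or nearly collinear problems a least squares routine is usually the safer numerical choice.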

This extends to the case of k regressors and (k+1) regression parameters. In each case, the

$$e_i = y_i - \hat{y}_i \tag{9}$$

As with simple regression, the sum of the residuals is zero; they have, by construction, zero covariance with each of the $x$ variables, and thus zero covariance with $\hat{y}$; and since the average residual is zero, the regression surface passes through the multivariate point of means, $\{\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k, \bar{y}\}$.
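These algebraic properties of the residuals can be checked on any sample. The sketch below uses simulated data with three regressors; the data-generating values and names are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
# Design matrix: a column of ones plus three simulated regressors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([0.5, 1.0, -2.0, 0.3]) + rng.normal(size=n)

b = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ b          # fitted values
e = y - y_hat          # OLS residuals

resid_sum = e.sum()                    # zero up to rounding
cross_products = X.T @ e               # zero vector: residuals orthogonal to each column
cov_e_yhat = np.cov(e, y_hat)[0, 1]    # zero up to rounding
print(resid_sum, cross_products, cov_e_yhat)
```

The zero sum of residuals relies on the constant term being included; without the column of ones it need not hold.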

There are two instances where the simple regression of $y$ on $x_1$ will yield the same coefficient on $x_1$ as the multiple regression of $y$ on $x_1$ and $x_2$. In general, the simple regression coefficient will not equal the multiple regression coefficient, since the simple regression ignores the effect of $x_2$ (and treats it as nonsystematic, captured in the error $u$). When will the two coefficients be equal? First, when the estimated coefficient on $x_2$ is zero, as we would expect (approximately) when $x_2$ really does not belong in the model. Second, when $x_1$ and $x_2$ are uncorrelated in the sample. This is likely to be quite rare in actual data. However, these two cases suggest when the two coefficients will be similar: when $x_2$ is relatively unimportant in explaining $y$, or when it is very loosely related to $x_1$.
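The second case can be illustrated with a small sketch: if $x_2$ is made exactly uncorrelated with $x_1$ in the sample, the simple and multiple regression slopes on $x_1$ coincide, while a correlated $x_2$ drives them apart. The simulated data and the helper names `ols` and `slopes_on_x1` are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients for design matrix X (hypothetical helper)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(11)
n = 200
ones = np.ones(n)
x1 = rng.normal(size=n)
x2_raw = 0.8 * x1 + rng.normal(size=n)   # correlated with x1
u = rng.normal(size=n)

# Keep only the part of x2_raw not explained by the constant and x1,
# so the result has zero sample correlation with x1.
Z = np.column_stack([ones, x1])
x2_orth = x2_raw - Z @ ols(Z, x2_raw)

def slopes_on_x1(x2):
    """Return (simple, multiple) regression slopes on x1 for a given x2."""
    y = 1.0 + 2.0 * x1 + 1.5 * x2 + u    # illustrative coefficients
    simple = ols(Z, y)[1]
    multiple = ols(np.column_stack([ones, x1, x2]), y)[1]
    return simple, multiple

simple_orth, multiple_orth = slopes_on_x1(x2_orth)
simple_corr, multiple_corr = slopes_on_x1(x2_raw)
print(simple_orth, multiple_orth)   # equal up to rounding
print(simple_corr, multiple_corr)   # noticeably different
```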

We can define the same three sums of squares (SST, SSE, SSR) as in simple regression, and $R^2$ is still the ratio of the explained sum of squares (SSE) to the total sum of squares (SST). It is no longer a simple correlation (e.g. $r_{yx}$) squared, but it still has the interpretation of a squared simple correlation coefficient: the correlation between $y$ and $\hat{y}$, $r_{y\hat{y}}$. A very important principle is that $R^2$ never decreases when an explanatory variable is added to a
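Both statements about $R^2$ (that it equals the squared correlation between $y$ and $\hat{y}$, and that it cannot fall when a regressor is added) can be checked numerically. This is a sketch on simulated data; the helper `fit_r2` and the pure-noise regressor are assumptions for illustration.

```python
import numpy as np

def fit_r2(X, y):
    """Return (R^2, fitted values) for an OLS fit that includes a constant."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    y_hat = X @ b
    sst = ((y - y.mean()) ** 2).sum()   # total sum of squares
    ssr = ((y - y_hat) ** 2).sum()      # residual sum of squares
    return 1.0 - ssr / sst, y_hat

rng = np.random.default_rng(2)
n = 150
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
noise = rng.normal(size=n)              # regressor unrelated to y by construction
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1, x2])
X_big = np.column_stack([X_small, noise])

r2_small, y_hat = fit_r2(X_small, y)
r2_big, _ = fit_r2(X_big, y)
corr_sq = np.corrcoef(y, y_hat)[0, 1] ** 2
print(r2_small, corr_sq, r2_big)
```

Even an irrelevant regressor leaves $R^2$ no lower, which is why a higher $R^2$ alone is weak evidence that an added variable belongs in the model.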

Expected value of the OLS estimators

We now discuss the statistical properties of the OLS estimators of the parameters in the population regression function. The population model is taken to be (3). We assume that we have a random sample of size $n$ on the variables of the model. The multivariate analogue to our assumption about the error process is now:

$$E(u \mid x_1, x_2, \dots, x_k) = 0 \tag{10}$$

so that we consider the error process to be mean-independent of each of the explanatory variables. This assumption would not hold if we misspecified the model: for instance, if we ran a simple regression with $inc$ as the explanatory variable, but the population model also contained $inc^2$. Since $inc$ and $inc^2$ will have a positive correlation, the simple regression's parameter estimates will be biased.

This bias will also appear if there is a separate, important factor that should be included in the model; if that factor is correlated with the included regressors, their coefficients will be biased.

In the context of multiple regression, with several independent variables, we must make an additional assumption about their measured values:

Proposition 1 In the sample, none of the independent variables $x$ may be expressed as an exact linear relation of the others (including a vector of 1s).

Every multiple regression that includes a constant term can be considered as having a variable $x_{0i} = 1 \;\forall i$. This proposition states that each of the other explanatory variables must have nonzero sample variance: that is, it may

to the athletics program, we find that there is perfect collinearity, since for every college in the sample the three variables sum to one by construction. There is no information in, e.g., $x_3$ once we know the other two, so including it in a regression with the other two makes no sense (and renders that regression uncomputable). We can leave any one of the three variables out of the regression; it does not matter which one. Note that this proposition is not an assumption about the population model: it is a condition on the sample data we have to work with. Note also that this only applies to linear relations among the explanatory variables: a variable and its square, for instance, are not linearly related, so we may include both in a regression to capture a nonlinear relation between $y$ and $x$.
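A quick way to see perfect collinearity is through the rank of the design matrix. The sketch below uses three hypothetical share variables that sum to one in every row (simulated, not the college data referred to above) together with a constant term.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
# Hypothetical shares: each row of `shares` sums to one by construction.
shares = rng.dirichlet([1.0, 1.0, 1.0], size=n)

# Constant plus all three shares: four columns, but only three are independent.
X_full = np.column_stack([np.ones(n), shares])
rank_full = np.linalg.matrix_rank(X_full)

# Dropping any one share restores full column rank.
X_drop = X_full[:, :3]
rank_drop = np.linalg.matrix_rank(X_drop)

# A variable and its square are not linearly related, so both may be included.
x = rng.normal(size=n)
X_sq = np.column_stack([np.ones(n), x, x ** 2])
rank_sq = np.linalg.matrix_rank(X_sq)
print(rank_full, rank_drop, rank_sq)
```

When the rank is below the number of columns, the cross-product matrix of the regressors is singular and the normal equations have no unique solution.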

Given the four assumptions (the population model, the random sample, the zero conditional mean of the $u$ process, and the absence of perfect collinearity), we can demonstrate that the OLS estimators of the population parameters are unbiased:

$$E(b_j) = \beta_j, \quad j = 0, \dots, k \tag{11}$$
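Unbiasedness is a statement about the average of the estimates over repeated samples, which a small Monte Carlo experiment can illustrate. The population coefficients, sample size, and number of replications below are arbitrary assumptions; the simulation illustrates the result and does not prove it.

```python
import numpy as np

rng = np.random.default_rng(42)
beta = np.array([1.0, 2.0, -1.5])   # hypothetical (beta_0, beta_1, beta_2)
n, reps = 50, 2000

estimates = np.empty((reps, beta.size))
for r in range(reps):
    # Draw a fresh random sample satisfying the zero conditional mean assumption.
    x1 = rng.normal(size=n)
    x2 = 0.5 * x1 + rng.normal(size=n)
    u = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    y = X @ beta + u
    estimates[r] = np.linalg.lstsq(X, y, rcond=None)[0]

mean_b = estimates.mean(axis=0)     # close to beta when OLS is unbiased
print(mean_b)
```

Any single sample's estimates can be well away from the population values; it is their average across samples that settles near them.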

What happens if we misspecify the model by including irrelevant explanatory variables: $x$ variables that, unbeknownst to us, are not in the population model? Fortunately, this does not damage the estimates. The regression will still yield unbiased estimates of all of the coefficients, including unbiased estimates of these variables' coefficients, which are zero in the population. The regression may be improved by removing such variables, since including them consumes degrees of freedom (and reduces the precision of the estimates); but the

to be fully specified. What are the consequences of estimating the latter relationship? We can show that in this case:

$$E(b_1) = \beta_1 + \beta_2 \, \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)\, x_{i2}}{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}$$

so that the OLS coefficient $b_1$ will be biased (not equal to its population value of $\beta_1$, even in an expected sense) in the presence of the second term. That term will be nonzero when $\beta_2$ is nonzero (which it is, by assumption) and when the fraction is nonzero. But the fraction is merely a simple regression coefficient in the auxiliary regression of $x_2$ on $x_1$. If the regressors are correlated with one another, that regression coefficient will be nonzero, and its magnitude will be related to the strength of the correlation (and the units of the variables). Say that the auxiliary regression is:

$$x_2 = d_0 + d_1 x_1 + v \tag{15}$$

where $v$ is the error term of the auxiliary regression,

with $d_1 > 0$, so that $x_1$ and $x_2$ are positively correlated (e.g. as income and wealth would be in a sample of household data). Then we can write the bias as:

$$E(b_1) - \beta_1 = \beta_2 d_1 \tag{16}$$

and its sign and magnitude will depend on both the relation between $y$ and $x_2$ and the interrelation among the explanatory variables. If there is no such relationship (if $x_1$ and $x_2$ are uncorrelated in the sample) then $b_1$ is unbiased, since in that special case multiple regression reverts to simple regression. In all other cases, though, there will be bias in the estimation of the underspecified model. If the left side of (16) is positive, we say that $b_1$ has an upward bias: the OLS value will be too large. If it were negative, we would speak of a downward bias. If the OLS coefficient is closer to zero than the population coefficient, we would say that it is “biased toward zero” or attenuated.
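The bias expression has an exact in-sample counterpart: the slope from the underspecified regression equals the multiple regression slope on $x_1$ plus the multiple regression slope on $x_2$ times $d_1$, where $d_1$ is taken here as the slope from regressing $x_2$ on $x_1$. The sketch below checks this on simulated data; the coefficient values and the helper `ols` are illustrative assumptions.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients for design matrix X (hypothetical helper)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(7)
n = 500
ones = np.ones(n)
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)            # positively related to x1
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)

# Fully specified regression of y on x1 and x2.
b0, b1, b2 = ols(np.column_stack([ones, x1, x2]), y)

# Underspecified regression that omits x2.
b1_short = ols(np.column_stack([ones, x1]), y)[1]

# Auxiliary regression of x2 on x1.
d1 = ols(np.column_stack([ones, x1]), x2)[1]

gap = b1_short - b1                            # equals b2 * d1 exactly
print(gap, b2 * d1)
```

With a positive coefficient on $x_2$ and a positive $d_1$, the gap is positive, which matches the notion of an upward bias described in the text.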

It is more difficult to evaluate the potential bias in a multiple regression, where the population relationship involves $k$ variables and we

If this assumption is satisfied, then the error variance is identical for all combinations of the explanatory variables. If it is violated, we say that the errors are heteroskedastic, and must be concerned about our computation of the OLS estimates' variances. The OLS estimates are still unbiased in this case, but our estimates of their variances are not. Given this assumption, plus the four made earlier, we can derive the sampling variances, or precision, of the OLS slope estimators:

$$\operatorname{Var}(b_j) = \frac{\sigma^2}{SST_j \left( 1 - R_j^2 \right)}, \quad j = 1, \dots, k \tag{18}$$

where $SST_j$ is the total variation in $x_j$ about its mean, and $R_j^2$ is the $R^2$ from an auxiliary regression of $x_j$ on all other $x$ variables, including the constant term. We see immediately that this formula applies to simple regression, since the formula we derived for the slope estimator in that instance is identical, given that $R_j^2 = 0$ in that instance (there are no other $x$ variables). Given the population error variance $\sigma^2$, what will make a particular OLS slope estimate more precise? Its precision will be increased (i.e. its sampling variance will be smaller) the larger is the variation in the associated $x$ variable. Its precision will be decreased, the larger the amount of variable $x_j$ that can be “explained” by other variables in the regression. In the case of perfect collinearity, $R_j^2 = 1$, and the sampling variance goes to infinity. If $R_j^2$ is very small, then this variable makes a large marginal contribution to the equation, and we may calculate a relatively more precise estimate of its coefficient. If $R_j^2$ is quite large, the precision of the coefficient will be low, since it will be difficult to “partial out” the effect of variable $j$ on $y$ from the effects of the other explanatory variables (with which it is highly correlated). However, we must hasten to add that the assumption that there is no
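The variance expression in terms of $SST_j$ and $R_j^2$ can be compared numerically with the matrix form $\sigma^2 (X'X)^{-1}$, whose diagonal gives the same sampling variances conditional on the regressors. The matrix form is a standard result brought in here for comparison rather than taken from the notes, and the value of `sigma2` and the simulated regressors are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2 = 300, 4.0                 # sigma2: assumed known error variance
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)   # correlated with x1
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

# Matrix form: diagonal of sigma^2 (X'X)^{-1}; entry 0 belongs to the constant.
var_matrix = sigma2 * np.diag(np.linalg.inv(X.T @ X))

# Slope-by-slope form: sigma^2 / (SST_j * (1 - R_j^2)).
var_formula = []
for j in range(1, X.shape[1]):
    xj = X[:, j]
    others = np.delete(X, j, axis=1)                 # all other columns, incl. constant
    fitted = others @ np.linalg.lstsq(others, xj, rcond=None)[0]
    sst_j = ((xj - xj.mean()) ** 2).sum()
    r2_j = 1.0 - ((xj - fitted) ** 2).sum() / sst_j  # R^2 of the auxiliary regression
    var_formula.append(sigma2 / (sst_j * (1.0 - r2_j)))
var_formula = np.array(var_formula)
print(var_formula, var_matrix[1:])
```

In this setup the correlated pair `x1` and `x2` should show larger variances than `x3`, reflecting their higher auxiliary $R_j^2$.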