


Wooldridge, Introductory Econometrics, 3d ed.

Chapter 4: Multiple regression analysis: Inference

We have discussed the conditions under which OLS estimators are unbiased, and derived the variances of these estimators under the Gauss-Markov assumptions. The Gauss-Markov theorem establishes that OLS estimators have the smallest variance among all linear unbiased estimators of the population parameters. We must now more fully characterize the sampling distribution of the OLS estimators, beyond its mean and variance, so that we may test hypotheses on the population parameters. To make the sampling distribution tractable, we add an assumption on the distribution of the errors:

Proposition 1 MLR6 (Normality) The population error $u$ is independent of the explanatory variables $x_1, \dots, x_k$ and is normally distributed with zero mean and constant variance: $u \sim N(0, \sigma^2)$.

This is a much stronger assumption than we have previously made on the distribution of the errors. The assumption of normality, as we have stated it, subsumes both the assumption of the error process being independent of the explanatory variables and that of homoskedasticity. For cross-sectional regression analysis, these six assumptions define the classical linear model. The rationale for normally distributed errors is often phrased in terms of the many factors influencing $y$ being additive, appealing to the Central Limit Theorem to suggest that the sum of a large number of random factors will be normally distributed. Although we might have reason in a particular context to doubt this rationale, we usually use it as a working hypothesis. Various transformations, such as taking the logarithm of the dependent

also show that any linear combination of the $b_j$ is also normally distributed, and that any subset of these estimators has a joint normal distribution. These properties will come in handy in formulating tests on the coefficient vector. We may also show that the OLS estimators will be approximately normally distributed (at least in large samples), even if the underlying errors are not normally distributed.
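The large-sample claim in the last sentence can be checked with a small Monte Carlo experiment. The sketch below is not from the original notes: it assumes a one-regressor model with hypothetical values $\beta_0 = 1$ and $\beta_1 = 0.5$, uniform regressors, and strongly skewed errors (exponential draws shifted to mean zero), then looks at how the OLS slope estimates are spread across repeated samples.

```python
import random
import statistics

def ols_slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def simulate_slopes(n=200, reps=1000, beta0=1.0, beta1=0.5, seed=42):
    """OLS slopes from `reps` simulated samples with skewed, non-normal errors."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]
        # Exponential(1) draws minus 1: mean zero, variance one, right-skewed
        u = [rng.expovariate(1.0) - 1.0 for _ in range(n)]
        y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
        slopes.append(ols_slope(x, y))
    return slopes

slopes = simulate_slopes()
m = statistics.fmean(slopes)
sd = statistics.stdev(slopes)
# Under approximate normality, about 95% of estimates lie within 1.96 sd of the mean
share_within = sum(abs(b - m) <= 1.96 * sd for b in slopes) / len(slopes)
print(f"mean = {m:.4f}, sd = {sd:.4f}, share within 1.96 sd = {share_within:.3f}")
```

If the slope estimates are close to normal, roughly 95% of them should fall within 1.96 standard deviations of their mean, even though the errors themselves are far from normal.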

Testing an hypothesis on a single $\beta_j$

To test hypotheses about a single population parameter, we start with the model containing $k$ regressors:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u \qquad (3)$$

Under the classical linear model assumptions, a test statistic formed from the OLS estimates may be expressed as:

$$\frac{b_j - \beta_j}{s_{b_j}} \sim t_{n-k-1} \qquad (4)$$

Why does this test statistic differ from (2) above? In that expression, we considered the variance of $b_j$ as an expression including $\sigma$, the unknown standard deviation of the error term (that is, $\sqrt{\sigma^2}$). In this operational test statistic (4), we have replaced $\sigma$ with a consistent estimate, $s$. That additional source of sampling variation requires the switch from the standard normal distribution to the $t$ distribution, with $(n - k - 1)$ degrees of freedom. Where $n$ is not all that large relative to $k$, the resulting $t$ distribution will have considerably fatter tails than the standard normal. Where $(n - k - 1)$ is a large number (greater than 100, for instance), the $t$ distribution will essentially be the standard normal. The net effect is to make the critical values larger for a finite sample, and
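To see how the fatter tails of the $t$ distribution raise critical values, the following sketch computes two-tailed 5% critical values (the 0.975 quantile) for several degrees of freedom and compares them with the standard normal value. It uses only the Python standard library; the helper names `t_cdf` and `t_crit` are hypothetical, and the CDF is approximated with Simpson's rule rather than taken from a statistics package.

```python
import math
from statistics import NormalDist

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t with `df` degrees of freedom.

    Integrates the density from 0 to |x| with Simpson's rule (steps must be
    even) and uses the symmetry of the distribution about zero.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def dens(v):
        return c * (1 + v * v / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = dens(0.0) + dens(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * dens(i * h)
    area = total * h / 3  # integral of the density over [0, |x|]
    return 0.5 + area if x >= 0 else 0.5 - area

def t_crit(p, df):
    """Quantile q with P(T <= q) = p, for 0.5 <= p < 1, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)  # standard normal two-tailed 5% value
for df in (5, 10, 30, 100, 1000):
    print(f"df = {df:4d}: t critical value = {t_crit(0.975, df):.4f} (normal: {z_crit:.4f})")
```

Standard tables give about 2.571 for 5 degrees of freedom and about 1.984 for 100, against 1.960 for the standard normal, which is why the distinction matters mainly when $n - k - 1$ is small.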

of this parameter, $b_j$, will never equal exactly zero. But when should we conclude that it could be zero? When its value cannot be distinguished from zero. There will be cause to reject this null hypothesis if the value, scaled by its standard error, exceeds the critical value in absolute value. For a "two-tailed test," there will be reason to reject the null if the "t-stat" takes on a large negative value or a large positive value; thus we reject in favor of the alternative hypothesis (of $\beta_j \neq 0$) in either case. This is a two-sided alternative, giving rise to a two-tailed test. If the hypothesis is to be tested at, e.g., the 95% level of confidence (a 5% significance level), we use critical values from the $t$ distribution which isolate 2.5% in each tail, for a total of 5% of the mass of the distribution.

When using a computer program to calculate regression estimates, we usually are given the "p-value" of the estimate, that is, the tail probability corresponding to the coefficient's t-value. The p-value may usefully be considered as the probability of observing a t-statistic as extreme as that shown if the null hypothesis is true. If the t-value was equal to, e.g., the 95% critical value, the p-value would be exactly 0.05. If the t-value was larger in absolute value, the p-value would be closer to zero, and vice versa. Thus, we are looking for small p-values as indicative of rejection. A p-value of 0.92, for instance, corresponds to an hypothesis that can be rejected only at the 8% level of confidence (a 92% significance level), and is thus quite irrelevant, since we would expect to find a value that large 92% of the time under the null hypothesis. On the other hand, a p-value of 0.08 will reject at the 90% level, but not at the 95% level; only 8% of the time would we expect to find a t-statistic of that magnitude if $H_0$ were true.
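The link between t statistics, p-values, and rejection decisions can be sketched in Python. The example assumes 30 degrees of freedom and three hypothetical t statistics; `two_sided_p` doubles the upper-tail probability of $|t|$, and the $t$ CDF is again a standard-library Simpson's-rule approximation (an assumption of this sketch, not a library routine).

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|] and symmetry."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def dens(v):
        return c * (1 + v * v / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps
    total = dens(0.0) + dens(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * dens(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def two_sided_p(t_stat, df):
    """Probability of a t statistic at least as extreme as t_stat under H0."""
    return 2 * (1 - t_cdf(abs(t_stat), df))

def decision(t_stat, df, alpha):
    """Reject H0: beta_j = 0 when the p-value is below the significance level."""
    p = two_sided_p(t_stat, df)
    return p, p < alpha

# Hypothetical t statistics, each with 30 degrees of freedom
for t in (0.5, 1.8, 2.5):
    p, reject5 = decision(t, 30, 0.05)
    _, reject10 = decision(t, 30, 0.10)
    print(f"t = {t:.1f}: p = {p:.3f}, reject at 5%: {reject5}, reject at 10%: {reject10}")
```

With 30 degrees of freedom, a t statistic of 1.8 yields a p-value between 0.05 and 0.10, so it rejects at the 10% level but not at the 5% level, mirroring the p-value of 0.08 discussed above.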

What if we have a one-sided alternative? For instance, we may phrase the hypothesis of interest as:

$$H_0: \beta_j > 0 \qquad H_A: \beta_j \le 0 \qquad (6)$$
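For the one-sided alternative in (6), only large negative t statistics count against the null, so all of the rejection probability sits in the lower tail. The sketch below evaluates the t statistic at the boundary value $\beta_j = 0$ and, as a simplifying assumption, uses the standard normal in place of the $t$ distribution (reasonable only when $n-k-1$ is large); the t statistic of −1.8 is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: stands in for t when n - k - 1 is large

def lower_tail_test(t_stat, alpha=0.05):
    """One-sided test that rejects only for large negative t statistics.

    Returns the critical value, the one-sided p-value P(Z <= t_stat), and the
    reject/do-not-reject decision.
    """
    crit = Z.inv_cdf(alpha)   # left-tail critical value
    p_value = Z.cdf(t_stat)   # probability of a t at least this negative
    return crit, p_value, t_stat < crit

crit_one, p_one, reject_one = lower_tail_test(-1.8)
crit_two = Z.inv_cdf(0.025)   # lower critical value of the two-tailed 5% test
print(f"one-sided critical value {crit_one:.3f}, two-tailed {crit_two:.3f}")
print(f"t = -1.8: one-sided p = {p_one:.4f}, reject at 5%: {reject_one}")
```

The one-sided 5% critical value is about −1.645 rather than −1.960, so a t statistic of −1.8 rejects against the one-sided alternative even though it would not reject in a two-tailed test at the same level.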

hypotheses that a population parameter equals zero. What if we want to test a different hypothesis about the value of that parameter? For instance, we would not consider it sensible for the mpc for a consumer to be zero, but we might have an hypothesized value (of, say, 0.8) implied by a particular theory of consumption. How might we test this hypothesis? If the null is stated as:

$$H_0: \beta_j = a_j \qquad (7)$$

where $a_j$ is the hypothesized value, then the appropriate test statistic becomes:

$$\frac{b_j - a_j}{s_{b_j}} \sim t_{n-k-1} \qquad (8)$$

and we may simply calculate that quantity and compare it to the appropriate point on the t-distribution. Most computer programs provide you with assistance in this effort; for instance, if we believed that the coefficient on bdrms should be equal to $20,000 in a regression of house prices on square footage and bdrms (e.g. using HPRICE1), we would use Stata's test command:

regress price bdrms sqrft
test bdrms=20000

where we use the name of the variable as a shorthand for the name of the coefficient on that variable. Stata, in that instance, presents us with:

( 1) bdrms = 20000.

F( 1, 85) = 0.
Prob > F = 0.

making use of an F-statistic, rather than a t-statistic, to perform this test. In this particular case–of an hypothesis involving a single

one tail. Any number of test commands may be given after a regress command in Stata, testing different hypotheses about the coefficients.
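Equation (8) is simple enough to compute by hand. The sketch below applies it to the mpc example with the hypothesized value 0.8 from the text; the estimate 0.74 and standard error 0.03 are hypothetical. It also squares the result, since for a single restriction the F(1, n−k−1) statistic reported by Stata's test command equals the square of the corresponding t statistic and carries the same p-value.

```python
def t_stat(b_j, a_j, se_bj):
    """t statistic for H0: beta_j = a_j, as in equation (8)."""
    return (b_j - a_j) / se_bj

# Hypothetical estimate and standard error; the null value 0.8 comes from the text
b_mpc, se_mpc = 0.74, 0.03
t = t_stat(b_mpc, 0.8, se_mpc)
F = t ** 2  # one restriction: the F(1, n-k-1) statistic is the squared t
print(f"t = {t:.3f}, F = {F:.3f}")
```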

Confidence intervals

As we discussed in going over Appendix C, we may use the point estimate and its estimated standard error to conduct an hypothesis test on the underlying population parameter, or we may form a confidence interval for that parameter. Stata makes that easy in a regression context by providing the 95% confidence interval for every estimated coefficient. If you want to use some other level of confidence, you may either use the level() option on regress (e.g. regress price bdrms sqrft, level(90)) or you may change the default level for this run with set level. All further regressions will report confidence intervals with that level of confidence. To connect this concept to that of the hypothesis test, consider that in the above example the 95% confidence interval for $\beta_{bdrms}$ extended from -3657.581 to 34053.96; thus, a two-tailed test at the 5% level with the null that $\beta_{bdrms}$ takes on any value in this interval (including zero) will not lead to a rejection.
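A confidence interval is the point estimate plus or minus a critical value times its standard error, and a two-tailed test rejects exactly when the hypothesized value lies outside the interval. The sketch below uses only the 95% interval for $\beta_{bdrms}$ quoted above: because the interval is symmetric, its midpoint recovers the point estimate and its half-width equals the critical value times $s_{b_j}$. The helper names `conf_int` and `rejects` are hypothetical.

```python
def conf_int(b, se, crit):
    """Confidence interval b +/- crit * se."""
    return b - crit * se, b + crit * se

def rejects(a, interval):
    """Two-tailed test of H0: beta = a at the level matching the interval."""
    lower, upper = interval
    return not (lower <= a <= upper)

# 95% interval for beta_bdrms quoted in the text
ci_bdrms = (-3657.581, 34053.96)
b_bdrms = sum(ci_bdrms) / 2                   # the interval is centered on b_j
half_width = (ci_bdrms[1] - ci_bdrms[0]) / 2  # equals crit * s_bj

for a in (0, 20000, 40000):
    print(f"H0: beta_bdrms = {a}: reject at 5%? {rejects(a, ci_bdrms)}")
```

Both 0 and 20000 lie inside the interval, so neither null would be rejected at the 5% level, while a null of 40000 would be.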

Testing hypotheses about a single linear combination of the parameters

Economic theory will often suggest that a particular linear combination of parameters should take on a certain value: for instance, in a Cobb-Douglas production function, that the slope coefficients should sum to one in the case of constant returns to scale (CRTS):

$$Q = A L^{\beta_1} K^{\beta_2} E^{\beta_3} \qquad (9)$$

$$\log Q = \log A + \beta_1 \log L + \beta_2 \log K + \beta_3 \log E + \upsilon$$

random variables $\{b_1, b_2, b_3\}$ are not independent of one another since the underlying regressors are not independent of one another. Each of the underlying regressors is assumed to be independent of the error term $u$, but not of the other regressors. We would expect, for instance, that firms with a larger capital stock also have a larger labor force, and use more energy in the production process. The variance (and standard error) that we need may be readily calculated by Stata, however, from the variance-covariance matrix of the estimated parameters via the test command:

test cap+labor+energy=1

will provide the appropriate test statistic, again as an F-statistic with a p-value. You may interpret this value directly. If you would like the point and interval estimate of the hypothesized combination, you can compute that (after a regression) with the lincom (linear combination) command:

lincom cap + labor + energy

will show the sum of those values and a confidence interval for that sum.
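The standard error that test and lincom need for a sum of coefficients comes from the variance-covariance matrix: for weights $w$, the variance of $w'b$ is $w'Vw$, which includes every covariance term. The sketch below mirrors that calculation for the CRTS restriction; the coefficient estimates and covariance matrix are hypothetical numbers chosen for illustration, and `lincom` here is a plain Python helper, not Stata's command.

```python
import math

def lincom(weights, b, V):
    """Estimate and standard error of the linear combination sum_j w_j * b_j.

    V is the variance-covariance matrix of the estimates b, so the variance of
    the combination is w' V w, covariances included.
    """
    est = sum(w * bj for w, bj in zip(weights, b))
    var = sum(weights[i] * weights[j] * V[i][j]
              for i in range(len(b)) for j in range(len(b)))
    return est, math.sqrt(var)

# Hypothetical estimates for (cap, labor, energy) and their covariance matrix
b = [0.30, 0.60, 0.15]
V = [[0.0025, -0.0010, -0.0004],
     [-0.0010, 0.0040, -0.0005],
     [-0.0004, -0.0005, 0.0016]]

est, se = lincom([1, 1, 1], b, V)
t_crts = (est - 1) / se  # t statistic for H0: the elasticities sum to one
# For comparison only: this incorrectly drops the covariance terms
se_ignoring_cov = math.sqrt(sum(V[i][i] for i in range(3)))
print(f"sum = {est:.3f}, se = {se:.4f} (ignoring covariances: {se_ignoring_cov:.4f}), t = {t_crts:.3f}")
```

Dropping the covariances, as if the estimates were independent, gives a noticeably different standard error with these numbers; that is why the calculation needs the full variance-covariance matrix.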

We may also use this technique to test hypotheses other than adding-up conditions on the parameters. For instance, consider a two-factor Cobb-Douglas function in which you have only labor and capital, and you want to test the hypothesis that labor's share is 2/3 (so that, with constant returns to scale, capital's share is 1/3). This implies that the labor coefficient should be twice the capital coefficient, or:

$$H_0: \beta_L = 2\beta_K, \quad \text{or} \quad H_0: \frac{\beta_L}{\beta_K} = 2, \quad \text{or} \quad H_0: \beta_L - 2\beta_K = 0 \qquad (12)$$

which may be used to make a judgment about the hypothesis of interest.
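The labor-share hypothesis in its last, linear form can be tested the same way, with weights $(1, -2)$ on $(\beta_L, \beta_K)$. In the sketch below the estimates, variances, and covariance are hypothetical; the variance expands as $\mathrm{Var}(b_L) + 4\,\mathrm{Var}(b_K) - 4\,\mathrm{Cov}(b_L, b_K)$.

```python
import math

# Hypothetical two-factor estimates and their (co)variances
b_L, b_K = 0.70, 0.32
var_L, var_K, cov_LK = 0.0036, 0.0025, -0.0012

# H0: beta_L - 2 * beta_K = 0, i.e. weights (1, -2)
diff = b_L - 2 * b_K
# Var(b_L - 2 b_K) = Var(b_L) + 4 Var(b_K) - 4 Cov(b_L, b_K)
se_diff = math.sqrt(var_L + 4 * var_K - 4 * cov_LK)
t_share = diff / se_diff
print(f"b_L - 2 b_K = {diff:.3f}, se = {se_diff:.4f}, t = {t_share:.3f}")
```

Working with $\beta_L - 2\beta_K$ rather than the ratio $\beta_L/\beta_K$ keeps the restriction linear, so its standard error follows directly from the variances and the covariance.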

Testing multiple linear restrictions

When we use the test command, an F-statistic is reported (even when the test involves only one coefficient) because, in general, hypothesis tests may involve more than one restriction on the population parameters. The hypotheses discussed above, even that of CRTS involving several coefficients, still only represent one restriction on the parameters. For instance, if CRTS is imposed, the elasticities of the factors of production must sum to one, but they may individually take on any value. But in most applications of multiple linear regression, we concern ourselves with joint tests of restrictions on the parameters.

The simplest joint test is that which every regression reports: the so-called "ANOVA F" test, which has the null hypothesis that each of the slopes is equal to zero. Note that in a multiple regression, specifying that each slope individually equals zero is not the same thing as specifying that their sum equals zero. This "ANOVA" (ANalysis Of VAriance) F-test is of interest since it essentially tests whether the entire regression has any explanatory power. The null hypothesis, in this case, is that the "model" is $y = \beta_0 + u$: that is, none of the explanatory variables assist in explaining the variation in $y$. We cannot test any hypothesis on the $R^2$ of a regression, but we will see that there is an intimate relationship between the $R^2$ and the ANOVA F:

$$R^2 = \frac{SSE}{SST}$$

$$F = \frac{SSE/k}{SSR/(n - (k + 1))}$$

$$\therefore F = \frac{R^2/k}{(1 - R^2)/(n - (k + 1))}$$
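The last step follows by dividing the numerator and denominator of the first F expression by SST and using $SSR/SST = 1 - R^2$ (valid when the model includes an intercept, so that $SST = SSE + SSR$). The sketch below checks that both expressions give the same number; the sums of squares, $n$, and $k$ are hypothetical, and SSE and SSR follow the notation above (explained and residual sums of squares).

```python
def anova_f_from_ss(sse, ssr, n, k):
    """ANOVA F from the explained (SSE) and residual (SSR) sums of squares."""
    return (sse / k) / (ssr / (n - (k + 1)))

def anova_f_from_r2(r2, n, k):
    """The same statistic written in terms of R^2 = SSE/SST."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))

# Hypothetical regression: n = 50 observations, k = 3 slope coefficients
sse, ssr, n, k = 600.0, 400.0, 50, 3
r2 = sse / (sse + ssr)  # assumes SST = SSE + SSR (model with an intercept)
f_ss = anova_f_from_ss(sse, ssr, n, k)
f_r2 = anova_f_from_r2(r2, n, k)
print(f"R^2 = {r2:.2f}, F from sums of squares = {f_ss:.3f}, F from R^2 = {f_r2:.3f}")
```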