Joint Hypothesis Tests: F-Test for Multiple Coefficients, Lecture notes of Statistics

This document covers the use of F-tests for joint hypothesis tests involving multiple coefficients in regression analysis. It explains the concept of joint hypotheses, the role of F-tests in model selection, and the correlation between estimators, and it includes examples and formulas for calculating the F-test statistic.

Typology: Lecture notes

2021/2022

Uploaded on 08/05/2022 by char_s67

7 – Joint Hypothesis Tests

Now that we have multiple “X” variables and multiple βs, our hypotheses might also involve more than one β.

  • We shouldn’t use t-tests.
  • We should use the F-test.

Example: CPS data again

summary(lm(wage ~ education + gender + age + experience))

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   -1.9574     6.8350  -0.286      0.…
education      1.3073     1.1201   1.167      0.…
genderfemale  -2.3442     0.3889  -6.028 3.12e-09 ***
age           -0.3675     1.1195  -0.328      0.…
experience     0.4811     1.1205   0.429      0.…

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.458 on 529 degrees of freedom
Multiple R-squared: 0.2533, Adjusted R-squared: 0.…
F-statistic: 44.86 on 4 and 529 DF, p-value: < 2.2e-16

The results of the above regression make me want to drop age and experience: their individual t-statistics (-0.328 and 0.429) are small.

This corresponds to the hypothesis:

$$H_0: \beta_3 = 0 \text{ and } \beta_4 = 0$$

$$H_A: \text{either } \beta_3 \neq 0 \text{ or } \beta_4 \neq 0 \text{ or both}$$

Why would we want to drop variables?

A bigger problem: $t_3$ and $t_4$ are likely not independent.

In the model:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i$$

  • suppose that $X_3$ and $X_4$ are not independent (e.g., they are correlated);
  • then the OLS estimators $b_3$ and $b_4$ will be correlated, because the formula for $b_3$ (and likewise $b_4$) involves all of the “X” variables (remember omitted variable bias, OVB);
  • then $t_3$ and $t_4$ will be correlated!
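A small simulation makes the last bullet concrete. This is a hypothetical sketch, not part of the original notes: it drops $X_1$ and $X_2$, draws `x3` and `x4` with an assumed correlation of 0.8, generates `y` with $\beta_3 = \beta_4 = 0$ (so $H_0$ is true), and records $t_3$ and $t_4$ across repeated samples. The sample size, number of replications and `rho` are illustrative assumptions.

```r
# Monte Carlo sketch on simulated data (all settings are illustrative assumptions).
# H0: beta3 = beta4 = 0 is true in every simulated sample, yet t3 and t4
# move together because x3 and x4 are correlated.
set.seed(123)
n_obs  <- 200   # observations per simulated sample
n_reps <- 500   # number of simulated samples
rho    <- 0.8   # correlation between x3 and x4

one_sample <- function() {
  x3 <- rnorm(n_obs)
  x4 <- rho * x3 + sqrt(1 - rho^2) * rnorm(n_obs)  # population corr(x3, x4) = rho
  y  <- 1 + rnorm(n_obs)                           # true beta3 = beta4 = 0
  fit <- lm(y ~ x3 + x4)
  coef(summary(fit))[c("x3", "x4"), "t value"]     # returns c(t3, t4)
}

t_stats <- t(replicate(n_reps, one_sample()))  # n_reps rows, columns x3 and x4
cor_t34 <- cor(t_stats[, "x3"], t_stats[, "x4"])
print(round(cor_t34, 2))
```

With positively correlated regressors and homoskedastic errors, the correlation between $t_3$ and $t_4$ comes out strongly negative in this setup (roughly -0.8), so reading the two t-statistics one at a time ignores how they move together.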

Example

Suppose that $X_3$ and $X_4$ are positively correlated. Consider the null:

$$H_0: \beta_3 = 0 \text{ and } \beta_4 = 0$$

  • if $b_3$ and $b_4$ are both positive (or both negative), it’s not that big of a deal;
  • if one is positive and the other negative, that’s a big deal.

Let’s try the F-test.

I’m going to estimate two models:

  • One model under the alternative hypothesis, which we’ll call the unrestricted model (the βs are allowed to take any values).
  • One model under the null hypothesis, called the restricted model. I get this model by taking the null hypothesis to heart: that is, I substitute the values $\beta_3 = 0$ and $\beta_4 = 0$ into the full model.
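Written out, the substitution step looks like this. The sketch assumes the regressors are numbered in the order they enter the `lm()` call, so that $X_1$ = education, $X_2$ = gender, $X_3$ = age and $X_4$ = experience:

```latex
\begin{aligned}
\text{Unrestricted } (H_A):\quad & Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{4i} + u_i \\
\text{Set } \beta_3 = \beta_4 = 0:\quad & Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + 0 \cdot X_{3i} + 0 \cdot X_{4i} + u_i \\
\text{Restricted } (H_0):\quad & Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i
\end{aligned}
```

The restricted model is simply the regression with age and experience left out.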

Unrestricted model (under $H_A$):

unrestricted <- lm(wage ~ education + gender + age + experience)

Restricted model (under $H_0$):

restricted <- lm(wage ~ education + gender)

A formula for the F-test statistic

  • The F-test takes into account the correlation between the estimators involved in the test.
  • If the unrestricted model “fits” significantly better than the restricted model, we should reject the null.
  • The difference in “fit” between the model under the null and the model under the alternative leads to a formulation of the F-test statistic for testing joint hypotheses.

The RSS (residual sum of squares) is a measure of fit:

$$RSS = \sum_{i=1}^{n} e_i^2, \quad \text{where } e_i = Y_i - \hat{Y}_i$$

The F-test statistic may be written as:

$$F = \frac{(RSS_{\text{restricted}} - RSS_{\text{unrestricted}})/q}{RSS_{\text{unrestricted}}/(n - k_{\text{unrestricted}} - 1)}$$

where $q$ = number of restrictions and $k$ = number of “X”s.
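The RSS formula can be checked in R. This sketch uses simulated stand-in data, not the CPS data; the generic variables `y` and `x1` to `x4`, the sample size and the coefficients are hypothetical. It computes F by hand and compares it with `anova()` applied to the two nested `lm()` fits.

```r
# Sketch on simulated stand-in data (NOT the CPS data used in these notes).
# Generic variables y and x1..x4 are hypothetical; x3 and x4 are made correlated.
set.seed(42)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
x4 <- 0.7 * x3 + rnorm(n)                            # x4 correlated with x3
y  <- 1 + 0.5 * x1 - 0.3 * x2 + 0.2 * x3 + rnorm(n)  # assumed data-generating process
dat <- data.frame(y, x1, x2, x3, x4)

unrestricted <- lm(y ~ x1 + x2 + x3 + x4, data = dat)  # model under H_A
restricted   <- lm(y ~ x1 + x2, data = dat)            # model under H_0: beta3 = beta4 = 0

rss_u <- sum(resid(unrestricted)^2)   # RSS = sum of squared residuals e_i^2
rss_r <- sum(resid(restricted)^2)
n_restrictions <- 2                   # q in the formula
df_u <- df.residual(unrestricted)     # n - k_unrestricted - 1 = 300 - 4 - 1

f_manual <- ((rss_r - rss_u) / n_restrictions) / (rss_u / df_u)

# anova() on two nested lm fits reports the same F statistic in its second row
comparison <- anova(restricted, unrestricted)
f_anova <- comparison[["F"]][2]
print(comparison)
cat("F by hand:", f_manual, " F from anova():", f_anova, "\n")
```

By default `anova()` scales the drop in RSS by the residual variance of the larger (unrestricted) model, which is exactly the denominator of the formula, so the two numbers agree.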

Equivalently, in terms of $R^2$:

$$F = \frac{(R^2_{\text{unrestricted}} - R^2_{\text{restricted}})/q}{(1 - R^2_{\text{unrestricted}})/(n - k_{\text{unrestricted}} - 1)}$$

where:

  • $R^2_{\text{restricted}}$ = the $R^2$ for the restricted regression
  • $R^2_{\text{unrestricted}}$ = the $R^2$ for the unrestricted regression
  • $q$ = the number of restrictions under the null
  • $k_{\text{unrestricted}}$ = the number of regressors in the unrestricted regression.
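The RSS form and the $R^2$ form agree. A short derivation, using $R^2 = 1 - RSS/TSS$ with $TSS = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$, and the fact that both regressions include an intercept and have the same dependent variable, hence the same $TSS$:

```latex
\begin{aligned}
F &= \frac{(RSS_{\text{restricted}} - RSS_{\text{unrestricted}})/q}{RSS_{\text{unrestricted}}/(n - k_{\text{unrestricted}} - 1)} \\
  &= \frac{\big[(1 - R^2_{\text{restricted}}) - (1 - R^2_{\text{unrestricted}})\big]\,TSS/q}{(1 - R^2_{\text{unrestricted}})\,TSS/(n - k_{\text{unrestricted}} - 1)} \\
  &= \frac{(R^2_{\text{unrestricted}} - R^2_{\text{restricted}})/q}{(1 - R^2_{\text{unrestricted}})/(n - k_{\text{unrestricted}} - 1)}
\end{aligned}
```

The common factor $TSS$ cancels, which is why only the two $R^2$ values are needed.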

The bigger the difference between the restricted and unrestricted $R^2$ values (that is, the greater the improvement in fit from adding the variables in question), the larger the F statistic.

Testing you on the exam

  • The F-test statistic can be obtained by comparing the $R^2$ in the restricted model (the $H_0$ model) with the $R^2$ in the unrestricted model (the $H_A$ model).
  • The decision to reject or not depends on whether the F-statistic exceeds the 5% critical value:

q    5% critical value

  • These values are only accurate if n is large (we’ll always assume this).
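The 5% critical values can also be computed rather than looked up. A sketch for an assumed range q = 1 to 5: for large n, F is compared with the F(q, ∞) distribution, whose 95th percentile equals the 95th percentile of a chi-squared(q) variable divided by q. The finite-sample values from `qf()` with 529 denominator degrees of freedom (the residual degrees of freedom of the unrestricted CPS regression) are shown for comparison.

```r
# 5% critical values of the F statistic for q restrictions (q = 1..5 is an assumed range).
q_values <- 1:5

# Large-n values: the F(q, Inf) distribution is chi-squared(q) / q
crit_large_n <- qchisq(0.95, df = q_values) / q_values

# Finite-sample values with 529 residual degrees of freedom
crit_n529 <- qf(0.95, df1 = q_values, df2 = 529)

print(data.frame(q = q_values,
                 large_n = round(crit_large_n, 2),
                 df2_529 = round(crit_n529, 2)))
```

The large-n column comes out at approximately 3.84, 3.00, 2.60, 2.37 and 2.21; the finite-sample values are slightly larger, and the gap shrinks as n grows.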

Unrestricted model (under $H_A$):

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   -1.9574     6.8350  -0.286      0.…
education      1.3073     1.1201   1.167      0.…
genderfemale  -2.3442     0.3889  -6.028 3.12e-09 ***
age           -0.3675     1.1195  -0.328      0.…
experience     0.4811     1.1205   0.429      0.…

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.458 on 529 degrees of freedom
Multiple R-squared: 0.2533, Adjusted R-squared: 0.…
F-statistic: 44.86 on 4 and 529 DF, p-value: < 2.2e-16

Restricted model (under $H_0$):

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.21783    1.03632   0.210      0.…
education     0.75128    0.07682   9.779  < 2e-16 ***
genderfemale -2.12406    0.40283  -5.273 1.96e-07 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.639 on 531 degrees of freedom
Multiple R-squared: 0.1884, Adjusted R-squared: 0.…
F-statistic: 61.62 on 2 and 531 DF, p-value: < 2.2e-16
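As a worked instance of the $R^2$ formula, the sketch below plugs in the values printed in the two regression outputs: $R^2_{\text{unrestricted}} = 0.2533$, $R^2_{\text{restricted}} = 0.1884$, $q = 2$ and $n - k_{\text{unrestricted}} - 1 = 529$. It also recomputes F from the RSS form, recovering each RSS as (residual standard error)² × residual degrees of freedom, which holds because the residual standard error is $\sqrt{RSS/(n - k - 1)}$. The printed inputs are rounded, so the results are approximate.

```r
# Arithmetic check of the F-test using numbers copied from the regression outputs.
# These inputs are rounded, so the results are approximate.
r2_unrestricted <- 0.2533   # Multiple R-squared, unrestricted model (under H_A)
r2_restricted   <- 0.1884   # Multiple R-squared, restricted model (under H_0)
q_restr <- 2                # number of restrictions: beta3 = 0 and beta4 = 0
df_u    <- 529              # n - k_unrestricted - 1 residual degrees of freedom

# R-squared form of the F statistic
f_r2 <- ((r2_unrestricted - r2_restricted) / q_restr) /
  ((1 - r2_unrestricted) / df_u)

# RSS form, recovering each RSS as (residual standard error)^2 * residual df
rss_u <- 4.458^2 * 529
rss_r <- 4.639^2 * 531
f_rss <- ((rss_r - rss_u) / q_restr) / (rss_u / df_u)

# Sanity check of the formula: the reported overall F-statistic (4 and 529 DF)
# is the same test with an intercept-only restricted model, whose R-squared is 0
f_overall <- (r2_unrestricted / 4) / ((1 - r2_unrestricted) / df_u)

crit_5  <- qf(0.95, df1 = q_restr, df2 = df_u)  # 5% critical value of F(2, 529)
p_value <- pf(f_r2, df1 = q_restr, df2 = df_u, lower.tail = FALSE)

cat("F (R-squared form):", round(f_r2, 2), "\n")
cat("F (RSS form):      ", round(f_rss, 2), "\n")
cat("Overall F check:   ", round(f_overall, 2), "\n")
cat("5% critical value: ", round(crit_5, 2), "\n")
cat("Reject H0 at 5%?   ", f_r2 > crit_5, "\n")
```

Both forms give F of about 23 (the small gap between them is rounding in the printed output), well above the 5% critical value of about 3.01, so by this arithmetic the joint null $\beta_3 = \beta_4 = 0$ would be rejected even though age and experience have small individual t-statistics. The overall check reproduces the reported 44.86.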