F-Test for Joint Hypotheses: Testing the Significance of Additional Regressors, Lecture notes of Statistics

These notes cover the F-test for joint hypotheses in regression analysis. The F-test is used to determine whether adding new regressors to a regression model significantly improves its fit. The notes give formulas for calculating the F-statistic, discuss its distribution under the null hypothesis, and comment on the importance of the F-test in regression analysis and its historical significance.

Typology: Lecture notes

2021/2022

Uploaded on 08/05/2022

char_s67

Joint hypotheses

The null and alternative hypotheses can usually be interpreted as a restricted model ($H_0$) and an unrestricted model ($H_A$).

In our example, $H_0$ sets the coefficients on strat and exppup to zero (the restricted model), while $H_A$ allows at least one of them to be nonzero (the unrestricted model).

Note that if the unrestricted model “fits” significantly better than the restricted model, we should reject the null.

The difference in “fit” between the model under the null and the model under the alternative leads us to an intuitive formulation of the F-test statistic for testing joint hypotheses.

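Recall that a measure of “fit” is the sum of squared residuals:

$$SSR = \sum_{i=1}^{n} \hat{u}_i^2$$

where $\hat{u}_i$ is the OLS residual for observation $i$ (the actual value minus the fitted value).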

The F-test statistic may be written as:

$$F = \frac{\left(R^2_{\text{unrestricted}} - R^2_{\text{restricted}}\right)/q}{\left(1 - R^2_{\text{unrestricted}}\right)/\left(n - k_{\text{unrestricted}} - 1\right)}$$

where:
$R^2_{\text{restricted}}$ = the $R^2$ for the restricted regression
$R^2_{\text{unrestricted}}$ = the $R^2$ for the unrestricted regression
$q$ = the number of restrictions under the null
$k_{\text{unrestricted}}$ = the number of regressors in the unrestricted regression.
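To make the $R^2$ formula above concrete, here is a minimal R sketch that computes the F-statistic by hand and compares it with what `anova()` reports. The data are simulated, and the names `x1`, `x2`, `x3`, `restricted_fit` and `unrestricted_fit` are illustrative assumptions; this is not the data set used in these notes.

```r
# Simulated data (an assumption for illustration, not the notes' data set)
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + 2 * x3 + rnorm(n)

# Restricted model imposes zero coefficients on x1 and x2
restricted_fit   <- lm(y ~ x3)
unrestricted_fit <- lm(y ~ x1 + x2 + x3)

r2_r <- summary(restricted_fit)$r.squared
r2_u <- summary(unrestricted_fit)$r.squared
q <- 2  # number of restrictions under the null
k <- 3  # number of regressors in the unrestricted regression

# F-statistic from the R-squared formula
f_from_r2 <- ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k - 1))

# F-statistic reported by anova() for the same pair of models
f_from_anova <- anova(restricted_fit, unrestricted_fit)$F[2]

# TRUE when the two agree up to floating-point tolerance
print(all.equal(f_from_r2, f_from_anova))
```

The two numbers should agree because both models are fitted to the same dependent variable, so they share the same total sum of squares.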

The bigger the difference between the restricted and unrestricted $R^2$'s (that is, the greater the improvement in fit from adding the variables in question), the larger the F-statistic.

Note: the textbook differentiates between homoskedasticity-only and heteroskedasticity-robust F-tests. We will ignore heteroskedasticity for simplicity.

Example: are the coefficients on strat and exppup zero?

Unrestricted population regression (under $H_A$):

$$score_i = \beta_0 + \beta_1\, strat_i + \beta_2\, exppup_i + \beta_3\, eng_i + u_i$$

Restricted population regression (that is, under $H_0$):

$$score_i = \beta_0 + \beta_3\, eng_i + u_i \quad \text{(why?)}$$

• The number of restrictions under $H_0$ is $q = 2$ (why?).
• The fit will be better ($R^2$ will be higher) in the unrestricted regression (why?).

By how much must the $R^2$ increase for the coefficients on strat and exppup to be judged statistically significant?

A commonly performed F-test is one which assesses whether the chosen model fits at all.

$$H_0:\ \beta_1 = \beta_2 = \dots = \beta_k = 0$$

$$H_A:\ \text{any one of the } \beta\text{s not equal to zero}$$

Why shouldn’t the intercept be restricted?

This test will be performed by most regression software, and reported as “F-test” in the regression output, usually along with a p-value.
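As a rough illustration of where this overall test appears in R, the sketch below pulls the F-statistic out of `summary()` and recomputes its p-value. The data are simulated and the variable names are assumptions for illustration only. In R's regression output the line is labelled "F-statistic".

```r
# Simulated data (an assumption for illustration)
set.seed(2)
n  <- 150
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 0.4 * x1 - 0.6 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# summary() stores the overall F-statistic as a named vector:
# "value", "numdf" (q) and "dendf" (n - k - 1)
fstat <- summary(fit)$fstatistic

# p-value from the upper tail of the F distribution
overall_p <- pf(fstat["value"], fstat["numdf"], fstat["dendf"],
                lower.tail = FALSE)

# The same test written as restricted (intercept only) vs. unrestricted
intercept_only <- lm(y ~ 1)
f_anova <- anova(intercept_only, fit)$F[2]

print(fstat)
print(overall_p)
print(f_anova)
```

Here the restricted model keeps only the intercept, so the number of restrictions equals the number of regressors.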

R Code

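# Load the data and attach it so columns can be referenced by name
teachdata = read.csv("http://home.cc.umanitoba.ca/~godwinrt/3180/data/str.csv")
attach(teachdata)

# Restricted model (under H0) and unrestricted model (under HA)
restricted = lm(score ~ eng)
unrestricted = lm(score ~ strat + exppup + eng)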

Let’s look at the full (unrestricted) model:

summary(unrestricted)

Now, to perform the F-test of whether school spending matters or not:

anova(unrestricted, restricted)

Output:

Analysis of Variance Table

Model 1: score ~ strat + exppup + eng
Model 2: score ~ eng
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1    416 85716
2    418 89014 -2   -3298.2 8.0034 0.0003885 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
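As a check on the output above, the F value can be recovered by hand from the two RSS figures. This uses the sum-of-squared-residuals form of the homoskedasticity-only F-statistic. That form is not written out in these notes, but it is algebraically equivalent to the $R^2$ form because both regressions share the same total sum of squares. The subscripts $r$ and $u$ (restricted and unrestricted) are shorthand introduced here.

$$F = \frac{(SSR_r - SSR_u)/q}{SSR_u/(n - k_u - 1)} = \frac{3298.2/2}{85716/416} \approx \frac{1649.1}{206.05} \approx 8.00$$

The denominator degrees of freedom, 416, is the Res.Df of the unrestricted model, and $q = 2$ is the difference in Res.Df between the two models (418 − 416).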

The $F_{q,\,n-k-1}$ distribution:

• The F distribution is tabulated in many places.
• As $n \to \infty$, the $F_{q,\,n-k-1}$ distribution asymptotes to the $\chi^2_q/q$ distribution: the $F_{q,\infty}$ and $\chi^2_q/q$ distributions are the same.
• For $q$ not too big and $n \geq 100$, the $F_{q,\,n-k-1}$ distribution and the $\chi^2_q/q$ distribution are essentially identical.
• Many regression packages (including STATA) compute p-values of F-statistics using the F distribution.
• You will encounter the F distribution in published empirical work.
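The convergence described above can be seen numerically. The following sketch compares 5% critical values of the F distribution with those of $\chi^2_q/q$ as the denominator degrees of freedom grow. The choice of $q = 2$ and the particular degrees of freedom are arbitrary assumptions for illustration.

```r
q   <- 2                       # number of restrictions (illustrative choice)
dfs <- c(10, 30, 100, 1000)    # denominator degrees of freedom, n - k - 1

# 5% critical values of F(q, df) for each denominator df
f_crit <- qf(0.95, df1 = q, df2 = dfs)

# 5% critical value of chi-squared(q) divided by q
chi_crit <- qchisq(0.95, df = q) / q

# Side-by-side comparison: the gap shrinks as the denominator df grows
comparison <- data.frame(denominator_df = dfs,
                         F_critical     = f_crit,
                         chi2_over_q    = chi_crit,
                         gap            = f_crit - chi_crit)
print(comparison)
```

The F critical values sit above the $\chi^2_q/q$ value and fall towards it as the sample grows, which is why the two give nearly the same p-values in large samples.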

• The “one at a time” approach of rejecting if either of the t-statistics exceeds 1.96 rejects more than 5% of the time under the null (the size exceeds the desired significance level).

• For $n$ large, the F-statistic is distributed $\chi^2_q/q$ (= $F_{q,\infty}$).

• The homoskedasticity-only F-statistic is important historically (and thus in practice), and can help intuition, but isn’t valid when there is heteroskedasticity.
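The over-rejection of the “one at a time” approach can be illustrated with a small simulation. The sketch below makes two simplifying assumptions that are not in the notes: the two t-statistics are independent, and each is standard normal under the null (the large-sample case). Under those assumptions the F-statistic for $q = 2$ is the average of the two squared t-statistics.

```r
set.seed(3)
reps <- 100000

# Two independent t-statistics under the null (large-sample normal approximation)
t1 <- rnorm(reps)
t2 <- rnorm(reps)

# "One at a time": reject if either |t| exceeds 1.96
reject_either <- abs(t1) > 1.96 | abs(t2) > 1.96
rate_either   <- mean(reject_either)

# Joint test: with independent t's, F = (t1^2 + t2^2) / 2,
# compared against the 5% critical value of chi-squared(2) / 2
f_stat       <- (t1^2 + t2^2) / 2
reject_joint <- f_stat > qchisq(0.95, df = 2) / 2
rate_joint   <- mean(reject_joint)

print(c(one_at_a_time = rate_either, joint_F = rate_joint))
```

With independent t-statistics the theoretical rejection rate of the “one at a time” rule is $1 - 0.95^2 = 9.75\%$, nearly double the intended 5%, while the joint test keeps the rate close to 5%.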