







These lecture notes introduce the F-test for joint hypotheses in regression analysis. The F-test is used to determine whether adding new regressors to a regression model significantly improves the fit of the model. The notes give formulas for calculating the F-statistic and discuss its distribution under the null hypothesis, the importance of the F-test in regression analysis, and its historical significance.
Typology: Lecture notes








The null and alternative hypotheses can usually be interpreted as a restricted model (under $H_0$) and an unrestricted model (under $H_A$).
In our example:
Note that if the unrestricted model “fits” significantly better than the restricted model, we should reject the null.
The difference in “fit” between the model under the null and the model under the alternative leads us to an intuitive formulation of the F-test statistic for testing joint hypotheses.
Recall that a measure of “fit” is the sum of squared residuals:
$$RSS = \sum_{i=1}^{n} e_i^2$$
where $e_i = y_i - \hat{y}_i$ is the residual for observation $i$.
The F-test statistic may be written as:
$$F = \frac{\left(R^2_{unrestricted} - R^2_{restricted}\right)/q}{\left(1 - R^2_{unrestricted}\right)/\left(n - k_{unrestricted} - 1\right)}$$
where:
$R^2_{restricted}$ = the $R^2$ for the restricted regression
$R^2_{unrestricted}$ = the $R^2$ for the unrestricted regression
$q$ = the number of restrictions under the null
$k_{unrestricted}$ = the number of regressors in the unrestricted regression.
The bigger the difference between the restricted and unrestricted $R^2$ values (that is, the greater the improvement in fit from adding the variables in question), the larger the F statistic.
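The $R^2$ form of the statistic can be checked numerically. The sketch below uses simulated data (an assumption for illustration, not the course dataset; the variable names x1, x2, x3 and y are hypothetical) and compares the hand-computed value with the one returned by anova().

```r
# Simulated data: an assumption for illustration, not the course dataset.
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + 0.8 * x3 + rnorm(n)

unrestricted_fit <- lm(y ~ x1 + x2 + x3)  # model under H_A
restricted_fit   <- lm(y ~ x3)            # model under H_0: coefficients on x1 and x2 are zero

r2_u <- summary(unrestricted_fit)$r.squared
r2_r <- summary(restricted_fit)$r.squared
q    <- 2   # number of restrictions under the null
k    <- 3   # number of regressors in the unrestricted regression

# F-statistic from the two R^2 values
f_r2 <- ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k - 1))

# The same statistic as computed by anova() from the residual sums of squares
f_anova <- anova(restricted_fit, unrestricted_fit)$F[2]

c(f_r2 = f_r2, f_anova = f_anova)
```

Both models include an intercept and share the same dependent variable, so the $R^2$ form and the sum-of-squared-residuals form give the same number.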
Note: the textbook differentiates between homoskedasticity-only and heteroskedasticity-robust F-tests. We will ignore heteroskedasticity for simplicity.
Example: are the coefficients on strat and exppup zero?
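Unrestricted population regression (under $H_A$):
$$score_i = \beta_0 + \beta_1\, strat_i + \beta_2\, exppup_i + \beta_3\, eng_i + u_i$$
Restricted population regression (that is, under $H_0$):
$$score_i = \beta_0 + \beta_3\, eng_i + u_i$$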
The number of restrictions under $H_0$ is $q = 2$ (why?). The fit will be better ($R^2$ will be higher) in the unrestricted regression (why?). By how much must the $R^2$ increase for the coefficients on strat and exppup to be judged statistically significant?
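One way to approach the last question is to invert the F formula: the null is rejected at the 5% level when F exceeds the critical value of the F distribution with q and n − k − 1 degrees of freedom. The sketch below uses q = 2 and the 416 residual degrees of freedom that the R output later in these notes reports for the unrestricted model. The helper min_r2_gain and its input r2_unrestricted are hypothetical, since the notes do not report the $R^2$ values.

```r
q   <- 2     # restrictions under H_0
df2 <- 416   # n - k - 1 for the unrestricted model (Res.Df in the anova output)

# 5% critical value of the F distribution with (q, df2) degrees of freedom
f_crit <- qf(0.95, df1 = q, df2 = df2)

# Smallest R^2 gain that is significant at the given level.
# r2_unrestricted is a hypothetical input: the notes do not report it.
min_r2_gain <- function(r2_unrestricted, q, df2, level = 0.05) {
  qf(1 - level, df1 = q, df2 = df2) * q * (1 - r2_unrestricted) / df2
}

f_crit
min_r2_gain(0.4, q, df2)   # 0.4 is an illustrative value, not a result from the data
```

The required gain shrinks as the sample grows and as the unrestricted model explains more of the variation.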
A commonly performed F-test is one which assesses whether the chosen model fits at all.
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$
$H_A$: at least one of the $\beta$s is not equal to zero
Why shouldn’t the intercept be restricted?
This test will be performed by most regression software, and reported as “F-test” in the regression output, usually along with a p-value.
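In R, the overall F-statistic is stored in the object returned by summary(). The sketch below uses the built-in mtcars data as a stand-in (an assumption for illustration; it is not the dataset used in these notes) and confirms that the reported statistic matches an explicit comparison with an intercept-only model.

```r
# mtcars ships with R; it stands in here for any dataset (illustration only).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# summary() stores the overall F-statistic with its degrees of freedom
fs <- summary(fit)$fstatistic   # named vector: value, numdf, dendf
p_overall <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)

# Equivalent test: restrict every slope to zero but keep the intercept
intercept_only <- lm(mpg ~ 1, data = mtcars)
f_anova <- anova(intercept_only, fit)$F[2]

c(f_summary = fs[["value"]], f_anova = f_anova, p_value = p_overall)
```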
teachdata = read.csv("http://home.cc.umanitoba.ca/~godwinrt/3180/data/str.csv")
attach(teachdata)
restricted = lm(score ~ eng)
unrestricted = lm(score ~ strat + exppup + eng)
Let’s look at the full (unrestricted) model:
summary(unrestricted)
Now, to perform the F-test of whether school spending matters or not:
anova(unrestricted, restricted)
Output:
Analysis of Variance Table

Model 1: score ~ strat + exppup + eng
Model 2: score ~ eng
  Res.Df   RSS Df Sum of Sq      F    Pr(>F)
1    416 85716
2    418 89014 -2   -3298.2 8.0034 0.0003885 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
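The F-statistic and p-value in this table can be recovered by hand from the two residual sums of squares. The sketch below plugs in the rounded RSS values printed above, so the result agrees with the reported 8.0034 only up to rounding.

```r
rss_u <- 85716   # RSS of the unrestricted model (Model 1 in the table)
rss_r <- 89014   # RSS of the restricted model (Model 2 in the table)
q     <- 2       # number of restrictions: 418 - 416
df_u  <- 416     # residual degrees of freedom of the unrestricted model

# F = [(RSS_restricted - RSS_unrestricted) / q] / [RSS_unrestricted / df_u]
f_stat <- ((rss_r - rss_u) / q) / (rss_u / df_u)

# Upper-tail probability under the F distribution with (q, df_u) degrees of freedom
p_val <- pf(f_stat, df1 = q, df2 = df_u, lower.tail = FALSE)

c(F = f_stat, p_value = p_val)
```

The p-value is far below 0.05, so the null that the coefficients on strat and exppup are both zero is rejected.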
The F distribution is tabulated in many places.
As $n$ grows large, the $F_{q,\,n-k-1}$ distribution approaches the $\chi^2_q/q$ distribution; for large $n$ the two distributions are essentially identical.
Many regression packages (including STATA) compute p-values of F-statistics using the F distribution. You will encounter the F distribution in published empirical work.
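To see how little the choice of reference distribution matters in large samples, the sketch below compares 5% critical values of the F distribution with q = 2 numerator degrees of freedom against the corresponding chi-squared critical value divided by q. The denominator degrees of freedom are illustrative choices; 416 is the value from the example in these notes.

```r
q <- 2
denominator_df <- c(30, 100, 416, 10000)   # illustrative values; 416 is from the example

f_crit    <- qf(0.95, df1 = q, df2 = denominator_df)   # F critical values
chi2_crit <- qchisq(0.95, df = q) / q                  # large-sample limit

data.frame(denominator_df, f_crit, chi2_crit, gap = f_crit - chi2_crit)
```

The gap shrinks steadily as the denominator degrees of freedom grow.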
The “one at a time” approach of rejecting if either of the t-statistics exceeds 1.96 in absolute value rejects more than 5% of the time under the null (the size exceeds the desired significance level).
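A small simulation illustrates the size problem. The sketch below is a hypothetical setup: the sample size, number of replications and seed are arbitrary choices, and the two regressors are generated independently with no effect on y, so the joint null is true. With independent t-statistics the one-at-a-time rule should reject roughly 1 − 0.95² ≈ 9.75% of the time, while the joint F-test should reject about 5% of the time.

```r
set.seed(42)
n    <- 100    # sample size (arbitrary choice)
reps <- 2000   # number of simulated samples (arbitrary choice)
reject_t <- logical(reps)
reject_f <- logical(reps)

for (r in seq_len(reps)) {
  x1 <- rnorm(n); x2 <- rnorm(n)
  y  <- rnorm(n)                     # H_0 is true: neither regressor matters
  s  <- summary(lm(y ~ x1 + x2))

  # "One at a time": reject if either t-statistic exceeds 1.96 in absolute value
  t_stats <- s$coefficients[c("x1", "x2"), "t value"]
  reject_t[r] <- any(abs(t_stats) > 1.96)

  # Joint test: with only x1 and x2 in the model this is the overall F-test
  f <- s$fstatistic
  reject_f[r] <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE) < 0.05
}

c(one_at_a_time = mean(reject_t), joint_F = mean(reject_f))
```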
The homoskedasticity-only F-statistic is important historically (and thus in practice), and can help intuition, but isn’t valid when there is heteroskedasticity.
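For comparison, a heteroskedasticity-robust version of the joint test can be sketched in base R as a Wald statistic divided by q. This is not the method used in these notes, which ignore heteroskedasticity. The simulated data, the HC1 scaling and the helper wald_f are all assumptions made for illustration. With the homoskedasticity-only covariance matrix the same helper reproduces the F-statistic from anova().

```r
# Simulated data with heteroskedastic errors (an assumption for illustration)
set.seed(7)
n  <- 300
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 0.5 * x3 + rnorm(n, sd = exp(0.5 * x1))   # error variance depends on x1

fit  <- lm(y ~ x1 + x2 + x3)
X    <- model.matrix(fit)
e    <- resid(fit)
p    <- ncol(X)               # number of coefficients, including the intercept
test <- c("x1", "x2")         # H_0: both coefficients are zero
q    <- length(test)
b    <- coef(fit)[test]

xtx_inv <- solve(crossprod(X))

# Homoskedasticity-only covariance: s^2 (X'X)^{-1}
v_homo <- sum(e^2) / (n - p) * xtx_inv
# HC1 covariance: (X'X)^{-1} X' diag(e^2) X (X'X)^{-1}, scaled by n / (n - p)
v_hc1  <- n / (n - p) * xtx_inv %*% crossprod(X * e) %*% xtx_inv

# Hypothetical helper: Wald statistic for the tested coefficients, divided by q
wald_f <- function(v) as.numeric(t(b) %*% solve(v[test, test]) %*% b) / q

f_homo   <- wald_f(v_homo)   # equals the anova() F-statistic
f_robust <- wald_f(v_hc1)    # robust counterpart

c(f_homo = f_homo, f_robust = f_robust)
```

When the errors really are heteroskedastic, the two statistics can differ noticeably, which is why the homoskedasticity-only version should not be relied on in that case.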