


An in-depth explanation of the R-squared value and its limitations in evaluating the fit of a regression model. It also introduces the adjusted R-squared, explains how it differs from R-squared, and stresses the importance of accounting for the number of independent variables in a model.
Typology: Lecture notes



R-Squared Notes:
So far, we have not focused on the R-squared value to evaluate how “well” our model fits the data. Why? Because too much emphasis can be placed on this particular measure, and if you go on to study “time-series” data, you will see that the R-squared value can be extremely misleading.
Things to note:
• There is no value that R-squared should be for you to claim that your model does a good job at explaining the variation in the dependent variable. It is simply an estimate of how much variation can be explained.
• A small R-squared value implies that the error variance is large relative to the variance of y, which means that we may have a hard time precisely estimating the $\beta$ coefficients. BUT, this can be offset by a large sample size. This is true even if we have not controlled for many unobserved factors, which is what leads to the large error variance. EXAMPLE: suppose that some incoming students at a large university are RANDOMLY given grants to buy computer equipment. If the amount of the grant is truly randomly determined, we can estimate the ceteris paribus effect of the grant amount on subsequent college grade point average by using simple regression analysis. Because of the random assignment, all of the other factors affecting GPA would be UNCORRELATED with the grant size. Now, it seems pretty unlikely that grant size would explain very much of the variation in GPA, so the R-squared from this simple regression would probably be pretty low, BUT we might still (with a large enough N) get a reasonably precise estimator for the effect of the grant. (NOTE: we don’t need to worry about omitted variable bias since all the omitted variables would be uncorrelated with the grant size!)
• The relative CHANGE in the R-squared value when variables are added to an equation provides A LOT OF USEFUL INFORMATION. This is related to the joint F-tests that we talked about earlier in testing joint restrictions.
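The grant example can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the true effect (0.05 GPA points per $1,000), the grant range, the noise level, and the helper name `simple_ols` are all hypothetical and chosen purely for illustration. It shows that the R-squared stays tiny at both sample sizes, while the standard error of the slope shrinks as N grows.

```python
import numpy as np

def simple_ols(x, y):
    """OLS of y on a constant and x; returns (slope, slope std. error, R-squared)."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid                      # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
    sigma2_hat = rss / (n - 2)               # RSS/(N-k-1) with k = 1
    se_slope = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))
    return beta[1], se_slope, 1.0 - rss / tss

rng = np.random.default_rng(0)
true_effect = 0.05                           # hypothetical: GPA points per $1,000 of grant
results = {}
for n in (200, 20000):
    grant = rng.uniform(0.0, 2.0, size=n)    # randomly assigned grant, in $1,000s
    other = rng.normal(0.0, 0.5, size=n)     # all other GPA factors, independent of grant
    gpa = 3.0 + true_effect * grant + other
    results[n] = simple_ols(grant, gpa)
    slope, se, r2 = results[n]
    print(f"N={n:6d}  slope={slope:.4f}  se={se:.4f}  R2={r2:.4f}")
```

Because the grant is drawn independently of the other factors, the slope estimate is not contaminated by omitted variables even though those factors dominate the variation in GPA.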
R-squared and Adjusted R-squared Value: what happens when we add regressors to our equation.
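• Recall that R-squared is the ratio of the explained sum of squares (ESS) to the total sum of squares (TSS), or, writing RSS for the residual sum of squares:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$$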
Now, why is it helpful to write R-squared in this fashion? Think about the following: let $\sigma_y^2$ be the population variance of y (unobserved by us) and $\sigma_\varepsilon^2$ be the population variance of the random disturbance term (again, unobserved by us). Define the POPULATION R-squared to be:

$$R^2_{\text{pop}} = 1 - \frac{\sigma_\varepsilon^2}{\sigma_y^2}$$

which tells us the proportion of the variation of y in the population explained by the independent variables. But we don’t observe the population variances. So, we can use estimators for them:

$$R^2 = 1 - \frac{RSS/N}{TSS/N}$$

Okay: so RSS/N is our ESTIMATOR for $\sigma_\varepsilon^2$ and TSS/N is our estimator for $\sigma_y^2$ in the “usual” R-squared. That is, the usual R-squared is an estimator for the POPULATION R-squared. BUT WE KNOW THAT BOTH OF THESE ESTIMATORS ARE BIASED (numerator and denominator). We can instead use unbiased estimators for $\sigma_\varepsilon^2$ and $\sigma_y^2$. In particular, we could use:
RSS/(N-k-1) and TSS/(N-1).
If we do this, we get an ADJUSTED R-squared value that is given by:

$$\bar{R}^2 = 1 - \frac{RSS/(N-k-1)}{TSS/(N-1)}$$
BUT: something to keep in mind is that the ratio of unbiased estimators DOES NOT LEAD TO AN UNBIASED ESTIMATOR. And, in fact, the adjusted R-squared is not generally thought to be a better estimator for the population R-squared than the usual R-squared value.
(Recall that our UNBIASED estimator for the variance of the error term is RSS/(N-k-1).)
So, how do the adjusted and regular R-squared differ?
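One way to see the difference is to write the adjusted R-squared in terms of the usual one. The short derivation below is a sketch that uses only the definitions in these notes (RSS divided by N-k-1, TSS divided by N-1); the bar notation $\bar{R}^2$ for the adjusted value is an assumption about notation.

$$\bar{R}^2 = 1 - \frac{RSS/(N-k-1)}{TSS/(N-1)} = 1 - \frac{RSS}{TSS}\cdot\frac{N-1}{N-k-1} = 1 - (1 - R^2)\,\frac{N-1}{N-k-1}$$

Since (N-1)/(N-k-1) is greater than 1 whenever k is at least 1, the adjusted value is below the usual R-squared whenever R-squared is less than 1, and it can even be negative. As a made-up numerical illustration: with $R^2 = 0.30$, N = 51 and k = 10, we get $\bar{R}^2 = 1 - 0.70 \times 50/40 = 0.125$.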
Okay: so, now why would we ever look at the adjusted R-squared value and not the R-squared value?
Using the Adjusted R-squared to Choose Between Non-nested Models:
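R-squared will NEVER go down (and will almost always go up) if you add RHS variables. Why? Because the RSS can never go up when you add additional variables to your equation. And, if that’s so, looking at the R-squared alone and whether it goes up doesn’t tell you if you’ve got a “better” model. The adjusted R-squared, by contrast, divides the RSS by N-k-1, so each additional variable carries a penalty and the adjusted value can go down.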
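This behavior can be checked numerically. The sketch below uses simulated data with made-up coefficients and a hypothetical helper `r2_and_adjusted`. It adds a pure-noise regressor to a simple regression and compares both measures. It also computes the F-statistic for the added variable from the change in R-squared. The adjusted R-squared rises exactly when that F-statistic exceeds 1.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Return (R-squared, adjusted R-squared) for OLS of y on X.

    X must already contain a constant column, so X.shape[1] = k + 1.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid                       # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)         # total sum of squares
    r2 = 1.0 - rss / tss
    adj = 1.0 - (rss / (n - p)) / (tss / (n - 1))
    return r2, adj

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)       # made-up data-generating process
junk = rng.normal(size=n)                     # regressor unrelated to y

X_small = np.column_stack([np.ones(n), x1])   # k = 1
X_big = np.column_stack([X_small, junk])      # k = 2
r2_small, adj_small = r2_and_adjusted(X_small, y)
r2_big, adj_big = r2_and_adjusted(X_big, y)

# F-statistic for the q = 1 added variable, written in R-squared form
q = 1
f_stat = ((r2_big - r2_small) / q) / ((1.0 - r2_big) / (n - X_big.shape[1]))

print(f"R2:          {r2_small:.4f} -> {r2_big:.4f}")
print(f"adjusted R2: {adj_small:.4f} -> {adj_big:.4f}")
print(f"F for added variable: {f_stat:.3f}")
```

Whatever the random draw, the first line of output never shows a decrease, while the direction of the second depends on whether the added variable has any explanatory power beyond what its lost degree of freedom costs.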
trying to show how much of the variation in the LHS variable is explained by the data. But Var(y) and Var(ln y) are going to be DIFFERENT. So, this just doesn’t make sense.