




This handout demonstrates the concepts and relationships of multiple regression using a sociology example with 1985 CPS data. It covers partial and simple regression coefficients, operational definition, extra sum of squares, and testing hypotheses.





sociology multiple regression
This handout uses 1985 CPS data on hourly wage, years of schooling (x=edyrs) and years of labor force experience (z=exper) to demonstrate some of the concepts and relationships that go to the heart of multiple regression. Just to orient you, here’s the relevant correlation matrix.
. correlate hrwage exper edyrs
(obs=515)

         |   hrwage     exper     edyrs
---------+-----------------------------
  hrwage |   1.
   exper |   0.1299    1.
   edyrs |   0.4000   -0.2889    1.
relationship between simple and partial regression coefficient
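In general, we know that

b(y,x) = b(y,x|z) + b(y,z|x) * b(z,x)

where b(y,x) is the simple regression coefficient of y on x, b(y,x|z) and b(y,z|x) are the partial coefficients from the regression of y on both x and z, and b(z,x) is the slope from the regression of z on x. So if x=edyrs and z=exper, then using regressions “a” and “e” below, we have

.823475 = .9827726 + (.1101188)(-1.4466)

Regression “e” is not reproduced in this excerpt; the exper-on-edyrs slope of about -1.4466 is recovered from the sums of squares in regression “d”.

What this means is that failing to control for experience yields a smaller edyrs coefficient than we would get with controls. If the model that includes edyrs and exper is correct, then the simple regression of hrwage on edyrs alone yields a biased estimate of the schooling slope. An estimate of the bias is given by the product (.1101188)(-1.4466) = -.1593, that is, the partial coefficient of exper times the slope of exper on edyrs.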
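The relationship between the simple and partial coefficients can be checked by hand from the printed output. The following is a minimal Python sketch, not part of the original Stata session. It assumes that the auxiliary regression is exper on edyrs, and because that slope is not printed in this excerpt, it recovers it from the sums of squares reported for regression “d”. All variable names are illustrative.

```python
# Values copied from the Stata output in this handout.
b_edyrs_simple = 0.823475      # regression "b": hrwage on edyrs
b_edyrs_partial = 0.9827726    # regression "a": hrwage on edyrs and exper
b_exper_partial = 0.1101188    # regression "a"
b_edyrs_on_exper = -0.0576936  # regression "d": edyrs on exper
ss_model_d = 243.698749        # regression "d": model sum of squares
ss_total_edyrs = 2919.96893    # regression "d": total SS, i.e. sum of squares of edyrs

# Assumption: the exper-on-edyrs slope is not printed, so recover it as
# S_xz / S_xx. For a simple regression, SS_model = slope * S_xz, so
# S_xz = SS_model / slope.
s_xz = ss_model_d / b_edyrs_on_exper
b_exper_on_edyrs = s_xz / ss_total_edyrs

# Bias of the simple coefficient = partial coefficient of the omitted
# variable times the auxiliary slope.
bias = b_exper_partial * b_exper_on_edyrs

print(f"exper-on-edyrs slope:      {b_exper_on_edyrs:.4f}")
print(f"estimated bias:            {bias:.4f}")
print(f"simple minus partial coef: {b_edyrs_simple - b_edyrs_partial:.4f}")
```

The last two printed numbers should agree up to rounding in the published coefficients, which is the identity stated in the text.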
a. regress hrwage edyrs exper
  Source |       SS       df       MS              Number of obs =     515
---------+------------------------------           F(  2,   512) =     74.
   Model |  2793.7797      2  1396.88985           Prob > F      =      0.
Residual | 9581.18331    512  18.7132487           R-squared     =      0.
---------+------------------------------           Adj R-squared =      0.
   Total |  12374.963    514  24.0758035           Root MSE      =      4.

  hrwage |      Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
   edyrs |   .9827726    .0836199   11.753    0.000    .8184922    1.
   exper |   .1101188    .0166993    6.594    0.000    .0773111.
   _cons |  -5.802635    1.234832   -4.699    0.000   -8.228595    -3.
b. regress hrwage edyrs
  Source |       SS       df       MS              Number of obs =     515
---------+------------------------------           F(  1,   513) =     97.
   Model | 1980.06338      1  1980.06338           Prob > F      =      0.
Residual | 10394.8996    513  20.2629622           R-squared     =      0.
---------+------------------------------           Adj R-squared =      0.
   Total |  12374.963    514  24.0758035           Root MSE      =      4.

  hrwage |      Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
   edyrs |    .823475    .0833033    9.885    0.000    .6598174.
   _cons |  -1.774601    1.116715   -1.589    0.113   -3.968497.
c. regress hrwage exper
  Source |       SS       df       MS              Number of obs =     515
---------+------------------------------           F(  1,   513) =      8.
   Model | 208.925793      1  208.925793           Prob > F      =      0.
Residual | 12166.0372    513  23.7154722           R-squared     =      0.
---------+------------------------------           Adj R-squared =      0.
   Total |  12374.963    514  24.0758035           Root MSE      =      4.

  hrwage |      Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
   exper |   .0534191    .0179977    2.968    0.003    .0180609.
   _cons |   8.154298    .3810432   21.400    0.000    7.405701    8.
d. regress edyrs exper
  Source |       SS       df       MS              Number of obs =     515
---------+------------------------------           F(  1,   513) =     46.
   Model | 243.698749      1  243.698749           Prob > F      =      0.
Residual | 2676.27018    513  5.21690094           R-squared     =      0.
---------+------------------------------           Adj R-squared =      0.
   Total | 2919.96893    514  5.68087341           Root MSE      =      2.

   edyrs |      Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
   exper |  -.0576936    .0084413   -6.835    0.000   -.0742772    -.
   _cons |   14.20159    .1787165   79.464    0.000    13.85048    14.
extra sum of squares
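For edyrs, we have from “a” and “c”:
ssr(edyrs|exper) = 2793.7797 - 208.925793 = 2584.8539
Note that 2584.8539 is just the regression sum of squares from regression “h” (not shown in this excerpt), the regression of hrwage on the residualized edyrs variable, edyr_hat.
For exper, we have from “a” and “b”:
ssr(exper|edyrs) = 2793.7797 - 1980.06338 = 813.716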
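As a quick check on the extra sums of squares, the sketch below (plain Python arithmetic on the printed output, with illustrative variable names) computes each one two ways: as the gain in regression sum of squares and as the drop in residual sum of squares. Both routes should agree here because every model has the same dependent variable, hrwage.

```python
# Regression (model) and residual sums of squares copied from the output.
ss_model_a, ss_resid_a = 2793.7797, 9581.18331    # hrwage on edyrs, exper
ss_model_b, ss_resid_b = 1980.06338, 10394.8996   # hrwage on edyrs
ss_model_c, ss_resid_c = 208.925793, 12166.0372   # hrwage on exper

# Extra sum of squares as the gain in regression SS when a variable is added.
ssr_edyrs_given_exper = ss_model_a - ss_model_c
ssr_exper_given_edyrs = ss_model_a - ss_model_b

# The same quantities as the drop in residual SS.
drop_resid_edyrs = ss_resid_c - ss_resid_a
drop_resid_exper = ss_resid_b - ss_resid_a

print(f"ssr(edyrs|exper): {ssr_edyrs_given_exper:.4f}  (residual route: {drop_resid_edyrs:.4f})")
print(f"ssr(exper|edyrs): {ssr_exper_given_edyrs:.4f}  (residual route: {drop_resid_exper:.4f})")
```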
multicollinearity
The variance (and hence standard error) of the partial coefficient of edyrs will be equal to the mean square residual for the multiple regression y = f( edyrs, exper) divided by the sum of squares residual from the regression of edyrs on exper. Hence, using regressions “a” and “d” above, the standard error of edyrs in “a” is
.0836199 = sqrt(18.7132487/2676.27018)
Note that this standard error of the edyrs coefficient in hrwage = f(edyrs, exper) is greater than the standard error of edyrs in hrwage = f(edyrs), which is .0833. This is the classic case of multicollinearity, in this case between exper and edyrs, increasing the standard error.
But also notice that this is not what happens with the standard error of exper, which goes from .01799 (“c” above) in hrwage = f(exper) to .0167 (“a” above) in hrwage = f(edyrs, exper). What is going on here?
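One way to explore the question is to rebuild all four standard errors from the rule stated above: mean square residual divided by the relevant sum of squares, then the square root. The Python sketch below does this from the printed output. It rests on one assumption: the total sum of squares of exper is not printed, so it is recovered from regression “d” using SS_model = slope^2 * S_zz. Variable names are illustrative.

```python
import math

# Mean square residuals copied from the output.
mse_a = 18.7132487   # hrwage on edyrs, exper
mse_b = 20.2629622   # hrwage on edyrs
mse_c = 23.7154722   # hrwage on exper

# Regression "d" (edyrs on exper).
ss_total_edyrs = 2919.96893
ss_resid_edyrs_on_exper = 2676.27018
ss_model_d = 243.698749
b_edyrs_on_exper = -0.0576936

# Assumption: recover the sum of squares of exper from regression "d".
ss_total_exper = ss_model_d / b_edyrs_on_exper ** 2
# The R-squared is the same whichever of the two variables is regressed on the other.
r2_d = ss_model_d / ss_total_edyrs
ss_resid_exper_on_edyrs = ss_total_exper * (1 - r2_d)

se_edyrs_simple = math.sqrt(mse_b / ss_total_edyrs)
se_edyrs_partial = math.sqrt(mse_a / ss_resid_edyrs_on_exper)
se_exper_simple = math.sqrt(mse_c / ss_total_exper)
se_exper_partial = math.sqrt(mse_a / ss_resid_exper_on_edyrs)

print(f"edyrs: simple {se_edyrs_simple:.5f} -> partial {se_edyrs_partial:.5f}")
print(f"exper: simple {se_exper_simple:.5f} -> partial {se_exper_partial:.5f}")
print(f"denominator shrinks by factor {1 - r2_d:.4f} for both variables")
print(f"MSE ratio a/b: {mse_a / mse_b:.4f}, a/c: {mse_a / mse_c:.4f}")
```

Read this way, the collinearity penalty (the shrinking denominator) is the same for both coefficients, while the drop in mean square residual differs: adding exper to the edyrs model lowers it only slightly, whereas adding edyrs to the exper model lowers it a good deal more, which is one reading of why the exper standard error falls.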
t-ratios and F-statistics
Note that the t-ratio for the coefficient of edyrs in regression “a”, t = 11.753, is the square root of ssr(edyrs|exper) divided by the mean square residual from regression “a”. That is, the t-ratio is the square root of the “partial” F-statistic.
11.753 = sqrt[(2793.7797-208.925793)/18.7132487] = sqrt[2584.85/18.7132]
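The same relationship holds for exper. A short Python sketch of both checks, using only numbers printed in regressions “a”, “b”, and “c” (variable names are illustrative):

```python
import math

mse_a = 18.7132487  # mean square residual, regression "a"

# Extra sums of squares from the regression sums of squares.
ssr_edyrs_given_exper = 2793.7797 - 208.925793   # "a" minus "c"
ssr_exper_given_edyrs = 2793.7797 - 1980.06338   # "a" minus "b"

# Each partial F has one numerator degree of freedom, so t = sqrt(F).
partial_f_edyrs = ssr_edyrs_given_exper / mse_a
partial_f_exper = ssr_exper_given_edyrs / mse_a

t_edyrs = math.sqrt(partial_f_edyrs)
t_exper = math.sqrt(partial_f_exper)

print(f"edyrs: partial F = {partial_f_edyrs:.3f}, sqrt = {t_edyrs:.3f} (reported t = 11.753)")
print(f"exper: partial F = {partial_f_exper:.3f}, sqrt = {t_exper:.3f} (reported t = 6.594)")
```

The square root recovers only the magnitude of t; the sign comes from the coefficient itself.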
hypothesis testing
Regression “j” below represents the null hypothesis of equality by restricting the schooling and experience coefficients to be equal. In this instance, we accomplish this by forming a new variable, years.
i. generate years=edyrs+exper
j. regress hrwage years
  Source |       SS       df       MS              Number of obs =     515
---------+------------------------------           F(  1,   513) =     25.
   Model | 589.283865      1  589.283865           Prob > F      =      0.
Residual | 11785.6791    513  22.9740334           R-squared     =      0.
---------+------------------------------           Adj R-squared =      0.
   Total |  12374.963    514  24.0758035           Root MSE      =      4.

  hrwage |      Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
   years |   .0933062    .0184233    5.065    0.000    .0571119.
   _cons |    6.22555    .6035266   10.315    0.000    5.039862    7.
Now we carry out the F-test by comparing this restricted model to regression “a” above:
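The comparison of restricted and unrestricted models can be sketched as follows. This is a minimal Python illustration built from the printed sums of squares, not output from the original Stata session; the helper name `nested_f` is hypothetical.

```python
def nested_f(rss_restricted, rss_unrestricted, n_restrictions, df_resid_unrestricted):
    """F-statistic for a restricted model nested in an unrestricted one."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / df_resid_unrestricted
    return numerator / denominator

# Residual sums of squares from regressions "j" (restricted) and "a" (unrestricted).
rss_j = 11785.6791
rss_a = 9581.18331
df_resid_a = 512

# One restriction: the edyrs and exper coefficients are equal.
f_equal = nested_f(rss_j, rss_a, 1, df_resid_a)

# Both models have hrwage as the dependent variable, so the regression
# sums of squares give the same numerator.
ss_model_a = 2793.7797
ss_model_j = 589.283865
f_equal_from_model_ss = (ss_model_a - ss_model_j) / (rss_a / df_resid_a)

print(f"F(1, {df_resid_a}) from residual SS:   {f_equal:.2f}")
print(f"F(1, {df_resid_a}) from regression SS: {f_equal_from_model_ss:.2f}")
```

The statistic has 1 and 512 degrees of freedom, since the restriction removes one parameter from a model with 512 residual degrees of freedom.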
Another way to test this equality hypothesis in Stata is to issue the following command right after the regression y = f(edyrs, exper):
test exper=edyrs
( 1) - edyrs + exper = 0.
Prob > F = 0.
constant not equal to 0. How do you do an F-test of this hypothesis?
Let’s test the hypothesis that the schooling partial regression coefficient is equal to 1.5. The t-ratio is (.9827726 - 1.5)/.0836199 = -6.19. Here’s the F-test:
. generate nu_wage=hrwage - 1.5*edyrs
. regress nu_wage exper
  Source |       SS       df       MS              Number of obs =     515
---------+------------------------------           F(  1,   513) =     71.
   Model | 1434.17826      1  1434.17826           Prob > F      =      0.
Residual | 10297.1504    513  20.0724179           R-squared     =      0.
---------+------------------------------           Adj R-squared =      0.
   Total | 11731.3287    514  22.8235966           Root MSE      =      4.

 nu_wage |      Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
   exper |   .1399595    .0165577    8.453    0.000    .1074302.
   _cons |  -13.14809    .3505566  -37.506    0.000   -13.83679    -12.
Now compute the F-statistic by comparing the residual sum of squares from this restricted model to that of the unrestricted model of regression “a”:
F = [(10297.1504 - 9581.18331)/1] / 18.7132487 = 38.26
which is t^2 = (6.19)^2 aside from rounding error.
Up until this last test, we could always construct the F-statistic by comparing either the residual sums of squares or the regression sums of squares from the null, restricted model and the alternative, unrestricted model. But that only works when the dependent variables of the fitted null and alternative models are exactly the same. Notice that in this last case, I created a new dependent variable, nu_wage, to fit the null model. In instances like this, the test can only be done by comparing residual sums of squares from null and alternative. Comparing regression sums of squares to compute an F-statistic is wrong when the null and alternative models have different dependent variables.
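To make the warning concrete, the Python sketch below (illustrative names, numbers copied from regression “a” and the nu_wage regression) computes the F-statistic for the hypothesis that the edyrs coefficient equals 1.5 from the residual sums of squares, compares it with the squared t-ratio, and then shows what the regression sums of squares would give.

```python
# Unrestricted model: regression "a", hrwage on edyrs and exper.
rss_a = 9581.18331
ss_model_a = 2793.7797
df_resid_a = 512
mse_a = rss_a / df_resid_a

# Restricted (null) model: nu_wage = hrwage - 1.5*edyrs regressed on exper.
rss_null = 10297.1504
ss_model_null = 1434.17826

# Correct route: residual sums of squares, one restriction.
f_from_rss = (rss_null - rss_a) / 1 / mse_a

# t-ratio for H0: coefficient of edyrs = 1.5, from regression "a".
t_ratio = (0.9827726 - 1.5) / 0.0836199

# Incorrect route here: the two models have different dependent variables
# (and different total sums of squares), so this difference is not the
# extra sum of squares for the restriction.
f_from_model_ss = (ss_model_a - ss_model_null) / 1 / mse_a

print(f"F from residual SS:    {f_from_rss:.2f}")
print(f"t^2:                   {t_ratio ** 2:.2f}")
print(f"'F' from regression SS: {f_from_model_ss:.2f}  (not a valid test statistic)")
```

The first two numbers should match up to rounding, while the third does not, because the total sum of squares of nu_wage (11731.3287) differs from that of hrwage (12374.963).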
zero?
a. regress hrwage edyrs exper
  Source |       SS       df       MS              Number of obs =     515
---------+------------------------------           F(  2,   512) =     74.
   Model |  2793.7797      2  1396.88985           Prob > F      =      0.
Residual | 9581.18331    512  18.7132487           R-squared     =      0.
---------+------------------------------           Adj R-squared =      0.
   Total |  12374.963    514  24.0758035           Root MSE      =      4.

  hrwage |      Coef.   Std. Err.        t    P>|t|    [95% Conf. Interval]
---------+--------------------------------------------------------------------
   edyrs |   .9827726    .0836199   11.753    0.000    .8184922    1.
   exper |   .1101188    .0166993    6.594    0.000    .0773111.
   _cons |  -5.802635    1.234832   -4.699    0.000   -8.228595    -3.