Multiple Regression Analysis: Sociology Handout, Study notes of Sociology

This handout demonstrates the concepts and relationships of multiple regression using a sociology example with 1985 CPS data. It covers partial and simple regression coefficients, operational definition, extra sum of squares, and testing hypotheses.

Typology: Study notes

2011/2012

Uploaded on 11/20/2012

shubnam
sociology
multiple regression
This handout uses 1985 CPS data on hourly wage, years of schooling (x=edyrs) and
years of labor force experience (z=exper) to demonstrate some of the concepts and
relationships that go to the heart of multiple regression. Just to orient you,
here’s the relevant correlation matrix.
. correlate hrwage exper edyrs
(obs=515)

         |   hrwage    exper    edyrs
---------+---------------------------
  hrwage |   1.0000
   exper |   0.1299   1.0000
   edyrs |   0.4000  -0.2889   1.0000

relationship between simple and partial regression coefficient
1. What is the relationship between the partial regression coefficient of edyrs
(.9827726) in y = f(edyrs,exper) and the simple regression coefficient of edyrs
(.823475) in y = f(edyrs )?
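In general, we know that

b(y,x) = b(y,x|z) + b(y,z|x) * b(z,x)

where b(y,x) is the simple regression coefficient of y on x, b(y,x|z) and
b(y,z|x) are the partial regression coefficients from y = f(x,z), and b(z,x) is
the simple regression coefficient of z on x. So if x=edyrs and z=exper, then
using regressions “a” and “e” below, we have

.823475 = .9827726 + (.11011)(-1.4465)

or

.823475 = .9827726 - .15927

aside from rounding error. What this means is that failing to control for
experience yields a smaller edyrs coefficient than we would get with controls.
If the model that includes edyrs and exper is correct, then the coefficient
from the simple regression of hrwage on edyrs alone yields a biased estimate of
the schooling slope. An estimate of the bias is given by the product

(.11011)(-1.4465) = -.15927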
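
The decomposition can be checked numerically. The Python sketch below uses only coefficients printed in this handout; the variable names are illustrative, and the slope of exper on edyrs (-1.4465) is the value quoted in the bias product above.

```python
# Coefficients as printed in the handout.
b_edyrs_partial = 0.9827726   # edyrs in hrwage = f(edyrs, exper), regression "a"
b_exper_partial = 0.1101188   # exper in regression "a"
b_edyrs_simple = 0.823475     # edyrs in hrwage = f(edyrs), regression "b"
b_exper_on_edyrs = -1.4465    # slope of exper on edyrs, as quoted in the bias product

# Bias from omitting exper: partial coefficient of the omitted variable
# times the slope of the omitted variable on the included one.
bias = b_exper_partial * b_exper_on_edyrs

# Simple coefficient implied by the decomposition.
implied_simple = b_edyrs_partial + bias

print(f"bias            = {bias:.5f}")
print(f"implied simple  = {implied_simple:.5f}")
print(f"reported simple = {b_edyrs_simple:.5f}")
```

The implied and reported simple coefficients should agree up to the rounding in the printed inputs.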

a. regress hrwage edyrs exper

  Source |       SS       df       MS                  Number of obs = 515
---------+------------------------------               F(  2,   512) = 74.65
   Model |   2793.7797     2  1396.88985               Prob > F      = 0.
Residual |  9581.18331   512  18.7132487               R-squared     = 0.
---------+------------------------------               Adj R-squared = 0.
   Total |   12374.963   514  24.0758035               Root MSE      = 4.

  hrwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   edyrs |   .9827726   .0836199     11.753   0.000       .8184922   1.
   exper |   .1101188   .0166993      6.594   0.000       .0773111   .
   _cons |  -5.802635   1.234832     -4.699   0.000      -8.228595   -3.


b. regress hrwage edyrs

  Source |       SS       df       MS                  Number of obs = 515
---------+------------------------------               F(  1,   513) = 97.
   Model |  1980.06338     1  1980.06338               Prob > F      = 0.
Residual |  10394.8996   513  20.2629622               R-squared     = 0.
---------+------------------------------               Adj R-squared = 0.
   Total |   12374.963   514  24.0758035               Root MSE      = 4.

  hrwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   edyrs |    .823475   .0833033      9.885   0.000       .6598174   .
   _cons |  -1.774601   1.116715     -1.589   0.113      -3.968497   .


c. regress hrwage exper

  Source |       SS       df       MS                  Number of obs = 515
---------+------------------------------               F(  1,   513) = 8.
   Model |  208.925793     1  208.925793               Prob > F      = 0.
Residual |  12166.0372   513  23.7154722               R-squared     = 0.
---------+------------------------------               Adj R-squared = 0.
   Total |   12374.963   514  24.0758035               Root MSE      = 4.

  hrwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   exper |   .0534191   .0179977      2.968   0.003       .0180609   .
   _cons |   8.154298   .3810432     21.400   0.000       7.405701   8.


d. regress edyrs exper

  Source |       SS       df       MS                  Number of obs = 515
---------+------------------------------               F(  1,   513) = 46.
   Model |  243.698749     1  243.698749               Prob > F      = 0.
Residual |  2676.27018   513  5.21690094               R-squared     = 0.
---------+------------------------------               Adj R-squared = 0.
   Total |  2919.96893   514  5.68087341               Root MSE      = 2.

   edyrs |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   exper |  -.0576936   .0084413     -6.835   0.000      -.0742772   -.
   _cons |   14.20159   .1787165     79.464   0.000       13.85048   14.


extra sum of squares

  1. The extra sum of squares attributable to edyrs in the multiple regression y = f(edyrs, exper) is, using regressions “a” and “c”:

ssr(edyrs|exper) = ssregy(edyrs,exper) - ssregy(exper) = 2793.7797 - 208.925793 = 2584.8539

Note that 2584.8539 is just the sum of squares regression from regression “h” above, the regression of hrwage on the residualized edyrs variable, edyr_hat.

For exper, we have from “a” and “b”:

ssr(exper|edyrs) = 2793.7797 - 1980.0633 = 813.7164
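
Both extra sums of squares are simple differences of regression sums of squares, which the following Python sketch reproduces from the Stata output above. The variable names are illustrative.

```python
# Regression (model) sums of squares printed in the Stata output.
ss_model_both = 2793.7797     # hrwage on edyrs and exper, regression "a"
ss_model_edyrs = 1980.06338   # hrwage on edyrs only, regression "b"
ss_model_exper = 208.925793   # hrwage on exper only, regression "c"

# Extra sum of squares: what a variable adds once the other is already in.
ssr_edyrs_given_exper = ss_model_both - ss_model_exper
ssr_exper_given_edyrs = ss_model_both - ss_model_edyrs

print(f"ssr(edyrs|exper) = {ssr_edyrs_given_exper:.4f}")
print(f"ssr(exper|edyrs) = {ssr_exper_given_edyrs:.4f}")
```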

multicollinearity

  1. How does the standard error of the partial regression coefficient for, say, edyrs in y = f(edyrs, exper), depend on the relationship of edyrs to exper?

The variance (and hence standard error) of the partial coefficient of edyrs will be equal to the mean square residual for the multiple regression y = f( edyrs, exper) divided by the sum of squares residual from the regression of edyrs on exper. Hence, using regressions “a” and “d” above, the standard error of edyrs in “a” is

.0836199 = sqrt(18.7132487/2676.27018)

Note that this standard error of the edyrs coefficient in hrwage = f(edyrs,exper) is greater than the standard error of edyrs in hrwage = f(edyrs). The latter is .0833. This is the classic case of multicollinearity, in this case between exper and edyrs, increasing the standard error.

But also notice that this is not what happens with the standard error of exper, which goes from .01799 (“c” above) in hrwage = f(exper) to .0166 (“a” above) in hrwage = f(edyrs, exper). What is going on here?
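
One way to see what is going on is to split the change in each coefficient's variance into two factors: the change in the mean square residual, and the inflation caused by the correlation between edyrs and exper. The Python sketch below does this with figures from regressions “a” through “d”. The variable names are illustrative, and the sketch relies on the standard two-predictor result that the inflation factor 1/(1 - r^2) is the same for both coefficients.

```python
import math

# Quantities printed in the Stata output above.
mse_both = 18.7132487         # mean square residual, regression "a"
mse_edyrs_only = 20.2629622   # mean square residual, regression "b"
mse_exper_only = 23.7154722   # mean square residual, regression "c"
ss_total_edyrs = 2919.96893   # total sum of squares of edyrs, regression "d"
ss_resid_edyrs = 2676.27018   # residual SS of edyrs on exper, regression "d"

# Standard error of the edyrs coefficient with and without exper in the model.
se_edyrs_partial = math.sqrt(mse_both / ss_resid_edyrs)
se_edyrs_simple = math.sqrt(mse_edyrs_only / ss_total_edyrs)

# Variance inflation from the edyrs-exper correlation: 1 / (1 - r^2).
inflation = ss_total_edyrs / ss_resid_edyrs

# Partial variance / simple variance = (MSE ratio) * (inflation factor).
var_ratio_edyrs = (mse_both / mse_edyrs_only) * inflation
var_ratio_exper = (mse_both / mse_exper_only) * inflation

print(f"se(edyrs), partial: {se_edyrs_partial:.7f}")
print(f"se(edyrs), simple:  {se_edyrs_simple:.7f}")
print(f"inflation factor:   {inflation:.4f}")
print(f"variance ratio, edyrs: {var_ratio_edyrs:.4f}")
print(f"variance ratio, exper: {var_ratio_exper:.4f}")
```

For edyrs the ratio is above 1: the inflation slightly outweighs the drop in the mean square residual. For exper the ratio is below 1: adding edyrs cuts the mean square residual from 23.72 to 18.71, which more than offsets the same inflation.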

t-ratios and F-statistics

  1. What is the relationship between the t-ratio of a partial regression coefficient and the partial F-statistic for the test that the partial regression coefficient equals zero?

Note that the t-ratio for the coefficient of edyrs in regression “a”, t = 11.753, is the square root of ssr(edyrs|exper) divided by the mean square residual from regression “a”. That is, the t-ratio is the square root of the “partial” F-statistic.

11.753 = sqrt[(2793.7797-208.925793)/18.7132487] = sqrt[2584.85/18.7132]
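
The same relationship holds for exper. The Python sketch below computes both partial F-statistics from the printed sums of squares and takes their square roots, which can be compared with the t-ratios in regression “a”. The variable names are illustrative.

```python
import math

# Sums of squares and mean square residual from the Stata output.
ss_model_both = 2793.7797     # regression "a"
ss_model_edyrs = 1980.06338   # regression "b"
ss_model_exper = 208.925793   # regression "c"
mse_both = 18.7132487         # mean square residual, regression "a"

# Partial F: extra sum of squares (1 numerator df) over the mean square residual.
partial_f_edyrs = (ss_model_both - ss_model_exper) / mse_both
partial_f_exper = (ss_model_both - ss_model_edyrs) / mse_both

# The square root of each partial F is the absolute t-ratio.
t_edyrs = math.sqrt(partial_f_edyrs)
t_exper = math.sqrt(partial_f_exper)

print(f"edyrs: partial F = {partial_f_edyrs:.3f}, sqrt = {t_edyrs:.3f}")
print(f"exper: partial F = {partial_f_exper:.3f}, sqrt = {t_exper:.3f}")
```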

hypothesis testing

  1. How can one test to determine if the partial regression coefficients of schooling and experience are equal, that is, that a year of schooling has the same effect on mean hourly wage as a year of experience?

Regression “j” below represents the null hypothesis of equality by restricting the schooling and experience coefficients to be equal. In this instance, we accomplish this by forming a new variable, years.

i. generate years=edyrs+exper

j. regress hrwage years

  Source |       SS       df       MS                  Number of obs = 515
---------+------------------------------               F(  1,   513) = 25.
   Model |  589.283865     1  589.283865               Prob > F      = 0.
Residual |  11785.6791   513  22.9740334               R-squared     = 0.
---------+------------------------------               Adj R-squared = 0.
   Total |   12374.963   514  24.0758035               Root MSE      = 4.

  hrwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   years |   .0933062   .0184233      5.065   0.000       .0571119   .
   _cons |    6.22555   .6035266     10.315   0.000       5.039862   7.


Now we carry out the F-test by comparing this restricted model to regression “a” above:

F = [(11785.6791 - 9581.18331)/1] / 18.7132487 = 117.80

Another way to test this equality hypothesis in Stata is to issue the following command right after the regression y = f(edyrs, exper):

test exper=edyrs

( 1) - edyrs + exper = 0.

F( 1, 512) = 117.

Prob > F = 0.
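
The comparison of restricted and unrestricted models can be sketched in Python as follows. The helper `f_from_rss` is a hypothetical name introduced here for illustration; it applies the usual nested-model F formula to the residual sums of squares from regressions “j” and “a”.

```python
def f_from_rss(rss_restricted, rss_unrestricted, n_restrictions, df_unrestricted):
    """F-statistic for nested models fitted to the same dependent variable."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / df_unrestricted
    return numerator / denominator


rss_years = 11785.6791   # regression "j": hrwage on years = edyrs + exper
rss_both = 9581.18331    # regression "a": hrwage on edyrs and exper

# One restriction (the two coefficients are equal); 512 residual df in "a".
f_equal = f_from_rss(rss_years, rss_both, n_restrictions=1, df_unrestricted=512)

print(f"F(1, 512) = {f_equal:.2f}")
```

The result can be compared with the F( 1, 512) statistic that Stata reports for `test exper=edyrs`.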

  1. We know how to do a t-test of the hypothesis that, say, a partial regression coefficient equals k, where k is a constant not equal to 0. How do you do an F-test of this hypothesis?

Let’s test the hypothesis that the schooling partial regression coefficient is equal to 1.5. The t-ratio is 6.19. Here’s the F-test:

. generate nu_wage=hrwage - 1.5*edyrs
. regress nu_wage exper

  Source |       SS       df       MS                  Number of obs = 515
---------+------------------------------               F(  1,   513) = 71.
   Model |  1434.17826     1  1434.17826               Prob > F      = 0.
Residual |  10297.1504   513  20.0724179               R-squared     = 0.
---------+------------------------------               Adj R-squared = 0.
   Total |  11731.3287   514  22.8235966               Root MSE      = 4.

 nu_wage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   exper |   .1399595   .0165577      8.453   0.000       .1074302   .
   _cons |  -13.14809   .3505566    -37.506   0.000      -13.83679   -12.


Now compute the F-statistic by comparing the residual sum of squares from this restricted model to the unrestricted model of regression “a”.

F = [(10297.1504 - 9581.18331)/1] / 18.7132487 = 38.26

which equals t^2 = (6.19)^2 aside from rounding error.
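
As a check on the arithmetic, the Python sketch below computes the F-statistic from the two residual sums of squares and the t-ratio from the coefficient and standard error in regression “a”. The variable names are illustrative. The t-ratio comes out negative because the estimate (.98) lies below 1.5; the handout quotes its absolute value.

```python
# Residual sums of squares from the Stata output.
rss_restricted = 10297.1504    # nu_wage on exper (edyrs coefficient fixed at 1.5)
rss_unrestricted = 9581.18331  # regression "a"
df_unrestricted = 512

# F-statistic with one restriction.
f_stat = (rss_restricted - rss_unrestricted) / (rss_unrestricted / df_unrestricted)

# t-ratio for the hypothesis that the edyrs coefficient equals 1.5.
b_edyrs = 0.9827726
se_edyrs = 0.0836199
t_ratio = (b_edyrs - 1.5) / se_edyrs

print(f"F   = {f_stat:.2f}")
print(f"t   = {t_ratio:.2f}")
print(f"t^2 = {t_ratio ** 2:.2f}")
```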

Up until this last test, we could always construct the F-statistic by comparing either the residual sums of squares or the regression sums of squares from the null, restricted model and the alternative, unrestricted model. But that only works when the dependent variables of the fitted null and alternative models are exactly the same. Notice that in this last case, I created a new dependent variable to fit the null model. In instances like this, the test can only be done by comparing the residual sums of squares from the null and alternative models. Comparing regression sums of squares to compute an F-statistic is wrong when the null and alternative models have different dependent variables.
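
The point can be illustrated with the figures already printed. In the Python sketch below (variable names are illustrative), the equality test shares one dependent variable, so the two routes agree. The test of the 1.5 hypothesis uses nu_wage for the null model, so the total sums of squares differ (11731.3287 versus 12374.963) and the regression sums of squares route gives a different, incorrect number.

```python
mse_alt = 18.7132487  # mean square residual of regression "a", the alternative model

# Case 1: equality test, same dependent variable (hrwage) in both models.
f_equal_from_rss = (11785.6791 - 9581.18331) / mse_alt   # residual SS: "j" minus "a"
f_equal_from_ssr = (2793.7797 - 589.283865) / mse_alt    # regression SS: "a" minus "j"

# Case 2: test that the edyrs coefficient is 1.5; the null model uses nu_wage.
f_k_from_rss = (10297.1504 - 9581.18331) / mse_alt       # correct route
f_k_from_ssr = (2793.7797 - 1434.17826) / mse_alt        # not a valid F here

print(f"same dependent variable:      {f_equal_from_rss:.2f} vs {f_equal_from_ssr:.2f}")
print(f"different dependent variable: {f_k_from_rss:.2f} vs {f_k_from_ssr:.2f}")
```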

  1. In the multiple regression of hrwage on edyrs and exper, how do we test the hypothesis that the coefficients of the independent variables are simultaneously zero?

a. regress hrwage edyrs exper

  Source |       SS       df       MS                  Number of obs = 515
---------+------------------------------               F(  2,   512) = 74.65
   Model |   2793.7797     2  1396.88985               Prob > F      = 0.
Residual |  9581.18331   512  18.7132487               R-squared     = 0.
---------+------------------------------               Adj R-squared = 0.
   Total |   12374.963   514  24.0758035               Root MSE      = 4.

  hrwage |      Coef.   Std. Err.       t     P>|t|       [95% Conf. Interval]
---------+--------------------------------------------------------------------
   edyrs |   .9827726   .0836199     11.753   0.000       .8184922   1.
   exper |   .1101188   .0166993      6.594   0.000       .0773111   .
   _cons |  -5.802635   1.234832     -4.699   0.000      -8.228595   -3.

From the output above, form the F-statistic obtained by dividing the mean square regression by the mean square residual:

F = 1396.88985/18.7132487 = 74.65 as given by the output above.
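
This overall F-test fits the same restricted-versus-unrestricted pattern used earlier: the null model contains only the constant, so its residual sum of squares is the total sum of squares. The Python sketch below, with illustrative variable names, computes the statistic both ways from the output above.

```python
# Figures from regression "a".
ms_regression = 1396.88985
ms_residual = 18.7132487
ss_total = 12374.963      # residual SS of the constant-only (null) model
rss_both = 9581.18331     # residual SS of the full model
n_restrictions = 2        # both slopes set to zero
df_residual = 512

# Route 1: mean square regression over mean square residual.
f_from_ms = ms_regression / ms_residual

# Route 2: restricted versus unrestricted residual sums of squares.
f_from_rss = ((ss_total - rss_both) / n_restrictions) / (rss_both / df_residual)

print(f"F from mean squares = {f_from_ms:.2f}")
print(f"F from residual SS  = {f_from_rss:.2f}")
```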