


A sociology exercise on multinomial logit regression analysis using the 1991 general social survey data. The exercise involves testing hypotheses on the relationship between various independent variables, including years of schooling, age, sex, rural upbringing, and region dummies, and a categorical dependent variable representing occupations. The analysis includes fitting the null model, testing individual coefficients, and testing the effect of subsets of coefficients.
Typology: Study notes
Sociology
multinomial logit
testing hypotheses
The data for this exercise again comes from the 1991 General Social Survey. The categorical dependent variable occ is coded as follows:
occ=0 if a worker's occupation is laborer, operative, or craft; occ=1 if occupation is clerical, sales, or service; occ=2 if occupation is managerial, technical, or professional.
The independent variables are: educ is years of schooling; age is age in years; sexx is coded 1 male, 0 female; rural is coded 1 if grew up in rural area, 0 otherwise; mid and wst are dummy variables for region, with other parts of the country omitted.
Let’s fit what we’ll treat for most of this exercise as the null model.
3. mlogit occ educ age sexx rural mid wst,base(0)
Multinomial regression                     Number of obs =  633
                                           LR chi2(12)   =  353.
                                           Prob > chi2   =  0.
Log likelihood = -511.92941                Pseudo R2     =  0.

     occ |     Coef.    Std. Err.       z    P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------------
1        |
    educ |  .2490034    .056606     4.399   0.000   .1380577.
     age |  .0156041    .0099216    1.573   0.116   -.0038418.
    sexx | -2.028054    .2392113   -8.478   0.000   -2.4969 -1.
   rural | -.7635868    .2619814   -2.915   0.004   -1.277061 -.
     mid |  .4081406    .2761675    1.478   0.139   -.1331378.
     wst |  .4151271    .3078639    1.348   0.178   -.188275 1.
   _cons | -2.253103    .853224    -2.641   0.008   -3.925391 -.
---------+--------------------------------------------------------------------
2        |
    educ |  .7840261    .0684775   11.449   0.000   .6498126.
     age |  .01764      .011552     1.527   0.127   -.0050015.
    sexx | -1.680553    .2778157   -6.049   0.000   -2.225062 -1.
   rural | -.128399     .2965349   -0.433   0.665   -.7095968.
     mid |  .144635     .3137103    0.461   0.645   -.4702258.
     wst |  .3873871    .3445527    1.124   0.261   -.2879237 1.
   _cons | -10.27188    1.063177   -9.661   0.000   -12.35567 -8.
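Each entry in the z column of this output is the coefficient divided by its standard error. As a quick check, the lines below redo that arithmetic for the educ row of equation 1 with Stata's display command. This is only a sketch: the numbers are copied from the table, and the normal() function assumes a reasonably recent Stata release.

```stata
* z statistic for educ in equation 1: coefficient / standard error
display .2490034 / .056606

* two-sided p-value from the standard normal distribution
display 2 * normal(-abs(.2490034 / .056606))
```

The first line should give a value close to the 4.399 reported in the table.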
Tests of individual coefficients
You can use the z-scores to test individual coefficients in separate equations. To test ALL the coefficients of a given variable, say educ, across all the equations, you need to impose the constraint of the null hypothesis, namely that the educ coefficient is zero in both equations ([1]educ = 0 and [2]educ = 0),
and then estimate the restricted model:
4. mlogit occ age sexx rural mid wst,base(0)
Multinomial regression                     Number of obs =  633
                                           LR chi2(10)   =  112.
                                           Prob > chi2   =  0.
Log likelihood = -632.35198                Pseudo R2     =  0.

     occ |     Coef.    Std. Err.       z    P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------------
1        |
     age |  .0139588    .0095435    1.463   0.144   -.0047461.
    sexx | -1.941273    .2293351   -8.465   0.000   -2.390762 -1.
   rural | -.9540048    .2512577   -3.797   0.000   -1.446461 -.
     mid |  .5754085    .2641193    2.179   0.029   .0577443 1.
     wst |  .476824     .2938117    1.623   0.105   -.0990362 1.
   _cons |  .8674283    .418339     2.074   0.038   .0474989 1.
---------+--------------------------------------------------------------------
2        |
     age |  .0103217    .0094321    1.094   0.274   -.0081648.
    sexx | -1.213865    .2267656   -5.353   0.000   -1.658317 -.
   rural | -.7604203    .2414063   -3.150   0.002   -1.233568 -.
     mid |  .4151938    .2612787    1.589   0.112   -.096903.
     wst |  .4373206    .2893423    1.511   0.131   -.1297799 1.
   _cons |  .6000712    .4171453    1.439   0.150   -.2175186 1.
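(Outcome occ==0 is the comparison group)
The likelihood ratio test statistic is then twice the difference between the log likelihoods of the full and restricted models:
LR = 2(632.35198 - 511.92941) = 240.85
This is distributed as a chi-square with 2 degrees of freedom, one for each educ coefficient set to zero. Since the mean of a chi-square with df=2 is 2, and the .05 critical value is 5.99, 240.85 is way into any reasonable critical region.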
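The same likelihood ratio test can be sketched in Stata as follows. This assumes the GSS extract for this exercise is already in memory and that your Stata release supports estimates store and the two-name form of lrtest (older releases used a different lrtest syntax). The stored names full and noeduc are arbitrary labels chosen here, not part of the original exercise.

```stata
* Assumes the exercise data set is already loaded.
* Fit and store the full model (command 3).
quietly mlogit occ educ age sexx rural mid wst, baseoutcome(0)
estimates store full

* Fit and store the restricted model without educ (command 4).
quietly mlogit occ age sexx rural mid wst, baseoutcome(0)
estimates store noeduc

* Likelihood ratio test of the restriction.
lrtest full noeduc

* Hand check from the two log likelihoods printed in the output above.
display 2 * (632.35198 - 511.92941)
display chi2tail(2, 2 * (632.35198 - 511.92941))

* .05 critical value for a chi-square with 2 degrees of freedom.
display invchi2tail(2, .05)
```

Both fits must use the same 633 observations for the comparison to be valid, which is the case in the output shown.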
Another way to do this same test (i.e., that all the educ coefficients are zero) is with a Wald statistic produced by Stata’s test command. After fitting the full alternative model with command 3 above, issue the following command:
5. test educ
 ( 1)  [1]educ = 0.
 ( 2)  [2]educ = 0.

           chi2(  2) =  146.
         Prob > chi2 =  0.
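The test command can also address narrower hypotheses about educ. The lines below are a sketch, assuming the full model from command 3 is the most recent estimation and that the equations are named 1 and 2 as in the output above (if occ carries value labels, the equation names will differ).

```stata
* educ in equation 1 only (occ==1 versus occ==0)
test [1]educ

* educ in equation 2 only (occ==2 versus occ==0)
test [2]educ

* Is the educ effect the same in both equations?
test [1]educ = [2]educ
```

The first two reproduce, as chi-square statistics with 1 degree of freedom, the information in the squared z-scores for educ; the third tests a hypothesis that neither z-score addresses on its own.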
Multinomial regression                     Number of obs =  633
                                           LR chi2(6)    =  223.
                                           Prob > chi2   =  0.
Log likelihood = -576.59741                Pseudo R2     =  0.

     occ |     Coef.    Std. Err.       z    P>|z|   [95% Conf. Interval]
---------+--------------------------------------------------------------------
1        |
    educ |  (dropped)
     age |  (dropped)
    sexx |  (dropped)
   rural |  (dropped)
     mid |  (dropped)
     wst |  (dropped)
   _cons |  .3659343    .0992281    3.688   0.000   .1714508.
---------+--------------------------------------------------------------------
2        |
    educ |  .6130706    .0520059   11.788   0.000   .5111409.
     age |  .0076156    .0090492    0.842   0.400   -.0101205.
    sexx | -.2976082    .210549    -1.413   0.158   -.7102768.
   rural |  .331173     .2502102    1.324   0.186   -.1592299.
     mid | -.1160987    .2463791   -0.471   0.637   -.5989928.
     wst |  .1039835    .2646088    0.393   0.694   -.4146401.
   _cons | -8.609597    .8531524  -10.092   0.000   -10.28174 -6.
(Outcome occ==0 is the comparison group)
The LR statistic is 2(576.59741 - 511.92941) = 129.336, with 6 degrees of freedom, one for each coefficient constrained to zero in equation 1.
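One way a restricted fit of this kind could be produced is with linear constraints that set every slope in equation 1 to zero. The sketch below is an assumption about how to do that in a current Stata release, not a record of the commands used for the output above. The constraint numbers 1 to 6 are arbitrary, and recent versions label such coefficients as constrained rather than dropped.

```stata
* Assumes the exercise data set is already loaded.
* Define one constraint per slope in equation 1 (constraint numbers are arbitrary).
local i = 0
foreach v in educ age sexx rural mid wst {
    local ++i
    constraint `i' [1]`v' = 0
}

* Refit with the equation 1 slopes held at zero.
mlogit occ educ age sexx rural mid wst, baseoutcome(0) constraints(1 2 3 4 5 6)

* Hand check of the LR statistic and its p-value with 6 degrees of freedom.
display 2 * (576.59741 - 511.92941)
display chi2tail(6, 2 * (576.59741 - 511.92941))
```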
Here’s another way to do this test for all possible pairs of equations. Fit the full model, and then issue this command.
. mlogtest , lrcomb
**** LR tests for combining outcome categories

Ho: All coefficients except intercepts associated with given pair
    of outcomes are 0 (i.e., categories can be collapsed).

Categories tested |     chi2   df   P>chi
------------------+------------------------
            1- 2  |  153.790    6   0.
            1- 0  |  129.336    6   0.
            2- 0  |  259.746    6   0.
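The printed p-values are cut off in this copy, but they can be recomputed from the chi-square statistics and their 6 degrees of freedom. The lines below are a sketch of that check. The final command assumes mlogtest comes from the user-written SPost package and that the installed version offers a combine option for the Wald counterpart of these tests.

```stata
* Upper-tail p-values for the three reported statistics (6 df each).
display chi2tail(6, 153.790)
display chi2tail(6, 129.336)
display chi2tail(6, 259.746)

* .05 critical value for a chi-square with 6 degrees of freedom.
display invchi2tail(6, .05)

* Wald versions of the same tests, run after fitting the full model
* (assumes the installed mlogtest supports the combine option).
mlogtest, combine
```

All three statistics are far above the .05 critical value of about 12.59, so none of the three pairs of categories can be collapsed.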