




Stata commands for conducting multiple regression analysis in sociology. The notes cover normal, t, and F distribution functions, descriptive statistics, the correlation matrix, the scatterplot matrix, and multiple regression output. They were prepared for Sociology 708-001 at the University of North Carolina at Chapel Hill, and the worked example uses the CSDATA data set from IPS6e.





University of North Carolina Chapel Hill
Professor François Nielsen
followed by the name of the command in Stata.
See also the Stata and SAS Guide PDF (click on Documents in the sidebar; the
guide is linked under Software Documentation).
The following Stata statistical functions are useful for regression work,
although the regression printout itself usually contains all the necessary statistics.
1.1 Normal Distribution Functions
normal(z) returns the area under the standard normal curve to the left of z. (Compare with Table A.)
. display normal(1.207)
invnormal(p) returns the value z such that the area under the standard normal
curve to the left of z is p. (Compare with Table A and Table D, bottom row.)
. display invnormal(0.975)
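As a cross-check outside Stata, the two normal-distribution functions can be mirrored with the `statistics.NormalDist` class from Python's standard library. This is a minimal sketch, not Stata's implementation; the wrapper names `normal` and `invnormal` are chosen here only to echo the Stata functions.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def normal(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return STANDARD_NORMAL.cdf(z)


def invnormal(p: float) -> float:
    """Value z such that the area to the left of z is p (0 < p < 1)."""
    return STANDARD_NORMAL.inv_cdf(p)


print(f"{normal(1.207):.4f}")
print(f"{invnormal(0.975):.4f}")  # the familiar 1.96 critical value
```

The two functions are inverses of each other, so normal(invnormal(p)) returns p up to rounding error.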
1.2 Student t Distribution Functions
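ttail(df, t) returns the area under the Student t curve with df degrees of freedom to the right of t.
. display ttail(7, 1.960)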
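invttail(df, p) returns the value t such that the area under the t curve with df degrees of freedom to the right of t is p. (Compare with Table D.)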
. display invttail(7, 0.025)
1.3 F Distribution Functions
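Ftail(df1, df2, f) returns the area under the F curve with df1 numerator and df2 denominator degrees of freedom to the right of f. (Compare with Table E.)
. display Ftail(1, 14, 21.55)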
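invFtail(df1, df2, p) returns the value f such that the area under the F curve with df1 and df2 degrees of freedom to the right of f is p. (Compare with Table E.)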
. display invFtail(1, 14, .00038068)
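The t and F tail areas can both be written with the regularized incomplete beta function I_x(a, b): for t >= 0, ttail(df, t) = 0.5 * I_x(df/2, 1/2) with x = df/(df + t^2), and Ftail(df1, df2, f) = I_x(df2/2, df1/2) with x = df2/(df2 + df1*f). The Python sketch below assumes a continued-fraction evaluation of I_x(a, b) in the style of Numerical Recipes and inverts the tails by bisection. The function names only mimic the Stata ones; this is not Stata's code.

```python
import math


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 1e-15) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges fast; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def ttail(df: float, t: float) -> float:
    """Area under the Student t curve with df degrees of freedom to the right of t."""
    if t < 0:
        return 1.0 - ttail(df, -t)
    return 0.5 * betainc(df / 2.0, 0.5, df / (df + t * t))


def Ftail(df1: float, df2: float, f: float) -> float:
    """Area under the F(df1, df2) curve to the right of f."""
    if f <= 0:
        return 1.0
    return betainc(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def _invert_decreasing(tail, p: float) -> float:
    """Bisection for x >= 0 with tail(x) = p, assuming tail decreases from tail(0) >= p."""
    lo, hi = 0.0, 1.0
    while tail(hi) > p:  # widen the bracket until tail(hi) <= p
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def invttail(df: float, p: float) -> float:
    """Value t such that the area to the right of t is p (0 < p < 1)."""
    if p > 0.5:
        return -invttail(df, 1.0 - p)
    return _invert_decreasing(lambda t: ttail(df, t), p)


def invFtail(df1: float, df2: float, p: float) -> float:
    """Value f such that the area to the right of f is p (0 < p < 1)."""
    return _invert_decreasing(lambda f: Ftail(df1, df2, f), p)


print(ttail(7, 1.960), invttail(7, 0.025))
print(Ftail(1, 14, 21.55), invFtail(1, 14, 0.00038068))
```

Because an F(1, df) variable is the square of a t(df) variable, Ftail(1, 14, f) equals 2*ttail(14, sqrt(f)), which gives a quick consistency check between Sections 1.2 and 1.3.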
I am using as an example the CSDATA from IPS6e (see Appendix D-2 for a
description). The units are 224 Computer Science majors at a large university. To
get the data into Stata, I selected the data and copied them to the clipboard (Ctrl-C). Then in Stata I opened the
Data Editor (Data -> Data Editor) and pasted the data (Ctrl-V). Then I closed
the Data Editor (the data can be saved, if desired, with File -> Save ...). Then I listed the first 5 cases.
. list in 1/5
     +--------------------------------------------------+
     | obs   gpa   hsm   hss   hse   satm   satv   sex |
     |--------------------------------------------------|
The response variable of interest is grade point average after three semesters (gpa).
First I produced descriptive statistics for all the variables I intend to put in the regression.
. su hsm hss hse satm satv gpa
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         hsm |       224    8.321429    1.638737          2         10
         hss |       224    8.089286    1.699663          3         10
         hse |       224     8.09375    1.507874          3         10
        satm |       224    595.2857    86.40144        300        800
        satv |       224    504.5491    92.61046        285        760
-------------+--------------------------------------------------------
         gpa |       224    2.635223    .7793949        .12          4
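For readers checking the summarize columns by hand, note that Std. Dev. is the sample standard deviation, computed with divisor n - 1. A minimal Python sketch of the five columns follows; the CSDATA values themselves are not reproduced in these notes, so the example list is hypothetical.

```python
import statistics


def summarize(values):
    """Obs, Mean, Std. Dev., Min, Max: the columns of Stata's summarize table."""
    return {
        "Obs": len(values),
        "Mean": statistics.mean(values),
        "Std. Dev.": statistics.stdev(values),  # sample SD with divisor n - 1
        "Min": min(values),
        "Max": max(values),
    }


# Hypothetical high school math grades, for illustration only (not CSDATA).
hsm_example = [10, 9, 8, 7, 9, 10, 6, 8]
print(summarize(hsm_example))
```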
. reg gpa hsm hss hse satm satv, level(99)
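The level(99) option sets the confidence level of the coefficient intervals to 99%. Each interval is b ± t* × SE, where t* is the upper 0.005 critical value of the t distribution with the residual degrees of freedom (218 here), that is, invttail(218, 0.005) from Section 1.2. A minimal Python sketch, assuming t* ≈ 2.6 as a rounded stand-in for that value and using the hsm coefficient and standard error reported in the regression output in these notes:

```python
def confidence_interval(coef, se, t_crit):
    """Return (lower, upper) = (coef - t_crit * se, coef + t_crit * se)."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


# hsm coefficient and standard error as reported by regress in these notes.
b_hsm, se_hsm = 0.1459611, 0.039261
# Assumption: 2.6 is a rounded stand-in for invttail(218, 0.005).
lower, upper = confidence_interval(b_hsm, se_hsm, t_crit=2.6)
print(f"approximate 99% CI for hsm: ({lower:.4f}, {upper:.4f})")
```

The interval excludes 0, consistent with the P>|t| of 0.000 for hsm.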
To obtain standardized coefficients (in place of the confidence intervals), I add the beta option:
. reg gpa hsm hss hse satm satv, beta
      Source |       SS       df       MS              Number of obs =     224
-------------+------------------------------           F(  5,   218) =   11.69
       Model |  28.6436439     5  5.72872878           Prob > F      =  0.0000
    Residual |  106.819145   218  .489996078           R-squared     =  0.2115
-------------+------------------------------           Adj R-squared =  0.1934
       Total |  135.462789   223  .607456452           Root MSE      =      .7
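------------------------------------------------------------------------------
         gpa |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         hsm |   .1459611     .039261     3.72   0.000                      ...
         hss |   .0359053    .0377984     0.95   0.343                      ...
         hse |   .0552926    .0395687     1.40   0.164                      ...
        satm |   .0009436    .0006857     1.38   0.170                      ...
        satv |  -.0004078    .0005919    -0.69   0.492                     -...
       _cons |   .3267187    .3999964     0.82   0...
------------------------------------------------------------------------------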
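Every statistic in the ANOVA block follows from the SS and df columns, and each Beta is the raw coefficient rescaled by standard deviations from the summarize output: Beta_j = b_j * s_xj / s_y. The Python sketch below only redoes that arithmetic on the printed numbers; it does not refit the model, and the variable names are chosen here for illustration.

```python
import math

# ANOVA entries copied from the regress output (gpa on hsm hss hse satm satv).
ss_model, df_model = 28.6436439, 5
ss_resid, df_resid = 106.819145, 218
ss_total, df_total = 135.462789, 223

ms_model = ss_model / df_model           # Model MS
ms_resid = ss_resid / df_resid           # Residual MS
ms_total = ss_total / df_total           # Total MS

f_stat = ms_model / ms_resid             # F(5, 218)
r_squared = ss_model / ss_total          # R-squared
adj_r_squared = 1 - ms_resid / ms_total  # Adj R-squared
root_mse = math.sqrt(ms_resid)           # Root MSE

# t statistic for hsm: coefficient divided by its standard error.
t_hsm = 0.1459611 / 0.039261

# Standardized coefficients: Beta_j = b_j * s_xj / s_y,
# with b_j from the regress output and SDs from the summarize output.
sd_gpa = 0.7793949
coef_and_sd = {
    "hsm": (0.1459611, 1.638737),
    "hss": (0.0359053, 1.699663),
    "hse": (0.0552926, 1.507874),
    "satm": (0.0009436, 86.40144),
    "satv": (-0.0004078, 92.61046),
}
betas = {name: b * sd_x / sd_gpa for name, (b, sd_x) in coef_and_sd.items()}

print(f"F = {f_stat:.2f}, R2 = {r_squared:.4f}, adj R2 = {adj_r_squared:.4f}, Root MSE = {root_mse:.4f}")
print(f"t(hsm) = {t_hsm:.2f}")
for name, beta in betas.items():
    print(f"Beta({name}) = {beta:.4f}")
```

The recomputed F statistic and t value match the printed 11.69 and 3.72. Because the inputs are themselves rounded, the recomputed Beta values may differ from Stata's in the last printed digits.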
The vif command, run after regress, reports variance inflation factors (VIF). These are measures of collinearity, the degree to
which each explanatory variable is associated with all the other explanatory
variables. A VIF above 10 is considered bothersome, but here no VIF exceeds 2 (the largest is 1.88).
. vif
    Variable |       VIF       1/VIF
-------------+----------------------
         hsm |      1.88    0...
         hss |      1.88    0...
         hse |      1.62    0...
        satm |      1.60    0...
        satv |      1.37    0...
-------------+----------------------
    Mean VIF |      1...
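The VIF for explanatory variable j is VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing variable j on the other explanatory variables; 1/VIF (the tolerance) is therefore 1 - R_j^2. The Python sketch below converts the reported VIFs back into R_j^2. Because vif prints the VIFs to two decimals, the recovered values are approximate.

```python
def vif_from_r2(r2_j: float) -> float:
    """VIF_j = 1 / (1 - R_j^2); R_j^2 is from regressing x_j on the other predictors."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("R_j^2 must be in [0, 1)")
    return 1.0 / (1.0 - r2_j)


def r2_from_vif(vif_j: float) -> float:
    """Invert the VIF: R_j^2 = 1 - 1/VIF_j, where 1/VIF_j is the tolerance."""
    return 1.0 - 1.0 / vif_j


# VIFs as printed by vif (rounded to two decimals).
reported_vif = {"hsm": 1.88, "hss": 1.88, "hse": 1.62, "satm": 1.60, "satv": 1.37}

for name, v in reported_vif.items():
    flag = "bothersome" if v > 10 else "ok"
    print(f"{name}: 1/VIF ~ {1 / v:.3f}, R_j^2 ~ {r2_from_vif(v):.3f} ({flag})")
```

For example, the VIF of 1.88 for hsm means that roughly 47% of the variance of hsm is shared with the other four explanatory variables.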
(The predicted values and residuals are saved with predict, using variable names of my choice.) Then, to check the distribution of the residuals,
I draw a histogram of the residuals (shown in Figure 2) and a normal quantile
plot of the residuals (shown in Figure 3). The only fancy option I use makes the plot
square. Together with the straight line that Stata draws automatically, the
square format shows deviations of the plot from linearity better than a rectangular
plot (compare with IPS6e Figure 11.5 p.620). We can see the left skew
in the distribution.
Figure 3: Normal quantile plot of residuals for the gpa regression (CSDATA).
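A normal quantile plot pairs each sorted residual with the value a normal distribution would be expected to take at the same rank. The Python sketch below assumes plotting positions p_i = i/(n + 1) and a normal reference with the sample mean and SD. This is a common convention and an assumption here, not a statement of Stata's exact algorithm. The residual values in the example are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_quantile_points(values):
    """Return (expected normal quantile, observed value) pairs, sorted by rank.

    Assumptions: plotting position p_i = i / (n + 1) for the i-th smallest
    value, and a normal reference with the sample mean and standard deviation.
    Needs at least two distinct values so that the SD is positive.
    """
    xs = sorted(values)
    n = len(xs)
    reference = NormalDist(mean(xs), stdev(xs))
    return [(reference.inv_cdf(i / (n + 1)), x) for i, x in enumerate(xs, start=1)]


# Hypothetical left-skewed residuals: a long tail of large negative values.
residuals = [-2.1, -1.4, -0.6, -0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
for expected, observed in normal_quantile_points(residuals):
    # With left skew, the smallest observed values tend to fall below their expected quantiles.
    print(f"{expected:7.3f} {observed:7.3f}")
```

Points on the straight line expected = observed indicate normality; a lower tail that bends below the line, as in this example, is the signature of left skew.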
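Next I compute the standard error of the mean response (predict option stdp) and the standard error of the forecast (predict option stdf); see IPS6e p.594 for the formulas. I check the values for the first 5
observations. Note that the SE of the forecast is always larger than the SE of
the mean response, because the SE of the forecast includes the individual variation of the
response around its mean in addition to the uncertainty about the mean response.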
. predict gpasepred, stdp
. predict gpaseforecast, stdf
. list gpapredict gpasepred gpaseforecast in 1/5
     +--------------------------------+
     | gpapre~t   gpasep~d   gpasef~t |
     |--------------------------------|
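The relation between the two standard errors is SE_forecast = sqrt(SE_mean^2 + s^2), where s is the Root MSE, which is why stdf always exceeds stdp. A minimal Python sketch follows. It computes s from the residual SS and df in the regression output and assumes a hypothetical SE of the mean response of 0.06 for one observation.

```python
import math


def se_forecast(se_mean: float, root_mse: float) -> float:
    """SE for predicting an individual response: sqrt(SE_mean^2 + s^2)."""
    return math.sqrt(se_mean ** 2 + root_mse ** 2)


# s = Root MSE of the gpa regression: square root of residual SS / residual df.
s = math.sqrt(106.819145 / 218)

# Hypothetical SE of the mean response (stdp) for one observation.
se_mean = 0.06
print(f"stdp = {se_mean:.4f}, stdf = {se_forecast(se_mean, s):.4f}, s = {s:.4f}")
```

Since s is about 0.70 here, individual variation dominates: even a precisely estimated mean response leaves an SE of forecast of at least about 0.70 GPA points.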