




These notes give Stata commands for Module 10 of the Statistics for Sociologists course at the University of North Carolina, Chapel Hill, taught by Professor François Nielsen. They cover statistical functions useful for regression work, including normal distribution functions, Student t distribution functions, and F distribution functions. The document also includes examples of simple regression with direct data input and data input from a spreadsheet, as well as instructions for creating graphs and conducting residual analysis.





University of North Carolina Chapel Hill
Professor François Nielsen
For help on any command, type help followed by the name of the command in Stata.
See also the Stata and SAS Guide pdf (click on Documents in side bar; guide
is linked under Software Documentation).
The following statistical functions in Stata are useful for regression work. The
regression printout itself usually comprises all necessary statistics.
1.1 Normal Distribution Functions
normal(z) returns the area under the standard normal curve to the left of z. (Compare with Table A.)
. display normal(1.207)
invnormal(p) returns the value z such that the area under the standard normal
curve to the left of z is p. (Compare with Table A and Table D (bottom row).)
. display invnormal(0.975)
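As a small illustration of how these two functions combine, the sketch below computes a two-sided p-value from a z statistic and checks that invnormal() reverses normal(). The value z = 1.96 is chosen only for illustration and does not come from the course examples.

```stata
* Two-sided p-value for an observed z statistic (z = 1.96 is illustrative)
display 2*(1 - normal(1.96))

* invnormal() reverses normal(): this returns 1.207 up to rounding error
display invnormal(normal(1.207))
```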
1.2 Student t Distribution Functions
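ttail(df, t) returns the area under the Student t curve with df degrees of freedom to the right of t.
. display ttail(7, 1.960)
invttail(df, p) returns the value t such that the area under the Student t curve with df degrees of freedom to
the right of t is p. (Compare with Table D.)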
. display invttail(7, 0.025)
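Because ttail() gives a one-sided (upper) tail area, a two-sided p-value needs the area doubled. The following sketch shows this with the same 7 degrees of freedom used above, along with a check that invttail() reverses ttail().

```stata
* Two-sided p-value for t = 1.960 with 7 degrees of freedom
display 2*ttail(7, 1.960)

* Critical value t* for a 95% confidence interval with 7 degrees of freedom
display invttail(7, 0.025)

* invttail() reverses ttail(): this returns 1.960 up to rounding error
display invttail(7, ttail(7, 1.960))
```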
1.3 F Distribution Functions
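Ftail(df1, df2, f) returns the area under the F curve with df1 numerator and df2 denominator degrees of freedom to the right of f. (Compare
with Table E.)
. display Ftail(1, 14, 21.55)
invFtail(df1, df2, p) returns the value f such that the area under the F curve with df1 and df2 degrees of
freedom to the right of f is p. (Compare with Table D.)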
. display invFtail(1, 14, .00038068)
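In simple regression the F statistic with 1 numerator degree of freedom equals the square of the t statistic for the slope, so the two tail-area functions should agree. The sketch below checks this for the F value used above; it relies only on the functions already introduced.

```stata
* Upper-tail area of F(1, 14) beyond 21.55
display Ftail(1, 14, 21.55)

* Same area from the t distribution: both tails beyond sqrt(21.55) with 14 df
display 2*ttail(14, sqrt(21.55))
```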
. input x y
      Source |       SS       df       MS              Number of obs =       7
-------------+------------------------------           F(  1,     5) =      5.
       Model |    51843318     1    51843318           Prob > F      =      0.
    Residual |  43870967.7     5  8774193.55           R-squared     =      0.
-------------+------------------------------           Adj R-squared =      0.
       Total |  95714285.7     6    15952381           Root MSE      =   2962.

           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   2419.355   995.3064     2.43   0.059    -139.1616    4977.
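The t statistic, p-value, and confidence interval in the coefficient row can be reproduced by hand with the functions from Section 1. This is only a sketch: the slope and standard error are copied from the printout above, and the scalar names (b_x, se_x, df_x, t_x, tstar) are made up for the illustration.

```stata
* Slope and standard error copied from the regression printout
scalar b_x  = 2419.355
scalar se_x = 995.3064
scalar df_x = 5                        // residual degrees of freedom: 7 - 2

* t statistic for the null hypothesis that the slope is zero
scalar t_x = b_x/se_x
display "t = " t_x

* Two-sided p-value from the Student t distribution
display "p = " (2*ttail(df_x, abs(t_x)))

* 95% confidence interval: estimate plus or minus t* times standard error
scalar tstar = invttail(df_x, 0.025)
display "95% CI: " (b_x - tstar*se_x) " to " (b_x + tstar*se_x)
```

The results should agree with the printed t, P>|t|, and interval up to rounding of the inputs.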
We continue with the previous example to show some useful commands.
To check linearity of the regression we may want to do a lowess plot of the
data. (Lowess is a type
of nonparametric regression used to show the main trend in the data.)
. lowess fat nea
The lowess plot is shown in Figure 1. The linear appearance of the lowess
curve supports the use of a linear regression.
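How closely the lowess curve follows the data depends on its bandwidth. The sketch below varies the bandwidth and stores the smoothed values; since the fat gain data are not reproduced here, it uses Stata's built-in auto dataset as a stand-in (mpg and weight in place of fat and nea), and the variable name mpg_smooth is made up for the illustration.

```stata
* Stand-in data: Stata's built-in auto dataset (use fat and nea with the course data)
sysuse auto, clear

* The default bandwidth is 0.8; a smaller value follows the data more closely
lowess mpg weight, bwidth(0.4)

* Store the smoothed values in a new variable without drawing the graph
lowess mpg weight, generate(mpg_smooth) nograph
```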
To make a scatterplot with the regression line, use the following command,
which produces the graph shown in Figure 2. When the graph appears you can
save it in a variety of formats from the graph window.
. twoway (scatter fat nea) (lfit fat nea)
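A graph can also be saved from the command line rather than from the graph window. The following is a sketch using the built-in auto dataset as a stand-in for the course data; the file name scatter_lfit.png is hypothetical, and the formats available may depend on the platform and Stata version.

```stata
* Stand-in data: Stata's built-in auto dataset (use fat and nea with the course data)
sysuse auto, clear
twoway (scatter mpg weight) (lfit mpg weight)

* Save the graph currently displayed; the format is taken from the file suffix
graph export "scatter_lfit.png", replace
```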
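Next we make a residual plot, with the predicted values on the x-axis and the residuals
on the y-axis. First we repeat the regression command (which is not necessary
if you have already run it). Then we get the residuals by creating a variable
fatresid, and the predicted values by creating a variable fatpredict. The residual plot is shown in
Figure 3.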
. reg fat nea
. predict fatresid, residuals
Figure 2: Scatterplot of fat gain by nonexercise activity with fitted regression
line.
. predict fatpredict, xb
. twoway (scatter fatresid fatpredict), yline(0)
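Two properties of least-squares residuals explain why the residual plot is centered on the line at zero: the residuals average to zero and are uncorrelated with the fitted values. The sketch below checks both, again with the built-in auto dataset standing in for the course data; the variable names resid and fitted are made up for the illustration.

```stata
* Stand-in data: Stata's built-in auto dataset (use fat and nea with the course data)
sysuse auto, clear
regress mpg weight
predict resid, residuals
predict fitted, xb

* Least-squares residuals average to zero (up to rounding error)
summarize resid

* They are also uncorrelated with the fitted values
correlate resid fitted
```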
We further analyze the distribution of residuals by creating a histogram
(with superposed kernel density) (Figure 4) and a normal quantile plot
(Figure 5). We note that, except for one rather large positive residual, the residuals
in this case appear fairly normal.
. histogram fatresid, kdensity
(bin=4, start=-1.1090604, width=.68825009)
. qnorm fatresid
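The histogram command chooses the number of bins automatically, as the note printed under the command shows. As a sketch of two optional refinements, the example below fixes the number of bins and overlays a normal curve for comparison with the kernel density. It uses the built-in auto dataset as a stand-in for the course data, and the variable name resid is made up for the illustration.

```stata
* Stand-in data: Stata's built-in auto dataset (use fat and nea with the course data)
sysuse auto, clear
quietly regress mpg weight
predict resid, residuals

* Fix the number of bins and add a normal density to the kernel density
histogram resid, bin(4) kdensity normal

* Normal quantile plot of the same residuals
qnorm resid
```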
Another useful plot is the linear prediction plot with confidence limits
for the mean response. This is obtained with the following command. Make
sure to list the lfitci plot first so that the shaded confidence band does not cover
the data points. The plot is shown in Figure 6.
. twoway (lfitci fat nea) (scatter fat nea)
Finally, the last command shows the prediction plot with both the 95% confidence
limits for the mean response (compare
with IPS6e Figure 10.9 p.573) and the 95% confidence limits for individual responses.
The latter interval includes both the uncertainty in estimating the mean response and
the variability of individual responses around that mean, and is useful for detecting
outliers. The plot is shown in Figure 7.
. twoway (lfitci fat nea) (lfitci fat nea, stdf ciplot(rline)) (scatter fat nea)
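The two sets of limits can also be computed as variables, which makes it possible to list the observations that fall outside the limits for individual responses. This is a sketch, not the method used to draw the figures: it uses the built-in auto dataset as a stand-in for the course data, and the variable and scalar names (yhat, se_mean, se_ind, tstar, and the limit variables) are made up for the illustration.

```stata
* Stand-in data: Stata's built-in auto dataset (use fat and nea with the course data)
sysuse auto, clear
regress mpg weight

predict yhat, xb              // fitted values
predict se_mean, stdp         // standard error of the estimated mean response
predict se_ind, stdf          // standard error for an individual response

* Critical value for 95% limits, using the residual degrees of freedom
scalar tstar = invttail(e(df_r), 0.025)

* Limits for the mean response (the shaded band drawn by lfitci)
generate mean_lo = yhat - tstar*se_mean
generate mean_hi = yhat + tstar*se_mean

* Wider limits for individual responses (the lines drawn with the stdf option)
generate ind_lo = yhat - tstar*se_ind
generate ind_hi = yhat + tstar*se_ind

* Observations outside the individual limits are candidate outliers
list make mpg ind_lo ind_hi if mpg < ind_lo | mpg > ind_hi
```

The individual limits are wider because the squared stdf standard error equals the squared stdp standard error plus the squared root MSE.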
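Figure 6: Linear prediction plot with 95% confidence limits for the mean response.
Figure 7: Linear prediction plot with 95% confidence limits for the mean response and for individual responses.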