Stata Commands for Multiple Regression in Sociology: UNC Chapel Hill, Soci708-001, Study notes of Statistics

Alliance University Statistics

Stata commands for conducting multiple regression analysis in sociology. The handout covers statistical functions (normal, t, and F distribution functions), descriptive statistics, a correlation matrix, a scatterplot matrix, and multiple regression output. It comes from the Sociology 708-001 course at the University of North Carolina at Chapel Hill.

Typology: Study notes

2011/2012

Uploaded on 12/29/2012

sankait 🇮🇳


University of North Carolina

Chapel Hill

Soci708-001 Statistics for Sociologists

Fall 2009

Professor François Nielsen

Stata Commands for Module 11 – Multiple Regression

For further information on any command in this handout, simply type help

followed by the name of the command in Stata.

See also the Stata and SAS Guide pdf (click on Documents in side bar; guide

is linked under Software Documentation).

1 Statistical Functions in Stata

The following statistical functions in Stata are useful for regression work. The

regression printout itself usually comprises all necessary statistics.

1.1 Normal Distribution Functions

The function normal(z) returns P(Z≤z), the area under the standard normal

curve to the left of z. (Compare with Table A.)

. display normal(1.207)

.88628393

The function invnormal(p) returns z such that P(Z≤z) = p, i.e. z such that the area under the standard normal curve to the left of z is p. (Compare with Table A and Table D, bottom row.)

. display invnormal(0.975)

1.959964
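As a small sketch of how these two functions combine (the z values are only illustrative), upper-tail and two-sided areas follow from the complement of normal(), and invnormal() undoes normal():

```stata
* Upper-tail area P(Z > 1.207), the complement of normal()
display 1 - normal(1.207)

* Two-sided area P(|Z| > 1.96)
display 2*(1 - normal(1.96))

* invnormal() inverts normal(): this returns 1.207 up to rounding
display invnormal(normal(1.207))
```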

1.2 Student t Distribution Functions

The function ttail(df, t) returns P(T>t), the area under the Student’s t

distribution with df degrees of freedom to the right of t. (Compare with Table

D.)

. display ttail(7, 1.960)

.04540985

The function invttail(df, p) returns t such that P(T>t) = p, i.e. t such that the area under Student’s t distribution with df degrees of freedom to the right of t is p. (Compare with Table D.)

. display invttail(7, 0.025)

2.3646243
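These functions are what one would use to check a regression printout by hand. The sketch below is illustrative only: it borrows the coefficient and standard error of hse and the 218 residual degrees of freedom from the gpa regression table later in this handout, and the 99% level is my choice for the example.

```stata
* Two-sided p-value for hse: t = Coef./Std. Err. with 218 residual df
display 2*ttail(218, abs(.0552926/.0395687))

* Critical value t* for a 99% confidence interval with 218 df
display invttail(218, 0.005)

* The areas to the right of -t and to the right of t sum to 1
display ttail(7, -1.960) + ttail(7, 1.960)
```

The first line should agree, up to rounding, with the P>|t| entry for hse in the regression table.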


1.3 F Distribution Functions

The function Ftail(n1, n2, f) returns P(F>f), the area under the F distribution with n1 and n2 degrees of freedom to the right of f. (Compare with Table E.)

. display Ftail(1, 14, 21.55)

The function invFtail(n1, n2, p) returns f such that P(F>f) = p, i.e. f such that the area under the F distribution with n1 and n2 degrees of freedom to the right of f is p. (Compare with Table E.)

. display invFtail(1, 14, .00038068)
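Two quick checks may help connect the F functions to the t functions above. This is only a sketch using the same illustrative numbers: with 1 numerator degree of freedom an F statistic is the square of a t statistic, so the upper-tail F area equals the two-sided t area.

```stata
* With 1 numerator df, F = t^2, so these two areas agree
display Ftail(1, 14, 21.55)
display 2*ttail(14, sqrt(21.55))

* invFtail() inverts Ftail(): this returns 21.55 up to rounding
display invFtail(1, 14, Ftail(1, 14, 21.55))
```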

2 Descriptive Statistics, Correlations, and Scatterplot Matrix

As an example I am using the CSDATA from IPS6e (see Appendix D-2 for a description). The units are 224 Computer Science majors at a large university. To enter the data in Stata I retrieved the csdata.xls file from the CD-ROM, selected the data and copied them to the clipboard (Ctrl-C). Then in Stata I opened the Data Editor (Data -> Data Editor) and pasted the data (Ctrl-V). Then I closed the Data Editor by clicking on ×. (You can save the data as a *.dta file if desired with File -> Save ...) Then I listed the first 5 cases.

. list in 1/5

+--------------------------------------------------+
| obs    gpa   hsm   hss   hse   satm   satv   sex |
|--------------------------------------------------|
|   1   3.32    10    10    10    670    600     1 |
|   2   2.26     6     8     5    700    640     1 |
|   3   2.35     8     6     8    640    530     1 |
|   4   2.08     9    10     7    670    600     1 |
|   5   3.38     8     9     8    540    580     1 |
+--------------------------------------------------+
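If copying and pasting is not convenient, data can also be typed in with the input command. The sketch below enters only the five cases listed above, so it illustrates the mechanics rather than replacing the full 224-case file.

```stata
* Type in the five listed cases by hand (the full file has 224 cases)
clear
input obs gpa hsm hss hse satm satv sex
1 3.32 10 10 10 670 600 1
2 2.26  6  8  5 700 640 1
3 2.35  8  6  8 640 530 1
4 2.08  9 10  7 670 600 1
5 3.38  8  9  8 540 580 1
end

* Same listing as above
list in 1/5
```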

The response variable of interest is grade point average after three semesters

(gpa). The explanatory variables are high school grades in mathematics (hsm),

science (hss) and English or language arts (hse); SAT score in math (satm)

and verbal (satv); and sex (1=male, 2=female).

First I produced descriptive statistics for all the variables I intend to put in

the regression with the command su (for summarize).

. su hsm hss hse satm satv gpa

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         hsm |       224    8.321429    1.638737          2         10
         hss |       224    8.089286    1.699663          3         10
         hse |       224     8.09375    1.507874          3         10
        satm |       224    595.2857    86.40144        300        800
        satv |       224    504.5491    92.61046        285        760
-------------+--------------------------------------------------------
         gpa |       224    2.635223    .7793949        .12          4
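The section title also mentions correlations, but no correlation command appears in the text above. A minimal sketch, assuming the full CSDATA is in memory, could use correlate or pwcorr; the choice between the two and the sig option are my additions.

```stata
* Assumes the full CSDATA (224 cases) is already in memory
* Correlation matrix for the regression variables
correlate hsm hss hse satm satv gpa

* Pairwise correlations with significance levels
pwcorr hsm hss hse satm satv gpa, sig
```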

Figure 1: Scatterplot matrix of variables for the gpa regression (CSDATA). Produced with Stata command graph matrix hsm hss hse satm satv gpa, half xsize(4) ysize(4)

. reg gpa hsm hss hse satm satv, level(99)

To obtain standardized coefficients (in place of the confidence intervals

shown by default) use the option beta.

. reg gpa hsm hss hse satm satv, beta

      Source |       SS       df       MS              Number of obs =     224
-------------+------------------------------           F(  5,   218) =     11.
       Model |  28.6436439     5  5.72872878           Prob > F      =      0.
    Residual |  106.819145   218  .489996078           R-squared     =      0.
-------------+------------------------------           Adj R-squared =      0.
       Total |  135.462789   223  .607456452           Root MSE      =       .

------------------------------------------------------------------------------
         gpa |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         hsm |   .1459611    .039261     3.72   0.000                        .
         hss |   .0359053   .0377984     0.95   0.343                        .
         hse |   .0552926   .0395687     1.40   0.164                        .
        satm |   .0009436   .0006857     1.38   0.170                        .
        satv |  -.0004078   .0005919    -0.69   0.492                       -.
       _cons |   .3267187   .3999964     0.82   0.                           .
------------------------------------------------------------------------------
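The summary figures in the header of the printout can be recomputed from the sums of squares and mean squares in the ANOVA table. The following sketch uses only numbers printed in the table, with display as a calculator, so it runs without the data.

```stata
* R-squared = Model SS / Total SS
display 28.6436439/135.462789

* F statistic = Model MS / Residual MS, with its upper-tail p-value
display 5.72872878/.489996078
display Ftail(5, 218, 5.72872878/.489996078)

* Root MSE = square root of Residual MS
display sqrt(.489996078)

* t statistic for hsm = Coef. / Std. Err. (3.72 in the table)
display .1459611/.039261
```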

Using the command vif after running a regression calculates the variance inflation factors (VIF). These are measures of collinearity, the degree to which each explanatory variable is associated with all the other explanatory variables. A VIF above 10 is considered bothersome, but there is no VIF above 10 in this particular example. (We are not going to use vif in this class.)

. vif
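To see what a VIF measures, it can be computed by hand for one variable: regress that explanatory variable on all the others and take 1/(1 - R-squared). The sketch below does this for hsm, assuming the full CSDATA is in memory; estat vif is the postestimation form of the same report.

```stata
* Assumes the full CSDATA (224 cases) is already in memory
* Auxiliary regression of hsm on the other explanatory variables
quietly regress hsm hss hse satm satv

* VIF for hsm = 1/(1 - R-squared of the auxiliary regression)
display 1/(1 - e(r2))

* Compare with the value reported after the full regression
quietly regress gpa hsm hss hse satm satv
estat vif
```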

4 Analysis of Residuals

To calculate the predicted values of gpa and the gpa residuals I can use the command predict, with the options xb and residuals, respectively, assigning variable names of my choice. Then to check the distribution of residuals I draw a histogram of the residuals (shown in Figure 2) and a normal quantile plot of the residuals (shown in Figure 3). The only fancy options I use are xsize(3.5) ysize(3.5) with the normal quantile plot, to make the plot square. Together with the straight line that Stata draws automatically, the square format shows deviations of the plot from linearity better than a rectangular plot (compare with IPS6e Figure 11.5 p.620). We can see the left-skew in the distribution.
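The commands for these steps are not all shown in the text, so the following is a sketch consistent with the variable names used in the figure captions (gpapredict, gparesid), assuming the full CSDATA is in memory. The normal option on the histogram is my addition.

```stata
* Assumes the full CSDATA (224 cases) is already in memory
regress gpa hsm hss hse satm satv

* Predicted values and residuals, stored under names of my choice
predict gpapredict, xb
predict gparesid, residuals

* Histogram of residuals with a normal curve overlaid
histogram gparesid, normal

* Normal quantile plot in a square format
qnorm gparesid, xsize(3.5) ysize(3.5)

* Residuals against predicted values with a reference line at zero
twoway (scatter gparesid gpapredict), yline(0)
```

Note that predict refuses to overwrite an existing variable, so drop gpapredict and gparesid first if they are already defined.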

Figure 3: Normal quantile plot of residuals for the gpa regression (CSDATA).

Produced by Stata command qnorm gparesid, xsize(3.5) ysize(3.5)

Figure 4: Residual plot for the gpa regression (CSDATA). Produced by Stata

command twoway (scatter gparesid gpapredict), yline(0)

for Stata; see IPS6e p.594 for formulas). I check the values for the first 5

observations. Note that the SE of forecast is always larger than the SE for

the mean response, as the SE of forecast contains individual variation in the

response variable in addition to uncertainty about the mean response.

. predict gpasepred, stdp
. predict gpaseforecast, stdf
. list gpapredict gpasepred gpaseforecast in 1/5

+--------------------------------+
| gpapre~t   gpasep~d   gpasef~t |
|--------------------------------|
| 3.085806   .0871256   .7053984 |
| 2.165682   .1740038   .7212998 |
| 2.539919   .0932556   .7061818 |
| 2.773967   .1156441   .7094855 |
| 2.532883   .0898973   .7057461 |
+--------------------------------+
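The relation between the two standard errors can be checked by hand: the squared SE of forecast equals the squared SE of the mean response plus the residual mean square (.489996078 in the regression printout). The sketch below uses only numbers from the listings above; the 95% level for the intervals is my choice for the example.

```stata
* SE of forecast^2 = SE of mean response^2 + Residual MS
* First observation: reproduces gpaseforecast = .7053984 up to rounding
display sqrt(.0871256^2 + .489996078)

* Second observation: reproduces .7212998 up to rounding
display sqrt(.1740038^2 + .489996078)

* 95% confidence interval for the mean response, first observation
display 3.085806 - invttail(218, 0.025)*.0871256
display 3.085806 + invttail(218, 0.025)*.0871256

* 95% prediction interval for an individual gpa, first observation
display 3.085806 - invttail(218, 0.025)*.7053984
display 3.085806 + invttail(218, 0.025)*.7053984
```

Because the residual mean square is added to every observation, the prediction interval is much wider than the confidence interval for the mean response.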
