Stata Handout for Module 3: Data Analysis with Stata, Study notes of Statistics

A handout for module 3 of a stata workshop, which covers various stata commands and techniques for data analysis. It includes instructions on creating histograms with density curves, calculating statistics, making scatterplots, and labeling points. The document also provides helpful tips for using stata and saving session logs.

Typology: Study notes

2011/2012

Uploaded on 12/29/2012

sankait
sankait 🇮🇳

4.2

(13)

113 documents

1 / 11

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Stata Hand-out for Module 3
Stata 10
Now that Stata 10 is on Citrix, I will use Stata 10 rather than
8 for these handouts. I hope that the use of Stata 8 in the
previous module did not confuse people.
Odum Workshop Handouts
Please refer to these handouts from Cathy Zimmer’s Odum
Institute workshop on Stata; this information will be useful to
you in Soc 708:
Odum Workshop Offered Again in November
The workshop will be offered again November 4-6 3:30-5:00 PM,
Manning Hall Room 01 (the computer lab at Odum Institute)
Registration is not required, bring a flash drive.
Addenda to Module 2:
(1) How to put a density curve on top of a histogram? Turns out
it’s very simple. For the example in Module 2, you would simply
include “kdensity” somewhere in the clause after the comma.
(Thanks Francois!)
This command will give you a histogram with the number of bins
Stata chooses automatically
.histogram infant, kdensity
This next command will give you a histogram like the one we did
before, with 14 bins, titles, and a different vertical axis
. histogram infant if infant~=., bin(14) frequency ytitle(Count)
xtitle(Infant Mortality Rate) title(Distribution of Infant
Mortality Rates) kdensity
(2) Doing calculations in Stata:
. display (2+2+3+4+10)/5
And Stata gives you this output: 4.2
docsity.com
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Stata Handout for Module 3: Data Analysis with Stata and more Study notes Statistics in PDF only on Docsity!

Stata Hand-out for Module 3

Stata 10

Now that Stata 10 is on Citrix, I will use Stata 10 rather than

8 for these handouts. I hope that the use of Stata 8 in the

previous module did not confuse people.

Odum Workshop Handouts

Please refer to these handouts from Cathy Zimmer’s Odum

Institute workshop on Stata; this information will be useful to

you in Soc 708:

Odum Workshop Offered Again in November

The workshop will be offered again November 4-6 3:30-5:00 PM,

Manning Hall Room 01 (the computer lab at Odum Institute)

Registration is not required, bring a flash drive.

Addenda to Module 2:

(1) How to put a density curve on top of a histogram? Turns out

it’s very simple. For the example in Module 2, you would simply

include “kdensity” somewhere in the clause after the comma.

(Thanks Francois!)

This command will give you a histogram with the number of bins

Stata chooses automatically

.histogram infant, kdensity

This next command will give you a histogram like the one we did

before, with 14 bins, titles, and a different vertical axis

. histogram infant if infant~=., bin(14) frequency ytitle(Count)

xtitle(Infant Mortality Rate) title(Distribution of Infant

Mortality Rates) kdensity

(2) Doing calculations in Stata:

. display (2+2+3+4+10)/

And Stata gives you this output: 4.

Helpful tips:

1. To find out more about any Stata command, use the help function within Stata. Go to

Help, Choose “Stata command”, and type “lfit” (lfit is a command). Or just type “help

lfit” into the command window.

2. To use a variable name in a command, you have a few options. You can type the whole

variable name, or:

a. just type part of variable name – if just a few characters can uniquely identify the

variable name, Stata will supply the rest of the variable.

b. click the variable as it appears in the variable window (or double-click, depending

on your set-up)

3. If you would like to re-do a command that you just did, you can press “page up” and the

previous command will appear. If you press “page up” again it will supply the command

before that, etc.

4. If you would like a record of what you did,

a. you can keep a log of your session, or part of a session. In Stata 10, go to “file”

and choose “log”, or click on the brown rectangle in the tool bar. Then it will ask

you where to save it and in what format.

b. You can save everything you have done during a Stata session. that is in the

“Review” window by clicking on the white square in the top left corner and

choosing “Save review contents”

c. You could use a ‘do-file’ – write all your commands in one lengthy text file and

then have Stata run them all at once. This is probably not useful to you for

homework, but is necessary for some

5. In an “if” clause, you are telling Stata to execute a command depending on the value of a

variable. Examples:

a. If you want Stata to only display points in a scatter plot if the value of variable

“age” equals 30, you would include this somewhere in the command: “if

age==30”. == is just two = signs so it’s like saying “if age equal equal 30”

b. When the variable you are using for your “if” clause is a “string” variable,

meaning it is not understood by Stata to be numerical, you put quotation marks

around the value of the variable. So, in the case of the scatterplot below where we

use an “if” clause for the variable “type”, you write “if type==”bc” ”

c. If you would like Stata to run the command to EXCLUDE observations with a

certain value, use “~” – “if age~=30” means the command applies to observations

where age is not equal to 30. You only have to type one = sign after the ~.

Here is the Stata command:

. twoway (scatter income education, mlabel(type))

prof

prof

prof prof prof

prof

prof

prof

profprof

prof

prof

prof (^) prof prof prof

prof

prof

prof

prof

prof

prof

prof

prof

prof

prof

prof bc

prof

prof

wc

prof

NA wc

wc wc

wc wc

wc wc wc wc

wc wc (^) wc

wcwc

wc wc

wc

wc

wc NA

bc

wc wc

wc

bc bc

bc

bc

bc

NA

NAbcbc bc bc

bc

bc

bcbc

bc

bc bc

bc bc bc bc bc bc bc

bc

bc

bc

bc

bc

bc bc

bc

bcbc bc

bc

bc

prof

bc

bc bc bc

bc

bc

0

5000

10000

15000

20000

25000

income

6 8 10 12 14 16 education

To make a graph that looks like the colorful one in R (lecture notes), I had to create 3 plots, and

use an “if” clause for each one, and choose a color and symbol for each. The “if” clause specifies

which value of TYPE will be displayed (e.g., if type==”bc”). (This procedure is more

complicated than the one in R! And if you want to superimpose lines, you would use the “lfit”

command on 3 additional plots!)

twoway (scatter income education if type=="bc", mcolor(red) msymbol(circle_hollow)) (scatter income education if type=="prof", mcolor(midgreen) msymbol(triangle_hollow)) (scatter income education if type=="wc", mcolor(blue) msymbol(plus)), legend(off)

0

5000

10000

15000

20000

25000

income

6 8 10 12 14 16 education

For the slide titled “Least Squares Regression”, you can refer

back to the scatterplot with line above. The Stata command is

“reg”. After “reg” (for regress) you list the y-variable

(a.k.a. dependent variable, response variable) first, then the

x-variable (independent variable, explanatory variable).

. reg prestige education

Source | SS df MS Number of obs = 102 -------------+------------------------------ F( 1, 100) = 260. Model | 21608.4361 1 21608.4361 Prob > F = 0. Residual | 8286.99 100 82.8699 R-squared = 0. -------------+------------------------------ Adj R-squared = 0. Total | 29895.4261 101 295.994318 Root MSE = 9.


prestige | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- education | 5.360878 .3319882 16.15 0.000 4.702223 6. _cons | -10.73198 3.677088 -2.92 0.004 -18.02722 -3.


For the slide with 2 lines of different colors representing 2

different equations, I did not provide Stata commands because

there are no R commands provided.

Using the data provided by the CD that came with the book. Use

ta02006. I downloaded it as an Excel file, then cut and paste to

Stata. To do this, you open Stata and go to the editor window

(or just type “edit”) and you see empty rows and columns. Then

just paste what you had from Excel.

. twoway (scatter y1 x) (lfit y1 x)

4

6

8

10

12

(^4 6 8) x 10 12 14

y1 Fitted values

[To save time, don’t use the graphic interface to do all 4 of

these. Just hit “page up” and change the variable names while

retaining the commands.]

. twoway (scatter y2 x) (lfit y2 x)

2

4

6

8

10

4 6 8 10 12 14 x y2 Fitted values

. twoway (scatter y3 x) (lfit y3 x)

4

6

8

10

12

4 6 8 10 12 14 x y3 Fitted values

. twoway (scatter y4 x4) (lfit y4 x4)

Stata automatically made the lines two different colors. The red line excludes the outlier.

Here are the relevant regression equations. The first includes observation 12, and the second

excludes it using “if id~=12”.

. reg weight height

Source | SS df MS Number of obs = 200 -------------+------------------------------ F( 1, 198) = 7. Model | 1630.88712 1 1630.88712 Prob > F = 0. Residual | 43713.1129 198 220.773297 R-squared = 0. -------------+------------------------------ Adj R-squared = 0. Total | 45344 199 227.859296 Root MSE = 14.


weight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- height | .2384059 .0877159 2.72 0.007 .0654286. _cons | 25.26623 14.95042 1.69 0.093 -4.216263 54.


. reg weight height if id~=

Source | SS df MS Number of obs = 199 -------------+------------------------------ F( 1, 197) = 288. Model | 20941.4871 1 20941.4871 Prob > F = 0. Residual | 14312.0205 197 72.6498502 R-squared = 0. -------------+------------------------------ Adj R-squared = 0. Total | 35253.5075 198 178.048018 Root MSE = 8.


weight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- height | 1.149222 .0676889 16.98 0.000 1.015734 1. _cons | -130.747 11.56271 -11.31 0.000 -153.5496 -107.


Extra Material:

As we saw with “histogram infant, kdensity”, you can add options after a command with a comma. Here is an option that may be used with the “reg” command: beta. “beta asks that standardized beta coefficients be reported instead of confidence intervals. The beta coefficients are the regression coefficients obtained by first standardizing all variables to have a mean of 0 and a standard deviation of 1. beta may not be specified with vce(cluster clustvar) or the svy prefix.” Here is how you use it, and what it does:

. reg prestige education, beta

Source | SS df MS Number of obs = 102 -------------+------------------------------ F( 1, 100) = 260. Model | 21608.4361 1 21608.4361 Prob > F = 0. Residual | 8286.99 100 82.8699 R-squared = 0. -------------+------------------------------ Adj R-squared = 0. Total | 29895.4261 101 295.994318 Root MSE = 9.


prestige | Coef. Std. Err. t P>|t| Beta -------------+---------------------------------------------------------------- education | 5.360878 .3319882 16.15 0.000. _cons | -10.73198 3.677088 -2.92 0..