Download Stata Handout for Module 3: Data Analysis with Stata and more Study notes Statistics in PDF only on Docsity!
Stata Hand-out for Module 3
Stata 10
Now that Stata 10 is on Citrix, I will use Stata 10 rather than
8 for these handouts. I hope that the use of Stata 8 in the
previous module did not confuse people.
Odum Workshop Handouts
Please refer to these handouts from Cathy Zimmer’s Odum
Institute workshop on Stata; this information will be useful to
you in Soc 708:
Odum Workshop Offered Again in November
The workshop will be offered again November 4-6 3:30-5:00 PM,
Manning Hall Room 01 (the computer lab at Odum Institute)
Registration is not required, bring a flash drive.
Addenda to Module 2:
(1) How to put a density curve on top of a histogram? Turns out
it’s very simple. For the example in Module 2, you would simply
include “kdensity” somewhere in the clause after the comma.
(Thanks Francois!)
This command will give you a histogram with the number of bins
Stata chooses automatically
.histogram infant, kdensity
This next command will give you a histogram like the one we did
before, with 14 bins, titles, and a different vertical axis
. histogram infant if infant~=., bin(14) frequency ytitle(Count)
xtitle(Infant Mortality Rate) title(Distribution of Infant
Mortality Rates) kdensity
(2) Doing calculations in Stata:
. display (2+2+3+4+10)/
And Stata gives you this output: 4.
Helpful tips:
1. To find out more about any Stata command, use the help function within Stata. Go to
Help, Choose “Stata command”, and type “lfit” (lfit is a command). Or just type “help
lfit” into the command window.
2. To use a variable name in a command, you have a few options. You can type the whole
variable name, or:
a. just type part of variable name – if just a few characters can uniquely identify the
variable name, Stata will supply the rest of the variable.
b. click the variable as it appears in the variable window (or double-click, depending
on your set-up)
3. If you would like to re-do a command that you just did, you can press “page up” and the
previous command will appear. If you press “page up” again it will supply the command
before that, etc.
4. If you would like a record of what you did,
a. you can keep a log of your session, or part of a session. In Stata 10, go to “file”
and choose “log”, or click on the brown rectangle in the tool bar. Then it will ask
you where to save it and in what format.
b. You can save everything you have done during a Stata session. that is in the
“Review” window by clicking on the white square in the top left corner and
choosing “Save review contents”
c. You could use a ‘do-file’ – write all your commands in one lengthy text file and
then have Stata run them all at once. This is probably not useful to you for
homework, but is necessary for some
5. In an “if” clause, you are telling Stata to execute a command depending on the value of a
variable. Examples:
a. If you want Stata to only display points in a scatter plot if the value of variable
“age” equals 30, you would include this somewhere in the command: “if
age==30”. == is just two = signs so it’s like saying “if age equal equal 30”
b. When the variable you are using for your “if” clause is a “string” variable,
meaning it is not understood by Stata to be numerical, you put quotation marks
around the value of the variable. So, in the case of the scatterplot below where we
use an “if” clause for the variable “type”, you write “if type==”bc” ”
c. If you would like Stata to run the command to EXCLUDE observations with a
certain value, use “~” – “if age~=30” means the command applies to observations
where age is not equal to 30. You only have to type one = sign after the ~.
Here is the Stata command:
. twoway (scatter income education, mlabel(type))
prof
prof
prof prof prof
prof
prof
prof
profprof
prof
prof
prof (^) prof prof prof
prof
prof
prof
prof
prof
prof
prof
prof
prof
prof
prof bc
prof
prof
wc
prof
NA wc
wc wc
wc wc
wc wc wc wc
wc wc (^) wc
wcwc
wc wc
wc
wc
wc NA
bc
wc wc
wc
bc bc
bc
bc
bc
NA
NAbcbc bc bc
bc
bc
bcbc
bc
bc bc
bc bc bc bc bc bc bc
bc
bc
bc
bc
bc
bc bc
bc
bcbc bc
bc
bc
prof
bc
bc bc bc
bc
bc
0
5000
10000
15000
20000
25000
income
6 8 10 12 14 16 education
To make a graph that looks like the colorful one in R (lecture notes), I had to create 3 plots, and
use an “if” clause for each one, and choose a color and symbol for each. The “if” clause specifies
which value of TYPE will be displayed (e.g., if type==”bc”). (This procedure is more
complicated than the one in R! And if you want to superimpose lines, you would use the “lfit”
command on 3 additional plots!)
twoway (scatter income education if type=="bc", mcolor(red) msymbol(circle_hollow)) (scatter income education if type=="prof", mcolor(midgreen) msymbol(triangle_hollow)) (scatter income education if type=="wc", mcolor(blue) msymbol(plus)), legend(off)
0
5000
10000
15000
20000
25000
income
6 8 10 12 14 16 education
For the slide titled “Least Squares Regression”, you can refer
back to the scatterplot with line above. The Stata command is
“reg”. After “reg” (for regress) you list the y-variable
(a.k.a. dependent variable, response variable) first, then the
x-variable (independent variable, explanatory variable).
. reg prestige education
Source | SS df MS Number of obs = 102 -------------+------------------------------ F( 1, 100) = 260. Model | 21608.4361 1 21608.4361 Prob > F = 0. Residual | 8286.99 100 82.8699 R-squared = 0. -------------+------------------------------ Adj R-squared = 0. Total | 29895.4261 101 295.994318 Root MSE = 9.
prestige | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- education | 5.360878 .3319882 16.15 0.000 4.702223 6. _cons | -10.73198 3.677088 -2.92 0.004 -18.02722 -3.
For the slide with 2 lines of different colors representing 2
different equations, I did not provide Stata commands because
there are no R commands provided.
Using the data provided by the CD that came with the book. Use
ta02006. I downloaded it as an Excel file, then cut and paste to
Stata. To do this, you open Stata and go to the editor window
(or just type “edit”) and you see empty rows and columns. Then
just paste what you had from Excel.
. twoway (scatter y1 x) (lfit y1 x)
4
6
8
10
12
(^4 6 8) x 10 12 14
y1 Fitted values
[To save time, don’t use the graphic interface to do all 4 of
these. Just hit “page up” and change the variable names while
retaining the commands.]
. twoway (scatter y2 x) (lfit y2 x)
2
4
6
8
10
4 6 8 10 12 14 x y2 Fitted values
. twoway (scatter y3 x) (lfit y3 x)
4
6
8
10
12
4 6 8 10 12 14 x y3 Fitted values
. twoway (scatter y4 x4) (lfit y4 x4)
Stata automatically made the lines two different colors. The red line excludes the outlier.
Here are the relevant regression equations. The first includes observation 12, and the second
excludes it using “if id~=12”.
. reg weight height
Source | SS df MS Number of obs = 200 -------------+------------------------------ F( 1, 198) = 7. Model | 1630.88712 1 1630.88712 Prob > F = 0. Residual | 43713.1129 198 220.773297 R-squared = 0. -------------+------------------------------ Adj R-squared = 0. Total | 45344 199 227.859296 Root MSE = 14.
weight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- height | .2384059 .0877159 2.72 0.007 .0654286. _cons | 25.26623 14.95042 1.69 0.093 -4.216263 54.
. reg weight height if id~=
Source | SS df MS Number of obs = 199 -------------+------------------------------ F( 1, 197) = 288. Model | 20941.4871 1 20941.4871 Prob > F = 0. Residual | 14312.0205 197 72.6498502 R-squared = 0. -------------+------------------------------ Adj R-squared = 0. Total | 35253.5075 198 178.048018 Root MSE = 8.
weight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- height | 1.149222 .0676889 16.98 0.000 1.015734 1. _cons | -130.747 11.56271 -11.31 0.000 -153.5496 -107.
Extra Material:
As we saw with “histogram infant, kdensity”, you can add options after a command with a comma. Here is an option that may be used with the “reg” command: beta. “beta asks that standardized beta coefficients be reported instead of confidence intervals. The beta coefficients are the regression coefficients obtained by first standardizing all variables to have a mean of 0 and a standard deviation of 1. beta may not be specified with vce(cluster clustvar) or the svy prefix.” Here is how you use it, and what it does:
. reg prestige education, beta
Source | SS df MS Number of obs = 102 -------------+------------------------------ F( 1, 100) = 260. Model | 21608.4361 1 21608.4361 Prob > F = 0. Residual | 8286.99 100 82.8699 R-squared = 0. -------------+------------------------------ Adj R-squared = 0. Total | 29895.4261 101 295.994318 Root MSE = 9.
prestige | Coef. Std. Err. t P>|t| Beta -------------+---------------------------------------------------------------- education | 5.360878 .3319882 16.15 0.000. _cons | -10.73198 3.677088 -2.92 0..