Stata cheat sheets 1.5, Statistics summary notes

Stata cheat sheets for statistics

Type: summary notes

2019

Shared on 19/05/2019



Import Data

sysuse auto, clear
load system data (auto dataset); many examples below use the auto dataset

use "yourStataFile.dta", clear
load a dataset from the current directory

import excel "yourSpreadsheet.xlsx", /*
*/ sheet("Sheet1") cellrange(A2:H11) firstrow
import an Excel spreadsheet

import delimited "yourFile.csv", /*
*/ rowrange(2:11) colrange(1:8) varnames(2)
import a .csv file

webuse set "https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data"
webuse "wb_indicators_long"
set web-based directory and load data from the web

Explore Data

VIEW DATA ORGANIZATION

describe make price
display variable type, format, and any value/variable labels

count
count if price > 5000
number of rows (observations); can be combined with logic

ds, has(type string)
lookfor "in."
search for variable types, variable name, or variable label

isid mpg
check if mpg uniquely identifies the data

SEE DATA DISTRIBUTION

codebook make price
overview of variable type, stats, number of missing/unique values

summarize make price mpg
print summary statistics (mean, stdev, min, max) for variables

inspect mpg
show histogram of data, number of missing or zero observations

histogram mpg, frequency
plot a histogram of the distribution of a variable

BROWSE OBSERVATIONS WITHIN THE DATA

browse (or Ctrl + 8)
open the data editor

display price[4]
display the 4th observation in price; only works on single values

gsort price mpg (ascending)
gsort -price -mpg (descending)
sort in order, first by price then miles per gallon

duplicates report
report how many observations are duplicated across all variables

levelsof rep78
display the unique values for rep78

assert price != .
verify truth of claim

list make price if price > 10000 & !missing(price)
clist ... (compact form)
list the make and price for observations with price > $10,000

Missing values are treated as the largest positive number. To exclude missing values, ask whether the value is less than ".".
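The note on missing values is easy to overlook, so here is a minimal sketch using the auto data. It assumes only that rep78 contains some missing values (it does in the shipped auto dataset); the local macro names are illustrative.

```stata
* Sketch: how missing values behave in comparisons (auto data)
sysuse auto, clear

* ">" treats missing values as larger than any number, so they are counted here
count if rep78 > 3
local withMissing = r(N)

* exclude missing values by asking whether the value is less than "."
count if rep78 > 3 & rep78 < .
local noMissing = r(N)

* equivalent test using the missing() function
count if rep78 > 3 & !missing(rep78)

display "including missing: `withMissing'   excluding missing: `noMissing'"
```

The difference between the two counts is the number of observations where rep78 is missing.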

Summarize Data

tabulate rep78, mi gen(repairRecord)
one-way table: number of rows with each value of rep78; mi includes missing values; gen() creates a binary variable for every rep78 value, using the stub repairRecord

tabulate rep78 foreign, mi
two-way table: cross-tabulate number of observations for each combination of rep78 and foreign

bysort rep78: tabulate foreign
for each value of rep78, apply the command tabulate foreign

tabstat price weight mpg, by(foreign) stat(mean sd n)
create compact table of summary statistics

table foreign, contents(mean price sd price) f(%9.2fc) row
create a flexible table of summary statistics; f() formats numbers and row displays stats for all data

collapse (mean) price (max) mpg, by(foreign)
calculate mean price & max mpg by car type (foreign); replaces the data in memory
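Because collapse replaces the data in memory, it is often wrapped in preserve and restore. The following is a small sketch of that pattern on the auto data; the new variable names meanPrice and maxMpg are illustrative choices, not part of the cheat sheet.

```stata
* Sketch: collapse without losing the original data
sysuse auto, clear

preserve                      // take a snapshot of the data in memory
collapse (mean) meanPrice = price (max) maxMpg = mpg, by(foreign)
list                          // one row per value of foreign
restore                       // bring back the original 74 observations

describe, short
```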

Create New Variables

egen meanPrice = mean(price), by(foreign)
calculate mean price for each group in foreign; see help egen for more options

pctile mpgQuartile = mpg, nq(4)
store the quartile cut points of mpg in a new variable

generate mpgSq = mpg^2
gen byte lowPr = price < 4000
create a new variable. Useful also for creating binary variables based on a condition (generate byte)

generate totRows = _N
bysort rep78: gen repairTot = _N
_N is the total number of observations (per group when used with by)

generate id = _n
bysort rep78: gen repairIdx = _n
_n is the running index of the current observation (within its group when used with by)
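A short sketch of how _n and _N differ with and without a by prefix, using the auto data. Sorting by price within each group is an added assumption so that the within-group index has a defined order.

```stata
* Sketch: _n (running index) versus _N (total count)
sysuse auto, clear

generate id = _n              // 1, 2, ..., 74 in the current sort order
generate totRows = _N         // 74 in every row

* within each rep78 group, ordered by price
bysort rep78 (price): generate repairIdx = _n
by rep78: generate repairTot = _N

list rep78 price repairIdx repairTot in 1/10, sepby(rep78)
```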

Change Data Types

Stata has 6 data types, and data can also be missing:

byte: true/false
int, long, float, double: numbers
string: words
missing: no data

To convert between numbers & strings:

string to number

destring foreignString, gen(foreignNumeric)
gen foreignNumeric = real(foreignString)
convert the string "1" to the number 1

encode foreignString, gen(foreignNumeric)
convert the string "foreign" to the number 1 (with a value label)

number to string

decode foreign, gen(foreignString)
convert the number 1 to the string "foreign" (its value label)

tostring foreign, gen(foreignString)
gen foreignString = string(foreign)
convert the number 1 to the string "1"

recast double mpg
change the storage type of a variable (here, store mpg as a double)
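The conversions above can be tried in one pass on the auto data. This is a sketch; the generated variable names (foreignCode, foreignEncoded, and so on) are illustrative.

```stata
* Sketch: round trips between numbers and strings
sysuse auto, clear

decode foreign, gen(foreignString)          // value label text: "Domestic" / "Foreign"
tostring foreign, gen(foreignCode)          // digits as text: "0" / "1"

destring foreignCode, gen(foreignNumeric)   // back to the numbers 0 / 1
encode foreignString, gen(foreignEncoded)   // new codes 1, 2 assigned alphabetically

list foreign foreignString foreignCode foreignNumeric foreignEncoded in 50/55
```

Note that encode assigns its own codes starting at 1, so the encoded variable does not necessarily match the original numeric values.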

Basic Data Operations

Arithmetic

+   add (numbers), combine (strings)
-   subtract
*   multiply
/   divide
^   raise to a power

Logic

&   and
|   or
! or ~   not
==   equal
!= or ~=   not equal
<   less than
<=   less than or equal to
>   greater than
>=   greater or equal to

== tests if something is equal; = assigns a value to a variable

Example data:

make            foreign   price
Chevy Colt      0         3,984
Buick Riviera   0         10,372
Honda Civic     1         4,499
Volvo 260       1         11,995

if foreign != 1 & price >= 10000
both conditions must hold: selects Buick Riviera only

if foreign != 1 | price >= 10000
at least one condition must hold: selects Chevy Colt, Buick Riviera, and Volvo 260

Basic Syntax

All Stata commands have the same format (syntax):

[by varlist1:] command [varlist2] [=exp] [if exp] [in range] [weight] [using filename] [, options]

by varlist1:   apply the command across each unique combination of variables in varlist1
command   function: what are you going to do to varlists?
varlist2   column to apply command to
=exp   save output as a new variable
if exp   condition: only apply the function if something is true
in range   apply to specific rows
weight   apply weights
using filename   pull data from a file (if not loaded)
, options   special options for command

bysort rep78: summarize price if foreign == 0 & price <= 9000, detail

In this example, we want a detailed summary with stats like kurtosis, plus mean and median.

To find out more about any command, like what options it takes, type help command.
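As a rough illustration of the syntax template, the sketch below exercises the in, if, weight, by, and options slots one at a time on the auto data. The choice of weight (aweight = weight) is only an assumption made to show where weights go.

```stata
* Sketch: the pieces of the general syntax, one at a time
sysuse auto, clear

summarize price in 1/10                        // in range: first 10 rows only
summarize price if foreign == 0 & price <= 9000, detail   // if exp plus an option
display r(p50)                                 // median stored by the detail option

bysort foreign: summarize price [aweight = weight]        // by prefix plus weights
```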

Set up

pwd
print current (working) directory

cd "C:\Program Files (x86)\Stata13"
change working directory

dir
display filenames in working directory

dir *.dta
list all Stata data files in working directory

capture log close
close the log on any existing do files (capture can be shortened to cap)

log using "myDoFile.txt", replace
create a new log file to record your work and results

search mdesc
find the package mdesc to install

ssc install mdesc
install the package mdesc; needs to be done once

Packages contain extra commands that expand Stata’s toolkit.

Useful Shortcuts

keyboard buttons

F2   describe data
Ctrl + 8   open the data editor
Ctrl + 9   open a new .do file
Ctrl + D   highlight text in .do file, then Ctrl + D executes it in the command line

AT COMMAND PROMPT

PgUp PgDn   scroll through previous commands
Tab   autocompletes variable name after typing part
cls   clear the console (where results are displayed)
clear   delete data in memory

Data Processing with Stata 15 Cheat Sheet

For more info see Stata’s reference manual (stata.com)

Tim Essam • Laura Hughes
inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets), updated June 2016
geocenter.github.io/StataTraining
Disclaimer: we are not affiliated with Stata. But we like it.

Save & Export Data

compress
compress data in memory

save "myData.dta", replace
save data in Stata format, replacing the data if a file with same name exists

saveold "myData.dta", replace version(12)
save data as a Stata 12-compatible file

export excel "myData.xls", /*
*/ firstrow(variables) replace
export data as an Excel file (.xls) with the variable names as the first row

export delimited "myData.csv", delimiter(",") replace
export data as a comma-delimited file (.csv)

Manipulate Strings

GET STRING PROPERTIES

display length("This string has 29 characters")
return the length of the string

charlist make
display the set of unique characters within a string (user-defined package)

display strpos("Stata", "a")
return the position in Stata where a is first found

display substr("Stata", 3, 5)
return up to 5 characters starting at position 3

FIND MATCHING STRINGS

display strmatch("123.89", "1??.?9")
return true (1) or false (0) if string matches pattern

list make if regexm(make, "[0-9]")
list observations where make matches the regular expression (here, records that contain a number)

list if regexm(make, "(Cad.|Chev.|Datsun)")
return all observations where make contains "Cad.", "Chev." or "Datsun"

list if inlist(word(make, 1), "Cad.", "Chev.", "Datsun")
compare the given list against the first word in make; return all observations where that first word is one of the listed words

TRANSFORM STRINGS

display regexr("My string", "My", "Your")
replace string1 ("My") with string2 ("Your")

replace make = subinstr(make, "Cad.", "Cadillac", 1)
replace first occurrence of "Cad." with Cadillac in the make variable

display stritrim(" Too   much   Space")
replace consecutive spaces with a single space

display trim("  leading / trailing spaces  ")
remove extra spaces before and after a string

display strlower("STATA should not be ALL-CAPS")
change string case; see also strupper, strproper

display strtoname("1Var name")
convert string to Stata-compatible variable name

display real("100")
convert string to a numeric or missing value
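The string functions above can be checked interactively. This sketch simply displays a few results; the example strings are the ones from the cheat sheet, apart from the spacing in the stritrim() call, which is an illustrative assumption.

```stata
* Sketch: string functions evaluated with display
display length("This string has 29 characters")
display strpos("Stata", "a")
display substr("Stata", 3, 5)                 // fewer than 5 characters remain
display strmatch("123.89", "1??.?9")
display regexr("My string", "My", "Your")
display "[" stritrim(" Too   much   Space") "]"
display "[" trim("  leading / trailing spaces  ") "]"
display strlower("STATA should not be ALL-CAPS")
display strtoname("1Var name")
display real("100") + 1                       // real() returns a number
```

Wrapping results in brackets makes leading and trailing spaces visible in the output.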

Combine Data

ADDING (APPENDING) NEW DATA

webuse coffeeMaize2.dta, clear
save coffeeMaize2.dta, replace
webuse coffeeMaize.dta, clear
load demo data

append using "coffeeMaize2.dta", gen(filenum)
add observations from "coffeeMaize2.dta" to current data and create variable "filenum" to track the origin of each observation

The two datasets should contain the same variables (columns).

MERGING TWO DATASETS TOGETHER

The two datasets must contain a common variable (id). The variable _merge tracks the origin of each row:

_merge code   meaning
1 (master)    row only in ind
2 (using)     row only in hh
3 (match)     row in both

ONE-TO-ONE

webuse ind_age.dta, clear
save ind_age.dta, replace
webuse ind_ag.dta, clear
load demo data

merge 1:1 id using "ind_age.dta"
one-to-one merge of "ind_age.dta" into the loaded dataset and create variable "_merge" to track the origin

MANY-TO-ONE

webuse hh2.dta, clear
save hh2.dta, replace
webuse ind2.dta, clear
load demo data

merge m:1 hid using "hh2.dta"
many-to-one merge of "hh2.dta" into the loaded dataset and create variable "_merge" to track the origin

FUZZY MATCHING: COMBINING TWO DATASETS WITHOUT A COMMON ID

reclink
match records from different data sets using probabilistic matching (ssc install reclink)

jarowinkler
create distance measure for similarity between two strings (ssc install jarowinkler)
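To see the three _merge codes without downloading the demo files, here is a self-contained sketch with two tiny made-up datasets. The variable names and values are hypothetical and exist only for illustration.

```stata
* Sketch: one-to-one merge on made-up data, showing all three _merge codes
clear
input id str5 name
1 "Ana"
2 "Ben"
3 "Caro"
end
tempfile people
save `people'                 // this becomes the "using" dataset

clear
input id age
2 31
3 45
4 27
end                           // this is the "master" dataset in memory

merge 1:1 id using `people'
tabulate _merge               // id 4: master only; id 1: using only; ids 2, 3: matched
list, sepby(_merge)
```

Inspecting _merge right after merging is a cheap way to catch unexpected non-matches before dropping or keeping rows.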

Reshape Data

webuse set https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data
webuse "coffeeMaize.dta"
load demo dataset

TIDY DATASETS

Tidy datasets have each observation in its own row and each variable in its own column. When datasets are tidy, they have a consistent, standard format that is easier to manipulate and analyze.

WIDE: one row per country (Malawi, Rwanda, Uganda) with the columns country, coffee2011, coffee2012, maize2011, maize2012
LONG (TIDY): one row per country and year with the columns country, year, coffee, maize

MELT DATA (WIDE → LONG)

reshape long coffee@ maize@, i(country) j(year)
convert a wide dataset to long

coffee@ maize@   reshape variables starting with coffee and maize
i(country)   unique id variable (key)
j(year)   create new variable which captures the info in the column names

CAST DATA (LONG → WIDE)

reshape wide coffee maize, i(country) j(year)
convert a long dataset to wide

coffee maize   create new variables named coffee2011, maize2012...
i(country)   what will be unique id variable (key)
j(year)   create new variables with the year added to the column name

xpose, clear varname
transpose rows and columns of data, clearing the data and saving old column names as a new variable called "_varname"
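The melt and cast steps can be sketched on a small typed-in dataset that mirrors the layout of the coffee/maize demo data. The numbers below are placeholders, not the real figures.

```stata
* Sketch: wide -> long -> wide on made-up values
clear
input str6 country coffee2011 coffee2012 maize2011 maize2012
"Malawi" 1 2 3 4
"Rwanda" 5 6 7 8
"Uganda" 9 10 11 12
end

reshape long coffee maize, i(country) j(year)   // 3 rows become 6 (country x year)
list, sepby(country)

reshape wide coffee maize, i(country) j(year)   // back to one row per country
list
```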

Label Data

Value labels map string descriptions to numbers. They allow the underlying data to be numeric (making logical tests simpler) while also connecting the values to human-understandable text.

label define myLabel 0 "US" 1 "Not US"
label values foreign myLabel
define a label and apply it to the values in foreign

label list
list all labels within the dataset

note: data note here
place note in dataset
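A brief sketch of why value labels keep logical tests simple: the data stay numeric, yet a condition can still be written against the label text. The price threshold is an arbitrary choice for the example.

```stata
* Sketch: numeric tests and label-based tests on the same variable
sysuse auto, clear

label define myLabel 0 "US" 1 "Not US"
label values foreign myLabel

count if foreign == 1                          // test on the underlying number
count if foreign == "Not US":myLabel           // same test written with the label text

list make price if foreign == "Not US":myLabel & price > 10000
```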

Replace Parts of Data

CHANGE COLUMN NAMES

rename (rep78 foreign) (repairRecord carType)
rename one or multiple variables

CHANGE ROW VALUES

replace price = 5000 if price < 5000
replace all values of price that are less than $5,000 with 5000

recode price (0 / 5000 = 5000)
change all prices from 0 through 5000 to be $5,000

recode foreign (0 = 2 "US") (1 = 1 "Not US"), gen(foreign2)
change the values and value labels then store in a new variable, foreign2

REPLACE MISSING VALUES

mvdecode _all, mv(9999)
replace the number 9999 with missing value in all variables; useful for cleaning survey datasets

mvencode _all, mv(9999)
replace missing values with the number 9999 for all variables; useful for exporting data

Select Parts of Data (Subsetting)

FILTER SPECIFIC ROWS

drop if mpg < 20
drop in 1/4
drop observations based on a condition (first command) or rows 1-4 (second command)

keep in 1/#
opposite of drop; keep only rows 1 through #

keep if inrange(price, 5000, 10000)
keep values of price between $5,000 and $10,000 (inclusive)

keep if inlist(make, "Honda Accord", "Honda Civic", "Subaru")
keep the specified values of make

sample 25
sample 25% of the observations in the dataset (use set seed # command for reproducible sampling)

SELECT SPECIFIC COLUMNS

drop make
remove the 'make' variable

keep make price
opposite of drop; keep only variables 'make' and 'price'
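Since drop, keep, and sample all discard data, a sketch that combines them with preserve and restore may help. The seed value and the particular variables kept are arbitrary illustrative choices.

```stata
* Sketch: subsetting rows and columns, then undoing it
sysuse auto, clear

preserve
keep if inrange(price, 5000, 10000)   // rows: mid-priced cars only
keep make price mpg                   // columns: three variables
count
restore                               // full dataset is back

set seed 12345                        // makes the random sample reproducible
sample 25                             // keep roughly a quarter of the rows
count
```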

Data Transformation with Stata 15 Cheat Sheet

For more info see Stata’s reference manual (stata.com)

Plotting in Stata 15: Customizing Appearance

For more info see Stata’s reference manual (stata.com)

Laura Hughes • Tim Essam, follow us @flaneuseks and @StataRGIS
inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets), updated June 2016, CC BY 4.0
geocenter.github.io/StataTraining
Disclaimer: we are not affiliated with Stata. But we like it.

SYNTAX

Arguments for the plot objects go in the options portion of these commands,
for example: scatter price mpg, xline(20, lwidth(vthick))

LINES / BORDERS
axes   xscale(...) yscale(...)
tick marks, grid lines   xlabel(...) ylabel(...)
reference lines   xline(...) yline(...)
legend   legend(region(...))

TEXT
axis labels   xlabel(...) ylabel(...)
legend   legend(...)
titles   title(...) subtitle(...) xtitle(...) ytitle(...)
annotation   text(...)

COLOR

SYMBOLS
mcolor("145 168 208")   mcolor(none)
specify the fill and stroke of the marker in RGB or with a Stata color
mfcolor("145 168 208")   mfcolor(none)
specify the fill of the marker
mcolor("145 168 208 %20")
adjust transparency by adding %#

LINES / BORDERS
lcolor("145 168 208")   lcolor(none)
specify the stroke color of the line or border
mlcolor("145 168 208")   marker
glcolor("145 168 208")   grid lines
tlcolor("145 168 208")   tick marks

TEXT
color("145 168 208")   color(none)
specify the color of the text
mlabcolor("145 168 208")   marker label
labcolor("145 168 208")   axis labels

SIZE / THICKNESS

SYMBOLS
msize(medium)
specify the marker size: ehuge, vhuge, huge, vlarge, large, medlarge, medium, medsmall, small, vsmall, tiny, vtiny

LINES / BORDERS
lwidth(medthick)
specify the thickness (stroke) of a line: vvvthick, vvthick, vthick, thick, medthick, medium, medthin, thin, vthin, vvthin, vvvthin, none
mlwidth(thin)   marker
glwidth(thin)   grid lines
tlwidth(thin)   tick marks

TEXT
size(medsmall)
specify the size of the text: vhuge, huge, vlarge, large, medlarge, medium, medsmall, small, vsmall, tiny, half_tiny, third_tiny, quarter_tiny, minuscule
mlabsize(medsmall)   marker label
labsize(medsmall)   axis labels

APPEARANCE

SYMBOLS
msymbol(Dh)
specify the marker symbol: O, o, Oh, oh, D, d, Dh, dh, S, s, Sh, sh, T, t, Th, th, X, p, i, none

LINES / BORDERS
lpattern(dash)   line, axes
glpattern(dash)   grid lines
specify the line pattern: solid, dash, dot, dash_dot, shortdash, shortdash_dot, longdash, longdash_dot, blank
tlength(2)   tick marks
noline   axes
off   axes: no axis/labels
nogrid, nogmin, nogmax   grid lines
noticks   tick marks

TEXT
format(%12.2f)   axis labels: change the format of the axis labels
nolabels   axis labels: no axis labels
mlabel(foreign)   marker label: label the points with the values of the foreign variable
off   legend: turn off legend
label(# "label")   legend: change legend label text

POSITION

SYMBOLS
jitter(#)   randomly displace the markers
jitterseed(#)   set seed

LINES / BORDERS
xlabel(#10, tposition(crossing))   tick marks: number of tick marks, position (outside | crossing | inside)

TEXT
mlabposition(5)   marker label: label location relative to marker (clock position: 0 to 12)

Apply Themes

Schemes are sets of graphical parameters, so you don’t have to specify the look of the graphs every time.

USING A SAVED THEME

twoway scatter mpg price, scheme(customTheme)

Create custom themes by saving options in a .scheme file.

help scheme entries
see all options for setting scheme properties

adopath ++ "~//StataThemes"
set path of the folder (StataThemes) where custom .scheme files are saved

set scheme customTheme, permanently
change the theme; permanently sets it as the default scheme

net inst brewscheme, from("https://wbuchanan.github.io/brewscheme/") replace
install William Buchanan’s package to generate custom schemes and color palettes (including ColorBrewer)

USING THE GRAPH EDITOR

twoway scatter mpg price, play(graphEditorTheme)

Select the Graph Editor
Click Record
Double click on symbols and areas on plot, or regions on sidebar to customize
Unclick Record
Save theme as a .grec file

ANATOMY OF A PLOT

plots contain many features

[Figure: annotated example plot (y and fitted values against x) labeling the title, subtitle, x-axis, y-axis, x-axis title, y-axis title, y-axis labels, tick marks, grid lines, y-line, marker, marker label, line, legend, and annotation; the graph region consists of an outer region and an inner graph region, and the plot region contains an inner plot region]

scatter price mpg, graphregion(fcolor("192 192 192") ifcolor("208 208 208"))
specify the fill of the background in RGB or with a Stata color

scatter price mpg, plotregion(fcolor("224 224 224") ifcolor("240 240 240"))
specify the fill of the plot background in RGB or with a Stata color

Save Plots

graph twoway scatter y x, saving("myPlot.gph", replace)
save the graph when drawing

graph save "myPlot.gph", replace
save current graph to disk

graph export "myPlot.pdf", as(pdf)
export the current graph as an image file; see options to set size and resolution

graph combine plot1.gph plot2.gph...
combine 2+ saved graphs into a single plot

Data Analysis with Stata 15 Cheat Sheet

For more info see Stata’s reference manual (stata.com)

Estimation with Categorical & Factor Variables

CATEGORICAL VARIABLES identify a group to which an observation belongs
INDICATOR VARIABLES denote whether something is true or false (T/F)
CONTINUOUS VARIABLES measure something

OPERATOR, DESCRIPTION, EXAMPLE

i.   specify indicators
regress price i.rep78
specify rep78 variable to be an indicator variable

ib.   specify base indicator
regress price ib(3).rep78
set the third category of rep78 to be the base category

fvset   command to change base
fvset base frequent rep78
set the base to most frequently occurring category for rep78

c.   treat variable as continuous
regress price i.foreign#c.mpg i.foreign
treat mpg as a continuous variable and specify an interaction between foreign and mpg

o.   omit a variable or indicator
regress price io(2).rep78
set rep78 as an indicator; omit the level rep78 == 2

#   specify interactions
regress price mpg c.mpg#c.mpg
create a squared mpg term to be used in regression

##   specify factorial interactions
regress price c.mpg##c.mpg
create all possible interactions with mpg (mpg and mpg squared)

Statistical Tests

tabulate foreign rep78, chi2 exact expected
tabulate foreign and repair record and return chi-squared and Fisher’s exact statistic alongside the expected values

ttest mpg, by(foreign)
estimate t test on equality of means for mpg by foreign

prtest foreign == #
one-sample test of proportions against the hypothesized value #

ksmirnov mpg, by(foreign) exact
Kolmogorov-Smirnov equality-of-distributions test

ranksum mpg, by(foreign)
equality tests on unmatched data (independent samples)

anova systolic drug (after webuse systolic, clear)
analysis of variance and covariance

pwmean mpg, over(rep78) pveffects mcompare(tukey)
estimate pairwise comparisons of means with equal variances; mcompare() includes a multiple comparison adjustment

Declare Data

By declaring data type, you enable Stata to apply data munging and analysis functions specific to certain data types.

TIME SERIES (webuse sunspot, clear)

tsset time, yearly
declare sunspot data to be yearly time series

tsreport
report time series aspects of a dataset

generate lag_spot = L1.spot
create a new variable of annual lags of sun spots

tsline spot
plot time series of sunspots

arima spot, ar(1/2)
estimate an auto-regressive model with 2 lags

PANEL / LONGITUDINAL (webuse nlswork, clear)

xtset id year
declare national longitudinal data to be a panel

xtdescribe
report panel aspects of a dataset

xtsum hours
summarize hours worked, decomposing standard deviation into between and within components

xtline ln_wage if id <= 22, tlabel(#3)
plot panel data as a line plot

xtreg ln_w c.age##c.age ttl_exp, fe vce(robust)
estimate a fixed-effects model with robust standard errors

SURVEY DATA (webuse nhanes2b, clear)

svyset psuid [pweight = finalwgt], strata(stratid)
declare survey design for a dataset

svydescribe
report survey data details

svy: mean age, over(sex)
estimate a population mean for each subpopulation

svy, subpop(rural): mean age
estimate a population mean for rural areas

svy: tabulate sex heartatk
report two-way table with tests of independence

svy: reg zinc c.age##c.age female weight rural
estimate a regression using survey weights

SURVIVAL ANALYSIS (webuse drugtr, clear)

stset studytime, failure(died)
declare survival-time data

stsum
summarize survival-time data

stcox drug age
estimate a Cox proportional hazard model

USEFUL ADD-INS

tscollap   compact time series into means, sums and end-of-period values
carryforward   carry non-missing values forward from one obs. to the next
tsspell   identify spells or runs in time series

TIME SERIES OPERATORS

L.   lag   $x_{t-1}$
L2.   2-period lag   $x_{t-2}$
F.   lead   $x_{t+1}$
F2.   2-period lead   $x_{t+2}$
D.   difference   $x_t - x_{t-1}$
D2.   difference of difference   $(x_t - x_{t-1}) - (x_{t-1} - x_{t-2})$
S.   seasonal difference   $x_t - x_{t-1}$
S2.   lag-2 (seasonal difference)   $x_t - x_{t-2}$
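The time series operators can be checked on a tiny constructed series. In this sketch the series x = 1, 4, 9, ... is made up so that the first differences are easy to verify by eye and the second difference is constant.

```stata
* Sketch: L., F., D., and D2. on a made-up yearly series
clear
set obs 6
generate year = 2000 + _n
generate x = _n^2                 // 1, 4, 9, 16, 25, 36

tsset year, yearly                // operators require declared time series data

generate lag1  = L.x              // x one period earlier
generate lead1 = F.x              // x one period later
generate diff1 = D.x              // x - L.x
generate diff2 = D2.x             // difference of the differences

list year x lag1 lead1 diff1 diff2
```

The first and last rows contain missing values because no earlier or later observation exists for them.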

Examples use auto.dta (sysuse auto, clear) unless otherwise noted.

Summarize Data

summarize price mpg, detail
calculate a variety of univariate summary statistics

univar price mpg, boxplot
calculate univariate summary, with box-and-whiskers plot (ssc install univar)

stem mpg
return stem-and-leaf display of mpg

ci mean mpg price, level(99)
compute standard errors and confidence intervals

correlate mpg price
return correlation or covariance matrix

pwcorr price mpg weight, star(0.05)
return all pairwise correlation coefficients with sig. levels

mean price mpg
estimates of means, including standard errors

proportion rep78 foreign
estimates of proportions, including standard errors for categories identified in varlist

ratio
estimates of ratio, including standard errors

total price
estimates of totals, including standard errors

1 Estimate Models

stores results as e-class

regress price mpg weight, vce(robust)
estimate ordinary least squares (OLS) model of price on mpg and weight, apply robust standard errors

regress price mpg weight if foreign == 0, vce(cluster rep78)
regress price only on domestic cars, cluster standard errors

rreg price mpg weight, genwt(reg_wt)
estimate robust regression to reduce the influence of outliers

probit foreign turn price, vce(robust)
estimate probit regression with robust standard errors

logit foreign headroom mpg, or
estimate logistic regression and report odds ratios

bootstrap, reps(100): regress mpg /*
*/ weight gear foreign
estimate regression with bootstrapping

jackknife r(mean), double: sum mpg
jackknife standard error of sample mean

2 Diagnostics

some are inappropriate with robust SEs

estat hettest
test for heteroskedasticity

estat ovtest
test for omitted variable bias

estat vif
report variance inflation factor

dfbeta length
calculate measure of influence

rvfplot, yline(0)
plot residuals against fitted values

avplots
plot all partial-regression leverage plots in one graph

Type help regress postestimation plots for additional diagnostic plots.

3 Postestimation

commands that use a fitted model

regress price headroom length
used in all postestimation examples

display _b[length]
display _se[length]
return coefficient estimate or standard error for length from most recent regression model

margins, dydx(length)
return the estimated marginal effect for length; returns e-class information when post option is used

margins, eyex(length)
return the estimated elasticity of price with respect to length

predict yhat if e(sample)
create predictions for sample on which model was fit

predict double resid, residuals
calculate residuals based on last fit model

test headroom = 0
test linear hypotheses that headroom estimate equals zero

lincom headroom - length
test linear combination of estimates (headroom = length)

more details at http://www.stata.com/manuals/u25.pdf
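The estimate, inspect, and postestimate steps fit together as one workflow. This sketch assumes a quadratic in mpg plus a foreign indicator purely as an example model; it is not a model the cheat sheet itself fits.

```stata
* Sketch: fit a model with factor-variable notation, then use postestimation
sysuse auto, clear

regress price c.mpg##c.mpg i.foreign      // mpg, mpg squared, and a foreign indicator

display _b[mpg]                           // coefficient on mpg
display _se[mpg]                          // its standard error
display _b[c.mpg#c.mpg]                   // coefficient on the squared term
display _b[1.foreign]                     // coefficient on the foreign indicator

predict double yhat if e(sample)          // fitted values for the estimation sample
predict double resid, residuals           // residuals from the same model

test 1.foreign = 0                        // Wald test on the foreign indicator
margins, dydx(mpg)                        // average marginal effect of mpg
```

Because mpg enters through c.mpg##c.mpg, margins accounts for both the linear and squared terms when computing the marginal effect.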

Results are stored as either r-class or e-class. See Programming Cheat Sheet.

[Figure: tsline plot of the number of sunspots by year]

[Figure: xtline plot of wage relative to inflation by year, one panel per id (id 1 to id 4)]

for Stata 13, compute confidence intervals with: ci mpg price, level(99)

ADDITIONAL MODELS

built-in Stata commands

pca   principal components analysis
factor   factor analysis
poisson, nbreg   count outcomes
tobit   censored data
ivregress   instrumental variables
xtabond, xtdpdsys   dynamic panel estimator
teffects psmatch   propensity score matching

user-written (for example, ssc install ivreg2)

ivreg2   instrumental variables
diff   difference-in-difference
rd   regression discontinuity
synth   synthetic control analysis
oaxaca   Blinder-Oaxaca decomposition