



Stata cheat sheets for statistics




frequently used commands are highlighted in yellow
use "yourStataFile.dta" , clear load a dataset from the current directory
import delimited "yourFile.csv" , rowrange( 2:11 ) colrange( 1:8 ) varnames( 2 )    import a .csv file
webuse set "https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data"
webuse "wb_indicators_long"    set web-based directory and load data from the web
import excel "yourSpreadsheet.xlsx" , sheet( "Sheet1" ) cellrange( A2:H11 ) firstrow    import an Excel spreadsheet
sysuse auto, clear load system data (Auto data)
for many examples, we use the auto dataset.
display price[4] display the 4th observation in price; only works on single values
levelsof rep display the unique values for rep
duplicates report    report how many observations (rows) are duplicated; add a varlist to check duplicates on specific variables
describe make price    display variable type, format, and any value/variable labels
ds, has(type string)    list the variables of a given type (here, strings)
lookfor "in."    search for variable types, variable name, or variable label
isid mpg    check if mpg uniquely identifies the data
histogram mpg , frequency    plot a histogram of the distribution of a variable
count    number of rows (observations)
count if price > 5000    can be combined with logic
inspect mpg    show histogram of data, number of missing or zero observations
summarize make price mpg    print summary statistics (mean, stdev, min, max) for variables
codebook make price    overview of variable type, stats, number of missing/unique values
gsort price mpg    sort in order, first by price then miles per gallon (ascending)
gsort -price -mpg    the same sort, descending
list make price if price > 10000 & !missing( price )    list the make and price for observations with price above 10000
clist ...    the same list in compact form
browse or Ctrl + 8    open the data editor
assert price != .    verify truth of claim
Missing values are treated as the largest positive number. To exclude missing values, ask whether the value is less than "."
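The note on missing values can be illustrated with a short sketch. It assumes the auto dataset, where rep78 has some missing values, and it is meant to be run from a do-file (the `//` comments are do-file syntax).

```stata
sysuse auto, clear

* A missing value compares as larger than any number,
* so this count also includes cars with missing rep78
count if rep78 > 3

* Asking whether the value is less than "." excludes missing values
count if rep78 > 3 & rep78 < .

* An equivalent test using the missing() function
count if rep78 > 3 & !missing( rep78 )
```

The first count should be larger than the other two by exactly the number of observations with missing rep78.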
bysort rep78 : tabulate foreign for each value of rep78, apply the command tabulate foreign
collapse (mean) price (max) mpg , by( foreign )    calculate mean price & max mpg by car type (foreign); replaces the data in memory
tabstat price weight mpg , by( foreign ) stat( mean sd n )    create compact table of summary statistics
table foreign , contents( mean price sd price ) f(%9.2fc) row    create a flexible table of summary statistics: contents() chooses the stats displayed, f() formats the numbers, row adds a row for all data
tabulate rep78 , mi gen( repairRecord )    one-way table: number of rows with each value of rep78; mi includes missing values; gen() creates a binary variable for every rep78 value, named with the prefix repairRecord
tabulate rep78 foreign , mi two-way table: cross-tabulate number of observations for each combination of rep78 and foreign
egen meanPrice = mean( price ), by( foreign )    calculate mean price for each group in foreign (see help egen for more options)
pctile mpgQuartile = mpg , nq( 4 )    store the quartile cutoffs of mpg in a new variable
generate mpgSq = mpg^2    create a new variable
gen byte lowPr = price < 4000    create a binary variable based on a condition ( generate byte )
generate totRows = _N    _N is the total number of observations
bysort rep78 : gen repairTot = _N    with bysort, _N is the total number of observations per group
generate id = _n    _n is the running index of observations
bysort rep78 : gen repairIdx = _n    with bysort, _n is the running index of observations within a group
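As a sketch of how _n and _N behave inside a by-group, the following do-file fragment ranks cars by price within each repair record. The new variable names (priceRank, groupSize, priciest) are made up for illustration.

```stata
sysuse auto, clear

* Sort by rep78, then by price within rep78, before indexing
bysort rep78 (price): generate priceRank = _n          // 1 = cheapest car in the group
bysort rep78 (price): generate groupSize = _N          // same value on every row of the group
bysort rep78 (price): generate byte priciest = (_n == _N)   // flags the last row of each group

list rep78 make price priceRank groupSize priciest in 1/10, sepby( rep78 )
```

Putting price in parentheses sorts on it without making it part of the group definition, which is what lets _n act as a within-group rank.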
Stata has 6 data types, and data can also be missing:
byte    true/false
int long float double    numbers
string    words
missing    no data
To convert between numbers & strings:
strings to numbers
destring foreignString , gen( foreignNumeric )    "1" becomes 1
gen foreignNumeric = real( foreignString )    "1" becomes 1
encode foreignString , gen( foreignNumeric )    "foreign" becomes 1
numbers to strings
tostring foreign , gen( foreignString )    1 becomes "1"
gen foreignString = string( foreign )    1 becomes "1"
decode foreign , gen( foreignString )    1 becomes "foreign"
recast double mpg    generic way to convert between types
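A minimal round trip through these conversions might look like the sketch below. It assumes the auto dataset; the generated variable names are hypothetical.

```stata
sysuse auto, clear

* Labeled numbers -> text, and back to labeled numeric codes
decode foreign, generate( foreignString )       // "Domestic" / "Foreign"
encode foreignString, generate( foreignCode )   // codes are assigned alphabetically, starting at 1

* Plain numbers -> text digits, and back to numbers
tostring mpg, generate( mpgString )
destring mpgString, generate( mpgNumber )
assert mpgNumber == mpg

describe foreign foreignString foreignCode mpg mpgString mpgNumber
```

Note that encode does not recover the original 0/1 coding of foreign: it numbers the distinct strings from 1, so a round trip through decode and encode can change the underlying values.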
Arithmetic
+    add (numbers), combine (strings)
Logic
&    and
|    or
! or ~    not
==    tests if something is equal (= assigns a value to a variable)
!= or ~=    not equal
Examples (the illustrated table of make, foreign, and price is not reproduced here):
if foreign != 1 & price >= 10000    both conditions must be true
if foreign != 1 | price >= 10000    at least one condition must be true
All Stata commands have the same format (syntax):
[ by varlist1 : ] command [varlist2] [ =exp ] [ if exp ] [ in range ] [weight] [ using filename ] [ , options ]
by varlist1 :    apply the command across each unique combination of variables in varlist1
command    function: what are you going to do to varlists?
varlist2    column to apply command to
=exp    save output as a new variable
if exp    condition: only apply the function if something is true
in range    apply to specific rows
weight    apply weights
using filename    pull data from a file (if not loaded)
, options    special options for command
Example: bysort rep78 : summarize price if foreign == 0 & price <= 9000, detail
In this example, we want a detailed summary with stats like kurtosis, plus mean and median.
To find out more about any command, like what options it takes, type help command
pwd    print current (working) directory
cd "C:\Program Files (x86)\Stata13"    change working directory
dir    display filenames in working directory
dir *.dta    list all Stata data in working directory
capture log close    close the log on any existing do files
log using "myDoFile.txt" , replace    create a new log file to record your work and results
search mdesc    find the package mdesc to install
ssc install mdesc    install the package mdesc; needs to be done once
packages contain extra commands that expand Stata’s toolkit
underlined parts are shortcuts: use "capture" or "cap"
clear    delete data in memory
describe    describe data
cls    clear the console (where results are displayed)
Keyboard shortcuts:
Ctrl + D    highlight text in .do file, then Ctrl + D executes it in the command line
Ctrl + 8    open the data editor
Ctrl + 9    open a new .do file
PgUp / PgDn    scroll through previous commands
Tab    autocompletes variable name after typing part
Data Processing with Stata 15 Cheat Sheet
For more info see Stata’s reference manual (stata.com)
Tim Essam • Laura Hughes, inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets), updated June 2016, geocenter.github.io/StataTraining
Disclaimer: we are not affiliated with Stata. But we like it.
Save & Export Data
save "myData.dta" , replace    save data in Stata format, replacing the data if a file with same name exists
saveold "myData.dta" , replace version(12)    save as a Stata 12-compatible file
compress    compress data in memory
export delimited "myData.csv" , delimiter(",") replace    export data as a comma-delimited file (.csv)
export excel "myData.xls" , firstrow(variables) replace    export data as an Excel file (.xls) with the variable names as the first row
Manipulate Strings
display trim( " leading / trailing spaces " ) remove extra spaces before and after a string
display regexr( "My string", "My", "Your" ) replace string1 ("My") with string2 ("Your")
display stritrim( " Too much Space" ) replace consecutive spaces with a single space
display strtoname( "1Var name" ) convert string to Stata-compatible variable name
display strlower( "STATA should not be ALL-CAPS" ) change string case; see also strupper , strproper
display strmatch( "123.89", "1??.?9" ) return true (1) or false (0) if string matches pattern
list make if regexm( make, "[0-9]" ) list observations where make matches the regular expression (here, records that contain a number)
list if regexm( make, "(Cad.|Chev.|Datsun)") return all observations where make contains "Cad.", "Chev." or "Datsun"
list if inlist( word(make, 1), "Cad.", "Chev.", "Datsun" )    return all observations where the first word of make is one of the listed words (the given list is compared against the first word in make)
charlist make display the set of unique characters within a string
replace make = subinstr( make, "Cad.", "Cadillac", 1 ) replace first occurrence of "Cad." with Cadillac in the make variable
display length( "This string has 29 characters" ) return the length of the string
display substr( "Stata", 3, 5 )    return up to 5 characters starting at position 3 (here "ata")
display strpos( "Stata", "a" ) return the position in Stata where a is first found
display real( "100" ) convert string to a numeric or missing value
Combine Data
merge 1:1 id using "ind_age.dta"    one-to-one merge of "ind_age.dta" into the loaded dataset and create variable "_merge" to track the origin
load demo data:
webuse ind_age.dta , clear
save ind_age.dta , replace
webuse ind_ag.dta , clear
merge m:1 hid using "hh2.dta"    many-to-one merge of "hh2.dta" into the loaded dataset and create variable "_merge" to track the origin
load demo data:
webuse hh2.dta , clear
save hh2.dta , replace
webuse ind2.dta , clear
merged datasets must contain a common variable (id)
_merge code: 1 (master) row only in ind; 2 (using) row only in hh; 3 (match) row in both
append using "coffeeMaize2.dta" , gen( filenum )    add observations from "coffeeMaize2.dta" to current data and create variable "filenum" to track the origin of each observation
load demo data:
webuse coffeeMaize2.dta , clear
save coffeeMaize2.dta , replace
webuse coffeeMaize.dta , clear
appended datasets should contain the same variables (columns)
reclink    match records from different data sets using probabilistic matching (ssc install reclink)
jarowinkler    create distance measure for similarity between two strings (ssc install jarowinkler)
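To see the _merge codes without downloading anything, here is a small self-contained sketch. The two toy datasets, their values, and the tempfile name are invented for illustration; run it as one do-file so the local macro stays in scope.

```stata
* Using dataset: ids 1 to 3 with a color
clear
input id str5 color
1 "blue"
2 "pink"
3 "blue"
end
tempfile colors
save `colors'

* Master dataset: ids 2 to 4 with an age
clear
input id age
2 34
3 51
4 27
end

* One-to-one merge on the common variable id
merge 1:1 id using `colors'
tabulate _merge
list
```

With these inputs, id 4 exists only in the master data (_merge 1), id 1 only in the using data (_merge 2), and ids 2 and 3 in both (_merge 3).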
Reshape Data
webuse set https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data
webuse "coffeeMaize.dta"    load demo dataset
xpose, clear varname    transpose rows and columns of data, clearing the data and saving old column names as a new variable called "_varname"
reshape long coffee@ maize@ , i( country ) j( year )    convert a wide dataset to long (melt)
coffee@ maize@    reshape variables starting with coffee and maize
i( country )    unique id variable (key)
j( year )    create new variable which captures the info in the column names
reshape wide coffee maize , i( country ) j( year )    convert a long dataset to wide (cast)
coffee maize    create new variables named coffee2011, maize2012...
i( country )    what will be unique id variable (key)
j( year )    create new variables with the year added to the column name
When datasets are tidy, they have a consistent, standard format that is easier to manipulate and analyze. Tidy datasets have each observation in its own row and each variable in its own column.
Wide layout: country, coffee 2011, coffee 2012, maize 2011, maize 2012 (one row each for Malawi, Rwanda, Uganda)
Long layout: country, year, coffee, maize (one row per country and year, 2011 and 2012)
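The wide and long layouts described above can be reproduced with a toy dataset. The country names come from the sheet, but the numbers below are invented placeholders, not the values in coffeeMaize.dta.

```stata
clear
input str6 country coffee2011 coffee2012 maize2011 maize2012
"Malawi" 10 12 30 28
"Rwanda" 20 22 15 17
"Uganda" 40 38 25 27
end

* Wide -> long: one row per country and year
reshape long coffee maize, i( country ) j( year )
list, sepby( country )

* Long -> wide: back to one row per country
reshape wide coffee maize, i( country ) j( year )
list
```

After the first reshape the data should have six rows (three countries times two years) and the columns country, year, coffee, and maize.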
Label Data
label list list all labels within the dataset
label define myLabel 0 "US" 1 "Not US"
label values foreign myLabel    define a label and apply it to the values in foreign
Value labels map string descriptions to numbers. They allow the underlying data to be numeric (making logical tests simpler) while also connecting the values to human-understandable text.
note : data note here place note in dataset
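A brief sketch of why value labels keep logical tests simple, assuming the auto dataset; the label name originLabel is hypothetical.

```stata
sysuse auto, clear

label define originLabel 0 "US" 1 "Not US"
label values foreign originLabel

tabulate foreign              // table shows the label text
tabulate foreign, nolabel     // table shows the underlying 0/1 values
count if foreign == 1         // conditions still use the numbers, not the text
```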
Replace Parts of Data
rename ( rep78 foreign ) ( repairRecord carType ) rename one or multiple variables
recode price ( 0 / 5000 = 5000 )    change all prices from 0 through 5000 to be 5000
recode foreign ( 0 = 2 "US" )( 1 = 1 "Not US" ), gen( foreign2 )    change the values and value labels then store in a new variable, foreign2
mvencode _all , mv( 9999 )    replace missing values with the number 9999 for all variables (useful for exporting data)
mvdecode _all , mv( 9999 )    replace the number 9999 with missing value in all variables (useful for cleaning survey datasets)
replace price = 5000 if price < 5000 replace all values of price that are less than $5,000 with 5000
Select Parts of Data (Subsetting)
drop if mpg < 20    drop observations based on a condition
drop in 1/4    drop rows 1-4
keep in 1/    opposite of drop; keep only rows 1-
keep if inlist( make, "Honda Accord", "Honda Civic", "Subaru" ) keep the specified values of make
keep if inrange(price, 5000, 10000) keep values of price between $5,000 – $10,000 (inclusive)
sample 25 sample 25% of the observations in the dataset (use set seed # command for reproducible sampling)
drop make    remove the 'make' variable
keep make price    opposite of drop; keep only variables 'make' and 'price'
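For the reproducible sampling mentioned above, a minimal sketch: the seed value 12345 is arbitrary, and preserve/restore is used only so the same draw can be repeated on the full dataset.

```stata
sysuse auto, clear

* First draw
preserve
set seed 12345
sample 25
list make in 1/5
restore

* Second draw with the same seed selects the same observations
preserve
set seed 12345
sample 25
list make in 1/5
restore
```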
Data Transformation with Stata 15 Cheat Sheet
Laura Hughes • Tim Essam, follow us @flaneuseks and @StataRGIS
CC BY 4.
Plot elements and the options that control them:
titles    title(...) subtitle(...) xtitle(...) ytitle(...)
axes    xscale(...) yscale(...)
tick marks, grid lines, axis labels    xlabel(...) ylabel(...)
line    xline(...) yline(...)
legend    legend(...) legend(region(...))
annotation    text(...)
xlabel( #10 , tposition( crossing ) )    tick marks: number of tick marks, position (outside | crossing | inside)
jitter( # ) randomly displace the markers
jitterseed( # )
Arguments for the plot objects go in the options portion of these commands.
mcolor( "145 168 208" )  mcolor( none )    marker: specify the fill and stroke of the marker in RGB or with a Stata color
mfcolor( "145 168 208" )  mfcolor( none )    marker: specify the fill of the marker
lcolor( "145 168 208" )  lcolor( none )    specify the stroke color of the line or border
mlcolor( "145 168 208" )    marker
glcolor( "145 168 208" )    grid lines
tlcolor( "145 168 208" )    tick marks
color( "145 168 208" )  color( none )    specify the color of the text
mlabcolor( "145 168 208" )    marker label
labcolor( "145 168 208" )    axis labels
msize( medium )    specify the marker size: ehuge, vhuge, huge, vlarge, large, medlarge, medium, medsmall, small, vsmall, tiny, vtiny
size( medsmall )    specify the size of the text: vlarge, medlarge, medium, medsmall, small, vsmall, tiny, half_tiny, third_tiny, quarter_tiny, minuscule
mlabsize( medsmall )    marker label
labsize( medsmall )    axis labels
lwidth( medthick )    specify the thickness (stroke) of a line: vvvthick, vvthick, vthick, thick, medthick, medium, medthin, thin, vthin, vvthin, vvvthin, none
mlwidth( thin )    marker
glwidth( thin )    grid lines
tlwidth( thin )    tick marks
POSITION
mlabposition( 5 )    marker label: label location relative to marker (clock position: 0 to 12)
msymbol( Dh )    specify the marker symbol: o, oh, Oh, d, dh, Dh, t, th, Th, s, sh, Sh, p, i, none
format(%12.2f )    axis labels: change the format of the axis labels
nolabels    axis labels: no axis labels
mlabel( foreign )    marker label: label the points with the values of the foreign variable
off    legend: turn off legend
label( # "label" )    legend: change legend label text
lpattern( dash )    line, axes: specify the line pattern: solid, dash, dot, dash_dot, shortdash, shortdash_dot, longdash, longdash_dot, blank
glpattern( dash )    grid lines
tlength( 2 )    tick marks
noticks    tick marks
nogmin  nogmax  nogrid    grid lines
noline    axes
off    axes: no axis/labels
jitterseed( # )    set seed
mcolor( "145 168 208 %20" )    adjust transparency by adding %#
Example of an argument inside an option: scatter price mpg , xline( 20, lwidth( vthick ) )
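Several of the appearance options above can be combined in a single command. The sketch below assumes the auto dataset and a do-file (the `///` line continuation is do-file syntax); the particular option values and the title text are arbitrary choices for illustration.

```stata
sysuse auto, clear

twoway scatter price mpg ,                                  ///
    mcolor( "145 168 208" ) msymbol( Dh ) msize( medium )   /// marker color, symbol, size
    mlabel( foreign ) mlabposition( 5 ) mlabsize( vsmall )  /// marker labels at 5 o'clock
    xline( 20 , lpattern( dash ) lwidth( medthick ) )       /// reference line at mpg = 20
    xlabel( #10 ) ylabel( , labsize( medsmall ) )           /// tick count and axis label size
    title( "Price versus mileage" ) legend( off )
```

Arguments that style a sub-element, such as lpattern() for the reference line, go inside the option that creates that element.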
Schemes are sets of graphical parameters, so you don’t have to specify the look of the graphs every time.
Create custom themes by saving options in a .scheme file
adopath ++ "~//StataThemes"    set path of the folder (StataThemes) where custom .scheme files are saved
net inst brewscheme, from("https://wbuchanan.github.io/brewscheme/") replace    install William Buchanan’s package to generate custom schemes and color palettes (including ColorBrewer)
twoway scatter mpg price , scheme( customTheme )    use the custom theme for one graph
set scheme customTheme , permanently    change the theme; permanently sets it as the default scheme
help scheme entries    see all options for setting scheme properties
Using the Graph Editor: select the Graph Editor, click Record, double click on symbols and areas on plot, or regions on sidebar to customize, unclick Record, and save the theme as a .grec file.
twoway scatter mpg price , play( graphEditorTheme )    apply the recorded theme
Plots contain many features: title, subtitle, x-axis and y-axis titles, axes, axis labels, tick marks, grid lines, y-line, line, markers, marker labels, legend, and annotation.
scatter price mpg , graphregion( fcolor( "192 192 192" ) ifcolor( "208 208 208" ) )    specify the fill of the background in RGB or with a Stata color: fcolor() fills the (outer) graph region, ifcolor() the inner graph region
scatter price mpg , plotregion( fcolor( "224 224 224" ) ifcolor( "240 240 240" ) )    specify the fill of the plot background in RGB or with a Stata color: fcolor() fills the (outer) plot region, ifcolor() the inner plot region
graph twoway scatter y x, saving( "myPlot.gph", replace )    save the graph when drawing
graph save "myPlot.gph", replace    save current graph to disk
graph export "myPlot.pdf", as( pdf )    export the current graph as an image file; see options to set size and resolution
graph combine plot1.gph plot2.gph...    combine 2+ saved graphs into a single plot
Data Analysis with Stata 15 Cheat Sheet
CATEGORICAL VARIABLES identify a group to which an observation belongs
INDICATOR VARIABLES denote whether something is true or false (T/F)
CONTINUOUS VARIABLES measure something
OPERATOR    DESCRIPTION    EXAMPLE
i.    specify indicators    regress price i.rep78    specify rep78 variable to be an indicator variable
ib.    specify base indicator    regress price ib( 3 ).rep78    set the third category of rep78 to be the base category
fvset    command to change base    fvset base frequent rep78    set the base to most frequently occurring category for rep78
c.    treat variable as continuous    regress price i.foreign#c.mpg i.foreign    treat mpg as a continuous variable and specify an interaction between foreign and mpg
o.    omit a variable or indicator    regress price io( 2 ).rep78    set rep78 as an indicator; omit observations with rep78 == 2
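A short sketch of the factor-variable operators in use, assuming the auto dataset. It uses the ib#. form of the base operator (ib3.rep78) and the ## shorthand, which expands to both main effects plus the interaction.

```stata
sysuse auto, clear

regress price i.rep78             // lowest level of rep78 is the base category
regress price ib3.rep78           // use rep78 == 3 as the base category
regress price i.foreign##c.mpg    // foreign, mpg, and their interaction

* Slope of mpg separately for domestic and foreign cars
margins foreign, dydx( mpg )
```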
tsset time , yearly    declare sunspot data to be yearly time series
tsreport    report time series aspects of a dataset
generate lag_spot = L1.spot    create a new variable of annual lags of sun spots
tsline spot    plot time series of sunspots
arima spot , ar( 1/2 )    estimate an auto-regressive model with 2 lags
USEFUL ADD-INS
tscollap    compact time series into means, sums and end-of-period values
carryforward    carry non-missing values forward from one obs. to the next
tsspell    identify spells or runs in time series

xtset id year    declare national longitudinal data to be a panel
xtdescribe    report panel aspects of a dataset
xtsum hours    summarize hours worked, decomposing standard deviation into between and within components
xtline ln_wage if id <= 22 , tlabel( #3 )    plot panel data as a line plot
xtreg ln_w c.age##c.age ttl_exp, fe vce( robust )    estimate a fixed-effects model with robust standard errors

svyset psuid [ pweight = finalwgt ], strata( stratid )    declare survey design for a dataset
svydescribe    report survey data details
svy: mean age, over( sex )    estimate a population mean for each subpopulation
svy, subpop( rural ): mean age    estimate a population mean for rural areas
svy: tabulate sex heartatk    report two-way table with tests of independence
svy: reg zinc c.age##c.age female weight rural    estimate a regression using survey weights

stset studytime, failure( died )    declare survival-time data
stsum    summarize survival-time data
stcox drug age    estimate a Cox proportional hazard model
pwmean mpg , over( rep78 ) pveffects mcompare( tukey )    estimate pairwise comparisons of means with equal variances; include multiple comparison adjustment
webuse systolic, clear
anova systolic drug    analysis of variance and covariance
ttest mpg , by( foreign )    estimate t test on equality of means for mpg by foreign
tabulate foreign rep78 , chi2 exact expected    tabulate foreign and repair record and return chi2 and Fisher’s exact statistic alongside the expected values
prtest foreign == 0.    one-sample test of proportions
ksmirnov mpg , by( foreign ) exact    Kolmogorov-Smirnov equality-of-distributions test
ranksum mpg , by( foreign )    equality tests on unmatched data (independent samples)
By declaring data type, you enable Stata to apply data munging and analysis functions specific to certain data types
TIME SERIES OPERATORS
L.    lag    x_{t-1}
L2.    2-period lag    x_{t-2}
F.    lead    x_{t+1}
F2.    2-period lead    x_{t+2}
D.    difference    x_t - x_{t-1}
D2.    difference of difference    (x_t - x_{t-1}) - (x_{t-1} - x_{t-2})
S.    seasonal difference    x_t - x_{t-1}
S2.    lag-2 (seasonal difference)    x_t - x_{t-2}
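The operators can be checked on a tiny made-up series. The sketch below builds six yearly observations with x equal to the square of the observation number, so the second difference is constant; all variable names are hypothetical.

```stata
clear
set obs 6
generate year = 2000 + _n
generate x = _n^2                 // 1, 4, 9, 16, 25, 36
tsset year, yearly

generate lag1   = L.x             // x_{t-1}
generate lead1  = F.x             // x_{t+1}
generate diff1  = D.x             // x_t - x_{t-1}
generate diff2  = D2.x            // difference of the difference
generate sdiff2 = S2.x            // x_t - x_{t-2}

* For a quadratic series the second difference is constant
assert diff2 == 2 if !missing( diff2 )
list, noobs
```

Rows where the required lag or lead does not exist are set to missing, which is why the assert skips missing values.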
logit foreign headroom mpg, or estimate logistic regression and report odds ratios
regress price mpg weight, vce( robust )    estimate ordinary least squares (OLS) model of price on mpg and weight, apply robust standard errors
probit foreign turn price, vce( robust )    estimate probit regression with robust standard errors
rreg price mpg weight, genwt( reg_wt )    estimate robust regression to reduce the influence of outliers
regress price mpg weight if foreign == 0, vce( cluster rep78 )    regress price only on domestic cars, cluster standard errors
bootstrap, reps( 100 ): regress mpg weight gear foreign    estimate regression with bootstrapping
jackknife r( mean ), double: sum mpg    jackknife standard error of sample mean
Examples use auto.dta (sysuse auto, clear)
regress price headroom length    used in all postestimation examples
display _b[length]
display _se[length]    return coefficient estimate or standard error for length from most recent regression model
margins, dydx( length )    return the estimated marginal effect for length
margins, eyex( length )    return the estimated elasticity of price with respect to length
predict yhat if e( sample )    create predictions for sample on which model was fit
predict double resid, residuals    calculate residuals based on last fit model
test headroom = 0    test linear hypotheses that headroom estimate equals zero
lincom headroom - length    test linear combination of estimates (headroom = length)
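Put together, a postestimation session after the model above might run as follows. This is a sketch on the auto dataset; yhat and resid are just illustrative variable names.

```stata
sysuse auto, clear
regress price headroom length

* Stored coefficient and standard error, and the t statistic built from them
display _b[length]
display _se[length]
display _b[length] / _se[length]

* Hypothesis tests on the fitted coefficients
test headroom = 0
lincom headroom - length

* Predictions and residuals for the estimation sample
predict double yhat if e( sample )
predict double resid if e( sample ), residuals
summarize resid
```

The ratio displayed in the third line should match the t value shown for length in the regression table, and the residuals should average to approximately zero.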
more details at http://www.stata.com/manuals/u25.pdf
pwcorr price mpg weight , star( 0.05 ) return all pairwise correlation coefficients with sig. levels
correlate mpg price return correlation or covariance matrix
mean price mpg    estimates of means, including standard errors
proportion rep78 foreign    estimates of proportions, including standard errors for categories identified in varlist
ratio    estimates of ratio, including standard errors
total price    estimates of totals, including standard errors
ci mean mpg price , level( 99 )    compute standard errors and confidence intervals
stem mpg    return stem-and-leaf display of mpg
summarize price mpg , detail    calculate a variety of univariate summary statistics
univar price mpg , boxplot    calculate univariate summary, with box-and-whiskers plot (ssc install univar)
returns e-class information when post option is used
Type help regress postestimation plots for additional diagnostic plots
estat hettest test for heteroskedasticity
vif report variance inflation factor
ovtest test for omitted variable bias
dfbeta( length )    calculate measure of influence
rvfplot, yline( 0 )    plot residuals against fitted values
avplots    plot all partial-regression leverage plots in one graph
[Figures: rvfplot (residuals against fitted values); avplots panels of price against mpg, rep, headroom, and weight]
Postestimation commands use a fitted model.
Results are stored as either r-class or e-class. See Programming Cheat Sheet
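The difference between r-class and e-class results can be seen by listing what a command leaves behind. A minimal sketch on the auto dataset; the local macro name avgPrice is hypothetical.

```stata
sysuse auto, clear

* General commands such as summarize store results in r()
summarize price
return list
display r(mean)
local avgPrice = r(mean)          // copy it before another command overwrites r()

* Estimation commands such as regress store results in e()
regress price mpg weight
ereturn list
display e(r2)
matrix list e(b)                  // coefficient vector
display "Mean price was `avgPrice'"
```

Stored r() results are replaced by the next r-class command, so values needed later are usually copied into a local macro or scalar first.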
[Figures: tsline plot (number of sunspots by year); xtline plot (wage relative to inflation by year, for id 1 to id 4)]
ADDITIONAL MODELS
pca    principal components analysis
factor    factor analysis
poisson • nbreg    count outcomes
tobit    censored data
ivregress  ivreg2    instrumental variables
diff    difference-in-difference
rd    regression discontinuity
xtabond  xtdpdsys    dynamic panel estimator
teffects psmatch    propensity score matching
synth    synthetic control analysis
oaxaca    Blinder-Oaxaca decomposition
Commands are either built-in Stata commands or user-written (ssc install ivreg)
for Stata 13: ci mpg price , level( 99 )