Frequently used commands are highlighted in yellow.

use "yourStataFile.dta", clear

load a dataset from the current directory

import delimited "yourFile.csv", /*

*/ rowrange(2:11) colrange(1:8) varnames(2)

import a .csv file

webuse set "https://github.com/GeoCenter/StataTraining/raw/master/Day2/Data"

webuse "wb_indicators_long"

set web-based directory and load data from the web

import excel "yourSpreadsheet.xlsx", /*

*/ sheet("Sheet1") cellrange(A2:H11) firstrow

import an Excel spreadsheet

Import Data

sysuse auto, clear

load system data (Auto data); many examples here use the auto dataset

display price[4]

display the 4th observation in price; only works on single values

levelsof rep78

display the unique values for rep78

Explore Data

duplicates report

report observations that are duplicated across all variables

describe make price

display variable type, format,

and any value/variable labels

ds, has(type string)

lookfor "in."

search for variable types,

variable name, or variable label

isid mpg

check if mpg uniquely identifies the observations

histogram: plot a histogram of the distribution of a variable

count if price > 5000

count

number of rows (observations)

Can be combined with logic

VIEW DATA ORGANIZATION

inspect mpg

show histogram of data,

number of missing or zero

observations

summarize make price mpg

print summary statistics

(mean, stdev, min, max)

for variables

codebook make price

overview of variable type, stats,

number of missing/unique values

SEE DATA DISTRIBUTION

BROWSE OBSERVATIONS WITHIN THE DATA

gsort price mpg (ascending)

gsort -price -mpg (descending)

sort in order, first by price then miles per gallon

list make price if price > 10000 & !missing(price)

clist ... (compact form)

list the make and price for observations with price > $10,000

browse or Ctrl + 8

open the data editor

Missing values are treated as the largest positive number. To exclude missing values, ask whether the value is less than "."
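A minimal sketch of the missing-value rule above, assuming the auto dataset that ships with Stata (its rep78 variable has some missing values). The local macro names are illustrative only.

```stata
sysuse auto, clear

* missing (.) sorts above every number, so this count includes missing rep78
count if rep78 > 3
local withMissing = r(N)

* also require rep78 < . to keep only nonmissing values
count if rep78 > 3 & rep78 < .
local nonMissing = r(N)

* the gap between the two counts is the number of missing values
count if missing(rep78)
local nMissing = r(N)
display `withMissing' - `nonMissing'
```

Writing !missing(rep78) in place of rep78 < . gives the same result and reads more explicitly.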

histogram mpg, frequency

assert price!=.

verify truth of claim

Summarize Data

bysort rep78: tabulate foreign

for each value of rep78, apply the command tabulate foreign

collapse (mean) price (max) mpg, by(foreign)

calculate mean price & max mpg by car type (foreign)

replaces data
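Because collapse replaces the data in memory, one common pattern is to wrap it in preserve and restore. This is a sketch of that pattern, assuming the auto dataset; it is not the only way to keep the original data.

```stata
sysuse auto, clear

* keep a copy of the full dataset
preserve

* data in memory is now one row per value of foreign
collapse (mean) price (max) mpg, by(foreign)
list

* bring back the original observations
restore
count
```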

tabstat price weight mpg, by(foreign) stat(mean sd n)

create compact table of summary statistics

table foreign, contents(mean price sd price) f(%9.2fc) row

create a flexible table of summary statistics

displays stats

for all data

formats numbers

tabulate rep78, mi gen(repairRecord)

one-way table : number of rows with each value of rep78

create binary variable for every rep78

value in a new variable, repairRecord

include missing values

tabulate rep78 foreign, mi

two-way table: cross-tabulate number of observations

for each combination of rep78 and foreign

Create New Variables

see help egen

for more options

egen meanPrice = mean(price), by(foreign)

calculate mean price for each group in foreign

pctile mpgQuartile = mpg, nq(4)

create a variable holding the quartile cut points of mpg (use xtile to assign each observation to a quartile)
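The difference between quartile cut points and quartile groups can be sketched as follows, assuming the auto dataset. The variable names mpgCut and mpgGroup are illustrative only.

```stata
sysuse auto, clear

* pctile stores the three cut points in the first three observations
pctile mpgCut = mpg, nq(4)
list mpgCut in 1/3

* xtile assigns each observation to a quartile group (1 to 4)
xtile mpgGroup = mpg, nq(4)
tabulate mpgGroup
```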

generate totRows = _N

bysort rep78: gen repairTot = _N

_N is the total number of observations (per group when used with bysort)

generate id = _n

bysort rep78: gen repairIdx = _n

_n is the running index of observations (within a group when used with bysort)
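A short sketch of _n and _N within groups, assuming the auto dataset. Sorting by price inside each group is an added assumption so that the running index has a defined order.

```stata
sysuse auto, clear

* sort within rep78 by price, then number the observations in each group
bysort rep78 (price): gen repairIdx = _n
by rep78: gen repairTot = _N

* the last observation in each group has repairIdx equal to repairTot
list rep78 price repairIdx repairTot if repairIdx == repairTot
```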

generate mpgSq = mpg^2

gen byte lowPr = price < 4000

create a new variable. Also useful for creating binary variables based on a condition (generate byte)

Change Data Types

string → number:

encode foreignString, gen(foreignNumeric)     "foreign" → 1

destring foreignString, gen(foreignNumeric)     "1" → 1

gen foreignNumeric = real(foreignString)     "1" → 1

Stata has 6 data types, and data can also be missing:

byte: true/false

int, long, float, double: numbers

string: words

missing: no data

To convert between numbers & strings:

number → string:

decode foreign, gen(foreignString)     1 → "foreign"

tostring foreign, gen(foreignString)     1 → "1"

gen foreignString = string(foreign)     1 → "1"

recast double mpg

generic way to convert between types
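The conversions above can be sketched as a round trip, assuming the auto dataset, where foreign is a labeled 0/1 variable. All new variable names are illustrative only.

```stata
sysuse auto, clear

* labeled numeric -> string holding the label text
decode foreign, gen(foreignLabel)

* numeric -> string holding the digits ("0" or "1")
tostring foreign, gen(foreignDigits)

* string of digits -> numeric
destring foreignDigits, gen(foreignBack)

* string of labels -> new labeled numeric, coded alphabetically starting at 1
encode foreignLabel, gen(foreignCoded)

list foreign foreignLabel foreignDigits foreignBack foreignCoded in 1/3
```

Note that encode assigns its own codes, so foreignCoded runs 1 to 2 rather than 0 to 1.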

if foreign != 1 & price >= 10000

make            foreign   price
Chevy Colt      0         3,984
Buick Riviera   0         10,372
Honda Civic     1         4,499
Volvo 260       1         11,995

(both conditions must hold: only Buick Riviera is selected)

Arithmetic

+ add (numbers), combine (strings)

- subtract

* multiply

/ divide

^ raise to a power

Logic

& and

! or ~ not

| or

Basic Data Operations

if foreign != 1 | price >= 10000

make            foreign   price
Chevy Colt      0         3,984
Buick Riviera   0         10,372
Honda Civic     1         4,499
Volvo 260       1         11,995

(either condition may hold: Chevy Colt, Buick Riviera, and Volvo 260 are selected)
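A minimal sketch comparing the two conditions above on the full auto dataset. The local macro names are illustrative only.

```stata
sysuse auto, clear

* both conditions must hold: domestic cars costing at least $10,000
count if foreign != 1 & price >= 10000
local nAnd = r(N)

* either condition may hold: any domestic car, or any car costing at least $10,000
count if foreign != 1 | price >= 10000
local nOr = r(N)

display "and: `nAnd'   or: `nOr'"
```

The "and" count can never exceed the "or" count, since every observation that meets both conditions also meets at least one.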

== equal (== tests if something is equal; = assigns a value to a variable)

!= or ~= not equal

> greater than

>= greater than or equal to

< less than

<= less than or equal to

Basic Syntax

All Stata commands have the same format (syntax):

bysort rep78 : summarize price if foreign == 0 & price <= 9000, detail

[byvarlist1:] command [varlist2] [=exp] [ifexp] [inrange] [weight] [usingfilename] [,options]

function: what are

you going to do

to varlists?

condition: only

apply the function

if something is true

apply to

specific rows apply

weights

save output as

a new variable pull data from a file

(if not loaded) special options

for command

apply the

command across

each unique

combination of

variables in

varlist1

column to

apply

command to

In this example, we want a detailed summary

with stats like kurtosis, plus mean and median

To find out more about any command, such as what options it takes, type help command

pwd

print current (working) directory

cd "C:\Program Files (x86)\Stata13"

change working directory

dir

display filenames in working directory

dir *.dta

list all Stata data files in working directory

capture log close

close the log on any existing do files

log using "myDoFile.txt", replace

create a new log file to record your work and results
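The two logging commands above are often placed at the top of a do-file, with a matching log close at the end. This is a sketch under that assumption; the text option is an addition here so that the .txt file is written as plain text rather than SMCL.

```stata
* close any log left open by an earlier run; capture suppresses the error if none is open
capture log close

* start a plain-text log, overwriting the file if it already exists
log using "myDoFile.txt", text replace

sysuse auto, clear
summarize price mpg

* stop recording
log close
```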

Set up

search mdesc

find the package mdesc to install

ssc install mdesc

install the package mdesc; needs to be done once

packages contain

extra commands that

expand Stata’s toolkit

underlined parts are shortcuts: use "capture" or "cap"

Ctrl + D

highlight text in a .do file, then Ctrl + D executes it in the command line

clear

delete data in memory

Useful Shortcuts

Ctrl + 8

open the data editor

describe data

cls clear the console (where results are displayed)

PgUp PgDn scroll through previous commands

Tab autocompletes variable name after typing part

AT COMMAND PROMPT

Ctrl + 9

open a new .do file

keyboard buttons

Data Processing Cheat Sheet with Stata 15

For more info see Stata’s reference manual (stata.com)

Tim Essam • Laura Hughes

inspired by RStudio’s awesome Cheat Sheets (rstudio.com/resources/cheatsheets) updated June 2016

CC BY 4.0

geocenter.github.io/StataTraining

Disclaimer: we are not affiliated with Stata. But we like it.




specify interactions

regress price mpg c.mpg#c.mpg

create a squared mpg term to be used in regression

specify factorial interactions

regress price c.mpg##c.mpg

create all possible interactions with mpg (mpg and mpg^2)
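A sketch showing that the factorial notation above fits the same model as a hand-made squared term, assuming the auto dataset. The names bSq and mpgSq are illustrative only.

```stata
sysuse auto, clear

* factor-variable notation: mpg plus its square, no new variable needed
regress price c.mpg##c.mpg
scalar bSq = _b[c.mpg#c.mpg]

* the same model with a squared term created by hand
generate mpgSq = mpg^2
regress price mpg mpgSq

display "factor notation: " bSq "   manual: " _b[mpgSq]
```

One practical reason to prefer the factor-variable form is that postestimation commands such as margins then know that the squared term depends on mpg.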

Declare Data

TIME SERIES

webuse sunspot, clear

PANEL / LONGITUDINAL

webuse nlswork, clear

SURVEY DATA

webuse nhanes2b, clear

SURVIVAL ANALYSIS

webuse drugtr, clear
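After loading each example dataset, the data structure is typically declared with the matching set command. The sketch below assumes the variable names found in these Stata example datasets (time, idcode, year, psuid, finalwgt, stratid, studytime, died) and needs an internet connection for webuse.

```stata
* time series: one observation per time period
webuse sunspot, clear
tsset time

* panel: repeated observations (year) on each person (idcode)
webuse nlswork, clear
xtset idcode year

* survey: sampling units, weights, and strata
webuse nhanes2b, clear
svyset psuid [pweight = finalwgt], strata(stratid)

* survival: time to event and failure indicator
webuse drugtr, clear
stset studytime, failure(died)
```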


1 Estimate Models

2 Diagnostics (some are inappropriate with robust SEs)

3 Postestimation