Econometrics Cheat Sheet
By Marcelo Moreno - King Juan Carlos University
The Econometrics Cheat Sheet Project
Basic concepts
Definitions
Econometrics - a social science discipline whose objective is to quantify the relationships between economic agents, test economic theories, and evaluate and implement government and business policies.
Econometric model - a simplified representation of reality used to explain economic phenomena.
Ceteris paribus - all other relevant factors remaining constant.
Data types
Cross section - data taken at a given moment in time, a static snapshot. Order doesn't matter.
Time series - observations of variables across time. Order does matter.
Panel data - consists of a time series for each observation of a cross section.
Pooled cross sections - combines cross sections from different time periods.
Phases of an econometric model
1. Specification.
2. Estimation.
3. Validation.
4. Utilization.
Regression analysis
Studies and predicts the mean value of a variable (the dependent variable, y) given fixed values of other variables (the independent variables, x's). In econometrics it is common to use Ordinary Least Squares (OLS) for regression analysis.
Correlation analysis
Correlation analysis doesn't distinguish between dependent and independent variables.
• Simple correlation measures the degree of linear association between two variables.
$$r = \frac{\mathrm{Cov}(x, y)}{\sigma_x \cdot \sigma_y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \cdot (y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \cdot \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
• Partial correlation measures the degree of linear association between two variables while controlling for a third.
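The simple correlation formula can be checked by hand on a small sample. The sketch below uses only the Python standard library; the function name `pearson_r` and the data are hypothetical, chosen for illustration.

```python
import math

def pearson_r(x, y):
    """Simple (Pearson) correlation between two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of cross-products of deviations from the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator: square root of the product of the sums of squared deviations.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(x, y)
print(round(r, 4))
```

Swapping the arguments gives the same value, which reflects the point that correlation analysis does not distinguish dependent from independent variables.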
Assumptions and properties
Econometric model assumptions
Under these assumptions, the OLS estimator has good properties. Gauss-Markov assumptions:
1. Parameter linearity (and weak dependence in time series). y must be a linear function of the β's.
2. Random sampling. The sample has been randomly drawn from the population. (Only for cross sections)
3. No perfect collinearity.
   • No independent variable is constant: $\mathrm{Var}(x_j) \neq 0, \; \forall j = 1, \ldots, k$.
   • There is no exact linear relation between independent variables.
4. Conditional mean zero and correlation zero.
   a. There are no systematic errors: $E(u \mid x_1, \ldots, x_k) = E(u) = 0$ → strong exogeneity (a implies b).
   b. No relevant variables are left out of the model: $\mathrm{Cov}(x_j, u) = 0, \; \forall j = 1, \ldots, k$ → weak exogeneity.
5. Homoscedasticity. The variability of the errors is the same for all levels of x: $\mathrm{Var}(u \mid x_1, \ldots, x_k) = \sigma_u^2$
6. No autocorrelation. Errors don't contain information about any other errors: $\mathrm{Corr}(u_t, u_s \mid x_1, \ldots, x_k) = 0, \; \forall t \neq s$.
7. Normality. Errors are independent and identically distributed: $u \sim N(0, \sigma_u^2)$
8. Data size. The number of observations must be greater than the (k + 1) parameters to estimate. (Already satisfied in asymptotic settings)
Asymptotic properties of OLS
Under the econometric model assumptions and the Central Limit Theorem (CLT):
• If 1 to 4a hold: OLS is unbiased. $E(\hat{\beta}_j) = \beta_j$
• If 1 to 4 hold: OLS is consistent. $\mathrm{plim}(\hat{\beta}_j) = \beta_j$ (with 4b but without 4a, i.e. weak exogeneity only, OLS is biased but consistent)
• If 1 to 5 hold: asymptotic normality of OLS (then, 7 is necessarily satisfied): $u \overset{a}{\sim} N(0, \sigma_u^2)$
• If 1 to 6 hold: unbiased estimate of $\sigma_u^2$. $E(\hat{\sigma}_u^2) = \sigma_u^2$
• If 1 to 6 hold: OLS is BLUE (Best Linear Unbiased Estimator) or efficient.
• If 1 to 7 hold: hypothesis testing and confidence intervals can be done reliably.
Ordinary Least Squares
Objective - minimize the Sum of Squared Residuals (SSR): $\min \sum_{i=1}^{n} \hat{u}_i^2$, where $\hat{u}_i = y_i - \hat{y}_i$
Simple regression model
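[Figure: regression line with axes x and y, with β0 and β1 marked]
Equation: $y_i = \beta_0 + \beta_1 x_i + u_i$
Estimation: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
where:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
$\hat{\beta}_1 = \frac{\mathrm{Cov}(y, x)}{\mathrm{Var}(x)}$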
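As an illustration of the two estimator formulas, the sketch below computes the slope as Cov(y, x) / Var(x) and the intercept from the sample means. It uses only the standard library; `simple_ols` and the data are hypothetical.

```python
def simple_ols(x, y):
    """Return (b0, b1) for the simple regression y = b0 + b1 * x + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sample covariance and variance (the n - 1 divisors cancel in the ratio).
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = cov_xy / var_x
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = simple_ols(x, y)

# Residuals u_hat_i = y_i - y_hat_i; with an intercept they sum to zero.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(b0, b1, sum(residuals))
```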
Multiple regression model
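[Figure: multiple regression with axes x1, x2 and y, with β0 marked]
Equation: $y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki} + u_i$
Estimation: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_{1i} + \cdots + \hat{\beta}_k x_{ki}$
where:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}_1 - \cdots - \hat{\beta}_k \bar{x}_k$
$\hat{\beta}_j = \frac{\mathrm{Cov}(y, \text{resid } x_j)}{\mathrm{Var}(\text{resid } x_j)}$
Matrix: $\hat{\beta} = (X^{T} X)^{-1} (X^{T} y)$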
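The two expressions for the multiple regression coefficients can be compared numerically. The sketch below assumes that "resid xj" means the residuals from regressing xj on the other regressors (the usual partialling-out reading), solves the normal equations $(X^{T} X) \hat{\beta} = X^{T} y$ by Gaussian elimination instead of forming the inverse explicitly, and uses hypothetical data and helper names.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """OLS coefficients from the normal equations; X is a list of rows
    whose first entry is 1 (the intercept column)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def cov(u, v):
    """Sample covariance; cov(u, u) is the sample variance."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (n - 1)

# Hypothetical data, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [5.1, 5.8, 11.2, 11.9, 17.1, 17.9]

# Matrix form: beta = [b0, b1, b2].
beta = ols([[1.0, a, b] for a, b in zip(x1, x2)], y)

# Partialling out: regress x1 on a constant and x2, keep the residuals,
# then apply Cov(y, resid x1) / Var(resid x1).
g = ols([[1.0, b] for b in x2], x1)
resid_x1 = [a - (g[0] + g[1] * b) for a, b in zip(x1, x2)]
beta1_partial = cov(y, resid_x1) / cov(resid_x1, resid_x1)

print(beta[1], beta1_partial)  # the two values agree up to rounding error
```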
Interpretation of coefficients
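Model       | Dependent | Independent | β1 interpretation
Level-level | y         | x           | $\Delta y = \beta_1 \Delta x$
Level-log   | y         | log(x)      | $\Delta y \approx (\beta_1 / 100)(\%\Delta x)$
Log-level   | log(y)    | x           | $\%\Delta y \approx (100 \beta_1) \Delta x$
Log-log     | log(y)    | log(x)      | $\%\Delta y \approx \beta_1 (\%\Delta x)$
Quadratic   | y         | x + x^2     | $\Delta y = (\beta_1 + 2 \beta_2 x) \Delta x$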
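The "≈" in the logarithmic rows matters when the coefficient is not small. As a sketch for the log-level case, the code below compares the approximation 100·β1·Δx with the exact percentage change implied by log(y) = β0 + β1·x, which is 100·(exp(β1·Δx) − 1). The coefficient value 0.08 is hypothetical.

```python
import math

def pct_change_log_level(beta1, dx):
    """Approximate and exact % change in y for a log-level model."""
    approx = 100 * beta1 * dx                   # cheat-sheet approximation
    exact = 100 * (math.exp(beta1 * dx) - 1)    # implied by log(y) = b0 + b1 * x
    return approx, exact

# Hypothetical coefficient: one more unit of x with beta1 = 0.08.
approx, exact = pct_change_log_level(0.08, 1.0)
print(round(approx, 2), round(exact, 2))
```

For positive β1·Δx the exact change is slightly larger than the approximation, and the gap widens as β1·Δx grows.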
Error measurements
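Sum of Sq. Residuals: $SSR = \sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Explained Sum of Squares: $SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
Total Sum of Sq.: $SST = SSE + SSR = \sum_{i=1}^{n} (y_i - \bar{y})^2$
Standard Error of the Regression: $\hat{\sigma}_u = \sqrt{\frac{SSR}{n - k - 1}}$
Standard Error of the $\hat{\beta}$'s: $\mathrm{se}(\hat{\beta}) = \sqrt{\hat{\sigma}_u^2 \cdot (X^{T} X)^{-1}}$
Mean Squared Error: $MSE = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}$
Absolute Mean Error: $AME = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n}$
Mean Percentage Error: $MPE = \frac{\sum_{i=1}^{n} |\hat{u}_i / y_i|}{n} \cdot 100$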
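A minimal sketch of these measures for a simple regression (k = 1), using only the standard library. The data are hypothetical, and the fitted values use the OLS estimates for this particular sample (intercept 2.2, slope 0.6), which are hard-coded here as an assumption.

```python
import math

# Hypothetical data and its OLS fit y_hat = 2.2 + 0.6 * x (k = 1 regressor).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.2 + 0.6 * xi for xi in x]
n, k = len(y), 1
y_bar = sum(y) / n

ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # sum of squared residuals
sse = sum((fi - y_bar) ** 2 for fi in y_hat)            # explained sum of squares
sst = sum((yi - y_bar) ** 2 for yi in y)                # total sum of squares

sigma_hat = math.sqrt(ssr / (n - k - 1))                # standard error of the regression
mse = ssr / n                                           # mean squared error
ame = sum(abs(yi - fi) for yi, fi in zip(y, y_hat)) / n # absolute mean error
mpe = sum(abs((yi - fi) / yi) for yi, fi in zip(y, y_hat)) / n * 100

print(ssr, sse, sst, sigma_hat, mse, ame, mpe)
```

The identity SST = SSE + SSR holds here because the fitted values come from an OLS regression with an intercept.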
3.3-en - github.com/marcelomijas/econometrics-cheatsheet - CC-BY-4.0 license

R-squared

A measure of the goodness of fit, i.e. how well the regression fits the data: $R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$
• Measures the percentage of variation of y that is linearly explained by the variations of the x's.
• Takes values between 0 (no linear explanation of the variations of y) and 1 (total explanation of the variations of y).
When the number of regressors increases, the R-squared never decreases, whether the new variables are relevant or not. To correct for this, there is an R-squared adjusted by degrees of freedom (or corrected R-squared):

$$\bar{R}^2 = 1 - \frac{n - 1}{n - k - 1} \cdot \frac{SSR}{SST} = 1 - \frac{n - 1}{n - k - 1} \cdot (1 - R^2)$$

For big sample sizes: $\bar{R}^2 \approx R^2$
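A short sketch of both measures, taking SSR, SST, n and k as given. The function names and the numbers are hypothetical.

```python
def r_squared(ssr, sst):
    """R-squared as 1 - SSR / SST."""
    return 1 - ssr / sst

def adjusted_r_squared(ssr, sst, n, k):
    """R-squared adjusted by degrees of freedom (n observations, k regressors)."""
    return 1 - (n - 1) / (n - k - 1) * (ssr / sst)

# Hypothetical values: SSR = 2.4, SST = 6.0, n = 5 observations, k = 1 regressor.
r2 = r_squared(2.4, 6.0)
r2_adj = adjusted_r_squared(2.4, 6.0, n=5, k=1)
print(round(r2, 4), round(r2_adj, 4))
```

With so few observations the adjustment is large; holding R-squared fixed, the two values converge as n grows.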

Hypothesis testing

Definitions

A hypothesis test is a rule designed to determine, from a sample, whether or not there is evidence to reject a hypothesis made about one or more population parameters.
Elements of a hypothesis test:
• Null hypothesis ($H_0$) - the hypothesis to be tested.
• Alternative hypothesis ($H_1$) - the hypothesis that cannot be rejected when the null hypothesis is rejected.
• Test statistic - a random variable whose probability distribution is known under the null hypothesis.
• Critical value - the value against which the test statistic is compared to determine whether the null hypothesis is rejected or not. It marks the frontier between the regions of acceptance and rejection of the null hypothesis.
• Significance level (α) - the probability of rejecting the null hypothesis when it is true (Type I Error). It is chosen by whoever conducts the test. Commonly 0.10, 0.05 or 0.01.
• p-value - the highest level of significance at which the null hypothesis ($H_0$) cannot be rejected.
The rule is: if the p-value is less than α, there is evidence to reject the null hypothesis at that given α (there is evidence to accept the alternative hypothesis).

Individual tests

Tests whether a parameter is significantly different from a given value, ϑ.
• $H_0: \beta_j = \vartheta$
• $H_1: \beta_j \neq \vartheta$
Under $H_0$: $t = \frac{\hat{\beta}_j - \vartheta}{\mathrm{se}(\hat{\beta}_j)} \sim t_{n-k-1, \alpha/2}$
If $|t| > |t_{n-k-1, \alpha/2}|$, there is evidence to reject $H_0$.
Individual significance test - tests whether a parameter is significantly different from zero.
• $H_0: \beta_j = 0$
• $H_1: \beta_j \neq 0$
Under $H_0$: $t = \frac{\hat{\beta}_j}{\mathrm{se}(\hat{\beta}_j)} \sim t_{n-k-1, \alpha/2}$
If $|t| > |t_{n-k-1, \alpha/2}|$, there is evidence to reject $H_0$.
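The decision rule can be sketched as follows. The Python standard library has no Student t distribution, so the critical value is passed in as an argument, assumed to come from a t table (3.182 is the two-sided 5% value for 3 degrees of freedom). The estimate and its standard error are hypothetical.

```python
import math

def t_test(beta_hat, se, critical_value, theta=0.0):
    """Return (t statistic, reject H0?) for H0: beta_j = theta, two-sided."""
    t = (beta_hat - theta) / se
    return t, abs(t) > abs(critical_value)

# Hypothetical estimate from a regression with n = 5, k = 1 (3 degrees of freedom).
beta_hat = 0.6
se = math.sqrt(0.08)
t_crit = 3.182  # assumed table value: t with 3 d.o.f., alpha / 2 = 0.025

t, reject = t_test(beta_hat, se, t_crit)
print(round(t, 4), reject)
```

Here |t| is below the critical value, so there is no evidence to reject the null at the 5% level; setting `theta` to another value gives the general individual test.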

The F test

Simultaneously tests multiple (linear) hypotheses about the parameters. It makes use of a non-restricted model and a restricted model:
• Non-restricted model - the model on which we want to test the hypotheses.
• Restricted model - the model on which the hypotheses that we want to test have been imposed.
Then, looking at the errors, there are:
• $SSR_{UR}$ - the SSR of the non-restricted model.
• $SSR_R$ - the SSR of the restricted model.
Under $H_0$:
$$F = \frac{SSR_R - SSR_{UR}}{SSR_{UR}} \cdot \frac{n - k - 1}{q} \sim F_{q, n-k-1}$$
where k is the number of parameters of the non-restricted model and q is the number of linear hypotheses tested.
If $F_{q, n-k-1} < F$, there is evidence to reject $H_0$.
Global significance test - tests whether all the parameters associated with the x's are simultaneously equal to zero.
• $H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
• $H_1: \beta_1 \neq 0$ and/or $\beta_2 \neq 0$ ... and/or $\beta_k \neq 0$
In this case, we can simplify the formula for the F statistic. Under $H_0$:
$$F = \frac{R^2}{1 - R^2} \cdot \frac{n - k - 1}{k} \sim F_{k, n-k-1}$$
If $F_{k, n-k-1} < F$, there is evidence to reject $H_0$.
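The two F formulas can be checked against each other. In the global significance test the restricted model contains only the intercept, so its SSR equals SST and q = k. The sketch below uses hypothetical sums of squares and function names.

```python
def f_statistic(ssr_r, ssr_ur, n, k, q):
    """General F statistic from restricted and non-restricted SSR."""
    return (ssr_r - ssr_ur) / ssr_ur * (n - k - 1) / q

def f_global(r2, n, k):
    """F statistic of the global significance test, from R-squared."""
    return r2 / (1 - r2) * (n - k - 1) / k

# Hypothetical values: SSR_UR = 2.4, SST = 6.0, n = 5, k = 1.
ssr_ur, sst, n, k = 2.4, 6.0, 5, 1

# Global test: the restricted model has only the intercept, so SSR_R = SST and q = k.
f_general = f_statistic(ssr_r=sst, ssr_ur=ssr_ur, n=n, k=k, q=k)
f_from_r2 = f_global(1 - ssr_ur / sst, n, k)
print(round(f_general, 4), round(f_from_r2, 4))
```

The statistic would then be compared with a critical value from an F table with (q, n − k − 1) degrees of freedom.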

Confidence intervals

The confidence intervals at the (1 − α) confidence level can be calculated as: $\hat{\beta}_j \mp t_{n-k-1, \alpha/2} \cdot \mathrm{se}(\hat{\beta}_j)$
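A minimal sketch of the interval, again taking the critical value as an input assumed to come from a t table (3.182 for 3 degrees of freedom and α = 0.05). The estimate and standard error are hypothetical.

```python
import math

def confidence_interval(beta_hat, se, critical_value):
    """Return (lower, upper) bounds of the interval beta_hat -/+ t * se."""
    margin = abs(critical_value) * se
    return beta_hat - margin, beta_hat + margin

# Hypothetical estimate with 3 degrees of freedom, 95% confidence level.
se = math.sqrt(0.08)
lower, upper = confidence_interval(0.6, se, 3.182)
print(round(lower, 3), round(upper, 3))
```

An interval that contains zero corresponds to not rejecting the individual significance test at the same α.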

Dummy variables

Dummy (or binary) variables are used for qualitative information such as sex, marital status, country, etc.
• They take the value 1 in a given category and 0 in the rest.
• They are used to analyze and model structural changes in the model parameters.
If a qualitative variable has m categories, we only have to include (m − 1) dummy variables.
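The (m − 1) rule can be illustrated by encoding a qualitative variable by hand. In the sketch below the omitted category acts as the baseline; the function name, the category labels, and the choice of baseline are hypothetical.

```python
def dummies(values, baseline):
    """Encode a qualitative variable as (m - 1) dummy columns.

    The baseline category gets no column: its observations are all zeros,
    so the intercept captures it and perfect collinearity is avoided.
    """
    categories = sorted(set(values) - {baseline})
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

# Hypothetical qualitative variable with m = 3 categories.
region = ["north", "south", "east", "south"]
categories, rows = dummies(region, baseline="north")
print(categories)
print(rows)
```

Including a column for every category together with an intercept would make the dummy columns sum to the constant, violating the no perfect collinearity assumption.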

Structural change

Structural change refers to changes in the values of the parameters of the econometric model produced by the effect of different sub-populations. Structural change can be included in the model through dummy variables.
The location of the dummy variables (D) matters:
• On the intercept (additive effect) - represents the mean difference between the values produced by the structural change. $y = \beta_0 + \delta_1 D + \beta_1 x_1 + u$
• On the slope (multiplicative effect) - represents the effect (slope) difference between the values produced by the structural change. $y = \beta_0 + \beta_1 x_1 + \delta_1 D \cdot x_1 + u$
Chow's structural test - used when we want to analyze the existence of structural changes in all the model parameters. It is a particular case of the F test, where the null hypothesis is $H_0$: no structural change (all δ = 0).
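To see what each placement of the dummy implies, the two equations can be evaluated at D = 0 and D = 1. The worked step below assumes the zero conditional mean of u, so that the error term drops out of the conditional expectation.

```latex
% Dummy on the intercept: parallel lines, shifted by \delta_1
E(y \mid x_1, D = 0) = \beta_0 + \beta_1 x_1
\qquad
E(y \mid x_1, D = 1) = (\beta_0 + \delta_1) + \beta_1 x_1

% Dummy on the slope: same intercept, slopes differ by \delta_1
E(y \mid x_1, D = 0) = \beta_0 + \beta_1 x_1
\qquad
E(y \mid x_1, D = 1) = \beta_0 + (\beta_1 + \delta_1) x_1
```

In both cases δ1 = 0 means the two sub-populations share the same regression line, which is the null hypothesis that an F test on the δ's examines.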

Changes of scale

Changes in the measurement units of the variables:
• In the endogenous variable, $y^* = y \cdot \lambda$ - affects all model parameters, $\beta_j^* = \beta_j \cdot \lambda, \; \forall j = 0, 1, \ldots, k$
• In an exogenous variable, $x_j^* = x_j \cdot \lambda$ - only affects the parameter linked to said exogenous variable, $\beta_j^* = \beta_j / \lambda$
• Same scale change on endogenous and exogenous - only affects the intercept, $\beta_0^* = \beta_0 \cdot \lambda$
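These effects can be verified numerically on a simple regression. Note that rescaling a regressor by λ divides its slope by λ, since y = β0 + (β1/λ)(λx) + u. The sketch below uses hypothetical data and a hypothetical helper `simple_ols`, with λ = 10.

```python
def simple_ols(x, y):
    """Return (b0, b1) for the simple regression y = b0 + b1 * x + u."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

# Hypothetical data and scale factor.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
lam = 10.0

b0, b1 = simple_ols(x, y)
x_scaled = [lam * xi for xi in x]
y_scaled = [lam * yi for yi in y]

b0_y, b1_y = simple_ols(x, y_scaled)                # y rescaled: both multiplied by lam
b0_x, b1_x = simple_ols(x_scaled, y)                # x rescaled: slope divided by lam
b0_xy, b1_xy = simple_ols(x_scaled, y_scaled)       # both rescaled: only intercept changes

print((b0, b1), (b0_y, b1_y), (b0_x, b1_x), (b0_xy, b1_xy))
```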

Changes of origin

Changes in the measurement origin of the variables (endogenous or exogenous), e.g. $y^* = y + \lambda$ - only affect the model's intercept, $\beta_0^* = \beta_0 + \lambda$
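The intercept result follows by substitution in the simple regression model. The second line, for a shift in a regressor, is not stated in the text above; it is derived here the same way and should be read as an added worked step.

```latex
% Shift in the endogenous variable: y^* = y + \lambda
y^* = y + \lambda = (\beta_0 + \lambda) + \beta_1 x + u
\;\Rightarrow\; \beta_0^* = \beta_0 + \lambda, \quad \beta_1^* = \beta_1

% Shift in an exogenous variable: x^* = x + \lambda, so x = x^* - \lambda
y = \beta_0 + \beta_1 (x^* - \lambda) + u = (\beta_0 - \beta_1 \lambda) + \beta_1 x^* + u
\;\Rightarrow\; \beta_0^* = \beta_0 - \beta_1 \lambda, \quad \beta_1^* = \beta_1
```

In both cases the slope is unchanged and only the intercept absorbs the shift.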