Download Pooled and Panel Data Analysis: Understanding the Differences and Advantages and more Lecture notes Advanced Data Analysis in PDF only on Docsity!
Pooled and Panel Data Analysis
1 Topics Pooled Data Fixed Effects – Binary Variables Fixed Effects – Within Transformation Reference Baltagi, B. Econometric analysis of panel data. Third Edition. John Wiley & Sons. 2005, Chapters 1-4. Wooldridge, J. M. 2001. Econometric analysis of cross section and panel data. Cap. 10.
Panel Data Econometrics
Prof. Alexandre Gori Maia
State University of Campinas
Cross-Section al data i
Y
i = 1 , 2 ,..., n
Y 1
Y 2
n
Y
Time Series
Y t
t = 1 , 2 ,..., T^1
Y
2
Y
T
Y
Pooled Data it
Y
T
i = 1 , 2 ,..., n
11
Y
21
Y
n 11
Y
Panel Data it
Y
t = 1 , 2 ,..., T
12
Y
22
Y
n 22
Y
T
Y
1 T
Y
2 nT T
Y
i = 1 , 2 ,..., n
t = 1 , 2 ,..., T
11
Y
21
Y
n 1
Y
12
Y
22
Y
n 2
Y
T
Y
1 T
Y
2 nT
Y
Different units in a specific period of time The same unit in different periods of time Cross-sectional samples (not necessarily the same) are observed in different periods of time The same cross— sectional sample is observed in different periods of time
Sample Designs
2
Assumes that the relation between Y and X is the same in both periods t =0 and 1. Y X Y^ Constant intercept and slope coefficients X Y X
Y = a+ b X + e
t= t= t= t= t= t= Assume that Y varies in time but the relation between Y and X remains constant. Different intercepts and constant slope coefficients
Y = a + b X + d t + e
Both the intercept and the marginal impact of X on Y change over time. Different intercepts and slope coefficients
Y = a + b X + d t + q( t ´ X )+ e
Regression with Pooled Data 4
Pooled Data - Definition
5
- Pooled data presents some main advantages when comparted to
cross-sectional data: i) larger sample size; ii) allows us to identify
changes in the relation over time;
- If we assume that the relation is the same over time:
- If we assume that the expected value of Y varies over time and the
relation between Y and X remains constant:
- If we assume changes in both the expected value of Y and in the
relation between Y and X over time:
j i k j j
Y = +å X + e
= 1 0 b b j i k j
Y = +å j X + t + e
=
b b d
1 0 j i k j j j k j
Y = +å j X + t +å X ´ t + e
= 1 = 1
b 0 b d q
Example – Python
7
- The equivalent in Python:
Exercise
8
1) The dataset Data_AgricultureClimate.csv contains
information on agricultural production and climate change
in São Paulo, Brazil (GORI MAIA, A., MIYAMOTO, B. C,
GARCIA, J. R. Climate change and agriculture: Do
environmental preservation and ecossystem services
matter? Ecoloogical Economics, v. 152 (October 2018),
a) Develop a regression model for pooled data to analyze the relation between the (log of) production value, (log of) area, temperature and precipitation; b) Consider changes in the relation before and after 2005 (variable periodo );
Controlling for Unobersvables
10 A =2 A =2^ A =4^ A =4 A =6^ A = Y =2000 Y =2200 Y =4000 Y =4000^ Y =6200^ Y = X =2 X =4 X =6 X =8^ X =10^ X =
- Suppose that each farm ( i =1,2,3) is observed in two distinct periods (t=0,1);
- If we assume that the land size A is different between the farms but constant over time, we can control the effect of land size on Y by using binary variables to identify each farm (for example, D 1=1 para i =1, D 2= para i =2, farm 3 is the reference);
- In other words, although land size A is non-observable, we can control its effect on Y by including a component c , in our model, called unobserved heterogeneity. i =1 i =1 i =2 i =2 i =3 i = t =0 t =1 t =0^ t =1^ t =0^ t = A = A = A = X Y D 2= D 1=0; D 2= D 1= D1 =1; D 2= D1 =0; D 2=1 D1 =0; D 2=
Where c is an unobserved component, also called unobserved effect or unobserved heterogeneity. One main assumption in the panel data analysis is that the component c is constant over time. This means: E ( y | x , c )= xβ + c
- Assume that the relation between y and x ≡ ( X 1 , X 2 , ..., X k) is given by:
- When c isn’t correlated to the independent variables – Cov( Xj , c )=0 – then the omission of c in our model will not generate any kind of bias (omitted variable bias). In this case, we could apply OLS using models for pooled data ( pooled regression ). However, if Cov( Xj , c )≠0, the the pooled regression estimates are biased even for large samples. Where E ( eit | xit , ci ) = 0
Unobserved Heterogeneity
11 it it i it
Y = x β + c + e
- One main limitation of the fixed effects estimator with binary variable is that the number of binary variables may be quite large. Most estimates tend to be insignificant if the sample is not large enough to compensate the lost degrees of freedoms.
- Alternatively, through an algebraic transformation, we can estimate the same coefficients using the within estimators.
Within Transformation
13 ( Y (^) it - Yi )=( x (^) it - x i ) β +( ci - ci )+( eit - ei ) Yit^ it eit ~ (^) ~ ~ = x β + Yit = x it β + ci + e it Suppose the model with unobserved heterogeneity: This relation is also valid for the average values of each cross-sectional unit: Yi = x i β + ci + e i Subtracting the equations, we have: Since ci is constant over time, its average is the same than ci. Yij ~ x ij ~ eij ~
Example – Stata & R
14
- Suppose we have a panel with information for the regressand
y and two exogenous variables ( x 1 and x 2) across n cross-
sectional units (variable cs =1.. n ) and T periods (variable
time =1.. T ). The within estimator is given in Stata by:
- The model with controls for the heterogeneity across cross-sectional units ( ci ) is also called one-way model:
Two-Way Fixed Effects Estimator
16 i T T it k it (^) j j j Y X c ct P ct P e it t t
= +å + + + + +
= 2 2 ... 1
a b
Where Pji =1 if j = t , Pji =0 if j ≠ t.
- We can extend this idea, using binary variables to control for the heterogeneity across periods t. The two-way model is: i it k it t j j j Y X c e it
= +å + +
= 1
a b
Example – Stata, R & Python
17
- The two-way estimator in Stata:
- The equivalent in R:
- The equivalent in Python:
Exercise
19
1) The dataset Data_AgricultureClimate.csv contains
information on agricultural production and climate variables
in the state of São Paulo (GORI MAIA, A., MIYAMOTO, B. C,
GARCIA, J. R. Climate change and agriculture: Do
environmental preservation and ecossystem services
matter? Ecoloogical Economics, v. 152 (October 2018),
a) Analyze the relation between the (log) value of agricultural production, (log) area, temperature and precipitation using the one-way fixed-effects estimators; b) Now use two-way fixed-effects estimators, identifying the main differences in relation to (a);