Pooled and Panel Data Analysis: Understanding the Differences and Advantages, Lecture notes of Advanced Data Analysis

An overview of pooled and panel data analysis, discussing the differences between cross-sectional and time series data, the advantages of panel data, and various types of panel data designs. It also covers regression analysis with pooled data and the concept of unobserved heterogeneity. examples using Stata, R, and Python.

Typology: Lecture notes

2021/2022

Uploaded on 09/27/2022

sctsh3
sctsh3 🇬🇧

4.8

(6)

294 documents

1 / 19

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Pooled&and&Panel&Data&Analysis
1
Topics
Pooled Data
Fixed Effects – Binary Variables
Fixed Effects – Within Transformation
Reference
Baltagi, B. Econometric analysis of panel data. Third Edition. John Wiley
& Sons. 2005, Chapters 1-4.
Wooldridge, J. M. 2001. Econometric analysis of cross section and panel
data. Cap. 10.
Panel Data Econometrics
Prof. Alexandre Gori Maia
State University of Campinas
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13

Partial preview of the text

Download Pooled and Panel Data Analysis: Understanding the Differences and Advantages and more Lecture notes Advanced Data Analysis in PDF only on Docsity!

Pooled and Panel Data Analysis

1 Topics Pooled Data Fixed Effects – Binary Variables Fixed Effects – Within Transformation Reference Baltagi, B. Econometric analysis of panel data. Third Edition. John Wiley & Sons. 2005, Chapters 1-4. Wooldridge, J. M. 2001. Econometric analysis of cross section and panel data. Cap. 10.

Panel Data Econometrics

Prof. Alexandre Gori Maia

State University of Campinas

Cross-Section al data i

Y

i = 1 , 2 ,..., n

Y 1

Y 2

n

Y

Time Series

Y t

t = 1 , 2 ,..., T^1

Y

2

Y

T

Y

Pooled Data it

Y

T

i = 1 , 2 ,..., n

11

Y

21

Y

n 11

Y

Panel Data it

Y

t = 1 , 2 ,..., T

12

Y

22

Y

n 22

Y

T

Y

1 T

Y

2 nT T

Y

i = 1 , 2 ,..., n

t = 1 , 2 ,..., T

11

Y

21

Y

n 1

Y

12

Y

22

Y

n 2

Y

T

Y

1 T

Y

2 nT

Y

Different units in a specific period of time The same unit in different periods of time Cross-sectional samples (not necessarily the same) are observed in different periods of time The same cross— sectional sample is observed in different periods of time

Sample Designs

2

Assumes that the relation between Y and X is the same in both periods t =0 and 1. Y X Y^ Constant intercept and slope coefficients X Y X

Y = a+ b X + e

t= t= t= t= t= t= Assume that Y varies in time but the relation between Y and X remains constant. Different intercepts and constant slope coefficients

Y = a + b X + d t + e

Both the intercept and the marginal impact of X on Y change over time. Different intercepts and slope coefficients

Y = a + b X + d t + q( t ´ X )+ e

Regression with Pooled Data 4

Pooled Data - Definition

5

  • Pooled data presents some main advantages when comparted to
cross-sectional data: i) larger sample size; ii) allows us to identify
changes in the relation over time;
  • If we assume that the relation is the same over time:
  • If we assume that the expected value of Y varies over time and the
relation between Y and X remains constant:
  • If we assume changes in both the expected value of Y and in the
relation between Y and X over time:

j i k j j

Y = +å X + e

= 1 0 b b j i k j

Y = +å j X + t + e

=

b b d

1 0 j i k j j j k j

Y = +å j X + t +å X ´ t + e

= 1 = 1

b 0 b d q

Example – Python

7

  • The equivalent in Python:

Exercise

8

1) The dataset Data_AgricultureClimate.csv contains
information on agricultural production and climate change
in São Paulo, Brazil (GORI MAIA, A., MIYAMOTO, B. C,
GARCIA, J. R. Climate change and agriculture: Do
environmental preservation and ecossystem services
matter? Ecoloogical Economics, v. 152 (October 2018),

a) Develop a regression model for pooled data to analyze the relation between the (log of) production value, (log of) area, temperature and precipitation; b) Consider changes in the relation before and after 2005 (variable periodo );

Controlling for Unobersvables

10 A =2 A =2^ A =4^ A =4 A =6^ A = Y =2000 Y =2200 Y =4000 Y =4000^ Y =6200^ Y = X =2 X =4 X =6 X =8^ X =10^ X =

  • Suppose that each farm ( i =1,2,3) is observed in two distinct periods (t=0,1);
  • If we assume that the land size A is different between the farms but constant over time, we can control the effect of land size on Y by using binary variables to identify each farm (for example, D 1=1 para i =1, D 2= para i =2, farm 3 is the reference);
  • In other words, although land size A is non-observable, we can control its effect on Y by including a component c , in our model, called unobserved heterogeneity. i =1 i =1 i =2 i =2 i =3 i = t =0 t =1 t =0^ t =1^ t =0^ t = A = A = A = X Y D 2= D 1=0; D 2= D 1= D1 =1; D 2= D1 =0; D 2=1 D1 =0; D 2=

Where c is an unobserved component, also called unobserved effect or unobserved heterogeneity. One main assumption in the panel data analysis is that the component c is constant over time. This means: E ( y | x , c )= + c

  • Assume that the relation between y and x ≡ ( X 1 , X 2 , ..., X k) is given by:
  • When c isn’t correlated to the independent variables – Cov( Xj , c )=0 – then the omission of c in our model will not generate any kind of bias (omitted variable bias). In this case, we could apply OLS using models for pooled data ( pooled regression ). However, if Cov( Xj , c )≠0, the the pooled regression estimates are biased even for large samples. Where E ( eit | xit , ci ) = 0

Unobserved Heterogeneity

11 it it i it

Y = x β + c + e

  • One main limitation of the fixed effects estimator with binary variable is that the number of binary variables may be quite large. Most estimates tend to be insignificant if the sample is not large enough to compensate the lost degrees of freedoms.
  • Alternatively, through an algebraic transformation, we can estimate the same coefficients using the within estimators.

Within Transformation

13 ( Y (^) it - Yi )=( x (^) it - x i ) β +( ci - ci )+( eit - ei ) Yit^ it eit ~ (^) ~ ~ = x β + Yit = x it β + ci + e it Suppose the model with unobserved heterogeneity: This relation is also valid for the average values of each cross-sectional unit: Yi = x i β + ci + e i Subtracting the equations, we have: Since ci is constant over time, its average is the same than ci. Yij ~ x ij ~ eij ~

Example – Stata & R

14

  • Suppose we have a panel with information for the regressand
y and two exogenous variables ( x 1 and x 2) across n cross-
sectional units (variable cs =1.. n ) and T periods (variable
time =1.. T ). The within estimator is given in Stata by:
  • The equivalent in R:
  • The model with controls for the heterogeneity across cross-sectional units ( ci ) is also called one-way model:

Two-Way Fixed Effects Estimator

16 i T T it k it (^) j j j Y X c ct P ct P e it t t

= +å + + + + +

= 2 2 ... 1

a b

Where Pji =1 if j = t , Pji =0 if jt.

  • We can extend this idea, using binary variables to control for the heterogeneity across periods t. The two-way model is: i it k it t j j j Y X c e it

= +å + +

= 1

a b

Example – Stata, R & Python

17

  • The two-way estimator in Stata:
  • The equivalent in R:
  • The equivalent in Python:

Exercise

19

1) The dataset Data_AgricultureClimate.csv contains
information on agricultural production and climate variables
in the state of São Paulo (GORI MAIA, A., MIYAMOTO, B. C,
GARCIA, J. R. Climate change and agriculture: Do
environmental preservation and ecossystem services
matter? Ecoloogical Economics, v. 152 (October 2018),

a) Analyze the relation between the (log) value of agricultural production, (log) area, temperature and precipitation using the one-way fixed-effects estimators; b) Now use two-way fixed-effects estimators, identifying the main differences in relation to (a);