Pooled and Panel Data Analysis: Understanding the Differences and Advantages, Lecture notes of Advanced Data Analysis

University of Portsmouth Advanced Data Analysis

An overview of pooled and panel data analysis, discussing the differences between cross-sectional and time series data, the advantages of panel data, and various types of panel data designs. It also covers regression analysis with pooled data and the concept of unobserved heterogeneity, with examples using Stata, R, and Python.

Typology: Lecture notes

2021/2022

Uploaded on 09/27/2022

sctsh3 🇬🇧


Pooled and Panel Data Analysis

Topics

Pooled Data

Fixed Effects – Binary Variables

Fixed Effects – Within Transformation

Reference

Baltagi, B. Econometric analysis of panel data. Third Edition. John Wiley & Sons. 2005, Chapters 1-4.

Wooldridge, J. M. 2001. Econometric analysis of cross section and panel data. Chapter 10.

Panel Data Econometrics

Prof. Alexandre Gori Maia

State University of Campinas


Sample Designs

Cross-sectional data ($Y_i$, i = 1, 2, ..., n): different units in a specific period of time.

Time series ($Y_t$, t = 1, 2, ..., T): the same unit in different periods of time.

Pooled data ($Y_{it}$, i = 1, 2, ..., n; t = 1, 2, ..., T): cross-sectional samples (not necessarily the same) are observed in different periods of time.

Panel data ($Y_{it}$, i = 1, 2, ..., n; t = 1, 2, ..., T): the same cross-sectional sample is observed in different periods of time.

Regression with Pooled Data

Constant intercept and slope coefficients. Assumes that the relation between Y and X is the same in both periods, t = 0 and t = 1:

$$Y = \alpha + \beta X + e$$

Different intercepts and constant slope coefficients. Assumes that Y varies over time but the relation between Y and X remains constant:

$$Y = \alpha + \beta X + \delta t + e$$

Different intercepts and slope coefficients. Both the intercept and the marginal impact of X on Y change over time:

$$Y = \alpha + \beta X + \delta t + \theta (t \times X) + e$$

Pooled Data - Definition

Pooled data presents some main advantages when compared to cross-sectional data: i) larger sample size; ii) it allows us to identify changes in the relation over time.

If we assume that the relation is the same over time:

$$Y_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} + e_i$$

If we assume that the expected value of Y varies over time and the relation between Y and X remains constant:

$$Y_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} + \delta t + e_i$$

If we assume changes in both the expected value of Y and in the relation between Y and X over time:

$$Y_i = \beta_0 + \sum_{j=1}^{k} \beta_j X_{ji} + \delta t + \sum_{j=1}^{k} \theta_j (X_{ji} \times t) + e_i$$

Example – Python

The equivalent in Python:
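A minimal sketch of the three pooled specifications, assuming a small noise-free sample made up for illustration. The arrays `x`, `t`, `y` and the helper `ols` are hypothetical, and plain NumPy least squares stands in for a dedicated regression package:

```python
import numpy as np


def ols(columns, y):
    """Return OLS coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Hypothetical pooled sample: four units observed in each of two periods.
x = np.array([1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0])
t = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Assumed noise-free outcome: both intercept and slope shift in period 1.
y = 1.0 + 2.0 * x + 3.0 * t + 0.5 * t * x

same_relation = ols([x], y)            # Y = a + bX + e
shifted_intercept = ols([x, t], y)     # Y = a + bX + dt + e
shifted_both = ols([x, t, t * x], y)   # Y = a + bX + dt + q(t x X) + e

print(same_relation)
print(shifted_intercept)
print(shifted_both)
```

Because `y` was built with a period-specific slope, only the last specification recovers the values used to construct it (1, 2, 3 and 0.5); the two restricted models blend the slopes of the two periods.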

Exercise

1) The dataset Data_AgricultureClimate.csv contains information on agricultural production and climate change in São Paulo, Brazil (GORI MAIA, A., MIYAMOTO, B. C., GARCIA, J. R. Climate change and agriculture: Do environmental preservation and ecosystem services matter? Ecological Economics, v. 152 (October 2018),

a) Develop a regression model for pooled data to analyze the relation between the (log of) production value, (log of) area, temperature and precipitation;

b) Consider changes in the relation before and after 2005 (variable periodo);

Controlling for Unobservables

Suppose that each farm (i = 1, 2, 3) is observed in two distinct periods (t = 0, 1).

Data shown in the figure (land size A, outcome Y, regressor X, binary variables D1 and D2):

i  t  A    Y     X    D1  D2
1  0  2    2000  2    1   0
1  1  2    2200  4    1   0
2  0  4    4000  6    0   1
2  1  4    4000  8    0   1
3  0  6    6200  10   0   0
3  1  ...  ...   ...  0   0

If we assume that the land size A is different between the farms but constant over time, we can control the effect of land size on Y by using binary variables to identify each farm (for example, D1 = 1 for i = 1, D2 = 1 for i = 2, and farm 3 is the reference).

In other words, although land size A is non-observable, we can control its effect on Y by including a component c in our model, called unobserved heterogeneity.

Unobserved Heterogeneity

Assume that the relation between y and x ≡ (X1, X2, ..., Xk) is given by:

$$E(y \mid x, c) = x\beta + c$$

where c is an unobserved component, also called unobserved effect or unobserved heterogeneity. One main assumption in panel data analysis is that the component c is constant over time. This means:

$$Y_{it} = x_{it}\beta + c_i + e_{it}$$

where $E(e_{it} \mid x_{it}, c_i) = 0$.

When c is not correlated with the independent variables, Cov(Xj, c) = 0, the omission of c from our model does not generate omitted variable bias. In this case, we could apply OLS using models for pooled data (pooled regression). However, if Cov(Xj, c) ≠ 0, the pooled regression estimates are biased even for large samples.
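The role of Cov(X, c) can be illustrated with a small constructed panel. This is only a sketch under stated assumptions: the data are hypothetical, the error e is set to zero so the arithmetic is exact, and the helper `pooled_slope` is introduced here for illustration.

```python
import numpy as np


def pooled_slope(x, y):
    """Slope from a pooled OLS regression of y on an intercept and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


beta = 2.0  # assumed true slope

# Three units observed in two periods; c_i is repeated for t = 0, 1.
c = np.array([0.0, 0.0, 10.0, 10.0, 20.0, 20.0])

# Case 1: x is constructed so that its sample covariance with c is zero.
x_uncorr = np.array([1.0, 2.0, 5.0, 6.0, 1.0, 2.0])
# Case 2: x rises with c, so Cov(x, c) != 0.
x_corr = np.array([1.0, 2.0, 11.0, 12.0, 21.0, 22.0])

# Pooled regressions that omit c (noise-free outcome y = beta * x + c).
slope_uncorr = pooled_slope(x_uncorr, beta * x_uncorr + c)
slope_corr = pooled_slope(x_corr, beta * x_corr + c)

print(slope_uncorr, slope_corr)
```

In the first case the intercept absorbs the average of c and the pooled slope equals the assumed value of 2. In the second case the slope also picks up the association between x and c and comes out close to 3.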

One main limitation of the fixed effects estimator with binary variables is that the number of binary variables may be quite large. Most estimates tend to be insignificant if the sample is not large enough to compensate for the lost degrees of freedom.

Alternatively, through an algebraic transformation, we can estimate the same coefficients using the within estimator.
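The binary-variable version of the fixed effects estimator can be sketched as follows. The panel is hypothetical and noise-free, and unit 3 is taken as the reference category; none of these values come from the course dataset.

```python
import numpy as np

# Hypothetical panel: 3 units (i = 1, 2, 3), each observed at t = 0, 1.
unit = np.array([1, 1, 2, 2, 3, 3])
x = np.array([1.0, 2.0, 11.0, 12.0, 21.0, 22.0])
c = np.array([0.0, 0.0, 10.0, 10.0, 20.0, 20.0])  # unobserved in practice
y = 2.0 * x + c                                    # assumed slope of 2, no noise

# One binary variable per unit, leaving unit 3 as the reference.
d1 = (unit == 1).astype(float)
d2 = (unit == 2).astype(float)

# Regress y on an intercept, x and the binary variables.
X = np.column_stack([np.ones(len(y)), x, d1, d2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope, d1_coef, d2_coef = coef

print(coef)
```

The intercept picks up the effect of the reference unit, and each binary-variable coefficient measures the difference between that unit's c and the reference. With n units this adds n - 1 regressors, which is the loss of degrees of freedom described above.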

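Within Transformation

Suppose the model with unobserved heterogeneity:

$$Y_{it} = x_{it}\beta + c_i + e_{it}$$

This relation is also valid for the average values of each cross-sectional unit:

$$\bar{Y}_i = \bar{x}_i\beta + c_i + \bar{e}_i$$

Subtracting the equations, we have:

$$(Y_{it} - \bar{Y}_i) = (x_{it} - \bar{x}_i)\beta + (c_i - c_i) + (e_{it} - \bar{e}_i)$$

Since $c_i$ is constant over time, its average is equal to $c_i$ and the term cancels out:

$$\tilde{Y}_{it} = \tilde{x}_{it}\beta + \tilde{e}_{it}$$

where $\tilde{Y}_{it} = Y_{it} - \bar{Y}_i$, $\tilde{x}_{it} = x_{it} - \bar{x}_i$ and $\tilde{e}_{it} = e_{it} - \bar{e}_i$.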
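A minimal sketch of the within transformation, assuming a simulated balanced panel. The identifiers `cs`, `x1`, `x2` and `y` mirror the variable names used in the examples of these notes, while the data, the coefficients and the helper `demean_by_group` are hypothetical. The sketch also checks that the within slopes match those from a regression with one binary variable per unit.

```python
import numpy as np


def demean_by_group(values, groups):
    """Subtract each group's mean from its observations."""
    _, idx = np.unique(groups, return_inverse=True)
    means = np.bincount(idx, weights=values) / np.bincount(idx)
    return values - means[idx]


rng = np.random.default_rng(0)
n, T = 5, 4
cs = np.repeat(np.arange(n), T)            # cross-sectional unit identifier
c = np.repeat(rng.normal(size=n), T)       # unobserved heterogeneity c_i
x1 = rng.normal(size=n * T) + c            # regressor correlated with c
x2 = rng.normal(size=n * T)
y = 1.5 * x1 - 0.5 * x2 + c + 0.1 * rng.normal(size=n * T)

# Within estimator: OLS on deviations from unit means (no intercept needed).
X_within = np.column_stack([demean_by_group(x1, cs), demean_by_group(x2, cs)])
beta_within, *_ = np.linalg.lstsq(X_within, demean_by_group(y, cs), rcond=None)

# Same slopes from a regression with a full set of unit binary variables.
dummies = (cs[:, None] == np.arange(n)).astype(float)
X_lsdv = np.column_stack([x1, x2, dummies])
coef_lsdv, *_ = np.linalg.lstsq(X_lsdv, y, rcond=None)
beta_lsdv = coef_lsdv[:2]

print(beta_within, beta_lsdv)
```

The two sets of slopes agree up to floating-point error, but the within version never builds the n binary variables. Standard errors are not computed here; they would need the degrees-of-freedom correction for the n estimated unit means.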

Example – Stata & R

Suppose we have a panel with information for the regressand y and two exogenous variables (x1 and x2) across n cross-sectional units (variable cs = 1..n) and T periods (variable time = 1..T). The within estimator is given in Stata by:

The equivalent in R:

Two-Way Fixed Effects Estimator

The model with controls for the heterogeneity across cross-sectional units ($c_i$) is also called the one-way model:

$$Y_{it} = \alpha + \sum_{j=1}^{k} \beta_j X_{jit} + c_i + e_{it}$$

We can extend this idea, using binary variables to control for the heterogeneity across periods t. The two-way model is:

$$Y_{it} = \alpha + \sum_{j=1}^{k} \beta_j X_{jit} + c_i + \mathit{ct}_2 P_{2t} + \dots + \mathit{ct}_T P_{Tt} + e_{it}$$

where $P_{jt} = 1$ if j = t and $P_{jt} = 0$ if j ≠ t.

Example – Stata, R & Python

The two-way estimator in Stata:
The equivalent in R:
The equivalent in Python:
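A minimal sketch of the two-way estimator with binary variables for units and periods, assuming a small noise-free balanced panel. The data and effect sizes are hypothetical, and plain NumPy least squares is used only to keep the sketch self-contained; a dedicated panel package would normally also report standard errors.

```python
import numpy as np

n, T = 3, 3
cs = np.repeat(np.arange(1, n + 1), T)    # unit identifier, 1..n
time = np.tile(np.arange(1, T + 1), n)    # period identifier, 1..T

# Hypothetical regressor that varies across both units and periods.
x = np.array([1.0, 2.0, 4.0, 2.0, 5.0, 3.0, 7.0, 1.0, 6.0])

unit_effect = np.array([0.0, 10.0, 20.0])[cs - 1]     # c_i
period_effect = np.array([0.0, 1.0, 3.0])[time - 1]   # shift of each period
y = 2.0 * x + unit_effect + period_effect             # assumed slope of 2

# Binary variables: one per unit, and one per period from t = 2 to T
# (period 1 is the reference, as in the two-way model above).
unit_dummies = (cs[:, None] == np.arange(1, n + 1)).astype(float)
period_dummies = (time[:, None] == np.arange(2, T + 1)).astype(float)

X = np.column_stack([x, unit_dummies, period_dummies])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

slope = coef[0]
unit_coefs = coef[1:n + 1]
period_coefs = coef[n + 1:]
print(slope, unit_coefs, period_coefs)
```

The full set of unit binary variables replaces the common intercept, so no separate constant is included. In this constructed example the regression recovers the slope, the three unit effects and the two period shifts used to build `y`.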

Exercise

1) The dataset Data_AgricultureClimate.csv contains information on agricultural production and climate variables in the state of São Paulo (GORI MAIA, A., MIYAMOTO, B. C., GARCIA, J. R. Climate change and agriculture: Do environmental preservation and ecosystem services matter? Ecological Economics, v. 152 (October 2018),

a) Analyze the relation between the (log) value of agricultural production, (log) area, temperature and precipitation using the one-way fixed-effects estimator;

b) Now use the two-way fixed-effects estimator, identifying the main differences in relation to (a);
