Econometrics personal notes 2019, Lecture notes of Introduction to Econometrics

The content focuses on introductory econometrics, specifically cross-sectional data and panel data.

Typology: Lecture notes

2019/2020

Uploaded on 06/08/2020

petra-siouxsie

ECONOMETRICS
o The difference between cross-sectional data and panel data
Cross-sectional data, or a cross section of a study population, is a type of
one-dimensional data set in statistics and econometrics. Cross-sectional
data refers to data collected by observing many subjects (such as
individuals, firms or countries/regions) at the same point in time, or
without regard to differences in time. Analysis of cross-sectional data
usually consists of comparing the differences among the subjects. For
example, suppose we want to measure current obesity levels in a population. We
could draw a sample of 1,000 people randomly from that population (also
known as a cross section of that population), measure their weight and
height, and calculate what percentage of that sample is categorized as
obese. Suppose 30% of our sample is categorized as obese. This
cross-sectional sample provides us with a snapshot of that population at
that one point in time. Note that we cannot tell from a single
cross-sectional sample whether obesity is increasing or decreasing; we can only
describe the current proportion. Cross-sectional data differs from time
series data, which follows one subject's
changes over the course of time. Another variant, panel data (also called
longitudinal data or time-series cross-sectional (TSCS) data), combines both and looks at multiple
subjects and how they change over the course of time. Panel analysis uses
panel data to examine changes in variables over time and differences in
variables between subjects. In a rolling cross-section, both the presence of
an individual in the sample and the time at which the individual is included
in the sample are determined randomly. For example, a political poll may
decide to interview 100,000 individuals. It first selects these individuals
randomly from the entire population. It then assigns a random date to each
individual. This is the random date on which that individual will be
interviewed, and thus included in the survey.
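The two-step rolling cross-section design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `rolling_cross_section`, the population of integer IDs, the 30-day field period and the sample size of 100 are all hypothetical and do not come from the notes.

```python
import random
from datetime import date, timedelta

def rolling_cross_section(population_ids, sample_size, start, n_days, seed=0):
    """Draw a random sample, then give each sampled unit a random interview date."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    # Step 1: decide randomly who is in the sample.
    sampled = rng.sample(population_ids, sample_size)
    # Step 2: decide randomly when each sampled individual is interviewed.
    return {pid: start + timedelta(days=rng.randrange(n_days)) for pid in sampled}

population = list(range(10_000))  # hypothetical population of person IDs
schedule = rolling_cross_section(population, sample_size=100,
                                 start=date(2019, 1, 1), n_days=30)

# Group respondents by interview day: each day's group is itself a small
# random cross-section of the population.
by_day = {}
for pid, day in schedule.items():
    by_day.setdefault(day, []).append(pid)
```

Because both membership and timing are random, differences between the daily groups can be attributed to time rather than to who happened to be interviewed on a given day.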
o Why do we often want to include a fixed effect component in panel data models?
o Fixed effect model
In statistics, a fixed effects model is a statistical model in which the model

parameters are fixed or non-random quantities. This is in contrast to random effects models and mixed models in which all or some of the model parameters are considered as random variables. In many applications including econometrics a fixed effects model refers to a regression model in which the group means are fixed (non-random) as opposed to a random effects model in which the group means are a random sample from a population. Generally, data can be grouped according to several observed factors. The group means could be modeled as fixed or random effects for each grouping. In a fixed effects model each group mean is a group-specific fixed quantity. In panel data where longitudinal observations exist for the same subject, fixed effects represent the subject-specific means. In panel data analysis the term fixed effects estimator (also known as the within estimator) is used to refer to an estimator for the coefficients in the regression model including those fixed effects (one time-invariant intercept for each subject).

o First difference
An alternative to the within transformation is the first difference transformation, which produces a different estimator. For $t = 2, \dots, T$:

$$y_{it} - y_{i,t-1} = (X_{it} - X_{i,t-1})\beta + (\alpha_i - \alpha_i) + (u_{it} - u_{i,t-1}) \implies \Delta y_{it} = \Delta X_{it}\beta + \Delta u_{it}.$$

When $T = 2$, the first difference and fixed effects estimators are numerically equivalent. For $T > 2$, they are not. If the error terms $u_{it}$ are homoskedastic with no serial correlation, the fixed effects estimator is more efficient than the first difference estimator. If $u_{it}$ follows a random walk, however, the first difference estimator is more efficient.[15]

Equality of fixed effects and first difference estimators when $T = 2$
For the special two period case ($T = 2$), the fixed effects (FE) estimator and the first difference (FD) estimator are numerically equivalent. This is because the FE estimator effectively "doubles the data set" used in the FD estimator.
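To see this, establish that the fixed effects estimator is:

$$FE_{T=2} = \left[ (x_{i1} - \bar{x}_i)(x_{i1} - \bar{x}_i)' + (x_{i2} - \bar{x}_i)(x_{i2} - \bar{x}_i)' \right]^{-1} \left[ (x_{i1} - \bar{x}_i)(y_{i1} - \bar{y}_i) + (x_{i2} - \bar{x}_i)(y_{i2} - \bar{y}_i) \right]$$

Since each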
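The relationship between the two estimators can also be checked numerically. The sketch below is a minimal pure-Python illustration for a single regressor, not the notes' own implementation: the function names `fe_slope` and `fd_slope` and all data values are made up for the example. It computes the within (FE) slope from subject-demeaned data and the FD slope from period-to-period differences.

```python
def fe_slope(panel):
    """Within (fixed effects) estimator for one regressor.

    panel maps each subject to a list of (x, y) pairs ordered by time.
    """
    num = den = 0.0
    for obs in panel.values():
        x_bar = sum(x for x, _ in obs) / len(obs)
        y_bar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - x_bar) * (y - y_bar)  # demeaning removes alpha_i
            den += (x - x_bar) ** 2
    return num / den

def fd_slope(panel):
    """First-difference estimator: OLS without intercept of dy on dx."""
    num = den = 0.0
    for obs in panel.values():
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            dx, dy = x1 - x0, y1 - y0  # differencing removes alpha_i
            num += dx * dy
            den += dx ** 2
    return num / den

# Hypothetical two-period panel (T = 2) with three subjects.
panel_t2 = {
    "A": [(1.0, 3.0), (2.0, 4.5)],
    "B": [(2.0, 10.0), (4.0, 11.0)],
    "C": [(0.5, -1.0), (3.0, 2.0)],
}
# The same subjects with a hypothetical third period appended (T = 3).
panel_t3 = {
    "A": panel_t2["A"] + [(4.0, 5.0)],
    "B": panel_t2["B"] + [(5.0, 15.0)],
    "C": panel_t2["C"] + [(3.5, 2.0)],
}

fe2, fd2 = fe_slope(panel_t2), fd_slope(panel_t2)
fe3, fd3 = fe_slope(panel_t3), fd_slope(panel_t3)
```

With two periods, each deviation from the subject mean is plus or minus half the first difference, so the numerator and denominator of the FE slope are both exactly half of their FD counterparts and the ratio is unchanged. With three periods this no longer holds and the two slopes differ.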

on which it is much more computationally efficient than the dummy variable approach. The third approach is a nested estimation whereby the local estimation for individual series is programmed in as a part of the model definition. This approach is the most computationally and memory efficient, but it requires proficient programming skills and access to the model programming code, although it can be programmed even in SAS. Finally, each of the above alternatives can be improved if the series-specific estimation is linear (within a nonlinear model), in which case the direct linear solution for individual series can be programmed in as part of the nonlinear model definition.

o Time fixed effect model
Controlling for variables that are constant across entities but vary over time can be done by including time fixed effects. If there are only time fixed effects, the fixed effects regression model becomes

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it},$$

where only $T-1$ dummies are included ($B1$ is omitted) since the model includes an intercept. This model eliminates omitted variable bias caused by excluding unobserved variables that evolve over time but are constant across entities. In some applications it is meaningful to include both entity and time fixed effects. The entity and time fixed effects model is

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}.$$

The combined model eliminates bias from unobservables that change over time but are constant across entities, and it also controls for factors that differ across entities but are constant over time. Such models can be estimated by OLS, for example in R.
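As a rough illustration of the combined model, the sketch below estimates the slope on $X_{it}$ in a balanced panel with one regressor. It swaps in two-way demeaning (subtract the entity mean and the period mean, add back the grand mean) instead of building the dummies explicitly; in a balanced panel this gives the same slope as OLS with entity and time dummies. The function name `two_way_fe_slope` and all numbers are hypothetical, and the data are constructed without noise so that the true slope of 2 is known.

```python
def two_way_fe_slope(x, y):
    """Slope of y on x after removing entity and time effects.

    x and y are lists of lists indexed [entity][period]; the panel must be
    balanced (every entity observed in every period).
    """
    n, T = len(x), len(x[0])

    def demean(z):
        grand = sum(map(sum, z)) / (n * T)
        entity_mean = [sum(row) / T for row in z]
        period_mean = [sum(z[i][t] for i in range(n)) / n for t in range(T)]
        return [[z[i][t] - entity_mean[i] - period_mean[t] + grand
                 for t in range(T)] for i in range(n)]

    xd, yd = demean(x), demean(y)
    num = sum(xd[i][t] * yd[i][t] for i in range(n) for t in range(T))
    den = sum(xd[i][t] ** 2 for i in range(n) for t in range(T))
    return num / den

# Hypothetical balanced panel: 3 entities, 3 periods.
x = [[1.0, 2.0, 4.0], [2.0, 4.0, 5.0], [0.5, 3.0, 3.5]]
entity_effect = [5.0, -3.0, 10.0]  # constant over time, differs across entities
time_effect = [0.0, 1.5, -2.0]     # constant across entities, differs over time
y = [[2.0 * x[i][t] + entity_effect[i] + time_effect[t] for t in range(3)]
     for i in range(3)]

beta_hat = two_way_fe_slope(x, y)  # should recover the true slope of 2
```

The demeaning route avoids creating a dummy column for every entity and period. For unbalanced panels the simple formula above no longer matches the dummy-variable regression, so the explicit dummies (or a dedicated panel routine) would be needed there.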