


The content focuses on an introduction to econometrics, specifically cross-sectional data and panel data.
Typology: Lecture notes



o The difference between cross-sectional data and panel data
In statistics and econometrics, cross-sectional data (a cross section of a study population) is a type of one-dimensional data set. It is collected by observing many subjects, such as individuals, firms, or countries/regions, at the same point in time or without regard to differences in time. Analysis of cross-sectional data usually consists of comparing the differences among the subjects.

For example, suppose we want to measure current obesity levels in a population. We could draw a random sample of 1,000 people from that population (also known as a cross section of that population). We would then measure their weight and height and calculate what percentage of the sample is categorized as obese. Suppose 30% of our sample is categorized as obese. This cross-sectional sample gives us a snapshot of the population at that one point in time. A single cross-sectional sample cannot tell us whether obesity is increasing or decreasing; we can only describe the current proportion.

Cross-sectional data differs from time-series data, which follows one subject's changes over the course of time. Panel data, also called longitudinal or time-series cross-sectional (TSCS) data, combines both: it looks at multiple subjects and at how they change over time. Panel analysis uses panel data to examine changes in variables over time and differences in variables between subjects.

In a rolling cross-section, both the presence of an individual in the sample and the time at which the individual is included are determined randomly. For example, a political poll may decide to interview 100,000 individuals. It first selects these individuals randomly from the entire population. It then assigns a random date to each individual; this is the date on which that individual will be interviewed, and thus included in the survey.
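The following minimal Python sketch (standard library only) makes the distinction concrete. It stores a small panel in "long" format, with one row per subject and period. Fixing the period gives a cross-section; fixing the subject gives a time series. It also sketches the rolling cross-section sampling described above.

The BMI values and the obesity cutoff are illustrative assumptions, not part of the original notes. The cutoff is BMI ≥ 30, the usual adult threshold. The function names are also hypothetical.

```python
import random
from collections import namedtuple

# One row per (subject, period): the usual "long" layout for panel data.
Obs = namedtuple("Obs", ["subject", "t", "bmi"])

# Hypothetical panel: three people measured in two periods (made-up numbers).
panel = [
    Obs("A", 1, 24.0), Obs("A", 2, 31.0),
    Obs("B", 1, 32.5), Obs("B", 2, 33.0),
    Obs("C", 1, 22.1), Obs("C", 2, 23.4),
]


def cross_section(data, t):
    """Many subjects at one point in time: a single snapshot."""
    return [o for o in data if o.t == t]


def time_series(data, subject):
    """One subject followed over time, ordered by period."""
    return sorted((o for o in data if o.subject == subject), key=lambda o: o.t)


def share_obese(sample, cutoff=30.0):
    """Share of the sample whose BMI is at or above the cutoff (assumed BMI >= 30)."""
    return sum(o.bmi >= cutoff for o in sample) / len(sample)


def rolling_cross_section(population, n, dates, seed=None):
    """Draw n individuals at random, then give each one a random interview date."""
    rng = random.Random(seed)
    chosen = rng.sample(population, n)  # sampling without replacement
    return {person: rng.choice(dates) for person in chosen}


print(share_obese(cross_section(panel, 1)))  # first snapshot: 1 of 3 obese
print(share_obese(cross_section(panel, 2)))  # second snapshot of the same people
print(time_series(panel, "A"))               # person A's trajectory
```

The two snapshots give different shares. We can say the share rose only because the same three people are observed in both periods. A single cross-section could not show that.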
o Why do we often want to include a fixed effects component in panel data models?
Panel data usually contain unobserved characteristics of each subject that do not change over time. If these characteristics are correlated with the regressors, leaving them out causes omitted variable bias. Subject fixed effects absorb every such time-invariant factor, so the slope coefficients are identified only from variation within each subject over time.

o Fixed effect model
In statistics, a fixed effects model is a statistical model in which the model parameters are fixed or non-random quantities. This contrasts with random effects models and mixed models, in which all or some of the model parameters are treated as random variables. In many applications, including econometrics, a fixed effects model refers to a regression model in which the group means are fixed (non-random). In a random effects model, by contrast, the group means are a random sample from a population. Generally, data can be grouped according to several observed factors, and the group means for each grouping could be modeled as either fixed or random effects. In a fixed effects model, each group mean is a group-specific fixed quantity.

In panel data, where longitudinal observations exist for the same subject, fixed effects represent the subject-specific means. In panel data analysis, the term fixed effects estimator (also known as the within estimator) refers to an estimator for the coefficients of a regression model that includes those fixed effects (one time-invariant intercept for each subject).

o First difference
An alternative to the within transformation is the first difference transformation, which produces a different estimator. Start from the model $y_{it} = X_{it}\beta + \alpha_i + u_{it}$ and subtract the period $t-1$ equation from the period $t$ equation. For $t = 2, \dots, T$ this gives:

$$y_{it} - y_{i,t-1} = (X_{it} - X_{i,t-1})\beta + (\alpha_i - \alpha_i) + (u_{it} - u_{i,t-1}) \;\Longrightarrow\; \Delta y_{it} = \Delta X_{it}\beta + \Delta u_{it}.$$

When T = 2, the first difference and fixed effects estimators are numerically equivalent. For T > 2, they are not. If the error terms $u_{it}$ are homoskedastic with no serial correlation, the fixed effects estimator is more efficient than the first difference estimator. If $u_{it}$ follows a random walk, however, the first difference estimator is more efficient.[15]

o Equality of fixed effects and first difference estimators when T = 2
For the special two-period case (T = 2), the fixed effects (FE) estimator and the first difference (FD) estimator are numerically equivalent. This is because the FE estimator effectively "doubles the data set" used in the FD estimator.
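To see this, establish that the fixed effects estimator is:

$$FE_{T=2} = \left[\sum_{i=1}^{N} (x_{i1}-\bar{x}_i)(x_{i1}-\bar{x}_i)' + (x_{i2}-\bar{x}_i)(x_{i2}-\bar{x}_i)'\right]^{-1} \left[\sum_{i=1}^{N} (x_{i1}-\bar{x}_i)(y_{i1}-\bar{y}_i) + (x_{i2}-\bar{x}_i)(y_{i2}-\bar{y}_i)\right]$$

Here $\bar{x}_i$ and $\bar{y}_i$ are subject $i$'s means over the two periods, and $N$ is the number of subjects. Since each …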
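The within (fixed effects) and first difference estimators described above can be compared numerically. Below is a minimal plain-Python sketch assuming a single regressor, so β is a scalar. The function names and the data are made up for illustration. The data come from $y_{it} = 2x_{it} + \alpha_i + u_{it}$, with $\alpha_i$ rising in $x$.

The sketch also shows why fixed effects are included at all: pooled OLS ignores $\alpha_i$ and is pulled far from 2.

The T = 2 equality follows from one step. With two periods, $x_{i1} - \bar{x}_i = -\Delta x_{i2}/2$ and $x_{i2} - \bar{x}_i = \Delta x_{i2}/2$, and likewise for $y$. Each subject therefore adds $\Delta x_{i2}\Delta y_{i2}/2$ to the FE numerator and $(\Delta x_{i2})^2/2$ to the denominator. The factors of 1/2 cancel, leaving the FD estimator.

```python
from statistics import fmean

# A panel here is a dict: subject -> list of (x, y) pairs ordered by period.


def pooled_ols_slope(panel):
    """OLS slope of y on x with one common intercept (ignores alpha_i)."""
    pairs = [p for obs in panel.values() for p in obs]
    xbar = fmean(x for x, _ in pairs)
    ybar = fmean(y for _, y in pairs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    return sxy / sxx


def within_slope(panel):
    """Fixed effects (within) estimator: demean x and y by subject, OLS without intercept."""
    sxy = sxx = 0.0
    for obs in panel.values():
        xbar = fmean(x for x, _ in obs)
        ybar = fmean(y for _, y in obs)
        for x, y in obs:
            sxy += (x - xbar) * (y - ybar)
            sxx += (x - xbar) ** 2
    return sxy / sxx


def first_difference_slope(panel):
    """First difference estimator: OLS of dy_it on dx_it (t = 2..T) without intercept."""
    sxy = sxx = 0.0
    for obs in panel.values():
        for (x0, y0), (x1, y1) in zip(obs, obs[1:]):
            dx, dy = x1 - x0, y1 - y0
            sxy += dx * dy
            sxx += dx * dx
    return sxy / sxx


# Made-up data from y_it = 2 * x_it + alpha_i + u_it, where alpha_i rises with x.
two_periods = {
    "A": [(1.0, 2.1), (2.0, 3.9)],    # alpha = 0
    "B": [(3.0, 10.8), (5.0, 15.3)],  # alpha = 5
    "C": [(6.0, 22.0), (7.0, 24.2)],  # alpha = 10
}
# The same subjects with a third period added.
three_periods = {
    "A": two_periods["A"] + [(4.0, 8.0)],
    "B": two_periods["B"] + [(6.0, 16.9)],
    "C": two_periods["C"] + [(9.0, 27.7)],
}

print(pooled_ols_slope(two_periods))  # well above 2: alpha_i is left in the error
print(within_slope(two_periods), first_difference_slope(two_periods))      # equal when T = 2
print(within_slope(three_periods), first_difference_slope(three_periods))  # differ when T = 3
```

With a third period, the two estimates no longer coincide, matching the statement that FE and FD differ for T > 2. Which one is more efficient depends on the error structure, as discussed in the text.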
… on which it is much more computationally efficient than the dummy variable approach. The third approach is a nested estimation, in which the local estimation for each individual series is programmed in as part of the model definition. This approach is the most computationally and memory efficient. However, it requires proficient programming skills and access to the model programming code, although it can be programmed even in SAS. Finally, each of the above alternatives can be improved if the series-specific estimation is linear (within a nonlinear model). In that case, the direct linear solution for each individual series can be programmed in as part of the nonlinear model definition.

o Time fixed effect model
Controlling for variables that are constant across entities but vary over time can be done by including time fixed effects. If there are only time fixed effects, the fixed effects regression model becomes
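$$Y_{it} = \beta_0 + \beta_1 X_{it} + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it},$$

where $Bs_t$ equals 1 if $t = s$ and 0 otherwise. Only T − 1 time dummies are included ($B1$ is omitted) because the model includes an intercept. This model eliminates omitted variable bias caused by excluding unobserved variables that evolve over time but are constant across entities.

In some applications it is meaningful to include both entity and time fixed effects. The entity and time fixed effects model is

$$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it},$$

where $D2_i, \dots, Dn_i$ are dummies for the $n$ entities ($D1$ is omitted for the same reason).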
The combined model eliminates bias from unobservables that change over time but are constant across entities. It also controls for factors that differ across entities but are constant over time. Such models can be estimated with OLS as implemented in R.
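Below is a minimal plain-Python sketch of the time fixed effects and entity-and-time fixed effects models for a balanced panel with one regressor. It is not the R routine the notes refer to.

The sketch relies on the Frisch–Waugh–Lovell theorem. Including an intercept and T − 1 time dummies is equivalent to subtracting each period's mean across entities. In a balanced panel, including both sets of dummies is equivalent to the two-way transformation $v_{it} - \bar{v}_i - \bar{v}_t + \bar{v}$.

The function names and the data are illustrative assumptions. The data are generated as $y_{it} = 1.5x_{it} + \alpha_i + \lambda_t$, with both effects correlated with $x$.

```python
from statistics import fmean


def demean_by_period(v):
    """Time FE transformation: subtract each period's mean across entities.

    Equivalent (Frisch-Waugh-Lovell) to partialling out an intercept and T - 1 time dummies.
    """
    n, T = len(v), len(v[0])
    col = [fmean(v[i][t] for i in range(n)) for t in range(T)]
    return [[v[i][t] - col[t] for t in range(T)] for i in range(n)]


def demean_by_entity(v):
    """Entity FE (within) transformation: subtract each entity's mean over time."""
    return [[a - m for a in row] for row, m in zip(v, map(fmean, v))]


def demean_two_way(v):
    """Entity-and-time FE transformation, exact only for a BALANCED panel:
    v_it - entity mean - period mean + grand mean."""
    n, T = len(v), len(v[0])
    row = [fmean(r) for r in v]
    col = [fmean(v[i][t] for i in range(n)) for t in range(T)]
    grand = fmean(row)
    return [[v[i][t] - row[i] - col[t] + grand for t in range(T)] for i in range(n)]


def slope_through_origin(x, y):
    """OLS slope of y on x with no intercept, pooling every (i, t) cell."""
    sxy = sum(a * b for xr, yr in zip(x, y) for a, b in zip(xr, yr))
    sxx = sum(a * a for xr in x for a in xr)
    return sxy / sxx


def fe_slope(x, y, transform):
    """Apply the same transformation to x and y, then regress through the origin."""
    return slope_through_origin(transform(x), transform(y))


# Made-up balanced panel: rows are entities i, columns are periods t.
x = [[1.0, 4.0, 2.0],
     [3.0, 3.0, 7.0],
     [6.0, 5.0, 9.0]]
alpha = [0.0, 4.0, 10.0]  # entity effects, larger for entities with larger x
lam = [0.0, 2.0, 5.0]     # time effects, larger in periods with larger x
beta = 1.5
y = [[beta * x[i][t] + alpha[i] + lam[t] for t in range(3)] for i in range(3)]

print(fe_slope(x, y, demean_by_period))  # time effects only: alpha_i still biases the slope
print(fe_slope(x, y, demean_by_entity))  # entity effects only: lambda_t still biases it
print(fe_slope(x, y, demean_two_way))    # both: recovers beta (up to rounding)
```

Only the two-way version recovers the slope of 1.5. Time effects alone leave $\alpha_i$ in the error, and entity effects alone leave $\lambda_t$ in it. For unbalanced panels the simple two-way demeaning is no longer exact. There, the dummy-variable regression is the straightforward alternative.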