Understanding Dummy Variables in Regression Analysis, Study notes of Economic Analysis

These notes cover the concept of dummy variables in regression analysis, how to interpret their coefficients, and how to use them as regressors (their use as the dependent variable is left to a later chapter). They also cover the Chow test of the null hypothesis of no difference between groups, the difference-in-differences estimator, and variables that are coded as numbers but are really nominal or ordinal, including how to transform them into dummy variables for regression analysis.

Typology: Study notes

2021/2022

Uploaded on 09/12/2022

riciard

Chapter 7, Dummy Variable
A dummy variable can only take the values 1 and 0. It is categorical, which means the numbers 1 and 0 have no numerical
meaning (we cannot say 1 is greater than 0). In this chapter we use a dummy as a regressor; Chapter 17 (covered in eco411)
shows how to use a dummy as the dependent variable. First, let's use wage data and consider a simple regression
$$wage = \beta_0 + \beta_1 D + u \qquad (1)$$
where $D = 0$ for male and $D = 1$ for female. For any dummy variable, you have to be clear about which group has $D = 0$ (called
the base group), because all later comparisons are made relative to the base group. You can report the frequency of $D$ with the Stata command tab D.
The key to understanding the dummy-variable model is to discuss:
when $D = 0$, $wage =$ _____________________. If we take expectations, we get ___________________
when $D = 1$, $wage =$ _____________________. If we take expectations, we get ___________________
So $\beta_0$ can be interpreted as ________________________, and $\beta_1$ can be interpreted as ____________________
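As a numerical check on regression (1), here is a minimal Python sketch on a tiny made-up sample. The wage values are hypothetical and do not come from any dataset. It fits OLS of wage on the dummy and compares the two coefficients with the group means. It then compares the usual OLS t statistic on $\beta_1$ with the equal-variance two-sample t statistic. The helper names (avg, ols_on_dummy, ols_t_on_b1, pooled_t) are illustrative only.

```python
import math

# Made-up wages (hypothetical): D = 0 for male (base group), D = 1 for female.
wage = [10.0, 12.0, 14.0, 8.0, 9.0, 10.0, 11.0]
D = [0, 0, 0, 1, 1, 1, 1]

def avg(v):
    return sum(v) / len(v)

def ols_on_dummy(y, d):
    """Intercept and slope from OLS of y on a constant and the dummy d."""
    dbar, ybar = avg(d), avg(y)
    sxy = sum((di - dbar) * (yi - ybar) for di, yi in zip(d, y))
    sxx = sum((di - dbar) ** 2 for di in d)
    b1 = sxy / sxx
    return ybar - b1 * dbar, b1

def ols_t_on_b1(y, d):
    """Usual (homoskedastic) OLS t statistic for H0: beta_1 = 0."""
    b0, b1 = ols_on_dummy(y, d)
    sigma2 = sum((yi - b0 - b1 * di) ** 2 for yi, di in zip(y, d)) / (len(y) - 2)
    dbar = avg(d)
    return b1 / math.sqrt(sigma2 / sum((di - dbar) ** 2 for di in d))

def pooled_t(y, d):
    """Equal-variance two-sample t statistic for mean(D = 1) - mean(D = 0)."""
    g0 = [yi for yi, di in zip(y, d) if di == 0]
    g1 = [yi for yi, di in zip(y, d) if di == 1]
    ss = sum((v - avg(g0)) ** 2 for v in g0) + sum((v - avg(g1)) ** 2 for v in g1)
    s2 = ss / (len(g0) + len(g1) - 2)  # pooled variance
    return (avg(g1) - avg(g0)) / math.sqrt(s2 * (1 / len(g0) + 1 / len(g1)))

b0, b1 = ols_on_dummy(wage, D)
male_mean = avg([w for w, d in zip(wage, D) if d == 0])
female_mean = avg([w for w, d in zip(wage, D) if d == 1])
print("b0 =", b0, "vs male mean", male_mean)
print("b1 =", b1, "vs female - male", female_mean - male_mean)
print("OLS t =", ols_t_on_b1(wage, D), "pooled t =", pooled_t(wage, D))
```

The two t statistics agree for a specific reason. With a single dummy regressor, the OLS fitted values are the two group means. The usual OLS standard error of $\beta_1$ then reduces to the pooled-variance formula. Stata's ttest may subtract the groups in the other order, so the sign of its t statistic can be flipped.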
This result suggests that we can conduct the two-sample t test (the comparison-of-means test; Stata command:
ttest wage, by(D)) using the simple regression (1) that involves the dummy. Now consider a multiple regression
$$wage = \beta_0 + \beta_1 D + \beta_2 x + \beta_3 (D \times x) + u \qquad (2)$$
For example, $x$ can be exper, and $D \times x$ is the interaction term (the product of $x$ and the dummy). Let's discuss again:
when $D = 0$, _________________________________________________________________________________________
when $D = 1$, _________________________________________________________________________________________
$\beta_0$ can be interpreted as _________________________________________________________;
$\beta_1$ can be interpreted as _________________________________________________________;
$\beta_2$ can be interpreted as _________________________________________________________;
$\beta_3$ can be interpreted as _________________________________________________________;
How can $\beta_1$ and $\beta_3$ be shown in a graph?
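Here is a minimal Python sketch of regression (2) on made-up data, assuming x is experience. It fits the interacted model by solving the normal equations. It then fits a separate simple regression for each group and compares the two results, which shows how the four coefficients map onto the two group lines. The functions solve, ols and simple_ols are hypothetical helpers written here to avoid external libraries. solve is a plain Gaussian elimination, not a library routine.

```python
# Made-up data (hypothetical): x = exper, D = 1 for female, 0 for male.

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def simple_ols(x, y):
    """Intercept and slope from a simple regression of y on x."""
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return ybar - (sxy / sxx) * xbar, sxy / sxx

exper = [1, 3, 5, 7, 2, 4, 6, 8]
D = [0, 0, 0, 0, 1, 1, 1, 1]
wage = [9.0, 11.5, 12.0, 15.0, 8.0, 8.5, 10.5, 11.0]

# Regressors of (2): intercept, D, x, and the interaction D*x.
b0, b1, b2, b3 = ols([[1.0, d, x, d * x] for d, x in zip(D, exper)], wage)

male = [(x, w) for x, w, d in zip(exper, wage, D) if d == 0]
female = [(x, w) for x, w, d in zip(exper, wage, D) if d == 1]
a_m, s_m = simple_ols([x for x, _ in male], [w for _, w in male])
a_f, s_f = simple_ols([x for x, _ in female], [w for _, w in female])

# Base group (D = 0): intercept b0, slope b2.
# D = 1 group: intercept b0 + b1, slope b2 + b3.
print("male intercept:", b0, a_m)
print("male slope:", b2, s_m)
print("female intercept:", b0 + b1, a_f)
print("female slope:", b2 + b3, s_f)
```

Picture a graph of wage against x with one fitted line per group. The vertical gap between the two lines at $x = 0$ is $\beta_1$, and their slopes differ by $\beta_3$.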

In this context there is a very important F test, called the Chow test, which is concerned with the particular null hypothesis
$$H_0: \beta_1 = 0, \; \beta_3 = 0 \qquad (3)$$
The meaning of this null hypothesis is _____________________________________
The restricted regression is ______________________________________________
The Chow test is ______________________________________________________
What should we do if the null hypothesis is not rejected?
What should we do if the null hypothesis is rejected?

When the $x$ variable is itself a dummy, the regression becomes very interesting. In this case, $\beta_3$ in the regression below is called the difference-in-differences estimator (the coefficient on the interaction term of two dummies).
$$Y = \beta_0 + \beta_1 D_1 + \beta_2 D_2 + \beta_3 (D_1 \times D_2) + u \qquad (4)$$
Exercise: How should $\beta_3$ be interpreted? To fix ideas, let $Y$ be wage, $D_1$ be the female dummy, and $D_2$ be the married dummy. Let's discuss:
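Both ideas in the paragraph above can be checked by hand. The following Python sketch uses made-up data and does two things:

(a) It computes the Chow F statistic for (3). It compares the sum of squared residuals of the unrestricted regression (2) with that of the restricted regression, which drops $D$ and $D \times x$. The standard formula is $F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}$ with $q = 2$ restrictions and $k = 3$ slope regressors.

(b) It fits regression (4) and compares $\beta_3$ with the difference-in-differences of the four cell means.

The helpers solve, ols, ssr and cell_mean are hypothetical and written here for illustration. No p-value is computed, because that would need an F distribution function.

```python
# Made-up data (hypothetical). solve/ols/ssr/cell_mean are illustrative helpers.

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def ssr(rows, y):
    """Sum of squared OLS residuals."""
    coef = ols(rows, y)
    return sum((yi - sum(c * xj for c, xj in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))

# (a) Chow test. Unrestricted: regression (2). Restricted: wage on x only.
exper = [1, 3, 5, 7, 2, 4, 6, 8]
D = [0, 0, 0, 0, 1, 1, 1, 1]
wage = [9.0, 11.5, 12.0, 15.0, 8.0, 8.5, 10.5, 11.0]
n, q, k = len(wage), 2, 3  # q = number of restrictions, k = slope regressors in (2)
ssr_ur = ssr([[1.0, d, x, d * x] for d, x in zip(D, exper)], wage)
ssr_r = ssr([[1.0, x] for x in exper], wage)
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
print("Chow F =", F, "with df", (q, n - k - 1))

# (b) Difference-in-differences with D1 = female dummy, D2 = married dummy.
D1 = [0, 0, 0, 0, 1, 1, 1, 1]
D2 = [0, 0, 1, 1, 0, 0, 1, 1]
Y = [10.0, 12.0, 14.0, 16.0, 9.0, 11.0, 10.0, 12.0]
coef = ols([[1.0, p, s, p * s] for p, s in zip(D1, D2)], Y)

def cell_mean(g1, g2):
    """Mean of Y among observations with D1 == g1 and D2 == g2."""
    vals = [y for y, p, s in zip(Y, D1, D2) if p == g1 and s == g2]
    return sum(vals) / len(vals)

did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print("b3 =", coef[3], "cell-mean DiD =", did)
```

With these made-up numbers, the female marriage gap is (11 - 10) and the male marriage gap is (15 - 11). Their difference is -3, and the OLS estimate of $\beta_3$ reproduces it.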

Some variables look numerical, but they are not. Two examples:
$$x_1 = \begin{cases} 1 & \text{if taking bus} \\ 2 & \text{if taking subway} \\ 3 & \text{if driving} \end{cases}$$
$$x_2 = \begin{cases} 1 & \text{if failing expectations} \\ 2 & \text{if meeting expectations} \\ 3 & \text{if exceeding expectations} \end{cases}$$
The values of $x_1$ have no numerical meaning and no ordering. The values of $x_2$ have no numerical meaning but do have an ordering, so $x_2$ is called an ordinal variable. We do not believe that the effect on $Y$ when $x_1$ changes from 1 to 2 is the same as when $x_1$ changes from 2 to 3. So you cannot use $x_1$ and $x_2$ directly as regressors, since they are not numerical. Instead, we need to transform $x_1$ and $x_2$ into sets of dummy variables and use those dummy variables as regressors. For instance, for $x_1$ we may define two dummy variables
$$D_1 = \begin{cases} 1 & \text{if taking bus (or } x_1 = 1) \\ 0 & \text{otherwise} \end{cases}$$
$$D_2 = \begin{cases} 1 & \text{if taking subway (or } x_1 = 2) \\ 0 & \text{otherwise} \end{cases}$$
We do not need to define a dummy for driving (the last group), because the intercept term represents it (the base group). We fall into the dummy variable trap if we define three dummies for the three groups and use them all along with the intercept term. The dummy variable trap is caused by perfect multicollinearity: the three dummies always add up to 1, which is exactly the regressor that goes with the intercept. Stata will automatically drop one of the dummies for you.
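Here is a minimal Python sketch of the transformation described above. It uses made-up commuting codes and a hypothetical outcome Y. It builds one dummy per category of $x_1$ and checks the exact linear dependence behind the dummy variable trap. It then fits an intercept plus $D_1$ and $D_2$ only, so driving is the base group. The solve and ols helpers are illustrative, not library functions. The same recipe would apply to the ordinal $x_2$.

```python
# Made-up data (hypothetical): x1 = 1 (bus), 2 (subway), 3 (driving).

def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def ols(rows, y):
    """OLS coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

x1 = [1, 1, 2, 2, 2, 3, 3, 3]
Y = [20.0, 22.0, 30.0, 28.0, 32.0, 40.0, 44.0, 42.0]

D1 = [1 if v == 1 else 0 for v in x1]  # 1 if taking bus
D2 = [1 if v == 2 else 0 for v in x1]  # 1 if taking subway
D3 = [1 if v == 3 else 0 for v in x1]  # 1 if driving (base group, left out)

# Dummy variable trap: D1 + D2 + D3 = 1 in every row, i.e. the intercept
# column, so an intercept plus all three dummies is perfectly collinear.
trap = all(p + q + s == 1 for p, q, s in zip(D1, D2, D3))
print("D1 + D2 + D3 == 1 in every row:", trap)

# Drop D3 and regress Y on an intercept, D1 and D2.
coef = ols([[1.0, p, q] for p, q in zip(D1, D2)], Y)
print("intercept (driving mean):", coef[0])
print("bus minus driving:", coef[1])
print("subway minus driving:", coef[2])
```

Each dummy coefficient is the difference between that group's mean and the base-group mean. So the choice of base group changes how the coefficients are read, but not the fitted values.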