

These notes cover the concept of dummy variables in regression analysis, their interpretation, and how to use them as regressors or as the dependent variable. They also cover the Chow test and its role in testing the null hypothesis of no difference between groups, and they discuss the difference between numerical and ordinal variables and how to transform the latter into dummy variables for regression analysis.
Typology: Study notes


Chapter 7, Dummy Variable

A dummy variable can only take the values 1 and 0. It is categorical, which means the numbers 1 and 0 have no numerical meaning (we cannot say 1 is greater than 0). In this chapter we use a dummy as a regressor. Chapter 17 (covered in eco411) shows how to use a dummy as the dependent variable.

First let's use wage data and consider a simple regression

$wage = \beta_0 + \beta_1 D + u$ (1)

where $D = 0$ for male and $D = 1$ for female. For a dummy variable, you have to be clear which group $D = 0$ stands for (called the base group). Later, all comparisons are made relative to the base group. You can report the frequency of $D$ using tab D.

The key to understanding the dummy-variable model is to discuss:

when $D = 0$, $wage$ = _____________________. If we take expectation we get ___________________
when $D = 1$, $wage$ = _____________________. If we take expectation we get ___________________

So $\beta_0$ can be interpreted as ________________________; and $\beta_1$ can be interpreted as ____________________

This result suggests that we can conduct the two-sample t test (the comparison-of-means test, Stata command: ttest wage, by(D)) using the simple regression (1) that involves the dummy.

Now consider a multiple regression

$wage = \beta_0 + \beta_1 D + \beta_2 x + \beta_3 (D \times x) + u$ (2)

For example, $x$ can be exper, and $D \times x$ is the interaction term (the product of $x$ and the dummy). Let's discuss again:

when $D = 0$, _________________________________________________________________________________________
when $D = 1$, _________________________________________________________________________________________

$\beta_0$ can be interpreted as _________________________________________________________;
$\beta_1$ can be interpreted as _________________________________________________________;
$\beta_2$ can be interpreted as _________________________________________________________;
$\beta_3$ can be interpreted as _________________________________________________________;

How to show $\beta_1$ and $\beta_3$ in a graph?
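To make regressions (1) and (2) concrete, here is a minimal Python sketch on made-up, noise-free data. The numbers, the variable names, and the helper `ols_simple` are illustrative assumptions, not course data or Stata output. It relies on two standard facts: in (1), OLS reproduces the two group means, and in (2), OLS gives the same fitted lines as running a separate regression of wage on x within each group. In a graph of the two fitted lines, the second fact means $\beta_1$ is the vertical gap between the lines at $x = 0$ and $\beta_3$ is the difference in their slopes.

```python
from statistics import mean

def ols_simple(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Made-up, noise-free data (an assumption for illustration only).
# D = 0 is male (base group), D = 1 is female; x plays the role of exper.
D = [0, 0, 0, 0, 1, 1, 1, 1]
x = [1, 2, 3, 4, 1, 2, 3, 4]
wage = [10.0, 12.0, 14.0, 16.0, 8.0, 9.0, 10.0, 11.0]

# Split the sample by group.
x_m = [xi for xi, d in zip(x, D) if d == 0]
w_m = [wi for wi, d in zip(wage, D) if d == 0]
x_f = [xi for xi, d in zip(x, D) if d == 1]
w_f = [wi for wi, d in zip(wage, D) if d == 1]

# Model (1): regress wage on D alone and compare with the group means.
b0_m1, b1_m1 = ols_simple(D, wage)
mean_male, mean_female = mean(w_m), mean(w_f)
print("model (1):", b0_m1, b1_m1)
print("male mean, female minus male:", mean_male, mean_female - mean_male)

# Model (2): one fitted line per group, then map to the four coefficients.
a_m, s_m = ols_simple(x_m, w_m)   # male line: intercept, slope
a_f, s_f = ols_simple(x_f, w_f)   # female line: intercept, slope
beta0, beta2 = a_m, s_m           # base-group intercept and slope
beta1 = a_f - a_m                 # difference in intercepts
beta3 = s_f - s_m                 # difference in slopes
print("model (2):", beta0, beta1, beta2, beta3)
```

With real data the fitted lines would not pass through every point, but the mapping from the two group-specific lines to the four coefficients stays the same.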
In this context there is a very important F test, called the Chow test, which is concerned with a particular null hypothesis

$H_0: \beta_1 = 0, \beta_3 = 0$ (3)

The meaning of this null hypothesis is _____________________________________
The restricted regression is ______________________________________________
The Chow test is ______________________________________________________

What should we do if the null hypothesis is not rejected? What should we do if the null hypothesis is rejected?

When the $x$ variable itself is a dummy, the regression becomes very interesting. In this case, $\beta_3$ in the regression below is called the difference-in-difference estimator (the coefficient of the interaction term of two dummies).

$Y = \beta_0 + \beta_1 D_1 + \beta_2 D_2 + \beta_3 (D_1 \times D_2) + u$ (4)

Exercise: How to interpret $\beta_3$? To fix ideas, let $Y$ be wage, $D_1$ be the female dummy, and $D_2$ be the married dummy. Let's discuss:
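As a starting point for the discussion, the two calculations in this part of the notes can be sketched numerically. The Python block below uses made-up data; every number and every name in it (`ols_simple`, `ssr`, `cells`) is an illustrative assumption. Part (a) computes the F statistic for the null hypothesis (3) from the restricted and unrestricted sums of squared residuals, using the fact that the unrestricted SSR of regression (2) equals the sum of the SSRs from separate regressions for each group. Part (b) recovers the coefficients of regression (4) from the four cell means, which is possible because (4) has one parameter per cell.

```python
from statistics import mean

def ols_simple(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def ssr(x, y):
    """Sum of squared residuals from the simple regression of y on x."""
    a, b = ols_simple(x, y)
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

# (a) F statistic for H0: beta1 = 0, beta3 = 0. Made-up data.
D = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y = [10.1, 11.9, 14.2, 15.8, 18.0, 8.2, 8.9, 10.1, 10.8, 12.0]

x0 = [xi for xi, d in zip(x, D) if d == 0]
y0 = [yi for yi, d in zip(y, D) if d == 0]
x1 = [xi for xi, d in zip(x, D) if d == 1]
y1 = [yi for yi, d in zip(y, D) if d == 1]

ssr_r = ssr(x, y)                  # restricted: one line for everyone
ssr_ur = ssr(x0, y0) + ssr(x1, y1) # unrestricted: one line per group
n, k, q = len(y), 4, 2             # k parameters in (2), q restrictions
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k))
# Under H0 and the usual assumptions (including equal error variance in
# both groups), F is compared with the F(q, n - k) critical value.
print("SSR_r, SSR_ur, F:", ssr_r, ssr_ur, F)

# (b) Coefficients of (4) from cell means. Keys are (D1, D2) with
# D1 = female and D2 = married; the wages are made up.
cells = {
    (0, 0): [10.0, 12.0],  # single men (base group)
    (0, 1): [14.0, 16.0],  # married men
    (1, 0): [9.0, 11.0],   # single women
    (1, 1): [10.0, 12.0],  # married women
}
m = {key: mean(vals) for key, vals in cells.items()}
b0 = m[(0, 0)]
b1 = m[(1, 0)] - m[(0, 0)]   # female gap among singles
b2 = m[(0, 1)] - m[(0, 0)]   # marriage gap among men
# Marriage gap among women minus marriage gap among men.
b3 = (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])
print("b0, b1, b2, b3:", b0, b1, b2, b3)
```

In part (b), $\beta_3$ is literally a difference of two differences, which is where the name of the estimator comes from.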
Some variables look numerical, but they are not. Two examples:

$x_1$ = 1 if taking bus; 2 if taking subway; 3 if driving
$x_2$ = 1 if failing expectation; 2 if meeting expectation; 3 if exceeding expectation

The values of $x_1$ have no numerical meaning or ordering. The values of $x_2$ have no numerical meaning but do have an ordering; $x_2$ is called an ordinal variable. Using such a variable directly as a regressor would force the effect on $Y$ of a change from 1 to 2 to be the same as the effect of a change from 2 to 3, which we do not believe. So you cannot use $x_1$ and $x_2$ directly as regressors, since they are not numerical.

Instead, we need to transform $x_1$ and $x_2$ into a set of dummy variables, and use those dummy variables as regressors. For instance, for $x_1$ we may define two dummy variables

$D_1$ = 1 if taking bus (that is, $x_1 = 1$); 0 otherwise
$D_2$ = 1 if taking subway (that is, $x_1 = 2$); 0 otherwise

We do not need to define a dummy for driving (the last group). The intercept term can represent it (the base group). We will fall into the dummy variable trap if we define three dummies for three groups and use them all along with the intercept term. The dummy variable trap is caused by perfect multicollinearity. Stata will automatically drop one of the dummies for you.
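The transformation and the trap can be seen in a few lines of Python. This is a sketch on a made-up sample of $x_1$ values; the sample and the variable names are assumptions for illustration. It builds $D_1$ and $D_2$ with driving as the base group, then shows why adding a third dummy alongside the intercept is redundant: in every row the three dummies sum to 1, which is exactly the intercept column.

```python
# Made-up sample of the commuting variable x1 (1 = bus, 2 = subway, 3 = driving).
x1 = [1, 3, 2, 2, 1, 3, 3]

# Two dummies; driving (x1 = 3) is the base group carried by the intercept.
D1 = [1 if v == 1 else 0 for v in x1]   # taking bus
D2 = [1 if v == 2 else 0 for v in x1]   # taking subway

# Rows of the regressor matrix: intercept, D1, D2.
X = [[1, d1, d2] for d1, d2 in zip(D1, D2)]
for row in X:
    print(row)

# A third dummy for driving would be redundant once the intercept is included:
D3 = [1 if v == 3 else 0 for v in x1]
sums_to_intercept = all(d1 + d2 + d3 == 1 for d1, d2, d3 in zip(D1, D2, D3))
print("D1 + D2 + D3 equals the intercept column in every row:", sums_to_intercept)
```

Because one column is an exact linear combination of the others, the regression coefficients are not separately identified, which is the perfect multicollinearity described above.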