Introduction to Analyzing Qualitative Factors with Dummy Variables, Exams of Economics

Boston College (BC), Economics

An introduction to multiple regression analysis with dummy variables, which encode qualitative (categorical) information. It explains how to generate dummy variables and use them to bring qualitative information into standard regression models with continuous dependent variables. It also covers how to combine quantitative and qualitative information, test for statistical discrimination, and model interactions involving dummy variables.

Typology: Exams

Pre 2010

Uploaded on 08/30/2009

koofers-user-jzt 🇺🇸


Wooldridge, Introductory Econometrics, 2d ed.

Chapter 7: Multiple regression analysis with binary (dummy) variables

We often consider relationships between observed outcomes and qualitative factors: models in which a continuous dependent variable is related to a number of explanatory factors, some of which are quantitative, and some of which are qualitative. In econometrics, we also consider models of qualitative dependent variables, but we will not explore those models in this course due to time constraints. We can, however, readily evaluate the use of qualitative information in standard regression models with continuous dependent variables.

Qualitative information often arises in terms of some coding, or index, which takes on a number of values: for instance, we may know in which one of the six New England states each of the individuals in our sample resides. The data themselves may be coded with two-letter abbreviations such as “MA”, “RI”, “ME”, etc. How can we use this factor in a regression equation? In the data, state takes on six distinct values. We must create six binary variables, or dummy variables, each of which will refer to one state: that is, that variable will be 1 if the individual comes from that state, and 0 otherwise. We can generate this set of 6 variables easily in Stata with the command tab state, gen(st), which will create 6 new variables in our dataset: st1, st2, ... st6. Each of these variables is a dummy: that is, it contains only 0 or 1 values. If we add up these variables, we get exactly a vector of 1’s, suggesting that we will never want to use all 6 variables


in a regression (since by knowing the values of any 5...). We may also find the proportions of each state’s citizens in our sample very easily: summ st* will give the descriptive statistics of all 6 variables, and the mean of each dummy is the sample proportion living in that state.

How can we use these dummy variables? Say that we wanted to know whether incomes differed significantly across the 6-state region. What if we regressed income on any five of these dummies?

$$\mathit{income} = \beta_0 + \beta_1 \mathit{st}_1 + \beta_2 \mathit{st}_2 + \beta_3 \mathit{st}_3 + \beta_4 \mathit{st}_4 + \beta_5 \mathit{st}_5 + u \qquad (1)$$

where I have suppressed the observation subscripts. What are the regression coefficients in this case? $\beta_0$ is the average income in the 6th state, the dummy for which is excluded from the regression. $\beta_1$ is the difference between the average income in state 1 and the average income in state 6. $\beta_2$ is the difference between the average income in state 2 and the average income in state 6, and so on. What is the ordinary “ANOVA F” in this context, that is, the test that all the slopes are equal to zero? Precisely the test of the null hypothesis:

$$H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0 \qquad (2)$$

versus the alternative that not all six of the state means are the same value. It turns out that we can test this same hypothesis by excluding any one of the dummies, and including the remaining five in the regression. The coefficients will differ, but the value of the ANOVA F will be identical for any of these regressions. In fact, this regression is an example of “classical one-way ANOVA”–testing whether a qualitative factor (in this
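To make the arithmetic behind these dummy variables concrete, here is a minimal Python sketch (Python is used purely for illustration; the notes themselves work in Stata). The twelve observations are invented, and the names sample, st, proportion and b are assumptions of this sketch, not course data. It numbers the dummies in sorted order of the state codes, which is how tab state, gen(st) is assumed to number them here. It then confirms that the dummies sum to one for every observation, that each dummy's mean is a sample proportion, and that the intercept and slopes described above (the excluded state's mean income and the differences from it) satisfy the least-squares normal equations.

```python
# Hypothetical data: (two-letter state code, income in $1000s). Values are invented.
sample = [
    ("CT", 70.0), ("CT", 74.0),
    ("MA", 66.0), ("MA", 62.0), ("MA", 64.0),
    ("ME", 45.0), ("ME", 47.0),
    ("NH", 58.0),
    ("RI", 52.0), ("RI", 56.0),
    ("VT", 49.0), ("VT", 51.0),
]

states = sorted({s for s, _ in sample})   # CT, MA, ME, NH, RI, VT -> st1..st6
income = [y for _, y in sample]
n = len(sample)

# One 0/1 column per state, numbered 1..6 in sorted order of the codes.
st = {j + 1: [1 if s == name else 0 for s, _ in sample]
      for j, name in enumerate(states)}

# The six dummies sum to exactly 1 for every observation.
row_sums = [sum(st[j][i] for j in st) for i in range(n)]

# The mean of each dummy is the sample proportion living in that state.
proportion = {j: sum(col) / n for j, col in st.items()}

# Mean income by state.
mean = {j: sum(y for y, d in zip(income, st[j]) if d) / sum(st[j]) for j in st}

# Regression of income on st1..st5 (st6 excluded): the intercept is the mean
# of state 6 and each slope is that state's mean minus the state-6 mean.
b0 = mean[6]
b = {j: mean[j] - mean[6] for j in range(1, 6)}

# Check the OLS normal equations: the residuals are orthogonal to the constant
# and to each included dummy, so these values are the least-squares solution.
resid = [y - b0 - sum(b[j] * st[j][i] for j in b) for i, y in enumerate(income)]
orthogonality = [sum(resid)] + [sum(e * d for e, d in zip(resid, st[j])) for j in b]

print(row_sums)
print(proportion)
print(b0, b)
print(orthogonality)
```

Excluding a different state would change the intercept and the slopes but leave the fitted state means unchanged, which is why the ANOVA F does not depend on which dummy is dropped.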

evaluates that expression and returns true (1) or false (0). The vertical bar (|) is Stata’s OR operator; since every person in the sample lives in one and only one state, we must use OR to phrase the condition that they live in northern New England. But there is another way to generate this dummy, given that we have defined st1...st6 for the regression above. Let’s say that Vermont, New Hampshire and Maine have been coded as st6, st4 and st3, respectively. We may just gen nen = st3+st4+st6, since the sum of mutually exclusive dummies must be another dummy. To check, the resulting nen will have a mean equal to the proportion of the sample that lives in northern New England; the equivalent nes dummy will have a mean equal to the proportion of southern New England residents; and the sum of those two means will of course be 1. We can then run a simplified form of our model as regress inc nen; the ANOVA F statistic for that regression tests the null hypothesis that incomes in northern and southern New England do not differ significantly. Since we have excluded nes, the “slope” coefficient on nen measures the amount by which northern New England income differs from southern New England income; the mean income for southern New England is the constant term. If we want point and interval estimates for those means, we should regress inc nen nes, noc.

Regression with continuous and dummy variables

In the above examples, we have estimated “pure ANOVA” models: regression models in which all of the explanatory variables are dummies. In econometric research, we often want to combine quantitative and qualitative information, including some regressors that are measurable and others that are dummies. Consider the simplest example: we have data on individuals’ wages, years of education, and their gender. We could create two gender dummies, male and female, but we will only need one in the analysis: say, female. We create this variable as gen female = (gender=="F"). We can then estimate the model:

$$\mathit{wage} = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{female} + u \qquad (3)$$

The constant term in this model now becomes the wage for a male with zero years of education. Male wages are predicted as $b_0 + b_1 \mathit{educ}$, while female wages are predicted as $b_0 + b_1 \mathit{educ} + b_2$. The gender differential is thus $b_2$. How would we test for the existence of “statistical discrimination”, that is, that females with the same qualifications are paid a lower wage? This would be $H_0: \beta_2 < 0$. The $t$ statistic for $b_2$ will provide us with this hypothesis test. What is this model saying about wage structure? Wages are a linear function of the years of education. If $b_2$ is significantly different from zero, then there are two “wage profiles”: parallel lines in {educ, wage} space, each with a slope of $b_1$, with their intercepts differing by $b_2$.

What if we wanted to expand this model to consider the possibility that wages differ by both gender and race? Say that each worker is classified as race=white or race=black. Then we could gen black = (race=="black") to create the dummy variable, and add it to (3). What, now, is the constant term? The wage for a white male with zero years of education. Is
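As a rough illustration of the two parallel wage profiles implied by equation (3), the sketch below evaluates the fitted equation for men and women at several education levels. The coefficient values are invented for the example (they are not estimates from any dataset), and predicted_wage is a hypothetical helper name.

```python
# Hypothetical estimates of equation (3); these numbers are made up.
b0, b1, b2 = 2.0, 0.5, -1.5   # intercept, slope on educ, shift for female

def predicted_wage(educ, female):
    """Fitted wage from wage = b0 + b1*educ + b2*female, with female coded 0/1."""
    return b0 + b1 * educ + b2 * female

# The female-minus-male gap is the same at every education level,
# so the two wage profiles are parallel lines whose intercepts differ by b2.
gaps = [predicted_wage(e, 1) - predicted_wage(e, 0) for e in (0, 8, 12, 16)]
print(gaps)
```

Because female enters only as an intercept shift, nothing in this specification lets the return to education differ by gender.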

where we include any 5 of the 6 st variables designating the New England states. The test that wage levels differ significantly due to state of residence is the joint test that $\beta_j = 0$ for each of the five included state dummies. A judgment concerning the relevance of state of residence should be made on the basis of this joint test (an F-test with 5 numerator degrees of freedom). Note that if the dependent variable were measured in log form, the coefficients on dummies, multiplied by 100, would be interpreted as approximate percentage changes; if (5) were respecified to place log(wage) as the dependent variable, the $b_1$ coefficient would measure the percentage return to education (by how many percent the wage changes for each additional year of education), while the $b_2$ coefficient would measure the (approximate) percentage difference in wage levels between females and males, ceteris paribus. Each state dummy’s coefficient would, likewise, measure the percentage difference in wage levels between that state and the excluded state (number 6).

We must be careful when working with variables that have an ordinal interpretation, and are thus coded in numeric form, to treat them as ordinal. For instance, if we model the interest rate corporations must pay to borrow (corprt) as a function of their credit rating, we consider that Moody’s and Standard and Poor’s assign credit ratings somewhat like grades: AAA, AA, A, BAA, BA, B, C, et cetera. Those could be coded as 1,2,...,7. Just as we can agree that an “A” grade is better than a “B”, a triple-A bond rating results in a lower borrowing cost than a double-A rating. But while GPAs are measured on a clear four-point scale, the bond ratings are merely ordinal, or ordered: everyone

agrees on the rating scale, but the differential between AA borrowers’ rates and A borrowers’ rates might be much smaller than that between B and C borrowers’ rates: this is especially the case if C denotes “below investment grade”, which will reduce the market for such bonds. Thus, although we might have a numeric index corresponding to AAA...C, we should not assume that $\partial \mathit{corprt} / \partial \mathit{index}$ is constant; we should not treat index as a cardinal measure. Clearly, the appropriate way to proceed is to create dummy variables for each rating class, and include all but one of those variables in a regression of corprt on bond rating and other relevant factors. For instance, if we leave out the AAA dummy, all of the ratings class dummies’ coefficients will then measure the degree to which those borrowers’ bonds bear higher rates than those of AAA borrowers. But we could just as well leave out the C rating class dummy, and measure the effects of ratings classes relative to the worst credits’ cost of borrowing.

Interactions involving dummy variables

Just as continuous variables may be interacted in regression equations, so can dummy variables. We might, for instance, have one set of dummies indicating the gender of respondents (female) and another set indicating their marital status (married). We could regress lwage on these two dummies:

$$\mathit{lwage} = b_0 + b_1 \mathit{female} + b_2 \mathit{married} + u$$

which gives rise to the following classification of mean wages, conditional on the two factors (which is thus a classic “two-way ANOVA” setup):
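A minimal sketch of that classification, assuming the additive specification lwage = b0 + b1 female + b2 married with no interaction term: each of the four gender-by-marital-status cells gets the mean implied by the coefficients. The coefficient values are invented for illustration and the name cells is an assumption of this example.

```python
# Hypothetical coefficients for lwage = b0 + b1*female + b2*married + u (invented values).
b0, b1, b2 = 2.5, -0.25, 0.125

# Mean log wage implied for each (female, married) cell of the two-way layout.
cells = {(female, married): b0 + b1 * female + b2 * married
         for female in (0, 1) for married in (0, 1)}

for (female, married), mean_lwage in sorted(cells.items()):
    print(f"female={female} married={married} mean lwage={mean_lwage}")

# Without an interaction term, the marriage differential is the same for men and women.
marriage_gap_men = cells[(0, 1)] - cells[(0, 0)]
marriage_gap_women = cells[(1, 1)] - cells[(1, 0)]
print(marriage_gap_men, marriage_gap_women)
```

The excluded class (unmarried males) has mean b0, and every other cell is obtained by adding the relevant dummy coefficients to it.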

with two-way ANOVA (considering two factors’ effects), imagine that instead of marital status we consider race = {white, Black, Asian}. To run the model without interactions, we would include two of these dummies in the regression, say Black and Asian; the constant term would be the mean wage of a white male (the excluded class). What if we wanted to include interactions? Then we would define fBlack and fAsian (the female dummy multiplied by Black and by Asian, respectively), and include those two regressors as well. The test for the significance of interactions is now a joint test that these two coefficients are zero.

A second extension of the interaction concept is far more important: what if we want to consider a regular regression, on quantitative variables, but want to allow for different slopes for different categories of observations? Then we create interaction effects between the dummies that define those categories and the measured variables. For instance,

$$\mathit{lwage} = b_0 + b_1 \mathit{female} + b_2 \mathit{educ} + b_3 (\mathit{female} \times \mathit{educ}) + u$$

Here, we are in essence estimating two separate regressions in one: a regression for males, with an intercept of $b_0$ and a slope of $b_2$, and a regression for females, with an intercept of $(b_0 + b_1)$ and a slope of $(b_2 + b_3)$. Why would we want to do this? We could clearly estimate the two separate regressions, but if we did that, we could not conduct any tests across them (e.g. do males and females have the same intercept? The same slope?). If we use interacted dummies, we can run one regression, and test all of the special cases of this model which are nested within: that the slopes are the same, that the intercepts are the same, and the “pooled” case

in which we need not distinguish between males and females. Since each of these special cases merely involves restrictions on this general form, we can run this equation and then just conduct the appropriate tests. If we extended this logic to include race, as defined above, as an additional factor, we would include two of the race dummies (say, Black and Asian) and interact each with educ. This would be a model without interactions between gender and race, where the effects of the two factors are considered to be independent, but it would allow us to estimate different regression lines for each combination of gender and race, and test for the importance of each factor.

These interaction methods are often used to test hypotheses about the importance of a qualitative factor: for instance, in a sample of companies from which we are estimating their profitability, we may want to distinguish between companies in different industries, or companies that underwent a significant merger, or companies that were formed within the last decade, and evaluate whether their expenditures on R&D or advertising have the same effects across those categories. All of the necessary tests involving dummy variables and interacted dummy variables may be easily specified and computed, since models without interacted dummies (or without certain dummies in any form) are merely restricted forms of more general models in which they appear. Thus, the standard “subset F” testing strategy that we have discussed for the testing of joint hypotheses on the coefficient vector may be readily applied in this context. The text describes how a “Chow test”
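The "two regressions in one" reading of the interacted model, lwage = b0 + b1 female + b2 educ + b3 (female x educ) + u, can be checked numerically. The sketch below uses eight invented observations and a hypothetical helper, simple_ols. It fits separate lines for males and females, maps them to b0 through b3 as described above, and verifies that those coefficients satisfy the normal equations of the single interacted regression. It is a sanity check under these assumptions, not the course's own procedure.

```python
def simple_ols(x, y):
    """Intercept and slope of a one-regressor least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (c - my) for a, c in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical data (invented): years of education, female dummy, log wage.
educ   = [10, 12, 14, 16, 10, 12, 14, 16]
female = [0, 0, 0, 0, 1, 1, 1, 1]
lwage  = [2.0, 2.3, 2.5, 2.8, 1.9, 2.0, 2.2, 2.3]

# Separate regressions of lwage on educ for each gender.
males   = [(e, w) for e, f, w in zip(educ, female, lwage) if f == 0]
females = [(e, w) for e, f, w in zip(educ, female, lwage) if f == 1]
a_m, s_m = simple_ols(*zip(*males))
a_f, s_f = simple_ols(*zip(*females))

# Map the two fitted lines to the coefficients of the interacted model.
b0, b2 = a_m, s_m               # male intercept and slope
b1, b3 = a_f - a_m, s_f - s_m   # female shifts in intercept and slope

# Normal equations of the interacted regression: the residuals must be
# orthogonal to the constant, female, educ, and female*educ.
resid = [w - (b0 + b1 * f + b2 * e + b3 * f * e)
         for e, f, w in zip(educ, female, lwage)]
checks = [
    sum(resid),
    sum(r * f for r, f in zip(resid, female)),
    sum(r * e for r, e in zip(resid, educ)),
    sum(r * f * e for r, f, e in zip(resid, female, educ)),
]
print(b0, b1, b2, b3)
print(checks)
```

The point estimates coincide with those from the two separate regressions; what the single equation adds is the ability to test restrictions such as b1 = 0, b3 = 0, or both jointly.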
