Overcoming Omitted Variables: Selection Bias in Treatment Effects, Schemes and Mind Maps of Economics

This entry covers the econometric problem of selection bias in estimating treatment effects, focusing on omitted variables bias. It reviews various methods for estimating treatment effects, including social experiments, regression models, matching estimators, and instrumental variables. It emphasizes the importance of addressing selection bias in empirical studies and introduces the potential outcomes framework for understanding causality and average causal effects.

Typology: Schemes and Mind Maps

2021/2022

Uploaded on 09/27/2022

kianx


treatment effects

The term ‘treatment effect’ refers to the causal effect of a binary (0–1) variable on an outcome variable of scientific or policy interest. Economics examples include the effects of government programmes and policies, such as those that subsidize training for disadvantaged workers, and the effects of individual choices like college attendance. The principal econometric problem in the estimation of treatment effects is selection bias, which arises from the fact that treated individuals differ from the non-treated for reasons other than treatment status per se. Treatment effects can be estimated using social experiments, regression models, matching estimators, and instrumental variables.

A ‘treatment effect’ is the average causal effect of a binary (0–1) variable on an outcome variable of scientific or policy interest. The term ‘treatment effect’ originates in a medical literature concerned with the causal effects of binary, yes-or-no ‘treatments’, such as an experimental drug or a new surgical procedure. But the term is now used much more generally. The causal effect of a subsidized training programme is probably the most widely analysed treatment effect in economics (see, for example, Ashenfelter, 1978, for one of the first examples, or Heckman and Robb, 1985, for an early survey). Given a data set describing the labour market circumstances of trainees and a non-trainee comparison group, we can compare the earnings of those who did participate in the programme and those who did not. Any empirical study of treatment effects would typically start with such simple comparisons. We might also use regression methods or matching to control for demographic or background characteristics.

In practice, simple comparisons or even regression-adjusted comparisons may provide misleading estimates of causal effects. For example, participants in subsidized training programmes are often observed to earn less than ostensibly comparable controls, even after adjusting for observed differences (see, for example, Ashenfelter and Card, 1985). This may reflect some sort of omitted variables bias, that is, a bias arising from unobserved and uncontrolled differences in earnings potential between the two groups being compared. In general, omitted variables bias (also known as selection bias) is the most serious econometric concern that arises in the estimation of treatment effects. The link between omitted variables bias, causality, and treatment effects can be seen most clearly using the potential-outcomes framework.

Causality and potential outcomes

The notion of a causal effect can be made more precise using a conceptual framework that postulates a set of potential outcomes that could be observed in alternative states of the world. Originally introduced by statisticians in the 1920s as a way to discuss treatment effects in randomized experiments, the potential outcomes framework has become the conceptual workhorse for non-experimental as well as experimental studies in many fields (see Holland, 1986, for a survey and Rubin, 1974; 1977, for influential early contributions). Potential outcomes models are essentially the same as the econometric switching regressions model (Quandt, 1958), though the latter is usually tied to a linear regression framework. Heckman (1976; 1979) developed simple two-step estimators for this model.

Average causal effects

Except in the realm of science fiction, where parallel universes are sometimes imagined to be observable, it is impossible to measure causal effects at the individual level. Researchers therefore focus on average causal effects. To make the idea of an average causal effect concrete, suppose again that we are interested in the effects of a training programme on the post-training earnings of trainees. Let $Y_{1i}$ denote the potential earnings of individual $i$ if he were to receive training and let $Y_{0i}$ denote the potential earnings of individual $i$ if not. Denote training status by a dummy variable, $D_i$. For each individual, we observe $Y_i = Y_{0i} + D_i (Y_{1i} - Y_{0i})$, that is, we observe $Y_{1i}$ for trainees and $Y_{0i}$ for everyone else. Let $E[\cdot]$ denote the mathematical expectation operator, i.e., the population average of a random variable. For continuous random variables, $E[Y_i] = \int y f(y)\,dy$, where $f(y)$ is the density of $Y_i$. By the law of large numbers, sample averages converge to population averages, so we can think of $E[\cdot]$ as giving the sample average in very large samples. The two most widely studied
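The potential-outcomes notation can be made concrete with a small simulation. The sketch below is illustrative only: the earnings process, the constant effect `TRUE_EFFECT`, and the enrolment rule are all hypothetical assumptions, not taken from any study cited here. It builds $Y_{0i}$ and $Y_{1i}$ for every individual, forms the observed outcome $Y_i = Y_{0i} + D_i (Y_{1i} - Y_{0i})$, and compares a self-selected sample with a randomized one.

```python
import random

random.seed(0)
N = 100_000
TRUE_EFFECT = 1.0  # hypothetical constant gain from training, Y1i - Y0i


def avg(values):
    values = list(values)
    return sum(values) / len(values)


def diff_in_means(y, d):
    """Mean of y among the treated minus mean of y among the untreated."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    controls = [yi for yi, di in zip(y, d) if di == 0]
    return avg(treated) - avg(controls)


y0, y1, d_self, d_rand = [], [], [], []
for _ in range(N):
    ability = random.gauss(0, 1)  # unobserved earnings potential
    y0_i = 10 + 2 * ability + random.gauss(0, 1)
    y0.append(y0_i)
    y1.append(y0_i + TRUE_EFFECT)
    # Self-selection: people with low earnings potential tend to enrol.
    d_self.append(1 if ability + random.gauss(0, 1) < 0 else 0)
    # Social experiment: a coin flip decides who is trained.
    d_rand.append(1 if random.random() < 0.5 else 0)


def observed(d):
    """Observed outcome Yi = Y0i + Di (Y1i - Y0i)."""
    return [a + di * (b - a) for a, b, di in zip(y0, y1, d)]


naive = diff_in_means(observed(d_self), d_self)
experimental = diff_in_means(observed(d_rand), d_rand)

# The naive contrast splits into the effect on the treated plus the
# difference in Y0i between the treated and the untreated (selection bias).
effect_on_treated = avg(b - a for a, b, di in zip(y0, y1, d_self) if di == 1)
selection_bias = diff_in_means(y0, d_self)

print(f"naive comparison (self-selected): {naive:.3f}")
print(f"effect on treated + selection bias: {effect_on_treated + selection_bias:.3f}")
print(f"randomized comparison: {experimental:.3f}")
```

Under these assumptions the self-selected comparison is negative even though training raises everyone's earnings, which mirrors the pattern of trainees earning less than ostensibly comparable controls. The randomized comparison recovers the assumed effect up to sampling error.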

Regression and matching

Although it is increasingly common for randomized trials to be used to estimate treatment effects, most economic research still uses observational data. In the absence of an experiment, researchers rely on a variety of statistical control strategies and/or natural experiments to reduce omitted variables bias. The most commonly used statistical techniques in this context are regression, matching, and instrumental variables.

Regression estimates of causal effects can be motivated most easily by postulating a constant-effects model, where $Y_{1i} - Y_{0i} = \alpha$ (a constant). The constant-effects assumption is not strictly necessary for regression to estimate an average causal effect, but it simplifies things to postpone a discussion of this point. More importantly, the only source of omitted-variables bias is assumed to come from a vector of observed covariates, $X_i$, that may be correlated with $D_i$. The key assumption that facilitates causal inference (sometimes called an identifying assumption) is that

$$E[Y_{0i} \mid X_i, D_i] = X_i'\beta \tag{1}$$

where $\beta$ is a vector of regression coefficients. This assumption has two parts. First, $Y_{0i}$ (and hence $Y_{1i}$, given the constant-effects assumption) is mean-independent of $D_i$ conditional on $X_i$. Second, the conditional mean function for $Y_{0i}$ given $X_i$ is linear. Given eq. (1), it is straightforward to show that

$$\frac{E\{Y_i (D_i - E[D_i \mid X_i])\}}{E\{D_i (D_i - E[D_i \mid X_i])\}} = \alpha. \tag{2}$$

This is the coefficient on $D_i$ from the population regression of $Y_i$ on $D_i$ and $X_i$ (that is, the regression coefficient in an infinite sample). Again, the law of large numbers ensures that sample regression coefficients estimate this population regression coefficient consistently.

Matching is similar to regression in that it is motivated by the assumption that the only source of omitted variables or selection bias is the set of observed covariates, $X_i$. Unlike regression, however, treatment effects are constructed by matching individuals with the same covariates instead of through a linear model for the effect of covariates. The key identifying assumption is also weaker, in that the effect of covariates on $Y_{0i}$ need not be linear. Instead of (1), the conditional independence assumption becomes

$$E[Y_{ji} \mid X_i, D_i] = E[Y_{ji} \mid X_i], \quad \text{for } j = 0, 1. \tag{3}$$

This implies

$$\begin{aligned} E[Y_{1i} - Y_{0i} \mid D_i = 1] &= E\{E[Y_{1i} \mid X_i, D_i = 1] - E[Y_{0i} \mid X_i, D_i = 1] \mid D_i = 1\} \\ &= E\{E[Y_{1i} \mid X_i, D_i = 1] - E[Y_{0i} \mid X_i, D_i = 0] \mid D_i = 1\} \end{aligned} \tag{4a}$$

and, likewise,

$$E[Y_{1i} - Y_{0i}] = E\{E[Y_{1i} \mid X_i, D_i = 1] - E[Y_{0i} \mid X_i, D_i = 0]\}. \tag{4b}$$

In other words, we can construct ATET or ATE by averaging $X$-specific treatment-control contrasts, and then reweighting these $X$-specific contrasts using the distribution of $X_i$ for the treated (for ATET) or using the marginal distribution of $X_i$ (for ATE). Since these expressions involve observable quantities, it is straightforward to construct consistent estimators from their sample analogs.

The conditional independence assumption that motivates the use of regression and matching is most plausible when researchers have extensive knowledge of the process determining treatment status. An example in this spirit is the Angrist (1998) study of the effect of voluntary military service on the civilian earnings of soldiers after discharge, discussed further below.
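The reweighting of covariate-specific contrasts described above can be sketched in a few lines. The example is a simulation under assumed values: the treatment probabilities in `P_TREAT`, the cell-specific effects in `EFFECT`, and the outcome equation are hypothetical and chosen only so that the two weighting schemes give visibly different answers.

```python
import random
from collections import defaultdict

random.seed(2)
N = 60_000
P_TREAT = {0: 0.2, 1: 0.5, 2: 0.8}  # hypothetical E[Di | Xi = x]
EFFECT = {0: 1.0, 1: 2.0, 2: 3.0}   # hypothetical X-specific causal effects


def avg(values):
    return sum(values) / len(values)


# Outcomes grouped by covariate cell and treatment status.
cells = defaultdict(lambda: {0: [], 1: []})
for _ in range(N):
    x = random.choice([0, 1, 2])
    d = 1 if random.random() < P_TREAT[x] else 0
    # Y0i is nonlinear in x, which matching allows; it depends on Di only through x.
    y0 = 10 + x * x + random.gauss(0, 1)
    cells[x][d].append(y0 + EFFECT[x] * d)

# X-specific treatment-control contrasts.
contrast = {x: avg(c[1]) - avg(c[0]) for x, c in cells.items()}
n_treated = {x: len(c[1]) for x, c in cells.items()}
n_all = {x: len(c[0]) + len(c[1]) for x, c in cells.items()}

# ATET: weight contrasts by the distribution of Xi among the treated.
atet_hat = sum(contrast[x] * n_treated[x] for x in contrast) / sum(n_treated.values())
# ATE: weight contrasts by the marginal distribution of Xi.
ate_hat = sum(contrast[x] * n_all[x] for x in contrast) / sum(n_all.values())

print(f"matching estimate of ATET: {atet_hat:.3f}")
print(f"matching estimate of ATE:  {ate_hat:.3f}")
```

With these assumed values the population ATE is 2.0 and the population ATET is 2.4, because cells with larger effects are also more likely to be treated. The estimates should land near those figures, up to sampling error.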

Regression and matching details

In practice, regression estimates can be understood as a type of weighted matching estimator. If, for example, $E[D_i \mid X_i]$ is a linear function of $X_i$ (as it might be if the covariates are all discrete), then it is possible to show that eq. (2) is equivalent to a matching estimator that weights cell-by-cell treatment-control contrasts by the conditional variance of treatment in each cell (Angrist, 1998). This equivalence highlights the fact that the most important econometric issue in a study that relies on conditional independence assumptions to identify causal effects is the validity of these conditional independence assumptions, not whether regression or matching is used to implement them.

A computational difficulty that sometimes arises in matching models is how to find good matches for each possible value of the covariates when the covariates take on many values. For

non-white veterans earn $2,449 more than non-veterans, controlling for covariates reduces this to $840.

Table 1 Matching and regression estimates of the effects of voluntary military service in the United States

Race          Average earnings   Differences   Matching    Regression   Regression minus
              in 1988–91         in means      estimates   estimates    matching
              (1)                (2)           (3)         (4)          (5)
Whites        14,537             1,233.                    −88.
                                 (60.3)        (70.5)      (62.5)       (28.5)
Non-whites    11,664             2,449.                    1,074.
                                 (47.4)        (62.7)      (50.7)       (32.5)

Notes: Figures are in nominal US dollars. The table shows estimates of the effect of voluntary military service on the 1988–91 Social Security-taxable earnings of men who applied to enter the armed forces during 1979–82. The matching and regression estimates control for applicants’ year of birth, education at the time of application, and Armed Forces Qualification Test (AFQT) score. There are 128,968 whites and 175,262 non-whites in the sample. Standard errors are reported in parentheses.

Source: Adapted from Angrist (1998, Tables II and V).

Table 1 also shows regression estimates of the effect of voluntary service, with the same covariates used in the matching estimates controlled for. These are estimates of $\alpha_r$ in the equation

$$Y_i = \sum_X d_{iX} \beta_X + \alpha_r D_i + e_i,$$

where $d_{iX}$ is a dummy variable indicating $X_i = X$, $\beta_X$ is a regression-effect for $X_i = X$, and $\alpha_r$ is the regression parameter. This corresponds to a saturated model for discrete $X_i$. The regression estimates are larger than (and significantly different from) the matching estimates. But the regression and matching estimates are not very different economically, both pointing to a small earnings loss for white veterans and a modest gain for non-whites.
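The reason regression and matching estimates can differ even with identical controls is the weighting: the saturated regression weights cell contrasts by the conditional variance of treatment, while matching for ATET weights them by the number treated. The following sketch checks this numerically on simulated data. All parameter values (`P_TREAT`, `EFFECT`, the outcome equation) are hypothetical. The regression coefficient is computed by partialling out the cell dummies (the Frisch-Waugh result), which gives the sample analog of eq. (2), rather than by calling a regression library.

```python
import math
import random
from collections import defaultdict

random.seed(5)
N = 60_000
P_TREAT = {0: 0.1, 1: 0.3, 2: 0.5}  # hypothetical E[Di | Xi = x]
EFFECT = {0: 3.0, 1: 2.0, 2: 1.0}   # hypothetical X-specific causal effects

data = []
for _ in range(N):
    x = random.choice([0, 1, 2])
    d = 1 if random.random() < P_TREAT[x] else 0
    data.append((x, d, 10 + 5 * x + EFFECT[x] * d + random.gauss(0, 1)))

n = defaultdict(int)                     # cell sizes
n1 = defaultdict(int)                    # number treated per cell
sum_y = defaultdict(lambda: [0.0, 0.0])  # outcome totals by cell and treatment status
for x, d, y in data:
    n[x] += 1
    n1[x] += d
    sum_y[x][d] += y

p = {x: n1[x] / n[x] for x in n}  # sample analog of E[Di | Xi = x]
contrast = {x: sum_y[x][1] / n1[x] - sum_y[x][0] / (n[x] - n1[x]) for x in n}

# Coefficient on Di in the saturated regression of Yi on cell dummies and Di.
# Residualizing Di on the dummies leaves Di - p[x], so this is eq. (2) in the sample.
alpha_r = (sum(y * (d - p[x]) for x, d, y in data)
           / sum(d * (d - p[x]) for x, d, y in data))

# Matching with conditional-variance weights n_x * p_x * (1 - p_x).
w = {x: n[x] * p[x] * (1 - p[x]) for x in n}
variance_weighted = sum(w[x] * contrast[x] for x in n) / sum(w.values())

# Matching with treated-count weights, which targets ATET.
atet_matching = sum(n1[x] * contrast[x] for x in n) / sum(n1.values())

print(f"saturated regression:       {alpha_r:.4f}")
print(f"variance-weighted matching: {variance_weighted:.4f}")
print(f"ATET-weighted matching:     {atet_matching:.4f}")
```

The first two numbers agree to floating-point precision, which is the equivalence attributed to Angrist (1998) above. In this assumed design the regression estimate exceeds the ATET matching estimate because the cells with the highest variance of treatment are not the cells where most treated observations sit.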

Instrumental variables estimates of treatment effects

The conditional independence assumption required for regression or matching to identify a treatment effect is often implausible. Many of the necessary control variables are typically unmeasured or simply unknown. Instrumental variables (IV) methods solve the problem of missing or unknown controls, much as a randomized trial also obviates the need for regression or matching. To see how this is possible, begin again with a constant-effects model without covariates, so $Y_{1i} - Y_{0i} = \alpha$. Also, let $Y_{0i} = \beta + \varepsilon_i$, where $\beta \equiv E[Y_{0i}]$. The potential outcomes model can now be written

$$Y_i = \beta + \alpha D_i + \varepsilon_i, \tag{5}$$

where $\alpha$ is the treatment effect of interest. Because $D_i$ is likely to be correlated with $\varepsilon_i$, regression estimates of eq. (5) do not estimate $\alpha$ consistently.

Now suppose that in addition to $Y_i$ and $D_i$ there is a third variable, $Z_i$, that is correlated with $D_i$, but unrelated to $Y_i$ for any other reason. In a constant-effects world, this is equivalent to saying $Y_{0i}$ and $Z_i$ are independent. It therefore follows that

$$E[\varepsilon_i \mid Z_i] = 0, \tag{6}$$

a conditional independence restriction on the relation between $Z_i$ and $Y_{0i}$, instead of between $D_i$ and $Y_{0i}$ as required for regression or matching strategies. The variable $Z_i$ is said to be an IV or just ‘an instrument’ for the causal effect of $D_i$ on $Y_i$. Suppose that $Z_i$ is also a 0–1 variable. Taking expectations of (5) with $Z_i$ switched off and on, we immediately obtain a simple formula for the treatment effect of interest:

$$\frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]} = \alpha. \tag{7}$$

The sample analog of this equation is sometimes called the Wald estimator, since it first appeared in a paper by Wald (1940) on errors-in-variables problems. There are other more complicated IV estimators involving continuous, multi-valued, or multiple instruments. For example, with a multi-valued instrument, we might use the sample analog of $\mathrm{Cov}(Z_i, Y_i)/\mathrm{Cov}(Z_i, D_i)$. This simplifies to the Wald estimator when $Z_i$ is 0–1. The Wald estimator captures the main idea behind most IV estimation strategies since more complicated estimators can usually be written as a linear combination of Wald estimators (Angrist, 1991).
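A short simulation illustrates the Wald estimator and its covariance form. The data-generating process is an assumption made for illustration: `ALPHA` and `BETA` are hypothetical parameters of the constant-effects model, the instrument is a fair coin flip, and treatment take-up depends on the unobserved error so that a simple treated-versus-untreated comparison is biased.

```python
import math
import random

random.seed(3)
N = 200_000
BETA, ALPHA = 1.0, 2.0  # hypothetical intercept and constant treatment effect


def avg(values):
    values = list(values)
    return sum(values) / len(values)


def mean_by(values, group, level):
    """Average of `values` over observations whose `group` equals `level`."""
    return avg(v for v, g in zip(values, group) if g == level)


def cov(a, b):
    mean_a, mean_b = avg(a), avg(b)
    return avg((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


z, d, y = [], [], []
for _ in range(N):
    z_i = 1 if random.random() < 0.5 else 0  # instrument, independent of eps
    eps = random.gauss(0, 1)                 # unobserved part of Y0i
    # Take-up rises with the instrument but also with eps, so Di is endogenous.
    d_i = 1 if z_i + eps + random.gauss(0, 1) > 0.5 else 0
    z.append(z_i)
    d.append(d_i)
    y.append(BETA + ALPHA * d_i + eps)

naive = mean_by(y, d, 1) - mean_by(y, d, 0)
wald = ((mean_by(y, z, 1) - mean_by(y, z, 0))
        / (mean_by(d, z, 1) - mean_by(d, z, 0)))
cov_ratio = cov(z, y) / cov(z, d)

print(f"treated minus untreated: {naive:.3f}")
print(f"Wald estimate:           {wald:.3f}")
print(f"Cov(Z,Y) / Cov(Z,D):     {cov_ratio:.3f}")
```

The Wald estimate should be close to the assumed `ALPHA`, while the naive contrast overstates it. The covariance ratio matches the Wald estimate to floating-point precision because the instrument is binary.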

IV example

To see how IV works in practice, it helps to use an example, in this case the effect of Vietnam-era military service on the earnings of veterans later in life (Angrist, 1990). In the 1960s and

later. In contrast, there is no evidence of an association between eligibility status and earnings in 1969, the year the lottery drawing for men born in 1950 was held but before anyone born in 1950 was actually drafted. Because eligibility status was randomly assigned, the claim that the estimates in column (3) represent the effect of draft eligibility on earnings seems uncontroversial. The only information required to go from draft-eligibility effects to veteran-status effects is the denominator of the Wald estimator, which is the effect of draft-eligibility on the probability of serving in the military. This information is reported in column (4) of Table 2, which shows that draft-eligible men were 0.16 (16 percentage points) more likely to have served in the Vietnam era. For earnings in 1981, long after most Vietnam-era servicemen were discharged from the military, the Wald estimates of the effect of military service amount to about 15 percent of earnings. Effects were even larger in 1970, when affected soldiers were still in the army.

IV with heterogeneous treatment effects

The constant-effects assumption is clearly unrealistic. We’d like to allow for the fact that some men may have benefited from military service while others were undoubtedly hurt by it. In general, however, IV methods fail to capture either ATE or ATET in a model with heterogeneous treatment effects. Intuitively, this is because only a subset of the population is affected by any particular instrumental variable. In the draft lottery example, many men with high lottery numbers volunteered for service anyway (indeed, most Vietnam veterans were volunteers), while many draft-eligible men nevertheless avoided service. The draft lottery instrument is not informative about the effects of military service on men who were unaffected by their draft-eligibility status. On the other hand, there is a sub-population who served solely because they were draft-eligible, but would not have served otherwise. Angrist, Imbens and Rubin (1996) call the population of men whose treatment status can be manipulated by an instrumental variable the set of compliers. This term comes from an analogy to a medical trial with imperfect compliance. Compliers are those who ‘take their medicine’, that is, they serve in the military when draft-eligible but they do not serve otherwise. Under reasonably general assumptions, IV methods can be relied on to capture the effect of treatment on compliers. The average effect for this group is called a local average treatment effect (LATE), and was first discussed by Imbens and Angrist (1994).

A formal description of LATE requires one more bit of notation. Define potential treatment assignments $D_{0i}$ and $D_{1i}$ to be individual $i$’s treatment status when $Z_i$ equals 0 or 1. One of $D_{0i}$ or $D_{1i}$ is counterfactual since observed treatment status is

$$D_i = D_{0i} + Z_i (D_{1i} - D_{0i}).$$

The key identifying assumptions in this setup are (a) conditional independence, that is, that the joint distribution of $\{Y_{1i}, Y_{0i}, D_{1i}, D_{0i}\}$ is independent of $Z_i$; and (b) monotonicity, which requires that either $D_{1i} \geq D_{0i}$ for all $i$ or vice versa. Monotonicity requires that, while the instrument might have no effect on some individuals, all of those who are affected should be affected in the same way (for example, draft eligibility can only make military service more likely, not less). Assume without loss of generality that monotonicity holds with $D_{1i} \geq D_{0i}$. Given these two assumptions, the Wald estimator consistently estimates LATE, written formally as $E[Y_{1i} - Y_{0i} \mid D_{1i} > D_{0i}]$. In the draft lottery example, this is the effect of military service on those veterans who served because they were draft eligible but would not have served otherwise.

In general, LATE compliers are a subset of the treated. An important special case where LATE = ATET is when $D_{0i}$ equals zero for everyone. This happens in a social experiment with imperfect compliance in the treated group and no one treated in the control group.
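The LATE result can be illustrated with a simulated population of compliers, always-takers, and never-takers. Everything in this sketch is assumed for illustration: the type shares, the type-specific effects in `TYPES`, and the outcome noise. Monotonicity holds by construction because no type has $D_{1i} < D_{0i}$.

```python
import random

random.seed(4)
N = 200_000

# Hypothetical types: (D0i, D1i, causal effect Y1i - Y0i).
TYPES = {
    "complier": (0, 1, 1.0),
    "always-taker": (1, 1, 2.0),
    "never-taker": (0, 0, 0.0),
}


def avg(values):
    values = list(values)
    return sum(values) / len(values)


rows = []  # each row: (z, d, y, effect, is_complier)
for _ in range(N):
    u = random.random()
    # Assumed shares: 30% compliers, 20% always-takers, 50% never-takers.
    kind = "complier" if u < 0.3 else "always-taker" if u < 0.5 else "never-taker"
    d0, d1, effect = TYPES[kind]
    z = 1 if random.random() < 0.5 else 0  # randomly assigned instrument
    d = d0 + z * (d1 - d0)                 # Di = D0i + Zi (D1i - D0i)
    y0 = random.gauss(0, 0.5)
    rows.append((z, d, y0 + d * effect, effect, kind == "complier"))


def mean_where(column, z_value):
    """Average of one column over observations with Zi equal to z_value."""
    return avg(r[column] for r in rows if r[0] == z_value)


wald = ((mean_where(2, 1) - mean_where(2, 0))
        / (mean_where(1, 1) - mean_where(1, 0)))
late = avg(r[3] for r in rows if r[4])       # average effect among compliers
ate = avg(r[3] for r in rows)                # average effect in the population
atet = avg(r[3] for r in rows if r[1] == 1)  # average effect among the treated

print(f"Wald estimate: {wald:.3f}")
print(f"LATE: {late:.3f}  ATE: {ate:.3f}  ATET: {atet:.3f}")
```

Under these assumptions the Wald estimate tracks the complier effect of 1.0, not the population average (about 0.7) or the average among the treated (about 1.6). Always-takers and never-takers drop out of both the numerator and the denominator because the instrument does not change their treatment status.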

IV details

Typically, covariates play a role in IV models, either because the IV identification assumptions are more plausible conditional on covariates or because of statistical efficiency gains. Linear IV models with covariates can be estimated most easily by two-stage least squares (2SLS), which can also be used to estimate models with multi-valued, continuous, or multiple instruments. See Angrist and Imbens (1995) or Angrist and Krueger (2001) for details and additional references.
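As a rough sketch of 2SLS with a covariate, the example below runs the two stages by hand on simulated data, using a small normal-equations solver so that it needs only the standard library. The data-generating process is hypothetical: `ALPHA` is an assumed constant effect, and the instrument is made to depend on the covariate so that it is plausibly valid only after conditioning on it. The second-stage standard errors from this manual procedure would not be correct, so only the point estimate is reported.

```python
import random

random.seed(6)
N = 100_000
ALPHA = 2.0  # hypothetical constant treatment effect


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


x, z, d, y = [], [], [], []
for _ in range(N):
    x_i = random.gauss(0, 1)  # observed covariate
    # The instrument is related to the covariate but not to eps.
    z_i = 1 if random.random() < (0.7 if x_i > 0 else 0.3) else 0
    eps = random.gauss(0, 1)
    d_i = 1 if z_i + 0.5 * x_i + eps + random.gauss(0, 1) > 0.5 else 0
    x.append(x_i)
    z.append(z_i)
    d.append(d_i)
    y.append(1 + ALPHA * d_i + 1.5 * x_i + eps)

# First stage: regress Di on a constant, the covariate, and the instrument.
first = ols([[1.0, xi, zi] for xi, zi in zip(x, z)], d)
d_hat = [first[0] + first[1] * xi + first[2] * zi for xi, zi in zip(x, z)]

# Second stage: regress Yi on a constant, the covariate, and fitted Di.
alpha_2sls = ols([[1.0, xi, dh] for xi, dh in zip(x, d_hat)], y)[2]

# Ordinary least squares on actual Di, for comparison.
alpha_ols = ols([[1.0, xi, di] for xi, di in zip(x, d)], y)[2]

print(f"OLS estimate:  {alpha_ols:.3f}")
print(f"2SLS estimate: {alpha_2sls:.3f}")
```

In this assumed design, OLS overstates the effect because take-up is correlated with the error term, while the 2SLS point estimate should sit near `ALPHA`. In applied work a dedicated IV routine is preferable, since it also produces valid standard errors.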

Joshua D. Angrist

See also instrumental variables with weak instruments; matching estimators; regression-

Heckman, J. and Robb, R. 1985. Alternative methods for evaluating the impact of interventions. In Longitudinal Analysis of Labor Market Data, ed. J. Heckman and B. Singer. New York: Cambridge University Press, 156–245.

Holland, P. 1986. Statistics and causal inference. Journal of the American Statistical Association 81, 945–70.

Imbens, G. and Angrist, J. 1994. Identification and estimation of local average treatment effects. Econometrica 62, 467–75.

Quandt, R. 1958. The estimation of the parameters of a linear regression system obeying two separate regimes. Journal of the American Statistical Association 53, 873–80.

Rosenbaum, P. and Rubin, D. 1983. The central role of the propensity score in observational studies for causal effects. Biometrika 70, 41–55.

Rubin, D. 1974. Estimating causal effects of treatments in randomized and non-randomized studies. Journal of Educational Psychology 66, 688–701.

Rubin, D. 1977. Assignment to a treatment group on the basis of a covariate. Journal of Educational Statistics 2, 1–26.

Wald, A. 1940. The fitting of straight lines if both variables are subject to error. Annals of Mathematical Statistics 11, 284–300.