Quantile Treatment Effect Estimation and Identification in Econometrics, Schemes and Mind Maps of Literature

The quantile treatment effect (QTE) and its estimation in econometrics, including the identification of QTE under various assumptions, the use of instrumental variables, and the derivation of the marginal policy relevant quantile treatment effect (MPRQTE). It also covers the moment conditions for identification and the unconfoundedness case.


Marginal Quantile Treatment Effect
Ping Yu
University of Auckland
Started: April 2013
First Version: October 2013
This Version: April 2014
Abstract
This paper studies estimation and inference based on the marginal quantile treatment effect. First, we illustrate the importance of the rank preservation assumption in the quantile treatment effects evaluation, show the identifiability of the marginal quantile treatment effect, and clarify the relationship between the marginal quantile treatment effect and other quantile treatment parameters. Second, we develop sharp bounds for the quantile treatment effect with and without the monotonicity assumption, and also sufficient and necessary conditions for point identification. Third, we estimate the marginal quantile treatment effect and associated quantile treatment effect and integrated quantile treatment effect based on the distribution regression, derive the corresponding weak limits and show the validity of the bootstrap inferences. The inference procedure can be used to construct uniform confidence bands for quantile treatment parameters and test unconfoundedness and stochastic dominance. We also develop goodness of fit tests to choose regressors in the distribution regression. Fourth, we conduct two counterfactual analyses: deriving the transition matrix and developing the relative marginal policy relevant quantile treatment effect parameter under the policy invariance. Fifth, we compare the identification schemes in some important literature with that by the marginal quantile treatment effect, and point out advantages and also weaknesses of each scheme, e.g., Chernozhukov and Hansen (2005) concentrate mainly on the quantile treatment effect with the selection effect but without the essential heterogeneity; Abadie, Angrist and Imbens (2002), Aakvik, Heckman and Vytlacil (2005) and Chernozhukov and Hansen (2006) suffer from some obvious misspecification problems. Meanwhile, an alternative estimator of the local quantile treatment effect is developed and its weak limit is derived. Finally, we apply the estimation methods to the famous return to schooling dataset of Angrist and Krueger (1991) to illustrate the usefulness of the techniques developed in this paper to practitioners.
Keywords: marginal quantile treatment effect, local quantile treatment effect, rank preservation, selection effect, essential heterogeneity, sharp bound, point identification, distribution regression, two-step estimator, Hadamard differentiability, weak limit, uniform confidence band, unconfoundedness, completeness, stochastic dominance, goodness of fit test, transition matrix, relative marginal policy relevant quantile treatment effect, counterfactual analysis, policy invariance, bootstrap validity, return to schooling
JEL-Classification: C12, C13, C14, C21, C26
Email: p.yu@auckland.ac.nz.


1 Introduction

Treatment effect evaluation is one main task of econometric analysis. Most literature concentrates on the average treatment effect evaluation; see Heckman and Vytlacil (2007a,b) for a comprehensive summary. Meanwhile, as illustrated in Heckman (1992), Heckman et al. (1997) and Heckman and Smith (1993, 1998), questions of political economy or "social justice" require knowledge of the distribution of the treatment effect. As a result, distributional treatment effects (especially when unconfoundedness does not hold) become natural parameters of interest among econometricians. Actually, distributional treatment effects have been studied extensively in the empirical literature. For example, Card (1996) uses a panel data set to study the effects of unions on the structure of wages; DiNardo et al. (1996) present a semiparametric procedure to analyze the effects of institutional and labor market factors on changes in the U.S. distribution of wages; Bitler et al. (2006) estimate quantile treatment effects using random-assignment data from Connecticut's Jobs First waiver.

Distributional treatment effects are usually estimated based on quantile regression initiated by Koenker and Bassett (1978) (see Koenker (2005) for an introduction to quantile regression). One related field that has recently attracted much attention is the "general" semiparametric and nonparametric quantile regression with endogeneity. For the semiparametric setups, see, e.g., Hong and Tamer (2003), Honoré and Hu (2004), Ma and Koenker (2006), Lee (2007), Sakata (2007) and Jun (2008) among others. For nonparametric setups, see, e.g., Chesher (2003), Chernozhukov et al. (2007), Horowitz and Lee (2007), Imbens and Newey (2009), Chen and Pouzo (2012), and Gagliardini and Scaillet (2012) among others. However, the main interest of this paper concentrates on the special structure of the treatment model, namely, the endogenous variable is binary.
A key parameter we will develop is the marginal quantile treatment effect (MQTE), which is the counterpart of the marginal treatment effect (MTE) in the average treatment effect estimation. The idea of the MTE was first introduced in the context of a parametric normal generalized Roy model by Björklund and Moffitt (1987), and was analyzed more generally by Heckman (1997). In a choice (or selection, or participation) model with the latent variable structure, Heckman and Vytlacil (1999, 2001a) express the conventional average treatment effect parameters as different weighted averages of the MTE, and also identify the MTE by the local instrumental variable (LIV) estimator. Actually, Heckman and Vytlacil (2007b) use the MTE to unify the econometric literature on the evaluation of social programs, so it is well recognized that the MTE is a convenient tool to organize the nonparametric literature on the average treatment effect evaluation. An embarrassing situation is that the counterpart of the MTE in the quantile treatment effect literature, the MQTE, is yet to be well understood. The purpose of this paper is to integrate the relevant literature on the quantile treatment effect evaluation without unconfoundedness into one framework and provide some useful estimation and inference methods to practitioners based on the MQTE.

There are two strands of literature concerning the distributional treatment effects, and they are intertwined. Before reviewing the relevant literature, we must emphasize that the distributional treatment effects are functionals of the distribution of $Y_1 - Y_0$, which requires the joint distribution of $Y_1$ and $Y_0$, where $Y_1$ and $Y_0$ are the outcomes under the treatment status and the control status, respectively.
As mentioned in Section II.B of Manski (1996) or footnote 5 of Manski (1997), "knowledge of $F(Y_1 - Y_0)$ neither implying nor being implied by knowledge of $F(Y_1)$ and $F(Y_0)$", where $F(X)$ is the cumulative distribution function (CDF) of a random variable $X$. Due to the fundamental problem of causal inference (page 947 of Holland (1986)), $Y_0$ and $Y_1$ cannot be observed simultaneously. As a result, even in a random experiment, the joint distribution $F(Y_1, Y_0)$ or $F(Y_1 - Y_0)$ cannot be identified without further restrictions, although $F(Y_1)$ and $F(Y_0)$ can be identified. On the other hand, marginal distributions $F(Y_1)$ and $F(Y_0)$ are also of interest in econometric analysis. For example, in Atkinson (1970), Sen (1997, 2000), Manski (1996, p. 714),

of literature to identify the (conditional) marginal distributions of potential outcomes. These distributions imply the MQTE, which is also the main objective of this paper, but we do not need the independence assumption. The above-mentioned literature concentrates on cross-sectional data; Athey and Imbens (2006) also use panel data to identify the QTT through what they call the change-in-change approach under the RP condition on the treated. Although these two strands of literature use different identification assumptions, their targets are the same, namely, identifying the joint distribution of $Y_1$ and $Y_0$. This paper can be put in the second strand of literature, i.e., we impose some RP assumptions to identify $F(Y_1, Y_0)$. Consequently, the quantile treatment effect in this paper refers to the difference of quantiles rather than the quantile of differences. Meanwhile, we employ the framework in the first strand of literature to study the difference of quantiles.

The rest of this paper is structured as follows. Section 2 sets up our treatment model, illustrates the importance of the RP assumption in the quantile treatment effect evaluation, shows the identifiability of the MQTE, and clarifies the relationship between the MQTE and other quantile treatment parameters. Section 3 develops sharp bounds and sufficient and necessary conditions for point identification of the QTE with and without the monotonicity assumption. In Section 4, we estimate the MQTE based on the distribution regression introduced by Foresi and Peracchi (1995), derive its weak limit and show the validity of the bootstrap inferences, and we also develop goodness of fit tests to choose regressors. In Section 5, we conduct two counterfactual analyses: deriving the transition matrix and developing the relative marginal policy relevant quantile treatment effect parameter under the policy invariance.
In Section 6, we comment on some key literature in the two strands above, pointing out their weaknesses, underlying assumptions, and interactions with this paper. Section 7 presents an empirical application to the return to schooling and Section 8 concludes. All proofs are contained in an appendix.

Some notations are collected here for future reference. $d$ is always used for indicating the two treatment statuses, so it is not written out explicitly as "$d = 0, 1$" throughout the paper. $\mathrm{supp}(X)$ for a random variable $X$ denotes the support of the distribution of $X$. Both $Q_X(\tau)$ and $Q_\tau(X)$ denote the $\tau$th quantile of a random variable $X$. Capital letters such as $X$ denote random variables and the corresponding lower case letters such as $x$ denote the potential values they may take. For any parameter $\theta$, $d_\theta$ is the dimension of $\theta$. The space $\ell^\infty(\mathcal{F})$ represents the space of real-valued bounded functions defined on the index set $\mathcal{F}$ equipped with the supremum norm $\|\cdot\|_{\ell^\infty(\mathcal{F})}$. $C(\mathcal{Y})$ is the space of continuous functions on $\mathcal{Y}$.

2 The Setup and Parameters of Interest

We use the nonlinear and nonseparable outcome model as in Heckman and Vytlacil (2005),

$$Y_1 = \mu_1(X, U_1), \qquad Y_0 = \mu_0(X, U_0). \tag{1}$$

Actually, the additively separable setup, $Y_d = \mu_d(X) + U_d$, does not lose generality since we can define the new $U_d$ as $Y_d - Q_{Y_d|X}(\tau|X)$ and all our analysis in this paper is conditional on $X$. The distribution of $Y_d$ may be discrete (e.g., employment status), continuous (e.g., wage), or mixed discrete and continuous (e.g., in the national JTPA study 18 month impact sample used in Heckman et al. (1997), a substantial proportion of persons has zero earnings in both distributions of $Y_0$ and $Y_1$). The participation decision is

$$D = 1(\mu_D(X, Z) - V \ge 0), \tag{2}$$

where $Z$ includes the instruments for the choice process. Having both $X$ and $Z$ appear as the arguments of $\mu_D$ does not lose generality since $\mu_D(X, Z)$ may not depend on all elements of $X$. By transforming $\mu_D(X, Z)$ and $V$ by $F_{V|X,Z}$, we can rewrite

$$D = 1(p(X, Z) - U_D \ge 0), \tag{3}$$

where $U_D|X, Z \sim U(0, 1)$ and $p(X, Z)$ is the propensity score. We use these two formulations of $D$ interchangeably throughout the paper. As shown in Vytlacil (2006), there is a larger class of latent index models that will have a representation of this form. Also, this setup of $D$ implies the monotonicity assumption of Imbens and Angrist (1994) as shown in Vytlacil (2002). We impose the following assumptions on the outcome equation and the choice equation.

(A1) $\mu_D(X, Z)$ is a nondegenerate random variable conditional on $X$.
(A2) The random vectors $(U_1, V)$ and $(U_0, V)$ are independent of $Z$ conditional on $X$.
(A3) The distribution of $V$ is absolutely continuous with respect to Lebesgue measure.
(A4) $X_1 = X_0$ almost everywhere, where $X_d$ denotes a value of $X$ if $D$ is set to $d$.
(A5) $1 > P(D = 1|X) > 0$.
(A6) Conditional on $X = x$, $V = v$, $Y_0$ and $Y_1$ have the same rank.

(A1)-(A5) correspond to (A-1)-(A-3), (A-6) and (A-5) in Heckman and Vytlacil (2005), respectively. These assumptions are prevalent in the literature with heterogeneous treatment effects. A necessary condition for (A1) is that $Z$ contains a continuous variable. (A2) allows for both the selection effect ($U_0 \not\perp D \mid X$) and the essential heterogeneity ($(U_1 - U_0) \not\perp D \mid X$). Also, (A2) implies the usual assumption in the control function approach, say, $Z \perp (U_1, U_0) \mid (X, V)$. (A1)-(A5), combined with (1) and (2), impose testable restrictions on the distribution of $(Y, D, Z, X)$; see Heckman and Vytlacil (2005) (page 678) for the index sufficiency restriction and the monotonicity restriction. We refer to Heckman and Vytlacil (2005) for more detailed discussions on (A1)-(A5). The assumption (A6) deserves further examination.
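The normalization that takes the latent index form (2) into the propensity score form (3) can be checked numerically. The sketch below is only an illustration under assumptions that are not in the paper: a hypothetical index `mu_d(z) = 0.8 * z`, no covariates $X$, and $V$ standard normal and independent of $Z$. It verifies draw by draw that $1(\mu_D - V \ge 0)$ and $1(p - U_D \ge 0)$ agree once $U_D = F_V(V)$ and $p = F_V(\mu_D)$, and that $U_D$ behaves like a uniform variable.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed distribution of V (illustrative choice)


def mu_d(z):
    """Hypothetical index function mu_D(z); not taken from the paper."""
    return 0.8 * z


def simulate(n=10_000, seed=0):
    """Return the share of draws where forms (2) and (3) agree, and the mean of U_D."""
    rng = random.Random(seed)
    agree = 0
    u_total = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        d_index = mu_d(z) - v >= 0          # form (2): latent index
        u_d = STD_NORMAL.cdf(v)             # U_D = F_V(V)
        p = STD_NORMAL.cdf(mu_d(z))         # propensity score p(z) = F_V(mu_D(z))
        d_score = p - u_d >= 0              # form (3): propensity score
        agree += int(d_index == d_score)
        u_total += u_d
    return agree / n, u_total / n


agreement_rate, mean_u = simulate()
print(agreement_rate, mean_u)
```

Because $F_V$ is monotone, the two indicators coincide, which is why the two formulations of $D$ can be used interchangeably.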

2.1 The Rank Preservation Condition

The key extra assumption beyond those in Heckman and Vytlacil (2005) is the RP condition (A6). Chernozhukov and Hansen (2005) state the RP assumption via the Skorohod representation. We try to do the same thing here although, unlike in their paper, this representation is not essential for the development of our identification scheme. Suppose $Y_d$ is continuous, and the $\tau$th conditional quantile of $Y_d$ given $X$ and $V$ is $q(d, X, V, \tau)$; then we can represent

$$Y_d = q(d, X, V, R_d)$$

by the Skorohod representation, where $R_d|(X, V) \sim U(0, 1)$ is the rank variable which represents some unobserved characteristic of $Y_d$, e.g., ability or proneness, among the slice of people with a specific value of $X$ and $V$. The RP assumption (A6) can be restated as $R_1|(X, V) = R_0|(X, V)$. We now clarify two key points of the Skorohod representation. First, the Skorohod representation decomposes the information in $U_d$ of (1) into two components: the value information and the rank information. The former is incorporated in the quantile function $q(\cdot)$ and the latter is included in $R_d$. Second, because $R_d|(X, V) \sim U(0, 1)$ does not depend on $(X, V)$, it may be suspected that $R_d$ is independent of $(X, V)$. This is incorrect. This mistake is immediately clear if we rewrite $Y_d = q(d, X, V, R_d(X, V))$; in other words, $R_d$ must be understood as a conditional random variable. Suppose there are $N$ distinct points on the support of $(X, V)$; then there are $N$ rank variables $R_d(X, V)$. Although $R_d(X, V)|(X = x, V = v) \sim U(0, 1)$ does not depend on $(x, v)$, the unconditional random variable $R_d$ may depend on $(X, V)$. The RP condition does not restrict the dependence between $R_d$ and $(X, V)$; rather, it restricts the total number of conditional rank variables

which implies that the joint distribution of $Y_1$ and $Y_0$ given $X = x$, $U_D = u_D$ is degenerate. To see what this joint distribution looks like, suppose $Y_d|(X = x, U_D = u_D)$ is continuously distributed and $\mathrm{supp}(Y_d|X = x, U_D = u_D) = [0, 1]$ to simplify the discussion. It turns out that there is probability only on the line

$$\left(y_0,\; F^{-1}_{Y_1|X,U_D}\left(F_{Y_0|X,U_D}(y_0|x, u_D)\,\middle|\,x, u_D\right)\right)$$

with $y_0 \in [0, 1]$. In other words, $(Y_0, Y_1)$ can occur simultaneously only on the Q-Q plot. An implication of this result is that if $F_{Y_0|X,U_D}(\cdot|x, u_D)$ is the same as $F_{Y_1|X,U_D}(\cdot|x, u_D)$, then the correlation between $Y_0$ and $Y_1$ conditional on $X = x$, $U_D = u_D$ must be 1. Figure 2 shows a typical Q-Q plot of $(Y_0, Y_1)$ conditional on $X = x$, $U_D = u_D$. In Figure 2, $P(Y_1 \ge Y_0|Y_0 = y_0, X = x, U_D = u_D) = 1$ when $y_0 \le 0.6$ and $P(Y_1 \ge Y_0|Y_0 = y_0, X = x, U_D = u_D) = 0$ when $y_0 > 0.6$. In other words, for the slice of people with $Y_0 = y_0$, $X = x$, $U_D = u_D$, the participant always benefits as long as $y_0 \le 0.6$, and vice versa. Nevertheless, it is more likely that $P(Y_1 \ge Y_0|Y_0 = y_0, X = x) \in (0, 1)$, $P(Y_1 \ge Y_0|X = x, U_D = u_D) = F_{Y_0|X,U_D}(0.6|x, u_D) \in (0, 1)$ and

$$P(Y_1 \ge Y_0|X = x) = \int P(Y_1 \ge Y_0|X = x, U_D = u_D)\,du_D \in (0, 1).$$

Figure 2: Q-Q Plot of $(Y_0, Y_1)$ Conditional on $X = x$, $U_D = u_D$
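The degenerate joint distribution under rank preservation can be illustrated with a small sketch. The two conditional distributions below are hypothetical choices made only for illustration (a uniform $Y_0$ and an affine quantile function for $Y_1$ that crosses the 45-degree line at 0.6); they are not the distributions behind Figure 2. Under the RP condition, $Y_1$ is pinned down by the rank of $Y_0$ through the Q-Q map $y_0 \mapsto Q_{Y_1}(F_{Y_0}(y_0))$.

```python
def cdf_y0(y0):
    """Hypothetical conditional CDF of Y0 given (x, u_D): uniform on [0, 1]."""
    return min(max(y0, 0.0), 1.0)


def quantile_y1(rank):
    """Hypothetical conditional quantile function of Y1 given (x, u_D)."""
    return 0.3 + 0.5 * rank


def y1_given_y0(y0):
    """Q-Q map: under rank preservation Y1 shares the rank of Y0."""
    return quantile_y1(cdf_y0(y0))


def gain(y0):
    """Individual treatment effect Y1 - Y0 implied by the Q-Q map."""
    return y1_given_y0(y0) - y0


for y0 in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(y0, round(y1_given_y0(y0), 3), round(gain(y0), 3))
```

In this illustration the gain is positive below the crossing point and negative above it, so the share of people who benefit within the slice equals $F_{Y_0}(0.6)$, which is 0.6 for the uniform choice.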

It should be emphasized that the RP condition is only for defining various quantile treatment effects. Even without this condition, we can still identify various marginal distributions which, as argued in the introduction, are useful for many other purposes. Under the RP assumption, we define the MQTE in Carneiro and Lee (2009) as

$$\mathrm{MQTE}_\tau(x, u_D) = Q_{Y_1|X,U_D}(\tau|x, u_D) - Q_{Y_0|X,U_D}(\tau|x, u_D).$$

If we strengthen the RP assumption to be conditional on $X = x$ or on $X = x$, $D = 1$, then we can define the QTE in Chernozhukov and Hansen (2005, 2006) and the QTT as

$$\mathrm{QTE}_\tau(x) = Q_{Y_1|X}(\tau|x) - Q_{Y_0|X}(\tau|x)$$

and

$$\mathrm{QTT}_\tau(x) = Q_{Y_1|X,D}(\tau|x, 1) - Q_{Y_0|X,D}(\tau|x, 1),$$

respectively. If the RP assumption is conditional on $X = x$, $u_D < U_D \le u'_D$, then the LQTE of Abadie et al. (2002)^2 is defined as

$$\mathrm{LQTE}_\tau(x, u_D, u'_D) = Q_{Y_1|X,U_D}(\tau|x, (u_D, u'_D]) - Q_{Y_0|X,U_D}(\tau|x, (u_D, u'_D]).$$

Finally, if the RP assumption holds unconditionally (with respect to $X$),^3 then we define the integrated QTE (IQTE)

$$\mathrm{IQTE}_\tau = Q_{Y_1}(\tau) - Q_{Y_0}(\tau),$$

the integrated QTT (IQTT)

$$\mathrm{IQTT}_\tau = Q_{Y_1|D}(\tau|1) - Q_{Y_0|D}(\tau|1)$$

as in Firpo (2007),^4 and the integrated LQTE (ILQTE)

$$\mathrm{ILQTE}_\tau(u_D, u'_D) = Q_{Y_1|U_D}(\tau|(u_D, u'_D]) - Q_{Y_0|U_D}(\tau|(u_D, u'_D]).$$

2.2 Identification of the MQTE

The following theorem states that the MQTE can be identified for a range of $u_D$.

Theorem 1 Suppose assumptions (A1)-(A6) hold. If $u_D$ is not an isolated point of $\mathcal{P}_x^1 \cap \mathcal{P}_x^0$, then $\mathrm{MQTE}_\tau(x, u_D)$ can be identified for any $\tau \in (0, 1)$, where $\mathcal{P}_x^d = \mathrm{supp}(p(X, Z)|X = x, D = d)$.

Proof. To simplify notations, we suppress the conditioning on $X = x$. Given the RP assumption (A6), we need only identify $Q_{Y_d|U_D}(\tau|u_D)$, whose identification is equivalent to the identification of $F_{Y_d|U_D}(\cdot|u_D)$. We provide two methods to identify $F_{Y_d|U_D}(\cdot|u_D)$. Method 1: Note that

$$P(Y \le y|p(Z) = p, D = 1)\,p = P(Y_1 \le y|p(Z) = p, D = 1)\,P(D = 1|p(Z) = p) = P(Y_1 \le y|U_D \le p)\,p = \int_0^p F_{Y_1|U_D}(y|u_D)\,du_D,$$

and similarly,

$$P(Y \le y|p(Z) = p, D = 0)\,(1 - p) = \int_p^1 F_{Y_0|U_D}(y|u_D)\,du_D,$$

so

$$\frac{d\left[P(Y \le y|p(Z) = p, D = 1)\,p\right]}{dp} = F_{Y_1|U_D}(y|p), \qquad -\frac{d\left[P(Y \le y|p(Z) = p, D = 0)\,(1 - p)\right]}{dp} = F_{Y_0|U_D}(y|p).$$

^2 Abadie et al. (2002) conflate issues of definition of parameters with issues of identification; see Section 6.2 below for their definition. Actually, $\mathrm{LQTE}_\tau(x, u_D, u'_D)$ can be defined for any $u_D, u'_D \in (0, 1)$ although it can only be identified for $u_D, u'_D$ on the support of $p(x, Z)$.

^3 Note that if the RP assumption holds on $X = x$, $Y_d|X$ can be expressed as $Y_d = q(d, X, U)$ by the Skorohod representation, where $U|X = U_1|X = U_0|X$. If the RP assumption holds unconditionally, then $Y_d$ can be expressed as $Y_d = q(d, U)$ by the Skorohod representation, where $U = U_1 = U_0$. This by no means implies that information in $X$ and $Z$ is useless to the identification or efficiency improvement in the quantile treatment effect evaluation.

^4 Be careful about the terminology in the literature. Our IQTE and IQTT are the QTE and QTT of Firpo (2007). Also, the MQTE of Cattaneo (2010) means $Q_\tau(Y_0)$ and $Q_\tau(Y_1)$ rather than $\mathrm{MQTE}_\tau(x, u_D)$, and the MQTE, QTE and QTT in the first strand of literature mentioned in the introduction mean $Q_{Y_1 - Y_0|X,U_D}(\tau|x, u_D)$, $Q_{Y_1 - Y_0|X}(\tau|x)$ and $Q_{Y_1 - Y_0|X,D}(\tau|x, 1)$ rather than $\mathrm{MQTE}_\tau(x, u_D)$, $\mathrm{QTE}_\tau(x)$ and $\mathrm{QTT}_\tau(x)$.
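The derivative step in Method 1 can be checked numerically. The sketch below assumes a hypothetical Gaussian form $F_{Y_1|U_D}(y|u) = \Phi(y - \Phi^{-1}(u))$, chosen only for illustration and not taken from the paper, builds $P(Y \le y|p(Z) = p, D = 1)\,p$ as the integral of that conditional CDF over $(0, p)$, and confirms that differentiating in $p$ returns the conditional CDF evaluated at $u_D = p$. The function names and the step sizes are likewise illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cdf_y1_given_ud(y, u, slope=1.0, scale=1.0):
    """Hypothetical F_{Y1|U_D}(y|u) for Y1 = slope * Phi^{-1}(U_D) + scale * noise."""
    return STD_NORMAL.cdf((y - slope * STD_NORMAL.inv_cdf(u)) / scale)


def treated_cdf_times_p(y, p, steps=20_000):
    """P(Y <= y | p(Z) = p, D = 1) * p, computed as the integral of
    F_{Y1|U_D}(y|u) over u in (0, p) with the midpoint rule."""
    width = p / steps
    return width * sum(cdf_y1_given_ud(y, (i + 0.5) * width) for i in range(steps))


def derivative_in_p(y, p, h=1e-3):
    """Central finite difference of the observable quantity with respect to p."""
    return (treated_cdf_times_p(y, p + h) - treated_cdf_times_p(y, p - h)) / (2 * h)


y, p = 0.3, 0.4
recovered = derivative_in_p(y, p)
target = cdf_y1_given_ud(y, p)
print(recovered, target)
```

In practice the left-hand side would be estimated from data on $(Y, D, p(Z))$, so the derivative would be taken of an estimated function rather than of a known integral.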

$P(DY \le y|p(Z) = p)$ does not include the point mass. This intuition is similar in spirit to that of the censored quantile regression models discussed in Powell (1984, 1986).

The arguments in Theorem 1 can be applied to the discrete $Y_d$ case. Suppose $Y_1$ and $Y_0$ have the same support $\{y_1, \ldots, y_S\}$; then the counterpart of the MQTE is $P_{Y_1|U_D}(y_s|u_D) - P_{Y_0|U_D}(y_s|u_D)$, $s = 1, \ldots, S$, where $P_{Y_d|U_D}(y_s|u_D)$ is the point mass of $Y_d|(U_D = u_D)$ at $y_s$. We can still identify $F_{Y_d|U_D}(y_s|p)$ by (4), (5), (6) and (7), and then $P_{Y_d|U_D}(y_1|p) = F_{Y_d|U_D}(y_1|p)$ and $P_{Y_d|U_D}(y_s|p) = F_{Y_d|U_D}(y_s|p) - F_{Y_d|U_D}(y_{s-1}|p)$ for $s = 2, \ldots, S$ can be sequentially identified. If $Y_d$ can take only 0 and 1, then the parameter of interest is $P_{Y_1|U_D}(1|u_D) - P_{Y_0|U_D}(1|u_D)$, which coincides with the MTE. Of course, we can also consider the case with mixed discrete and continuous outcomes. Both the discrete case and the mixed case are easier to handle than the continuous case, so we will concentrate on the continuous case in the rest of this paper unless stated otherwise.

If we use the idea of LIV as in Heckman and Vytlacil (2001a), we have

$$P(Y \le y|p(Z) = p) = P(Y \le y|p(Z) = p, D = 1)\,p + P(Y \le y|p(Z) = p, D = 0)\,(1 - p) = \int_0^p F_{Y_1|U_D}(y|u_D)\,du_D + \int_p^1 F_{Y_0|U_D}(y|u_D)\,du_D,$$

and

$$\frac{\partial P(Y \le y|p(Z) = p)}{\partial p} = F_{Y_1|U_D}(y|p) - F_{Y_0|U_D}(y|p),$$

which is the difference of CDFs in the two treatment statuses. So it is hard to identify the MQTE from $\partial P(Y \le y|p(Z) = p)/\partial p$. From Theorem 1, we can identify $E[Y_1|U_D = p]$ and $E[Y_0|U_D = p]$ separately, not just their difference $E[Y_1 - Y_0|U_D = p]$ as in the LIV method of Heckman and Vytlacil (2001a). Method 1 of the proof is a special case of Theorem 1 in Carneiro and Lee (2009). We also discuss Method 2 to distinguish the difference between the identification scheme of the MTE and the MQTE. For the MTE,

$$E[DY|p(Z) = p] = E[Y|p(Z) = p, D = 1]\,p = \int_0^p E[Y_1|U_D = u_D]\,du_D,$$

and

$$E[(1 - D)Y|p(Z) = p] = E[Y|p(Z) = p, D = 0]\,(1 - p) = \int_p^1 E[Y_0|U_D = u_D]\,du_D,$$

so the two methods in the proof are the same in the MTE identification.

We close this subsection with a concrete example. Suppose $Y_1 = V + 2U$, $Y_0 = 2V + U$, and $D = 1(Z - V > 0)$, where $(U, V, Z)' \sim N(0, \Sigma)$; the closed form below and footnote 6 correspond to $\mathrm{Var}(U) = \mathrm{Var}(V) = 1$ and $\mathrm{Cov}(U, V) = 0.5$. It can be shown that

$$\mathrm{MQTE}_\tau(u_D) = -0.5\,\Phi^{-1}(u_D) + \sqrt{0.75}\,\Phi^{-1}(\tau).$$

Figure 4 shows $\mathrm{MQTE}_\tau(u_D)$ for $\tau = 0.1, 0.25, 0.5, 0.75$ and $0.9$. In this simple model, the spreading measure of the MQTE, e.g., $\mathrm{MQTE}_{1-\tau}(u_D) - \mathrm{MQTE}_\tau(u_D)$ for $\tau \in (0, 0.5)$, is the same for any $u_D$, which may not be standard in practice. Also, $\mathrm{MQTE}_\tau(u_D)$ is a decreasing function of $u_D$, which indicates that the more likely an individual is to participate in the program, the higher the benefit she will receive.^5 In the figure, we also show $\mathrm{MTE}(u_D)$, $\mathrm{QTE}_\tau$ and $\mathrm{ATE}$ ($\equiv E[Y_1] - E[Y_0]$) for comparison. Note that in this example, $\mathrm{MTE}(u_D) = \mathrm{MQTE}_{0.5}(u_D)$, and $\mathrm{QTE}_\tau = 0 = \mathrm{ATE}$ does not depend on $\tau$.^6 Obviously, $\mathrm{MQTE}_\tau(u_D)$ provides more information than $\mathrm{MTE}(u_D)$, $\mathrm{QTE}_\tau$, and $\mathrm{ATE}$.

^5 Aakvik et al. (2005) provide a converse example.

^6 It should be emphasized that $\mathrm{QTE}_\tau$ is not well defined in this example since the RP condition does not hold unconditionally, given that $Y_1$ and $Y_0$ have the same marginal distribution but $\mathrm{Corr}(Y_1, Y_0) = 6.5/7 < 1$.

Figure 4: $\mathrm{MQTE}_\tau(u_D)$ for $\tau = 0.1, 0.25, 0.5, 0.75$ and $0.9$ in a Simple Example
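The closed form in the example with $Y_1 = V + 2U$ and $Y_0 = 2V + U$ can be re-derived from conditional normal quantiles. The sketch below rests on assumptions stated here rather than read off the text: $\mathrm{Var}(U) = \mathrm{Var}(V) = 1$ and $\mathrm{Cov}(U, V) = 0.5$ (the values consistent with the stated closed form and with $\mathrm{Corr}(Y_1, Y_0) = 6.5/7$), and $U_D = \Phi(V)$. All function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()
RHO = 0.5  # assumed Corr(U, V), with Var(U) = Var(V) = 1


def conditional_quantile(tau, v, coef_v, coef_u):
    """tau-th quantile of coef_v * V + coef_u * U given V = v.

    Under the assumed covariance, U | V = v is N(RHO * v, 1 - RHO**2).
    """
    mean = coef_v * v + coef_u * RHO * v
    sd = abs(coef_u) * sqrt(1.0 - RHO ** 2)
    return mean + sd * STD_NORMAL.inv_cdf(tau)


def mqte(tau, u_d):
    """Difference of conditional quantiles of Y1 and Y0 at U_D = u_d."""
    v = STD_NORMAL.inv_cdf(u_d)  # U_D = Phi(V), so V = Phi^{-1}(u_d)
    q1 = conditional_quantile(tau, v, coef_v=1.0, coef_u=2.0)  # Y1 = V + 2U
    q0 = conditional_quantile(tau, v, coef_v=2.0, coef_u=1.0)  # Y0 = 2V + U
    return q1 - q0


def mqte_closed_form(tau, u_d):
    """Closed form stated in the example."""
    return -0.5 * STD_NORMAL.inv_cdf(u_d) + sqrt(0.75) * STD_NORMAL.inv_cdf(tau)


grid = [(tau, u_d) for tau in (0.1, 0.25, 0.5, 0.75, 0.9) for u_d in (0.1, 0.5, 0.9)]
max_gap = max(abs(mqte(t, u) - mqte_closed_form(t, u)) for t, u in grid)
print(max_gap)
```

The same calculation shows why the spread $\mathrm{MQTE}_{1-\tau}(u_D) - \mathrm{MQTE}_\tau(u_D)$ is constant in $u_D$: the term in $u_D$ enters additively and cancels in the difference.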

2.3 Relationship with Other Parameters of Treatment Effects

In this subsection, we first discuss the relationship between $\mathrm{MQTE}_\tau(x, u_D)$ and $\mathrm{QTT}_\tau(x)$, $\mathrm{QTE}_\tau(x)$, $\mathrm{LQTE}_\tau(x, u_D, u'_D)$, $\mathrm{IQTE}_\tau$, $\mathrm{IQTT}_\tau$, $\mathrm{ILQTE}_\tau$. It turns out that the building block is $F_{Y_d|X,U_D}(y_d|x, u_D)$ rather than $\mathrm{MQTE}_\tau(x, u_D)$. Actually, $\mathrm{MQTE}_\tau(x, u_D)$ is more relevant to the (conditional) quantile of $Y_1 - Y_0$. From the supplementary materials, we can show that

$$\mathrm{QTT}_\tau(x) = F^{-1}_{Y_1|X,D}(\tau|x, 1) - F^{-1}_{Y_0|X,D}(\tau|x, 1)$$

and the quantile treatment effect on the untreated (QTUT)

$$\mathrm{QTUT}_\tau(x) = F^{-1}_{Y_1|X,D}(\tau|x, 0) - F^{-1}_{Y_0|X,D}(\tau|x, 0),$$

where

$$F_{Y_d|X,D}(y_d|x, 1) = \int_0^1 F_{Y_d|X,U_D}(y_d|x, u_D)\,h_{TT}(x, u_D)\,du_D,$$

$$F_{Y_d|X,D}(y_d|x, 0) = \int_0^1 F_{Y_d|X,U_D}(y_d|x, u_D)\,h_{TUT}(x, u_D)\,du_D,$$

with $h_{TT}(x, u_D) = \left[1 - F_{p(X,Z)|X}(u_D|x)\right]/E[p(X, Z)|X = x]$ and $h_{TUT}(x, u_D) = F_{p(X,Z)|X}(u_D|x)/E[1 - p(X, Z)|X = x]$. Also,

$$\mathrm{QTE}_\tau(x) = F^{-1}_{Y_1|X}(\tau|x) - F^{-1}_{Y_0|X}(\tau|x), \qquad \mathrm{IQTE}_\tau = F^{-1}_{Y_1}(\tau) - F^{-1}_{Y_0}(\tau),$$

$$\mathrm{IQTT}_\tau = F^{-1}_{Y_1|D}(\tau|1) - F^{-1}_{Y_0|D}(\tau|1), \qquad \mathrm{IQTUT}_\tau = F^{-1}_{Y_1|D}(\tau|0) - F^{-1}_{Y_0|D}(\tau|0)$$

by a similar derivation as in the expression of $F_{Y_d|D}(y|1)$, so $Q_{Y_1 - Y_0|D}(\tau|1)$ can be identified and

$$P(Y_1 > Y_0|D = 1) = 1 - P(Y_1 - Y_0 \le 0|D = 1)$$

can also be identified.^8 Actually, we can identify any conditional or unconditional quantile of $Y_1 - Y_0$ of interest, e.g., $Q_{Y_1 - Y_0|X,D}(\tau|x, d)$, $Q_{Y_1 - Y_0|X}(\tau|x)$, $Q_{Y_1 - Y_0|D}(\tau|d)$, $Q_{Y_1 - Y_0}(\tau)$ and $Q_{Y_1 - Y_0|X,U_D}(\tau|x, (u'_D, u_D])$, based on $\mathrm{MQTE}_\tau(x, u_D)$. Since the corresponding weights can be defined similarly to those above, we omit the details. Note that if only assumption (A6) holds, $P(Y_1 > Y_0|D = 1)$ need not equal

$$\int\!\!\int_0^1 1(\mathrm{QTT}_\tau(x) > 0)\,d\tau\,dF_{X|D}(x|1) \quad \text{or} \quad \int_0^1 1(\mathrm{IQTT}_\tau > 0)\,d\tau.$$

They are equal only if the RP assumption holds on $X = x$, $D = 1$ or on $D = 1$. This observation can be used to test whether the RP assumption holds on a larger set than $X = x$, $U_D = u_D$.

Because the quantile is not a linear operator of the distribution function, $Q_{Y_1 - Y_0}(\cdot)$ and $Q_{Y_1}(\cdot) - Q_{Y_0}(\cdot)$ are generally unequal (and do not have any identifiable relationships), so the quantile treatment effect and the quantile of the impact distribution are two different parameters. On the contrary, since the mean is a linear operator of the distribution function, the average treatment effect and the average of the impact distribution are the same parameter. In this paper, we concentrate on the three most popular quantile treatment effect parameters in the literature: $\mathrm{MQTE}_\tau(x, u_D)$, $\mathrm{QTE}_\tau(x)$ and $\mathrm{IQTE}_\tau$. We concentrate on the difference of quantiles rather than the quantile of differences because the latter may not be interesting. For example, in the common effect model, the distribution of $Y_1 - Y_0$ is a point mass at a fixed value. Even if the treatment effect is not common, $Y_1 - Y_0$ may still have discrete components in its distribution. See Section 3.2 of Aakvik et al. (2005) for definitions of the distributional counterparts of the MTE, ATE and ATT based on $Y_1 - Y_0$ when the outcomes are binary, and see Section 2 of Abbring and Heckman (2007) for definitions of the distributional treatment effects in more general settings.

Finally, we study $F_{Y_1 - Y_0|Y_0,D}(\cdot|y_0, 1)$. We have already shown in Section 2.1 that under the RP assumption (A6),

$$P(Y_1 - Y_0 \le y|Y_0 = y_0, X = x, U_D = u_D) = 1\left(Q_{Y_1|X,U_D}\left(F_{Y_0|X,U_D}(y_0|x, u_D)\,\middle|\,x, u_D\right) \le y + y_0\right),$$

so

$$P(Y_1 - Y_0 \le y|Y_0 = y_0, D = 1) = P(Y_1 - Y_0 \le y|Y_0 = y_0, U_D \le p(X, Z))$$

$$= \int\!\!\int \left[\frac{1}{F_{U_D|U_0,X}(p(x, z)|F_{Y_0|X}(y_0|x), x)} \int_0^{p(x,z)} P(Y_1 - Y_0 \le y|Y_0 = y_0, X = x, U_D = u_D)\,dF_{U_D|U_0,X}(u_D|F_{Y_0|X}(y_0|x), x)\right] dF_{Z|X}(z|x)\,dF_{X|Y_0}(x|y_0),$$

where the equality is from the fact that $F_{(U_D,X,Z)|Y_0 = y_0} = F_{U_D|Y_0 = y_0,X,Z}\,F_{Z|Y_0 = y_0,X}\,F_{X|Y_0 = y_0} = F_{U_D|U_0 = F_{Y_0|X}(y_0|X),X}\,F_{Z|X}\,F_{X|Y_0 = y_0}$, and $U_0$ is defined in the Skorohod representation of $Y_0$, $Y_0|X = F^{-1}_{Y_0|X}(U_0|X)$. So this parameter is a complicated functional of $Q_{Y_1|X,U_D}(F_{Y_0|X,U_D}(y_0|x, u_D)|x, u_D)$ and is not easy to estimate. Actually, it is unknown whether it can be point identified since $F_{U_D|U_0,X}$ is hard to identify nonparametrically without further structure on the model.
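The distinction drawn in this subsection between the difference of quantiles and the quantile of the difference can be made concrete with a small sketch. It assumes, purely for illustration, that $(Y_0, Y_1)$ is bivariate normal with standard normal marginals and a hypothetical correlation parameter `corr`; none of these choices come from the paper. The difference of quantiles depends only on the two marginals, while the quantile of $Y_1 - Y_0$ moves with the dependence between them.

```python
from math import sqrt
from statistics import NormalDist

Y0 = NormalDist(0.0, 1.0)  # assumed marginal of Y0
Y1 = NormalDist(0.0, 1.0)  # assumed marginal of Y1


def difference_of_quantiles(tau):
    """Q_{Y1}(tau) - Q_{Y0}(tau): a functional of the two marginals only."""
    return Y1.inv_cdf(tau) - Y0.inv_cdf(tau)


def quantile_of_difference(tau, corr):
    """tau-th quantile of Y1 - Y0 when (Y0, Y1) is bivariate normal with correlation corr."""
    sd = sqrt(Y1.variance + Y0.variance - 2.0 * corr * Y1.stdev * Y0.stdev)
    if sd == 0.0:
        # Perfect positive dependence: Y1 - Y0 is a point mass at the mean gap.
        return Y1.mean - Y0.mean
    return NormalDist(Y1.mean - Y0.mean, sd).inv_cdf(tau)


for tau in (0.1, 0.5, 0.9):
    print(tau,
          difference_of_quantiles(tau),
          quantile_of_difference(tau, corr=1.0),
          quantile_of_difference(tau, corr=0.0))
```

With `corr=1.0` the two parameters coincide in this illustration, whereas under independence the quantile of the difference spreads out even though both marginals, and hence the difference of quantiles, are unchanged.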

3 Sharp Bounds for the QTE

Although $Q_{Y_d|X,U_D}(\tau|x, u_D)$ can be point identified from Theorem 1, we show in this section that $Q_{Y_d|X}(\tau|x)$ generally can only be partially identified, which implies that $\mathrm{QTE}_\tau(x)$ can only be partially identified. Here, we implicitly assume that the RP assumption on $X = x$ holds (i.e., $Y_d$ can be represented as $Y_d = q(d, X, U_d)$ with $U_0|(X = x) = U_1|(X = x)$), but we do not explicitly explore the information content in this assumption.^9 First, we impose the quantile independence assumption (QIA),

$$Q_{Y_d|X,Z}(\tau|X, Z) = Q_{Y_d|X}(\tau|X) \text{ for all } \tau \in (0, 1). \tag{8}$$

This assumption is equivalent to $(Y_1, Y_0) \perp Z \mid X$. This assumption is parallel to the usual IV assumption $E[Y_d|X, Z] = E[Y_d|X]$ in the average treatment effect evaluation. As in Heckman and Vytlacil (2001b), we assume further that

$$D = 1(p(X, Z) \ge U_D) \text{ and } Z \perp U_D \mid X \tag{9}$$

to study the improvement on the bounds for $\mathrm{QTE}_\tau(x)$.

^8 This parameter is useful, e.g., in the median-voter model, we need to check whether $P(Y_1 > Y_0|D = 1)P(D = 1) > 1/2$.

3.1 Bounds Under the Quantile Independence Assumption

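From Proposition 2 and (36) of Manski (1994), we have sharp bounds for $Q_{Y_d|X}(\tau|x)$ under (8):

$$\sup_{z \in \mathcal{Z}_x} L_\tau^1(x,z) \le Q_{Y_1|X}(\tau|x) \le \inf_{z \in \mathcal{Z}_x} R_\tau^1(x,z),$$

$$\sup_{z \in \mathcal{Z}_x} L_\tau^0(x,z) \le Q_{Y_0|X}(\tau|x) \le \inf_{z \in \mathcal{Z}_x} R_\tau^0(x,z),$$

where $\mathcal{Z}_x \equiv \mathrm{supp}(Z|X = x)$,

$$\begin{aligned}
L_\tau^1(x,z) &= \begin{cases} Q_{Y|X,Z,D}\left(1 - \frac{1-\tau}{p(x,z)} \,\Big|\, x, z, 1\right), & \text{if } p(x,z) > 1 - \tau; \\ -\infty, & \text{otherwise,} \end{cases} \\
R_\tau^1(x,z) &= \begin{cases} Q_{Y|X,Z,D}\left(\frac{\tau}{p(x,z)} \,\Big|\, x, z, 1\right), & \text{if } p(x,z) \ge \tau; \\ \infty, & \text{otherwise,} \end{cases} \\
L_\tau^0(x,z) &= \begin{cases} Q_{Y|X,Z,D}\left(1 - \frac{1-\tau}{1-p(x,z)} \,\Big|\, x, z, 0\right), & \text{if } p(x,z) < \tau; \\ -\infty, & \text{otherwise,} \end{cases} \\
R_\tau^0(x,z) &= \begin{cases} Q_{Y|X,Z,D}\left(\frac{\tau}{1-p(x,z)} \,\Big|\, x, z, 0\right), & \text{if } p(x,z) \le 1 - \tau; \\ \infty, & \text{otherwise.} \end{cases}
\end{aligned} \tag{10}$$

So

$$I_\tau^L(x) \equiv \sup_{z \in \mathcal{Z}_x} L_\tau^1(x,z) - \inf_{z \in \mathcal{Z}_x} R_\tau^0(x,z) \le QTE_\tau(x) \le \inf_{z \in \mathcal{Z}_x} R_\tau^1(x,z) - \sup_{z \in \mathcal{Z}_x} L_\tau^0(x,z) \equiv I_\tau^U(x). \tag{11}$$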
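As a rough illustration of how (10) and (11) could be evaluated on data, the sketch below plugs empirical quantiles into the four component bounds at a fixed $x$ with a discrete instrument. The function names, the tuple layout of `cells`, and the toy numbers are assumptions made for this example only; they do not come from the paper, and the numbers merely exercise the arithmetic.

```python
import math


def empirical_quantile(sample, u):
    """Smallest order statistic y whose empirical CDF satisfies F_n(y) >= u, for u in (0, 1]."""
    ys = sorted(sample)
    # Small tolerance so that u * n landing a hair above an integer does not skip a rank.
    rank = max(1, math.ceil(u * len(ys) - 1e-12))
    return ys[min(rank, len(ys)) - 1]


def qia_bounds(tau, cells, y_lo=-math.inf, y_hi=math.inf):
    """Plug-in version of (10)-(11) at a fixed x.

    cells: one tuple (p, treated_outcomes, control_outcomes) per instrument value z,
           where p stands for p(x, z). This layout is an assumption of the sketch.
    y_lo, y_hi: values used when a component bound is trivial.
    """
    L1, R1, L0, R0 = [], [], [], []
    for p, y_treated, y_control in cells:
        L1.append(empirical_quantile(y_treated, 1 - (1 - tau) / p) if p > 1 - tau else y_lo)
        R1.append(empirical_quantile(y_treated, tau / p) if p >= tau else y_hi)
        L0.append(empirical_quantile(y_control, 1 - (1 - tau) / (1 - p)) if p < tau else y_lo)
        R0.append(empirical_quantile(y_control, tau / (1 - p)) if p <= 1 - tau else y_hi)
    # sup_z L^1 - inf_z R^0 and inf_z R^1 - sup_z L^0, as in (11).
    return max(L1) - min(R0), min(R1) - max(L0)


# Toy data: two instrument values with p(x, z) equal to 0.4 and 0.8.
treated = list(range(1, 11))
control = list(range(1, 11))
cells = [(0.4, treated, control), (0.8, treated, control)]
print(qia_bounds(0.5, cells, y_lo=0, y_hi=11))
```

Each instrument value contributes at most two nontrivial component bounds here, because $p(x,z) > 1-\tau$ and $p(x,z) \le 1-\tau$ (and likewise $p(x,z) \ge \tau$ and $p(x,z) < \tau$) cannot hold at the same $z$.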

This bound is trivial, since $I_\tau^L(x) = -\infty$ and $I_\tau^U(x) = \infty$ if $Y_1$ and $Y_0$ are unbounded. Similar phenomena also occur in the average treatment effect evaluation. To avoid such trivial results, we assume that

$$P\left(y_d^l(x) \le Y_d \le y_d^u(x) \mid X = x, Z\right) = 1, \tag{12}$$

where $y_d^l(x), y_d^u(x) \in \mathbb{R}$ do not depend on $Z$ from (8). To simplify notation, we assume that $y_0^l(x) = y_1^l(x)$, denoted as $y^l(x)$, and $y_0^u(x) = y_1^u(x)$, denoted as $y^u(x)$. Then $-\infty$ in (10) is changed to $y^l(x)$ and $\infty$ is changed to $y^u(x)$. Let $\mathcal{P}_x \equiv \mathrm{supp}(p(X,Z)|X = x)$, $p_x^{\sup} = \sup \mathcal{P}_x$ and $p_x^{\inf} = \inf \mathcal{P}_x$. The width of the bounds is $I_\tau^U(x) - I_\tau^L(x)$, a complicated expression to evaluate, especially if $\mathcal{Z}_x$ is uncountable. Note that the above bounds exactly identify $QTE_\tau(x)$ if $I_\tau^L(x) = I_\tau^U(x)$. Note also that it is neither necessary nor sufficient for $p(x,z)$

^9 In Section 6.1, we will show how Chernozhukov and Hansen (2005) point identify $Q_{Y_d|X}(\tau|x)$ by exploring the information content in this assumption and imposing some completeness conditions.

where

$$I^L(x) = \sup_{z \in \mathcal{Z}_x} \left\{ p(x,z)E[Y_1|X = x, Z = z, D = 1] + (1 - p(x,z))y^l(x,z) \right\} - \inf_{z \in \mathcal{Z}_x} \left\{ (1 - p(x,z))E[Y_0|X = x, Z = z, D = 0] + p(x,z)y^u(x,z) \right\},$$

$$I^U(x) = \inf_{z \in \mathcal{Z}_x} \left\{ p(x,z)E[Y_1|X = x, Z = z, D = 1] + (1 - p(x,z))y^u(x,z) \right\} - \sup_{z \in \mathcal{Z}_x} \left\{ (1 - p(x,z))E[Y_0|X = x, Z = z, D = 0] + p(x,z)y^l(x,z) \right\},$$

and $y^l(x,z), y^u(x,z) \in \mathbb{R}$ satisfy $P\left(y^l(x,z) \le Y_d \le y^u(x,z) \mid X = x, Z = z\right) = 1$. Note here that $y^l(x,z)$ and $y^u(x,z)$ depend on $z$ if only the mean independence assumption, $E[Y_d|X,Z] = E[Y_d|X]$, is imposed. As in Theorem 2(i), when $p_x^{\inf}$ and $p_x^{\sup}$ are achieved at some values that $Z$ can take, $p_x^{\inf} = 0$ and $p_x^{\sup} = 1$ imply that

$$E[Y_1|X = x, Z = z, D = 1] - E[Y_0|X = x, Z = z, D = 0] \le I^L(x) \le I^U(x) \le E[Y_1|X = x, Z = z, D = 1] - E[Y_0|X = x, Z = z, D = 0],$$

so $ATE(x)$ is point identified. Corollary 2 of Proposition 6 in Manski (1994) implies that when $(Y_1, Y_0) \perp D \mid (X,Z)$, $ATE(x)$ is point identified using his bound (35) or $[I^L(x), I^U(x)]$ above if and only if $p_x^{\sup} = 1$ and $p_x^{\inf} = 0$. Our result parallels his result when $Y_d|(X = x)$ is continuously distributed with a positive density on $(y^l(x), y^u(x))$. It should be emphasized that when $(Y_1, Y_0) \perp D \mid (X,Z)$, $p_x^{\sup} = 1$ and $p_x^{\inf} = 0$ is necessary for point identification of $QTE_\tau(x)$ only when (11) is used. Actually, since $Q_{Y_d|X}(\tau|x) = Q_{Y_d|X,Z}(\tau|x,z) = Q_{Y|X,Z,D}(\tau|x,z,d)$, $Q_{Y_d|X}(\tau|x)$ can be identified directly from $Q_{Y|X,Z,D}(\tau|x,z,d)$.
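The mean bounds $I^L(x)$ and $I^U(x)$ are simple enough to compute by hand, and the sketch below does so for a discrete instrument. The function name, the tuple layout, and the numbers are hypothetical choices for illustration. In a cell with $p(x,z) = 0$ (or $1$) the treated (control) mean is undefined, so a placeholder value is passed; it is multiplied by zero and does not affect the result.

```python
def ate_bounds(cells):
    """Mean bounds [I^L(x), I^U(x)] at a fixed x.

    cells: one tuple (p, mean_treated, mean_control, y_lo, y_hi) per instrument value z,
           standing for p(x, z), E[Y | x, z, D = 1], E[Y | x, z, D = 0], y^l(x, z), y^u(x, z).
           This layout is an assumption of the sketch.
    """
    y1_lower = max(p * m1 + (1 - p) * lo for p, m1, m0, lo, hi in cells)
    y1_upper = min(p * m1 + (1 - p) * hi for p, m1, m0, lo, hi in cells)
    y0_lower = max((1 - p) * m0 + p * lo for p, m1, m0, lo, hi in cells)
    y0_upper = min((1 - p) * m0 + p * hi for p, m1, m0, lo, hi in cells)
    return y1_lower - y0_upper, y1_upper - y0_lower


# Interior propensities: the interval has positive width.
print(ate_bounds([(0.2, 0.6, 0.3, 0.0, 1.0), (0.9, 0.7, 0.2, 0.0, 1.0)]))

# Propensities reaching 0 and 1: both ends collapse to the same difference in means.
print(ate_bounds([(0.0, 0.0, 0.3, 0.0, 1.0), (1.0, 0.7, 0.0, 0.0, 1.0)]))
```

The second call mirrors the point-identification case in the text: with $p_x^{\inf} = 0$ and $p_x^{\sup} = 1$ both attained, the support bounds $y^l$ and $y^u$ receive zero weight in the binding cells.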

3.2 Bounds Under the Nonparametric Selection Model

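The following theorem states the bounds for $QTE_\tau(x)$ when assumption (9) is imposed.

Theorem 3 Suppose assumptions (8), (9) and (12) hold.

(i) $QTE_\tau(x)$ has sharp bounds,

$$L_\tau^1(x) - R_\tau^0(x) \le QTE_\tau(x) \le R_\tau^1(x) - L_\tau^0(x), \tag{13}$$

where

$$\begin{aligned}
L_\tau^1(x) &= \begin{cases} Q_{Y|X,p(X,Z),D}\left(1 - \frac{1-\tau}{p_x^{\sup}} \,\Big|\, x, p_x^{\sup}, 1\right), & \text{if } p_x^{\sup} > 1 - \tau; \\ y^l(x), & \text{otherwise,} \end{cases} \\
R_\tau^1(x) &= \begin{cases} Q_{Y|X,p(X,Z),D}\left(\frac{\tau}{p_x^{\sup}} \,\Big|\, x, p_x^{\sup}, 1\right), & \text{if } p_x^{\sup} \ge \tau; \\ y^u(x), & \text{otherwise,} \end{cases} \\
L_\tau^0(x) &= \begin{cases} Q_{Y|X,p(X,Z),D}\left(1 - \frac{1-\tau}{1-p_x^{\inf}} \,\Big|\, x, p_x^{\inf}, 0\right), & \text{if } p_x^{\inf} < \tau; \\ y^l(x), & \text{otherwise,} \end{cases} \\
R_\tau^0(x) &= \begin{cases} Q_{Y|X,p(X,Z),D}\left(\frac{\tau}{1-p_x^{\inf}} \,\Big|\, x, p_x^{\inf}, 0\right), & \text{if } p_x^{\inf} \le 1 - \tau; \\ y^u(x), & \text{otherwise.} \end{cases}
\end{aligned}$$

(ii) $p_x^{\inf} = 0$ and $p_x^{\sup} = 1$ are sufficient for point identification of $QTE_\tau(x)$ for any fixed $\tau \in (0,1)$. When $Y|(X = x, p(X,Z) = p_x^{\sup}, D = 1)$ and $Y|(X = x, p(X,Z) = p_x^{\inf}, D = 0)$ are continuously distributed with a positive density on $(y^l(x), y^u(x))$, they are also necessary.

(iii) $[I_\tau^L(x), I_\tau^U(x)]$ in (11) will simplify to the bounds in (13) under assumption (9).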
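To make the case distinctions in Theorem 3(i) concrete, the following sketch evaluates the four component bounds from two user-supplied quantile functions. The function name and its arguments are hypothetical; `q_treated` and `q_control` stand for $Q_{Y|X,p(X,Z),D}(\cdot|x, p_x^{\sup}, 1)$ and $Q_{Y|X,p(X,Z),D}(\cdot|x, p_x^{\inf}, 0)$, which are taken as known here, and the two example quantile functions are arbitrary choices on $[0,1]$.

```python
def selection_bounds(tau, p_sup, p_inf, q_treated, q_control, y_lo, y_hi):
    """Bounds (13) on QTE_tau(x), given the extreme propensities and two quantile functions.

    q_treated(u): quantile of Y given p(X, Z) = p_sup and D = 1 (assumed known).
    q_control(u): quantile of Y given p(X, Z) = p_inf and D = 0 (assumed known).
    y_lo, y_hi:   support bounds y^l(x) and y^u(x) from assumption (12).
    """
    L1 = q_treated(1 - (1 - tau) / p_sup) if p_sup > 1 - tau else y_lo
    R1 = q_treated(tau / p_sup) if p_sup >= tau else y_hi
    L0 = q_control(1 - (1 - tau) / (1 - p_inf)) if p_inf < tau else y_lo
    R0 = q_control(tau / (1 - p_inf)) if p_inf <= 1 - tau else y_hi
    return L1 - R0, R1 - L0


def uniform_quantile(u):
    return u


def squared_quantile(u):
    return u * u


# Limited propensity variation: a genuine interval.
print(selection_bounds(0.5, 0.8, 0.2, uniform_quantile, uniform_quantile, 0.0, 1.0))

# p_inf = 0 and p_sup = 1: the interval collapses, as in Theorem 3(ii).
print(selection_bounds(0.5, 1.0, 0.0, uniform_quantile, squared_quantile, 0.0, 1.0))
```

With $p_x^{\sup} = 1$ and $p_x^{\inf} = 0$, all four quantile arguments equal $\tau$, so the lower and upper bounds coincide at the difference of the two $\tau$-quantiles.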

Figure 5: Intuition for $L_\tau^1 \le Q_{Y_1}(\tau) \le R_\tau^1$: $p_x^{\sup} = 0.8$, $\rho = 0.5$, $\tau_1 = 0.15$, $\tau_2 = 0.46$ and $\tau_3 = 0.91$

Figure 5 provides some intuition for why $L_\tau^1(x) \le Q_{Y_1|X}(\tau|x) \le R_\tau^1(x)$; similar intuition can be applied to the bounds for $Q_{Y_0|X}(\tau|x)$. From the proof of Theorem 3,

$$P(Y \le y \mid p(Z) = p_x^{\sup}, D = 1)\, p_x^{\sup} \le P(Y_1 \le y) \le P(Y \le y \mid p(Z) = p_x^{\sup}, D = 1)\, p_x^{\sup} + (1 - p_x^{\sup}),$$

where the conditioning on $X = x$ is suppressed. Suppose $(Y_1, V) \sim N\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$; then

$$P(Y \le y \mid p(Z) = p_x^{\sup}, D = 1)\, p_x^{\sup} = \int_0^{p_x^{\sup}} \Phi\left(\frac{y - \rho\Phi^{-1}(u_D)}{\sqrt{1 - \rho^2}}\right) du_D.$$

Figure 5 shows the bounds for $P(Y_1 \le y)$ when $p_x^{\sup} = 0.8$ and $\rho = 0.5$. Inverting the bounds for $P(Y_1 \le y)$, we can get the bounds for $Q_{Y_1}(\tau)$. When $\tau \le 1 - p_x^{\sup}$, $L_\tau^1 = y^l$; when $\tau > p_x^{\sup}$, $R_\tau^1 = y^u$. Only if $\tau \in (1 - p_x^{\sup}, p_x^{\sup})$ are both bounds nontrivial. This is not always possible; only if $p_x^{\sup} > \max(\tau, 1 - \tau) \ge 1/2$ ($p_x^{\inf} < \min(\tau, 1 - \tau)$) is neither the left nor the right bound for $Q_{Y_1}(\tau)$ ($Q_{Y_0}(\tau)$) trivial. Pushing $\tau \to 0$ or $1$, we can see that there are nontrivial bounds for $Q_{Y_1}(\tau)$ ($Q_{Y_0}(\tau)$) for all $\tau$ if and only if $p_x^{\sup} = 1$ ($p_x^{\inf} = 0$). Note that $L_\tau^d(x)$ and $R_\tau^d(x)$ are increasing functions of $\tau$; hence the bound for $Q_{Y_d|X}(\tau|x)$ shifts to the right as $\tau$ increases. Also observe that

$$1 - \frac{1-\tau}{p_x^{\sup}} \le \tau \le \frac{\tau}{p_x^{\sup}} \quad \text{and} \quad 1 - \frac{1-\tau}{1 - p_x^{\inf}} \le \tau \le \frac{\tau}{1 - p_x^{\inf}}.$$

Hence $Q_{Y|X,p(X,Z),D}(\tau|x, p_x^{\sup}, 1)$ and $Q_{Y|X,p(X,Z),D}(\tau|x, p_x^{\inf}, 0)$ lie within the bounds for $Q_{Y_1|X}(\tau|x)$ and $Q_{Y_0|X}(\tau|x)$, respectively. This implies that $F_{Y_1|X}(\cdot|x) = F_{Y|X,p(X,Z),D}(\cdot|x, p_x^{\sup}, 1)$ and $F_{Y_0|X}(\cdot|x) = F_{Y|X,p(X,Z),D}(\cdot|x, p_x^{\inf}, 0)$ are not rejectable in the absence of other information.
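The bivariate normal illustration can be checked numerically. The sketch below evaluates the integral for the lower bound on $P(Y_1 \le y)$ with a midpoint rule and inverts the bounds by bisection, using $p_x^{\sup} = 0.8$ and $\rho = 0.5$ as in Figure 5. The grid size, the bisection bracket $[-10, 10]$, and all function names are assumptions of this sketch. Because the normal outcome is unbounded, trivial bounds are reported as $\pm\infty$ rather than $y^l$, $y^u$.

```python
import math
from statistics import NormalDist

STD = NormalDist()
P_SUP, RHO = 0.8, 0.5
N_GRID = 400
# Midpoint grid on (0, p_sup), mapped through the standard normal quantile function.
V_GRID = [STD.inv_cdf(P_SUP * (i + 0.5) / N_GRID) for i in range(N_GRID)]


def lower_cdf(y):
    """Midpoint-rule value of P(Y <= y | p(Z) = p_sup, D = 1) * p_sup."""
    scale = math.sqrt(1 - RHO ** 2)
    total = sum(STD.cdf((y - RHO * v) / scale) for v in V_GRID)
    return P_SUP * total / N_GRID


def invert(target, lo=-10.0, hi=10.0):
    """Bisection for the y solving lower_cdf(y) = target, where 0 < target < p_sup."""
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if lower_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quantile_bounds(tau):
    """Bounds (left, right) for Q_{Y_1}(tau); an infinite value marks a trivial bound."""
    # The upper CDF bound is lower_cdf(y) + (1 - p_sup); it reaches tau only if tau > 1 - p_sup.
    left = invert(tau - (1 - P_SUP)) if tau > 1 - P_SUP else -math.inf
    # The lower CDF bound stays below p_sup for every finite y, hence the strict inequality.
    right = invert(tau) if tau < P_SUP else math.inf
    return left, right


for tau in (0.15, 0.46, 0.91):
    left, right = quantile_bounds(tau)
    print(tau, left, STD.inv_cdf(tau), right)
```

For the three values of $\tau$ in Figure 5, only $\tau_2 = 0.46$ lies in $(1 - p_x^{\sup}, p_x^{\sup}) = (0.2, 0.8)$, so it is the only one with two finite bounds; in each case the true quantile $\Phi^{-1}(\tau)$ should fall between the reported bounds.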

Figure 6: psup x (pinf x) and psup x 1 (pinf x 0) for Point Identification of $Q_{Y_d|X}(\tau|x)$: Red Area for $Q_{Y_d|X}(\tau|x)$ = 1 and Blue Area for $Q_{Y_d|X}(\tau|x)$ = 0

The next example shows that $[I_\tau^L(x), I_\tau^U(x)]$ in (11) may not simplify to the bounds in Theorem 3 if assumption (9) is not imposed. This example parallels the example in Section 6 of Heckman and Vytlacil (2001b), where they show a similar result for $ATE(x)$.

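Example 2 Suppose $Z$ is binary and there are no other covariates. Take $\inf_{z \in \mathcal{Z}_x} R_\tau^1(x,z)$ as an example; suppose $y^l(x) = 0$, $y^u(x) = 1$ and $p(1) \equiv p(x,1) > p(x,0) \equiv p(0)$. We want to show that it is possible to have

$$\begin{aligned}
&\min\left\{ Q_{Y|Z,D}\left(\tfrac{\tau}{p(1)} \,\Big|\, 1, 1\right) 1(p(1) \ge \tau) + 1(p(1) < \tau),\; Q_{Y|Z,D}\left(\tfrac{\tau}{p(0)} \,\Big|\, 0, 1\right) 1(p(0) \ge \tau) + 1(p(0) < \tau) \right\} \\
&\quad = Q_{Y|Z,D}\left(\tfrac{\tau}{p(0)} \,\Big|\, 0, 1\right) 1(p(0) \ge \tau) + 1(p(0) < \tau) < Q_{Y|Z,D}\left(\tfrac{\tau}{p(1)} \,\Big|\, 1, 1\right) 1(p(1) \ge \tau) + 1(p(1) < \tau).
\end{aligned}$$

We must assume $\min\{p(0), p(1)\} \ge \tau$ to make this result hold. If $\min\{p(0), p(1)\} \ge \tau$, we need only check

$$q_1 \equiv Q_{Y|Z,D}\left(\tfrac{\tau}{p(1)} \,\Big|\, 1, 1\right) > Q_{Y|Z,D}\left(\tfrac{\tau}{p(0)} \,\Big|\, 0, 1\right) \equiv q_0.$$

First, the QIA needs to be satisfied. Without loss of generality, assume $Y_1|Z$ is uniformly distributed. Then the QIA is satisfied if

$$\begin{aligned}
F_{Y_1|Z}(y_1|0) &= F_{Y_1|Z,D}(y_1|0,0)(1 - p(0)) + F_{Y_1|Z,D}(y_1|0,1)p(0) = y_1, \\
F_{Y_1|Z}(y_1|1) &= F_{Y_1|Z,D}(y_1|1,0)(1 - p(1)) + F_{Y_1|Z,D}(y_1|1,1)p(1) = y_1,
\end{aligned}$$

for any $y_1 \in [0,1]$. As long as

$$F_{Y_1|Z,D}(q_0|0,0) = \frac{q_0 - \tau}{1 - p(0)} \in (0,1) \text{ or } \tau < q_0 < \tau + (1 - p(0)),$$

$$F_{Y_1|Z,D}(q_1|1,0) = \frac{q_1 - \tau}{1 - p(1)} \in (0,1) \text{ or } \tau < q_1 < \tau + (1 - p(1)),$$

we can find qualified $F_{Y_1|Z,D}(y_1|z,d)$, $z = 0,1$, $d = 0,1$ such that (14) is satisfied. For example, let

$$\begin{aligned}
F_{Y_1|Z,D}(y_1|0,0) &= \frac{q_0 - \tau}{(1 - p(0))q_0}\, y_1 1(y_1 \le q_0) + \left[ \frac{\frac{q_0 - \tau}{1 - p(0)} - q_0}{1 - q_0} + \frac{1 - \frac{q_0 - \tau}{1 - p(0)}}{1 - q_0}\, y_1 \right] 1(y_1 > q_0), \\
F_{Y_1|Z,D}(y_1|0,1) &= \frac{\tau}{p(0)q_0}\, y_1 1(y_1 \le q_0) + \left[ \frac{\frac{\tau}{p(0)} - q_0}{1 - q_0} + \frac{1 - \frac{\tau}{p(0)}}{1 - q_0}\, y_1 \right] 1(y_1 > q_0), \\
F_{Y_1|Z,D}(y_1|1,0) &= \frac{q_1 - \tau}{(1 - p(1))q_1}\, y_1 1(y_1 \le q_1) + \left[ \frac{\frac{q_1 - \tau}{1 - p(1)} - q_1}{1 - q_1} + \frac{1 - \frac{q_1 - \tau}{1 - p(1)}}{1 - q_1}\, y_1 \right] 1(y_1 > q_1), \\
F_{Y_1|Z,D}(y_1|1,1) &= \frac{\tau}{p(1)q_1}\, y_1 1(y_1 \le q_1) + \left[ \frac{\frac{\tau}{p(1)} - q_1}{1 - q_1} + \frac{1 - \frac{\tau}{p(1)}}{1 - q_1}\, y_1 \right] 1(y_1 > q_1).
\end{aligned}$$

Figure 7 shows the case with $\tau = 0.5$, $p(0) = 0.6$, $p(1) = 0.7$, $q_0 = 0.65 < 0.75 = q_1$. $\square$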
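The construction in Example 2 can be verified numerically. The sketch below rebuilds the four piecewise-linear conditional CDFs for the values quoted for Figure 7 ($\tau = 0.5$, $p(0) = 0.6$, $p(1) = 0.7$, $q_0 = 0.65$, $q_1 = 0.75$) and checks that each mixture over $D$ is uniform, so the QIA holds, while $F_{Y_1|Z,D}(q_z|z,1) = \tau/p(z)$. All function and variable names are illustrative assumptions.

```python
TAU = 0.5
P = {0: 0.6, 1: 0.7}    # p(0), p(1)
Q = {0: 0.65, 1: 0.75}  # q_0, q_1


def piecewise_cdf(y, q, a):
    """CDF on [0, 1] that is linear from (0, 0) to (q, a) and from (q, a) to (1, 1)."""
    if y <= q:
        return a / q * y
    return (a - q) / (1 - q) + (1 - a) / (1 - q) * y


def cdf_y1(y, z, d):
    """F_{Y_1|Z,D}(y | z, d) from Example 2; its value at the kink q_z depends on d."""
    a = TAU / P[z] if d == 1 else (Q[z] - TAU) / (1 - P[z])
    return piecewise_cdf(y, Q[z], a)


def mixture(y, z):
    """F_{Y_1|Z}(y | z), which the QIA with a uniform Y_1 | Z requires to equal y."""
    return cdf_y1(y, z, 0) * (1 - P[z]) + cdf_y1(y, z, 1) * P[z]


grid = [i / 100 for i in range(101)]
worst_gap = max(abs(mixture(y, z) - y) for z in (0, 1) for y in grid)
print("largest deviation from uniformity:", worst_gap)

# R^1_tau(x, z) is the y at which F_{Y_1|Z,D}(y | z, 1) reaches tau / p(z), i.e. q_z.
print("R^1 at z = 0 and z = 1:", Q[0], Q[1], "infimum over z:", min(Q.values()))
```

Since each treated CDF is strictly increasing and equals $\tau/p(z)$ exactly at $q_z$, the infimum over $z$ is $q_0 = 0.65$, strictly below the value $q_1 = 0.75$ obtained at the larger propensity $p(1)$.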


Figure 7: An Illustration of $\inf_{z \in \mathcal{Z}_x} R_\tau^1(x,z) \ne R_\tau^1(x)$ When (9) is NOT Satisfied: $\tau = 0.5$

It is useful to construct a test to check the hypothesis that the bounds $[I_\tau^L(x), I_\tau^U(x)]$ and those in Theorem 3(i) coincide. Since $I_\tau^L(x) \ge L_\tau^1(x) - R_\tau^0(x)$ and $I_\tau^U(x) \le R_\tau^1(x) - L_\tau^0(x)$ always hold, our null hypothesis is $L_\tau^1(x) - R_\tau^0(x) - I_\tau^L(x) \ge 0$, and $I_\tau^U(x) - \left(R_\tau^1(x) - L_\tau^0(x)\right)$