




































































The quantile treatment effect (QTE) and its estimation in econometrics, including the identification of QTE under various assumptions, the use of instrumental variables, and the derivation of the marginal policy relevant quantile treatment effect (MPRQTE). It also covers the moment conditions for identification and the unconfoundedness case.





































































Abstract This paper studies estimation and inference based on the marginal quantile treatment effect. First, we illustrate the importance of the rank preservation assumption in the evaluation of quantile treatment effects, show the identifiability of the marginal quantile treatment effect, and clarify the relationship between the marginal quantile treatment effect and other quantile treatment parameters. Second, we develop sharp bounds for the quantile treatment effect with and without the monotonicity assumption, as well as necessary and sufficient conditions for point identification. Third, we estimate the marginal quantile treatment effect and the associated quantile treatment effect and integrated quantile treatment effect based on distribution regression, derive the corresponding weak limits, and show the validity of bootstrap inference. The inference procedure can be used to construct uniform confidence bands for quantile treatment parameters and to test unconfoundedness and stochastic dominance. We also develop goodness-of-fit tests to choose regressors in the distribution regression. Fourth, we conduct two counterfactual analyses: deriving the transition matrix and developing the relative marginal policy relevant quantile treatment effect parameter under policy invariance. Fifth, we compare the identification schemes in some important papers with the scheme based on the marginal quantile treatment effect, and point out the advantages and weaknesses of each; e.g., Chernozhukov and Hansen (2005) concentrate mainly on the quantile treatment effect with the selection effect but without essential heterogeneity, while Abadie, Angrist and Imbens (2002), Aakvik, Heckman and Vytlacil (2005) and Chernozhukov and Hansen (2006) suffer from some obvious misspecification problems. Meanwhile, an alternative estimator of the local quantile treatment effect is developed and its weak limit is derived.
Finally, we apply the estimation methods to the famous return to schooling dataset of Angrist and Krueger (1991) to illustrate the usefulness of the techniques developed in this paper to practitioners.
Keywords: marginal quantile treatment effect, local quantile treatment effect, rank preservation, selection effect, essential heterogeneity, sharp bound, point identification, distribution regression, two-step estimator, Hadamard differentiability, weak limit, uniform confidence band, unconfoundedness, completeness, stochastic dominance, goodness of fit test, transition matrix, relative marginal policy relevant quantile treatment effect, counterfactual analysis, policy invariance, bootstrap validity, return to schooling
JEL-Classification: C12, C13, C14, C21, C
Treatment effect evaluation is one main task of econometric analysis. Most of the literature concentrates on the evaluation of the average treatment effect; see Heckman and Vytlacil (2007a,b) for a comprehensive summary. Meanwhile, as illustrated in Heckman (1992), Heckman et al. (1997) and Heckman and Smith (1993, 1998), questions of political economy or "social justice" require knowledge of the distribution of the treatment effect. As a result, distributional treatment effects (especially when unconfoundedness does not hold) become natural parameters of interest among econometricians. Indeed, distributional treatment effects have been studied extensively in the empirical literature. For example, Card (1996) uses a panel data set to study the effects of unions on the structure of wages; DiNardo et al. (1996) present a semiparametric procedure to analyze the effects of institutional and labor market factors on changes in the U.S. distribution of wages; Bitler et al. (2006) estimate quantile treatment effects using random-assignment data from Connecticut's Job First waiver. Distributional treatment effects are usually estimated by quantile regression, initiated by Koenker and Bassett (1978) (see Koenker (2005) for an introduction to quantile regression). One related field that has recently attracted much attention is "general" semiparametric and nonparametric quantile regression with endogeneity. For semiparametric setups, see, e.g., Hong and Tamer (2003), Honoré and Hu (2004), Ma and Koenker (2006), Lee (2007), Sakata (2007) and Jun (2008), among others. For nonparametric setups, see, e.g., Chesher (2003), Chernozhukov et al. (2007), Horowitz and Lee (2007), Imbens and Newey (2009), Chen and Pouzo (2012), and Gagliardini and Scaillet (2012), among others. However, the main interest of this paper lies in the special structure of the treatment model, namely, that the endogenous variable is binary.
A key parameter we will develop is the marginal quantile treatment effect (MQTE), which is the counterpart of the marginal treatment effect (MTE) in the average treatment effect estimation. The idea of the MTE was first introduced in the context of a parametric normal generalized Roy model by Björklund and Moffitt (1987), and was analyzed more generally by Heckman (1997). In a choice (or selection, or participation) model with the latent variable structure, Heckman and Vytlacil (1999, 2001a) express the conventional average treatment effect parameters as different weighted averages of the MTE, and also identify the MTE by the local instrumental variable (LIV) estimator. Indeed, Heckman and Vytlacil (2007b) use the MTE to unify the econometric literature on the evaluation of social programs, so it is well recognized that the MTE is a convenient tool to organize the nonparametric literature on the average treatment effect evaluation. An embarrassing situation is that the counterpart of the MTE in the quantile treatment effect literature, the MQTE, is yet to be well understood. The purpose of this paper is to integrate the relevant literature on the quantile treatment effect evaluation without unconfoundedness into one framework and to provide practitioners with useful estimation and inference methods based on the MQTE. Two intertwined strands of literature concern distributional treatment effects. Before reviewing the relevant literature, we must emphasize that the distributional treatment effects are functionals of the distribution of $Y_1 - Y_0$, which requires the joint distribution of $Y_1$ and $Y_0$, where $Y_1$ and $Y_0$ are the outcomes under the treatment status and the control status, respectively.
As mentioned in Section II.B of Manski (1996) or footnote 5 of Manski (1997), "knowledge of $F(Y_1 - Y_0)$ neither implying nor being implied by knowledge of $F(Y_1)$ and $F(Y_0)$", where $F(X)$ is the cumulative distribution function (CDF) of $X$ for a random variable $X$. Due to the fundamental problem of causal inference (page 947 of Holland (1986)), $Y_0$ and $Y_1$ cannot be observed simultaneously. As a result, even in a randomized experiment, the joint distribution $F(Y_1, Y_0)$ or $F(Y_1 - Y_0)$ cannot be identified without further restrictions, although $F(Y_1)$ and $F(Y_0)$ can be identified. On the other hand, the marginal distributions $F(Y_1)$ and $F(Y_0)$ are also of interest in econometric analysis. For example, in Atkinson (1970), Sen (1997, 2000), Manski (1996, p714),
of literature to identify the (conditional) marginal distributions of potential outcomes. These distributions imply the MQTE, which is also the main objective of this paper, but we do not need the independence assumption. The above-mentioned literature concentrates on cross-sectional data; Athey and Imbens (2006) also use panel data to identify the QTT through what they call the changes-in-changes approach under the RP condition on the treated. Although these two strands of literature use different identification assumptions, their targets are the same, namely, identifying the joint distribution of $Y_1$ and $Y_0$. This paper belongs to the second strand of literature, i.e., we impose some RP assumptions to identify $F(Y_1, Y_0)$. Consequently, the quantile treatment effect in this paper refers to the difference of quantiles rather than the quantile of differences. Meanwhile, we employ the framework of the first strand of literature to study the difference of quantiles. The rest of this paper is structured as follows. Section 2 sets up our treatment model, illustrates the importance of the RP assumption in the quantile treatment effect evaluation, shows the identifiability of the MQTE, and clarifies the relationship between the MQTE and other quantile treatment parameters. Section 3 develops sharp bounds and necessary and sufficient conditions for point identification of the QTE with and without the monotonicity assumption. In Section 4, we estimate the MQTE based on the distribution regression introduced by Foresi and Peracchi (1995), derive its weak limit and show the validity of bootstrap inference, and we also develop goodness-of-fit tests to choose regressors. In Section 5, we conduct two counterfactual analyses: deriving the transition matrix and developing the relative marginal policy relevant quantile treatment effect parameter under policy invariance.
In Section 6, we comment on some key papers in the two strands above, pointing out their weaknesses, underlying assumptions, and interactions with this paper. Section 7 presents an empirical application to the return to schooling and Section 8 concludes. All proofs are contained in an appendix. Some notation is collected here for future reference. $d$ always indicates the two treatment statuses, so "$d = 0, 1$" is not written out explicitly throughout the paper. $\text{supp}(X)$ for a random variable $X$ denotes the support of the distribution of $X$. Both $Q_X(\tau)$ and $Q_\tau(X)$ denote the $\tau$th quantile of a random variable $X$. Capital letters such as $X$ denote random variables and the corresponding lower case letters such as $x$ denote the potential values they may take. For any parameter $\theta$, $d_\theta$ is the dimension of $\theta$. The space $\ell^\infty(\mathcal{F})$ represents the space of real-valued bounded functions defined on the index set $\mathcal{F}$ equipped with the supremum norm $\|\cdot\|_{\ell^\infty(\mathcal{F})}$. $C(\mathcal{Y})$ is the space of continuous functions on $\mathcal{Y}$.
We use the nonlinear and nonseparable outcome model as in Heckman and Vytlacil (2005),

$$Y_1 = \mu_1(X, U_1), \quad Y_0 = \mu_0(X, U_0). \tag{1}$$

Actually, the additively separable setup, $Y_d = \mu_d(X) + U_d$, does not lose generality since we can define the new $U_d$ as $Y_d - Q_{Y_d|X}(\tau|X)$ and all our analysis in this paper is conditional on $X$. The distribution of $Y_d$ may be discrete (e.g., employment status), continuous (e.g., wage), or mixed discrete and continuous (e.g., in the national JTPA study 18 month impact sample used in Heckman et al. (1997), a substantial proportion of persons has zero earnings in both distributions of $Y_0$ and $Y_1$). The participation decision is

$$D = 1(\mu_D(X, Z) - V \ge 0), \tag{2}$$

where $Z$ includes the instruments for the choice process. Letting both $X$ and $Z$ appear as arguments of $\mu_D$ loses no generality, since $\mu_D(X, Z)$ need not depend on all elements of $X$. By transforming $\mu_D(X, Z)$ and $V$ by $F_{V|X,Z}$, we can rewrite

$$D = 1(p(X, Z) - U_D \ge 0), \tag{3}$$
where $U_D|X, Z \sim U(0, 1)$ and $p(X, Z)$ is the propensity score. We use these two formulations of $D$ interchangeably throughout the paper. As shown in Vytlacil (2006), there is a larger class of latent index models that have a representation of this form. Also, this setup of $D$ implies the monotonicity assumption of Imbens and Angrist (1994), as shown in Vytlacil (2002). We impose the following assumptions on the outcome equation and the choice equation.
(A1) $\mu_D(X, Z)$ is a nondegenerate random variable conditional on $X$.
(A2) The random vectors $(U_1, V)$ and $(U_0, V)$ are independent of $Z$ conditional on $X$.
(A3) The distribution of $V$ is absolutely continuous with respect to Lebesgue measure.
(A4) $X_1 = X_0$ almost everywhere, where $X_d$ denotes the value of $X$ if $D$ is set to $d$.
(A5) $1 > P(D = 1|X) > 0$.
(A6) Conditional on $X = x$, $V = v$, $Y_0$ and $Y_1$ have the same rank.
(A1)-(A5) correspond to (A-1)-(A-3), (A-6) and (A-5) in Heckman and Vytlacil (2005), respectively. These assumptions are prevalent in the literature with heterogeneous treatment effects. A necessary condition for (A1) is that $Z$ contains a continuous variable. (A2) allows for both the selection effect ($U_0 \not\perp D|X$) and the essential heterogeneity ($(U_1 - U_0) \not\perp D|X$). Also, (A2) implies the usual assumption in the control function approach, namely, $Z \perp (U_1, U_0)|(X, V)$. (A1)-(A5), combined with (1) and (2), impose testable restrictions on the distribution of $(Y, D, Z, X)$; see Heckman and Vytlacil (2005) (page 678) for the index sufficiency restriction and the monotonicity restriction. We refer to Heckman and Vytlacil (2005) for more detailed discussions of (A1)-(A5). The assumption (A6) deserves further examination.
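The equivalence of the two formulations (2) and (3) of the participation decision can be checked numerically. The following is only a sketch under assumed ingredients that are not in the paper: a standard normal $V$ independent of a scalar $Z$, and a hypothetical linear index `mu_d = 0.3 + 0.8 * z`.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical design (an assumption, not the paper's): V ~ N(0,1) independent of Z.
z = rng.normal(size=n)
v = rng.normal(size=n)
mu_d = 0.3 + 0.8 * z                      # hypothetical latent index mu_D(Z)

# Formulation (2): D = 1(mu_D - V >= 0)
d_latent = mu_d - v >= 0

# Formulation (3): apply F_V to both sides, so p = F_V(mu_D) and U_D = F_V(V)
cdf = np.vectorize(NormalDist().cdf)
p = cdf(mu_d)                             # propensity score
u_d = cdf(v)                              # uniform on (0, 1) by construction
d_pscore = p - u_d >= 0

agreement = np.mean(d_latent == d_pscore) # share of units where (2) and (3) agree
u_d_mean = u_d.mean()                     # should be close to 1/2
```

Because $F_V$ is strictly increasing, the two indicators agree unit by unit (up to floating-point ties), and `u_d` behaves like a $U(0,1)$ draw.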
The key extra assumption beyond those in Heckman and Vytlacil (2005) is the RP condition (A6). Chernozhukov and Hansen (2005) state the RP assumption via the Skorohod representation. We do the same here although, unlike in their paper, this representation is not essential for the development of our identification scheme. Suppose $Y_d$ is continuous, and the $\tau$th conditional quantile of $Y_d$ given $X$ and $V$ is $q(d, X, V, \tau)$; then we can represent

$$Y_d = q(d, X, V, R_d)$$

by the Skorohod representation, where $R_d|(X, V) \sim U(0, 1)$ is the rank variable which represents some unobserved characteristic of $Y_d$, e.g., ability or proneness, among the slice of people with a specific value of $X$ and $V$. The RP assumption (A6) can be restated as $R_1|(X, V) = R_0|(X, V)$. We now clarify two key points of the Skorohod representation. First, the Skorohod representation decomposes the information in $U_d$ of (1) into two components: the value information and the rank information. The former is incorporated in the quantile function $q(\cdot)$ and the latter is included in $R_d$. Second, because the distribution of $R_d|(X, V)$, $U(0, 1)$, does not depend on $(X, V)$, it may be suspected that $R_d$ is independent of $(X, V)$. This is incorrect. The mistake is immediately clear if we rewrite $Y_d = q(d, X, V, R_d(X, V))$; in other words, $R_d$ must be understood as a conditional random variable. Suppose there are $N$ distinct points on the support of $(X, V)$; then there are $N$ rank variables $R_d(X, V)$. Although the distribution of $R_d(X, V)|(X = x, V = v)$, $U(0, 1)$, does not depend on $(x, v)$, the unconditional random variable $R_d$ may depend on $(X, V)$. The RP condition does not restrict the dependence between $R_d$ and $(X, V)$; rather, it restricts the total number of conditional rank variables
which implies that the joint distribution of $Y_1$ and $Y_0$ given $X = x, U_D = u_D$ is degenerate. To see what this joint distribution looks like, suppose $Y_d|(X = x, U_D = u_D)$ is continuously distributed and $\text{supp}(Y_d|X = x, U_D = u_D) = [0, 1]$ to simplify the discussion. It turns out that there is probability mass only on the line

$$\left(y_0, \; F^{-1}_{Y_1|X,U_D}\left(F_{Y_0|X,U_D}(y_0|x, u_D)\,\middle|\,x, u_D\right)\right)$$

with $y_0 \in [0, 1]$. In other words, $(Y_0, Y_1)$ can occur simultaneously only on the Q-Q plot. An implication of this result is that if $F_{Y_0|X,U_D}(\cdot|x, u_D)$ is the same as $F_{Y_1|X,U_D}(\cdot|x, u_D)$, then the correlation between $Y_0$ and $Y_1$ conditional on $X = x, U_D = u_D$ must be 1. Figure 2 shows a typical Q-Q plot of $(Y_0, Y_1)$ conditional on $X = x, U_D = u_D$. In Figure 2, $P(Y_1 \ge Y_0|Y_0 = y_0, X = x, U_D = u_D) = 1$ when $y_0 \le 0.6$ and $P(Y_1 \ge Y_0|Y_0 = y_0, X = x, U_D = u_D) = 0$ when $y_0 > 0.6$. In other words, for the slice of people with $Y_0 = y_0, X = x, U_D = u_D$, the participant always benefits as long as $y_0 \le 0.6$, and vice versa. Nevertheless, it is more likely that $P(Y_1 \ge Y_0|Y_0 = y_0, X = x) \in (0, 1)$, $P(Y_1 \ge Y_0|X = x, U_D = u_D) = F_{Y_0|X,U_D}(0.6|x, u_D) \in (0, 1)$ and $P(Y_1 \ge Y_0|X = x) = \int_0^1 P(Y_1 \ge Y_0|X = x, U_D = u_D)\,du_D \in (0, 1)$.
Figure 2: Q-Q Plot of $(Y_0, Y_1)$ Conditional on $X = x, U_D = u_D$
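The degeneracy of the joint distribution under rank preservation can be illustrated by simulation. The sketch below uses two hypothetical conditional quantile functions, `q0` and `q1`, chosen only so that the Q-Q curve crosses the 45-degree line at 0.6; they are assumptions for illustration and are not the functions behind Figure 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical conditional quantile functions at a fixed (x, u_D); not from the paper.
def q0(r):
    return r                  # Y0 | (x, u_D) ~ U(0, 1), so F_Y0(y) = y on [0, 1]

def q1(r):
    return 0.3 + 0.5 * r      # crosses the 45-degree line at r = 0.6

# Rank preservation: both potential outcomes share the same rank variable R.
r = rng.uniform(size=n)
y0, y1 = q0(r), q1(r)

# Every draw lies on the Q-Q curve y1 = q1(F_Y0(y0)); here F_Y0(y0) = y0.
on_curve = np.allclose(y1, q1(y0))

# Share of units that gain from treatment; equals F_Y0(0.6) = 0.6 in this design.
share_gain = np.mean(y1 >= y0)
```

In this design the treated outcome exceeds the control outcome exactly when $y_0 \le 0.6$, so `share_gain` approximates $F_{Y_0|X,U_D}(0.6|x, u_D)$.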
It should be emphasized that the RP condition is only needed for defining the various quantile treatment effects. Even without this condition, we can still identify various marginal distributions which, as argued in the introduction, are useful for many other purposes. Under the RP assumption, we define the MQTE as in Carneiro and Lee (2009):

$$MQTE_\tau(x, u_D) = Q_{Y_1|X,U_D}(\tau|x, u_D) - Q_{Y_0|X,U_D}(\tau|x, u_D).$$

If we strengthen the RP assumption to be conditional on $X = x$ or on $X = x, D = 1$, then we can define the QTE in Chernozhukov and Hansen (2005, 2006) and the QTT as

$$QTE_\tau(x) = Q_{Y_1|X}(\tau|x) - Q_{Y_0|X}(\tau|x)$$

and

$$QTT_\tau(x) = Q_{Y_1|X,D}(\tau|x, 1) - Q_{Y_0|X,D}(\tau|x, 1),$$

respectively. If the RP assumption is conditional on $X = x, u_D < U_D \le u_D'$, then the LQTE of Abadie et al. (2002)^2 is defined as

$$LQTE_\tau(x, u_D, u_D') = Q_{Y_1|X,U_D}(\tau|x, (u_D, u_D']) - Q_{Y_0|X,U_D}(\tau|x, (u_D, u_D']).$$

Finally, if the RP assumption holds unconditionally (with respect to $X$),^3 then we define the integrated QTE (IQTE)

$$IQTE_\tau = Q_{Y_1}(\tau) - Q_{Y_0}(\tau),$$

the integrated QTT (IQTT)

$$IQTT_\tau = Q_{Y_1|D}(\tau|1) - Q_{Y_0|D}(\tau|1)$$

as in Firpo (2007),^4 and the integrated LQTE (ILQTE)

$$ILQTE_\tau(u_D, u_D') = Q_{Y_1|U_D}(\tau|(u_D, u_D']) - Q_{Y_0|U_D}(\tau|(u_D, u_D']).$$
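All of the parameters above are differences of quantiles, which depend only on the two marginal distributions, whereas the quantile of the difference $Y_1 - Y_0$ depends on how the two outcomes are coupled. A small sketch with hypothetical normal marginals (an assumption made purely for illustration) shows this: the same marginals give the same difference of quantiles, but the quantile of the difference changes with the coupling.

```python
from statistics import NormalDist

tau = 0.9

# Hypothetical marginals (not from the paper): Y0 ~ N(0, 1), Y1 ~ N(1, 2^2).
y0 = NormalDist(0.0, 1.0)
y1 = NormalDist(1.0, 2.0)

# Difference of quantiles, as in the IQTE definition; uses the marginals only.
diff_of_quantiles = y1.inv_cdf(tau) - y0.inv_cdf(tau)

# Coupling 1, rank preservation: Y1 = 1 + 2*Y0, so Y1 - Y0 = 1 + Y0 ~ N(1, 1).
qod_rank_preserved = NormalDist(1.0, 1.0).inv_cdf(tau)

# Coupling 2, independence: Y1 - Y0 ~ N(1, 1 + 4). NormalDist subtraction
# treats the two operands as independent normals.
qod_independent = (y1 - y0).inv_cdf(tau)
```

In this particular rank-preserving coupling the two notions happen to coincide because $Y_1 - Y_0$ is increasing in the common rank; under independence they differ markedly.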
The following theorem states that the MQTE can be identified for a range of $u_D$.

Theorem 1 Suppose assumptions (A1)-(A6) hold. If $u_D$ is not an isolated point of $\mathcal{P}_x^1 \cap \mathcal{P}_x^0$, then $MQTE_\tau(x, u_D)$ can be identified for any $\tau \in (0, 1)$, where $\mathcal{P}_x^d = \text{supp}(p(X, Z)|X = x, D = d)$.
Proof. To simplify notation, we suppress the conditioning on $X = x$. Given the RP assumption (A6), we need only identify $Q_{Y_d|U_D}(\tau|u_D)$, whose identification is equivalent to the identification of $F_{Y_d|U_D}(\cdot|u_D)$. We provide two methods to identify $F_{Y_d|U_D}(\cdot|u_D)$. Method 1: Note that

$$P(Y \le y|p(Z) = p, D = 1)\,p = P(Y_1 \le y|p(Z) = p, D = 1)\,P(D = 1|p(Z) = p) = P(Y_1 \le y|U_D \le p)\,p = \int_0^p F_{Y_1|U_D}(y|u_D)\,du_D,$$

and similarly, $P(Y \le y|p(Z) = p, D = 0)(1 - p) = \int_p^1 F_{Y_0|U_D}(y|u_D)\,du_D$, so

$$\frac{d\left[P(Y \le y|p(Z) = p, D = 1)\,p\right]}{dp} = F_{Y_1|U_D}(y|p), \qquad -\frac{d\left[P(Y \le y|p(Z) = p, D = 0)(1 - p)\right]}{dp} = F_{Y_0|U_D}(y|p).$$

^2 Abadie et al. (2002) conflate issues of definition of parameters with issues of identification; see Section 6.2 below for their definition. Actually, $LQTE_\tau(x, u_D, u_D')$ can be defined for any $u_D, u_D' \in (0, 1)$ although it can only be identified for $u_D, u_D'$ on the support of $p(x, Z)$.
^3 Note that if the RP assumption holds on $X = x$, $Y_d|X$ can be expressed as $Y_d = q(d, X, U)$ by the Skorohod representation, where $U|X = U_1|X = U_0|X$. If the RP assumption holds unconditionally, then $Y_d$ can be expressed as $Y_d = q(d, U)$ by the Skorohod representation, where $U = U_1 = U_0$. This by no means implies that the information in $X$ and $Z$ is useless for identification or efficiency improvement in the quantile treatment effect evaluation.
^4 Be careful about the terminology in the literature. Our IQTE and IQTT are the QTE and QTT of Firpo (2007). Also, the MQTE of Cattaneo (2010) means $Q_\tau(Y_0)$ and $Q_\tau(Y_1)$ rather than $MQTE_\tau(x, u_D)$, and the MQTE, QTE and QTT in the first strand of literature mentioned in the introduction mean $Q_{Y_1 - Y_0|X,U_D}(\tau|x, u_D)$, $Q_{Y_1 - Y_0|X}(\tau|x)$ and $Q_{Y_1 - Y_0|X,D}(\tau|x, 1)$ rather than $MQTE_\tau(x, u_D)$, $QTE_\tau(x)$ and $QTT_\tau(x)$.
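The differentiation step of Method 1 can be mimicked on simulated data. The sketch below is not the paper's estimator: it replaces the derivative with a central finite difference, and it assumes a Gaussian design ($Y_1 = V + 2U$, $D = 1(Z - V > 0)$, unit variances and $\text{Corr}(U, V) = 0.5$) so that the target $F_{Y_1|U_D}(y|p)$ has a closed form to compare against. The helper `g1`, the step `h` and the evaluation point are all hypothetical choices.

```python
import math
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
n = 400_000
nd = NormalDist()

# Assumed design: (U, V) standard normal with correlation 0.5, Y1 = V + 2U,
# D = 1(Z > V), hence U_D = Phi(V) and p(Z) = Phi(Z).
v = rng.normal(size=n)
u = 0.5 * v + math.sqrt(0.75) * rng.normal(size=n)
y1 = v + 2.0 * u

def g1(p, y):
    """Sample analogue of P(Y <= y | p(Z) = p, D = 1) * p = P(Y <= y, D = 1 | p(Z) = p)."""
    z = nd.inv_cdf(p)                 # instrument value that yields propensity p
    d = z > v                         # participation at that instrument value
    return float(np.mean((y1 <= y) & d))   # Y = Y1 whenever D = 1

p, y, h = 0.3, 1.0, 0.05
estimate = (g1(p + h, y) - g1(p - h, y)) / (2.0 * h)   # finite-difference derivative

# Closed form under the assumed design: Y1 | V = v ~ N(2v, 3), with v = Phi^{-1}(p).
truth = nd.cdf((y - 2.0 * nd.inv_cdf(p)) / math.sqrt(3.0))
```

The finite difference averages $F_{Y_1|U_D}(y|u_D)$ over $u_D \in (p - h, p + h]$, so it carries a small smoothing bias in addition to sampling noise; a smaller `h` reduces the former and inflates the latter.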
$P(DY \le y|p(Z) = p)$ does not include the point mass. This intuition is similar in spirit to that of the censored quantile regression models discussed in Powell (1984, 1986). The arguments in Theorem 1 can be applied to the discrete $Y_d$ case. Suppose $Y_1$ and $Y_0$ have the same support $\{y_1, \ldots, y_S\}$; then the counterpart of the MQTE is $P_{Y_1|U_D}(y_s|u_D) - P_{Y_0|U_D}(y_s|u_D)$, $s = 1, \ldots, S$, where $P_{Y_d|U_D}(y_s|u_D)$ is the point mass of $Y_d|(U_D = u_D)$ at $y_s$. We can still identify $F_{Y_d|U_D}(y_s|p)$ by (4), (5), (6) and (7), and then $P_{Y_d|U_D}(y_1|p) = F_{Y_d|U_D}(y_1|p)$ and $P_{Y_d|U_D}(y_s|p) = F_{Y_d|U_D}(y_s|p) - F_{Y_d|U_D}(y_{s-1}|p)$ for $s = 2, \ldots, S$ can be sequentially identified. If $Y_d$ can take only the values 0 and 1, then the parameter of interest is $P_{Y_1|U_D}(1|u_D) - P_{Y_0|U_D}(1|u_D)$, which coincides with the MTE. Of course, we can also consider the case with mixed discrete and continuous outcomes. Both the discrete case and the mixed case are easier to handle than the continuous case, so we will concentrate on the continuous case in the rest of this paper unless stated otherwise. If we use the idea of LIV as in Heckman and Vytlacil (2001a), we have
$$P(Y \le y|p(Z) = p) = P(Y \le y|p(Z) = p, D = 1)\,p + P(Y \le y|p(Z) = p, D = 0)(1 - p) = \int_0^p F_{Y_1|U_D}(y|u_D)\,du_D + \int_p^1 F_{Y_0|U_D}(y|u_D)\,du_D,$$

and

$$\frac{\partial P(Y \le y|p(Z) = p)}{\partial p} = F_{Y_1|U_D}(y|p) - F_{Y_0|U_D}(y|p),$$
which is the difference of the CDFs in the two treatment statuses. So it is hard to identify the MQTE from $\partial P(Y \le y|p(Z) = p)/\partial p$. From Theorem 1, we can identify $E[Y_1|U_D = p]$ and $E[Y_0|U_D = p]$ separately, not just their difference $E[Y_1 - Y_0|U_D = p]$ as in the LIV method of Heckman and Vytlacil (2001a). Method 1 of the proof is a special case of Theorem 1 in Carneiro and Lee (2009). We also discuss Method 2 to highlight the difference between the identification schemes of the MTE and the MQTE. For the MTE, $E[DY|p(Z) = p] = E[Y|p(Z) = p, D = 1]\,p = \int_0^p E[Y_1|U_D = u_D]\,du_D$, and $E[(1 - D)Y|p(Z) = p] = E[Y|p(Z) = p, D = 0](1 - p) = \int_p^1 E[Y_0|U_D = u_D]\,du_D$, so the two methods in the proof are the same in the MTE identification.

We close this subsection with a concrete example. Suppose $Y_1 = V + 2U$, $Y_0 = 2V + U$, and $D = 1(Z - V > 0)$, where $(U, V, Z)'$ is jointly normal, $N(0, \Sigma)$. It can be shown that $MQTE_\tau(u_D) = -0.5\,\Phi^{-1}(u_D) + \sqrt{0.75}\,\Phi^{-1}(\tau)$. Figure 4 shows $MQTE_\tau(u_D)$ for $\tau = 0.1, 0.25, 0.5, 0.75$ and $0.9$. In this simple model, the spread of the MQTE, e.g., $MQTE_{1-\tau}(u_D) - MQTE_\tau(u_D)$ for $\tau \in (0, 0.5)$, is the same for any $u_D$, which may not hold in practice. Also, $MQTE_\tau(u_D)$ is a decreasing function of $u_D$, which indicates that the more likely an individual is to participate in the program, the higher the benefit she receives.^5 In the figure, we also show $MTE(u_D)$, $QTE_\tau$ and $ATE$ ($\equiv E[Y_1] - E[Y_0]$) for comparison. Note that in this example, $MTE(u_D) = MQTE_{0.5}(u_D)$, and $QTE_\tau = 0 = ATE$ does not depend on $\tau$.^6 Obviously, $MQTE_\tau(u_D)$ provides more information than $MTE(u_D)$, $QTE_\tau$, and $ATE$.
^5 Aakvik et al. (2005) provide a converse example.
^6 It should be emphasized that $QTE_\tau$ is not well defined in this example since the RP condition does not hold unconditionally, given that $Y_1$ and $Y_0$ have the same marginal distribution but $\text{Corr}(Y_1, Y_0) = 6.5/7 < 1$.
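The numbers in this example can be cross-checked with a few lines of linear algebra. The sketch below assumes that $U$ and $V$ have unit variances and correlation 0.5; this is an inference from the reported correlation $6.5/7$ and the $\sqrt{0.75}$ term rather than something stated here, and the leading coefficient on $\Phi^{-1}(u_D)$ is taken to be $-0.5$, consistent with the MQTE being decreasing. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

nd = NormalDist()
rho = 0.5                                        # assumed Corr(U, V)
sigma_uv = np.array([[1.0, rho], [rho, 1.0]])    # assumed covariance of (U, V)

a1 = np.array([2.0, 1.0])    # Y1 = 2U + V
a0 = np.array([1.0, 2.0])    # Y0 = U + 2V

var1 = a1 @ sigma_uv @ a1                        # Var(Y1)
var0 = a0 @ sigma_uv @ a0                        # Var(Y0)
corr = (a1 @ sigma_uv @ a0) / np.sqrt(var1 * var0)

def cond_quantile(a, tau, u_d):
    """tau-th quantile of a[0]*U + a[1]*V given U_D = Phi(V) = u_d."""
    v = nd.inv_cdf(u_d)
    mean = (a[0] * rho + a[1]) * v               # uses E[U | V = v] = rho * v
    sd = abs(a[0]) * np.sqrt(1.0 - rho ** 2)     # only U remains random given V
    return mean + sd * nd.inv_cdf(tau)

def mqte(tau, u_d):
    return cond_quantile(a1, tau, u_d) - cond_quantile(a0, tau, u_d)

def mqte_closed_form(tau, u_d):
    return -0.5 * nd.inv_cdf(u_d) + np.sqrt(0.75) * nd.inv_cdf(tau)
```

Under these assumptions both marginals are $N(0, 7)$, which is why the unconditional difference of quantiles and the ATE are both zero, while `mqte` still varies with both arguments.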
Figure 4: $MQTE_\tau(u_D)$ for $\tau = 0.1, 0.25, 0.5, 0.75$ and $0.9$ in a Simple Example
In this subsection, we first discuss the relationship between $MQTE_\tau(x, u_D)$ and $QTT_\tau(x)$, $QTE_\tau(x)$, $LQTE_\tau(x, u_D, u_D')$, $IQTE_\tau$, $IQTT_\tau$, $ILQTE_\tau$. It turns out that the building block is $F_{Y_d|X,U_D}(y_d|x, u_D)$ rather than $MQTE_\tau(x, u_D)$. Actually, $MQTE_\tau(x, u_D)$ is more relevant to the (conditional) quantile of $Y_1 - Y_0$. From the supplementary materials, we can show that

$$QTT_\tau(x) = F^{-1}_{Y_1|X,D}(\tau|x, 1) - F^{-1}_{Y_0|X,D}(\tau|x, 1)$$

and the quantile treatment effect on the untreated (QTUT)

$$QTUT_\tau(x) = F^{-1}_{Y_1|X,D}(\tau|x, 0) - F^{-1}_{Y_0|X,D}(\tau|x, 0),$$

where

$$F_{Y_d|X,D}(y_d|x, 1) = \int_0^1 F_{Y_d|X,U_D}(y_d|x, u_D)\,h_{TT}(x, u_D)\,du_D,$$

$$F_{Y_d|X,D}(y_d|x, 0) = \int_0^1 F_{Y_d|X,U_D}(y_d|x, u_D)\,h_{TUT}(x, u_D)\,du_D,$$

with $h_{TT}(x, u_D) = \left[1 - F_{p(X,Z)|X}(u_D|x)\right]/E[p(X, Z)|X = x]$ and $h_{TUT}(x, u_D) = F_{p(X,Z)|X}(u_D|x)/E[1 - p(X, Z)|X = x]$. Also,

$$QTE_\tau(x) = F^{-1}_{Y_1|X}(\tau|x) - F^{-1}_{Y_0|X}(\tau|x), \quad IQTE_\tau = F^{-1}_{Y_1}(\tau) - F^{-1}_{Y_0}(\tau),$$

$$IQTT_\tau = F^{-1}_{Y_1|D}(\tau|1) - F^{-1}_{Y_0|D}(\tau|1), \quad IQTUT_\tau = F^{-1}_{Y_1|D}(\tau|0) - F^{-1}_{Y_0|D}(\tau|0)$$
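The weights $h_{TT}$ and $h_{TUT}$ are densities in $u_D$, which can be verified numerically from any set of propensity scores. The following is a sketch with hypothetical Beta-distributed propensity scores at a fixed $x$ (an assumption for illustration only); the empirical CDF plays the role of $F_{p(X,Z)|X}$ and `integrate` is a plain trapezoid rule.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical propensity scores p(x, Z) at a fixed x; not from the paper.
p = rng.beta(2.0, 3.0, size=50_000)

grid = np.linspace(0.0, 1.0, 2001)                                 # grid of u_D values
ecdf = np.searchsorted(np.sort(p), grid, side="right") / p.size    # F_p(u_D)

h_tt = (1.0 - ecdf) / p.mean()            # weight on F_{Y_d|U_D} for the treated
h_tut = ecdf / (1.0 - p.mean())           # weight on F_{Y_d|U_D} for the untreated

def integrate(f, x):
    """Trapezoid rule on a grid."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

tt_mass = integrate(h_tt, grid)           # should be close to 1
tut_mass = integrate(h_tut, grid)         # should be close to 1
```

Both weights integrate to one because $\int_0^1 [1 - F_p(u)]\,du = E[p]$ for a random variable supported on $[0, 1]$; $h_{TT}$ puts more mass on low $u_D$ (likely participants) and $h_{TUT}$ on high $u_D$.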
by a similar derivation as in the expression of $F_{Y_d|D}(y|1)$, so $Q_{Y_1 - Y_0|D}(\tau|1)$ can be identified and

$$P(Y_1 > Y_0|D = 1) = 1 - P(Y_1 - Y_0 \le 0|D = 1)$$

can also be identified.^8 Actually, we can identify any conditional or unconditional quantile of $Y_1 - Y_0$ of interest, e.g., $Q_{Y_1 - Y_0|X,D}(\tau|x, d)$, $Q_{Y_1 - Y_0|X}(\tau|x)$, $Q_{Y_1 - Y_0|D}(\tau|d)$, $Q_{Y_1 - Y_0}(\tau)$ and $Q_{Y_1 - Y_0|X,U_D}(\tau|x, (u_D', u_D])$, based on $MQTE_\tau(x, u_D)$. Since the corresponding weights can be defined as above, we omit the details. Note that if only assumption (A6) holds, $P(Y_1 > Y_0|D = 1)$ need not equal $\int\int_0^1 1(QTT_\tau(x) > 0)\,d\tau\,dF_{X|D}(x|1)$ or $\int_0^1 1(IQTT_\tau > 0)\,d\tau$. They are equal only if the RP assumption holds on $X = x, D = 1$ or $D = 1$. This observation can be used to test whether the RP assumption holds on a larger set than $X = x, U_D = u_D$.

Because the quantile is not a linear operator of the distribution function, $Q_{Y_1 - Y_0}(\cdot)$ and $Q_{Y_1}(\cdot) - Q_{Y_0}(\cdot)$ are generally unequal (and do not have any identifiable relationship), so the quantile treatment effect and the quantile of the impact distribution are two different parameters. In contrast, since the mean is a linear operator of the distribution function, the average treatment effect and the average of the impact distribution are the same parameter. In this paper, we concentrate on the three most popular quantile treatment effect parameters in the literature: $MQTE_\tau(x, u_D)$, $QTE_\tau(x)$ and $IQTE_\tau$. We concentrate on the difference of quantiles rather than the quantile of differences because the latter may not be interesting. For example, in the common effect model, the distribution of $Y_1 - Y_0$ is a point mass at a fixed value. Even if the treatment effect is not common, $Y_1 - Y_0$ may still have discrete components in its distribution. See Section 3.2 of Aakvik et al. (2005) for definitions of the distributional counterparts of the MTE, ATE and ATT based on $Y_1 - Y_0$ when the outcomes are binary, and see Section 2 of Abbring and Heckman (2007) for definitions of the distributional treatment effects in more general settings.

Finally, we study $F_{Y_1 - Y_0|Y_0,D}(\cdot|y_0, 1)$. We have already shown in Section 2.1 that under the RP assumption (A6),
$$P(Y_1 - Y_0 \le y|Y_0 = y_0, X = x, U_D = u_D) = 1\left(Q_{Y_1|X,U_D}\left(F_{Y_0|X,U_D}(y_0|x, u_D)|x, u_D\right) \le y + y_0\right),$$

so

$$F_{Y_1 - Y_0|Y_0,D}(y|y_0, 1) = \int\int \left[\frac{1}{F_{U_D|U_0,X}\left(p(x, z)|F_{Y_0|X}(y_0|x), x\right)} \int_0^{p(x,z)} P(Y_1 - Y_0 \le y|Y_0 = y_0, X = x, U_D = u_D)\,dF_{U_D|U_0,X}\left(u_D|F_{Y_0|X}(y_0|x), x\right)\right] dF_{Z|X}(z|x)\,dF_{X|Y_0}(x|y_0),$$

where the equality is from the fact that $F_{(U_D,X,Z)|Y_0 = y_0} = F_{U_D|Y_0 = y_0,X,Z}\,F_{Z|Y_0 = y_0,X}\,F_{X|Y_0 = y_0} = F_{U_D|U_0 = F_{Y_0|X}(y_0|X),X}\,F_{Z|X}\,F_{X|Y_0 = y_0}$, and $U_0$ is defined in the Skorohod representation of $Y_0$, $Y_0|X = F^{-1}_{Y_0|X}(U_0|X)$. So this parameter is a complicated functional of $Q_{Y_1|X,U_D}(F_{Y_0|X,U_D}(y_0|x, u_D)|x, u_D)$ and is not easy to estimate. Actually, it is unknown whether it can be point identified, since $F_{U_D|U_0,X}$ is hard to identify nonparametrically without further structure on the model.
Although $Q_{Y_d|X,U_D}(\tau|x, u_D)$ can be point identified from Theorem 1, we show in this section that $Q_{Y_d|X}(\tau|x)$ generally can only be partially identified, which implies that $QTE_\tau(x)$ can only be partially identified. Here, we implicitly assume that the RP assumption on $X = x$ holds (i.e., $Y_d$ can be represented as $Y_d = q(d, X, U_d)$ with $U_0|(X = x) = U_1|(X = x)$), but we do not explicitly explore the information content in this assumption.^9 First, we impose the quantile independence assumption (QIA),

$$Q_{Y_d|X,Z}(\tau|X, Z) = Q_{Y_d|X}(\tau|X) \text{ for all } \tau \in (0, 1). \tag{8}$$

This assumption is equivalent to $(Y_1, Y_0) \perp Z|X$. This assumption is parallel to the usual IV assumption $E[Y_d|X, Z] = E[Y_d|X]$ in the average treatment effect evaluation. As in Heckman and Vytlacil (2001b), we assume further that

$$D = 1(p(X, Z) \ge U_D) \text{ and } Z \perp U_D|X \tag{9}$$

to study the improvement on the bounds for $QTE_\tau(x)$.

^8 This parameter is useful, e.g., in the median-voter model, where we need to check whether $P(Y_1 > Y_0|D = 1)P(D = 1) > 1/2$.
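From Proposition 2 and (36) of Manski (1994), we have sharp bounds for $Q_{Y_d|X}(\tau|x)$ under (8):

$$\sup_{z \in \mathcal{Z}_x} L^1_\tau(x, z) \le Q_{Y_1|X}(\tau|x) \le \inf_{z \in \mathcal{Z}_x} R^1_\tau(x, z),$$

$$\sup_{z \in \mathcal{Z}_x} L^0_\tau(x, z) \le Q_{Y_0|X}(\tau|x) \le \inf_{z \in \mathcal{Z}_x} R^0_\tau(x, z),$$

where $\mathcal{Z}_x \equiv \text{supp}(Z|X = x)$,

$$L^1_\tau(x, z) = \begin{cases} Q_{Y|X,Z,D}\left(1 - \frac{1 - \tau}{p(x, z)}\,\middle|\,x, z, 1\right) & \text{if } p(x, z) > 1 - \tau; \\ -\infty & \text{otherwise,} \end{cases}$$

$$R^1_\tau(x, z) = \begin{cases} Q_{Y|X,Z,D}\left(\frac{\tau}{p(x, z)}\,\middle|\,x, z, 1\right) & \text{if } p(x, z) \ge \tau; \\ \infty & \text{otherwise,} \end{cases}$$

$$L^0_\tau(x, z) = \begin{cases} Q_{Y|X,Z,D}\left(1 - \frac{1 - \tau}{1 - p(x, z)}\,\middle|\,x, z, 0\right) & \text{if } p(x, z) < \tau; \\ -\infty & \text{otherwise,} \end{cases}$$

$$R^0_\tau(x, z) = \begin{cases} Q_{Y|X,Z,D}\left(\frac{\tau}{1 - p(x, z)}\,\middle|\,x, z, 0\right) & \text{if } p(x, z) \le 1 - \tau; \\ \infty & \text{otherwise.} \end{cases}$$

So

$$I^L_\tau(x) \equiv \sup_{z \in \mathcal{Z}_x} L^1_\tau(x, z) - \inf_{z \in \mathcal{Z}_x} R^0_\tau(x, z) \le QTE_\tau(x) \le \inf_{z \in \mathcal{Z}_x} R^1_\tau(x, z) - \sup_{z \in \mathcal{Z}_x} L^0_\tau(x, z) \equiv I^U_\tau(x). \tag{11}$$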
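The four bounding functions translate directly into code. The sketch below is a hypothetical helper, not part of the paper: `q_treated` and `q_control` stand for the observed conditional quantile functions $Q_{Y|X,Z,D}(\cdot|x, z, 1)$ and $Q_{Y|X,Z,D}(\cdot|x, z, 0)$ at a fixed $(x, z)$, and `cells` is a finite list of instrument values, so the supremum and infimum reduce to `max` and `min`.

```python
import math

def quantile_bounds(tau, p, q_treated, q_control):
    """Return (L1, R1, L0, R0) at one (x, z), following the case definitions above.

    q_treated(a) and q_control(a) are the a-th quantiles of the observed Y
    given (x, z, D = 1) and (x, z, D = 0); p is the propensity score p(x, z).
    """
    l1 = q_treated(1 - (1 - tau) / p) if p > 1 - tau else -math.inf
    r1 = q_treated(tau / p) if p >= tau else math.inf
    l0 = q_control(1 - (1 - tau) / (1 - p)) if p < tau else -math.inf
    r0 = q_control(tau / (1 - p)) if p <= 1 - tau else math.inf
    return l1, r1, l0, r0

def qte_bounds(tau, cells):
    """Bounds (I_L, I_U) on QTE_tau(x) from cells = [(p, q_treated, q_control), ...]."""
    b = [quantile_bounds(tau, *cell) for cell in cells]
    lo1, hi1 = max(x[0] for x in b), min(x[1] for x in b)
    lo0, hi0 = max(x[2] for x in b), min(x[3] for x in b)
    return lo1 - hi0, hi1 - lo0

def uniform_quantile(a):
    """Hypothetical observed outcome distribution: U(0, 1) in both arms."""
    return a

# One instrument value with p(x, z) = 0.8 at the median.
single_cell = quantile_bounds(0.5, 0.8, uniform_quantile, uniform_quantile)
single_cell_qte = qte_bounds(0.5, [(0.8, uniform_quantile, uniform_quantile)])
```

With a single instrument value and $p(x, z) = 0.8$, the median of $Y_1$ is bounded to roughly $[0.375, 0.625]$, but nothing is learned about the median of $Y_0$, so the implied interval for the QTE is the whole real line.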
This bound is trivial, since $I^L_\tau(x) = -\infty$ and $I^U_\tau(x) = \infty$ if $Y_1$ and $Y_0$ are unbounded. Similar phenomena also occur in the average treatment effect evaluation. To avoid such trivial results, we assume that

$$P\left(y^l_d(x) \le Y_d \le y^u_d(x)\,\middle|\,X = x, Z\right) = 1,$$

where $y^l_d(x), y^u_d(x) \in \mathbb{R}$ do not depend on $Z$ from (8). To simplify notation, we assume that $y^l_0(x) = y^l_1(x)$, denoted as $y^l(x)$, and $y^u_0(x) = y^u_1(x)$, denoted as $y^u(x)$. Then $-\infty$ in (10) is changed to $y^l(x)$ and $\infty$ is changed to $y^u(x)$. Let $\mathcal{P}_x \equiv \text{supp}(p(X, Z)|X = x)$, $p^{\sup}_x = \sup \mathcal{P}_x$ and $p^{\inf}_x = \inf \mathcal{P}_x$. The width of the bounds is $I^U_\tau(x) - I^L_\tau(x)$, a complicated expression to evaluate, especially if $\mathcal{Z}_x$ is uncountable. Note that the above bounds exactly identify $QTE_\tau(x)$ if $I^L_\tau(x) = I^U_\tau(x)$. Note also that it is neither necessary nor sufficient for $p(x, z)$

^9 In Section 6.1, we will show how Chernozhukov and Hansen (2005) point identify $Q_{Y_d|X}(\tau|x)$ by exploring the information content in this assumption and imposing some completeness conditions.
where

$I^L(x) = \sup_{z \in \mathcal{Z}_x} p(x, z)E[Y_1|X = x, Z = z, D = 1] + (1 - p(x, z))y^l(x, z)$