Lower Probabilities and Posterior Summaries in Bayesian Inference, Study notes of Statistics

This document introduces the concept of lower probabilities in Bayesian inference and shows how they can be used to summarize the posterior distribution. It also covers the relationship between lower and upper probabilities and the regularity conditions required for their convergence to frequentist coverage probabilities. It further explores the use of moment equalities and the likelihood function for the moment equality model, as well as the concept of unrevisable prior knowledge and its relation to posterior distributions.

Estimation and Inference for Set-identified Parameters
Using Posterior Lower Probability
Toru Kitagawa
Department of Economics, University College London
28, July, 2012
Abstract
In inference for set-identified parameters, Bayesian probability statements about unknown parameters do not coincide, even asymptotically, with frequentist confidence statements. This paper aims to smooth out this disagreement from a robust Bayes perspective. I show that a class of prior distributions exists, with which the posterior inference statements drawn via the lower envelope (lower probability) of the class of posterior distributions asymptotically agree with frequentist confidence statements for the identified set. With this class of priors, the statistical decision problems, including the point and set estimation of the set-identified parameters, are analyzed under the posterior gamma-minimax criterion.
Keywords: Partial Identification, Bayesian Robustness, Belief Function, Imprecise Probability, Gamma-minimax, Random Set.
Email: [email protected]. I thank Gary Chamberlain, Andrew Chesher, Siddhartha Chib, Larry Epstein, Jean-Pierre Florens, Guido Imbens, Hiro Kaido, Charles Manski, Ulrich Müller, Andriy Norets, Adam Rosen, Kevin Song, and Elie Tamer for their valuable discussions and comments. I also thank the seminar participants at Academia Sinica, Brown, Cornell, Cowles Conference 2011, EC^2 2010, Harvard/MIT, Midwest Econometrics 2010, Northwestern, NYU, RES Conference 2011, Seoul National University, Simon Fraser University, and UBC for their helpful comments. All remaining errors are mine. Financial support from the ESRC through the ESRC Centre for Microdata Methods and Practice (CeMMAP) (grant number RES-589-28-0001) is gratefully acknowledged.


1 Introduction

In inferring identified parameters in a parametric setup, the Bayesian probability statements about unknown parameters are found to be similar, at least asymptotically, to the frequentist confidence statements about the true value of the parameters. In partial identification analyses initiated by Manski (1989, 1990, 2003, 2007), such asymptotic harmony between the two inference paradigms breaks down (Moon and Schorfheide (2011)). The Bayesian interval estimates for the set-identified parameter are shorter, even asymptotically, than the frequentist ones, and they asymptotically lie inside the frequentist confidence intervals. Frequentists might interpret this phenomenon, Bayesian over-confidence in their inferential statements, as being fictitious. Bayesians, on the other hand, might consider that the frequentist confidence statements, which apparently lack a posterior probability interpretation, raise some interpretative difficulty once data are observed.

The primary aim of this paper is to smooth out the disagreement between the two schools of statistical inference by applying the perspective of robust Bayes inference, where one can incorporate partial prior knowledge into posterior inference. While there is a variety of robust Bayes approaches, this paper focuses on a multiple-prior Bayes analysis, where the partial prior knowledge, or the robustness concern against prior misspecification, is modeled with a class of priors (ambiguous belief). The Bayes rule is applied to each prior to form a class of posteriors. The posterior inference procedures considered in this paper operate on the class of posteriors by focusing on their lower and upper envelopes, the so-called posterior lower and upper probabilities.

When the parameters are not identified, the prior distribution of the model parameters can be decomposed into two components: one that can be updated by data (revisable prior knowledge) and one that can never be updated by data (unrevisable prior knowledge). Given that the ultimate goal of partial identification analysis is to establish a "domain of consensus" (Manski (2007)) among the set of assumptions that data are silent about, a natural way to incorporate this agenda into the robust Bayes framework is to design a prior class in such a way that it shares a single prior distribution for the revisable prior knowledge, but allows for arbitrary prior distributions for the unrevisable prior knowledge. Using this prior class as a prior input, this paper derives the posterior lower probability and investigates

identified model of entry games with multiple equilibria, and provide an axiomatic argument that justifies a single-prior Bayesian inference for a set-identified parameter. The current paper does not intend to provide any normative argument as to whether one should proceed with a single prior or multiple priors in inferring non-identified parameters.

The analysis of lower and upper probabilities originates with Dempster (1966, 1967a, 1967b, 1968), in his fiducial argument of drawing posterior inferences without specifying a prior distribution. The influence of Dempster's work appears in the belief function analysis of Shafer (1976, 1982) and the imprecise probability analysis of Walley (1991). In the context of robust Bayes analysis, the lower and upper probabilities have played important roles in measuring the global sensitivity of the posterior (Berger (1984), Berger and Berliner (1986)) and also in characterizing a class of priors/posteriors (DeRobertis and Hartigan (1981), Wasserman (1989, 1990), and Wasserman and Kadane (1990)). In econometrics, pioneering work using multiple priors was carried out by Chamberlain and Leamer (1976), and Leamer (1982), who obtained the bounds for the posterior mean of the regression coefficients when a prior varies over a certain class. None of these previous studies explicitly considered non-identified models. This paper, in contrast, focuses on non-identified models, and aims to clarify a link between the early idea of the lower and upper probabilities and a recent issue on inferences in set-identified models.

The posterior lower probability to be obtained in this paper is an infinite-order monotone capacity, or equivalently, a containment functional in random set theory. Beresteanu and Molinari (2008) and Beresteanu, Molchanov, and Molinari (2012) show the usefulness and wide applicability of random set theory to a class of partially identified models by viewing observations as random sets, and the estimand (identified set) as its Aumann expectation. They propose an asymptotically valid frequentist inference procedure for the identified set by employing the central limit theorem applicable to the properly defined sum of random sets. Galichon and Henry (2006, 2009) and Beresteanu, Molchanov, and Molinari (2011) propose a use of infinite-order capacity in defining and inferring the identified set in the structural econometric model with multiple equilibria. The robust Bayes analysis of this paper closely relates to the literature on non-additive measures and random sets, but the way that these theories enter the analysis differs from these previous works in the following ways. First, the class of models to be considered is assumed to have well-defined likelihood functions, and the lack of identification is modeled in terms of the "data-independent flat regions" of the likelihood. Ambiguity is not explicitly modeled at the level of observations, but instead ambiguity for the parameters is introduced through the absence of prior knowledge on each flat region of the likelihood. Second, I obtain the identified set as random sets, whose probability law is represented by the posterior lower probability. Here, the source of probability that induces the random identified set is the posterior uncertainty for the identifiable parameters, not the sampling probability of the observations. Third, the inferential statements to be proposed in the paper are made conditional on data, and they do not invoke any large-sample approximations.

The decision theoretic analysis in this paper employs the posterior gamma-minimax criterion, which leads to a decision that minimizes the worst-case posterior risk over the class of posteriors. The gamma-minimax decision analysis often becomes challenging, both analytically and numerically, and the existing analyses are limited to rather simple parametric models with a certain choice of prior class (Betro and Ruggeri (1992), Chamberlain (2000), and Vidakovic (2000)). The specified prior class, in contrast, offers a general and feasible way to solve the posterior gamma-minimax decision problem, provided that the identified set for the parameter of interest can be computed for each of the identified parameter values. In a recent study by Song (2012), point estimation for an interval-identified parameter from the local asymptotic minimax approach is considered.

1.2 Plan of the Paper

The rest of the paper is organized as follows. In Section 2, the main results of this paper are presented using a simple example of missing data. Section 3 introduces the general framework, where I construct a class of prior distributions that can contain arbitrary unrevisable prior knowledge. I then derive the posterior lower and upper probabilities. Statistical decision analyses with multiple priors are examined in Section 4. In Section 5, how to construct the posterior credible regions based on the posterior lower probability is discussed and their large-sample behaviors are examined in an interval-identified parameter case. Proofs and lemmas are provided in Appendix A.

robust Bayes procedure considered in this paper aims to make the posterior inference free from such sensitivity concerns by introducing multiple priors for $\theta$. The way to construct a prior class is as follows. I first specify a single prior $\mu_\phi$ for the identified parameters $\phi$. In view of $\theta$, prior $\mu_\phi$ specifies how much prior belief should be assigned to each flat region $\Gamma(\phi)$ of the likelihood of $\theta$, whereas, depending on how the assigned belief is allocated over $\theta \in \Gamma(\phi)$ (for each $\phi$), the implied prior for $\theta$ may differ. Therefore, by collecting all the possible ways of allocating the assigned belief over $\{\theta \in \Gamma(\phi)\}$ for each $\phi$, I can construct the following class of prior distributions of $\theta$:

$$\mathcal{M}(\mu_\phi) = \{\mu_\theta : \mu_\theta(\Gamma(B)) = \mu_\phi(B) \text{ for all } B \subset \Phi\},$$

where $\mu_\theta$ denotes a prior distribution for $\theta$. By applying the Bayes rule to each $\mu_\theta \in \mathcal{M}(\mu_\phi)$ and marginalizing each posterior of $\theta$ for $\eta$, I obtain the class of posteriors of $\eta$, $\{F_{\eta|X} : \mu_\theta \in \mathcal{M}(\mu_\phi)\}$. I now summarize the class of posteriors of $\eta$ by its lower envelope (lower probability),

$$F_{\eta|X*}(D) = \inf_{\mu_\theta \in \mathcal{M}(\mu_\phi)} F_{\eta|X}(D),$$

which maps subset $D$ in the parameter space of $\eta$ to $[0, 1]$. In words, the posterior lower probability evaluated at $D$ says that the posterior belief allocated for $\{\eta \in D\}$ is at least $F_{\eta|X*}(D)$, no matter which $\mu_\theta \in \mathcal{M}(\mu_\phi)$ is used. The main theorem of this paper shows that the posterior lower probability satisfies

$$F_{\eta|X*}(D) = F_{\phi|X}(\{\phi : H(\phi) \subset D\}),$$

where $F_{\phi|X}$ denotes the posterior distribution of $\phi$ implied from the prior $\mu_\phi$. The key insight of this equality is that, with prior class $\mathcal{M}(\mu_\phi)$, drawing inference for $\eta$ based on its posterior lower probability is done by analyzing the probability law of random sets $H(\phi)$, $\phi \sim F_{\phi|X}$.

Leaving their formal analysis to the later sections of this paper, I now outline the implementation of the posterior lower probability inference for $\eta$ proposed in this paper.

  1. Specify a prior for $\phi$ and update it by the Bayes rule. When a credible prior for $\phi$ is not available, a reasonably "non-informative" prior may be used as long as the posterior of $\phi$ is proper.^1
  2. Let $\{\phi_s : s = 1, \dots, S\}$ be random draws of $\phi$ from the posterior of $\phi$. The mean and median of the posterior lower probability of $\eta$ can be defined via the gamma-minimax decision criterion, and they can be approximated by

     $$\arg\min_a \frac{1}{S} \sum_{s=1}^{S} \sup_{\eta \in H(\phi_s)} (a - \eta)^2 \quad \text{and} \quad \arg\min_a \frac{1}{S} \sum_{s=1}^{S} \sup_{\eta \in H(\phi_s)} |a - \eta|,$$

     respectively.
  3. The posterior lower credible region of $\eta$ at credibility level $1 - \alpha$, which can be interpreted as a $(1 - \alpha)$-level set of the posterior lower probability of $\eta$, is defined by the smallest interval that contains $H(\phi)$ with posterior probability $1 - \alpha$ (Proposition 5. in this paper proposes an algorithm to compute the posterior lower credible region for interval-identified cases). Under certain regularity conditions that are satisfied in the current missing data example, the posterior lower credible region of $\eta$ asymptotically attains the frequentist coverage probability $1 - \alpha$ for the true identified set $H(\phi_0)$, where $\phi_0$ is the value of $\phi$ corresponding to the sampling distribution of data.

^1 See Kass and Wasserman (1996) for a survey of "reasonably" non-informative priors.
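As a rough illustration of steps 1 and 2, the following Python sketch assumes a binary outcome with missing observations. The setup, the flat Dirichlet prior, the counts, and every function name are illustrative assumptions rather than anything specified in the text: $\phi$ is taken to be the three cell probabilities (outcome 1 observed, outcome 0 observed, missing), and the identified set for $\eta = \Pr(Y = 1)$ is taken to be the worst-case interval $[\phi_1, \phi_1 + \phi_3]$.

```python
import random
import statistics

def draw_phi(counts, rng, prior=(1.0, 1.0, 1.0)):
    """One Dirichlet posterior draw of (p_y1_obs, p_y0_obs, p_missing)."""
    g = [rng.gammavariate(c + a, 1.0) for c, a in zip(counts, prior)]
    total = sum(g)
    return [x / total for x in g]

def identified_interval(phi):
    """Assumed identified set for eta = Pr(Y = 1): worst-case bounds."""
    return phi[0], phi[0] + phi[2]

def minimax_median(intervals):
    # For an interval [l, u], sup |a - eta| = |a - mid| + radius, so the
    # average worst-case absolute loss is minimized by a median of midpoints.
    return statistics.median((l + u) / 2 for l, u in intervals)

def minimax_mean(intervals, grid_size=2001):
    # For an interval, the worst-case squared loss is attained at an
    # endpoint; the average is minimized here by a plain grid search.
    lo = min(l for l, _ in intervals)
    hi = max(u for _, u in intervals)
    best_a, best_risk = lo, float("inf")
    for k in range(grid_size):
        a = lo + (hi - lo) * k / (grid_size - 1)
        risk = sum(max((a - l) ** 2, (a - u) ** 2) for l, u in intervals)
        if risk < best_risk:
            best_a, best_risk = a, risk
    return best_a

rng = random.Random(0)
counts = (30, 50, 20)  # hypothetical: Y=1 observed, Y=0 observed, missing
intervals = [identified_interval(draw_phi(counts, rng)) for _ in range(500)]
a_mean = minimax_mean(intervals)
a_median = minimax_median(intervals)
```

The grid search is a deliberately simple choice; since the averaged worst-case loss is convex in the action, any one-dimensional convex minimizer could be substituted.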

3 Multiple-prior Analysis and the Lower and Upper Probabilities

3.1 Likelihood and Set Identification: The General Framework

Let $(\mathbf{X}, \mathcal{X})$ and $(\Theta, \mathcal{A})$ be measurable spaces of a sample $X \in \mathbf{X}$ and a parameter vector $\theta \in \Theta$, respectively. The analytical framework of this paper covers both a parametric model $\Theta = \mathbb{R}^d$, $d < \infty$, and a non-parametric model where $\Theta$ is a separable Banach space. The sample size is implicit in the notation. Let $\mu_\theta$ be a marginal probability distribution on the parameter space $(\Theta, \mathcal{A})$, referred to as a prior distribution for $\theta$. Assume that the conditional distribution of $X$ given $\theta$ exists and has the probability density $p(x|\theta)$ at every $\theta \in \Theta$ with respect to a $\sigma$-finite measure on $(\mathbf{X}, \mathcal{X})$.

The parameter vector $\theta$ may consist of parameters that determine the behaviors of the economic agents, as well as those that characterize the distribution of the unobserved heterogeneities in the population. In the context of the missing data or counterfactual causal models, $\theta$ indexes the distribution of the underlying population outcomes or the potential outcomes. In all of these cases, the parameter $\theta$ should be distinguished from the parameters

where $\Gamma(\phi)$ and $\Gamma(\phi')$ for $\phi \neq \phi'$ are disjoint, and $\{\Gamma(\phi) : \phi \in \Phi\}$ constitutes a partition of $\Theta$. I assume $g(\Theta) = \Phi$, so $\Gamma(\phi)$ is non-empty for every $\phi \in \Phi$.^3

In the set-identified model, the parameter of interest $\eta \in \mathcal{H}$ is a subvector or a transformation of $\theta$ denoted by $\eta = h(\theta)$, $h : (\Theta, \mathcal{A}) \to (\mathcal{H}, \mathcal{D})$. The formal definition of the identified set of $\eta$ is given as follows.

Definition 3.1 (Identified Set of $\eta$) (i) The identified set of $\eta$ is a set-valued map $H : \Phi \rightrightarrows \mathcal{H}$ defined by the projection of $\Gamma(\phi)$ onto $\mathcal{H}$ through $h(\cdot)$,

$$H(\phi) \equiv \{h(\theta) : \theta \in \Gamma(\phi)\}.$$

(ii) The parameter $\eta = h(\theta)$ is point-identified at $\phi$ if $H(\phi)$ is a singleton, and $\eta$ is set-identified at $\phi$ if $H(\phi)$ is not a singleton.

Note that the identification of $\eta$ is defined in the pre-posterior sense because it is based on the likelihood evaluated at every possible realization of a sample, not only for the observed one.

3.2 Examples

I now provide some examples, in addition to the illustrative example of Section 2, both to illustrate the above concepts and notation, and to provide a concrete focus for the later development.

^3 In an observationally restrictive model, in the sense of Koopmans and Reiersøl (1950), $\hat{p}(x|\phi)$, the likelihood function for the sufficient parameters, is well defined for a domain larger than $g(\Theta)$ (see Example 3.1 in Section 3.2). In this case, the model possesses the falsifiability property, and $\Gamma(\phi)$ can be empty for some $\phi \in \Phi$.

Example 3.1 (Bounding ATE by Linear Programming) Consider the treatment effect model with noncompliance and a binary instrument $Z \in \{1, 0\}$, as considered in Imbens and Angrist (1994), and Angrist, Imbens, and Rubin (1996). Assume that the treatment status and the outcome of interest are both binary. Let $(W_1, W_0) \in \{1, 0\}^2$ be the potential treatment status in response to the instrument, and $W = ZW_1 + (1 - Z)W_0$ be the observed treatment status. $(Y_1, Y_0) \in \{1, 0\}^2$ is a pair of treated and control outcomes and $Y = WY_1 + (1 - W)Y_0$ is the observed outcome. The data are a random sample of $(Y_i, W_i, Z_i)$. Following Imbens and Angrist (1994), consider partitioning the population into four subpopulations defined in terms of the potential treatment-selection responses:

$$T_i = \begin{cases} c & \text{if } W_{1i} = 1 \text{ and } W_{0i} = 0 \text{: complier,} \\ at & \text{if } W_{1i} = W_{0i} = 1 \text{: always-taker,} \\ nt & \text{if } W_{1i} = W_{0i} = 0 \text{: never-taker,} \\ d & \text{if } W_{1i} = 0 \text{ and } W_{0i} = 1 \text{: defier,} \end{cases}$$

where $T_i$ is the indicator for the types of selection responses. Assume a randomized instrument, $Z \perp (Y_1, Y_0, W_1, W_0)$. Then, the distribution of observables and the distribution of potential outcomes satisfy the following equalities for $y \in \{1, 0\}$ (observations with $W = 0$ reveal the control outcome $Y_0$, so the never-taker terms in the last two lines involve $Y_0$):

$$\begin{aligned}
\Pr(Y = y, W = 1 | Z = 1) &= \Pr(Y_1 = y, T = c) + \Pr(Y_1 = y, T = at), \\
\Pr(Y = y, W = 1 | Z = 0) &= \Pr(Y_1 = y, T = d) + \Pr(Y_1 = y, T = at), \\
\Pr(Y = y, W = 0 | Z = 1) &= \Pr(Y_0 = y, T = d) + \Pr(Y_0 = y, T = nt), \\
\Pr(Y = y, W = 0 | Z = 0) &= \Pr(Y_0 = y, T = c) + \Pr(Y_0 = y, T = nt).
\end{aligned} \tag{3.2}$$

Ignoring the marginal distribution of $Z$, a full parameter vector of the model can be specified by a joint distribution of $(Y_1, Y_0, T)$:

$$\theta = (\Pr(Y_1 = y, Y_0 = y', T = t) : y = 1, 0;\ y' = 1, 0;\ t = c, nt, at, d) \in \Theta,$$

where $\Theta$ is the 16-dimensional probability simplex. Let ATE be the parameter of interest:

$$\begin{aligned}
\eta \equiv E(Y_1 - Y_0) &= \sum_{t = c, nt, at, d} [\Pr(Y_1 = 1, T = t) - \Pr(Y_0 = 1, T = t)] \\
&= \sum_{t = c, nt, at, d} \sum_{y = 1, 0} [\Pr(Y_1 = 1, Y_0 = y, T = t) - \Pr(Y_1 = y, Y_0 = 1, T = t)] \equiv h(\theta).
\end{aligned}$$

The likelihood conditional on $Z$ depends on $\theta$ only through the distribution of $(Y, W)$ given $Z$, so the sufficient parameter vector consists of eight probability masses:

$$\phi = (\Pr(Y = y, W = w | Z = z) : y = 1, 0;\ w = 1, 0;\ z = 1, 0).$$
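Because $h(\theta)$ is linear in $\theta$ and the equalities (3.2) are linear restrictions on $\theta$, the endpoints of the identified set for the ATE at a given $\phi$ can be found by two linear programs. The following Python sketch shows one way this could be set up; it assumes NumPy and SciPy are available, and the cell ordering and all function names are illustrative choices, not the author's implementation.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

TYPES = ["c", "nt", "at", "d"]
# Potential treatment responses (W1, W0) for each type.
RESPONSE = {"c": (1, 0), "nt": (0, 0), "at": (1, 1), "d": (0, 1)}
# theta is indexed by cells (y1, y0, t): 2 * 2 * 4 = 16 probabilities.
CELLS = list(itertools.product([1, 0], [1, 0], TYPES))

def constraint_matrix():
    """Rows map theta to Pr(Y=y, W=w | Z=z), ordered over (z, w, y)."""
    rows = []
    for z, w, y in itertools.product([1, 0], [1, 0], [1, 0]):
        row = np.zeros(len(CELLS))
        for k, (y1, y0, t) in enumerate(CELLS):
            w1, w0 = RESPONSE[t]
            treated = w1 if z == 1 else w0
            outcome = y1 if treated == 1 else y0
            if treated == w and outcome == y:
                row[k] = 1.0
        rows.append(row)
    return np.array(rows)

def ate_coefficients():
    """h(theta) = sum over cells of (y1 - y0) * theta[cell]."""
    return np.array([y1 - y0 for (y1, y0, _) in CELLS], dtype=float)

def ate_bounds(phi):
    """Lower and upper ends of the identified set for the ATE at phi."""
    A, c = constraint_matrix(), ate_coefficients()
    lower = linprog(c, A_eq=A, b_eq=phi, bounds=(0, None))
    upper = linprog(-c, A_eq=A, b_eq=phi, bounds=(0, None))
    if lower.status != 0 or upper.status != 0:
        raise ValueError("phi is not compatible with any theta in the simplex")
    return lower.fun, -upper.fun
```

The simplex restriction on $\theta$ does not need to be imposed separately here, since the four equalities for a fixed $z$ already sum to one whenever $\phi$ is a valid conditional distribution. An infeasible program corresponds to an empty $\Gamma(\phi)$.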

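where

$$w_i(\theta) = \frac{\exp\{\gamma(g(\theta))'(m(x_i) - g(\theta))\}}{\sum_{j=1}^{n} \exp\{\gamma(g(\theta))'(m(x_j) - g(\theta))\}}, \qquad \gamma(g(\theta)) = \arg\min_{\gamma \in \mathbb{R}^J} \left\{ \sum_{i=1}^{n} \exp\{\gamma'(m(x_i) - g(\theta))\} \right\}.$$

Thus, the parameter $\theta = (\eta, \lambda)$ enters the likelihood only through $g(\theta) = A\eta + \lambda$. Consequently, I take $\phi = g(\theta)$ to be the sufficient parameters. The identified set for $\theta$ is given by

$$\Gamma(\phi) = \{(\eta, \lambda) \in \mathcal{H} \times [0, \infty)^L : A\eta + \lambda = \phi\}.$$

The coordinate projection of $\Gamma(\phi)$ onto $\mathcal{H}$ yields $H(\phi)$, the identified set for $\eta$ (see Bertsimas and Tsitsiklis (1997, Chap. 2) for an algorithm for projecting a polyhedron).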
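For intuition, the projection is immediate when $\eta$ is a scalar: eliminating the slack $\lambda \geq 0$ from $A\eta + \lambda = \phi$ leaves the inequalities $A_l \eta \leq \phi_l$, each of which bounds $\eta$ from one side. The Python sketch below handles only this scalar special case and assumes $\eta$ is otherwise unrestricted on the real line; the function name is hypothetical, and the general multi-dimensional projection requires the polyhedral methods cited above.

```python
import math

def project_scalar_eta(A, phi):
    """Interval of eta such that A[l] * eta + lam[l] = phi[l] with lam[l] >= 0.

    A and phi are equal-length sequences of floats (one entry per moment).
    Returns (lo, hi), possibly with infinite ends, or None if the set is empty.
    """
    lo, hi = -math.inf, math.inf
    for a, p in zip(A, phi):
        if a > 0:
            hi = min(hi, p / a)      # a * eta <= p  gives  eta <= p / a
        elif a < 0:
            lo = max(lo, p / a)      # dividing by a negative flips the sign
        elif p < 0:
            return None              # 0 * eta <= p is impossible when p < 0
    return (lo, hi) if lo <= hi else None
```

For instance, with $A = (1, -1)'$ and $\phi = (0.7, -0.2)'$ the two inequalities are $\eta \leq 0.7$ and $\eta \geq 0.2$.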

3.3 Unrevisable Prior Knowledge and a Class of Priors

Let $\mu_\theta$ be a prior of $\theta$ and $\mu_\phi$ be the marginal probability measure on the sufficient parameter space $(\Phi, \mathcal{B})$ induced by $\mu_\theta$ and $g(\cdot)$:

$$\mu_\phi(B) = \mu_\theta(\Gamma(B)) \text{ for all } B \in \mathcal{B}.$$

Let $x \in \mathbf{X}$ be sampled data. The posterior distribution of $\theta$, denoted by $F_{\theta|X}(\cdot)$, is obtained as

$$F_{\theta|X}(A) = \int_\Phi \mu_{\theta|\phi}(A|\phi)\, dF_{\phi|X}(\phi), \quad A \in \mathcal{A}, \tag{3.4}$$

where $\mu_{\theta|\phi}(A|\phi)$ denotes the conditional distribution of $\theta$ given $\phi$, and $F_{\phi|X}(\cdot)$ is the posterior distribution of $\phi$. The posterior distribution of $\theta$ given in (3.4) shows that the prior distribution for $\theta$ marginalized to $\phi$ can be updated by data, while the conditional prior of $\theta$ given $\phi$ is never updated by the data because the likelihood is flat on $\Gamma(\phi) \subset \Theta$ for any realization of the sample. In this sense, the prior information marginalized to the sufficient parameter $\phi$ can be interpreted as the revisable prior knowledge, and the conditional priors of $\theta$ given $\phi$, $\{\mu_{\theta|\phi}(\cdot|\phi) : \phi \in \Phi\}$, can be interpreted as the unrevisable prior knowledge.

If one wants to summarize the posterior uncertainty of $\theta$ in the form of a probability distribution on $(\Theta, \mathcal{A})$, as recommended in the Bayesian paradigm, one needs to have a single prior distribution of $\theta$, which necessarily induces unique unrevisable prior knowledge $\mu_{\theta|\phi}$. If the choice of $\mu_{\theta|\phi}$ could be justified by credible prior information, the standard Bayesian updating (3.4) would yield a valid posterior distribution of $\theta$. A challenging situation would arise if one is short of a credible prior distribution of $\theta$. In this case, the researcher, who is aware that $\mu_{\theta|\phi}$ will never be updated by data, might feel anxious in implementing the Bayesian inference procedure, because an unconfidently specified $\mu_{\theta|\phi}$ can have a significant influence on the subsequent posterior inference. The robust Bayes analysis in this paper specifically focuses on such a situation, and introduces ambiguity for the conditional prior $\{\mu_{\theta|\phi}(\cdot|\phi) : \phi \in \Phi\}$ in the form of multiple priors. Specifically, given $\mu_\phi$, a prior on $(\Phi, \mathcal{B})$ specified by the researcher, consider the class of prior distributions of $\theta$ defined by:

$$\mathcal{M}(\mu_\phi) = \{\mu_\theta : \mu_\theta(\Gamma(B)) = \mu_\phi(B) \text{ for every } B \in \mathcal{B}\}.$$

$\mathcal{M}(\mu_\phi)$ consists of prior distributions of $\theta$ whose marginal distribution for the sufficient parameters coincides with the prespecified $\mu_\phi$.^5 This paper proposes to use $\mathcal{M}(\mu_\phi)$ as a prior input for the posterior analysis, meaning that, while accepting the specification of a single prior distribution for the sufficient parameters $\phi$, I leave the conditional priors $\mu_{\theta|\phi}$ unspecified and allow for arbitrary ones as long as $\mu_\theta(\cdot) = \int_\Phi \mu_{\theta|\phi}(\cdot|\phi)\, d\mu_\phi$ yields a probability measure on $(\Theta, \mathcal{A})$.^6

In the subsequent analysis, I shall not discuss how to select $\mu_\phi$, and shall treat $\mu_\phi$ as given. The influence of $\mu_\phi$ on the posterior of $\phi$ will diminish as the sample size increases, so the sensitivity issue of the posterior of $\phi$ is expected to be less severe when the sample size is moderate or large.

^5 I thank Jean-Pierre Florens for suggesting this representation of the prior class.

^6 Sufficient parameters $\phi$ are defined by examining the entire model $\{p(x|\theta) : x \in \mathbf{X}, \theta \in \Theta\}$, so that the prior class $\mathcal{M}(\mu_\phi)$ is, by construction, model dependent. This distinguishes the current approach from the standard robust Bayes analysis where a prior class represents the researcher's subjective assessment of his imprecise prior knowledge (Berger (1985)).
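The decomposition (3.4) can be mimicked by simulation: draw $\phi$ from its posterior, then draw $\theta$ from a conditional prior that the data never touch. The Python sketch below does this for an assumed binary-outcome missing-data model (an illustrative setup, as are the counts, the flat Dirichlet prior, and all names). Three members of the prior class share the same posterior for $\phi$ but differ in the unrevisable part, here the probability `q` that a missing outcome equals one, which for simplicity is drawn independently of $\phi$.

```python
import random

def posterior_eta_draws(counts, cond_prior, n_draws=1000, seed=0):
    """Draws of eta = Pr(Y = 1) under one member of the prior class.

    counts: (n_y1_observed, n_y0_observed, n_missing), hypothetical data.
    cond_prior: function rng -> q in [0, 1], the unrevisable conditional prior.
    """
    rng_phi = random.Random(seed)      # stream for the revisable part (phi)
    rng_q = random.Random(seed + 1)    # stream for the unrevisable part (q)
    draws = []
    for _ in range(n_draws):
        # Dirichlet posterior for phi under a flat prior, via normalized gammas.
        g = [rng_phi.gammavariate(c + 1.0, 1.0) for c in counts]
        total = sum(g)
        p_y1_obs, p_mis = g[0] / total, g[2] / total
        q = cond_prior(rng_q)          # never updated by the data
        draws.append(p_y1_obs + p_mis * q)
    return draws

counts = (30, 50, 20)
eta_low = posterior_eta_draws(counts, lambda r: 0.0)        # all missing are 0
eta_high = posterior_eta_draws(counts, lambda r: 1.0)       # all missing are 1
eta_unif = posterior_eta_draws(counts, lambda r: r.random())
```

Because the three runs use identical $\phi$ draws, any difference between the resulting posteriors of $\eta$ is due solely to the conditional prior, which is the sensitivity that the prior class is designed to expose.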

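(i) For each $A \in \mathcal{A}$,

$$F_{\theta|X*}(A) = F_{\phi|X}(\{\phi : \Gamma(\phi) \subset A\}), \tag{3.5}$$

$$F^*_{\theta|X}(A) = F_{\phi|X}(\{\phi : \Gamma(\phi) \cap A \neq \emptyset\}), \tag{3.6}$$

where $F_{\phi|X}(B)$, $B \in \mathcal{B}$, is the posterior probability measure of $\phi$.

(ii) Define the posterior lower and upper probabilities of $\eta = h(\theta)$ by

$$F_{\eta|X*}(D) \equiv \inf_{\mu_\theta \in \mathcal{M}(\mu_\phi)} F_{\theta|X}(h^{-1}(D)), \qquad F^*_{\eta|X}(D) \equiv \sup_{\mu_\theta \in \mathcal{M}(\mu_\phi)} F_{\theta|X}(h^{-1}(D)), \quad \text{for } D \in \mathcal{D}.$$

It holds that

$$F_{\eta|X*}(D) = F_{\phi|X}(\{\phi : H(\phi) \subset D\}), \qquad F^*_{\eta|X}(D) = F_{\phi|X}(\{\phi : H(\phi) \cap D \neq \emptyset\}).$$

Proof. For a proof of (i), see Appendix A. For a proof of (ii), see equation (3.7).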

The expression for $F_{\theta|X*}(A)$ implies that the posterior lower probability on $A$ calculates the probability that the set $\Gamma(\phi)$ is contained in subset $A$ in terms of the posterior probability law of $\phi$. On the other hand, the upper probability is interpreted as the posterior probability that the set $\Gamma(\phi)$ hits subset $A$. The second statement of the theorem provides a procedure for marginalizing the lower and upper probabilities of $\theta$ into those of the parameter of interest $\eta$. The expressions of $F_{\eta|X*}(D)$ and $F^*_{\eta|X}(D)$ are simple and easy to interpret: the lower and upper probabilities of $\eta = h(\theta)$ are the containment and hitting probabilities of the random sets obtained by projecting $\Gamma(\phi)$ through $h(\cdot)$. This marginalization rule of the lower probability follows from

$$F_{\eta|X*}(D) = F_{\theta|X*}(h^{-1}(D)) = F_{\phi|X}(\{\phi : \Gamma(\phi) \subset h^{-1}(D)\}) = F_{\phi|X}(\{\phi : H(\phi) \subset D\}). \tag{3.7}$$

Note that in the standard Bayesian inference, marginalization of the posterior of $\theta$ to $\eta$ is conducted by integrating the posterior probability measure of $\theta$ for $\eta$, while in the lower probability inference, marginalization for $\eta$ corresponds to projecting random sets $\Gamma(\phi)$ via $\eta = h(\theta)$. This stark contrast between the standard Bayes and the multiple-prior robust Bayes inference highlights how the introduction of ambiguity changes the way of eliminating the nuisance parameters in the posterior inference.
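Given posterior draws of $\phi$, the containment and hitting probabilities can be approximated by simple frequencies. The Python sketch below assumes that each identified set $H(\phi_s)$ is a closed interval and that the event $D$ is a closed interval as well; the function name and the numbers in the usage lines are hypothetical.

```python
def lower_upper_probability(intervals, d_lo, d_hi):
    """Monte Carlo lower and upper probabilities of D = [d_lo, d_hi].

    intervals: list of (l, u) pairs, one identified set per posterior draw.
    The lower probability is the share of sets contained in D; the upper
    probability is the share of sets that intersect D.
    """
    n = len(intervals)
    contained = sum(1 for l, u in intervals if d_lo <= l and u <= d_hi)
    hits = sum(1 for l, u in intervals if l <= d_hi and u >= d_lo)
    return contained / n, hits / n

# Four hypothetical draws of the identified set and the event D = [0.25, 0.65].
sets = [(0.2, 0.4), (0.3, 0.6), (0.5, 0.9), (0.7, 0.8)]
lower, upper = lower_upper_probability(sets, 0.25, 0.65)
```

In this toy case only the second set lies inside $D$, while the first three sets intersect it, so the lower probability is 0.25 and the upper probability is 0.75.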

As is known in the literature (e.g., Huber (1973)), the lower probability of a set of probability measures is a monotone nonadditive measure (capacity). Furthermore, in the current specification of the prior class, the representation of the lower probability obtained in Theorem 3.1 implies that the resulting posterior lower and upper probabilities are supermodular and submodular, respectively.

Corollary 3.1 Assume Condition 3.1. The posterior lower and upper probabilities of $\theta$ are supermodular and submodular, respectively. For $A_1, A_2 \in \mathcal{A}$ subsets in $\Theta$,

$$F_{\theta|X*}(A_1 \cup A_2) + F_{\theta|X*}(A_1 \cap A_2) \geq F_{\theta|X*}(A_1) + F_{\theta|X*}(A_2),$$

$$F^*_{\theta|X}(A_1 \cup A_2) + F^*_{\theta|X}(A_1 \cap A_2) \leq F^*_{\theta|X}(A_1) + F^*_{\theta|X}(A_2).$$

Also, the posterior lower and upper probabilities of $\eta$ are supermodular and submodular, respectively. For $D_1, D_2 \in \mathcal{D}$ subsets in $\mathcal{H}$,

$$F_{\eta|X*}(D_1 \cup D_2) + F_{\eta|X*}(D_1 \cap D_2) \geq F_{\eta|X*}(D_1) + F_{\eta|X*}(D_2),$$

$$F^*_{\eta|X}(D_1 \cup D_2) + F^*_{\eta|X}(D_1 \cap D_2) \leq F^*_{\eta|X}(D_1) + F^*_{\eta|X}(D_2).$$
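The two inequalities can be checked numerically on a small finite example. The Python sketch below replaces the parameter space with a five-point set and the random identified sets with randomly generated non-empty subsets (both purely illustrative assumptions), and compares containment and hitting counts over every pair of events. Counts are used instead of frequencies so that the comparison involves integers only.

```python
from itertools import combinations
import random

def containment_count(random_sets, A):
    """Number of draws whose set is contained in A (lower probability * n)."""
    return sum(1 for S in random_sets if S <= A)

def hitting_count(random_sets, A):
    """Number of draws whose set intersects A (upper probability * n)."""
    return sum(1 for S in random_sets if S & A)

rng = random.Random(1)
universe = range(5)
# 200 hypothetical draws of a non-empty random set with one to three points.
random_sets = [frozenset(rng.sample(universe, rng.randint(1, 3)))
               for _ in range(200)]
events = [frozenset(c) for r in range(6) for c in combinations(universe, r)]

supermodular = all(
    containment_count(random_sets, A | B) + containment_count(random_sets, A & B)
    >= containment_count(random_sets, A) + containment_count(random_sets, B)
    for A in events for B in events
)
submodular = all(
    hitting_count(random_sets, A | B) + hitting_count(random_sets, A & B)
    <= hitting_count(random_sets, A) + hitting_count(random_sets, B)
    for A in events for B in events
)
```

Both flags come out true for any collection of non-empty sets, because each inequality already holds draw by draw for the containment and hitting indicators.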

The results of Theorem 3.1 (i) can be seen as a special case of Wasserman's (1990) general construction of the posterior lower and upper probabilities. One notable difference from Wasserman's analysis, however, is that, with prior class $\mathcal{M}(\mu_\phi)$, the lower probability of the posterior class becomes an $\infty$-order monotone capacity (a containment functional of random

where the first argument in the upper posterior risk represents the dependence of the prior class on a prior for $\phi$.

Definition 4.1 A posterior gamma-minimax action $a_x^*$ with respect to prior class $\mathcal{M}(\mu_\phi)$ is an action that minimizes the upper posterior risk, i.e.,

$$\rho^*(\mu_\phi, a_x^*) = \inf_{a \in \mathcal{H}_a} \rho^*(\mu_\phi, a) = \inf_{a \in \mathcal{H}_a} \sup_{\mu_\theta \in \mathcal{M}(\mu_\phi)} \rho(\mu_\theta, a).$$

The gamma-minimax decision approach favors a conservative action that guards against the least favorable prior within the class, and it can be seen as a compromise between the Bayesian decision principle and the minimax decision principle. The next proposition shows that the upper posterior risk $\rho^*(\mu_\phi, a)$ equals the Choquet expected loss with respect to the posterior upper probability.

Proposition 4.1 Under Condition 3.1, the upper posterior risk satisfies

$$\rho^*(\mu_\phi, a) = \int L(\eta, a)\, dF^*_{\eta|X}(\eta) = \int_\Phi \sup_{\eta \in H(\phi)} L(\eta, a)\, dF_{\phi|X}(\phi), \tag{4.2}$$

whenever $\int L(\eta, a)\, dF^*_{\eta|X}(\eta) < \infty$, where $\int L(\eta, a)\, dF^*_{\eta|X}(\eta)$ is the Choquet integral.

Proof. See Appendix A.
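The second equality in (4.2) can be verified by hand on a finite example. The Python sketch below assumes a finite grid of $\eta$ values, a nonnegative loss, and a handful of equally likely random sets (all hypothetical). It computes the Choquet integral with respect to the hitting probability through the usual sorted-levels formula and compares it with the average worst-case loss.

```python
def choquet_upper(random_sets, loss):
    """Choquet integral of a nonnegative loss w.r.t. the hitting probability."""
    points = sorted(set().union(*random_sets), key=loss, reverse=True)
    total = 0.0
    for i, x in enumerate(points):
        next_level = loss(points[i + 1]) if i + 1 < len(points) else 0.0
        level_set = set(points[: i + 1])   # points with loss at least loss(x)
        upper = sum(1 for S in random_sets if S & level_set) / len(random_sets)
        total += (loss(x) - next_level) * upper
    return total

def expected_worst_loss(random_sets, loss):
    """Average over draws of the largest loss inside each random set."""
    return sum(max(loss(x) for x in S) for S in random_sets) / len(random_sets)

action = 0.4
quadratic = lambda eta: (eta - action) ** 2
sets = [{0.0, 0.25}, {0.25, 0.5, 0.75}, {0.75, 1.0}]  # hypothetical draws
lhs = choquet_upper(sets, quadratic)
rhs = expected_worst_loss(sets, quadratic)
```

The two numbers agree up to floating-point error, which reflects the fact that a random set hits the level set $\{\eta : L(\eta, a) \geq t\}$ exactly when its worst-case loss is at least $t$.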

The third expression in (4.2) shows that the posterior gamma-minimax criterion is written as the expectation of the worst-case loss function, $\sup_{\eta \in H(\phi)} L(\eta, a)$, with respect to the posterior of $\phi$. The supremum part stems from the ambiguity of $\eta$: given $\phi$, what the researcher knows about $\eta$ is only that it lies within the identified set $H(\phi)$, and, following the minimax principle, the researcher forms the loss by supposing that nature chooses the worst case in response to the action $a$. On the other hand, the expectation in $\phi$ represents the posterior uncertainty of the identified set $H(\phi)$: with a finite number of observations, the identified set of $\eta$ is known with some uncertainty as summarized by the posterior of $\phi$. The posterior gamma-minimax criterion combines such ambiguity of $\eta$ with the posterior uncertainty of the identified set $H(\phi)$ to yield a single objective function to be minimized.^10

^10 The posterior gamma-minimax action $a_x^*$ can be interpreted as a Bayes action for some posterior distributions in the class. For instance, in the case of the quadratic loss, the saddle-point argument implies that the gamma-minimax action $a_x^*$ corresponds to the mean of a posterior distribution (Bayes action) that has maximal posterior variance in the class.

Although a closed-form expression of $a_x^*$ is not, in general, available, this proposition suggests a simple numerical algorithm for approximating $a_x^*$ using a random sample of $\phi$ from its posterior $F_{\phi|X}$. Let $\{\phi_s\}_{s=1}^{S}$ be $S$ random draws of $\phi$ from posterior $F_{\phi|X}$. Then, $a_x^*$ can be approximated by

$$\hat{a}_x^* \equiv \arg\min_{a \in \mathcal{H}_a} \frac{1}{S} \sum_{s=1}^{S} \sup_{\eta \in H(\phi_s)} L(\eta, a).$$
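A generic version of this approximation takes only a few lines once each identified set is represented by finitely many points and the action space by a finite grid. Both discretizations, the function name, and the toy inputs in the Python sketch below are assumptions made for illustration; for interval-valued sets and a loss that is convex in $\eta$, the two endpoints are enough to find the supremum.

```python
def gamma_minimax_action(sets, loss, actions):
    """Approximate posterior gamma-minimax action over a finite action grid.

    sets: one finite collection of eta values per posterior draw of phi.
    loss: function (eta, a) -> nonnegative float.
    actions: iterable of candidate actions.
    """
    def upper_risk(a):
        # Monte Carlo average of the worst-case loss over each identified set.
        return sum(max(loss(eta, a) for eta in S) for S in sets) / len(sets)
    return min(actions, key=upper_risk)

# Two hypothetical draws of an interval-valued identified set, by endpoints.
draws = [(0.0, 2.0), (1.0, 3.0)]
grid = [k / 100 for k in range(301)]
a_quadratic = gamma_minimax_action(draws, lambda eta, a: (eta - a) ** 2, grid)
# An asymmetric loss that penalizes underestimation three times as heavily.
asymmetric = lambda eta, a: 3.0 * (eta - a) if eta > a else (a - eta)
a_asymmetric = gamma_minimax_action(draws, asymmetric, grid)
```

Under the quadratic loss the minimizer in this toy case is 1.5, while the asymmetric loss pushes the action upward, toward the larger values of $\eta$ that it penalizes missing.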

The gamma-minimax decisions are usually dynamically inconsistent; an a posteriori optimal gamma-minimax action does not coincide with an unconditionally optimal gamma-minimax decision. This is also the case with our prior class, and this will imply that $a_x^*$ fails to be a Bayes decision with respect to any single prior in the class $\mathcal{M}(\mu_\phi)$. See Appendix B for an example and further discussion.

As an alternative to the posterior gamma-minimax action, the gamma-minimax regret criterion may be considered (Berger (1985, p. 218), and Rios Insua, Ruggeri, and Vidakovic (1995)). Appendix B provides some analytical results of the posterior gamma-minimax regret analysis where the parameter of interest $\eta$ is a scalar and the loss function is quadratic, $L(\eta, a) = (\eta - a)^2$. There, it is shown that the posterior gamma-minimax regret decision can differ from the posterior gamma-minimax decision derived above, but that they converge to the same limit asymptotically.

5 Set Estimation of $\eta$

In the standard Bayesian inference, set estimation is often conducted by reporting the contour sets of the posterior probability density of $\eta$ (highest posterior density region). If the posterior information for $\eta$ is summarized by the lower and upper probabilities, how should we conduct set estimation of $\eta$?

5.1 Posterior Lower Credible Region

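For $\alpha \in (0, 1)$, consider a subset $C_{1-\alpha} \subset \mathcal{H}$ such that the posterior lower probability $F_{\eta|X*}(C_{1-\alpha})$ is greater than or equal to $1 - \alpha$:

$$F_{\eta|X*}(C_{1-\alpha}) = F_{\phi|X}(H(\phi) \subset C_{1-\alpha}) \geq 1 - \alpha. \tag{5.1}$$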
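When the identified sets are intervals and $C_{1-\alpha}$ is restricted to be an interval, a shortest region satisfying the Monte Carlo analogue of (5.1) can be found by direct search over the posterior draws. The Python sketch below is one straightforward way to do this, not necessarily the algorithm proposed in the paper; the function name and inputs are hypothetical.

```python
import math

def lower_credible_interval(intervals, alpha):
    """Shortest [L, U] containing at least a (1 - alpha) share of the sets.

    intervals: list of (l, u) pairs, one identified set per posterior draw.
    A shortest region can always be chosen with L equal to some lower
    endpoint, so every distinct lower endpoint is tried as a candidate.
    """
    n = len(intervals)
    k = math.ceil((1 - alpha) * n)        # number of sets that must fit inside
    best = None
    for L in sorted({l for l, _ in intervals}):
        uppers = sorted(u for l, u in intervals if l >= L)
        if len(uppers) < k:
            break                         # larger L leaves even fewer sets
        U = uppers[k - 1]                 # smallest U covering k eligible sets
        if best is None or U - L < best[1] - best[0]:
            best = (L, U)
    return best

# Four hypothetical draws; the wide last set is dropped at alpha = 0.25.
region = lower_credible_interval(
    [(0.0, 1.0), (0.1, 0.9), (0.2, 1.2), (-5.0, 5.0)], alpha=0.25
)
```

Here three of the four sets must be contained, and the shortest interval achieving this is $[0, 1.2]$. The search costs on the order of $S^2 \log S$ operations for $S$ draws, which is adequate for a few thousand draws.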