









Econ 513, USC, Fall 2005
Lecture 15. Discrete Response Models: Multinomial, Conditional and Nested Logit Models
Here we focus again on models for discrete choice with more than two outcomes. We assume that the outcome of interest, the choice $y$, takes on non-negative integer values between zero and $J$: $y \in \{0, 1, \ldots, J\}$. Unlike the ordered case, there is no particular meaning to the ordering. Examples are travel modes (bus/train/car), employment status (employed/unemployed/out of the labor force), marital status (single/married/divorced/widowed) and many others.
We wish to model the distribution of $y$ in terms of covariates. In some cases we will distinguish between covariates $x_i$ that vary by unit (individual or firm) and covariates $x_{ij}$ that vary by choice (and possibly by individual). Examples of the first type include individual characteristics such as age or education. An example of the second type is the cost associated with a choice, for example the cost of commuting by bus, train or car. This distinction arises only from the economic (or, more generally, scientific) substance of the problem. McFadden developed the interpretation of these models in terms of utility-maximizing choice behavior. In that case we may be willing to restrict the way covariates affect choices: the cost of a particular choice affects the utility of that choice, but not the utilities of other choices.
The strategy is to develop a model for the conditional probability of choice $j$ given the covariates. Suppose the model is $\Pr(y = j \mid x) = P_j(x; \theta)$. Then, for a sample of $N$ units, the log likelihood function is

$$L(\theta) = \sum_{i=1}^{N} \sum_{j=0}^{J} \mathbf{1}\{y_i = j\} \cdot \ln P_j(x_i; \theta).$$
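Because of the indicator, the inner sum keeps only the log probability of the choice each unit actually made. Below is a minimal Python sketch. It assumes the model is passed in as a function `prob_fn(x, theta)` that returns the list $[P_0, \ldots, P_J]$; this name and interface are illustrative and do not come from the notes.

```python
import math

def log_likelihood(theta, xs, ys, prob_fn):
    """L(theta) = sum_i sum_j 1{y_i = j} * ln P_j(x_i; theta).

    The indicator keeps only the term for the observed choice y_i,
    so the inner sum over j collapses to a single lookup.
    prob_fn(x, theta) is a hypothetical interface returning [P_0, ..., P_J].
    """
    total = 0.0
    for x_i, y_i in zip(xs, ys):
        probs = prob_fn(x_i, theta)
        total += math.log(probs[y_i])
    return total

# Usage: a covariate-free model in which theta holds the probabilities directly.
constant_model = lambda x, theta: theta
ll = log_likelihood([0.25, 0.5, 0.25], [None] * 4, [0, 1, 1, 2], constant_model)
```

All of the models below fit this interface. Each one supplies its own `prob_fn`.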
I. Multinomial Logit
Suppose we only have individual specific covariates. Then we can model the response probability as
$$\Pr(y = j \mid x) = \frac{\exp(x'\beta_j)}{1 + \sum_{l=1}^{J} \exp(x'\beta_l)}$$

for choices $j = 1, \ldots, J$, and

$$\Pr(y = 0 \mid x) = \frac{1}{1 + \sum_{l=1}^{J} \exp(x'\beta_l)}$$
for the first choice. This is a direct extension of the binary response logit model. It leads to a very well-behaved likelihood function and is easy to estimate. More interestingly it can be viewed as a special case of the following conditional logit.
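Here is a small Python sketch of these response probabilities. It assumes $\beta_0$ is normalized to zero, which is where the "1 +" in the denominator comes from. The covariate and coefficient values are made up for illustration.

```python
import math

def mnl_probs(x, betas):
    """Multinomial logit probabilities [P(y=0|x), ..., P(y=J|x)].

    x     : K individual-specific covariates
    betas : J coefficient vectors beta_1..beta_J; beta_0 is normalized
            to zero, which produces the "1 +" in the denominator.
    """
    index = [sum(xk * bk for xk, bk in zip(x, b)) for b in betas]
    expo = [math.exp(v) for v in index]
    denom = 1.0 + sum(expo)
    return [1.0 / denom] + [e / denom for e in expo]

# Hypothetical example: intercept plus one covariate, J = 2.
p = mnl_probs([1.0, 2.0], [[0.5, -0.2], [-1.0, 0.3]])
```

Each ratio $P_j / P_0$ equals $\exp(x'\beta_j)$, so the binary logit log-odds interpretation carries over choice by choice relative to choice 0.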
II. Conditional Logit
Suppose all covariates vary by choice (and possibly also by individual, but that is not essential here). Then McFadden proposed the conditional logit model:
$$\Pr(y_i = j \mid x_{i0}, \ldots, x_{iJ}) = \frac{\exp(x_{ij}'\beta)}{\sum_{l=0}^{J} \exp(x_{il}'\beta)}$$

for $j = 0, \ldots, J$.
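The Python sketch below evaluates these probabilities and fits a single scalar coefficient by maximum likelihood. It uses Newton-Raphson, which is one standard optimizer; the notes do not prescribe an algorithm. The data set, iteration cap and tolerance are hypothetical. The sketch also assumes the covariate varies across choices, so that the Hessian is nonzero.

```python
import math

def conditional_logit_probs(x_choices, beta):
    """P(y_i = j | x_i0..x_iJ) = exp(x_ij'beta) / sum_l exp(x_il'beta).

    x_choices : list of J+1 covariate vectors, one per choice
    beta      : coefficient vector shared by all choices
    """
    v = [sum(a * b for a, b in zip(x_j, beta)) for x_j in x_choices]
    m = max(v)  # subtracting the max leaves the ratios unchanged but avoids overflow
    e = [math.exp(vj - m) for vj in v]
    total = sum(e)
    return [ej / total for ej in e]

def fit_scalar_beta(data, beta=0.0, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE for one choice-varying covariate (e.g. cost).

    data : list of (costs, y) pairs; costs[j] is the scalar x_ij, y the chosen index.
    Score:   sum_i (x_{i,y_i} - xbar_i), with xbar_i = sum_j P_ij x_ij
    Hessian: -sum_i sum_j P_ij (x_ij - xbar_i)^2, never positive (concavity).
    """
    for _ in range(max_iter):
        score, hess = 0.0, 0.0
        for costs, y in data:
            p = conditional_logit_probs([[c] for c in costs], [beta])
            xbar = sum(pj * c for pj, c in zip(p, costs))
            score += costs[y] - xbar
            hess -= sum(pj * (c - xbar) ** 2 for pj, c in zip(p, costs))
        step = score / hess
        beta -= step
        if abs(step) < tol:
            break
    return beta

# Hypothetical data: three modes costing 1, 2 and 3; four observed choices.
data = [([1.0, 2.0, 3.0], 0), ([1.0, 2.0, 3.0], 1),
        ([1.0, 2.0, 3.0], 2), ([1.0, 2.0, 3.0], 0)]
beta_hat = fit_scalar_beta(data)
```

At the maximum, the score is zero. The fitted mean cost therefore matches the sample mean of the chosen costs, which is 7/4 here.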
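The multinomial logit model can be viewed as a special case of this. Suppose we have a vector of individual characteristics $x_i$ of dimension $K$. For each choice $j$, define the covariate vector $x_{ij}$ of dimension $K \times (J+1)$. All its elements are zero except elements $K \times j + 1$ through $K \times (j+1)$, which are equal to $x_i$:

$$x_{i0} = \begin{pmatrix} x_i \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad x_{ij} = \begin{pmatrix} 0 \\ \vdots \\ x_i \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots, \quad x_{iJ} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ x_i \end{pmatrix}.$$

Write $\beta = (\beta_0', \beta_1', \ldots, \beta_J')'$. Then $x_{ij}'\beta = x_i'\beta_j$, so the conditional logit probabilities become $\exp(x_i'\beta_j) / \sum_{l=0}^{J} \exp(x_i'\beta_l)$. Normalizing $\beta_0 = 0$ gives exactly the multinomial logit model.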
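The construction can be checked numerically. The Python sketch below stacks $x_i$ into the vectors $x_{ij}$ and $\beta_0, \ldots, \beta_J$ into one long $\beta$. It then confirms that the conditional logit probabilities match the multinomial logit formula. The helper names and the numbers are illustrative only.

```python
import math

def stack_covariates(x_i, J):
    """Build x_i0, ..., x_iJ, each of length K*(J+1): zeros except block j = x_i."""
    K = len(x_i)
    stacked = []
    for j in range(J + 1):
        v = [0.0] * (K * (J + 1))
        v[K * j:K * (j + 1)] = x_i  # elements K*j+1..K*(j+1) in 1-based counting
        stacked.append(v)
    return stacked

def stack_coefficients(betas):
    """beta = (beta_0', beta_1', ..., beta_J')' as one flat list."""
    return [b for beta_j in betas for b in beta_j]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

x_i = [1.0, 2.0]
betas = [[0.0, 0.0], [0.5, -0.2], [-1.0, 0.3]]  # beta_0 normalized to zero
xs = stack_covariates(x_i, J=2)
beta = stack_coefficients(betas)

# Conditional logit on the stacked covariates ...
cl = softmax([dot(x_ij, beta) for x_ij in xs])
# ... versus the multinomial logit formula with the "1 +" denominator.
den = 1.0 + sum(math.exp(dot(x_i, b)) for b in betas[1:])
mnl = [1.0 / den] + [math.exp(dot(x_i, b)) / den for b in betas[1:]]
```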
III. Link with Utility Maximization
McFadden motivates this model by extending the latent index model to multiple choices. Suppose that the utility for individual i associated with choice j is
$$U_{ij} = x_{ij}'\beta + \varepsilon_{ij}. \qquad (1)$$

Furthermore, let individual $i$ choose option $j$ (that is, $y_i = j$) if that option provides the highest level of utility:

$$y_i = j \quad \text{if } U_{ij} \geq U_{il} \text{ for all } l = 0, \ldots, J$$

(ties have probability zero because the distribution of $\varepsilon$ is continuous).
Now suppose that the $\varepsilon_{ij}$ are independent across choices and individuals and have type I extreme value distributions. Then the choice $y_i$ follows the conditional logit model. The type I extreme value distribution has cumulative distribution function

$$F(\varepsilon) = \exp(-\exp(-\varepsilon)),$$

and probability density function

$$f(\varepsilon) = \exp(-\varepsilon) \cdot \exp(-\exp(-\varepsilon)).$$
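To see this, consider the probability of choice 0 (any other choice follows by relabeling). Individual $i$ chooses 0 if $\varepsilon_{il} \leq \varepsilon_{i0} + x_{i0}'\beta - x_{il}'\beta$ for all $l = 1, \ldots, J$. Conditioning on $\varepsilon_{i0} = \varepsilon$ and using independence,

$$\begin{aligned}
\Pr(y_i = 0 \mid x_{i0}, \ldots, x_{iJ}) &= \int_{-\infty}^{\infty} f(\varepsilon) \prod_{l=1}^{J} F(\varepsilon + x_{i0}'\beta - x_{il}'\beta)\, d\varepsilon \\
&= \int_{-\infty}^{\infty} \exp(-\varepsilon) \cdot \exp\Big(-\exp(-\varepsilon)\Big[1 + \sum_{l=1}^{J} \exp(x_{il}'\beta - x_{i0}'\beta)\Big]\Big)\, d\varepsilon \\
&= \int_{-\infty}^{\infty} \exp(-\varepsilon) \cdot \exp(-\exp(-(\varepsilon + c)))\, d\varepsilon \\
&= \exp(c) \cdot \int_{-\infty}^{\infty} \exp(-\eta) \cdot \exp(-\exp(-\eta))\, d\eta = \exp(c),
\end{aligned}$$

by the change of variables $\eta = \varepsilon + c$, which we apply with

$$c = -\ln\left(1 + \exp(x_{i1}'\beta - x_{i0}'\beta) + \ldots + \exp(x_{iJ}'\beta - x_{i0}'\beta)\right).$$

Hence $\Pr(y_i = 0 \mid x_{i0}, \ldots, x_{iJ}) = \exp(x_{i0}'\beta) / \sum_{l=0}^{J} \exp(x_{il}'\beta)$, which is the conditional logit probability.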
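The result can also be checked by simulation. The Python sketch below draws type I extreme value errors and lets each simulated individual pick the choice with the highest utility. The resulting choice shares are compared with the logit formula. The draws use the fact that if $E \sim \text{Exp}(1)$, then $-\ln E$ has CDF $\exp(-\exp(-t))$. The utilities, seed and number of draws are hypothetical.

```python
import math
import random

def draw_type1_ev(rng):
    """One type I extreme value draw: if E ~ Exp(1), then -ln(E) has
    CDF exp(-exp(-t)), the F(epsilon) given above."""
    e = rng.expovariate(1.0)
    while e == 0.0:  # guard against the (practically impossible) log(0)
        e = rng.expovariate(1.0)
    return -math.log(e)

def simulate_choice_shares(v, n_draws, seed=0):
    """Shares of each choice when y_i = argmax_j (v_j + eps_ij)."""
    rng = random.Random(seed)
    counts = [0] * len(v)
    for _ in range(n_draws):
        utilities = [vj + draw_type1_ev(rng) for vj in v]
        counts[utilities.index(max(utilities))] += 1
    return [c / n_draws for c in counts]

def logit_probs(v):
    e = [math.exp(vj) for vj in v]
    s = sum(e)
    return [ej / s for ej in e]

# Hypothetical systematic utilities x_ij' beta for J + 1 = 3 choices.
v = [0.0, 0.5, -0.3]
simulated = simulate_choice_shares(v, n_draws=100_000)
exact = logit_probs(v)
```

With 100,000 draws, the simulated shares should agree with the exact logit probabilities to within a few tenths of a percentage point.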
IV. Independence of Irrelevant Alternatives
The main problem with the conditional logit is the property of independence of irrelevant alternatives (IIA). Consider the conditional probability of choosing $j$ given that you choose either $j$ or $l$, $\Pr(y = j \mid y \in \{j, l\})$:

$$\Pr(y_i = j \mid y_i \in \{j, l\}) = \frac{\exp(x_{ij}'\beta)}{\exp(x_{ij}'\beta) + \exp(x_{il}'\beta)}.$$
This probability does not depend on the characteristics of alternatives other than $j$ and $l$, which is sometimes unattractive. McFadden's famous blue bus/red bus example illustrates this. Suppose there are three choices: commuting by car, by red bus or by blue bus. A sensible model would let people have a preference over cars versus buses while being indifferent between red and blue buses. Under such a model, the conditional probability of commuting by car, given that one commutes by car or red bus, would probably differ from the same conditional probability when there is no blue bus. Presumably, taking away the blue bus would lead all current blue bus users to shift to the red bus, and not to cars.
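A numerical version of the red bus/blue bus example makes the problem concrete. The Python sketch below assumes, hypothetically, that all systematic utilities are zero: car and bus are equally attractive, and the two buses are identical. It computes conditional logit shares with and without the blue bus.

```python
import math

def logit_shares(v):
    """Conditional logit shares from systematic utilities v[name] = x_ij' beta."""
    total = sum(math.exp(u) for u in v.values())
    return {name: math.exp(u) / total for name, u in v.items()}

# Hypothetical utilities: car and bus equally attractive; the two buses identical.
without_blue = logit_shares({"car": 0.0, "red_bus": 0.0})
with_blue = logit_shares({"car": 0.0, "red_bus": 0.0, "blue_bus": 0.0})

# IIA: the car/red-bus odds are the same in both choice sets ...
odds_without = without_blue["car"] / without_blue["red_bus"]
odds_with = with_blue["car"] / with_blue["red_bus"]
# ... so adding the blue bus pulls the car share from 1/2 down to 1/3,
# instead of leaving it at 1/2 with the two buses splitting the other half.
```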
The solution is to allow, in some fashion, for correlation between the errors in the latent utility representation (1). If the choice set contains multiple versions of essentially the same option, we should allow the latent utilities for these choices to be identical, so that their error terms are perfectly correlated. This can be done in a number of ways. We analyze one of them, the nested logit model, in the following discussion.
V. Nested Logit
One way to induce correlation between the choices is to nest them. Suppose the set of choices $\{0, 1, \ldots, J\}$ can be partitioned into $S$ sets $B_1, \ldots, B_S$, so that

$$\{0, 1, \ldots, J\} = \bigcup_{s=1}^{S} B_s.$$

Let $Z_s$ be set-specific variables. (The set-specific variables may simply be indicators, with $Z_s$ an $S$-vector of zeros with a one in the $s$th element.) Now let the conditional probability of choice $j$ given that $y_i \in B_s$ be equal to

$$\Pr(y_i = j \mid x_i, y_i \in B_s) = \frac{\exp(\sigma_s^{-1} x_{ij}'\beta)}{\sum_{l \in B_s} \exp(\sigma_s^{-1} x_{il}'\beta)}.$$
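In addition, suppose the probability of set $B_s$ is

$$\Pr(y_i \in B_s \mid x_i) = \frac{\exp(Z_s'\alpha) \left(\sum_{l \in B_s} \exp(\sigma_s^{-1} x_{il}'\beta)\right)^{\sigma_s}}{\sum_{t=1}^{S} \exp(Z_t'\alpha) \left(\sum_{l \in B_t} \exp(\sigma_t^{-1} x_{il}'\beta)\right)^{\sigma_t}}.$$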
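Multiplying the within-set probability by the set probability gives $\Pr(y_i = j \mid x_i)$ for $j \in B_s$. The Python sketch below computes this product. Each set's term in the denominator is raised to its own $\sigma_t$, and $W_s = \ln \sum_{l \in B_s} \exp(\sigma_s^{-1} x_{il}'\beta)$ is computed with a log-sum-exp for numerical stability. The function name, inputs and numbers are illustrative assumptions.

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def nested_logit_probs(v, z_alpha, nests, sigma):
    """Unconditional probabilities Pr(y_i = j | x_i) for j = 0..J.

    v       : v[j] = x_ij' beta for each choice j
    z_alpha : z_alpha[s] = Z_s' alpha for each set B_s
    nests   : list of lists of choice indices; must partition {0, ..., J}
    sigma   : sigma[s] > 0 for each set
    """
    # W[s] = ln sum_{l in B_s} exp(v_l / sigma_s)
    W = [logsumexp([v[l] / sigma[s] for l in nests[s]]) for s in range(len(nests))]
    # Log of each set's numerator: Z_s' alpha + sigma_s * W_s
    upper = [z_alpha[s] + sigma[s] * W[s] for s in range(len(nests))]
    log_denom = logsumexp(upper)
    probs = [0.0] * len(v)
    for s, members in enumerate(nests):
        p_set = math.exp(upper[s] - log_denom)          # Pr(y in B_s)
        for j in members:
            p_within = math.exp(v[j] / sigma[s] - W[s])  # Pr(y = j | y in B_s)
            probs[j] = p_set * p_within
    return probs

# Red bus / blue bus with hypothetical numbers: choice 0 = car, 1 and 2 = buses.
nests = [[0], [1, 2]]
v = [0.0, 0.0, 0.0]
p_logit = nested_logit_probs(v, [0.0, 0.0], nests, [1.0, 1.0])   # sigma = 1
p_nested = nested_logit_probs(v, [0.0, 0.0], nests, [1.0, 0.01])  # buses nearly identical
```

With $\sigma_s = 1$, every choice gets probability 1/3, as in the conditional logit. With a small $\sigma$ for the bus set, the car share stays close to 1/2, which is the pattern the blue bus argument calls for.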
If we fix $\sigma_s = 1$ for all $s$, then for $j \in B_s$

$$\Pr(y_i = j \mid x_i) = \frac{\exp(x_{ij}'\beta + Z_s'\alpha)}{\sum_{t=1}^{S} \sum_{l \in B_t} \exp(x_{il}'\beta + Z_t'\alpha)}$$

and we are back in the conditional logit model. The extra coefficients $\sigma_s$ implicitly allow for correlation of the errors in (1). The joint distribution function of the $\varepsilon_{ij}$ is
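$$F(\varepsilon_{i0}, \ldots, \varepsilon_{iJ}) = \exp\left(-\sum_{s=1}^{S} \exp(Z_s'\alpha) \left(\sum_{j \in B_s} \exp\left(-\sigma_s^{-1} \varepsilon_{ij}\right)\right)^{\sigma_s}\right).$$

Within the sets, the 'correlation coefficient' for the $\varepsilon_{ij}$ is equal to $1 - \sigma_s$. Between the sets, the $\varepsilon_{ij}$ are independent.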
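The Python sketch below evaluates this joint distribution function and illustrates the two limiting cases for a single set containing two errors. The numbers and names are hypothetical. With $\sigma = 1$ and $Z_s'\alpha = 0$, the function factors into a product of independent type I extreme value CDFs. As $\sigma \to 0$, it approaches $F(\min(\varepsilon_0, \varepsilon_1))$, which is the CDF of two perfectly correlated errors with the same marginal. This matches the 'correlation coefficient' $1 - \sigma_s$ going to one.

```python
import math

def nested_gev_cdf(eps, nests, z_alpha, sigma):
    """F(eps_0..eps_J) = exp(-sum_s exp(Z_s'alpha) * (sum_{j in B_s} exp(-eps_j / sigma_s))^sigma_s)."""
    total = 0.0
    for s, members in enumerate(nests):
        terms = [-eps[j] / sigma[s] for j in members]
        m = max(terms)
        lse = m + math.log(sum(math.exp(t - m) for t in terms))  # log of the inner sum
        total += math.exp(z_alpha[s] + sigma[s] * lse)           # exp(Z_s'a) * (inner sum)^sigma_s
    return math.exp(-total)

def gumbel_cdf(e):
    return math.exp(-math.exp(-e))

eps = [0.3, 1.2]
# sigma = 1 with Z_s'alpha = 0: independent type I extreme value errors
indep = nested_gev_cdf(eps, [[0, 1]], [0.0], [1.0])
# sigma near 0: the two errors in the set move together
tight = nested_gev_cdf(eps, [[0, 1]], [0.0], [0.01])
# Separate sets: the CDF factors across sets whatever the sigmas are
across = nested_gev_cdf(eps, [[0], [1]], [0.0, 0.0], [0.5, 0.5])
```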
How do you estimate these models? One approach is to construct the log likelihood and maximize it directly. That is complicated, especially since the log likelihood function is not concave, but it is not impossible. An easier alternative is to exploit the nesting structure directly. Within a nest we have a conditional logit model with coefficients $\beta/\sigma_s$. Hence we can estimate $\beta/\sigma_s$ directly, exploiting the concavity of the conditional logit log likelihood. Denote these estimates of $\beta/\sigma_s$ by $\widehat{\beta/\sigma_s}$. Then the probability of a particular set $B_s$ can be used to estimate $\sigma_s$ and $\alpha$ through
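$$\Pr(y_i \in B_s \mid x_i) = \frac{\exp(Z_s'\alpha) \left(\sum_{l \in B_s} \exp(x_{il}'\beta/\sigma_s)\right)^{\sigma_s}}{\sum_{t=1}^{S} \exp(Z_t'\alpha) \left(\sum_{l \in B_t} \exp(x_{il}'\beta/\sigma_t)\right)^{\sigma_t}} = \frac{\exp(Z_s'\alpha + \sigma_s \hat{W}_s)}{\sum_{t=1}^{S} \exp(Z_t'\alpha + \sigma_t \hat{W}_t)},$$

where

$$\hat{W}_s = \ln \sum_{l \in B_s} \exp\left(x_{il}'\widehat{\beta/\sigma_s}\right)$$

are known as the “inclusive values”. The second equality replaces $\beta/\sigma_s$ by its first-step estimate. The second step is therefore another conditional logit model, now over the $S$ sets with covariates $Z_s$ and $\hat{W}_s$, and it is easy to estimate. These two-step estimators are not efficient. The variance/covariance matrix is provided in McFadden (1981).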
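The Python sketch below covers the second step. It takes the first-step fitted indices $x_{il}'\widehat{\beta/\sigma_s}$ as given. From these it computes the inclusive values $\hat{W}_s$, the set probabilities, and the second-step log likelihood in $(\alpha, \sigma)$. All function names are hypothetical. `z_alpha[s]` stands for $Z_s'\alpha$ evaluated at a trial value of $\alpha$, and an optimizer (not shown) would search over $\alpha$ and $\sigma$. Naive standard errors from this step ignore the first-step estimation error in $\hat{W}_s$; the notes refer to McFadden (1981) for the correct variance matrix.

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def inclusive_values(first_step_index, nests):
    """W_hat_s = ln sum_{l in B_s} exp(x_il' (beta/sigma_s)_hat).

    first_step_index[l] is the fitted index for choice l, computed with
    the first-step estimate for the set that contains l."""
    return [logsumexp([first_step_index[l] for l in members]) for members in nests]

def set_probs(z_alpha, sigma, w_hat):
    """Pr(y_i in B_s) = exp(Z_s'alpha + sigma_s W_s) / sum_t exp(Z_t'alpha + sigma_t W_t)."""
    u = [za + s * w for za, s, w in zip(z_alpha, sigma, w_hat)]
    lse = logsumexp(u)
    return [math.exp(ui - lse) for ui in u]

def second_step_loglik(z_alpha, sigma, w_hats, chosen_sets):
    """Sum over units i of ln Pr(y_i in B_{s_i}); maximize over (alpha, sigma).

    w_hats[i] holds unit i's inclusive values; chosen_sets[i] is the index
    of the set that contains y_i."""
    return sum(math.log(set_probs(z_alpha, sigma, w)[s])
               for w, s in zip(w_hats, chosen_sets))

# Hypothetical example: choice 0 alone in one set, choices 1 and 2 in the other.
nests = [[0], [1, 2]]
w = inclusive_values([0.0, 0.0, 0.0], nests)
p = set_probs([0.0, 0.0], [1.0, 1.0], w)
ll = second_step_loglik([0.0, 0.0], [1.0, 1.0], [w, w], [0, 1])
```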