By far the easiest and most widely used discrete choice model is logit. Its popularity is due to the fact that the formula for the choice probabilities takes a closed form and is readily interpretable. Originally, the logit formula was derived by Luce (1959) from assumptions about the characteristics of choice probabilities, namely the independence from irrelevant alternatives (IIA) property discussed in Section 3.3.2. Marschak (1960) showed that these axioms implied that the model is consistent with utility maximization. The relation of the logit formula to the distribution of unobserved utility (as opposed to the characteristics of choice probabilities) was developed by Marley, as cited by Luce and Suppes (1965), who showed that the extreme value distribution leads to the logit formula. McFadden (1974) completed the analysis by showing the converse: that the logit formula for the choice probabilities necessarily implies that unobserved utility is distributed extreme value. In his Nobel lecture, McFadden (2001) provides a fascinating history of the development of this path-breaking model.

To derive the logit model, we use the general notation from Chapter 2 and add a specific distribution for unobserved utility. A decision maker, labeled $n$, faces $J$ alternatives. The utility that the decision maker obtains from alternative $j$ is decomposed into (1) a part labeled $V_{nj}$ that is known by the researcher up to some parameters, and (2) an unknown part $\varepsilon_{nj}$ that is treated by the researcher as random: $U_{nj} = V_{nj} + \varepsilon_{nj}\ \forall j$. The logit model is obtained by assuming that each $\varepsilon_{nj}$ is independently, identically distributed extreme value. The distribution is also called Gumbel and type I extreme value (and sometimes, mistakenly, Weibull). The density for each unobserved component of utility is
(3.1) $f(\varepsilon_{nj}) = e^{-\varepsilon_{nj}}\, e^{-e^{-\varepsilon_{nj}}}$,
and the cumulative distribution is
(3.2) $F(\varepsilon_{nj}) = e^{-e^{-\varepsilon_{nj}}}$.
Logit 35
The variance of this distribution is $\pi^2/6$. By assuming the variance is $\pi^2/6$, we are implicitly normalizing the scale of utility, as discussed in Section 2.5. We return to this issue, and its relevance to interpretation, in the next section. The mean of the extreme value distribution is not zero; however, the mean is immaterial, since only differences in utility matter (see Chapter 2), and the difference between two random terms that have the same mean has itself a mean of zero.

The difference between two extreme value variables is distributed logistic. That is, if $\varepsilon_{nj}$ and $\varepsilon_{ni}$ are iid extreme value, then $\varepsilon^*_{nji} = \varepsilon_{nj} - \varepsilon_{ni}$ follows the logistic distribution

$F(\varepsilon^*_{nji}) = \dfrac{e^{\varepsilon^*_{nji}}}{1 + e^{\varepsilon^*_{nji}}}$.
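As an informal numerical check of the two properties just stated, the sketch below draws extreme value variates by inverting the cumulative distribution (3.2), then compares the sample variance with $\pi^2/6$ and the empirical distribution of differences with the logistic formula. The function name, seed, sample size, and evaluation point are illustrative assumptions, not part of the original derivation.

```python
import math
import random


def draw_extreme_value(rng):
    """Draw one type I extreme value variate by inverting F(e) = exp(-exp(-e))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


rng = random.Random(0)   # fixed seed so the run is repeatable
n_draws = 200_000        # illustrative sample size

eps_j = [draw_extreme_value(rng) for _ in range(n_draws)]
eps_i = [draw_extreme_value(rng) for _ in range(n_draws)]

# Sample mean and variance of one error term.
mean_j = sum(eps_j) / n_draws
var_j = sum((e - mean_j) ** 2 for e in eps_j) / n_draws

# Empirical CDF of the difference at one point versus the logistic CDF.
point = 1.0
diffs = [a - b for a, b in zip(eps_j, eps_i)]
empirical_cdf = sum(d <= point for d in diffs) / n_draws
logistic_cdf = math.exp(point) / (1.0 + math.exp(point))

print("sample variance:", var_j, "target:", math.pi ** 2 / 6)
print("sample mean (not zero):", mean_j)
print("CDF of difference at 1.0:", empirical_cdf, "logistic:", logistic_cdf)
```

The two printed pairs should agree up to simulation noise, and the sample mean should be visibly different from zero, consistent with the remark that the mean is immaterial.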
The logistic formula is sometimes used in describing binary logit models, that is, models with two alternatives.

Using the extreme value distribution for the errors (and hence the logistic distribution for the error differences) is nearly the same as assuming that the errors are independently normal. The extreme value distribution gives slightly fatter tails than a normal, which means that it allows for slightly more aberrant behavior than the normal. Usually, however, the difference between extreme value and independent normal errors is indistinguishable empirically.

The key assumption is not so much the shape of the distribution as that the errors are independent of each other. This independence means that the unobserved portion of utility for one alternative is unrelated to the unobserved portion of utility for another alternative. It is a fairly restrictive assumption, and the development of other models such as those described in Chapters 4–6 has arisen largely for the purpose of avoiding this assumption and allowing for correlated errors.

It is important to realize that the independence assumption is not as restrictive as it might at first seem, and in fact can be interpreted as a natural outcome of a well-specified model. Recall from Chapter 2 that $\varepsilon_{nj}$ is defined as the difference between the utility that the decision maker actually obtains, $U_{nj}$, and the representation of utility that the researcher has developed using observed variables, $V_{nj}$. As such, $\varepsilon_{nj}$ and its distribution depend on the researcher’s specification of representative utility; it is not defined by the choice situation per se. In this light, the assumption of independence attains a different stature. Under independence, the error for one alternative provides no information to the researcher about the error for another alternative. Stated equivalently, the researcher has specified $V_{nj}$ sufficiently that the remaining, unobserved portion of utility is essentially “white noise.” In a deep sense, the ultimate goal of the
Representative utility is usually specified to be linear in parameters: $V_{nj} = \beta' x_{nj}$, where $x_{nj}$ is a vector of observed variables relating to alternative $j$. With this specification, the logit probabilities become

$P_{ni} = \dfrac{e^{\beta' x_{ni}}}{\sum_j e^{\beta' x_{nj}}}$.
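The formula above translates directly into a few lines of code. The following is a minimal sketch for a single decision maker; the function name, the coefficient values, and the two attributes (labeled cost and time here) are hypothetical and chosen only for illustration.

```python
import math


def logit_probabilities(beta, x_by_alternative):
    """Return P_ni = exp(beta'x_ni) / sum_j exp(beta'x_nj) for one decision maker."""
    utilities = [
        sum(b * x for b, x in zip(beta, x_alt)) for x_alt in x_by_alternative
    ]
    # Subtracting the largest utility leaves the probabilities unchanged
    # (only differences in utility matter) and avoids overflow in exp().
    v_max = max(utilities)
    exp_utilities = [math.exp(v - v_max) for v in utilities]
    denominator = sum(exp_utilities)
    return [e / denominator for e in exp_utilities]


# Hypothetical example: three alternatives, each described by (cost, time).
beta = [-0.5, -0.1]
x_n = [[2.0, 30.0], [3.0, 20.0], [1.0, 45.0]]

probabilities = logit_probabilities(beta, x_n)
print(probabilities)
print("sum:", sum(probabilities))
```

Subtracting the maximum utility before exponentiating is a numerical convenience rather than part of the model; it works because the probabilities depend only on utility differences.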
Under fairly general conditions, any function can be approximated arbitrarily closely by one that is linear in parameters. The assumption is therefore fairly benign. Importantly, McFadden (1974) demonstrated that the log-likelihood function with these choice probabilities is globally concave in parameters $\beta$, which helps in the numerical maximization procedures (as discussed in Chapter 8). Numerous computer packages contain routines for estimation of logit models with linear-in-parameters representative utility.

The logit probabilities exhibit several desirable properties. First, $P_{ni}$ is necessarily between zero and one, as required for a probability. When $V_{ni}$ rises, reflecting an improvement in the observed attributes of the alternative, with $V_{nj}\ \forall j \neq i$ held constant, $P_{ni}$ approaches one. And $P_{ni}$ approaches zero when $V_{ni}$ decreases, since the exponential in the numerator of (3.6) approaches zero as $V_{ni}$ approaches $-\infty$. The logit probability for an alternative is never exactly zero. If the researcher believes that an alternative has actually no chance of being chosen by a decision maker, the researcher can exclude that alternative from the choice set. A probability of exactly 1 is obtained only if the choice set consists of a single alternative.
Second, the choice probabilities for all alternatives sum to one: $\sum_{i=1}^{J} P_{ni} = \sum_i \exp(V_{ni}) / \sum_j \exp(V_{nj}) = 1$. The decision maker necessarily chooses one of the alternatives. The denominator in (3.6) is simply the sum of the numerator over all alternatives, which gives this summing-up property automatically. With logit, as well as with some more complex models such as the nested logit models of Chapter 4, interpretation of the choice probabilities is facilitated by recognition that the denominator serves to assure that the probabilities sum to one. In other models, such as mixed logit and probit, there is no denominator per se to interpret in this way.

The relation of the logit probability to representative utility is sigmoid, or S-shaped, as shown in Figure 3.1. This shape has implications for the impact of changes in explanatory variables. If the representative utility of an alternative is very low compared with other alternatives, a small increase in the utility of the alternative has little effect on the probability of its being chosen: the other alternatives are still sufficiently better such that this small improvement doesn’t help much. Similarly, if one alternative
38 Behavioral Models
[Figure 3.1. Graph of logit curve: $P_{ni}$ (vertical axis, from 0 to 1) plotted against $V_{ni}$ (horizontal axis).]
is far superior to the others in observed attributes, a further increase in its representative utility has little effect on the choice probability. The point at which the increase in representative utility has the greatest effect on the probability of its being chosen is when the probability is close to 0.5, meaning a 50–50 chance of the alternative being chosen. In this case, a small improvement tips the balance in people’s choices, inducing a large change in probability.

The sigmoid shape of logit probabilities is shared by most discrete choice models and has important implications for policy makers. For example, improving bus service in areas where the service is so poor that few travelers take the bus would be less effective, in terms of transit ridership, than making the same improvement in areas where bus service is already sufficiently good to induce a moderate share of travelers to choose it (but not so good that nearly everyone does).

The logit probability formula is easily interpretable in the context of an example. Consider a binary choice situation first: a household’s choice between a gas and an electric heating system. Suppose that the utility the household obtains from each type of system depends only on the purchase price, the annual operating cost, and the household’s view of the convenience and quality of heating with each type of system and the relative aesthetics of the systems within the house. The first two of these factors can be observed by the researcher, but the researcher cannot observe the others. If the researcher considers the observed part of utility to be a linear function of the observed factors, then the utility of each heating system can be written as: $U_g = \beta_1 \mathrm{PP}_g + \beta_2 \mathrm{OC}_g + \varepsilon_g$ and $U_e = \beta_1 \mathrm{PP}_e + \beta_2 \mathrm{OC}_e + \varepsilon_e$, where the subscripts $g$ and $e$ denote gas and electric, PP and OC are the purchase price and operating cost, $\beta_1$ and $\beta_2$ are scalar parameters, and the subscript $n$ for the household is suppressed. Since higher costs mean less money to spend on other goods, we expect utility to drop as purchase price or operating cost rises (with all else held constant): $\beta_1 < 0$ and $\beta_2 < 0$.
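The sigmoid relation described earlier can be made concrete for a two-alternative case such as this one. With two alternatives, the logit probability depends only on the difference in representative utility, and its derivative with respect to that difference is $P(1 - P)$, which is largest at $P = 0.5$. A minimal sketch, with the grid of utility differences and the function names chosen purely for illustration:

```python
import math


def binary_logit_probability(utility_difference):
    """P = 1 / (1 + exp(-(V_i - V_j))) for a two-alternative logit."""
    return 1.0 / (1.0 + math.exp(-utility_difference))


def marginal_effect(utility_difference):
    """Derivative of P with respect to V_i, which equals P * (1 - P)."""
    p = binary_logit_probability(utility_difference)
    return p * (1.0 - p)


# Illustrative grid of differences in representative utility, V_i - V_j.
differences = [-6.0, -3.0, 0.0, 3.0, 6.0]
effects = {d: marginal_effect(d) for d in differences}

for d in differences:
    print(d, round(binary_logit_probability(d), 4), round(effects[d], 4))
```

The marginal effect is small at both ends of the grid and peaks where the two alternatives are equally attractive, which is the pattern behind the bus service example.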
form that is used in most textbooks and computer manuals for binary logit.

Multinomial choice is a simple extension. Suppose there is a third type of heating system, namely oil-fueled. The utility of the oil system is specified as the same form as for the electric and gas systems: $U_o = \beta_1 \mathrm{PP}_o + \beta_2 \mathrm{OC}_o + \varepsilon_o$. With this extra option available, the probability that the household chooses a gas system is

$P_g = \dfrac{e^{\beta_1 \mathrm{PP}_g + \beta_2 \mathrm{OC}_g}}{e^{\beta_1 \mathrm{PP}_g + \beta_2 \mathrm{OC}_g} + e^{\beta_1 \mathrm{PP}_e + \beta_2 \mathrm{OC}_e} + e^{\beta_1 \mathrm{PP}_o + \beta_2 \mathrm{OC}_o}}$,

which is the same as (3.7) except that an extra term is included in the denominator to represent the oil heater. Since the denominator is larger while the numerator is the same, the probability of choosing a gas system is smaller when an oil system is an option than when not, as one would expect in the real world.
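The effect of adding the oil system can be illustrated numerically. In the sketch below, the coefficients and the purchase prices and operating costs are invented for illustration (the text gives no numerical values); only the signs of $\beta_1$ and $\beta_2$ follow the discussion above.

```python
import math


def choice_probabilities(representative_utilities):
    """Logit probabilities from a dict mapping alternative name to V."""
    exp_v = {alt: math.exp(v) for alt, v in representative_utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}


# Hypothetical coefficients (both negative) and costs in thousands of dollars.
beta_1, beta_2 = -0.6, -1.5
systems = {
    "gas": {"PP": 3.0, "OC": 0.8},
    "electric": {"PP": 2.0, "OC": 1.2},
    "oil": {"PP": 3.5, "OC": 0.9},
}

# Representative utility V = beta_1 * PP + beta_2 * OC for each system.
v = {name: beta_1 * s["PP"] + beta_2 * s["OC"] for name, s in systems.items()}

binary = choice_probabilities({name: v[name] for name in ("gas", "electric")})
multinomial = choice_probabilities(v)

print("P(gas), gas vs. electric only:", binary["gas"])
print("P(gas), with oil available:   ", multinomial["gas"])
```

The numerator for gas is identical in both calls; only the denominator grows when oil is added, so the gas probability falls.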
In the previous section we derived the logit formula under the assumption that the unobserved factors are distributed extreme value with variance $\pi^2/6$. Setting the variance to $\pi^2/6$ is equivalent to normalizing the model for the scale of utility, as discussed in Section 2.5. It is useful to make these concepts more explicit, to show the role that the variance of the unobserved factors plays in logit models.

In general, utility can be expressed as $U^*_{nj} = V_{nj} + \varepsilon^*_{nj}$, where the unobserved portion has variance $\sigma^2 \times (\pi^2/6)$. That is, the variance is any number, re-expressed as a multiple of $\pi^2/6$. Since the scale of utility is irrelevant to behavior, utility can be divided by $\sigma$ without changing behavior. Utility becomes $U_{nj} = V_{nj}/\sigma + \varepsilon_{nj}$ where $\varepsilon_{nj} = \varepsilon^*_{nj}/\sigma$. Now the unobserved portion has variance $\pi^2/6$: $\mathrm{Var}(\varepsilon_{nj}) = \mathrm{Var}(\varepsilon^*_{nj}/\sigma) = (1/\sigma^2)\,\mathrm{Var}(\varepsilon^*_{nj}) = (1/\sigma^2) \times \sigma^2 \times (\pi^2/6) = \pi^2/6$. The choice probability is
$P_{ni} = \dfrac{e^{V_{ni}/\sigma}}{\sum_j e^{V_{nj}/\sigma}}$,

which is the same formula as in equation (3.6) but with the representative utility divided by $\sigma$. If $V_{nj}$ is linear in parameters with coefficient $\beta^*$, the choice probabilities become

$P_{ni} = \dfrac{e^{(\beta^*/\sigma)' x_{ni}}}{\sum_j e^{(\beta^*/\sigma)' x_{nj}}}$.
Each of the coefficients is scaled by $1/\sigma$. The parameter $\sigma$ is called the scale parameter, because it scales the coefficients to reflect the variance of the unobserved portion of utility. Only the ratio $\beta^*/\sigma$ can be estimated; $\beta^*$ and $\sigma$ are not separately identified. Usually, the model is expressed in its scaled form, with $\beta = \beta^*/\sigma$, which gives the standard logit expression
$P_{ni} = \dfrac{e^{\beta' x_{ni}}}{\sum_j e^{\beta' x_{nj}}}$.
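The statement that $\beta^*$ and $\sigma$ are not separately identified can be checked numerically: doubling both leaves every choice probability unchanged, while raising $\sigma$ alone shrinks the scaled coefficients and flattens the probabilities. The sketch below uses made-up attribute values and coefficients; the helper names are assumptions for illustration.

```python
import math


def logit_probabilities(coefficients, x_by_alternative):
    """Standard logit probabilities for one decision maker."""
    utilities = [
        sum(c * x for c, x in zip(coefficients, x_alt))
        for x_alt in x_by_alternative
    ]
    exp_v = [math.exp(u) for u in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]


def scaled_coefficients(beta_star, sigma):
    """beta = beta*/sigma, the only quantity that enters the probabilities."""
    return [b / sigma for b in beta_star]


x_n = [[1.0, 4.0], [2.0, 1.0], [0.5, 6.0]]  # hypothetical attributes

beta_a = scaled_coefficients([-1.0, -0.4], sigma=1.0)
beta_b = scaled_coefficients([-2.0, -0.8], sigma=2.0)  # beta* and sigma both doubled
beta_c = scaled_coefficients([-1.0, -0.4], sigma=2.0)  # same beta*, more unobserved variance

p_a = logit_probabilities(beta_a, x_n)
p_b = logit_probabilities(beta_b, x_n)
p_c = logit_probabilities(beta_c, x_n)

print("sigma=1:            ", p_a)
print("both doubled:       ", p_b)
print("sigma=2, same beta*:", p_c)
print("coefficient ratios: ", beta_a[0] / beta_a[1], beta_c[0] / beta_c[1])
```

The first two lines of output coincide, so data on choices cannot distinguish the two parameter pairs. In the third case the coefficients are halved but their ratio is the same as in the first.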
The parameters $\beta$ are estimated, but for interpretation it is useful to recognize that these estimated parameters are actually estimates of the “original” coefficients $\beta^*$ divided by the scale parameter $\sigma$. The coefficients that are estimated indicate the effect of each observed variable relative to the variance of the unobserved factors. A larger variance in unobserved factors leads to smaller coefficients, even if the observed factors have the same effect on utility (i.e., higher $\sigma$ means lower $\beta$ even if $\beta^*$ is the same).

The scale parameter does not affect the ratio of any two coefficients, since it drops out of the ratio; for example, $\beta_1/\beta_2 = (\beta^*_1/\sigma)/(\beta^*_2/\sigma) = \beta^*_1/\beta^*_2$, where the subscripts refer to the first and second coefficients. Willingness to pay, values of time, and other measures of marginal rates of substitution are not affected by the scale parameter. Only the interpretation of the magnitudes of all coefficients is affected.

So far we have assumed that the variance of the unobserved factors is the same for all decision makers, since the same $\sigma$ is used for all $n$. Suppose instead that the unobserved factors have greater variance for some decision makers than others. In Section 2.5, we discuss a situation where the variance of unobserved factors is different in Boston than in Chicago. Denote the variance for all decision makers in Boston as $(\sigma^B)^2(\pi^2/6)$ and that for decision makers in Chicago as $(\sigma^C)^2(\pi^2/6)$. The ratio of variance in Chicago to that in Boston is $k = (\sigma^C/\sigma^B)^2$. The choice probabilities for people in Boston become
$P_{ni} = \dfrac{e^{\beta' x_{ni}}}{\sum_j e^{\beta' x_{nj}}}$,

and for people in Chicago

$P_{ni} = \dfrac{e^{(\beta/\sqrt{k})' x_{ni}}}{\sum_j e^{(\beta/\sqrt{k})' x_{nj}}}$,

where $\beta = \beta^*/\sigma^B$. The ratio of variances $k$ is estimated along with the coefficients $\beta$. The estimated $\beta$’s are interpreted as being relative to the
linked to observed demographic characteristics, just because different people are different. Two people who have the same income, education, etc., will make different choices, reflecting their individual preferences and concerns.

Logit models can capture taste variations, but only within limits. In particular, tastes that vary systematically with respect to observed variables can be incorporated in logit models, while tastes that vary with unobserved variables or purely randomly cannot be handled. The following example illustrates the distinction.

Consider households’ choice among makes and models of cars to buy. Suppose for simplicity that the only two attributes of cars that the researcher observes are the purchase price, $\mathrm{PP}_j$ for make/model $j$, and inches of shoulder room, $\mathrm{SR}_j$, which is a measure of the interior size of a car. The value that households place on these two attributes varies over households, and so utility is written as
(3.8) $U_{nj} = \alpha_n \mathrm{SR}_j + \beta_n \mathrm{PP}_j + \varepsilon_{nj}$,

where $\alpha_n$ and $\beta_n$ are parameters specific to household $n$. The parameters vary over households reflecting differences in taste. Suppose for example that the value of shoulder room varies with the number of members in the households, $M_n$, but nothing else:

$\alpha_n = \rho M_n$,

so that as $M_n$ increases, the value of shoulder room, $\alpha_n$, also increases. Similarly, suppose the importance of purchase price is inversely related to income, $I_n$, so that low-income households place more importance on purchase price:

$\beta_n = \theta / I_n$.

Substituting these relations into (3.8) produces

$U_{nj} = \rho (M_n \mathrm{SR}_j) + \theta (\mathrm{PP}_j / I_n) + \varepsilon_{nj}$.

Under the assumption that each $\varepsilon_{nj}$ is iid extreme value, a standard logit model obtains with two variables entering representative utility, both of which are an interaction of a vehicle attribute with a household characteristic.

Other specifications for the variation in tastes can be substituted. For example, the value of shoulder room might be assumed to increase with household size, but at a decreasing rate, so that $\alpha_n = \rho M_n + \phi M_n^2$ where $\rho$ is expected to be positive and $\phi$ negative. Then $U_{nj} = \rho (M_n \mathrm{SR}_j) + \phi (M_n^2 \mathrm{SR}_j) + \theta (\mathrm{PP}_j / I_n) + \varepsilon_{nj}$, which results in a logit model with three variables entering the representative utility.
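The two-variable specification above can be sketched in code by constructing the interaction terms $M_n \mathrm{SR}_j$ and $\mathrm{PP}_j / I_n$ and passing the resulting representative utilities to the logit formula. All numbers below (car attributes, household characteristics, and the values of $\rho$ and $\theta$) are hypothetical; only the signs $\rho > 0$ and $\theta < 0$ reflect the discussion in the text.

```python
import math


def logit_probabilities(utilities):
    """Standard logit probabilities from a list of representative utilities."""
    exp_v = [math.exp(u) for u in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]


def representative_utilities(household, cars, rho, theta):
    """V_nj = rho * (M_n * SR_j) + theta * (PP_j / I_n) for each car j."""
    return [
        rho * (household["M"] * car["SR"]) + theta * (car["PP"] / household["I"])
        for car in cars
    ]


# Hypothetical cars: shoulder room in inches, purchase price in $1000s.
cars = [{"SR": 52.0, "PP": 18.0}, {"SR": 60.0, "PP": 30.0}]
rho, theta = 0.05, -4.0  # illustrative values

# Hypothetical households: M = members, I = income in $1000s.
small_household = {"M": 1, "I": 60.0}
large_household = {"M": 5, "I": 60.0}
low_income_household = {"M": 1, "I": 30.0}

p_small = logit_probabilities(representative_utilities(small_household, cars, rho, theta))
p_large = logit_probabilities(representative_utilities(large_household, cars, rho, theta))
p_low_income = logit_probabilities(representative_utilities(low_income_household, cars, rho, theta))

print("P(roomier, pricier car), small household:     ", p_small[1])
print("P(roomier, pricier car), large household:     ", p_large[1])
print("P(roomier, pricier car), low-income household:", p_low_income[1])
```

Tastes differ across the three households, yet the model remains a standard logit because the variation is tied entirely to the observed characteristics $M_n$ and $I_n$.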
The limitation of the logit model arises when we attempt to allow tastes to vary with respect to unobserved variables or purely randomly. Suppose for example that the value of shoulder room varied with household size plus some other factors (e.g., size of the people themselves, or frequency with which the household travels together) that are unobserved by the researcher and hence considered random:

$\alpha_n = \rho M_n + \mu_n$,

where $\mu_n$ is a random variable. Similarly, the importance of purchase price consists of its observed and unobserved components:

$\beta_n = \theta / I_n + \eta_n$.

Substituting into (3.8) produces

$U_{nj} = \rho (M_n \mathrm{SR}_j) + \mu_n \mathrm{SR}_j + \theta (\mathrm{PP}_j / I_n) + \eta_n \mathrm{PP}_j + \varepsilon_{nj}$.
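In this utility expression, the household-level draws $\mu_n$ and $\eta_n$ multiply the attributes of every car, so whatever the researcher cannot observe is shared across alternatives. The simulation below is a rough sketch of that point under several assumptions that are not in the text: $\mu_n$ and $\eta_n$ are taken to be independent normal variables with made-up standard deviations, the car attributes are invented and rescaled, and $\varepsilon_{nj}$ is iid extreme value.

```python
import math
import random


def draw_extreme_value(rng):
    """Type I extreme value draw by inverting F(e) = exp(-exp(-e))."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.log(u))


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


rng = random.Random(1)

# Hypothetical (rescaled) attributes of two cars, j and k.
sr_j, pp_j = 1.0, 2.0
sr_k, pp_k = 2.0, 1.0
sd_mu, sd_eta = 0.8, 0.5  # assumed standard deviations of mu_n and eta_n

unobs_j, unobs_k = [], []
for _ in range(100_000):
    mu_n = rng.gauss(0.0, sd_mu)    # normality is an illustrative assumption
    eta_n = rng.gauss(0.0, sd_eta)  # drawn independently of mu_n
    # Unobserved part of utility: mu_n * SR_j + eta_n * PP_j + eps_nj.
    unobs_j.append(mu_n * sr_j + eta_n * pp_j + draw_extreme_value(rng))
    unobs_k.append(mu_n * sr_k + eta_n * pp_k + draw_extreme_value(rng))

sample_cov = covariance(unobs_j, unobs_k)
implied_cov = sd_mu ** 2 * sr_j * sr_k + sd_eta ** 2 * pp_j * pp_k
var_j = covariance(unobs_j, unobs_j)
var_k = covariance(unobs_k, unobs_k)

print("covariance across cars:", sample_cov, "implied:", implied_cov)
print("variance for car j:", var_j, "variance for car k:", var_k)
```

Under these assumptions the unobserved parts of utility for the two cars are correlated and have different variances, so they are neither independent nor identically distributed.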
Since $\mu_n$ and $\eta_n$ are not observed, the terms $\mu_n \mathrm{SR}_j$ and $\eta_n \mathrm{PP}_j$ become part of the unobserved component of utility,

$U_{nj} = \rho (M_n \mathrm{SR}_j) + \theta (\mathrm{PP}_j / I_n) + \tilde{\varepsilon}_{nj}$,

where $\tilde{\varepsilon}_{nj} = \mu_n \mathrm{SR}_j + \eta_n \mathrm{PP}_j + \varepsilon_{nj}$. The new error terms $\tilde{\varepsilon}_{nj}$ cannot possibly be distributed independently and identically as required for the logit formulation. Since $\mu_n$ and $\eta_n$ enter each alternative, $\tilde{\varepsilon}_{nj}$ is necessarily correlated over alternatives: $\mathrm{Cov}(\tilde{\varepsilon}_{nj}, \tilde{\varepsilon}_{nk}) = \mathrm{Var}(\mu_n)\,\mathrm{SR}_j \mathrm{SR}_k + \mathrm{Var}(\eta_n)\,\mathrm{PP}_j \mathrm{PP}_k \neq 0$ for any two cars $j$ and $k$. Furthermore, since $\mathrm{SR}_j$ and $\mathrm{PP}_j$ vary over alternatives, the variance of $\tilde{\varepsilon}_{nj}$ varies over alternatives, violating the assumption of identically distributed errors: $\mathrm{Var}(\tilde{\varepsilon}_{nj}) = \mathrm{Var}(\mu_n)\,\mathrm{SR}_j^2 + \mathrm{Var}(\eta_n)\,\mathrm{PP}_j^2 + \mathrm{Var}(\varepsilon_{nj})$, which is different for different $j$.

This example illustrates the general point that when tastes vary systematically in the population in relation to observed variables, the variation can be incorporated into logit models. However, if taste variation is at least partly random, logit is a misspecification. As an approximation, logit might be able to capture the average tastes fairly well even when tastes are random, since the logit formula seems to be fairly robust to misspecifications. The researcher might therefore choose to use logit even when she knows that tastes have a random component, for the sake of simplicity. However, there is no guarantee that a logit model will approximate the average tastes. And even if it does, logit does not provide information on the distribution of tastes around the average. This distribution can be important in many situations, such as forecasting the penetration of a new product that appeals to a minority of people rather
alternatives are available or what the attributes of the other alternatives are. Since the ratio is independent from alternatives other than $i$ and $k$, it is said to be independent from irrelevant alternatives. The logit model exhibits this independence from irrelevant alternatives, or IIA.

In many settings, choice probabilities that exhibit IIA provide an accurate representation of reality. In fact, Luce (1959) considered IIA to be a property of appropriately specified choice probabilities. He derived the logit model directly from an assumption that choice probabilities exhibit IIA, rather than (as we have done) derive the logit formula from an assumption about the distribution of unobserved utility and then observe that IIA is a resulting property.

While the IIA property is realistic in some choice situations, it is clearly inappropriate in others, as first pointed out by Chipman (1960) and Debreu (1960). Consider the famous red-bus–blue-bus problem. A traveler has a choice of going to work by car or taking a blue bus. For simplicity assume that the representative utility of the two modes are the same, such that the choice probabilities are equal: $P_c = P_{bb} = \frac{1}{2}$, where $c$ is car and $bb$ is blue bus. In this case, the ratio of probabilities is one: $P_c / P_{bb} = 1$.

Now suppose that a red bus is introduced and that the traveler considers the red bus to be exactly like the blue bus. The probability that the traveler will take the red bus is therefore the same as for the blue bus, so that the ratio of their probabilities is one: $P_{rb} / P_{bb} = 1$. However, in the logit model the ratio $P_c / P_{bb}$ is the same whether or not another alternative, in this case the red bus, exists. This ratio therefore remains at one. The only probabilities for which $P_c / P_{bb} = 1$ and $P_{rb} / P_{bb} = 1$ are $P_c = P_{bb} = P_{rb} = \frac{1}{3}$, which are the probabilities that the logit model predicts.
In real life, however, we would expect the probability of taking a car to remain the same when a new bus is introduced that is exactly the same as the old bus. We would also expect the original probability of taking bus to be split between the two buses after the second one is introduced. That is, we would expect $P_c = \frac{1}{2}$ and $P_{bb} = P_{rb} = \frac{1}{4}$. In this case, the logit model, because of its IIA property, overestimates the probability of taking either of the buses and underestimates the probability of taking a car. The ratio of probabilities of car and blue bus, $P_c / P_{bb}$, actually changes with the introduction of the red bus, rather than remaining constant as required by the logit model.

This example is rather stark and unlikely to be encountered in the real world. However, the same kind of misprediction arises with logit models whenever the ratio of probabilities for two alternatives changes with the introduction or change of another alternative. For example, suppose a new transit mode is added that is similar to, but not exactly like, the existing modes, such as an express bus along a line that already has standard bus service. This new mode might be expected to reduce the probability of regular bus by a greater proportion than it reduces the probability of car, so that ratio of probabilities for car and regular bus does not remain constant. The logit model would overpredict demand for the two bus modes in this situation. Other examples are given by, for example, Ortuzar (1983) and Brownstone and Train (1999).
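The red-bus–blue-bus example can be reproduced in a few lines. The sketch below sets all representative utilities to zero, which is one convenient way (an assumption made here) to impose equal utilities, and compares the logit prediction after the red bus is added with the probabilities the text argues one would expect in reality.

```python
import math


def logit_probabilities(representative_utilities):
    """Logit probabilities from a dict mapping alternative name to V."""
    exp_v = {alt: math.exp(v) for alt, v in representative_utilities.items()}
    total = sum(exp_v.values())
    return {alt: e / total for alt, e in exp_v.items()}


# Equal representative utility for every mode (set to zero for convenience).
before = logit_probabilities({"car": 0.0, "blue bus": 0.0})
after = logit_probabilities({"car": 0.0, "blue bus": 0.0, "red bus": 0.0})

# Probabilities argued for in the text when the two buses are identical.
expected = {"car": 1 / 2, "blue bus": 1 / 4, "red bus": 1 / 4}

print("before:", before)
print("logit after red bus:", after)
print("expected in reality:", expected)
print("ratio car/blue bus before and after:",
      before["car"] / before["blue bus"], after["car"] / after["blue bus"])
```

The ratio of car to blue bus stays at one in the logit prediction, which forces each probability to one third, whereas the expected pattern leaves car at one half.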
Proportional Substitution

The same issue can be expressed in terms of the cross-elasticities of logit probabilities. Let us consider changing an attribute of alternative $j$. We want to know the effect of this change on the probabilities for all the other alternatives. Section 3.6 derives the formula for the elasticity of $P_{ni}$ with respect to a variable that enters the representative utility of alternative $j$:

$E_{i z_{nj}} = -\beta_z z_{nj} P_{nj}$,

where $z_{nj}$ is the attribute of alternative $j$ as faced by person $n$ and $\beta_z$ is its coefficient (or, if the variable enters representative utility nonlinearly, then $\beta_z$ is the derivative of $V_{nj}$ with respect to $z_{nj}$).

This cross-elasticity is the same for all $i$: $i$ does not enter the formula. An improvement in the attributes of an alternative reduces the probabilities for all the other alternatives by the same percentage. If one alternative’s probability drops by ten percent, then all the other alternatives’ probabilities also drop by ten percent (except of course the alternative whose attribute changed; its probability rises due to the improvement). A way of stating this phenomenon succinctly is that an improvement in one alternative draws proportionately from the other alternatives. Similarly, for a decrease in the representative utility of an alternative, the probabilities for all other alternatives rise by the same percentage.

This pattern of substitution, which can be called proportionate shifting, is a manifestation of the IIA property. The ratio of probabilities for alternatives $i$ and $k$ stays constant when an attribute of alternative $j$ changes only if the two probabilities change by the same proportion. With superscript 0 denoting probabilities before the change and 1 after, the IIA property requires that

$\dfrac{P^1_{ni}}{P^1_{nk}} = \dfrac{P^0_{ni}}{P^0_{nk}}$

when an attribute of alternative $j$ changes. This equality can only be maintained if each probability changes by the same proportion: $P^1_{ni} = \lambda P^0_{ni}$ and $P^1_{nk} = \lambda P^0_{nk}$, where both $\lambda$’s are the same.
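Proportionate shifting can be seen directly by improving one alternative and comparing each of the other probabilities before and after. The sketch below also checks the cross-elasticity formula against a finite-difference approximation. The utilities, the coefficient $\beta_z$, the attribute value, and the step size are all illustrative assumptions.

```python
import math


def logit_probabilities(utilities):
    """Standard logit probabilities from a list of representative utilities."""
    exp_v = [math.exp(u) for u in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]


# Part 1: improve alternative j (index 3) and compare the other probabilities.
utilities_before = [1.0, 0.5, 0.0, -0.5]
utilities_after = list(utilities_before)
utilities_after[3] += 0.8

p0 = logit_probabilities(utilities_before)
p1 = logit_probabilities(utilities_after)
lambdas = [p1[i] / p0[i] for i in range(3)]  # P1/P0 for each other alternative
print("proportional change for the other alternatives:", lambdas)

# Part 2: cross-elasticity of P_i with respect to z_nj, where V_nj = beta_z * z_nj.
beta_z, z = -0.4, 2.0


def probabilities_given_z(z_value):
    return logit_probabilities([1.0, 0.5, 0.0, beta_z * z_value])


step = 1e-6
base = probabilities_given_z(z)
bumped = probabilities_given_z(z + step)
numeric = [((bumped[i] - base[i]) / step) * z / base[i] for i in range(3)]
formula = -beta_z * z * base[3]
print("finite-difference elasticities:", numeric)
print("formula -beta_z * z * P_nj:    ", formula)
```

All three entries of `lambdas` coincide, and the finite-difference elasticities match the formula for every $i$, consistent with the observation that $i$ does not enter the expression.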
parameters consistently on a subset of alternatives for each sampled decision maker. For example, in a situation with 100 alternatives, the researcher might, so as to reduce computer time, estimate on a subset of 10 alternatives for each sampled person, with the person’s chosen alternative included as well as 9 alternatives randomly selected from the remaining 99. Since relative probabilities within a subset of alternatives are unaffected by the attributes or existence of alternatives not in the subset, exclusion of alternatives in estimation does not affect the consistency of the estimator. Details of this type of estimation are given in Section 3.7.1. This fact has considerable practical importance. In analyzing choice situations for which the number of alternatives is large, estimation on a subset of alternatives can save substantial amounts of computer time. At an extreme, the number of alternatives might be so large as to preclude estimation altogether if it were not possible to utilize a subset of alternatives.

Another practical use of the IIA property arises when the researcher is only interested in examining choices among a subset of alternatives and not among all alternatives. For example, consider a researcher who is interested in understanding the factors that affect workers’ choice between car and bus modes for travel to work. The full set of alternative modes includes walking, bicycling, motorbiking, skateboarding, and so on. If the researcher believed that the IIA property holds adequately well in this case, she could estimate a model with only car and bus as the alternatives and exclude from the analysis sampled workers who used other modes. This strategy would save the researcher considerable time and expense developing data on the other modes, without hampering her ability to examine the factors related to car and bus.
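The property that makes subset estimation possible, namely that relative probabilities within a subset do not depend on the excluded alternatives, can be checked numerically. The sketch below is not the estimation procedure of Section 3.7.1; it only verifies the ratio property for 100 alternatives with randomly generated representative utilities and a subset of 10 (the chosen alternative plus 9 others). The utilities, seed, and choice of index 0 as the chosen alternative are assumptions.

```python
import math
import random


def logit_probabilities(utilities):
    """Standard logit probabilities from a list of representative utilities."""
    exp_v = [math.exp(u) for u in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]


rng = random.Random(7)
n_alternatives = 100
utilities = [rng.uniform(-2.0, 2.0) for _ in range(n_alternatives)]  # hypothetical V_nj

# Probabilities over the full choice set.
full = logit_probabilities(utilities)

# Subset: the chosen alternative (assumed to be index 0) plus 9 drawn from the other 99.
chosen = 0
subset = [chosen] + rng.sample(range(1, n_alternatives), 9)

# Logit applied to the subset alone ...
subset_only = logit_probabilities([utilities[j] for j in subset])

# ... versus full-set probabilities renormalized within the subset.
mass = sum(full[j] for j in subset)
conditional = [full[j] / mass for j in subset]

largest_gap = max(abs(a - b) for a, b in zip(subset_only, conditional))
print("largest difference between the two calculations:", largest_gap)
```

The two sets of probabilities agree to machine precision, because the terms for the 90 excluded alternatives cancel from every within-subset ratio.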
Tests of IIA

Whether IIA holds in a particular setting is an empirical question, amenable to statistical investigation. Tests of IIA were first developed by McFadden et al. (1978). Two types of tests are suggested. First, the model can be reestimated on a subset of the alternatives. Under IIA, the ratio of probabilities for any two alternatives is the same whether or not other alternatives are available. As a result, if IIA holds in reality, then the parameter estimates obtained on the subset of alternatives will not be significantly different from those obtained on the full set of alternatives. A test of the hypothesis that the parameters on the subset are the same as the parameters on the full set constitutes a test of IIA. Hausman and McFadden (1984) provide an appropriate statistic for this type of test. Second, the model can be reestimated with new, cross-alternative variables, that is, with variables from one alternative entering the utility of another alternative. If the ratio of probabilities for alternatives $i$ and $k$ actually depends on the attributes and existence of a third alternative $j$ (in violation of IIA), then the attributes of alternative $j$ will enter significantly the utility of alternatives $i$ or $k$ within a logit specification. A test of whether cross-alternative variables enter the model therefore constitutes a test of IIA. McFadden (1987) developed a procedure for performing this kind of test with regressions: with the dependent variable being the residuals of the original logit model and the explanatory variables being appropriately specified cross-alternative variables. Train et al. (1989) show how this procedure can be performed conveniently within the logit model itself.

The advent of models that do not exhibit IIA, and especially the development of software for estimating these models, makes testing IIA easier than before. For more flexible specifications, such as GEV and mixed logit, the simple logit model with IIA is a special case that arises under certain constraints on the parameters of the more flexible model. In these cases, IIA can be tested by testing these constraints. For example, a mixed logit model becomes a simple logit if the mixing distribution has zero variance. IIA can be tested by estimating a mixed logit and testing whether the variance of the mixing distribution is in fact zero.

A test of IIA as a constraint on a more general model necessarily operates under the maintained assumption that the more general model is itself an appropriate specification. The tests on subsets of alternatives (Hausman and McFadden, 1984) and cross-alternative variables (McFadden, 1987; Train et al., 1989), while more difficult to perform, operate under less restrictive maintained hypotheses. The counterpoint to this advantage, of course, is that, when IIA fails, these tests do not provide as much guidance on the correct specification to use instead of logit.
3.3.3. Panel Data
In many settings, the researcher can observe numerous choices made by each decision maker. For example, in labor studies, sampled people are observed to work or not work in each month over several years. Data on the current and past vehicle purchases of sampled households might be obtained by a researcher who is interested in the dynamics of car choice. In market research surveys, respondents are often asked a series of hypothetical choice questions, called “stated preference” experiments. For each experiment, a set of alternative products with different attributes
unless another alternative provides sufficiently higher utility to warrant a switch. This behavior is captured as Vn jt = α yn j ( t −1) + β x (^) n jt , where yn jt = 1 if n chose j in period t and 0 otherwise. With α > 0, the utility of alternative j in the current period is higher if alternative j was consumed in the previous period. The same specification can also capture a type of variety seeking. If α is negative, the consumer obtains higher utility from not choosing the same alternative that he chose in the last period. Numerous variations on these concepts are possible. Adamowicz (1994) enters the number of times the alternative has been chosen previously, rather than simply a dummy for the immediately previous choice. Erdem (1996) enters the attributes of previously chosen alternatives, with the utility of each alternative in the current period depending on the similarity of its attributes to the previously experienced attributes. The inclusion of the lagged dependent variable does not induce in- consistency in estimation, since for a logit model the errors are assumed to be independent over time. The lagged dependent variable yn j ( t −1) is uncorrelated with the current error ε n jt due to this independence. The situation is analogous to linear regression models, where a lagged de- pendent variable can be added without inducing bias as long as the errors are independent over time. Of course, the assumption of independent errors over time is severe. Usually, one would expect there to be some factors that are not observed by the researcher that affect each of the decision makers’ choices. In par- ticular, if there are dynamics in the observed factors, then the researcher might expect there to be dynamics in the unobserved factors as well. 
In these situations, the researcher can either use a model such as probit or mixed logit that allows unobserved factors to be correlated over time, or respecify representative utility to bring the sources of the unobserved dynamics into the model explicitly such that the remaining errors are independent over time.
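As a minimal sketch of the lagged-dependent-variable specification, the following Python code computes logit probabilities with $V_{njt} = \alpha y_{nj(t-1)} + \beta x_{njt}$ and the log-likelihood of one decision maker's sequence of choices. The function names, the single scalar attribute per alternative, and the numerical values are assumptions made for illustration only. The product-over-periods form of the likelihood relies on the assumption, stated above, that errors are independent over time.

```python
import math

def logit_probs(v):
    """Logit choice probabilities for a list of representative utilities."""
    m = max(v)  # subtract the maximum before exponentiating, for stability
    expv = [math.exp(x - m) for x in v]
    total = sum(expv)
    return [e / total for e in expv]

def utilities(alpha, beta, x_t, prev_choice):
    """V_njt = alpha * y_nj(t-1) + beta * x_njt for each alternative j.

    x_t: one scalar attribute per alternative in period t (hypothetical).
    prev_choice: index of the alternative chosen in period t-1,
                 or None when there is no previous period.
    """
    return [alpha * (1.0 if j == prev_choice else 0.0) + beta * x
            for j, x in enumerate(x_t)]

def sequence_log_likelihood(alpha, beta, x, choices):
    """Log-likelihood of one decision maker's observed choice sequence.

    With errors independent over time, the probability of the sequence
    is the product of the per-period logit probabilities, so the
    log-likelihood is the sum of their logs.
    """
    ll = 0.0
    prev = None
    for x_t, chosen in zip(x, choices):
        p = logit_probs(utilities(alpha, beta, x_t, prev))
        ll += math.log(p[chosen])
        prev = chosen
    return ll

# Illustrative values: two alternatives with identical attributes,
# alternative 0 chosen in the previous period.
x_now = [1.0, 1.0]
p_inertia = logit_probs(utilities(1.0, -0.5, x_now, prev_choice=0))   # alpha > 0
p_variety = logit_probs(utilities(-1.0, -0.5, x_now, prev_choice=0))  # alpha < 0
```

With identical attributes, the only difference between the alternatives is the lagged choice, so `p_inertia[0]` exceeds one-half (habit formation) while `p_variety[0]` falls below it (variety seeking).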
In some contexts, the researcher will find it useful to allow parameters to enter representative utility nonlinearly. Estimation is then more difficult, since the log-likelihood function may not be globally concave and computer routines are not as widely available as for logit models with linear-in-parameters utility. However, the aspects of behavior that the researcher is investigating may include parameters that are interpretable only when they enter utility nonlinearly. In these cases, the effort of writing one’s own code can be warranted. Two examples illustrate this point.
Example 1: The Goods–Leisure Tradeoff
Consider a worker’s choice of mode (car or bus) for trips to work. Suppose that workers also choose the number of hours to work based on the standard trade-off between goods and leisure. Train and McFadden (1978) developed a procedure for examining these interrelated choices. As we see in the following, the parameters of the workers’ utility function over goods and leisure enter nonlinearly in the utility for modes of travel. Assume that workers’ preferences regarding goods G and leisure L are represented by a Cobb–Douglas utility function of the form
U = (1 − β) ln G + β ln L.
The parameter β reflects the worker’s relative preference for goods and leisure, with higher β implying greater preference for leisure relative to goods. Each worker has a fixed amount of time (24 hours a day) and faces a fixed wage rate, w. In the standard goods–leisure model, the worker chooses the number of hours to work that maximizes U subject to the constraints that (1) the number of hours worked plus the number of leisure hours equals the number of hours available, and (2) the value of goods consumed equals the wage rate times the number of hours worked.
When mode choice is added to the model, the constraints on time and money change. Each mode takes a certain amount of time and costs a certain amount of money. Conditional on choosing car, the worker maximizes U subject to the constraints that (1) the number of hours worked plus the number of leisure hours equals the number of hours available after the time spent driving to work in the car is subtracted, and (2) the value of goods consumed equals the wage rate times the number of hours worked minus the cost of driving to work. The utility associated with choosing to travel by car is the highest value of U that can be attained under these constraints. Similarly, the utility of taking the bus to work is the maximum value of U that can be obtained given the time and money that are left after the bus time and cost are subtracted.
Train and McFadden derived the maximizing values of U conditional on each mode. For the U given above, these values are
$$U_j = -\alpha \left( \frac{c_j}{w^{\beta}} + w^{1-\beta}\, t_j \right)$$
for j = car and bus.
The cost of travel is divided by $w^{\beta}$, and the travel time is multiplied by $w^{1-\beta}$. The parameter β, which denotes workers’ relative preference for goods and leisure, enters the mode choice utility nonlinearly. Since this parameter has meaning, the researcher might want to estimate it within this nonlinear utility rather than use a linear-in-parameters approximation.
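To make the nonlinearity concrete, the sketch below evaluates the mode utility $U_j = -\alpha (c_j / w^{\beta} + w^{1-\beta} t_j)$, the resulting binary logit probability of choosing car, and the log-likelihood of a small sample as a function of β. The data, the fixed value of α, and the crude grid search over β are hypothetical and stand in for a proper numerical optimizer. This is an illustration under those assumptions, not the estimation procedure used by Train and McFadden.

```python
import math

def mode_utility(alpha, beta, cost, time, wage):
    """U_j = -alpha * (c_j / w**beta + w**(1 - beta) * t_j)."""
    return -alpha * (cost / wage ** beta + wage ** (1.0 - beta) * time)

def prob_car(alpha, beta, car, bus, wage):
    """Binary logit probability of car; car and bus are (cost, time) pairs."""
    diff = (mode_utility(alpha, beta, bus[0], bus[1], wage)
            - mode_utility(alpha, beta, car[0], car[1], wage))
    # P(car) = 1 / (1 + exp(U_bus - U_car)), written to avoid overflow.
    if diff >= 0:
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))

def log_likelihood(alpha, beta, data):
    """Sum of log probabilities of the observed choices."""
    ll = 0.0
    for car, bus, wage, chose_car in data:
        p = prob_car(alpha, beta, car, bus, wage)
        ll += math.log(p if chose_car else 1.0 - p)
    return ll

# Hypothetical sample: ((car cost, car time), (bus cost, bus time), wage, chose car?)
data = [
    ((3.0, 0.4), (1.0, 0.9), 20.0, True),
    ((3.0, 0.4), (1.0, 0.9), 5.0, False),
    ((2.5, 0.5), (1.5, 0.8), 12.0, True),
]

alpha = 1.0  # held fixed here purely for illustration
betas = [i / 20 for i in range(21)]  # grid on [0, 1]
best_beta = max(betas, key=lambda b: log_likelihood(alpha, b, data))
```

Two limiting cases follow directly from the formula. At β = 0 the utility reduces to $-\alpha (c_j + w t_j)$, so time is valued at the full wage. At β = 1 it reduces to $-\alpha (c_j / w + t_j)$, so cost is expressed in hours of work. A linear-in-parameters specification would have to fix the wage exponents in advance, whereas estimating β lets the data determine them.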