Lecture 3 Discrete Choice Models, Slides of Microeconomics


Typology: Slides

2021/2022

Uploaded on 09/07/2022

nabeel_kk
RS – Lecture 17
Lecture 3

Discrete Choice Models

Limited Dependent Variables

  • Discrete dependent variable: Discrete Choice Models (DCM)
  • Continuous dependent variable: Truncated/Censored Regression Models; Duration (Hazard) Models (truncated, censored)

  • To date we have implicitly assumed that the variable yi is a continuous random variable.
  • But, the CLM does not require this assumption! The dependent variable can have discontinuities. For example, it can be discrete or follow counts. In these cases, linearity of the conditional expectations is unusual.
  • Different types of discontinuities generate different models:

From Franses and Paap (2001)

With limited dependent variables, the conditional mean is rarely linear. We need to use adjusted models.

Limited Dependent Variables

Limdep: Discrete Choice Models (DCM)

  • We usually study discrete data that represent a decision, a choice.
  • Sometimes, there is a single choice. Then, the data come in binary form, with a "1" representing a decision to do something and a "0" being a decision not to do something. => Single Choice (binary choice models). Data: yi = 1 (yes/accept) or 0 (no/reject)
  • Examples: Trade a stock or not, do an MBA or not, etc.
  • Or we can have several choices. Then, the data may come as 1, 2, ..., J, where J represents the number of choices. => Multiple Choice (multinomial choice models). Data: yi = 1 (opt. 1), 2 (opt. 2), ..., J (opt. J)
  • Examples: CEO candidates, transportation modes, etc.

Limdep: Truncated/Censored Models

  • Truncated variables: We only sample from (observe/use) a subset of the population. The variable is observed only beyond a certain threshold level (‘truncation point’) - store expenditures, labor force participation, income below poverty line.
  • Censored variables: Values in a certain range are all transformed to/grouped into (or reported as) a single value. - hours worked, exchange rates under Central Bank intervention. Note: Censoring is essentially a defect in the sample data. Presumably, if they were not censored, the data would be a representative sample from the population of interest.
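The difference between truncation and censoring can be illustrated with a small simulation. This is only a sketch under assumed settings: the latent variable is standard normal and the threshold is 0, neither of which comes from the slides.

```python
import random
import statistics

rng = random.Random(42)
latent = [rng.gauss(0.0, 1.0) for _ in range(50_000)]  # assumed latent variable
threshold = 0.0  # hypothetical truncation/censoring point

# Truncation: observations at or below the threshold never enter the sample.
truncated = [y for y in latent if y > threshold]

# Censoring: every observation stays in the sample, but values below the
# threshold are reported as the threshold itself.
censored = [max(y, threshold) for y in latent]

mean_latent = statistics.fmean(latent)
mean_truncated = statistics.fmean(truncated)
mean_censored = statistics.fmean(censored)

print(f"latent    n={len(latent):6d}  mean={mean_latent:.3f}")
print(f"truncated n={len(truncated):6d}  mean={mean_truncated:.3f}")
print(f"censored  n={len(censored):6d}  mean={mean_censored:.3f}")
```

The truncated sample is smaller than the population sample, while the censored sample keeps its size but piles observations up at the threshold. In both cases the sample mean differs from the latent mean, which is why adjusted models are needed.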

Figure legend: 0 = Not Healthy, 1 = Healthy

Limdep: Censored Health Satisfaction Data

Limdep: Duration/Hazard Models

  • We model the time between two events. Examples:
    • Time between two trades
    • Time between cash flow withdrawals from a fund
    • Time until a consumer becomes inactive/cancels a subscription
    • Time until a consumer responds to direct mail/ a questionnaire
Microeconomics behind Discrete Choice

  • Consumers Maximize Utility.
  • Fundamental Choice Problem: Max U(x_1, x_2, …) subject to prices and budget constraints
  • A Crucial Result for the Classical Problem:
    • Indirect Utility Function: V = V(p,I)
    • Demand System of Continuous Choices (Roy's identity):

$$
x_j^* = -\frac{\partial V(p, I)/\partial p_j}{\partial V(p, I)/\partial I}
$$

  • The Integrability Problem: Utility is not revealed by demands

  • The modern literature goes back to the work by Daniel McFadden in the seventies and eighties (McFadden 1973, 1981, 1982, 1984).
  • Usual Notation:
    • n = decision maker
    • i, j = choice options
    • y = decision outcome
    • x = explanatory variables/covariates
    • β = parameters
    • ε = error term
    • I[.] = indicator function: equal to 1 if the expression within brackets is true, 0 otherwise. Example: I[y=j|x] = 1 if j was selected (given x), = 0 otherwise

Discrete Choice Models (DCM)

  • Q: Are the characteristics of the consumers relevant?
  • Predicting behavior
    • Individual – for example, will a person buy the add-on insurance?
    • Aggregate – for example, what proportion of the population will buy the add-on insurance?
  • Analyze changes in behavior when attributes change. For example, how will changes in education change the proportion of people who buy the insurance?

DCM – What Can we Learn from the Data?

Application: Health Care Usage (Greene)

German Health Care Usage Data, N = 7,293, Varying Numbers of Periods. Data downloaded from the Journal of Applied Econometrics (JAE) Archive. This is an unbalanced panel with 7,293 individuals. This is a large data set. There are altogether 27,326 observations. The number of observations ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987.)

Variables in the file are:

  • DOCTOR = 1(Number of doctor visits > 0)
  • HOSPITAL = 1(Number of hospital visits > 0)
  • HSAT = health satisfaction, coded 0 (low) - 10 (high)
  • DOCVIS = number of doctor visits in last three months
  • HOSPVIS = number of hospital visits in last calendar year
  • PUBLIC = insured in public health insurance = 1; otherwise = 0
  • ADDON = insured by add-on insurance = 1; otherwise = 0
  • HHNINC = household nominal monthly net income in German marks / 10000 (4 observations with income=0 were dropped)
  • HHKIDS = children under age 16 in the household = 1; otherwise = 0
  • EDUC = years of schooling (education)
  • AGE = age in years
  • FEMALE = 1 for female headed household, 0 for male

Application: Binary Choice Data (Greene)

DCM: Setup – RUM

  1. Random utility maximization (RUM)
  • Assumption: Revealed preference. The decision maker selects the alternative that provides the highest utility. That is, decision maker n selects choice i if U_ni > U_nj ∀ j ≠ i
  • Decomposition of utility: A deterministic (observed) part, V_nj, and a random (unobserved) part, ε_nj: U_nj = V_nj + ε_nj
  • The deterministic part, V_nj, is a function of some observed variables, x_nj (age, income, sex, price, etc.): V_nj = α + β_1 Age_n + β_2 Income_nj + β_3 Sex_n + β_4 Price_nj
  • The random part, ε_nj, follows a distribution. For example, a normal.

DCM: Setup – RUM

  1. Random utility maximization (RUM)
  • We think of an individual’s utility as an unobservable variable, with an observable component, V, and an unobservable (tastes?) random component, ε.
  • The deterministic part is usually intrinsically linear in the parameters: V_nj = α + β_1 Age_n + β_2 Income_nj + β_3 Sex_n + β_4 Price_nj
  • In this formulation, the parameters, β, are the same for all individuals. There is no heterogeneity. This is a useful assumption for estimation. It can be relaxed.
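  1. Random utility maximization (continuation) Probability Model: Since both U’s are random, the choice is random. Then, n selects i over every other alternative j with probability:

$$
\begin{aligned}
P_{ni} &= \text{Prob}(U_{ni} > U_{nj} \;\; \forall j \neq i) \\
       &= \text{Prob}(V_{ni} + \varepsilon_{ni} > V_{nj} + \varepsilon_{nj} \;\; \forall j \neq i) \\
       &= \text{Prob}(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj} \;\; \forall j \neq i) \\
       &= \int I(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj} \;\; \forall j \neq i)\, f(\varepsilon_n)\, d\varepsilon_n
\end{aligned}
$$

=> P_ni = F(X,β) is a CDF.

  • V_ni - V_nj = h(X, β). h(.) is usually referred to as the index function.
  • To evaluate the CDF, F(X,β), f(ε_n) needs to be specified.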
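The integral defining P_ni can be approximated by simulation: draw the errors, form the utilities, and count how often alternative i wins. The sketch below assumes two alternatives with hypothetical utilities V = (0.5, 0.0) and independent standard normal errors. In that special case ε_nj − ε_ni is N(0, 2), so the exact answer is Φ((V_ni − V_nj)/√2), which serves as a check.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def simulate_choice_prob(v, i, draws=100_000, seed=0):
    """Monte Carlo estimate of Prob(U_i > U_j for all j != i), U_j = V_j + eps_j."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        u = [vj + rng.gauss(0.0, 1.0) for vj in v]  # assumed iid N(0, 1) errors
        if max(range(len(v)), key=u.__getitem__) == i:
            wins += 1
    return wins / draws

v = [0.5, 0.0]  # hypothetical deterministic utilities
p_sim = simulate_choice_prob(v, 0)
p_exact = normal_cdf((v[0] - v[1]) / math.sqrt(2.0))
print(f"simulated P = {p_sim:.4f}, exact P = {p_exact:.4f}")
```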

DCM: Setup - RUM

$$
F[h(x_i, \beta)] = \Pr[y_i = 1]
$$

where h(X, β) is the index function.

  • Note: Probit? Logit?

A one standard deviation change in the argument of a standard Normal distribution function is usually called a “Probability Unit” or Probit for short. “Probit” graph paper has a normal probability scale on one axis. The Normal qualitative choice model became known as the Probit model. The “it” was transmitted to the Logistic Model (Logit) and the Gompertz Model (Gompit).

DCM: Setup - Distributions

  • Many candidates for the CDF, i.e., Pn(x’nβ) = F(Zn):
    • Normal (Probit Model) = Φ(Zn)
    • Logistic (Logit Model) = 1/[1+exp(-Zn)]
    • Gompertz (Gompit Model) = 1 – exp[-exp(Zn)]
  • Suppose we have binary (0,1) data. Assume β > 0.
  • Probit Model: Prob(yn=1) approaches 1 very rapidly as X and therefore Z increase. It approaches 0 very rapidly as X and Z decrease.
  • Logit Model: It approaches the limits 0 and 1 more slowly than does the Probit.
  • Gompit Model: Its distribution is strongly negatively skewed, approaching 0 very slowly for small values of Z, and 1 even more rapidly than the Probit for large values of Z.
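The three candidate CDFs listed above can be tabulated side by side to see the tail behavior described in the bullets. This is a minimal sketch: the grid of Z values is arbitrary, and the functions are compared at the same raw Z, without rescaling for the different variances of the underlying distributions.

```python
import math

def probit_cdf(z):
    """Normal (Probit): Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Logistic (Logit): 1 / [1 + exp(-z)]."""
    return 1.0 / (1.0 + math.exp(-z))

def gompit_cdf(z):
    """Gompertz (Gompit): 1 - exp[-exp(z)]."""
    return 1.0 - math.exp(-math.exp(z))

print(f"{'Z':>4} {'Probit':>10} {'Logit':>10} {'Gompit':>10}")
for z in (-3, -2, -1, 0, 1, 2, 3):
    print(f"{z:>4} {probit_cdf(z):>10.5f} {logit_cdf(z):>10.5f} {gompit_cdf(z):>10.5f}")
```

At Z = -3 the Logit and Gompit values are still well above the Probit value, and at Z = 3 the Gompit is closer to 1 than the Probit, in line with the description above.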

DCM: Setup - Distributions

  • Comparisons: Probit vs Logit

Note: Not all the parameters may be identified.

  • Suppose we are interested in whether an agent chooses to visit a doctor or not, i.e., (0,1) data. If U_visit > 0, an agent visits a doctor. U_visit > 0 ⇔ α + β_1 Age + β_2 Income + β_3 Sex + ε > 0 => ε > -(α + β_1 Age + β_2 Income + β_3 Sex). Let Y = 1 if U_visit > 0, with Var[ε] = σ^2.
  • Now, divide everything by σ. U_visit > 0 ⇔ ε/σ > -[α/σ + (β_1/σ) Age + (β_2/σ) Income + (β_3/σ) Sex], or w > -[α’ + β_1’ Age + β_2’ Income + β_3’ Sex], where w = ε/σ has unit variance. Only the ratios α/σ and β_k/σ are identified.
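A quick numerical check of the normalization argument, assuming a normal error (so the choice probability is Φ(index/σ)). All parameter and covariate values below are hypothetical. Multiplying α, every β, and σ by the same constant leaves the probability of a visit unchanged, which is why σ is usually fixed at 1.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_visit(alpha, betas, x, sigma):
    """Prob(alpha + beta'x + eps > 0) when eps ~ N(0, sigma^2)."""
    index = alpha + sum(b * xk for b, xk in zip(betas, x))
    return normal_cdf(index / sigma)

x = (45.0, 3.2, 1.0)          # hypothetical Age, Income, Sex
alpha = -1.0                  # hypothetical intercept
betas = (0.02, 0.10, 0.30)    # hypothetical slopes

c = 2.5                       # arbitrary rescaling factor
p_original = prob_visit(alpha, betas, x, sigma=1.0)
p_rescaled = prob_visit(c * alpha, tuple(c * b for b in betas), x, sigma=c)

print(f"original: {p_original:.6f}, rescaled: {p_rescaled:.6f}")
```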

DCM: Setup - Normalization

Example (from Train (2002)): Suppose there are two types of individuals, a and b, equally represented in the population, with V_a = β′x_a and V_b = β′x_b. Then

$$
P_a = \Pr[y_i = 1 \mid x_a] = F[\beta' x_a], \qquad P_b = \Pr[y_i = 1 \mid x_b] = F[\beta' x_b]
$$

but

$$
\bar{P} = \tfrac{1}{2}(P_a + P_b) \neq P(\bar{x}) = F[\beta' \bar{x}]
$$

DCM: Setup – Aggregation

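Graph: Average probability (2.1) vs. Probability of the average (2.2). The plot shows the CDF F(·) against V, marking P_a at V_a, P_b at V_b, and the average probability P̄ next to P(V̄), the probability evaluated at the average utility V̄.

In general, P(V̄) will tend to underestimate (overestimate) P̄ when probabilities are low (high), because an S-shaped F(·) such as the Probit or Logit CDF is convex at low probabilities and concave at high ones.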
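The aggregation point can be checked numerically. The sketch below assumes a logistic F(·) and two hypothetical pairs of utilities, one in the low-probability region and one in the high-probability region. It compares the average probability with the probability evaluated at the average utility.

```python
import math

def logit_cdf(v):
    return 1.0 / (1.0 + math.exp(-v))

def average_probability(v_a, v_b):
    """P-bar = (P_a + P_b) / 2 for two equally represented types."""
    return 0.5 * (logit_cdf(v_a) + logit_cdf(v_b))

def probability_of_average(v_a, v_b):
    """P(V-bar) = F((V_a + V_b) / 2)."""
    return logit_cdf(0.5 * (v_a + v_b))

low = (-4.0, -1.0)   # hypothetical utilities, low choice probabilities
high = (1.0, 4.0)    # hypothetical utilities, high choice probabilities

for label, (v_a, v_b) in (("low", low), ("high", high)):
    p_bar = average_probability(v_a, v_b)
    p_at_mean = probability_of_average(v_a, v_b)
    print(f"{label:>4}: average P = {p_bar:.4f}, P at average V = {p_at_mean:.4f}")
```

With these values, P(V̄) falls below P̄ in the low region and above it in the high region.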

DCM: Setup - Aggregation

  1. Identification problems
    a. Only differences in utility matter. Choice probabilities do not change when a constant is added to each alternative’s utility. Implication: Some parameters cannot be identified/estimated: alternative-specific constants (only their differences are identified) and coefficients of variables that change over decision makers but not over alternatives.
    b. Overall scale of utility is irrelevant. Choice probabilities do not change when the utilities of all alternatives are multiplied by the same positive factor. Implication: Coefficients of different models (data sets) are not directly comparable.

Normalization of parameters or of the variance of the error terms is used to deal with identification issues.
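Point (a) can be seen directly in a small example. The sketch assumes multinomial logit choice probabilities (an assumption made here for illustration) and hypothetical utilities. Adding the same term to every alternative, for instance a coefficient times a decision maker's income, leaves all probabilities unchanged, so that coefficient cannot be estimated.

```python
import math

def logit_probs(v):
    """Multinomial logit probabilities exp(V_j) / sum_k exp(V_k), computed stably."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

v = [1.0, 0.2, -0.5]            # hypothetical representative utilities
gamma_income = 0.8 * 3.2        # hypothetical gamma * Income_n, same for all alternatives
v_shifted = [x + gamma_income for x in v]

p = logit_probs(v)
p_shifted = logit_probs(v_shifted)

for j, (a, b) in enumerate(zip(p, p_shifted), start=1):
    print(f"alternative {j}: P = {a:.6f}, P after shift = {b:.6f}")
```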

DCM: Setup - Identification

  • Example: Logit Model. Suppose we have binary (0,1) data. The logit model follows from: Pn[yn=1|x] = exp(x’nβ)/[1+exp(x’nβ)] = F(x’nβ) and Pn[yn=0|x] = 1/[1+exp(x’nβ)] = 1 - F(x’nβ)
  • Likelihood function: L(β) = Πn (1 - P[yn=1|x,β])^(1-yn) P[yn=1|x,β]^yn
  • Log likelihood: Log L(β) = Σy=0 log[1 - F(x’nβ)] + Σy=1 log[F(x’nβ)]
  • Numerical optimization to get β.
  • The usual problems with numerical optimization apply. The computation of the Hessian, H, may cause problems.
  • ML estimators are consistent, asymptotically normal and efficient.
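The logit log likelihood above is easy to code directly. The following sketch simulates binary data from an assumed logit model with one regressor plus a constant (the true β, the sample size, and the seed are all hypothetical) and evaluates Log L(β) at two parameter vectors.

```python
import math
import random

def logistic(z):
    """F(z) = exp(z) / [1 + exp(z)], written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(beta, X, y):
    """Log L = sum over y=0 of log[1 - F(x'b)] + sum over y=1 of log[F(x'b)]."""
    ll = 0.0
    for xn, yn in zip(X, y):
        p = logistic(sum(b * xk for b, xk in zip(beta, xn)))
        ll += math.log(p) if yn == 1 else math.log(1.0 - p)
    return ll

def simulate(beta, n, seed=1):
    """Draw (x_n, y_n) pairs from a logit model with a constant and one regressor."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xn = (1.0, rng.gauss(0.0, 1.0))
        p = logistic(sum(b * xk for b, xk in zip(beta, xn)))
        X.append(xn)
        y.append(1 if rng.random() < p else 0)
    return X, y

true_beta = (-0.5, 1.0)  # hypothetical data-generating parameters
X, y = simulate(true_beta, 2000)
ll_true = log_likelihood(true_beta, X, y)
ll_zero = log_likelihood((0.0, 0.0), X, y)
print(f"Log L at true beta: {ll_true:.2f}")
print(f"Log L at beta = 0 : {ll_zero:.2f}")
```

At β = 0 every probability is 0.5, so Log L equals N·log(0.5). The data-generating β gives a clearly higher value, and the ML estimate would give a value at least as high.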

DCM: ML Estimation


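  • How can we estimate the covariance matrix of the ML estimator, Σ? Using the usual conditions, we can use the information matrix, with L denoting the log likelihood and all derivatives evaluated at the estimate β̂:

$$
\begin{aligned}
\text{In general:} \quad & \Sigma = \left[ -E\!\left( \frac{\partial^2 L}{\partial \beta\, \partial \beta'} \right) \right]^{-1}_{\beta = \hat{\beta}} = I(\hat{\beta})^{-1} \\
\text{Newton-Raphson:} \quad & \Sigma = \left[ -\frac{\partial^2 L}{\partial \beta\, \partial \beta'} \right]^{-1}_{\beta = \hat{\beta}} \\
\text{BHHH:} \quad & \Sigma = \left[ \sum_{i=1}^{T} \frac{\partial L_i}{\partial \beta} \frac{\partial L_i}{\partial \beta'} \right]^{-1}_{\beta = \hat{\beta}}
\end{aligned}
$$

  • The NR and BHHH are asymptotically equivalent but in small samples they often provide different covariance estimates for the same model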
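To see how the Newton-Raphson (Hessian) and BHHH (outer product of gradients) covariance estimates compare in practice, here is a sketch for a binary logit with a constant and one regressor. The data are simulated with hypothetical parameters, and the 2x2 matrix algebra is written out by hand to keep the example self-contained.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def prob(beta, x):
    return logistic(beta[0] * x[0] + beta[1] * x[1])

def simulate(n, beta, seed=0):
    rng = random.Random(seed)
    X = [(1.0, rng.gauss(0.0, 1.0)) for _ in range(n)]
    y = [1 if rng.random() < prob(beta, x) else 0 for x in X]
    return X, y

def scores(beta, X, y):
    """Per-observation gradients of the log likelihood: (y_n - F_n) x_n."""
    return [[(yn - prob(beta, xn)) * xk for xk in xn] for xn, yn in zip(X, y)]

def neg_hessian(beta, X):
    """Minus the Hessian of the log likelihood: sum of F_n (1 - F_n) x_n x_n'."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    for xn in X:
        w = prob(beta, xn) * (1.0 - prob(beta, xn))
        for a in range(2):
            for b in range(2):
                A[a][b] += w * xn[a] * xn[b]
    return A

def outer_product(g_list):
    """Sum of g_n g_n' over observations (the BHHH matrix)."""
    B = [[0.0, 0.0], [0.0, 0.0]]
    for g in g_list:
        for a in range(2):
            for b in range(2):
                B[a][b] += g[a] * g[b]
    return B

def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]

def fit_newton(X, y, iters=25):
    """Plain Newton-Raphson: beta <- beta + (-H)^(-1) g, started at zero."""
    beta = [0.0, 0.0]
    for _ in range(iters):
        g = [sum(col) for col in zip(*scores(beta, X, y))]
        step = inv2(neg_hessian(beta, X))
        beta = [beta[a] + sum(step[a][b] * g[b] for b in range(2)) for a in range(2)]
    return beta

X, y = simulate(3000, (-0.5, 1.0), seed=7)  # hypothetical true parameters
beta_hat = fit_newton(X, y)
cov_nr = inv2(neg_hessian(beta_hat, X))
cov_bhhh = inv2(outer_product(scores(beta_hat, X, y)))
se_nr = [math.sqrt(cov_nr[k][k]) for k in range(2)]
se_bhhh = [math.sqrt(cov_bhhh[k][k]) for k in range(2)]
print("beta_hat:", [round(b, 4) for b in beta_hat])
print("s.e. (NR):  ", [round(s, 4) for s in se_nr])
print("s.e. (BHHH):", [round(s, 4) for s in se_bhhh])
```

With a few thousand observations the two sets of standard errors should be close but not identical, which is the small-sample difference mentioned above.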

DCM: ML Estimation – Covariance Matrix

DCM: ML Estimation

  • Numerical optimization - Steps:
    (1) Start by specifying the likelihood for one observation: Fn(X,β)
    (2) Get the joint likelihood function: L(β) = Πn Fn(X,β)
    (3) It is easier to work with the log likelihood function: Log L(β) = Σn ln(Fn(X,β))
    (4) Maximize Log L(β) with respect to β
  • Set the score equal to 0 ⇒ no closed-form solution
  • Numerical optimization, as usual:
    (i) Starting values β0
    (ii) Determine new value βt+1 = βt + update, such that Log L(βt+1) > Log L(βt). Say, N-R’s updating step: βt+1 = βt - λt H^(-1) ∇f(βt)
    (iii) Repeat step (ii) until convergence.
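Steps (i) to (iii) can be sketched for the simplest possible case: a logit with a single regressor and no constant, so that β, the score and the Hessian are all scalars. The step size λt is chosen by halving until the log likelihood increases. The simulated data and the true β = 1 are assumptions for illustration only.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_lik(beta, x, y):
    return sum(math.log(logistic(beta * xn)) if yn == 1
               else math.log(1.0 - logistic(beta * xn))
               for xn, yn in zip(x, y))

def score_and_hessian(beta, x, y):
    """Gradient sum (y - F) x and Hessian -sum F (1 - F) x^2 of the log likelihood."""
    g = h = 0.0
    for xn, yn in zip(x, y):
        p = logistic(beta * xn)
        g += (yn - p) * xn
        h -= p * (1.0 - p) * xn * xn
    return g, h

def newton_raphson(x, y, beta0=0.0, tol=1e-8, max_iter=50):
    beta = beta0                                   # (i) starting value
    for it in range(max_iter):
        g, h = score_and_hessian(beta, x, y)
        if abs(g) < tol:                           # score is (numerically) zero
            return beta, it
        lam = 1.0
        while lam > 1e-10:                         # (ii) halve lambda until Log L rises
            candidate = beta - lam * g / h
            if log_lik(candidate, x, y) > log_lik(beta, x, y):
                break
            lam /= 2.0
        else:
            return beta, it                        # no improving step: treat as converged
        beta = candidate                           # (iii) repeat
    return beta, max_iter

rng = random.Random(3)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [1 if rng.random() < logistic(1.0 * xn) else 0 for xn in x]  # assumed true beta = 1

beta_hat, n_iter = newton_raphson(x, y)
print(f"beta_hat = {beta_hat:.4f} after {n_iter} iterations")
```

Because the logit log likelihood is globally concave, the full step λt = 1 is almost always accepted here. The halving loop matters more for less well-behaved likelihoods.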

DCM: Bayesian Estimation

  • The Bayesian estimator will be the mean of the posterior density:

$$
f(\beta, \gamma \mid y, X) = \frac{f(y \mid X, \beta, \gamma)\, f(\beta, \gamma)}{f(y \mid X)} = \frac{f(y \mid X, \beta, \gamma)\, f(\beta, \gamma)}{\int f(y \mid X, \beta, \gamma)\, f(\beta, \gamma)\, d\beta\, d\gamma}
$$

  • f(β,γ) is the prior density for the model parameters
  • f(y|X,β,γ) is the likelihood.
  • As usual we need to specify the prior and the likelihood:
    • The priors are usually non-informative (flat), say f(β,γ) ∝ 1.
  • The likelihood depends on the model in mind. For a Probit Model, we will use a normal distribution. If we have binary data, then,

$$
f(y \mid X, \beta, \gamma) = \prod_n \left(1 - \Phi_n\right)^{1 - y_n} \Phi_n^{\,y_n}, \qquad \Phi_n = \Phi[y_n = 1 \mid x_n, \beta, \gamma]
$$
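For a one-parameter probit the posterior mean can be computed by brute force on a grid, which makes the formula above concrete. This is only a sketch: the model has a single regressor and no γ, the prior is flat on the assumed interval [-3, 3], and the data are simulated with a hypothetical true β = 0.8. Practical applications typically rely on simulation methods rather than a grid.

```python
import math
import random

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_lik(beta, x, y):
    """Probit log likelihood: sum of y log(Phi) + (1 - y) log(1 - Phi)."""
    ll = 0.0
    for xn, yn in zip(x, y):
        p = min(max(normal_cdf(beta * xn), 1e-12), 1.0 - 1e-12)  # guard log(0)
        ll += math.log(p) if yn == 1 else math.log(1.0 - p)
    return ll

def posterior_mean(x, y, grid):
    """Posterior mean under a flat prior: likelihood-weighted average over the grid."""
    logs = [log_lik(b, x, y) for b in grid]
    m = max(logs)
    weights = [math.exp(l - m) for l in logs]  # flat prior: posterior ∝ likelihood
    return sum(b * w for b, w in zip(grid, weights)) / sum(weights)

rng = random.Random(11)
true_beta = 0.8  # hypothetical data-generating value
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [1 if true_beta * xn + rng.gauss(0.0, 1.0) > 0 else 0 for xn in x]

grid = [i / 100.0 for i in range(-300, 301)]
beta_bayes = posterior_mean(x, y, grid)
beta_mode = max(grid, key=lambda b: log_lik(b, x, y))  # grid approximation to the MLE
print(f"posterior mean = {beta_bayes:.3f}, posterior mode (grid MLE) = {beta_mode:.3f}")
```

With a flat prior and a moderately large sample, the posterior mean and the ML estimate should be close to each other.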