Lecture 3 Discrete Choice Models, Slides of Microeconomics

College of Technology at Riyadh (CTR)Microeconomics

Truncated variables: We only sample from (observe/use) a subset of the population. The variable is observed only beyond a certain threshold level.

Typology: Slides

2021/2022

Uploaded on 09/07/2022

nabeel_kk 🇸🇦


RS – Lecture 17


Lecture 3

Discrete Choice Models

Limited Dependent Variables

[Diagram: taxonomy of limited dependent variables. Branches: discrete dependent variable; continuous dependent variable. Model families shown: truncated/censored regression models; discrete choice models (DCM); duration (hazard) models (truncated, censored).]

To date we have implicitly assumed that the variable yi is a continuous random variable.
But, the CLM does not require this assumption! The dependent variable can have discontinuities. For example, it can be discrete or follow counts. In these cases, linearity of the conditional expectations is unusual.
Different types of discontinuities generate different models:

From Franses and Paap (2001)

With limited dependent variables, the conditional mean is rarely linear. We need to use adjusted models.

Limited Dependent Variables

Limdep: Discrete Choice Models (DCM)

We usually study discrete data that represent a decision, a choice.
Sometimes, there is a single choice. Then, the data come in binary form, with a "1" representing a decision to do something and a "0" a decision not to do it.
=> Single Choice (binary choice models): Binary Data
Data: yi = 1 (yes/accept) or 0 (no/reject)
Examples: Trade a stock or not, do an MBA or not, etc.
Or we can have several choices. Then, the data may come as 1, 2, ..., J, where J represents the number of choices.
=> Multiple Choice (multinomial choice models)
Data: yi = 1 (opt. 1), 2 (opt. 2), ..., J (opt. J)
Examples: CEO candidates, transportation modes, etc.

Limdep: Truncated/Censored Models

Truncated variables: We only sample from (observe/use) a subset of the population. The variable is observed only beyond a certain threshold level (the "truncation point").
Examples: store expenditures, labor force participation, income below the poverty line.
Censored variables: Values in a certain range are all transformed to, grouped into, or reported as a single value.
Examples: hours worked, exchange rates under Central Bank intervention.
Note: Censoring is essentially a defect in the sample data. Presumably, if the data were not censored, they would be a representative sample from the population of interest.
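The difference between the two sampling schemes can be sketched with simulated data. This is a minimal illustration, not part of the original slides: the latent variable, the sample size, and the threshold of 0 are all assumptions chosen for the example.

```python
import random

random.seed(0)

# Hypothetical latent variable: 1,000 draws from N(0, 1).
latent = [random.gauss(0.0, 1.0) for _ in range(1000)]
threshold = 0.0  # assumed truncation/censoring point

# Truncation: observations at or below the threshold never enter the sample.
truncated = [y for y in latent if y > threshold]

# Censoring: every observation stays in the sample, but all values at or
# below the threshold are reported as the threshold itself.
censored = [max(y, threshold) for y in latent]


def mean(values):
    return sum(values) / len(values)


print("sample sizes:", len(latent), len(truncated), len(censored))
print("means:", mean(latent), mean(truncated), mean(censored))
```

The truncated sample is smaller than the population sample, while the censored sample keeps every observation but piles mass at the threshold. Both sample means differ from the latent mean, which is one way to see why an unadjusted linear regression is not appropriate here.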

[Figure: health satisfaction data; legend: 0 = Not Healthy, 1 = Healthy]

Limdep: Censored Health Satisfaction Data

Limdep: Duration/Hazard Models

We model the time between two events. Examples:
- Time between two trades
- Time between cash flow withdrawals from a fund
- Time until a consumer becomes inactive/cancels a subscription
- Time until a consumer responds to direct mail/ a questionnaire
Consumers Maximize Utility.
Fundamental Choice Problem: Max $U(x_1, x_2, \ldots)$ subject to prices and budget constraints
A Crucial Result for the Classical Problem:
- Indirect Utility Function: V = V(p,I)
- Demand System of Continuous Choices
The Integrability Problem: Utility is not revealed by demands

Microeconomics behind Discrete Choice

$$x_j^* = -\frac{\partial V(p, I)/\partial p_j}{\partial V(p, I)/\partial I}$$
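A numerical check of how a continuous demand is recovered from an indirect utility function V(p, I), as the ratio of its price derivative to its income derivative with a minus sign (Roy's identity). The Cobb-Douglas form, the share parameter a = 0.3, and the prices and income below are assumptions made only for this sketch; they do not come from the slides.

```python
def indirect_utility(p1, p2, income, a=0.3):
    """Indirect utility for the assumed Cobb-Douglas case U = x1**a * x2**(1 - a)."""
    return income * (a / p1) ** a * ((1 - a) / p2) ** (1 - a)


def roy_demand_good1(p1, p2, income, a=0.3, h=1e-6):
    """x1* = -(dV/dp1) / (dV/dI), using central finite differences."""
    dV_dp1 = (indirect_utility(p1 + h, p2, income, a)
              - indirect_utility(p1 - h, p2, income, a)) / (2 * h)
    dV_dI = (indirect_utility(p1, p2, income + h, a)
             - indirect_utility(p1, p2, income - h, a)) / (2 * h)
    return -dV_dp1 / dV_dI


p1, p2, income, a = 2.0, 5.0, 100.0, 0.3  # hypothetical prices and income
x1_roy = roy_demand_good1(p1, p2, income, a)
x1_closed_form = a * income / p1  # textbook Cobb-Douglas demand for good 1
print(x1_roy, x1_closed_form)
```

The two numbers agree up to finite-difference error. In a discrete choice setting this derivative-based recovery of demand is not available, which motivates working with choice probabilities instead.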

The modern literature goes back to the work by Daniel McFadden in the seventies and eighties (McFadden 1973, 1981, 1982, 1984).
Usual Notation:
n = decision maker
i, j = choice options
y = decision outcome
x = explanatory variables/covariates
β = parameters
ε = error term
I[.] = indicator function: equal to 1 if the expression within brackets is true, 0 otherwise.
Example: I[y=j|x] = 1 if j was selected (given x), = 0 otherwise

Discrete Choice Models (DCM)

Q: Are the characteristics of the consumers relevant?
Predicting behavior
- Individual – for example, will a person buy the add-on insurance?
- Aggregate – for example, what proportion of the population will buy the add-on insurance?
Analyze changes in behavior when attributes change. For example, how will changes in education change the proportion of people who buy the insurance?

DCM – What Can we Learn from the Data?

Application: Health Care Usage (Greene)

German Health Care Usage Data, N = 7,293, Varying Numbers of Periods
Data downloaded from the Journal of Applied Econometrics (JAE) Archive. This is an unbalanced panel with 7,293 individuals. This is a large data set. There are altogether 27,326 observations. The number of observations per individual ranges from 1 to 7. (Frequencies are: 1=1525, 2=2158, 3=825, 4=926, 5=1051, 6=1000, 7=987.)
Variables in the file are:
DOCTOR = 1(Number of doctor visits > 0)
HOSPITAL = 1(Number of hospital visits > 0)
HSAT = health satisfaction, coded 0 (low) - 10 (high)
DOCVIS = number of doctor visits in last three months
HOSPVIS = number of hospital visits in last calendar year
PUBLIC = insured in public health insurance = 1; otherwise = 0
ADDON = insured by add-on insurance = 1; otherwise = 0
HHNINC = household nominal monthly net income in German marks / 10000 (4 observations with income=0 were dropped)
HHKIDS = children under age 16 in the household = 1; otherwise = 0
EDUC = years of schooling
AGE = age in years
FEMALE = 1 for female headed household, 0 for male

Application: Binary Choice Data (Greene)

DCM: Setup – RUM

Random utility maximization (RUM)
Assumption: Revealed preference. The decision maker selects the alternative that provides the highest utility. That is, decision maker n selects choice i if $U_{ni} > U_{nj} \;\; \forall j \neq i$

Decomposition of utility: a deterministic (observed) part, $V_{nj}$, and a random (unobserved) part, $\varepsilon_{nj}$: $U_{nj} = V_{nj} + \varepsilon_{nj}$

The deterministic part, $V_{nj}$, is a function of some observed variables, $x_{nj}$ (age, income, sex, price, etc.): $V_{nj} = \alpha + \beta_1 Age_n + \beta_2 Income_{nj} + \beta_3 Sex_n + \beta_4 Price_{nj}$
The random part, $\varepsilon_{nj}$, follows a distribution. For example, a normal.

DCM: Setup – RUM

Random utility maximization (RUM)

We think of an individual’s utility as an unobservable variable, with an observable component, V, and an unobservable (tastes?) random component, ε.
The deterministic part is usually intrinsically linear in the parameters: $V_{nj} = \alpha + \beta_1 Age_n + \beta_2 Income_{nj} + \beta_3 Sex_n + \beta_4 Price_{nj}$
In this formulation, the parameters, β, are the same for all individuals. There is no heterogeneity. This is a useful assumption for estimation. It can be relaxed.

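Random utility maximization (continuation)
Probability Model: Since both U's are random, the choice is random. Then, n selects i over j if:

$$
\begin{aligned}
P_{ni} &= \text{Prob}(U_{ni} > U_{nj} \;\; \forall j \neq i) \\
       &= \text{Prob}(V_{ni} + \varepsilon_{ni} > V_{nj} + \varepsilon_{nj} \;\; \forall j \neq i) \\
       &= \text{Prob}(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj} \;\; \forall j \neq i)
\end{aligned}
$$

=> $P_{ni} = F(X, \beta)$ is a CDF.

$V_{ni} - V_{nj} = h(X, \beta)$. h(.) is usually referred to as the index function.
To evaluate the CDF, $F(X, \beta)$, $f(\varepsilon_n)$ needs to be specified:

$$P_{ni} = \int I(\varepsilon_{nj} - \varepsilon_{ni} < V_{ni} - V_{nj} \;\; \forall j \neq i)\, f(\varepsilon_n)\, d\varepsilon_n$$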

DCM: Setup - RUM
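The RUM choice probability can be checked by simulation. The sketch below assumes two alternatives with hypothetical deterministic utilities V_i = 0.5 and V_j = 0, and independent standard normal errors; none of these values come from the slides. Under these assumptions ε_j − ε_i is N(0, 2), so the probability of choosing i is Φ((V_i − V_j)/√2).

```python
import math
import random

random.seed(1)


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


V_i, V_j = 0.5, 0.0   # hypothetical deterministic utilities
n_draws = 100_000

chosen_i = 0
for _ in range(n_draws):
    U_i = V_i + random.gauss(0.0, 1.0)  # U = V + epsilon
    U_j = V_j + random.gauss(0.0, 1.0)
    if U_i > U_j:                       # decision maker picks the highest utility
        chosen_i += 1

simulated = chosen_i / n_draws
# eps_j - eps_i ~ N(0, 2), so P(eps_j - eps_i < V_i - V_j) = Phi((V_i - V_j) / sqrt(2))
analytic = normal_cdf((V_i - V_j) / math.sqrt(2.0))
print(simulated, analytic)
```

The simulated choice frequency approximates the integral of the indicator function over the error density; a different assumed error distribution would change F but not the logic.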

$F[h(x_i, \beta)] = \Pr[y_i = 1]$

[Figure: axis label h(X, β)]

Note: Probit? Logit?

A one standard deviation change in the argument of a standard Normal distribution function is usually called a "Probability Unit", or Probit for short. "Probit" graph paper has a normal probability scale on one axis. The Normal qualitative choice model became known as the Probit model. The "it" was transmitted to the Logistic Model (Logit) and the Gompertz Model (Gompit).

DCM: Setup - Distributions

Many candidates for the CDF, i.e., $P_n(x_n'\beta) = F(Z_n)$:
- Normal (Probit Model): $F(Z_n) = \Phi(Z_n)$
- Logistic (Logit Model): $F(Z_n) = 1/[1 + \exp(-Z_n)]$
- Gompertz (Gompit Model): $F(Z_n) = 1 - \exp[-\exp(Z_n)]$
Suppose we have binary (0,1) data. Assume β > 0.
Probit Model: Prob(yn=1) approaches 1 very rapidly as X, and therefore Z, increases. It approaches 0 very rapidly as X and Z decrease.
Logit Model: It approaches the limits 0 and 1 more slowly than does the Probit.
Gompit Model: Its distribution is strongly negatively skewed, approaching 0 very slowly for small values of Z, and 1 even more rapidly than the Probit for large values of Z.
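The tail behavior described above can be tabulated directly from the three CDFs. This is a small sketch with function names chosen for the example; the evaluation points are arbitrary, and the comparison is on the raw index Z without rescaling the distributions to a common variance.

```python
import math


def probit_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logit_cdf(z):
    """Logistic CDF, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def gompit_cdf(z):
    """Gompertz CDF as written in the slides, 1 - exp(-exp(z))."""
    return 1.0 - math.exp(-math.exp(z))


print("    z   probit    logit   gompit")
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(f"{z:5.1f}  {probit_cdf(z):.5f}  {logit_cdf(z):.5f}  {gompit_cdf(z):.5f}")
```

At Z = 0 the probit and logit both give 0.5, while the gompit gives 1 − exp(−1), which reflects its asymmetry.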

DCM: Setup - Distributions

Comparisons: Probit vs Logit

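Note: Not all the parameters may be identified.

Suppose we are interested in whether an agent chooses to visit a doctor or not, i.e., (0,1) data. If $U_{visit} > 0$, an agent visits a doctor.
$U_{visit} > 0 \Leftrightarrow \alpha + \beta_1 Age + \beta_2 Income + \beta_3 Sex + \varepsilon > 0$
=> $\varepsilon > -(\alpha + \beta_1 Age + \beta_2 Income + \beta_3 Sex)$
Let Y = 1 if $U_{visit} > 0$, with $Var[\varepsilon] = \sigma^2$.
Now, divide everything by σ.
$U_{visit} > 0 \Leftrightarrow \varepsilon/\sigma > -[\alpha/\sigma + (\beta_1/\sigma) Age + (\beta_2/\sigma) Income + (\beta_3/\sigma) Sex]$
or $w > -[\alpha' + \beta_1' Age + \beta_2' Income + \beta_3' Sex]$, where $w = \varepsilon/\sigma$ has unit variance, $\alpha' = \alpha/\sigma$ and $\beta_k' = \beta_k/\sigma$. Only the ratios α/σ and β_k/σ can be estimated.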
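A quick numerical check of the normalization argument, assuming a normal error and made-up values for the coefficients, the covariates, and σ (all hypothetical). Multiplying every coefficient and σ by the same constant leaves the probability of a visit unchanged, and so does dividing everything by σ.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_visit(alpha, betas, x, sigma):
    """P(U > 0) for U = alpha + beta'x + eps, with eps ~ N(0, sigma^2)."""
    index = alpha + sum(b * xi for b, xi in zip(betas, x))
    # P(eps > -index) = P(eps / sigma > -index / sigma) = Phi(index / sigma)
    return normal_cdf(index / sigma)


x = (45.0, 3.2, 1.0)                          # hypothetical Age, Income, Sex
alpha, betas, sigma = -1.0, (0.02, 0.1, 0.3), 2.0

p_original = prob_visit(alpha, betas, x, sigma)

# Scale all coefficients and sigma by an arbitrary constant c.
c = 5.0
p_rescaled = prob_visit(c * alpha, tuple(c * b for b in betas), x, c * sigma)

# Normalized version: divide everything by sigma, so the error has unit variance.
p_normalized = prob_visit(alpha / sigma, tuple(b / sigma for b in betas), x, 1.0)

print(p_original, p_rescaled, p_normalized)
```

Since all three parameter sets produce the same probability, the data cannot distinguish between them, which is why σ is fixed (here at 1) before estimation.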

DCM: Setup - Normalization

Example (from Train (2002)): Suppose there are two types of individuals, a and b, equally represented in the population, with
$V_a = \beta' x_a$, $V_b = \beta' x_b$
then
$P_a = \Pr[y_i = 1 \mid x_a] = F[\beta' x_a]$
$P_b = \Pr[y_i = 1 \mid x_b] = F[\beta' x_b]$
but
$\bar{P} = \tfrac{1}{2}(P_a + P_b) \neq P(\bar{x}) = F[\beta' \bar{x}]$

DCM: Setup – Aggregation

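[Figure: the CDF F(·) plotted against V, marking $P_a$ at $V_a$, $P_b$ at $V_b$, the average probability $\bar{P}$, and $P(\bar{V})$ at the average utility $\bar{V}$]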

DCM: Setup – Aggregation

In general, $P(\bar{V})$ will tend to underestimate (overestimate) $\bar{P}$ when probabilities are low (high).

Graph: Average probability (2.1) vs. Probability of the average (2.2)
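The gap between the average probability and the probability of the average can be computed for a concrete case. The sketch assumes a logistic F and two hypothetical pairs of utilities, one in the low-probability region and one in the high-probability region; these values are illustrative only.

```python
import math


def logistic_cdf(v):
    return 1.0 / (1.0 + math.exp(-v))


def compare(V_a, V_b):
    """Return (average probability, probability at the average utility)."""
    average_probability = 0.5 * (logistic_cdf(V_a) + logistic_cdf(V_b))
    probability_at_average = logistic_cdf(0.5 * (V_a + V_b))
    return average_probability, probability_at_average


low = compare(-4.0, -1.0)   # both types have low choice probabilities
high = compare(1.0, 4.0)    # both types have high choice probabilities

print("low :", low)
print("high:", high)
```

In this sketch the probability at the average utility falls below the average probability for the low pair and above it for the high pair, because F is convex to the left of its inflection point and concave to the right.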

DCM: Setup - Aggregation

Identification problems
a. Only differences in utility matter.
Choice probabilities do not change when a constant is added to each alternative's utility.
Implication: Some parameters cannot be identified/estimated: alternative-specific constants, and coefficients of variables that change over decision makers but not over alternatives.
b. Overall scale of utility is irrelevant.
Choice probabilities do not change when the utilities of all alternatives are multiplied by the same positive factor.
Implication: Coefficients of different models (data sets) are not directly comparable.
Normalization of parameters or of the variance of the error terms is used to deal with identification issues.
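Both identification points can be seen in a simulation: the chosen alternative is the one with the highest utility, so adding a constant to every utility, or multiplying every utility by a positive factor, never changes the choice. The three deterministic utilities, the normal errors, and the constants 10 and 3 below are assumptions for illustration.

```python
import random

random.seed(2)


def chosen_alternative(utilities):
    """Index of the alternative with the highest utility."""
    return max(range(len(utilities)), key=lambda j: utilities[j])


V = [1.0, 0.2, -0.5]  # hypothetical deterministic utilities for 3 alternatives

same_after_shift = True
same_after_scale = True
for _ in range(10_000):
    U = [v + random.gauss(0.0, 1.0) for v in V]   # U = V + epsilon
    base_choice = chosen_alternative(U)
    # (a) add the same constant to every alternative's utility
    if chosen_alternative([u + 10.0 for u in U]) != base_choice:
        same_after_shift = False
    # (b) multiply every alternative's utility by the same positive factor
    if chosen_alternative([3.0 * u for u in U]) != base_choice:
        same_after_scale = False

print(same_after_shift, same_after_scale)
```

Because the observed choices are identical in all three cases, neither the level nor the scale of utility can be recovered from choice data alone.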

DCM: Setup - Identification

Example: Logit Model
Suppose we have binary (0,1) data. The logit model follows from:
$P_n[y_n = 1 \mid x] = \exp(x_n'\beta)/[1 + \exp(x_n'\beta)] = F(x_n'\beta)$
$P_n[y_n = 0 \mid x] = 1/[1 + \exp(x_n'\beta)] = 1 - F(x_n'\beta)$
Likelihood function: $L(\beta) = \prod_n (1 - P[y_n = 1 \mid x, \beta])^{1 - y_n}\, P[y_n = 1 \mid x, \beta]^{y_n}$
Log likelihood: $\log L(\beta) = \sum_{y=0} \log[1 - F(x_n'\beta)] + \sum_{y=1} \log[F(x_n'\beta)]$
Numerical optimization to get β.
The usual problems with numerical optimization apply. The computation of the Hessian, H, may cause problems.
ML estimators are consistent, asymptotically normal and efficient.
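A minimal sketch of the logit log likelihood, evaluated at two candidate parameter vectors. The six observations (a constant plus one regressor) are made up for the example, and the function names are not from the slides.

```python
import math


def logit_prob(x_n, beta):
    """P[y_n = 1 | x_n] = exp(x'b) / (1 + exp(x'b))."""
    z = sum(b * xi for b, xi in zip(beta, x_n))
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(beta, X, y):
    """Sum of log F(x'b) over y = 1 and log(1 - F(x'b)) over y = 0."""
    total = 0.0
    for x_n, y_n in zip(X, y):
        p = logit_prob(x_n, beta)
        total += math.log(p) if y_n == 1 else math.log(1.0 - p)
    return total


# Hypothetical data: first column is the constant, second is one regressor.
X = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [0, 0, 1, 0, 1, 1]

print(log_likelihood((0.0, 0.0), X, y))  # every probability equals 0.5
print(log_likelihood((0.0, 1.0), X, y))  # a positive slope fits these data better
```

A numerical optimizer searches over β for the value that makes this function as large as possible.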

DCM: ML Estimation

How can we estimate the covariance matrix, $\Sigma_{\hat\beta}$? Under the usual conditions, we can use the information matrix:

In general: $\Sigma_{\hat\beta} = \left[-E\left(\frac{\partial^2 \log L}{\partial\beta\,\partial\beta'}\right)\right]^{-1} = I(\beta)^{-1}$

Newton-Raphson: $\Sigma_{\hat\beta} = \left[-\frac{\partial^2 \log L}{\partial\beta\,\partial\beta'}\right]^{-1}_{\beta = \hat\beta}$

BHHH: $\Sigma_{\hat\beta} = \left[\sum_{i=1}^{T} \frac{\partial \log L_i}{\partial\beta}\, \frac{\partial \log L_i}{\partial\beta'}\right]^{-1}_{\beta = \hat\beta}$

The NR and BHHH estimators are asymptotically equivalent, but in small samples they often provide different covariance estimates for the same model.

DCM: ML Estimation – Covariance Matrix
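The two covariance estimators can be compared on a tiny logit example. This sketch uses six made-up observations and a hypothetical parameter value taken as given rather than estimated here; it computes the inverse of the negative Hessian (the Newton-Raphson form) and the inverse of the summed outer products of the individual scores (the BHHH form).

```python
import math


def inverse_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def logit_covariances(beta, X, y):
    """Return (inverse negative Hessian, inverse outer product of scores)."""
    neg_hessian = [[0.0, 0.0], [0.0, 0.0]]
    opg = [[0.0, 0.0], [0.0, 0.0]]
    for x_n, y_n in zip(X, y):
        z = beta[0] * x_n[0] + beta[1] * x_n[1]
        p = 1.0 / (1.0 + math.exp(-z))
        score_n = [(y_n - p) * x_n[0], (y_n - p) * x_n[1]]  # d log L_n / d beta
        for r in range(2):
            for c in range(2):
                neg_hessian[r][c] += p * (1.0 - p) * x_n[r] * x_n[c]
                opg[r][c] += score_n[r] * score_n[c]
    return inverse_2x2(neg_hessian), inverse_2x2(opg)


# Hypothetical data (constant, one regressor) and an assumed parameter value.
X = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [0, 0, 1, 0, 1, 1]
beta_assumed = (-0.6, 1.2)

cov_nr, cov_bhhh = logit_covariances(beta_assumed, X, y)
print("NR  :", cov_nr)
print("BHHH:", cov_bhhh)
```

With only six observations the two matrices differ noticeably, in line with the small-sample remark above.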

DCM: ML Estimation

Numerical optimization - Steps:
(1) Start by specifying the likelihood for one observation: $F_n(X, \beta)$
(2) Get the joint likelihood function: $L(\beta) = \prod_n F_n(X, \beta)$
(3) It is easier to work with the log likelihood function: $\log L(\beta) = \sum_n \ln(F_n(X, \beta))$
(4) Maximize $\log L(\beta)$ with respect to β
Set the score equal to 0 ⇒ no closed-form solution
Numerical optimization, as usual:
(i) Starting values $\beta_0$
(ii) Determine a new value $\beta_{t+1} = \beta_t + \text{update}$, such that $\log L(\beta_{t+1}) > \log L(\beta_t)$. Say, N-R's updating step: $\beta_{t+1} = \beta_t - \lambda_t H^{-1} \nabla f(\beta_t)$
(iii) Repeat step (ii) until convergence.
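The steps above can be sketched for a two-parameter logit with a hand-coded Newton-Raphson loop. The data are made up, the step length λ is fixed at 1, and the starting values are zero; these are assumptions for the example, and a production routine would add safeguards such as step-halving.

```python
import math


def score_and_hessian(beta, X, y):
    """Gradient and Hessian of the logit log likelihood at beta."""
    g = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for x_n, y_n in zip(X, y):
        p = 1.0 / (1.0 + math.exp(-(beta[0] * x_n[0] + beta[1] * x_n[1])))
        for r in range(2):
            g[r] += (y_n - p) * x_n[r]
            for c in range(2):
                H[r][c] -= p * (1.0 - p) * x_n[r] * x_n[c]
    return g, H


def newton_raphson_logit(X, y, beta0=(0.0, 0.0), step=1.0, tol=1e-10, max_iter=100):
    beta = list(beta0)                       # (i) starting values
    for _ in range(max_iter):
        g, H = score_and_hessian(beta, X, y)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        # Direction H^{-1} g for the 2x2 case
        d0 = (H[1][1] * g[0] - H[0][1] * g[1]) / det
        d1 = (H[0][0] * g[1] - H[1][0] * g[0]) / det
        # (ii) update: beta_{t+1} = beta_t - lambda * H^{-1} * gradient
        beta = [beta[0] - step * d0, beta[1] - step * d1]
        if abs(d0) + abs(d1) < tol:          # (iii) stop at convergence
            break
    return beta


# Hypothetical data: constant plus one regressor, not perfectly separable.
X = [(1.0, -2.0), (1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0), (1.0, 3.0)]
y = [0, 0, 1, 0, 1, 1]

beta_hat = newton_raphson_logit(X, y)
final_score, _ = score_and_hessian(beta_hat, X, y)
print(beta_hat, final_score)
```

At the returned value the score is numerically zero, which is the first-order condition that has no closed-form solution.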

DCM: Bayesian Estimation

The Bayesian estimator will be the mean of the posterior density:

$$f(\beta, \gamma \mid y, X) = \frac{f(y \mid X, \beta, \gamma)\, f(\beta, \gamma)}{f(y \mid X)} = \frac{f(y \mid X, \beta, \gamma)\, f(\beta, \gamma)}{\int f(y \mid X, \beta, \gamma)\, f(\beta, \gamma)\, d\beta\, d\gamma}$$

f(β,γ) is the prior density for the model parameters.
f(y|X,β,γ) is the likelihood.
As usual, we need to specify the prior and the likelihood:
- The priors are usually non-informative (flat), say f(β,γ) ∝ 1.
- The likelihood depends on the model in mind. For a Probit Model, we will use a normal distribution. If we have binary data, then,

$f(y \mid X, \beta, \gamma) = \prod_n (1 - \Phi[y_n \mid x, \beta, \gamma])^{1 - y_n}\, \Phi[y_n \mid x, \beta, \gamma]^{y_n}$
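A posterior mean can be approximated on a grid when there is a single parameter. The sketch below assumes a probit with one slope coefficient and no intercept or extra parameters γ, six made-up observations, and a flat prior restricted to the grid [-5, 5]; all of these are simplifications for illustration, and realistic models usually need simulation methods instead of a grid.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_likelihood(beta, x, y):
    """Product over n of Phi(beta * x_n) if y_n = 1, else 1 - Phi(beta * x_n)."""
    like = 1.0
    for x_n, y_n in zip(x, y):
        p = normal_cdf(beta * x_n)
        like *= p if y_n == 1 else (1.0 - p)
    return like


x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]   # hypothetical regressor values
y = [0, 0, 1, 0, 1, 1]                 # hypothetical binary outcomes

# Flat prior, f(beta) proportional to 1, on an assumed grid from -5 to 5.
grid = [-5.0 + 0.01 * k for k in range(1001)]
weights = [probit_likelihood(b, x, y) for b in grid]   # likelihood times flat prior

normalizing_constant = sum(weights)                    # discrete stand-in for f(y|X)
posterior_mean = sum(b * w for b, w in zip(grid, weights)) / normalizing_constant
print(posterior_mean)
```

With a flat prior the posterior is proportional to the likelihood, so the posterior mean is a likelihood-weighted average of the grid points.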
