Multivariate and Conditional Normal Model in Econometrics, Study notes of Economics

Lecture notes for Economics 522A, Spring 2007, covering the multivariate and conditional normal model, with a focus on the bivariate normal model, the conditional mean of y, OLS coefficients, and conditional modeling. They include formulas and explanations for estimating the parameters and for the properties of the estimators.

Uploaded on 08/26/2009

koofers-user-g6a
Economics 522A, Spring 2007
Lecture Note 6: Multivariate and Conditional Normal Model
1 Bivariate Normal Model
Suppose we want to develop a statistical model for the CPS data examined in the previous
lecture note. To simplify things, let us start by modeling the joint
distribution of $(y_i, x_i)$, where $y_i$ is the log of wage and $x_i$ is experience. Here is a scatterplot:
[Figure: scatterplot of Log Wage (vertical axis) against Experience (horizontal axis).]
Taking the log of wage has made the wage distribution a little more symmetric, and apart
from the low experience observations, the data points look roughly elliptical. So, although
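the plot doesn't perfectly justify it, we might start by assuming that $(y_i, x_i)$ are jointly
multivariate normal:

$$z_i := \begin{pmatrix} y_i \\ x_i \end{pmatrix} \overset{\text{i.i.d.}}{\sim} N(\mu, \Sigma), \quad i = 1, \dots, n. \tag{1}$$

Here $n$ is the sample size (in the CPS data, $n = 1289$). The mean vector $\mu$ and variance
matrix $\Sigma$ can be written as

$$\mu = \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_{yy} & \sigma_{xy} \\ \sigma_{xy} & \sigma_{xx} \end{pmatrix},$$

where $\mu_y$ is the marginal mean of $y_i$, $\sigma_{yy}$ is the marginal variance of $y_i$, $\sigma_{xy}$ is the covariance
between $y_i$ and $x_i$, etc.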

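Please remind yourself of the properties of the multivariate normal model, given in LN Addendum from Econ 520, and Ruud Ch. 10.5.1.

The density of $z_i$ is:

$$f(z_i; \mu, \Sigma) = \det(2\pi\Sigma)^{-1/2} \exp\left( -\frac{1}{2} (z_i - \mu)' \Sigma^{-1} (z_i - \mu) \right).$$

So the joint likelihood is:

$$\begin{aligned}
L(\mu, \Sigma) &= f(z_1, \dots, z_n; \mu, \Sigma) \\
&= \prod_{i=1}^{n} f(z_i; \mu, \Sigma) \\
&= \det(2\pi\Sigma)^{-n/2} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} (z_i - \mu)' \Sigma^{-1} (z_i - \mu) \right).
\end{aligned}$$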

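The MLE can be solved analytically to get:

$$\hat{\mu} = \bar{z} = \frac{1}{n} \sum_i z_i = \begin{pmatrix} \frac{1}{n} \sum_i y_i \\ \frac{1}{n} \sum_i x_i \end{pmatrix} = \begin{pmatrix} \bar{y} \\ \bar{x} \end{pmatrix},$$

$$\hat{\Sigma} = \frac{1}{n} \sum_i (z_i - \hat{\mu})(z_i - \hat{\mu})' = \begin{pmatrix} \frac{1}{n} \sum_i (y_i - \bar{y})^2 & \frac{1}{n} \sum_i (y_i - \bar{y})(x_i - \bar{x}) \\ \frac{1}{n} \sum_i (y_i - \bar{y})(x_i - \bar{x}) & \frac{1}{n} \sum_i (x_i - \bar{x})^2 \end{pmatrix}.$$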

This is also the method of moments estimator, since we are basically just replacing population expectations with sample averages. Using the CPS data we get:

$\hat{\mu} = \dots$ [numerical estimates not recoverable from this copy]
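The estimator formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated stand-ins (not the CPS data), and the function name `bivariate_normal_mle` and all numerical values are hypothetical.

```python
import random

def bivariate_normal_mle(y, x):
    """Return (mu_hat, sigma_hat) using the 1/n (maximum likelihood) formulas."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    s_yy = sum((yi - y_bar) ** 2 for yi in y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / n
    s_xy = sum((yi - y_bar) * (xi - x_bar) for yi, xi in zip(y, x)) / n
    mu_hat = (y_bar, x_bar)                      # ordered as z = (y, x)'
    sigma_hat = ((s_yy, s_xy), (s_xy, s_xx))
    return mu_hat, sigma_hat

# Simulated stand-in for (log wage, experience); these are NOT the CPS data.
rng = random.Random(0)
x = [rng.gauss(20.0, 10.0) for _ in range(5000)]
y = [1.5 + 0.02 * xi + rng.gauss(0.0, 0.5) for xi in x]

mu_hat, sigma_hat = bivariate_normal_mle(y, x)
print("mu_hat    =", mu_hat)
print("Sigma_hat =", sigma_hat)
```

Note the divisor is $n$, not $n - 1$: the maximum likelihood estimator of the variance matrix is the uncorrected sample second moment.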

2 Conditional Mean of y

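Having estimated the joint distribution of $(x_i, y_i)$, we might want to focus on certain aspects of that joint distribution, such as the conditional mean of $y_i$ given $x_i$. Using standard results for the multivariate normal distribution:

$$y_i \mid x_i \sim N\left( \mu_y + \frac{\sigma_{xy}}{\sigma_{xx}} (x_i - \mu_x),\; \sigma_{yy} - (\sigma_{xy})^2 / \sigma_{xx} \right).$$

Let

$$\beta_1 = \mu_y - \frac{\sigma_{xy}}{\sigma_{xx}} \mu_x, \qquad \beta_2 = \frac{\sigma_{xy}}{\sigma_{xx}}, \qquad \sigma^2 = \sigma_{yy} - (\sigma_{xy})^2 / \sigma_{xx}.$$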
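As a small sketch of this mapping from the joint parameters to the conditional ones, the following Python function applies the three definitions directly. The function name `conditional_params` and the input values are made up for illustration.

```python
def conditional_params(mu_y, mu_x, s_yy, s_xy, s_xx):
    """Map joint-normal parameters to (beta1, beta2, sigma2) of y given x."""
    beta2 = s_xy / s_xx                  # slope of the conditional mean
    beta1 = mu_y - beta2 * mu_x          # intercept of the conditional mean
    sigma2 = s_yy - s_xy ** 2 / s_xx     # conditional variance
    return beta1, beta2, sigma2

# Illustrative (made-up) parameter values.
beta1, beta2, sigma2 = conditional_params(mu_y=1.0, mu_x=2.0, s_yy=4.0, s_xy=2.0, s_xx=2.0)
print(beta1, beta2, sigma2)  # -1.0 1.0 2.0
```

Plugging the estimates $\hat{\mu}$ and $\hat{\Sigma}$ into the same function gives the corresponding plug-in estimates of the conditional parameters.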

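The first order conditions for a minimum of the sum of squared residuals $\sum_i (y_i - \beta_1 - \beta_2 x_i)^2$ are:

$$-2 \sum_i (y_i - \beta_1 - \beta_2 x_i) = 0,$$

$$-2 \sum_i (y_i - \beta_1 - \beta_2 x_i) \cdot x_i = 0.$$

So the OLS coefficients $\hat{\beta}_1, \hat{\beta}_2$ satisfy what are sometimes called the "OLS Normal Equations":

$$\sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0; \tag{2}$$

$$\sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) \cdot x_i = 0. \tag{3}$$

The first equation can be rearranged to get:

$$\sum_i y_i - n \hat{\beta}_1 - \hat{\beta}_2 \sum_i x_i = 0 \quad \Rightarrow \quad \hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}.$$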

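Plug this into the second normal equation:

$$\begin{aligned}
& \sum_i y_i x_i - (\bar{y} - \hat{\beta}_2 \bar{x}) \sum_i x_i - \hat{\beta}_2 \sum_i x_i^2 = 0 \\
\Rightarrow\quad & \sum_i y_i x_i - \sum_i \bar{y} x_i = \hat{\beta}_2 \left[ \sum_i x_i^2 - \bar{x} \sum_i x_i \right] \\
\Rightarrow\quad & \frac{1}{n} \sum_i x_i (y_i - \bar{y}) = \hat{\beta}_2 \left[ \frac{1}{n} \sum_i x_i^2 - (\bar{x})^2 \right] \\
\Rightarrow\quad & \hat{\beta}_2 = \frac{\frac{1}{n} \sum_i x_i (y_i - \bar{y})}{\frac{1}{n} \sum_i x_i^2 - (\bar{x})^2}.
\end{aligned}$$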

A bit of algebra shows that this is equal to:

$$\hat{\beta}_2 = \frac{\frac{1}{n} \sum_i (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n} \sum_i (x_i - \bar{x})^2}.$$
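The "bit of algebra" can be spelled out as follows. It uses only the facts that $\sum_i (y_i - \bar{y}) = 0$ and $\frac{1}{n} \sum_i x_i = \bar{x}$; the first line handles the numerator and the second the denominator.

```latex
\begin{align*}
\frac{1}{n}\sum_i (x_i - \bar{x})(y_i - \bar{y})
  &= \frac{1}{n}\sum_i x_i (y_i - \bar{y}) - \bar{x}\,\frac{1}{n}\sum_i (y_i - \bar{y})
   = \frac{1}{n}\sum_i x_i (y_i - \bar{y}), \\
\frac{1}{n}\sum_i (x_i - \bar{x})^2
  &= \frac{1}{n}\sum_i x_i^2 - 2\bar{x}\,\frac{1}{n}\sum_i x_i + (\bar{x})^2
   = \frac{1}{n}\sum_i x_i^2 - (\bar{x})^2 .
\end{align*}
```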

So the OLS coefficients are identical to the MLE estimates.
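A quick numerical check of this equivalence, as a sketch on simulated data: one function solves the normal equations in the form derived above, the other plugs the $1/n$ sample moments into $\beta_2 = \sigma_{xy}/\sigma_{xx}$ and $\beta_1 = \mu_y - \beta_2 \mu_x$. The function names and the simulated values are assumptions made for illustration.

```python
import random

def ols_normal_equations(y, x):
    """Solve the two OLS normal equations for (beta1_hat, beta2_hat)."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    num = sum(xi * (yi - y_bar) for yi, xi in zip(y, x)) / n
    den = sum(xi ** 2 for xi in x) / n - x_bar ** 2
    beta2_hat = num / den
    beta1_hat = y_bar - beta2_hat * x_bar
    return beta1_hat, beta2_hat

def mle_plug_in(y, x):
    """Plug the 1/n sample moments into the conditional-mean parameters."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for yi, xi in zip(y, x)) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / n
    beta2_hat = s_xy / s_xx
    return y_bar - beta2_hat * x_bar, beta2_hat

# Simulated data (illustrative values, not the CPS data).
rng = random.Random(0)
x = [rng.gauss(20.0, 10.0) for _ in range(200)]
y = [1.5 + 0.02 * xi + rng.gauss(0.0, 0.5) for xi in x]

b1_ols, b2_ols = ols_normal_equations(y, x)
b1_mle, b2_mle = mle_plug_in(y, x)
print("OLS:", b1_ols, b2_ols)
print("MLE:", b1_mle, b2_mle)
```

The two pairs should agree up to floating-point rounding, which is the numerical counterpart of the algebraic identity.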

4 Conditional Modeling

Recall that we started by assuming joint normality for $(y_i, x_i)$. A nice feature of the multivariate normal distribution is that the marginal distribution of $x_i$ is normal, and the conditional distribution of $y_i \mid x_i$ is normal. We focused on the parameters of the conditional distribution of $y_i \mid x_i$, and showed that MLE gave the same result as OLS.

Suppose we only make the assumption that $y_i$ is conditionally normally distributed:

$$y_i \mid x_i \sim N(\beta_1 + \beta_2 x_i, \sigma^2). \tag{4}$$

We allow the distribution of $x_i$ to be arbitrary: it could be nonnormal, discrete, even degenerate. In order to be precise about some of our later arguments, we will assume that the above conditional distribution holds conditional on all the $x_i$s:

$$y_i \mid x_1, \dots, x_n \sim N(\beta_1 + \beta_2 x_i, \sigma^2),$$

and that conditional on all the $x_i$, the $y_i$ are independent.[1] Then we can write the joint conditional density of the $y$s given the $x$s as:

$$\begin{aligned}
f(y_1, \dots, y_n \mid x_1, \dots, x_n; \beta_1, \beta_2, \sigma^2) &= \prod_{i=1}^{n} f(y_i \mid x_i; \beta_1, \beta_2, \sigma^2) \\
&= (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_i (y_i - \beta_1 - \beta_2 x_i)^2 \right).
\end{aligned}$$

In conditional maximum likelihood, we treat this conditional density as the (conditional) likelihood, and maximize it with respect to the parameters:

$$\max_{\beta_1, \beta_2, \sigma^2} \; (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_i (y_i - \beta_1 - \beta_2 x_i)^2 \right).$$

Equivalently, maximize the log of the conditional likelihood:

$$\max_{\beta_1, \beta_2, \sigma^2} \; -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_i (y_i - \beta_1 - \beta_2 x_i)^2.$$

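First order conditions:

$$\frac{2}{2\sigma^2} \sum_i (y_i - \beta_1 - \beta_2 x_i) = 0,$$

$$\frac{2}{2\sigma^2} \sum_i (y_i - \beta_1 - \beta_2 x_i) \cdot x_i = 0,$$

$$-\frac{n}{2} \cdot \frac{1}{2\pi\sigma^2} (2\pi) - \left[ -\frac{1}{2\sigma^4} \right] \sum_i (y_i - \beta_1 - \beta_2 x_i)^2 = 0.$$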

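Simplify a bit to get:

$$\sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0,$$

$$\sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) x_i = 0,$$

$$-\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2 = 0.$$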
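The first two conditions are the OLS normal equations, and the third rearranges to $\hat{\sigma}^2 = \frac{1}{n} \sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2$. The sketch below evaluates this closed-form solution on simulated data with a discrete regressor, then checks that nudging any one parameter lowers the conditional log-likelihood. Function names and data values are hypothetical choices for illustration.

```python
import math
import random

def cond_loglik(b1, b2, s2, y, x):
    """Log of the conditional likelihood of y given x under N(b1 + b2*x, s2)."""
    n = len(y)
    ssr = sum((yi - b1 - b2 * xi) ** 2 for yi, xi in zip(y, x))
    return -0.5 * n * math.log(2 * math.pi * s2) - ssr / (2 * s2)

def cond_mle(y, x):
    """Closed-form solution of the three first order conditions."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    b2 = (sum((xi - x_bar) * (yi - y_bar) for yi, xi in zip(y, x))
          / sum((xi - x_bar) ** 2 for xi in x))
    b1 = y_bar - b2 * x_bar
    # Third condition: -n/(2*s2) + SSR/(2*s2**2) = 0  =>  s2 = SSR / n.
    s2 = sum((yi - b1 - b2 * xi) ** 2 for yi, xi in zip(y, x)) / n
    return b1, b2, s2

# Simulated data with a discrete, nonnormal regressor (illustrative values only).
rng = random.Random(1)
x = [float(rng.randint(0, 40)) for _ in range(500)]
y = [1.0 + 0.03 * xi + rng.gauss(0.0, 0.5) for xi in x]

b1_hat, b2_hat, s2_hat = cond_mle(y, x)
best = cond_loglik(b1_hat, b2_hat, s2_hat, y, x)

# Moving any single parameter away from the solution should lower the log-likelihood.
steps = [(0.01, 0, 0), (-0.01, 0, 0), (0, 0.01, 0),
         (0, -0.01, 0), (0, 0, 0.01), (0, 0, -0.01)]
for d1, d2, d3 in steps:
    assert cond_loglik(b1_hat + d1, b2_hat + d2, s2_hat + d3, y, x) < best
print(b1_hat, b2_hat, s2_hat)
```

Nothing in the calculation uses the distribution of the regressor, which is the point of conditional modeling.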

[1] Suppose we assume that the $(y_i, x_i)$ are i.i.d., and that the conditional model in (4) holds. Then these further assumptions will hold.

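And

$$\begin{aligned}
E\left[ \sum_i (y_i - \bar{y})(x_i - \bar{x}) \,\Big|\, x_1, \dots, x_n \right]
&= \beta_2 \sum_i (x_i - \bar{x})^2 + E\left[ \sum_i (\epsilon_i - \bar{\epsilon})(x_i - \bar{x}) \,\Big|\, x_1, \dots, x_n \right] \\
&= \beta_2 \sum_i (x_i - \bar{x})^2,
\end{aligned}$$

where $\epsilon_i = y_i - \beta_1 - \beta_2 x_i$ denotes the error term (its symbol is garbled in this copy, so the name $\epsilon$ is an assumption). The second expectation vanishes because $\epsilon_i$ has conditional mean zero given $x_1, \dots, x_n$.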

Therefore,

$$E[\hat{\beta}_2 \mid x_1, \dots, x_n] = \beta_2.$$

So $\hat{\beta}_2$ is conditionally unbiased. Also, by the law of iterated expectations,

$$E[\hat{\beta}_2] = E\left[ E[\hat{\beta}_2 \mid x_1, \dots, x_n] \right] = \beta_2.$$

By similar arguments, we can show that

$$E[\hat{\beta}_1 \mid x_1, \dots, x_n] = \beta_1.$$
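Conditional unbiasedness can be illustrated with a small Monte Carlo sketch: hold the regressor values fixed, redraw the $y_i$ from the conditional normal model many times, and average the slope estimates. The true parameter values, the regressor design, and the number of replications below are all hypothetical choices.

```python
import random

def beta2_hat(y, x):
    """OLS slope: sum of cross-deviations over sum of squared deviations of x."""
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for yi, xi in zip(y, x))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

BETA1, BETA2, SIGMA = 1.0, 0.5, 1.0    # hypothetical true values
x = [float(i % 10) for i in range(50)]  # fixed, discrete, clearly nonnormal regressor

rng = random.Random(42)
reps = 2000
draws = []
for _ in range(reps):
    # Redraw y from N(BETA1 + BETA2 * x_i, SIGMA^2), keeping x fixed.
    y = [BETA1 + BETA2 * xi + rng.gauss(0.0, SIGMA) for xi in x]
    draws.append(beta2_hat(y, x))

mc_mean = sum(draws) / reps
print("average of beta2_hat over replications:", mc_mean)
```

The average should be close to the true slope, even though the regressor is far from normal.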

6 Conditional vs. Joint Modeling

We are going to work with conditional models for a little while, so it is worth stopping to think about the general relationship between, say, unconditional MLE and conditional MLE.

Return to the joint normal model given in (1). We decomposed the model into a marginal model for $x_i$,

$$x_i \overset{\text{i.i.d.}}{\sim} N(\mu_x, \sigma_{xx}),$$

and a conditional model for $y_i$ given $x_i$,

$$y_i \mid x_i \sim N(\beta_1 + \beta_2 x_i, \sigma^2).$$

Note that there is a 1-1 mapping between the original parameters $(\mu_x, \mu_y, \sigma_{xx}, \sigma_{xy}, \sigma_{yy})$ and the parameters $(\mu_x, \sigma_{xx}, \beta_1, \beta_2, \sigma^2)$. Under the reparametrization, we have a set of parameters related to the marginal distribution of $x_i$: $\theta_1 = (\mu_x, \sigma_{xx})$, and a set of parameters for the conditional distribution: $\theta_2 = (\beta_1, \beta_2, \sigma^2)$.

So, generalizing a bit, we have a joint model and a marginal-conditional decomposition:

$$f(x_i, y_i; \theta_1, \theta_2) = f(x_i; \theta_1) f(y_i \mid x_i; \theta_2).$$

The joint likelihood can be written

$$\begin{aligned}
f(x_1, y_1, \dots, x_n, y_n; \theta_1, \theta_2) &= \prod_{i=1}^{n} f(x_i, y_i; \theta_1, \theta_2) \\
&= \prod_{i=1}^{n} f(x_i; \theta_1) f(y_i \mid x_i; \theta_2) \\
&= \prod_{i=1}^{n} f(x_i; \theta_1) \times \prod_{i=1}^{n} f(y_i \mid x_i; \theta_2).
\end{aligned}$$

The joint MLE solves:

$$\max_{(\theta_1, \theta_2) \in \Theta} f(x_1, y_1, \dots, x_n, y_n; \theta_1, \theta_2) = \prod_{i=1}^{n} f(x_i; \theta_1) \times \prod_{i=1}^{n} f(y_i \mid x_i; \theta_2),$$

where $\Theta$ is the joint parameter space for $(\theta_1, \theta_2)$. The conditional MLE solves:

$$\max_{\theta_2 \in \Theta_2} \prod_{i=1}^{n} f(y_i \mid x_i; \theta_2),$$

where $\Theta_2$ is the parameter space for $\theta_2$.

If $\theta_1$ only enters the marginal density of $x_i$, and $\theta_2$ only enters the conditional density of $y_i \mid x_i$, and the joint parameter space is a Cartesian product:

$$\Theta = \Theta_1 \times \Theta_2,$$

then conditional MLE will give the same result for $\hat{\theta}_2$ as unconditional MLE.
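The factorization behind this result can be checked numerically for the bivariate normal case. The sketch below evaluates the joint log density at one point and compares it with the sum of the marginal log density of $x$ and the conditional log density of $y$ given $x$ under the reparametrization. The function names and all numbers are illustrative assumptions.

```python
import math

def log_normal_pdf(v, mean, var):
    """Log density of a univariate N(mean, var) at v."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

def log_joint_pdf(y, x, mu_y, mu_x, s_yy, s_xy, s_xx):
    """Log density of the bivariate normal for z = (y, x)'."""
    det = s_yy * s_xx - s_xy ** 2          # determinant of Sigma
    dy, dx = y - mu_y, x - mu_x
    # Quadratic form (z - mu)' Sigma^{-1} (z - mu), using the 2x2 inverse.
    quad = (s_xx * dy ** 2 - 2 * s_xy * dy * dx + s_yy * dx ** 2) / det
    return -0.5 * math.log((2 * math.pi) ** 2 * det) - 0.5 * quad

# Made-up joint parameters and evaluation point.
mu_y, mu_x, s_yy, s_xy, s_xx = 2.0, 20.0, 0.4, 1.5, 100.0
y0, x0 = 2.3, 12.0

# Reparametrize: theta_1 = (mu_x, s_xx), theta_2 = (beta1, beta2, sigma2).
beta2 = s_xy / s_xx
beta1 = mu_y - beta2 * mu_x
sigma2 = s_yy - s_xy ** 2 / s_xx

joint = log_joint_pdf(y0, x0, mu_y, mu_x, s_yy, s_xy, s_xx)
marginal = log_normal_pdf(x0, mu_x, s_xx)
conditional = log_normal_pdf(y0, beta1 + beta2 * x0, sigma2)
print(joint, marginal + conditional)
```

Because the log-likelihood splits into a term involving only $\theta_1$ and a term involving only $\theta_2$, maximizing over a product parameter space can be done one term at a time.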