




Lecture notes for economics 522a, spring 2007, covering the multivariate and conditional normal model, with a focus on the bivariate normal model, conditional mean of y, ols coefficients, and conditional modeling. It includes mathematical formulas and explanations for estimating parameters and their properties.





Economics 522A, Spring 2007
Lecture Note 6: Multivariate and Conditional Normal Model
Suppose we want to develop a statistical model for the CPS data examined in the previous
lecture note. To simplify things, let us start by modeling the joint
distribution of $(y_i, x_i)$, where $y_i$ is the log of wage and $x_i$ is experience. Here is a scatterplot:
[Figure: scatterplot of Log Wage (vertical axis) against Experience (horizontal axis).]
Taking the log of wage has made the wage distribution a little more symmetric, and apart
from the low experience observations, the data points look roughly elliptical. So, although
the plot doesn't perfectly justify it, we might start by assuming that $(y_i, x_i)$ are jointly
multivariate normal:

$$z_i := \begin{pmatrix} y_i \\ x_i \end{pmatrix} \overset{\text{i.i.d.}}{\sim} N(\mu, \Sigma), \qquad i = 1, \dots, n. \tag{1}$$
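Here $n$ is the sample size (in the CPS data, $n = 1289$). The mean vector $\mu$ and variance
matrix $\Sigma$ can be written as

$$\mu = \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_{yy} & \sigma_{xy} \\ \sigma_{xy} & \sigma_{xx} \end{pmatrix},$$

where $\mu_y$ is the marginal mean of $y_i$, $\sigma_{yy}$ is the marginal variance of $y_i$, $\sigma_{xy}$ is the covariance
between $y_i$ and $x_i$, etc.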
Please remind yourself of the properties of the multivariate normal model, given in LN
Addendum from Econ 520, and Ruud Ch. 10.5.1.
The density of $z_i$ is:

$$f(z_i; \mu, \Sigma) = \det(2\pi\Sigma)^{-1/2} \exp\left( -\frac{1}{2} (z_i - \mu)' \Sigma^{-1} (z_i - \mu) \right).$$
So the joint likelihood is:

$$L(\mu, \Sigma) = f(z_1, \dots, z_n; \mu, \Sigma) = \prod_{i=1}^{n} f(z_i; \mu, \Sigma) = \det(2\pi\Sigma)^{-n/2} \exp\left( -\frac{1}{2} \sum_{i=1}^{n} (z_i - \mu)' \Sigma^{-1} (z_i - \mu) \right).$$
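The MLE can be solved analytically to get:

$$\hat{\mu} = \bar{z} = \frac{1}{n} \sum_i z_i = \begin{pmatrix} \frac{1}{n} \sum_i y_i \\ \frac{1}{n} \sum_i x_i \end{pmatrix} = \begin{pmatrix} \bar{y} \\ \bar{x} \end{pmatrix},$$

$$\hat{\Sigma} = \frac{1}{n} \sum_i (z_i - \hat{\mu})(z_i - \hat{\mu})' = \begin{pmatrix} \frac{1}{n} \sum_i (y_i - \bar{y})^2 & \frac{1}{n} \sum_i (y_i - \bar{y})(x_i - \bar{x}) \\ \frac{1}{n} \sum_i (y_i - \bar{y})(x_i - \bar{x}) & \frac{1}{n} \sum_i (x_i - \bar{x})^2 \end{pmatrix}.$$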
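The closed-form estimates above can be sketched in a few lines of Python. This is only an illustration: the function name `mvn_mle` is made up for this example, and the data are simulated with hypothetical parameter values as a stand-in for the CPS sample, which is not reproduced here.

```python
import random

def mvn_mle(y, x):
    """MLE of the mean vector and variance matrix for bivariate normal data.

    Returns (mu_hat, sigma_hat), with mu_hat = (ybar, xbar) and sigma_hat
    ordered as [[s_yy, s_xy], [s_xy, s_xx]].
    """
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    # The MLE divides by n, not n - 1.
    s_yy = sum((yi - ybar) ** 2 for yi in y) / n
    s_xx = sum((xi - xbar) ** 2 for xi in x) / n
    s_xy = sum((yi - ybar) * (xi - xbar) for yi, xi in zip(y, x)) / n
    return (ybar, xbar), [[s_yy, s_xy], [s_xy, s_xx]]

# Simulated stand-in data (hypothetical values, not the CPS sample).
rng = random.Random(0)
x = [rng.gauss(18.0, 12.0) for _ in range(1289)]
y = [1.8 + 0.01 * xi + rng.gauss(0.0, 0.5) for xi in x]

mu_hat, sigma_hat = mvn_mle(y, x)
print("mu_hat    =", mu_hat)
print("sigma_hat =", sigma_hat)
```

Because every entry is a sample average of a population moment, the same function also computes the method of moments estimator.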
This is also the method of moments estimator, since we are basically just replacing population
expectations with sample averages. Using the CPS data we get:
$\hat{\mu} =$
Having estimated the joint distribution of $(x_i, y_i)$, we might want to focus on certain aspects
of that joint distribution, such as the conditional mean of $y_i$ given $x_i$. Using standard results
for the multivariate normal distribution:

$$y_i \mid x_i \sim N\left( \mu_y + \frac{\sigma_{xy}}{\sigma_{xx}} (x_i - \mu_x),\ \sigma_{yy} - \frac{(\sigma_{xy})^2}{\sigma_{xx}} \right).$$
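Let

$$\beta_1 = \mu_y - \frac{\sigma_{xy}}{\sigma_{xx}} \mu_x, \qquad \beta_2 = \frac{\sigma_{xy}}{\sigma_{xx}}, \qquad \sigma^2 = \sigma_{yy} - \frac{(\sigma_{xy})^2}{\sigma_{xx}}.$$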
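As a quick arithmetic check of these definitions, the mapping from the joint-normal parameters to $(\beta_1, \beta_2, \sigma^2)$ can be written as a small function. The name `conditional_params` and the input values are hypothetical, chosen only so the numbers are easy to verify by hand.

```python
def conditional_params(mu_y, mu_x, s_yy, s_xy, s_xx):
    """Map joint-normal parameters to (beta1, beta2, sigma2) of y given x."""
    beta2 = s_xy / s_xx
    beta1 = mu_y - beta2 * mu_x
    sigma2 = s_yy - s_xy ** 2 / s_xx
    return beta1, beta2, sigma2

# Hypothetical parameter values for illustration only.
beta1, beta2, sigma2 = conditional_params(
    mu_y=2.0, mu_x=10.0, s_yy=1.0, s_xy=2.0, s_xx=8.0
)
print(beta1, beta2, sigma2)  # -0.5 0.25 0.5
```

With these inputs the conditional mean of $y$ given $x$ is $-0.5 + 0.25x$, and the conditional variance $0.5$ is smaller than the marginal variance $\sigma_{yy} = 1$.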
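The first order conditions for a minimum of the sum of squared residuals $\sum_i (y_i - \beta_1 - \beta_2 x_i)^2$ are:

$$\sum_i (y_i - \beta_1 - \beta_2 x_i) = 0$$

$$\sum_i (y_i - \beta_1 - \beta_2 x_i) \cdot x_i = 0$$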
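So the OLS coefficients $\hat{\beta}_1, \hat{\beta}_2$ satisfy what are sometimes called the "OLS Normal Equations":

$$\sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0; \tag{2}$$

$$\sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) \cdot x_i = 0. \tag{3}$$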
The first equation can be rearranged to get:

$$\sum_i y_i - n \hat{\beta}_1 - \hat{\beta}_2 \sum_i x_i = 0 \quad \Longrightarrow \quad \hat{\beta}_1 = \bar{y} - \hat{\beta}_2 \bar{x}.$$
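Plug this into the second normal equation:

$$\sum_i y_i x_i - (\bar{y} - \hat{\beta}_2 \bar{x}) \sum_i x_i - \hat{\beta}_2 \sum_i x_i^2 = 0$$

$$\sum_i y_i x_i - \sum_i \bar{y} x_i = \hat{\beta}_2 \left( \sum_i x_i^2 - \bar{x} \sum_i x_i \right)$$

$$\frac{1}{n} \sum_i x_i (y_i - \bar{y}) = \hat{\beta}_2 \left( \frac{1}{n} \sum_i x_i^2 - (\bar{x})^2 \right)$$

$$\hat{\beta}_2 = \frac{\frac{1}{n} \sum_i x_i (y_i - \bar{y})}{\frac{1}{n} \sum_i x_i^2 - (\bar{x})^2}$$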
A bit of algebra shows that this is equal to:

$$\hat{\beta}_2 = \frac{\frac{1}{n} \sum_i (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n} \sum_i (x_i - \bar{x})^2}$$
So the OLS coefficients are identical to the MLE estimates.
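A small numerical check of this equivalence, as a sketch on made-up data: one function solves the two normal equations directly as a 2x2 linear system (by Cramer's rule), the other computes the slope as sample covariance over sample variance. The function names and the data points are hypothetical.

```python
def ols_normal_equations(y, x):
    """Solve the two OLS normal equations for (b1, b2) by Cramer's rule.

    The equations are
        n * b1      + sum(x) * b2   = sum(y)
        sum(x) * b1 + sum(x^2) * b2 = sum(x * y)
    """
    n = len(y)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # zero only when all x values are identical
    b1 = (sy * sxx - sx * sxy) / det
    b2 = (n * sxy - sx * sy) / det
    return b1, b2

def slope_from_moments(y, x):
    """Slope written as sample covariance over sample variance of x."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    cov = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / n
    var = sum((xi - xbar) ** 2 for xi in x) / n
    return cov / var

# Made-up data for illustration.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 2.5, 4.5, 6.0]

b1, b2 = ols_normal_equations(y, x)
resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
print("b1, b2:", b1, b2)
print("moment-based slope:", slope_from_moments(y, x))
print("sum of residuals:", sum(resid))
print("sum of residuals times x:", sum(r * xi for r, xi in zip(resid, x)))
```

The two slopes agree up to floating-point error, and both residual sums are numerically zero, which is exactly what the normal equations require.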
Recall that we started by assuming joint normality for $(y_i, x_i)$. A nice feature of the multivariate
normal distribution is that the marginal distribution of $x_i$ is normal, and the
conditional distribution of $y_i \mid x_i$ is normal. We focused on the parameters of the conditional
distribution of $y_i \mid x_i$, and showed that MLE gave the same result as OLS.
Suppose we only make the assumption that $y_i$ is conditionally normally distributed:

$$y_i \mid x_i \sim N(\beta_1 + \beta_2 x_i, \sigma^2). \tag{4}$$
We allow the distribution of $x_i$ to be arbitrary: it could be nonnormal, discrete, even
degenerate. In order to be precise about some of our later arguments, we will assume that
the above conditional distribution holds conditional on all the $x_i$s:

$$y_i \mid x_1, \dots, x_n \sim N(\beta_1 + \beta_2 x_i, \sigma^2),$$

and that conditional on all the $x_i$, the $y_i$ are independent.¹ Then we can write the joint
conditional density of the $y$s given the $x$s as:

$$f(y_1, \dots, y_n \mid x_1, \dots, x_n; \beta_1, \beta_2, \sigma^2) = \prod_{i=1}^{n} f(y_i \mid x_i; \beta_1, \beta_2, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_i (y_i - \beta_1 - \beta_2 x_i)^2 \right).$$
In conditional maximum likelihood, we treat this conditional density as the (conditional)
likelihood, and maximize it with respect to the parameters:
$$\max_{\beta_1, \beta_2, \sigma^2} \ (2\pi\sigma^2)^{-n/2} \exp\left( -\frac{1}{2\sigma^2} \sum_i (y_i - \beta_1 - \beta_2 x_i)^2 \right).$$
Equivalently, maximize the log of the conditional likelihood:
$$\max_{\beta_1, \beta_2, \sigma^2} \ -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_i (y_i - \beta_1 - \beta_2 x_i)^2.$$
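First order conditions:

$$\frac{2}{2\sigma^2} \sum_i (y_i - \beta_1 - \beta_2 x_i) = 0$$

$$\frac{2}{2\sigma^2} \sum_i (y_i - \beta_1 - \beta_2 x_i) \cdot x_i = 0$$

$$-\frac{n}{2} \cdot \frac{1}{2\pi\sigma^2} (2\pi) + \frac{1}{2\sigma^4} \sum_i (y_i - \beta_1 - \beta_2 x_i)^2 = 0$$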
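Simplify a bit to get:

$$\sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) = 0$$

$$\sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i) x_i = 0$$

$$-\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4} \sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2 = 0$$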
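The first two conditions are the OLS normal equations, and the third can be solved for $\hat{\sigma}^2 = \frac{1}{n} \sum_i (y_i - \hat{\beta}_1 - \hat{\beta}_2 x_i)^2$. The sketch below, on made-up data and with hypothetical function names, computes these estimates, evaluates the $\sigma^2$ first order condition at the solution, and compares the conditional log-likelihood with its value at nearby parameter values.

```python
import math

def conditional_mle(y, x):
    """Conditional MLE of (beta1, beta2, sigma2) in y_i | x_i ~ N(b1 + b2 * x_i, s2)."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    b2 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
          / sum((xi - xbar) ** 2 for xi in x))
    b1 = ybar - b2 * xbar
    # Third first order condition: sigma2_hat is the average squared residual.
    s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / n
    return b1, b2, s2

def cond_loglik(y, x, b1, b2, s2):
    """Log of the conditional likelihood of the ys given the xs."""
    n = len(y)
    ssr = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * s2) - ssr / (2 * s2)

# Made-up data for illustration.
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 2.5, 4.5, 6.0]

b1, b2, s2 = conditional_mle(y, x)
n = len(y)
ssr = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
score_s2 = -n / (2 * s2) + ssr / (2 * s2 ** 2)  # should be numerically zero

best = cond_loglik(y, x, b1, b2, s2)
print("estimates:", b1, b2, s2)
print("score for sigma^2:", score_s2)
print("log-likelihood at MLE:", best)
print("at a perturbed slope:", cond_loglik(y, x, b1, b2 + 0.1, s2))
```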
¹ Suppose we assume that the $(y_i, x_i)$ are i.i.d., and that the conditional model in (4) holds. Then these
further assumptions will hold.
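And

$$E\left[ \sum_i (y_i - \bar{y})(x_i - \bar{x}) \,\Big|\, x_1, \dots, x_n \right] = \beta_2 \sum_i (x_i - \bar{x})^2 + E\left[ \sum_i (\epsilon_i - \bar{\epsilon})(x_i - \bar{x}) \,\Big|\, x_1, \dots, x_n \right] = \beta_2 \sum_i (x_i - \bar{x})^2,$$

where $\epsilon_i := y_i - \beta_1 - \beta_2 x_i$, so that $E[\epsilon_i \mid x_1, \dots, x_n] = 0$.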
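Therefore,

$$E[\hat{\beta}_2 \mid x_1, \dots, x_n] = \beta_2.$$

So $\hat{\beta}_2$ is conditionally unbiased. Also, by the law of iterated expectations,

$$E[\hat{\beta}_2] = E\big[ E[\hat{\beta}_2 \mid x_1, \dots, x_n] \big] = \beta_2.$$

By similar arguments, we can show that

$$E[\hat{\beta}_1 \mid x_1, \dots, x_n] = \beta_1.$$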
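Conditional unbiasedness can be illustrated with a small Monte Carlo experiment: hold the $x$s fixed, redraw the $y$s many times from the conditional model, and average the resulting slope estimates. This is a sketch under assumed values; the true parameters, the fixed $x$ values, the seed, and the number of replications are all hypothetical choices.

```python
import random

def ols_slope(y, x):
    """OLS slope: sample covariance of (x, y) over sample variance of x."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    den = sum((xi - xbar) ** 2 for xi in x)
    return num / den

rng = random.Random(522)
beta1, beta2, sigma = 1.0, 0.5, 2.0  # hypothetical true values

# The x values are held fixed across replications and are deliberately
# non-normal (discrete, with ties): only the conditional model is used.
x = [0.0, 1.0, 1.0, 3.0, 5.0, 8.0, 8.0, 13.0]

reps = 20000
slopes = []
for _ in range(reps):
    y = [beta1 + beta2 * xi + rng.gauss(0.0, sigma) for xi in x]
    slopes.append(ols_slope(y, x))

mean_slope = sum(slopes) / reps
print("average slope estimate:", mean_slope, "true beta2:", beta2)
```

The average of the estimates should be close to $\beta_2$, with a gap that shrinks as the number of replications grows; any single estimate can still be far from $\beta_2$.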
We are going to work with conditional models for a little while, so it is worth stopping
to think about the general relationship between, say, unconditional MLE and conditional MLE.
Return to the joint normal model given in (1). We decomposed the model into a marginal
model for $x_i$:

$$x_i \overset{\text{i.i.d.}}{\sim} N(\mu_x, \sigma_{xx}),$$

and a conditional model for $y_i$ given $x_i$:

$$y_i \mid x_i \sim N(\beta_1 + \beta_2 x_i, \sigma^2).$$
Note that there is a 1-1 mapping between the original parameters $(\mu_x, \mu_y, \sigma_{xx}, \sigma_{xy}, \sigma_{yy})$ and
the parameters $(\mu_x, \sigma_{xx}, \beta_1, \beta_2, \sigma^2)$. Under the reparametrization, we have a set of parameters
related to the marginal distribution of $x_i$: $\theta_1 = (\mu_x, \sigma_{xx})$, and a set of parameters for the
conditional distribution: $\theta_2 = (\beta_1, \beta_2, \sigma^2)$.
So, generalizing a bit, we have a joint model and a marginal-conditional decomposition:

$$f(x_i, y_i; \theta_1, \theta_2) = f(x_i; \theta_1)\, f(y_i \mid x_i; \theta_2).$$
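The joint likelihood can be written

$$f(x_1, y_1, \dots, x_n, y_n; \theta_1, \theta_2) = \prod_{i=1}^{n} f(x_i, y_i; \theta_1, \theta_2) = \prod_{i=1}^{n} f(x_i; \theta_1)\, f(y_i \mid x_i; \theta_2) = \left[ \prod_{i=1}^{n} f(x_i; \theta_1) \right] \left[ \prod_{i=1}^{n} f(y_i \mid x_i; \theta_2) \right].$$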
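For the bivariate normal case, the factorization can be checked numerically at a single point: the log of the joint density should equal the log of the marginal density of $x$ plus the log of the conditional density of $y$ given $x$. The sketch below writes out the 2x2 case by hand; the function names, parameter values, and evaluation point are hypothetical.

```python
import math

def log_normal_pdf(v, mean, var):
    """Log density of a univariate N(mean, var) evaluated at v."""
    return -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)

def log_bivariate_pdf(y, x, mu_y, mu_x, s_yy, s_xy, s_xx):
    """Log density of z = (y, x)' under N(mu, Sigma), written out for the 2x2 case."""
    det = s_yy * s_xx - s_xy ** 2
    dy = y - mu_y
    dx = x - mu_x
    # Quadratic form (z - mu)' Sigma^{-1} (z - mu) using the explicit 2x2 inverse.
    quad = (s_xx * dy * dy - 2 * s_xy * dy * dx + s_yy * dx * dx) / det
    # det(2 * pi * Sigma) = (2 * pi)^2 * det(Sigma) for a 2x2 matrix.
    return -0.5 * math.log((2 * math.pi) ** 2 * det) - 0.5 * quad

# Hypothetical joint parameters and evaluation point.
mu_y, mu_x = 2.0, 10.0
s_yy, s_xy, s_xx = 1.0, 2.0, 8.0
y_pt, x_pt = 2.7, 11.5

# Reparametrize: theta_1 = (mu_x, s_xx), theta_2 = (beta1, beta2, sigma2).
beta2 = s_xy / s_xx
beta1 = mu_y - beta2 * mu_x
sigma2 = s_yy - s_xy ** 2 / s_xx

joint = log_bivariate_pdf(y_pt, x_pt, mu_y, mu_x, s_yy, s_xy, s_xx)
marginal = log_normal_pdf(x_pt, mu_x, s_xx)
conditional = log_normal_pdf(y_pt, beta1 + beta2 * x_pt, sigma2)
print(joint, marginal + conditional)
```

Summing these pointwise identities over $i$ gives the split of the joint log-likelihood into a term that depends only on $\theta_1$ and a term that depends only on $\theta_2$.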
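The joint MLE solves:

$$\max_{(\theta_1, \theta_2) \in \Theta} f(x_1, y_1, \dots, x_n, y_n; \theta_1, \theta_2) = \max_{(\theta_1, \theta_2) \in \Theta} \left[ \prod_{i=1}^{n} f(x_i; \theta_1) \right] \left[ \prod_{i=1}^{n} f(y_i \mid x_i; \theta_2) \right],$$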
where $\Theta$ is the joint parameter space for $(\theta_1, \theta_2)$. The conditional MLE solves:

$$\max_{\theta_2 \in \Theta_2} \prod_{i=1}^{n} f(y_i \mid x_i; \theta_2),$$

where $\Theta_2$ is the parameter space for $\theta_2$.
If $\theta_1$ only enters the marginal density of $x_i$, and $\theta_2$ only enters the conditional density of
$y_i \mid x_i$, and the joint parameter space is a Cartesian product:

$$\Theta = \Theta_1 \times \Theta_2,$$

then conditional MLE will give the same result for $\hat{\theta}_2$ as unconditional MLE.
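In the bivariate normal example this can be checked numerically. One route estimates the joint parameters $(\mu, \Sigma)$ by MLE and then maps them to $\theta_2 = (\beta_1, \beta_2, \sigma^2)$; the other maximizes the conditional likelihood directly by solving the normal equations. The sketch below uses simulated data with hypothetical parameter values, and all function names are made up for the example.

```python
import random

def joint_mle(y, x):
    """Joint-normal MLE: returns (mu_y, mu_x, s_yy, s_xy, s_xx) with divisor n."""
    n = len(y)
    mu_y = sum(y) / n
    mu_x = sum(x) / n
    s_yy = sum((yi - mu_y) ** 2 for yi in y) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n
    s_xy = sum((yi - mu_y) * (xi - mu_x) for yi, xi in zip(y, x)) / n
    return mu_y, mu_x, s_yy, s_xy, s_xx

def theta2_from_joint(mu_y, mu_x, s_yy, s_xy, s_xx):
    """Map joint parameters to theta_2 = (beta1, beta2, sigma2)."""
    beta2 = s_xy / s_xx
    return mu_y - beta2 * mu_x, beta2, s_yy - s_xy ** 2 / s_xx

def conditional_mle(y, x):
    """Conditional MLE via the normal equations, then sigma2 = SSR / n."""
    n = len(y)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    b1 = (sy * sxx - sx * sxy) / det
    b2 = (n * sxy - sx * sy) / det
    s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / n
    return b1, b2, s2

# Simulated data with hypothetical parameter values.
rng = random.Random(7)
x = [rng.gauss(18.0, 12.0) for _ in range(200)]
y = [1.8 + 0.01 * xi + rng.gauss(0.0, 0.5) for xi in x]

via_joint = theta2_from_joint(*joint_mle(y, x))
via_conditional = conditional_mle(y, x)
print("theta_2 from joint MLE:      ", via_joint)
print("theta_2 from conditional MLE:", via_conditional)
```

The two triples agree up to floating-point error. The agreement relies on the conditions stated above: $\theta_1$ and $\theta_2$ enter separate factors, and neither restricts the other.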