STAT/MTHE 353: 5 – Moment Generating Functions and Multivariate Normal Distribution
T. Linder
Queen’s University
Winter 2017
STAT/MTHE353: 5 MGF & Multivariate Normal Distribution 1/34
Moment Generating Function

Definition. Let $X = (X_1,\ldots,X_n)^T$ be a random vector and $t = (t_1,\ldots,t_n)^T \in \mathbb{R}^n$. The moment generating function (MGF) is defined by
$$M_X(t) = E\big(e^{t^TX}\big)$$
for all $t$ for which the expectation exists (i.e., is finite).

Remarks:
- $M_X(t) = E\big(e^{\sum_{i=1}^n t_iX_i}\big)$
- For $\mathbf{0} = (0,\ldots,0)^T$, we have $M_X(\mathbf{0}) = 1$.
- If $X$ is a discrete random vector with finitely many values, then $M_X(t) = E\big(e^{t^TX}\big)$ is always finite for all $t \in \mathbb{R}^n$.
- We will always assume that the distribution of $X$ is such that $M_X(t)$ is finite for all $t \in (-t_0, t_0)^n$ for some $t_0 > 0$.
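As a small illustration of the definition, the sketch below computes $M_X(t)$ exactly for a discrete random vector in $\mathbb{R}^2$ with finitely many values, so the expectation is a finite sum and is finite for every $t$. The joint pmf `pmf` and the helper `mgf` are hypothetical examples, not taken from the notes.

```python
import math

# Hypothetical joint pmf of X = (X1, X2): support point -> probability.
pmf = {
    (0, 0): 0.2,
    (1, 0): 0.3,
    (0, 1): 0.1,
    (2, 1): 0.4,
}


def mgf(pmf, t):
    """Exact M_X(t) = E[exp(t^T X)] for a finitely supported random vector."""
    total = 0.0
    for x, p in pmf.items():
        dot = sum(ti * xi for ti, xi in zip(t, x))  # t^T x
        total += p * math.exp(dot)
    return total


print(mgf(pmf, (0.0, 0.0)))   # M_X(0): the probabilities sum to 1
print(mgf(pmf, (5.0, -3.0)))  # finite for every t because the support is finite
```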
The single most important property of the MGF is that it uniquely determines the distribution of a random vector:

Theorem 1. Assume $M_X(t)$ and $M_Y(t)$ are the MGFs of the random vectors $X$ and $Y$, and that $M_X(t) = M_Y(t)$ for all $t \in (-t_0, t_0)^n$. Then
$$F_X(z) = F_Y(z) \quad \text{for all } z \in \mathbb{R}^n,$$
where $F_X$ and $F_Y$ are the joint cdfs of $X$ and $Y$.

Remarks:
- $F_X(z) = F_Y(z)$ for all $z \in \mathbb{R}^n$ clearly implies $M_X(t) = M_Y(t)$. Thus
$$M_X(t) = M_Y(t) \iff F_X(z) = F_Y(z)$$
- Most often we will use the theorem for random variables instead of random vectors. In this case, $M_X(t) = M_Y(t)$ for all $t \in (-t_0, t_0)$ implies $F_X(z) = F_Y(z)$ for all $z \in \mathbb{R}$.
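Connection with moments

Let $k_1,\ldots,k_n$ be nonnegative integers and $k = k_1 + \cdots + k_n$. Then
$$
\begin{aligned}
\frac{\partial^k}{\partial t_1^{k_1}\cdots\partial t_n^{k_n}} M_X(t)
&= \frac{\partial^k}{\partial t_1^{k_1}\cdots\partial t_n^{k_n}} E\big(e^{t_1X_1+\cdots+t_nX_n}\big)
= E\Big(\frac{\partial^k}{\partial t_1^{k_1}\cdots\partial t_n^{k_n}} e^{t_1X_1+\cdots+t_nX_n}\Big)\\
&= E\big(X_1^{k_1}\cdots X_n^{k_n}\, e^{t_1X_1+\cdots+t_nX_n}\big)
\end{aligned}
$$
Setting $t = \mathbf{0} = (0,\ldots,0)^T$, we get
$$\frac{\partial^k}{\partial t_1^{k_1}\cdots\partial t_n^{k_n}} M_X(t)\Big|_{t=\mathbf{0}} = E\big(X_1^{k_1}\cdots X_n^{k_n}\big)$$
For a (scalar) random variable $X$ we obtain the $k$th moment of $X$:
$$\frac{d^k}{dt^k} M_X(t)\Big|_{t=0} = E(X^k)$$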
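The moment formulas can be checked numerically. The sketch below takes a hypothetical finitely supported pmf on $\mathbb{R}^2$ and approximates the partial derivatives of $M_X$ at $t = \mathbf{0}$ with central finite differences (a numerical stand-in for exact differentiation; the step `h` is an arbitrary small choice), then compares them with $E(X_1)$, $E(X_1^2)$ and $E(X_1X_2)$ computed directly from the pmf.

```python
import math

# Hypothetical joint pmf of X = (X1, X2) with finite support.
pmf = {(0, 0): 0.2, (1, 0): 0.3, (0, 1): 0.1, (2, 1): 0.4}


def mgf(t1, t2):
    """M_X(t) = E[exp(t1*X1 + t2*X2)], computed exactly from the pmf."""
    return sum(p * math.exp(t1 * x1 + t2 * x2) for (x1, x2), p in pmf.items())


h = 1e-4  # finite-difference step

# First partial in t1 at t = 0 approximates E(X1).
d_t1 = (mgf(h, 0) - mgf(-h, 0)) / (2 * h)
# Second partial in t1 at t = 0 approximates E(X1^2).
d_t1t1 = (mgf(h, 0) - 2 * mgf(0, 0) + mgf(-h, 0)) / h**2
# Mixed partial d^2/(dt1 dt2) at t = 0 approximates E(X1 X2).
d_t1t2 = (mgf(h, h) - mgf(h, -h) - mgf(-h, h) + mgf(-h, -h)) / (4 * h**2)

# Exact moments straight from the pmf, for comparison.
e_x1 = sum(p * x1 for (x1, _), p in pmf.items())
e_x1sq = sum(p * x1**2 for (x1, _), p in pmf.items())
e_x1x2 = sum(p * x1 * x2 for (x1, x2), p in pmf.items())

print(d_t1, e_x1)
print(d_t1t1, e_x1sq)
print(d_t1t2, e_x1x2)
```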

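Theorem 2. Assume $X_1,\ldots,X_m$ are independent random vectors in $\mathbb{R}^n$ and let $X = X_1 + \cdots + X_m$. Then
$$M_X(t) = \prod_{i=1}^m M_{X_i}(t)$$

Proof:
$$
\begin{aligned}
M_X(t) &= E\big(e^{t^TX}\big) = E\big(e^{t^T(X_1+\cdots+X_m)}\big) = E\big(e^{t^TX_1}\cdots e^{t^TX_m}\big)\\
&= E\big(e^{t^TX_1}\big)\cdots E\big(e^{t^TX_m}\big) = M_{X_1}(t)\cdots M_{X_m}(t)
\end{aligned}
$$
where the expectation of the product factors because $X_1,\ldots,X_m$ are independent. $\square$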

Note: This theorem gives us a powerful tool for determining the

distribution of the sum of independent random variables.
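To see Theorem 2 at work on a small case, the sketch below takes two hypothetical independent discrete random variables, $X_1 \sim$ Bernoulli(1/2) and $X_2 \sim$ Binomial(2, 1/2), computes the pmf of $X_1 + X_2$ by direct convolution, and compares its MGF with the product $M_{X_1}(t)M_{X_2}(t)$. The helper names are illustrative only.

```python
import math
from collections import defaultdict

# Two hypothetical independent discrete random variables, given by their pmfs.
pmf_x1 = {0: 0.5, 1: 0.5}            # Bernoulli(1/2)
pmf_x2 = {0: 0.25, 1: 0.5, 2: 0.25}  # Binomial(2, 1/2)


def mgf(pmf, t):
    """Scalar MGF E[exp(tX)] of a finitely supported random variable."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())


def pmf_of_sum(pmf_a, pmf_b):
    """pmf of X1 + X2 for independent X1, X2 (discrete convolution)."""
    out = defaultdict(float)
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] += pa * pb  # independence: P(X1=a, X2=b) = pa * pb
    return dict(out)


pmf_sum = pmf_of_sum(pmf_x1, pmf_x2)

for t in (-1.0, 0.3, 2.0):
    lhs = mgf(pmf_sum, t)                  # M_{X1+X2}(t)
    rhs = mgf(pmf_x1, t) * mgf(pmf_x2, t)  # M_{X1}(t) * M_{X2}(t)
    print(t, lhs, rhs)
```

Both sides also agree with $\big((1+e^t)/2\big)^3$, the MGF of a Binomial(3, 1/2) random variable, so by Theorem 1 the sum is Binomial(3, 1/2).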


Example: MGF for $X \sim \text{Gamma}(r,\lambda)$ and for $X_1 + \cdots + X_m$, where the $X_i$ are independent and $X_i \sim \text{Gamma}(r_i,\lambda)$.

Example: MGF for $X \sim \text{Poisson}(\lambda)$ and for $X_1 + \cdots + X_m$, where the $X_i$ are independent and $X_i \sim \text{Poisson}(\lambda_i)$. Also, use the MGF to find $E(X)$, $E(X^2)$, and $\mathrm{Var}(X)$.
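A sketch of the standard computations for these two examples, assuming the rate parameterization of the gamma pdf, $f(x) = \frac{\lambda^r}{\Gamma(r)}x^{r-1}e^{-\lambda x}$ for $x > 0$ (with a scale parameterization, replace $\lambda$ by its reciprocal). For the gamma case:
$$
\begin{aligned}
M_X(t) &= \int_0^\infty e^{tx}\,\frac{\lambda^r}{\Gamma(r)}x^{r-1}e^{-\lambda x}\,dx
= \frac{\lambda^r}{(\lambda-t)^r}\int_0^\infty \frac{(\lambda-t)^r}{\Gamma(r)}x^{r-1}e^{-(\lambda-t)x}\,dx
= \Big(\frac{\lambda}{\lambda-t}\Big)^r, \qquad t < \lambda,\\
M_{X_1+\cdots+X_m}(t) &= \prod_{i=1}^m \Big(\frac{\lambda}{\lambda-t}\Big)^{r_i} = \Big(\frac{\lambda}{\lambda-t}\Big)^{r_1+\cdots+r_m}.
\end{aligned}
$$
For the Poisson case:
$$
\begin{aligned}
M_X(t) &= \sum_{k=0}^\infty e^{tk}\,e^{-\lambda}\frac{\lambda^k}{k!} = e^{-\lambda}\sum_{k=0}^\infty \frac{(\lambda e^t)^k}{k!} = e^{\lambda(e^t-1)},\\
M_X'(t) &= \lambda e^t M_X(t), \qquad M_X''(t) = \big(\lambda e^t + \lambda^2 e^{2t}\big)M_X(t),\\
E(X) &= M_X'(0) = \lambda, \qquad E(X^2) = M_X''(0) = \lambda + \lambda^2, \qquad \mathrm{Var}(X) = E(X^2) - E(X)^2 = \lambda,\\
M_{X_1+\cdots+X_m}(t) &= \prod_{i=1}^m e^{\lambda_i(e^t-1)} = e^{(\lambda_1+\cdots+\lambda_m)(e^t-1)}.
\end{aligned}
$$
By Theorem 1, $X_1+\cdots+X_m \sim \text{Gamma}(r_1+\cdots+r_m,\lambda)$ in the first example and $X_1+\cdots+X_m \sim \text{Poisson}(\lambda_1+\cdots+\lambda_m)$ in the second.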


Theorem 3. Assume $X$ is a random vector in $\mathbb{R}^n$, $A$ is an $m\times n$ real matrix and $b \in \mathbb{R}^m$. Then the MGF of $Y = AX + b$ is given at $t \in \mathbb{R}^m$ by
$$M_Y(t) = e^{t^Tb}\, M_X(A^Tt)$$

Proof:
$$M_Y(t) = E\big(e^{t^TY}\big) = E\big(e^{t^T(AX+b)}\big) = e^{t^Tb}\,E\big(e^{t^TAX}\big) = e^{t^Tb}\,E\big(e^{(A^Tt)^TX}\big) = e^{t^Tb}\,M_X(A^Tt) \qquad \square$$

Note: In the scalar case $Y = aX + b$ we obtain
$$M_Y(t) = e^{tb}\, M_X(at)$$
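A quick numerical check of Theorem 3, assuming a hypothetical finitely supported $X \in \mathbb{R}^2$ and a $3\times 2$ matrix $A$ (so $m \ne n$): the sketch pushes every support point through $x \mapsto Ax + b$ to get the pmf of $Y$, computes $M_Y(t)$ directly, and compares it with $e^{t^Tb}M_X(A^Tt)$. The small matrix helpers are written out by hand to keep the example dependency-free.

```python
import math

# Hypothetical finitely supported X in R^2 (support point -> probability).
pmf_x = {(0, 0): 0.2, (1, 0): 0.3, (0, 1): 0.1, (2, 1): 0.4}

A = [[1.0, 2.0],
     [0.0, -1.0],
     [3.0, 1.0]]  # a 3 x 2 matrix, so Y = AX + b lives in R^3
b = [0.5, -1.0, 2.0]


def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def mgf(pmf, t):
    """E[exp(t^T X)] for a finitely supported random vector."""
    return sum(p * math.exp(dot(t, x)) for x, p in pmf.items())


# Distribution of Y = AX + b: map every support point through the affine map.
pmf_y = {}
for x, p in pmf_x.items():
    y = tuple(ai + bi for ai, bi in zip(matvec(A, x), b))
    pmf_y[y] = pmf_y.get(y, 0.0) + p

t = [0.2, -0.4, 0.1]
direct = mgf(pmf_y, t)                                                # M_Y(t)
via_thm3 = math.exp(dot(t, b)) * mgf(pmf_x, matvec(transpose(A), t))  # e^{t^T b} M_X(A^T t)
print(direct, via_thm3)
```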

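Applications to Normal Distribution

Let $X \sim N(0,1)$. Then
$$
\begin{aligned}
M_X(t) = E(e^{tX}) &= \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx
= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac12(x^2-2tx)}\,dx
= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac12\left((x-t)^2-t^2\right)}\,dx\\
&= e^{t^2/2}\underbrace{\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac12(x-t)^2}\,dx}_{\text{integral of the } N(t,1) \text{ pdf}} = e^{t^2/2}
\end{aligned}
$$
We obtain that for all $t \in \mathbb{R}$
$$M_X(t) = e^{t^2/2}$$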
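The closed form $e^{t^2/2}$ can be compared against a direct numerical evaluation of the integral. The sketch below uses the trapezoidal rule on a wide but finite interval (the cutoffs `lo`, `hi` and the grid size `n` are arbitrary choices; the neglected tails are astronomically small for moderate $t$).

```python
import math


def integrand(x, t):
    # e^{tx} times the N(0,1) pdf, combined in one exponent to avoid overflow/underflow
    return math.exp(t * x - x * x / 2) / math.sqrt(2 * math.pi)


def mgf_numeric(t, lo=-40.0, hi=40.0, n=8000):
    """Trapezoidal-rule approximation of E(e^{tX}) for X ~ N(0, 1) on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.5 * (integrand(lo, t) + integrand(hi, t))
    for k in range(1, n):
        total += integrand(lo + k * h, t)
    return total * h


for t in (-2.0, 0.0, 1.5):
    print(t, mgf_numeric(t), math.exp(t * t / 2))
```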

(4) $C$ has a unique nonnegative definite square root $C^{1/2}$, i.e., there exists a unique nonnegative definite $A$ such that
$$C = AA$$

Proof: We only prove the existence of $A$. Let $D^{1/2} = \mathrm{diag}\big(\lambda_1^{1/2},\ldots,\lambda_n^{1/2}\big)$ and note that $D^{1/2}D^{1/2} = D$. Let $A = BD^{1/2}B^T$. Then $A$ is nonnegative definite and, using $B^TB = I$,
$$
\begin{aligned}
A^2 = AA &= \big(BD^{1/2}B^T\big)\big(BD^{1/2}B^T\big) = BD^{1/2}B^TBD^{1/2}B^T\\
&= BD^{1/2}D^{1/2}B^T = BDB^T = C \qquad \square
\end{aligned}
$$

Remarks:
- If $C$ is positive definite, then so is $A$.
- If we don't require that $A$ be nonnegative definite, then in general there are infinitely many solutions $A$ for $AA^T = C$.
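For a $2\times 2$ symmetric nonnegative definite matrix, the construction in the proof can be carried out in closed form. The sketch below (pure Python, with hypothetical helper names `sqrt_psd_2x2` and `matmul`) computes the eigenvalues and an orthonormal eigenvector basis of $C$ by hand and returns $A = BD^{1/2}B^T$; in practice a linear-algebra library would normally be used instead.

```python
import math


def sqrt_psd_2x2(C):
    """Nonnegative definite square root of a symmetric PSD 2x2 matrix C,
    built as A = B D^{1/2} B^T from an eigendecomposition C = B D B^T."""
    a, b, c = C[0][0], C[0][1], C[1][1]
    if b == 0:  # already diagonal: B = I
        return [[math.sqrt(a), 0.0], [0.0, math.sqrt(c)]]
    m = (a + c) / 2
    r = math.hypot((a - c) / 2, b)
    lam1, lam2 = m + r, m - r          # eigenvalues (lam2 >= 0 for PSD C)
    v1 = (b, lam1 - a)                 # eigenvector for lam1
    norm = math.hypot(*v1)
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])               # orthogonal unit vector: eigenvector for lam2
    B = [[v1[0], v2[0]], [v1[1], v2[1]]]  # columns are the eigenvectors
    d = [math.sqrt(max(lam1, 0.0)), math.sqrt(max(lam2, 0.0))]  # guard round-off
    # A[i][j] = sum_k B[i][k] * d[k] * B[j][k]
    return [[sum(B[i][k] * d[k] * B[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


C = [[4.0, 2.0], [2.0, 3.0]]
A = sqrt_psd_2x2(C)
print(A)
print(matmul(A, A))  # should reproduce C up to round-off
```

Because the formula is specific to $2\times 2$ matrices, it is meant only to make the proof concrete, not as a general routine.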


Lemma 4. If $\Sigma$ is the covariance matrix of some random vector $X = (X_1,\ldots,X_n)^T$, then it is nonnegative definite.

Proof: We know that $\Sigma = \mathrm{Cov}(X)$ is symmetric. Let $b \in \mathbb{R}^n$ be arbitrary. Then
$$b^T\Sigma b = b^T\,\mathrm{Cov}(X)\,b = \mathrm{Cov}(b^TX) = \mathrm{Var}(b^TX) \ge 0,$$
so $\Sigma$ is nonnegative definite. $\square$

Remark: It can be shown that an $n\times n$ matrix $\Sigma$ is nonnegative definite if and only if there exists a random vector $X = (X_1,\ldots,X_n)^T$ such that $\mathrm{Cov}(X) = \Sigma$.
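The identity $b^T\Sigma b = \mathrm{Var}(b^TX)$ also holds for an empirical covariance matrix computed from data (with the same divisor on both sides), which gives a concrete way to see nonnegative definiteness. The sketch below uses hypothetical simulated data in $\mathbb{R}^3$ and a few random vectors $b$; the helper names are illustrative.

```python
import random

random.seed(1)

# Hypothetical sample: 500 draws of a correlated random vector in R^3.
data = []
for _ in range(500):
    u, v, w = (random.gauss(0.0, 1.0) for _ in range(3))
    data.append((u, u + 0.5 * v, 2.0 * w - u))


def mean(xs):
    return sum(xs) / len(xs)


def cov_matrix(rows):
    """Empirical covariance matrix (divisor N) of a list of equal-length tuples."""
    d = len(rows[0])
    m = [mean([r[j] for r in rows]) for j in range(d)]
    return [[mean([(r[i] - m[i]) * (r[j] - m[j]) for r in rows]) for j in range(d)]
            for i in range(d)]


def variance(xs):
    m = mean(xs)
    return mean([(x - m) ** 2 for x in xs])


S = cov_matrix(data)
pairs = []
for _ in range(5):
    b = [random.uniform(-3.0, 3.0) for _ in range(3)]
    quad = sum(b[i] * S[i][j] * b[j] for i in range(3) for j in range(3))       # b^T S b
    var_bx = variance([sum(bi * xi for bi, xi in zip(b, row)) for row in data])  # Var(b^T X)
    pairs.append((quad, var_bx))
    print(quad, var_bx)
```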


Defining the Multivariate Normal Distribution

Let $Z_1,\ldots,Z_n$ be independent r.v.'s with $Z_i \sim N(0,1)$. The multivariate MGF of $Z = (Z_1,\ldots,Z_n)^T$ is
$$M_Z(t) = E\big(e^{t^TZ}\big) = E\big(e^{\sum_{i=1}^n t_iZ_i}\big) = \prod_{i=1}^n E\big(e^{t_iZ_i}\big) = \prod_{i=1}^n e^{t_i^2/2} = e^{\sum_{i=1}^n t_i^2/2} = e^{\frac12 t^Tt}$$

Now let $\mu \in \mathbb{R}^n$ and $A$ an $n\times n$ real matrix. Then the MGF of $X = AZ + \mu$ is
$$
\begin{aligned}
M_X(t) &= e^{t^T\mu}\,M_Z(A^Tt) = e^{t^T\mu}\,e^{\frac12 (A^Tt)^T(A^Tt)}\\
&= e^{t^T\mu}\,e^{\frac12 t^TAA^Tt} = e^{t^T\mu + \frac12 t^T\Sigma t}
\end{aligned}
$$
where $\Sigma = AA^T$. Note that $\Sigma$ is nonnegative definite.
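The formula $M_X(t) = e^{t^T\mu + \frac12 t^T\Sigma t}$ can be checked by simulation: draw $Z \sim N(\mathbf{0}, I)$, form $X = AZ + \mu$, and average $e^{t^TX}$. The sketch below does this for a hypothetical $2\times 2$ matrix $A$, mean $\mu$ and point $t$; the Monte Carlo estimate should agree with the closed form to within sampling error.

```python
import math
import random

random.seed(0)

mu = [1.0, -2.0]
A = [[1.0, 0.0],
     [0.8, 0.6]]  # hypothetical 2 x 2 matrix
# Sigma = A A^T
Sigma = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)] for i in range(2)]


def mgf_closed_form(t):
    """exp(t^T mu + (1/2) t^T Sigma t)."""
    quad = sum(t[i] * Sigma[i][j] * t[j] for i in range(2) for j in range(2))
    return math.exp(sum(ti * mi for ti, mi in zip(t, mu)) + 0.5 * quad)


def mgf_monte_carlo(t, n=50_000):
    """Average of exp(t^T X) over draws X = A Z + mu with Z ~ N(0, I)."""
    acc = 0.0
    for _ in range(n):
        z = [random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)]
        x = [sum(A[i][k] * z[k] for k in range(2)) + mu[i] for i in range(2)]
        acc += math.exp(t[0] * x[0] + t[1] * x[1])
    return acc / n


t = [0.3, -0.2]
estimate = mgf_monte_carlo(t)
print(estimate, mgf_closed_form(t))
```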

Definition. Let $\mu \in \mathbb{R}^n$ and let $\Sigma$ be an $n\times n$ nonnegative definite matrix. A random vector $X = (X_1,\ldots,X_n)^T$ is said to have a multivariate normal distribution with parameters $\mu$ and $\Sigma$ if its multivariate MGF is
$$M_X(t) = e^{t^T\mu + \frac12 t^T\Sigma t}$$
Notation: $X \sim N(\mu,\Sigma)$.

Remarks:
- If $Z = (Z_1,\ldots,Z_n)^T$ with independent $Z_i \sim N(0,1)$, $i = 1,\ldots,n$, then $Z \sim N(\mathbf{0}, I)$, where $I$ is the $n\times n$ identity matrix.
- We saw that if $Z \sim N(\mathbf{0}, I)$, then $X = AZ + \mu \sim N(\mu,\Sigma)$, where $\Sigma = AA^T$. One can show the following: $X \sim N(\mu,\Sigma)$ if and only if $X = AZ + \mu$ for a random $n$-vector $Z \sim N(\mathbf{0}, I)$ and some $n\times n$ matrix $A$ with $\Sigma = AA^T$.

Mean and covariance for multivariate normal distribution

Consider first $Z \sim N(\mathbf{0}, I)$, i.e., $Z = (Z_1,\ldots,Z_n)^T$, where the $Z_i$ are independent $N(0,1)$ random variables. Then
$$E(Z) = \big(E(Z_1),\ldots,E(Z_n)\big)^T = (0,\ldots,0)^T$$
and
$$E\big[(Z_i - E(Z_i))(Z_j - E(Z_j))\big] = E(Z_iZ_j) = \begin{cases} 1 & \text{if } i = j,\\ 0 & \text{if } i \neq j.\end{cases}$$
Thus
$$E(Z) = \mathbf{0}, \qquad \mathrm{Cov}(Z) = I$$


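If $X \sim N(\mu,\Sigma)$, then $X = AZ + \mu$ for a random $n$-vector $Z \sim N(\mathbf{0}, I)$ and some $n\times n$ matrix $A$ with $\Sigma = AA^T$. We have
$$E(AZ + \mu) = A\,E(Z) + \mu = \mu$$
Also,
$$\mathrm{Cov}(AZ + \mu) = \mathrm{Cov}(AZ) = A\,\mathrm{Cov}(Z)\,A^T = AA^T = \Sigma$$
Thus
$$E(X) = \mu, \qquad \mathrm{Cov}(X) = \Sigma$$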
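The representation $X = AZ + \mu$ is also a practical recipe for sampling from $N(\mu,\Sigma)$. The sketch below assumes a hypothetical $2\times 2$ example and uses the lower-triangular Cholesky factor as $A$ (one valid choice with $AA^T = \Sigma$; the symmetric square root $\Sigma^{1/2}$ would work equally well), then checks that the sample mean and covariance are close to $\mu$ and $\Sigma$.

```python
import math
import random

random.seed(42)

mu = [2.0, -1.0]
Sigma = [[4.0, 1.2],
         [1.2, 1.0]]

# Lower-triangular Cholesky factor A with A A^T = Sigma (2 x 2 case by hand).
l11 = math.sqrt(Sigma[0][0])
l21 = Sigma[1][0] / l11
l22 = math.sqrt(Sigma[1][1] - l21 ** 2)
A = [[l11, 0.0], [l21, l22]]


def sample_mvn(n):
    """Draw n samples of X = A Z + mu with Z ~ N(0, I)."""
    out = []
    for _ in range(n):
        z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
        out.append((A[0][0] * z1 + mu[0],
                    A[1][0] * z1 + A[1][1] * z2 + mu[1]))
    return out


xs = sample_mvn(100_000)
n = len(xs)
mean = [sum(x[i] for x in xs) / n for i in range(2)]
cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in xs) / (n - 1)
        for j in range(2)] for i in range(2)]
print(mean)  # close to mu
print(cov)   # close to Sigma
```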


Joint pdf for multivariate normal distribution

Lemma 5. If a random vector $X = (X_1,\ldots,X_n)^T$ has covariance matrix $\Sigma$ that is not of full rank (i.e., singular), then $X$ does not have a joint pdf.

Proof sketch: If $\Sigma$ is singular, then there exists $b \in \mathbb{R}^n$ such that $b \neq \mathbf{0}$ and $\Sigma b = \mathbf{0}$. Consider the random variable $b^TX = \sum_{i=1}^n b_iX_i$:
$$\mathrm{Var}(b^TX) = \mathrm{Cov}(b^TX) = b^T\,\mathrm{Cov}(X)\,b = b^T\Sigma b = 0$$
Therefore $P(b^TX = c) = 1$ for some constant $c$. If $X$ had a joint pdf $f(x)$, then for $B = \{x : b^Tx = c\}$ we would have
$$1 = P(b^TX = c) = P(X \in B) = \int\!\!\cdots\!\!\int_B f(x_1,\ldots,x_n)\,dx_1\cdots dx_n$$
But this is impossible since $B$ is an $(n-1)$-dimensional hyperplane whose $n$-dimensional volume is zero, so the integral must be zero. $\square$
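A minimal concrete instance of Lemma 5, assuming the hypothetical example $X = (Z, 2Z)^T$ with $Z \sim N(0,1)$: here $\Sigma = \begin{pmatrix}1 & 2\\ 2 & 4\end{pmatrix}$ is singular, $b = (2,-1)^T$ satisfies $\Sigma b = \mathbf{0}$, and every simulated draw lies on the line $2x_1 - x_2 = 0$.

```python
import random

random.seed(7)

# Hypothetical singular example: X = (Z, 2Z) with Z ~ N(0, 1).
Sigma = [[1.0, 2.0], [2.0, 4.0]]
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]

b = [2.0, -1.0]  # a nonzero vector with Sigma b = 0
Sigma_b = [sum(Sigma[i][j] * b[j] for j in range(2)) for i in range(2)]

samples = []
for _ in range(1000):
    z = random.gauss(0.0, 1.0)
    samples.append((z, 2.0 * z))

# b^T X equals the same constant (here c = 0) for every draw: all probability
# mass sits on the line {x : 2 x1 - x2 = 0}, which has zero area in R^2.
values = [b[0] * x1 + b[1] * x2 for x1, x2 in samples]
print(det, Sigma_b, max(abs(v) for v in values))
```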

Theorem 6. If $X = (X_1,\ldots,X_n)^T \sim N(\mu,\Sigma)$, where $\Sigma$ is nonsingular, then it has a joint pdf given by
$$f_X(x) = \frac{1}{\sqrt{(2\pi)^n\det\Sigma}}\; e^{-\frac12 (x-\mu)^T\Sigma^{-1}(x-\mu)}, \qquad x \in \mathbb{R}^n$$

Proof: We know that $X = AZ + \mu$ where $Z = (Z_1,\ldots,Z_n)^T \sim N(\mathbf{0}, I)$ and $A$ is an $n\times n$ matrix such that $AA^T = \Sigma$. Since $\Sigma$ is nonsingular, $A$ must be nonsingular with inverse $A^{-1}$. Thus the mapping
$$h(z) = Az + \mu$$
is invertible with inverse $g(x) = A^{-1}(x - \mu)$ whose Jacobian is
$$J_g(x) = \det A^{-1}$$
By the multivariate transformation theorem
$$f_X(x) = f_Z\big(g(x)\big)\,|J_g(x)| = f_Z\big(A^{-1}(x-\mu)\big)\,\big|\det A^{-1}\big|$$
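One standard way to finish the computation from this point (a sketch, using $f_Z(z) = (2\pi)^{-n/2}e^{-\frac12 z^Tz}$ for $Z \sim N(\mathbf{0}, I)$, together with $(A^{-1})^TA^{-1} = (AA^T)^{-1} = \Sigma^{-1}$ and $\det\Sigma = \det A\,\det A^T = (\det A)^2$):
$$
\begin{aligned}
f_Z\big(A^{-1}(x-\mu)\big) &= \frac{1}{(2\pi)^{n/2}}\, e^{-\frac12 (x-\mu)^T(A^{-1})^TA^{-1}(x-\mu)} = \frac{1}{(2\pi)^{n/2}}\, e^{-\frac12 (x-\mu)^T\Sigma^{-1}(x-\mu)},\\
\big|\det A^{-1}\big| &= \frac{1}{|\det A|} = \frac{1}{\sqrt{\det\Sigma}}.
\end{aligned}
$$
Multiplying the two factors gives $f_X(x) = \frac{1}{\sqrt{(2\pi)^n\det\Sigma}}\,e^{-\frac12 (x-\mu)^T\Sigma^{-1}(x-\mu)}$, as stated in Theorem 6.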

In general, the following important facts can be proved using the multivariate MGF:

(i) If $X = (X_1,\ldots,X_n)^T \sim N(\mu,\Sigma)$, then $X_1, X_2,\ldots,X_n$ are independent if and only if they are uncorrelated, i.e., $\mathrm{Cov}(X_i,X_j) = 0$ if $i \neq j$, i.e., $\Sigma$ is a diagonal matrix.

(ii) Assume $X = (X_1,\ldots,X_n)^T \sim N(\mu,\Sigma)$ and let
$$\mathbf{X}_1 = (X_1,\ldots,X_k)^T, \qquad \mathbf{X}_2 = (X_{k+1},\ldots,X_n)^T$$
Then $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent if and only if $\mathrm{Cov}(\mathbf{X}_1,\mathbf{X}_2) = 0_{k\times(n-k)}$, the $k\times(n-k)$ matrix of zeros, i.e., $\Sigma$ can be partitioned as
$$\Sigma = \begin{pmatrix} \Sigma_{11} & 0_{k\times(n-k)} \\ 0_{(n-k)\times k} & \Sigma_{22} \end{pmatrix}$$
where $\Sigma_{11} = \mathrm{Cov}(\mathbf{X}_1)$ and $\Sigma_{22} = \mathrm{Cov}(\mathbf{X}_2)$.
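Fact (i) can be seen directly from the density in Theorem 6 when $n = 2$: with a diagonal $\Sigma$ the joint pdf equals the product of the two univariate normal pdfs, while with a nonzero covariance (and the same marginal variances) it does not. The sketch below uses hypothetical values of $\mu$, $\Sigma$ and the evaluation point; `mvn_pdf_2d` and `normal_pdf` are illustrative helper names.

```python
import math


def mvn_pdf_2d(x, mu, Sigma):
    """Bivariate N(mu, Sigma) density from Theorem 6 (Sigma must be nonsingular)."""
    a, b = Sigma[0]
    c, d = Sigma[1]
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # 2 x 2 inverse
    r = [x[0] - mu[0], x[1] - mu[1]]
    quad = sum(r[i] * inv[i][j] * r[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** 2 * det)


def normal_pdf(x, m, var):
    return math.exp(-(x - m) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


mu = [1.0, -1.0]
point = [0.3, 0.4]

# Diagonal Sigma (uncorrelated): joint pdf equals the product of the marginals.
diag = [[2.0, 0.0], [0.0, 0.5]]
joint = mvn_pdf_2d(point, mu, diag)
product = normal_pdf(point[0], mu[0], 2.0) * normal_pdf(point[1], mu[1], 0.5)
print(joint, product)

# Nonzero covariance, same marginal variances: the joint pdf no longer factors.
corr = [[2.0, 0.7], [0.7, 0.5]]
print(mvn_pdf_2d(point, mu, corr), product)
```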


Marginals of multivariate normal distributions

Let $X = (X_1,\ldots,X_n)^T \sim N(\mu,\Sigma)$. If $A$ is an $m\times n$ matrix and $b \in \mathbb{R}^m$, then
$$Y = AX + b$$
is a random $m$-vector. Its MGF at $t \in \mathbb{R}^m$ is
$$M_Y(t) = e^{t^Tb}\,M_X(A^Tt)$$
Since $M_X(\tau) = e^{\tau^T\mu + \frac12\tau^T\Sigma\tau}$ for all $\tau \in \mathbb{R}^n$, we obtain
$$
\begin{aligned}
M_Y(t) &= e^{t^Tb}\,e^{(A^Tt)^T\mu + \frac12 (A^Tt)^T\Sigma(A^Tt)}\\
&= e^{t^T(b + A\mu) + \frac12 t^TA\Sigma A^Tt}
\end{aligned}
$$
This means that $Y \sim N(b + A\mu, A\Sigma A^T)$, i.e., $Y$ is multivariate normal with mean $b + A\mu$ and covariance $A\Sigma A^T$.

Example: Let $a_1,\ldots,a_n \in \mathbb{R}$ and determine the distribution of $Y = a_1X_1 + \cdots + a_nX_n$.
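Taking $A = (a_1,\ldots,a_n)$ (a $1\times n$ matrix) and $b = 0$ in the result above gives $Y \sim N\big(\sum_i a_i\mu_i,\ \sum_{i,j} a_ia_j\Sigma_{ij}\big)$. The sketch below computes these two parameters for a hypothetical 3-dimensional example (with $\Sigma = LL^T$ for a chosen lower-triangular `L`, so that $X = LZ + \mu$ can be simulated) and compares them with the sample mean and variance of simulated values of $Y$.

```python
import random

random.seed(3)

# Hypothetical 3-dimensional example: choose L, set Sigma = L L^T and X = L Z + mu.
L = [[1.0, 0.0, 0.0],
     [0.5, 1.0, 0.0],
     [-0.3, 0.2, 0.8]]
mu = [1.0, 0.0, -2.0]
Sigma = [[sum(L[i][k] * L[j][k] for k in range(3)) for j in range(3)] for i in range(3)]

a = [2.0, -1.0, 0.5]  # coefficients in Y = a1 X1 + a2 X2 + a3 X3

# Parameters predicted by Y ~ N(a^T mu, a^T Sigma a).
mean_y = sum(ai * mi for ai, mi in zip(a, mu))
var_y = sum(a[i] * Sigma[i][j] * a[j] for i in range(3) for j in range(3))

# Monte Carlo check.
ys = []
for _ in range(100_000):
    z = [random.gauss(0.0, 1.0) for _ in range(3)]
    x = [sum(L[i][k] * z[k] for k in range(3)) + mu[i] for i in range(3)]
    ys.append(sum(ai * xi for ai, xi in zip(a, x)))
m_hat = sum(ys) / len(ys)
v_hat = sum((y - m_hat) ** 2 for y in ys) / (len(ys) - 1)
print(mean_y, m_hat)
print(var_y, v_hat)
```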


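For some $1 \le m < n$ let $\{i_1,\ldots,i_m\} \subset \{1,\ldots,n\}$ such that $i_1 < i_2 < \cdots < i_m$. Let $e_j = (0,\ldots,0,1,0,\ldots,0)^T$ be the $j$th unit vector in $\mathbb{R}^n$ and define the $m\times n$ matrix $A$ by
$$A = \begin{pmatrix} e_{i_1}^T \\ \vdots \\ e_{i_m}^T \end{pmatrix}$$
Then
$$AX = \begin{pmatrix} e_{i_1}^T \\ \vdots \\ e_{i_m}^T \end{pmatrix}\begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix} = \begin{pmatrix} X_{i_1} \\ \vdots \\ X_{i_m} \end{pmatrix}$$
Thus $(X_{i_1},\ldots,X_{i_m})^T \sim N(A\mu, A\Sigma A^T)$.

Note the following:
$$A\mu = \begin{pmatrix} \mu_{i_1} \\ \vdots \\ \mu_{i_m} \end{pmatrix}$$
and the $(j,k)$th entry of $A\Sigma A^T$ is
$$(A\Sigma A^T)_{jk} = \big(A \times (i_k\text{th column of } \Sigma)\big)_j = (\Sigma)_{i_j i_k} = \mathrm{Cov}(X_{i_j}, X_{i_k})$$
Thus if $X = (X_1,\ldots,X_n)^T \sim N(\mu,\Sigma)$, then $(X_{i_1},\ldots,X_{i_m})^T$ is multivariate normal whose mean and covariance are obtained by picking out the corresponding elements of $\mu$ and $\Sigma$.

Special case: For $m = 1$ we obtain that $X_i \sim N(\mu_i, \sigma_i^2)$, where $\mu_i = E(X_i)$ and $\sigma_i^2 = \mathrm{Var}(X_i)$, for all $i = 1,\ldots,n$.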
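The "picking out" rule can be checked mechanically. The sketch below builds the selection matrix $A$ from unit vectors for a hypothetical 4-dimensional $\mu$ and $\Sigma$, computes $A\mu$ and $A\Sigma A^T$ by ordinary matrix arithmetic, and compares them with the entries of $\mu$ and $\Sigma$ indexed directly. Note that Python indices are 0-based, so `idx = [0, 2, 3]` corresponds to $(X_1, X_3, X_4)$.

```python
# Hypothetical 4-dimensional mean and covariance.
mu = [0.5, -1.0, 2.0, 3.0]
Sigma = [[4.0, 1.0, 0.5, 0.0],
         [1.0, 3.0, 0.2, 0.1],
         [0.5, 0.2, 2.0, 0.4],
         [0.0, 0.1, 0.4, 1.0]]
n = len(mu)

idx = [0, 2, 3]  # 0-based versions of i_1 < i_2 < i_3 (here X1, X3, X4)


def unit(j, n):
    return [1.0 if k == j else 0.0 for k in range(n)]


A = [unit(i, n) for i in idx]  # rows are e_{i_1}^T, ..., e_{i_m}^T

A_mu = [sum(A[r][k] * mu[k] for k in range(n)) for r in range(len(A))]
A_Sigma_At = [[sum(A[r][k] * Sigma[k][l] * A[s][l] for k in range(n) for l in range(n))
               for s in range(len(A))] for r in range(len(A))]

# Direct "pick out the entries" answer.
picked_mu = [mu[i] for i in idx]
picked_Sigma = [[Sigma[i][j] for j in idx] for i in idx]

print(A_mu == picked_mu, A_Sigma_At == picked_Sigma)
```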

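Conditional distributions

Let $X = (X_1,\ldots,X_n)^T \sim N(\mu,\Sigma)$ and for $1 \le m < n$ define
$$\mathbf{X}_1 = (X_1,\ldots,X_m)^T, \qquad \mathbf{X}_2 = (X_{m+1},\ldots,X_n)^T$$
We know that $\mathbf{X}_1 \sim N(\mu_1,\Sigma_{11})$ and $\mathbf{X}_2 \sim N(\mu_2,\Sigma_{22})$, where $\mu_i = E(\mathbf{X}_i)$, $\Sigma_{ii} = \mathrm{Cov}(\mathbf{X}_i)$, $i = 1,2$. Then $\mu$ and $\Sigma$ can be partitioned as
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}$$
where $\Sigma_{ij} = \mathrm{Cov}(\mathbf{X}_i,\mathbf{X}_j)$, $i,j = 1,2$. Note that $\Sigma_{11}$ is $m\times m$, $\Sigma_{22}$ is $(n-m)\times(n-m)$, $\Sigma_{12}$ is $m\times(n-m)$, and $\Sigma_{21}$ is $(n-m)\times m$. Also, $\Sigma_{21} = \Sigma_{12}^T$.

We assume that $\Sigma_{11}$ is nonsingular and we want to determine the conditional distribution of $\mathbf{X}_2$ given $\mathbf{X}_1 = \mathbf{x}_1$.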


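Recall that $X = AZ + \mu$ for some $Z = (Z_1,\ldots,Z_n)^T$, where the $Z_i$ are independent $N(0,1)$ random variables and $A$ is such that $AA^T = \Sigma$. Let $\mathbf{Z}_1 = (Z_1,\ldots,Z_m)^T$ and $\mathbf{Z}_2 = (Z_{m+1},\ldots,Z_n)^T$. We want to determine such an $A$ in a partitioned form with dimensions corresponding to the partitioning of $\Sigma$:
$$A = \begin{pmatrix} B & 0_{m\times(n-m)} \\ C & D \end{pmatrix}$$
We can write $\Sigma = AA^T$ as
$$\begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
= \begin{pmatrix} B & 0_{m\times(n-m)} \\ C & D \end{pmatrix}\begin{pmatrix} B^T & C^T \\ 0_{(n-m)\times m} & D^T \end{pmatrix}
= \begin{pmatrix} BB^T & BC^T \\ CB^T & CC^T + DD^T \end{pmatrix}$$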


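We want to solve for $B$, $C$ and $D$. First consider $BB^T = \Sigma_{11}$. We choose $B$ to be the unique positive definite square root of $\Sigma_{11}$:
$$B = \Sigma_{11}^{1/2}$$
Recall that $B$ is symmetric, and it is invertible since $\Sigma_{11}$ is. Then $\Sigma_{21} = CB^T$ implies
$$C = \Sigma_{21}(B^T)^{-1} = \Sigma_{21}B^{-1}$$
Then $\Sigma_{22} = CC^T + DD^T$ gives
$$
\begin{aligned}
DD^T &= \Sigma_{22} - CC^T = \Sigma_{22} - \Sigma_{21}B^{-1}B^{-1}(\Sigma_{21})^T\\
&= \Sigma_{22} - \Sigma_{21}(BB)^{-1}\Sigma_{12} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}
\end{aligned}
$$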
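In the smallest case $n = 2$, $m = 1$ every block is a scalar, and the recipe for $B$, $C$ and $D$ can be followed step by step. The sketch below uses a hypothetical $\Sigma$ and takes $D$ to be the nonnegative square root of $DD^T$ (one possible choice); it then checks that the resulting $A$ satisfies $AA^T = \Sigma$. For $m = 1$ this $A$ happens to coincide with the lower-triangular Cholesky factor of $\Sigma$.

```python
import math

# Hypothetical 2 x 2 covariance, partitioned with m = 1 (all blocks are scalars).
s11, s12, s22 = 4.0, 1.2, 1.0
s21 = s12

B = math.sqrt(s11)           # B = Sigma11^{1/2}
C = s21 / B                  # C = Sigma21 B^{-1}
DDt = s22 - s21 * s12 / s11  # D D^T = Sigma22 - Sigma21 Sigma11^{-1} Sigma12
D = math.sqrt(DDt)           # one choice of D: the nonnegative square root

A = [[B, 0.0],
     [C, D]]
AAt = [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
print(AAt)  # should equal [[s11, s12], [s21, s22]] up to round-off
```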

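Now note that $X = AZ + \mu$ gives
$$\mathbf{X}_1 = B\mathbf{Z}_1 + \mu_1, \qquad \mathbf{X}_2 = C\mathbf{Z}_1 + D\mathbf{Z}_2 + \mu_2$$
Since $B$ is invertible, given $\mathbf{X}_1 = \mathbf{x}_1$ we have $\mathbf{Z}_1 = B^{-1}(\mathbf{x}_1 - \mu_1)$. So, given $\mathbf{X}_1 = \mathbf{x}_1$, the conditional distribution of $\mathbf{X}_2$ and the conditional distribution of
$$CB^{-1}(\mathbf{x}_1 - \mu_1) + D\mathbf{Z}_2 + \mu_2$$
are the same.

But $\mathbf{Z}_2$ is independent of $\mathbf{X}_1$ (since $\mathbf{X}_1 = B\mathbf{Z}_1 + \mu_1$ depends only on $\mathbf{Z}_1$), so given $\mathbf{X}_1 = \mathbf{x}_1$, the conditional distribution of $CB^{-1}(\mathbf{x}_1 - \mu_1) + D\mathbf{Z}_2 + \mu_2$ is the same as its unconditional distribution.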

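We conclude that the conditional distribution of $\mathbf{X}_2$ given $\mathbf{X}_1 = \mathbf{x}_1$ is multivariate normal with mean
$$
\begin{aligned}
E(\mathbf{X}_2 \mid \mathbf{X}_1 = \mathbf{x}_1) &= \mu_2 + CB^{-1}(\mathbf{x}_1 - \mu_1)\\
&= \mu_2 + \Sigma_{21}B^{-1}B^{-1}(\mathbf{x}_1 - \mu_1)\\
&= \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(\mathbf{x}_1 - \mu_1)
\end{aligned}
$$
and covariance matrix
$$\Sigma_{22|1} = DD^T = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$$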
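A rough empirical check of the conditional mean and covariance in the bivariate case ($n = 2$, $m = 1$), under hypothetical parameter values: the sketch simulates many pairs $(X_1, X_2)$, keeps only the draws with $X_1$ in a narrow window around $x_1 = 2$ (a crude stand-in for conditioning on the probability-zero event $X_1 = x_1$), and compares the sample mean and variance of the retained $X_2$ values with $\mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - \mu_1)$ and $\Sigma_{22|1}$. The window half-width `0.05` is an arbitrary choice that trades bias against sample size.

```python
import math
import random

random.seed(11)

# Hypothetical bivariate normal (n = 2, m = 1), so every block is a scalar.
mu1, mu2 = 1.0, -1.0
s11, s12, s22 = 4.0, 1.2, 1.0

slope = s12 / s11                 # Sigma21 Sigma11^{-1}
cond_var = s22 - s12 * s12 / s11  # Sigma_{22|1}


def cond_mean(x1):
    return mu2 + slope * (x1 - mu1)


# Simulate (X1, X2) via X1 = B Z1 + mu1, X2 = C Z1 + D Z2 + mu2.
B = math.sqrt(s11)
C = s12 / B
D = math.sqrt(cond_var)
pairs = []
for _ in range(100_000):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    pairs.append((B * z1 + mu1, C * z1 + D * z2 + mu2))

# Keep draws with X1 close to x1 and summarize the corresponding X2 values.
x1 = 2.0
window = [x2 for a, x2 in pairs if abs(a - x1) < 0.05]
m_hat = sum(window) / len(window)
v_hat = sum((v - m_hat) ** 2 for v in window) / (len(window) - 1)
print(len(window), m_hat, cond_mean(x1))
print(v_hat, cond_var)
```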