Lecture 3 — Moment generating functions, multivariate normal ..., Exams of Statistics

One reason for studying the MGF is that it provides a different, and sometimes more convenient, way of describing the distribution of a random variable X.

Typology: Exams

2022/2023

Uploaded on 02/28/2023

leonpan
S&DS 242/542: Theory of Statistics Spring 2022

Lecture 3 — Moment generating functions, multivariate normal

3.1 Moment generating functions

A tool from probability that will be particularly useful for us is the moment generating function (MGF) of a random variable $X$. This is a function of $t \in \mathbb{R}$ defined by

$$M_X(t) = E[e^{tX}].$$

Depending on the distribution of the random variable $X$, it is possible for $M_X(t)$ to be $\infty$ for some values of $t$. Here are two examples:

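Example 3.1 (Normal MGF). Suppose $X \sim N(0, 1)$. Then

$$M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx}\,\frac{1}{\sqrt{2\pi}}\,e^{-\frac{x^2}{2}}\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{\frac{-x^2+2tx}{2}}\,dx.$$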

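To compute this integral, we complete the square:

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{\frac{-x^2+2tx}{2}}\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{\frac{-x^2+2tx-t^2}{2}+\frac{t^2}{2}}\,dx = e^{\frac{t^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-\frac{(x-t)^2}{2}}\,dx.$$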

The integrand in the last integral above is the PDF of the $N(t, 1)$ distribution, hence this last integral equals 1. So the MGF of $X$ is simply

$$M_X(t) = e^{\frac{t^2}{2}}.$$

Now suppose $X \sim N(\mu, \sigma^2)$. This means that $X - \mu \sim N(0, \sigma^2)$, and $\frac{X-\mu}{\sigma} \sim N(0, 1)$. Then we may represent $X$ as $X = \mu + \sigma Z$ where $Z \sim N(0, 1)$. The MGF of $X$ is

$$M_X(t) = E[e^{tX}] = E[e^{\mu t + \sigma t Z}] = e^{\mu t} E[e^{\sigma t Z}] = e^{\mu t} M_Z(\sigma t) = e^{\mu t + \frac{\sigma^2 t^2}{2}},$$

where in the last step we have applied the MGF for $Z \sim N(0, 1)$ computed above. In this normal example, $M_X(t) < \infty$ for all $t \in \mathbb{R}$.
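As a sanity check on the closed form derived above, the following sketch approximates the defining integral with a simple midpoint rule and compares it with $e^{\mu t + \sigma^2 t^2/2}$. The function names, grid width, and step count are illustrative choices, not part of the lecture.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_mgf_numeric(t, mu=0.0, sigma=1.0, half_width=12.0, steps=20_000):
    """Midpoint-rule approximation of E[exp(tX)] for X ~ N(mu, sigma^2)."""
    # After completing the square, the integrand is a bell curve centred at
    # mu + sigma^2 * t, so the integration grid is centred there.
    centre = mu + sigma * sigma * t
    lo = centre - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += math.exp(t * x) * normal_pdf(x, mu, sigma)
    return total * h

def normal_mgf_closed_form(t, mu=0.0, sigma=1.0):
    """exp(mu * t + sigma^2 * t^2 / 2), the formula derived in Example 3.1."""
    return math.exp(mu * t + 0.5 * sigma ** 2 * t ** 2)

# Compare the two for X ~ N(1, 2^2) at a few values of t.
for t in (-1.0, 0.0, 0.5, 2.0):
    print(t, normal_mgf_numeric(t, 1.0, 2.0), normal_mgf_closed_form(t, 1.0, 2.0))
```

Centring the grid at $\mu + \sigma^2 t$ mirrors the completing-the-square step: that is where the shifted normal density in the integrand has its peak.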

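Example 3.2 (Gamma MGF). Suppose $X \sim \text{Gamma}(\alpha, \beta)$ (where $\alpha, \beta > 0$). Then

$$M_X(t) = E[e^{tX}] = \int_0^{\infty} e^{tx}\,\frac{\beta^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1} e^{-\beta x}\,dx = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha-1} e^{(t-\beta)x}\,dx.$$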

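We consider three cases:

  • If $t > \beta$, then the integrand $x^{\alpha-1} e^{(t-\beta)x}$ increases to infinity as $x \to \infty$. So the integral is $\infty$, and $M_X(t) = \infty$.
  • If $t = \beta$, the integral is simply $\int_0^{\infty} x^{\alpha-1}\,dx = \frac{1}{\alpha} x^{\alpha} \big|_0^{\infty}$. Since $\alpha > 0$, we have $\lim_{x \to \infty} x^{\alpha} = \infty$, so also $M_X(t) = \infty$.
  • If $t < \beta$, let us rewrite the above as

    $$M_X(t) = \frac{\beta^\alpha}{(\beta - t)^\alpha} \int_0^{\infty} \frac{(\beta - t)^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1} e^{-(\beta - t)x}\,dx.$$

    This integrand is now the PDF of the $\text{Gamma}(\alpha, \beta - t)$ distribution (where both $\alpha > 0$ and $\beta - t > 0$ as required). So the integral equals 1, and $M_X(t) = \frac{\beta^\alpha}{(\beta - t)^\alpha}$.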

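Combining these cases,

$$M_X(t) = \begin{cases} \infty & \text{if } t \geq \beta \\ \dfrac{\beta^\alpha}{(\beta - t)^\alpha} & \text{if } t < \beta \end{cases} \;=\; \begin{cases} \infty & \text{if } t \geq \beta \\ (1 - t/\beta)^{-\alpha} & \text{if } t < \beta. \end{cases}$$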
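The piecewise formula can be written as a small function and compared against a simulation. This is a rough sketch: the function names, sample size, and seed are arbitrary choices, and the simulated average is only a reliable check when $2t < \beta$, since otherwise $e^{tX}$ has infinite variance.

```python
import math
import random

def gamma_mgf(t, alpha, beta):
    """MGF of Gamma(alpha, beta) with rate beta, following the cases above."""
    if t >= beta:
        return math.inf
    return (1.0 - t / beta) ** (-alpha)

def gamma_mgf_monte_carlo(t, alpha, beta, n=200_000, seed=0):
    """Average of exp(t * X) over n simulated draws X ~ Gamma(alpha, beta)."""
    rng = random.Random(seed)
    # random.gammavariate takes a *scale* parameter, i.e. 1 / rate.
    draws = (rng.gammavariate(alpha, 1.0 / beta) for _ in range(n))
    return sum(math.exp(t * x) for x in draws) / n

print(gamma_mgf(0.5, 2.0, 2.0))              # closed form, equal to 16/9
print(gamma_mgf_monte_carlo(0.5, 2.0, 2.0))  # simulation estimate of the same value
print(gamma_mgf(2.0, 2.0, 2.0))              # inf, because t >= beta
```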

One reason for studying the MGF is that it provides a different, and sometimes more convenient, way of describing the distribution of a random variable $X$. Last lecture, we explained how the distribution of $X$ may be specified by its PMF/PDF, or by its CDF. If the MGF $M_X(t)$ is finite for all $t$ in some open interval around 0, as in both of the above examples, then the MGF also uniquely specifies the distribution of $X$. This is the content of the following theorem (which we will not prove in this class):

Theorem 3.3. Let $X$ and $Y$ be two random variables such that, for some $h > 0$ and every $t \in (-h, h)$, both $M_X(t)$ and $M_Y(t)$ are finite and $M_X(t) = M_Y(t)$. Then $X$ and $Y$ have the same distribution.

For example, this means that if $X$ is any random variable for which $M_X(t) = e^{\frac{t^2}{2}}$, then the distribution of $X$ must be $N(0, 1)$ by this theorem and our calculation in Example 3.1.

The MGF is particularly useful in statistics because we often deal with independent data. If $X_1, \dots, X_n$ are independent random variables, then the MGF of their sum satisfies

$$M_{X_1 + \dots + X_n}(t) = E[e^{t(X_1 + \dots + X_n)}] = E[e^{tX_1}] \cdots E[e^{tX_n}] = M_{X_1}(t) \cdots M_{X_n}(t).$$

This is the product of the individual MGFs of $X_1, \dots, X_n$. In contrast, the PDF or CDF of the sum $X_1 + \dots + X_n$ may be quite complicated to express and derive. Thus, even though the MGF of a distribution is less intuitive to interpret than the PDF or CDF, it provides a more convenient tool for understanding sums of independent random variables. We will see a couple examples of this below.
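The product rule for independent variables can be illustrated by simulation. The sketch below uses one normal and one Gamma variable, with the parameters, sample size, and seed chosen purely for illustration, and compares the empirical MGF of the sum with the product of the closed forms from Examples 3.1 and 3.2.

```python
import math
import random

def empirical_mgf(samples, t):
    """Sample average of exp(t * x), an estimate of E[exp(tX)]."""
    return sum(math.exp(t * x) for x in samples) / len(samples)

rng = random.Random(1)
n = 200_000
t = 0.5

xs = [rng.gauss(0.0, 1.0) for _ in range(n)]               # X ~ N(0, 1)
ys = [rng.gammavariate(2.0, 1.0 / 3.0) for _ in range(n)]  # Y ~ Gamma(2, 3), rate 3
sums = [x + y for x, y in zip(xs, ys)]                     # X and Y drawn independently

mgf_sum = empirical_mgf(sums, t)
mgf_product = empirical_mgf(xs, t) * empirical_mgf(ys, t)
closed_form = math.exp(t * t / 2.0) * (1.0 - t / 3.0) ** (-2.0)

print(mgf_sum, mgf_product, closed_form)  # all three should be close
```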

3.2 The Multivariate Normal distribution

The Multivariate Normal distribution in dimension $k$ is a joint distribution for $k$ continuous random variables $(X_1, \dots, X_k) \in \mathbb{R}^k$, which generalizes the normal distribution for a

This is the MGF for the normal distribution $N(a_1\mu_1 + \dots + a_k\mu_k,\; a_1^2\sigma_1^2 + \dots + a_k^2\sigma_k^2)$. Since the MGF uniquely specifies the distribution, this implies that $a_1X_1 + \dots + a_kX_k$ is normally distributed. This is true for all $a_1, \dots, a_k \in \mathbb{R}$, so $(X_1, \dots, X_k)$ are multivariate normal. The entries of the mean vector $\mu$ must be $E[X_i] = \mu_i$. The entries of the covariance matrix $\Sigma$ must be given by $\Sigma_{ii} = \operatorname{Var}[X_i] = \sigma_i^2$ and $\Sigma_{ij} = \operatorname{Cov}[X_i, X_j] = 0$ for all $i \neq j$.

Example 3.7. Suppose $Z_1, \dots, Z_j$ are independent normal random variables, and each random variable $X_1, \dots, X_k$ is a linear combination of $Z_1, \dots, Z_j$:

$$\begin{aligned} X_1 &= b_{11} Z_1 + \dots + b_{1j} Z_j \\ X_2 &= b_{21} Z_1 + \dots + b_{2j} Z_j \\ &\;\;\vdots \\ X_k &= b_{k1} Z_1 + \dots + b_{kj} Z_j \end{aligned}$$

for some constants $b_{11}, \dots, b_{1j}, \dots, b_{k1}, \dots, b_{kj} \in \mathbb{R}$. Then $(X_1, \dots, X_k)$ are multivariate normal. This is because any further linear combination $a_1X_1 + \dots + a_kX_k$ for $a_1, \dots, a_k \in \mathbb{R}$ may also be written as a linear combination of the original variables $Z_1, \dots, Z_j$, and hence has a normal distribution by the preceding example.
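The regrouping step in this argument can be made concrete. The sketch below collects the coefficient of each $Z_m$ in $a_1X_1 + \dots + a_kX_k$, then applies the mean and variance formula for a linear combination of independent normals. The function names and the numbers in `B`, `a`, and the $Z$ parameters are made up for illustration.

```python
def combine(a, B):
    """Coefficients c with a_1 X_1 + ... + a_k X_k = c_1 Z_1 + ... + c_j Z_j,
    where X_i = B[i][0] * Z_1 + ... + B[i][j-1] * Z_j."""
    k, j = len(B), len(B[0])
    return [sum(a[i] * B[i][m] for i in range(k)) for m in range(j)]

def normal_params(c, means, variances):
    """Mean and variance of c_1 Z_1 + ... + c_j Z_j for independent normal Z's."""
    mean = sum(ci * mi for ci, mi in zip(c, means))
    var = sum(ci * ci * vi for ci, vi in zip(c, variances))
    return mean, var

# Hypothetical example with k = 2 and j = 3.
B = [[1.0, 0.0, 3.0],
     [1.0, 1.0, 0.0]]
a = [2.0, -1.0]

c = combine(a, B)
print(c)  # [1.0, -1.0, 6.0]
print(normal_params(c, [0.0, 1.0, 2.0], [1.0, 4.0, 0.25]))  # (11.0, 14.0)
```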

When the covariance matrix $\Sigma$ is invertible, an alternative way to define the distribution $N(\mu, \Sigma)$ is by the following formula for its joint PDF on $\mathbb{R}^k$:

$$f_{X_1, \dots, X_k}(x_1, \dots, x_k) = \frac{1}{\sqrt{\det(2\pi\Sigma)}} \cdot e^{-\frac{1}{2}(x_1 - \mu_1, \dots, x_k - \mu_k)^\top \Sigma^{-1} (x_1 - \mu_1, \dots, x_k - \mu_k)}.$$
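For $k = 2$ the formula can be evaluated by hand, because a $2 \times 2$ matrix has an explicit inverse and determinant. The following sketch assumes $\Sigma$ is parametrised by two standard deviations and a correlation $\rho$ with $|\rho| < 1$. The function name and argument names are illustrative.

```python
import math

def bivariate_normal_pdf(x, y, mu_x=0.0, mu_y=0.0,
                         sigma_x=1.0, sigma_y=1.0, rho=0.0):
    """Joint PDF of N(mu, Sigma) for k = 2 (requires |rho| < 1)."""
    # Entries of the covariance matrix Sigma.
    s11, s22 = sigma_x ** 2, sigma_y ** 2
    s12 = rho * sigma_x * sigma_y
    det = s11 * s22 - s12 * s12  # det(Sigma), positive when |rho| < 1
    dx, dy = x - mu_x, y - mu_y
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), using the 2x2 inverse.
    quad = (s22 * dx * dx - 2.0 * s12 * dx * dy + s11 * dy * dy) / det
    # det(2 * pi * Sigma) = (2 * pi)^2 * det(Sigma) when k = 2.
    return math.exp(-0.5 * quad) / math.sqrt((2.0 * math.pi) ** 2 * det)

print(bivariate_normal_pdf(0.0, 0.0))             # 1 / (2 * pi) for the standard case
print(bivariate_normal_pdf(1.0, 1.0, rho=0.75))   # along the line y = x
print(bivariate_normal_pdf(1.0, -1.0, rho=0.75))  # much smaller, off that line
```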

To visualize this PDF, consider the two-dimensional setting $k = 2$, where we obtain the special case of a Bivariate Normal distribution for two random variables $X$ and $Y$. In this case, the distribution may be specified by the means $\mu_X$ and $\mu_Y$ of $X$ and $Y$, the variances $\sigma_X^2$ and $\sigma_Y^2$ of $X$ and $Y$, and the correlation $\rho$ between $X$ and $Y$. (Then $\operatorname{Cov}[X, Y] = \rho\sigma_X\sigma_Y$.) The contours of the joint PDF $f_{X,Y}(x, y)$ are shown below, for correlation $\rho = 0$ on the left and correlation $\rho = 0.75$ on the right, when $\sigma_X^2 = \sigma_Y^2 = 1$ and $\mu_X = \mu_Y = 0$:

[Figure: plots of the joint PDF $f_{X,Y}(x, y)$ over axes $x$ and $y$; left panel $\rho = 0$, right panel $\rho = 0.75$.]

FIGURE 7. Joint PDFs of two Bivariate Normal distributions. On the left, X and Y are marginally N (0, 1) and have zero correlation. On the right, X and Y are marginally N (0, 1) and have correlation 0.75.

On the left, for $\rho = 0$, these contours are circular. The joint PDF has a peak at 0 and decays radially away from 0. On the right, for $\rho = 0.75$, the contours are ellipses. As $\rho$ increases to 1, the contours concentrate more and more around the line $y = x$.

More generally, the joint PDF of $N(\mu, \Sigma)$ in $k$ dimensions has a single peak at the mean vector $\mu \in \mathbb{R}^k$, and its contours are ellipsoids around $\mu$ whose shape depends on $\Sigma$.

Recall from Lecture 2 that, in general, $\operatorname{Cov}[X, Y] = 0$ does not imply that random variables $X$ and $Y$ are independent. However, this implication is true when $(X, Y)$ are bivariate normal. More generally, we have the following:

Theorem 3.8. Suppose $\mathbf{X} = (X_1, \dots, X_k)$ is multivariate normal. Let $\mathbf{X}_1$ and $\mathbf{X}_2$ be disjoint subvectors of $\mathbf{X}$ such that each entry of $\mathbf{X}_1$ is uncorrelated with each entry of $\mathbf{X}_2$. Then the vectors $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent.
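In the bivariate case with $N(0, 1)$ marginals, the theorem can be checked directly: independence of $X$ and $Y$ means the joint PDF equals the product of the two marginal PDFs at every point. The sketch below measures the largest gap between the two on a small grid. The function names and the grid are illustrative, and a grid check is of course not a proof.

```python
import math

def std_normal_pdf(x):
    """PDF of N(0, 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_bivariate_pdf(x, y, rho):
    """Joint PDF of a bivariate normal with N(0, 1) marginals and correlation rho."""
    det = 1.0 - rho * rho
    quad = (x * x - 2.0 * rho * x * y + y * y) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))

def max_factorisation_gap(rho, grid=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Largest |f(x, y) - f_X(x) * f_Y(y)| over a small grid of points."""
    return max(
        abs(std_bivariate_pdf(x, y, rho) - std_normal_pdf(x) * std_normal_pdf(y))
        for x in grid for y in grid
    )

print(max_factorisation_gap(0.0))   # essentially zero: the joint PDF factorises
print(max_factorisation_gap(0.75))  # clearly positive: X and Y are dependent
```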

3.3 Sampling distributions of statistics

For data $X_1, \dots, X_n$ which we will model as random variables, a statistic $T(X_1, \dots, X_n)$ is any real-valued function of the data. In other words, it is any number computed from the data. For example, the sample mean

$$\bar{X} = \frac{1}{n}(X_1 + \dots + X_n),$$

the sample variance

$$S^2 = \frac{1}{n-1}\left((X_1 - \bar{X})^2 + \dots + (X_n - \bar{X})^2\right),$$

and the range

$$R = \max(X_1, \dots, X_n) - \min(X_1, \dots, X_n)$$

are all statistics. Any statistic is itself also a random variable. One use of probability theory in this course will be to understand the distribution of a statistic, called its sampling distribution, based on the distribution of the original data $X_1, \dots, X_n$. Here, let us discuss two examples. The second example will introduce the important chi-squared distribution, which we will see again in later lectures.
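Each of the three statistics above is a short computation on the data. A minimal sketch, using a made-up data set and illustrative function names:

```python
import statistics

def sample_mean(xs):
    """(X_1 + ... + X_n) / n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """S^2 with the 1 / (n - 1) normalisation (needs n >= 2)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_range(xs):
    """max(X_1, ..., X_n) - min(X_1, ..., X_n)."""
    return max(xs) - min(xs)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up data for illustration

print(sample_mean(data))      # 5.0
print(sample_variance(data))  # 32 / 7
print(sample_range(data))     # 7.0
# The standard library's statistics.variance uses the same 1 / (n - 1) convention.
print(statistics.variance(data))
```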

Example 3.9 (Sample mean of IID normals). Suppose $X_1, \dots, X_n \overset{IID}{\sim} N(\mu, \sigma^2)$. The sample mean $\bar{X}$ is actually a special case of the quantity $a_1X_1 + \dots + a_nX_n$ from Example 3.6, where $a_i = \frac{1}{n}$, $\mu_i = \mu$, and $\sigma_i^2 = \sigma^2$ for all $i = 1, \dots, n$. Then from that example,

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$
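A quick simulation can make this sampling distribution tangible: draw many independent samples of size $n$, compute each sample mean, and compare the mean and variance of those sample means with $\mu$ and $\sigma^2/n$. The parameter values, number of repetitions, and seed below are arbitrary illustrative choices.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Sample means of `reps` independent samples of n IID N(mu, sigma^2) draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

mu, sigma, n = 3.0, 2.0, 25
means = simulate_sample_means(mu, sigma, n, reps=20_000)

print(statistics.fmean(means))     # should be close to mu = 3.0
print(statistics.variance(means))  # should be close to sigma^2 / n = 0.16
```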

Example 3.10 (Chi-squared distribution). Suppose $X_1, \dots, X_n \overset{IID}{\sim} N(0, 1)$. Then the distribution of the statistic

$$X_1^2 + \dots + X_n^2$$

is called the chi-squared distribution with $n$ degrees of freedom, abbreviated as $\chi^2_n$.
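The definition translates directly into a simulation: square $n$ independent standard normal draws and add them up. The sketch below, with an arbitrary choice of $n = 5$, repetition count, and seed, checks two simple consequences of the definition: the statistic is never negative, and its average is close to $n$ because $E[X_i^2] = \operatorname{Var}[X_i] = 1$ for each $i$.

```python
import random
import statistics

def simulate_chi_squared(n, reps, seed=0):
    """Draws of X_1^2 + ... + X_n^2 for IID N(0, 1) variables X_i."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(reps)]

draws = simulate_chi_squared(n=5, reps=100_000)

print(min(draws) >= 0.0)        # True: a sum of squares is never negative
print(statistics.fmean(draws))  # close to 5
```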