







S&DS 242/542: Theory of Statistics Spring 2022
A tool from probability that will be particularly useful for us is the moment generating function (MGF) of a random variable X. This is a function of t ∈ R defined by
\[ M_X(t) = E[e^{tX}]. \]
Depending on the distribution of the random variable X, it is possible for $M_X(t)$ to be ∞ for some values of t. Here are two examples:
Example 3.1 (Normal MGF). Suppose X ∼ N (0, 1). Then
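\[ M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} \, \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{\frac{-x^2 + 2tx}{2}} \, dx. \]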
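To compute this integral, we complete the square:
\[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{\frac{-x^2 + 2tx}{2}} \, dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{\frac{-x^2 + 2tx - t^2}{2} + \frac{t^2}{2}} \, dx = e^{\frac{t^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{(x-t)^2}{2}} \, dx. \]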
The integrand in the last integral above is the PDF of the N(t, 1) distribution; hence this last integral equals 1. So the MGF of X is simply
\[ M_X(t) = e^{\frac{t^2}{2}}. \]
Now suppose X ∼ N(μ, σ^2). This means that X − μ ∼ N(0, σ^2), and (X − μ)/σ ∼ N(0, 1). Then we may represent X as X = μ + σZ where Z ∼ N(0, 1). The MGF of X is
\[ M_X(t) = E[e^{tX}] = E[e^{\mu t + \sigma t Z}] = e^{\mu t} E[e^{\sigma t Z}] = e^{\mu t} M_Z(\sigma t) = e^{\mu t + \frac{\sigma^2 t^2}{2}}, \]
where in the last step we have applied the MGF for Z ∼ N(0, 1) computed above. In this normal example, $M_X(t) < \infty$ for all t ∈ R.
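As a rough numerical sanity check of this formula, the sketch below approximates E[e^{tX}] for X ∼ N(μ, σ^2) with a plain trapezoidal rule and compares the result to e^{μt + σ^2 t^2/2}. The function names, the integration window of 12 standard deviations on each side, and the grid size are illustrative choices and not part of the lecture.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mgf_numeric(t, mu, sigma, half_width=12.0, steps=20000):
    """Trapezoidal approximation of E[exp(tX)] for X ~ N(mu, sigma^2).

    The integrand exp(t*x) * pdf(x) is a Gaussian bump centered at
    mu + sigma^2 * t, so the window is centered there (an illustrative choice).
    """
    center = mu + sigma * sigma * t
    lo = center - half_width * sigma
    hi = center + half_width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end weights
        total += weight * math.exp(t * x) * normal_pdf(x, mu, sigma)
    return total * h

def mgf_closed_form(t, mu, sigma):
    """Closed form exp(mu*t + sigma^2 * t^2 / 2) from the derivation."""
    return math.exp(mu * t + 0.5 * sigma * sigma * t * t)

for t in (-1.0, 0.0, 0.5, 1.5):
    print(t, mgf_numeric(t, 1.0, 2.0), mgf_closed_form(t, 1.0, 2.0))
```

The two columns should agree to many digits for moderate values of t; for very large |t| the window and grid would need to be adjusted.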
Example 3.2 (Gamma MGF). Suppose X ∼ Gamma(α, β) (where α, β > 0). Then
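\[ M_X(t) = E[e^{tX}] = \int_0^{\infty} e^{tx} \, \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x} \, dx = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha-1} e^{(t-\beta)x} \, dx. \]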
We consider three cases:
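If t = β, then
\[ \int_0^{\infty} x^{\alpha-1} \, dx = \frac{1}{\alpha} x^{\alpha} \Big|_0^{\infty} = \infty, \]
so also $M_X(t) = \infty$.
If t > β, then $e^{(t-\beta)x} \geq 1$ for all x ≥ 0, so the integral is at least $\int_0^{\infty} x^{\alpha-1} \, dx = \infty$, and again $M_X(t) = \infty$.
If t < β, then we may write
\[ M_X(t) = \frac{\beta^\alpha}{(\beta - t)^\alpha} \int_0^{\infty} \frac{(\beta - t)^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-(\beta - t)x} \, dx. \]
This integrand is now the PDF of the Gamma(α, β − t) distribution (where both α > 0 and β − t > 0 as required). So the integral equals 1, and $M_X(t) = \frac{\beta^\alpha}{(\beta - t)^\alpha}$.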
Combining these cases,
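\[ M_X(t) = \begin{cases} \infty & \text{if } t \geq \beta \\ \dfrac{\beta^\alpha}{(\beta - t)^\alpha} & \text{if } t < \beta \end{cases} \;=\; \begin{cases} \infty & \text{if } t \geq \beta \\ (1 - t/\beta)^{-\alpha} & \text{if } t < \beta. \end{cases} \]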
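The case t < β can be checked numerically. The sketch below integrates e^{tx} against the Gamma(α, β) density (with β as the rate, as in the example) using a midpoint rule and compares the result to (1 − t/β)^{−α}. The function names, the truncation point `upper`, and the grid size are assumptions made for illustration; the check is only intended for α ≥ 1 and for β − t not too close to 0, where this crude rule is reasonably accurate.

```python
import math

def gamma_mgf_closed_form(t, alpha, beta):
    """(1 - t/beta)^(-alpha) for t < beta, and infinity for t >= beta."""
    if t >= beta:
        return math.inf
    return (1.0 - t / beta) ** (-alpha)

def gamma_mgf_numeric(t, alpha, beta, upper=200.0, steps=200000):
    """Midpoint-rule approximation of the integral of
    exp(t*x) * beta^alpha / Gamma(alpha) * x^(alpha-1) * exp(-beta*x)
    over (0, upper). Only meaningful for t < beta."""
    if t >= beta:
        raise ValueError("the integral diverges for t >= beta")
    # Work on the log scale to avoid overflow in beta^alpha / Gamma(alpha).
    log_const = alpha * math.log(beta) - math.lgamma(alpha)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += math.exp(log_const + (alpha - 1.0) * math.log(x) + (t - beta) * x)
    return total * h

alpha, beta = 2.5, 2.0
for t in (-1.0, 0.0, 1.0):
    print(t, gamma_mgf_numeric(t, alpha, beta), gamma_mgf_closed_form(t, alpha, beta))
```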
One reason for studying the MGF is that it provides a different, and sometimes more convenient, way of describing the distribution of a random variable X. Last lecture, we explained how the distribution of X may be specified by its PMF/PDF, or by its CDF. If the MGF MX (t) is finite for all t in some open interval around 0, as in both of the above examples, then the MGF also uniquely specifies the distribution of X. This is the content of the following theorem (which we will not prove in this class):
Theorem 3.3. Let X and Y be two random variables such that, for some h > 0 and every t ∈ (−h, h), both $M_X(t)$ and $M_Y(t)$ are finite and $M_X(t) = M_Y(t)$. Then X and Y have the same distribution.
For example, this means that if X is any random variable for which $M_X(t) = e^{t^2/2}$, then the distribution of X must be N(0, 1) by this theorem and our calculation in Example 3.1.
The MGF is particularly useful in statistics because we often deal with independent data. If $X_1, \ldots, X_n$ are independent random variables, then the MGF of their sum satisfies
\[ M_{X_1 + \ldots + X_n}(t) = E[e^{t(X_1 + \ldots + X_n)}] = E[e^{tX_1}] \cdots E[e^{tX_n}] = M_{X_1}(t) \cdots M_{X_n}(t). \]
This is the product of the individual MGFs of $X_1, \ldots, X_n$. In contrast, the PDF or CDF of the sum $X_1 + \ldots + X_n$ may be quite complicated to express and derive. Thus, even though the MGF of a distribution is less intuitive to interpret than the PDF or CDF, it provides a more convenient tool for understanding sums of independent random variables. We will see a couple of examples of this below.
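As one illustration of how the product rule combines with Theorem 3.3, consider independent X ∼ Gamma(α_1, β) and Y ∼ Gamma(α_2, β) sharing the same rate β. This particular pairing is chosen here for illustration and reuses the MGF from Example 3.2. For t < β,

```latex
M_{X+Y}(t) = M_X(t)\, M_Y(t)
           = (1 - t/\beta)^{-\alpha_1}\,(1 - t/\beta)^{-\alpha_2}
           = (1 - t/\beta)^{-(\alpha_1 + \alpha_2)}.
```

The right side is the MGF of Gamma(α_1 + α_2, β), and it is finite on an open interval around 0, so Theorem 3.3 gives X + Y ∼ Gamma(α_1 + α_2, β) without any convolution of densities.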
The Multivariate Normal distribution in dimension k is a joint distribution for k continuous random variables $(X_1, \ldots, X_k) \in \mathbb{R}^k$, which generalizes the normal distribution for a
This is the MGF for the normal distribution $N(a_1\mu_1 + \ldots + a_k\mu_k,\ a_1^2\sigma_1^2 + \ldots + a_k^2\sigma_k^2)$. Since the MGF uniquely specifies the distribution, this implies that $a_1X_1 + \ldots + a_kX_k$ is normally distributed. This is true for all $a_1, \ldots, a_k \in \mathbb{R}$, so $(X_1, \ldots, X_k)$ are multivariate normal. The entries of the mean vector μ must be $E[X_i] = \mu_i$. The entries of the covariance matrix Σ must be given by $\Sigma_{ii} = \mathrm{Var}[X_i] = \sigma_i^2$ and $\Sigma_{ij} = \mathrm{Cov}[X_i, X_j] = 0$ for all $i \neq j$.
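The MGF referred to here can be recovered from the product rule for independent variables together with the normal MGF from Example 3.1. A sketch, assuming (as the conclusion above indicates) that X_1, ..., X_k are independent with X_i ∼ N(μ_i, σ_i^2):

```latex
M_{a_1 X_1 + \ldots + a_k X_k}(t)
  = \prod_{i=1}^{k} E\!\left[e^{t a_i X_i}\right]
  = \prod_{i=1}^{k} M_{X_i}(a_i t)
  = \prod_{i=1}^{k} e^{\mu_i a_i t + \frac{\sigma_i^2 a_i^2 t^2}{2}}
  = \exp\!\left( \Big(\sum_{i=1}^{k} a_i \mu_i\Big) t
      + \Big(\sum_{i=1}^{k} a_i^2 \sigma_i^2\Big) \frac{t^2}{2} \right).
```

Matching this against e^{μt + σ^2 t^2/2} identifies the mean a_1μ_1 + ... + a_kμ_k and the variance a_1^2σ_1^2 + ... + a_k^2σ_k^2.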
Example 3.7. Suppose $Z_1, \ldots, Z_j$ are independent normal random variables, and each random variable $X_1, \ldots, X_k$ is a linear combination of $Z_1, \ldots, Z_j$:
\[ \begin{aligned} X_1 &= b_{11} Z_1 + \ldots + b_{1j} Z_j \\ X_2 &= b_{21} Z_1 + \ldots + b_{2j} Z_j \\ &\;\;\vdots \\ X_k &= b_{k1} Z_1 + \ldots + b_{kj} Z_j \end{aligned} \]
for some constants $b_{11}, \ldots, b_{1j}, \ldots, b_{k1}, \ldots, b_{kj} \in \mathbb{R}$. Then $(X_1, \ldots, X_k)$ are multivariate normal. This is because any further linear combination $a_1X_1 + \ldots + a_kX_k$ for $a_1, \ldots, a_k \in \mathbb{R}$ may also be written as a linear combination of the original variables $Z_1, \ldots, Z_j$, and hence has a normal distribution by the preceding example.
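The rewriting step in this argument is just a regrouping of coefficients. The sketch below computes the coefficients c_1, ..., c_j with a_1X_1 + ... + a_kX_k = c_1Z_1 + ... + c_jZ_j, and then the mean and variance of that combination when the Z's are independent. The function names and the numbers in the example are hypothetical.

```python
def combine_coefficients(a, B):
    """Return c with a_1 X_1 + ... + a_k X_k = c_1 Z_1 + ... + c_j Z_j,
    where X_i = B[i][0] * Z_1 + ... + B[i][j-1] * Z_j."""
    k = len(B)
    j = len(B[0])
    return [sum(a[i] * B[i][m] for i in range(k)) for m in range(j)]

def mean_and_variance(c, z_means, z_vars):
    """Mean and variance of c_1 Z_1 + ... + c_j Z_j for independent Z's."""
    mean = sum(cm * mu for cm, mu in zip(c, z_means))
    var = sum(cm * cm * v for cm, v in zip(c, z_vars))
    return mean, var

# Hypothetical example: k = 2 variables built from j = 3 independent normals.
B = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]
a = [2.0, 1.0]
c = combine_coefficients(a, B)  # [2.0, 3.0, 3.0]
mean, var = mean_and_variance(c, z_means=[0.0, 1.0, -1.0], z_vars=[1.0, 4.0, 0.25])
print(c, mean, var)
```

Since the result is a linear combination of independent normals, it is normal with the mean and variance returned by `mean_and_variance`.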
When the covariance matrix Σ is invertible, an alternative way to define the distribution N(μ, Σ) is by the following formula for its joint PDF on $\mathbb{R}^k$:
\[ f_{X_1, \ldots, X_k}(x_1, \ldots, x_k) = \frac{1}{\sqrt{\det(2\pi\Sigma)}} \cdot e^{-\frac{1}{2} (x_1 - \mu_1, \ldots, x_k - \mu_k)^\top \Sigma^{-1} (x_1 - \mu_1, \ldots, x_k - \mu_k)}. \]
To visualize this PDF, consider the two-dimensional setting k = 2, where we obtain the special case of a Bivariate Normal distribution for two random variables X and Y. In this case, the distribution may be specified by the means $\mu_X$ and $\mu_Y$ of X and Y, the variances $\sigma_X^2$ and $\sigma_Y^2$ of X and Y, and the correlation ρ between X and Y. (Then $\mathrm{Cov}[X, Y] = \rho\sigma_X\sigma_Y$.) The contours of the joint PDF $f_{X,Y}(x, y)$ are shown below, for correlation ρ = 0 on the left and correlation ρ = 0.75 on the right, when $\sigma_X^2 = \sigma_Y^2 = 1$ and $\mu_X = \mu_Y = 0$:
[Figure: plots of the joint PDF over the x and y axes; left panels for ρ = 0, right panels for ρ = 0.75.]
FIGURE 7. Joint PDFs of two Bivariate Normal distributions. On the left, X and Y are marginally N (0, 1) and have zero correlation. On the right, X and Y are marginally N (0, 1) and have correlation 0.75.
On the left, for ρ = 0, these contours are circular. The joint PDF has a peak at 0 and decays radially away from 0. On the right, for ρ = 0.75, the contours are ellipses. As ρ increases to 1, the contours concentrate more and more around the line y = x. More generally, the joint PDF of N(μ, Σ) in k dimensions has a single peak at the mean vector $\mu \in \mathbb{R}^k$, and its contours are ellipsoids around μ whose shape depends on Σ.
Recall from Lecture 2 that, in general, Cov[X, Y] = 0 does not imply that random variables X and Y are independent. However, this implication is true when (X, Y) are bivariate normal. More generally, we have the following:
Theorem 3.8. Suppose X = (X 1 ,... , Xk) is multivariate normal. Let X 1 and X 2 be disjoint subvectors of X such that each entry of X 1 is uncorrelated with each entry of X 2. Then the vectors X 1 and X 2 are independent.
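In the bivariate case this can be seen directly from the joint PDF. The sketch below evaluates the standard bivariate normal density, which is the k = 2 case of the general formula written in terms of the means, standard deviations, and correlation ρ. With ρ = 0 the joint density equals the product of the two marginal normal densities, which is exactly independence. The function names are illustrative.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def bivariate_normal_pdf(x, y, mu_x=0.0, mu_y=0.0,
                         sigma_x=1.0, sigma_y=1.0, rho=0.0):
    """Joint PDF of a bivariate normal with correlation rho, -1 < rho < 1."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    one_minus_r2 = 1.0 - rho * rho
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), written out for k = 2.
    quad = (zx * zx - 2.0 * rho * zx * zy + zy * zy) / one_minus_r2
    # sqrt(det(2 * pi * Sigma)) for the 2 x 2 covariance matrix.
    norm = 2.0 * math.pi * sigma_x * sigma_y * math.sqrt(one_minus_r2)
    return math.exp(-0.5 * quad) / norm

# With rho = 0 the joint density factorizes into the marginals.
joint = bivariate_normal_pdf(0.3, -1.2, rho=0.0)
product = normal_pdf(0.3) * normal_pdf(-1.2)
print(joint, product)
```

For ρ = 0.75 the same function gives a larger density at (1, 1) than at (1, −1), matching the elliptical contours stretched along the line y = x.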
For data $X_1, \ldots, X_n$ which we will model as random variables, a statistic $T(X_1, \ldots, X_n)$ is any real-valued function of the data. In other words, it is any number computed from the data. For example, the sample mean
\[ \bar{X} = \frac{1}{n}(X_1 + \ldots + X_n), \]
the sample variance
\[ S^2 = \frac{1}{n-1}\left( (X_1 - \bar{X})^2 + \ldots + (X_n - \bar{X})^2 \right), \]
and the range
\[ R = \max(X_1, \ldots, X_n) - \min(X_1, \ldots, X_n) \]
are all statistics. Any statistic is itself also a random variable. One use of probability theory in this course will be to understand the distribution of a statistic, called its sampling distribution, based on the distribution of the original data X 1 ,... , Xn. Here, let us discuss two examples. The second example will introduce the important chi-squared distribution, which we will see again in later lectures.
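Each of these statistics is a plain function of the data. A minimal sketch on a small made-up data set follows; the helper names are illustrative, and the standard library's `statistics.variance` is printed alongside because it also uses the n − 1 denominator.

```python
import statistics

def sample_mean(xs):
    """(X_1 + ... + X_n) / n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_range(xs):
    """max(X_1, ..., X_n) - min(X_1, ..., X_n)."""
    return max(xs) - min(xs)

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # made-up data for illustration
print(sample_mean(data), sample_variance(data), sample_range(data))
print(statistics.mean(data), statistics.variance(data))
```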
Example 3.9 (Sample mean of IID normals). Suppose $X_1, \ldots, X_n \overset{IID}{\sim} N(\mu, \sigma^2)$. The sample mean $\bar{X}$ is actually a special case of the quantity $a_1X_1 + \ldots + a_nX_n$ from Example 3.6, where $a_i = \frac{1}{n}$, $\mu_i = \mu$, and $\sigma_i^2 = \sigma^2$ for all $i = 1, \ldots, n$. Then from that example,
\[ \bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right). \]
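The conclusion of this example, that the sample mean of n IID N(μ, σ^2) variables is distributed as N(μ, σ^2/n), can be checked by simulation. The sketch below draws many samples of size n, records each sample mean, and compares the average and variance of those means with μ and σ^2/n. The function name, parameter values, seed, and number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` independent samples of size n from N(mu, sigma^2)
    and return the sample mean of each one."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
    return means

mu, sigma, n = 1.0, 2.0, 25
means = simulate_sample_means(mu, sigma, n, reps=20000)
print("average of sample means:", statistics.mean(means), "theory:", mu)
print("variance of sample means:", statistics.variance(means), "theory:", sigma ** 2 / n)
```

The simulated values only approximate the theoretical ones, with an error that shrinks as the number of repetitions grows.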
Example 3.10 (Chi-squared distribution). Suppose $X_1, \ldots, X_n \overset{IID}{\sim} N(0, 1)$. Then the distribution of the statistic
\[ X_1^2 + \ldots + X_n^2 \]
is called the chi-squared distribution with n degrees of freedom, abbreviated as $\chi^2_n$.
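The definition translates directly into a simulation: square n independent standard normal draws and add them up. The sketch below does this repeatedly and compares the empirical mean and variance with n and 2n. Those two values are standard facts about the chi-squared distribution (they follow from E[X_i^2] = 1 and Var[X_i^2] = 2) and are not derived in this section. The function name, seed, and sizes are illustrative.

```python
import random
import statistics

def simulate_chi_squared(n, reps, seed=0):
    """Return `reps` draws of X_1^2 + ... + X_n^2 with X_i IID N(0, 1)."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(reps)]

n = 5
draws = simulate_chi_squared(n, reps=20000)
print("empirical mean:", statistics.mean(draws), "theory:", n)
print("empirical variance:", statistics.variance(draws), "theory:", 2 * n)
```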