






Exponential, normal, Bernoulli, geometric distribution are discussed
Typology: Lecture notes







Course: Mathematical Statistics Term: Fall 2017 Instructor: Gordan Žitković
We use many different functions to describe probability distributions (pdfs, pmfs, cdfs, quantile functions, survival functions, hazard functions, etc.). Moment-generating functions are just another way of describing distributions, but they do take some getting used to, as they lack the intuitive appeal of pdfs or pmfs.
Definition 6.1.1. The moment-generating function (mgf) of the (distribution of the) random variable $Y$ is the function $m_Y$ of a real parameter $t$ defined by $m_Y(t) = E[e^{tY}]$, for all $t \in \mathbb{R}$ for which the expectation $E[e^{tY}]$ is well defined.
It is hard to give a direct intuition behind this definition, or to explain why it is useful, at this point. It is related to the notions of Fourier transform and generating functions. It will be only through examples in this and later lectures that a deeper understanding will emerge.
The first order of business is to compute the mgf for some of the more important (named) random variables. In the case of a continuous distribution, the main tool is the fundamental theorem, which we use with the function $g(y) = \exp(ty)$; we think of $t$ as fixed, so that
$$m_Y(t) = E[\exp(tY)] = E[g(Y)] = \int_{-\infty}^{\infty} g(y)\, f_Y(y)\, dy = \int_{-\infty}^{\infty} e^{ty} f_Y(y)\, dy.$$
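Example 6.1.2.
Uniform distribution on $(0,1)$, with $f_Y(y) = 1$ for $y \in (0,1)$ and $0$ otherwise. Then
$$m_Y(t) = \int_{-\infty}^{\infty} e^{ty} f_Y(y)\, dy = \int_0^1 e^{ty}\, dy = \frac{1}{t}\left(e^{t} - 1\right).$$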
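The integral formula can be checked numerically. The following is a minimal sketch, assuming a composite Simpson rule is accurate enough for this smooth integrand; the helper names `mgf_numeric`, `uniform_pdf` and `uniform_mgf` are illustrative and not part of the notes.

```python
import math

def mgf_numeric(pdf, t, lo, hi, n=2000):
    """Approximate the integral of exp(t*y) * pdf(y) over [lo, hi]
    with the composite Simpson rule (n must be even)."""
    h = (hi - lo) / n
    total = math.exp(t * lo) * pdf(lo) + math.exp(t * hi) * pdf(hi)
    for i in range(1, n):
        y = lo + i * h
        weight = 4 if i % 2 == 1 else 2
        total += weight * math.exp(t * y) * pdf(y)
    return total * h / 3

def uniform_pdf(y):
    # Density of the uniform distribution on (0, 1), evaluated on its support.
    return 1.0

def uniform_mgf(t):
    # Closed form (e^t - 1) / t, valid for t != 0.
    return (math.exp(t) - 1) / t

for t in (-1.0, 0.5, 2.0):
    print(t, mgf_numeric(uniform_pdf, t, 0.0, 1.0), uniform_mgf(t))
```

The two printed columns should agree to many decimal places for each value of $t$.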
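Exponential distribution with parameter $\tau > 0$, with $f_Y(y) = \frac{1}{\tau} e^{-y/\tau}$ for $y > 0$ and $0$ otherwise. For $t < \frac{1}{\tau}$,
$$m_Y(t) = \int_0^{\infty} e^{ty}\, \frac{1}{\tau} e^{-y/\tau}\, dy = \frac{1}{\tau} \int_0^{\infty} e^{-y\left(\frac{1}{\tau} - t\right)}\, dy = \frac{1}{\tau} \cdot \frac{1}{\frac{1}{\tau} - t} = \frac{1}{1 - \tau t}.$$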
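To see the role of the restriction on $t$, one can truncate the integral at a finite cutoff. This is only a sketch under the assumption that a midpoint rule on $[0, \text{upper}]$ is adequate; `exp_mgf_truncated` and the chosen values of `tau` and `t` are hypothetical.

```python
import math

def exp_mgf_truncated(t, tau, upper, n=200_000):
    """Midpoint-rule approximation of the integral of
    exp(t*y) * (1/tau) * exp(-y/tau) over [0, upper]."""
    h = upper / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h
        total += math.exp(-y * (1.0 / tau - t)) / tau
    return total * h

tau = 2.0

# t = 0.25 is below 1/tau = 0.5: the truncated integral matches 1 / (1 - tau*t).
print(exp_mgf_truncated(0.25, tau, upper=200.0), 1.0 / (1.0 - tau * 0.25))

# t = 0.75 is above 1/tau: the truncated integral keeps growing with the cutoff.
growing = [exp_mgf_truncated(0.75, tau, upper=u) for u in (10.0, 20.0, 40.0)]
print(growing)
```

For $t$ below $1/\tau$ the cutoff hardly matters once it is large; for $t$ above $1/\tau$ no finite limit exists.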
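Standard normal distribution, with $f_Y(y) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} y^2}$. Here
$$m_Y(t) = \int_{-\infty}^{\infty} e^{ty}\, \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} y^2}\, dy.$$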
This integral looks hard to evaluate, but there is a simple trick. We collect the exponential terms and complete the square:
$$e^{ty} e^{-\frac{1}{2} y^2} = e^{-\frac{1}{2}(y - t)^2}\, e^{\frac{1}{2} t^2}.$$
If we plug this into the expression above and pull out $e^{\frac{1}{2} t^2}$, which is constant as far as the variable of integration is concerned, we get
$$m_Y(t) = e^{\frac{1}{2} t^2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(y - t)^2}\, dy.$$
This does not look like a big improvement at first, but it is. The expression inside the integral is the pdf of a normal distribution with mean t and variance 1. Therefore, it must integrate to 1, as does any pdf. It follows that
$$m_Y(t) = e^{\frac{1}{2} t^2}.$$
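A simulation gives an independent sanity check of the standard normal mgf. The sketch below assumes a plain Monte Carlo average with a fixed seed; the function names and the sample size are illustrative choices, and the agreement is only up to sampling error.

```python
import math
import random

def normal_mgf_monte_carlo(t, n_samples=200_000, seed=0):
    """Estimate E[exp(t*Z)] for a standard normal Z by a sample average."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += math.exp(t * rng.gauss(0.0, 1.0))
    return total / n_samples

def normal_mgf_exact(t):
    # Closed form obtained by completing the square.
    return math.exp(0.5 * t * t)

for t in (-0.5, 0.5, 1.0):
    print(t, normal_mgf_monte_carlo(t), normal_mgf_exact(t))
```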
As the exponential case in this example shows, the moment-generating function does not have to be defined for all $t$. Indeed, the mgf of the exponential distribution is defined only for $t < \frac{1}{\tau}$. We will not worry too much about this, and simply treat mgfs as expressions in $t$, but this fact is good to keep in mind when one goes deeper into the theory.
The fundamental formula for continuous distributions becomes a sum in the discrete case. When Y is discrete with support SY and pmf pY, the mgf can be computed as follows, where, as above, g(y) = exp(ty):
$$m_Y(t) = E[\exp(tY)] = E[g(Y)] = \sum_{y \in S_Y} \exp(ty)\, p_Y(y).$$
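The discrete formula translates almost literally into code. This is a minimal sketch, assuming the pmf is stored as a dictionary from support points to probabilities; `mgf_discrete` and the value $p = 0.3$ are illustrative.

```python
import math

def mgf_discrete(pmf, t):
    """Sum of exp(t*y) * p_Y(y) over the support; pmf maps y to p_Y(y)."""
    return sum(math.exp(t * y) * prob for y, prob in pmf.items())

# Bernoulli B(p): P(Y = 1) = p, P(Y = 0) = q = 1 - p, so the sum is q + p*e^t.
p = 0.3
bernoulli_pmf = {0: 1 - p, 1: p}

t = 0.7
print(mgf_discrete(bernoulli_pmf, t), (1 - p) + p * math.exp(t))
```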
Example 6.1.5.
$$m_W(t) = e^{tl}\, \frac{1}{(r - l)t}\left(e^{(r - l)t} - 1\right) = \frac{e^{tr} - e^{tl}}{t(r - l)}.$$
$$\dots\, e^{\frac{1}{2}\sigma^2 t^2} = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$$
One of the most important properties of the moment-generating functions is that they turn sums of independent random variables into products:
Proposition 6.2.1. Let $Y_1, Y_2, \dots, Y_n$ be independent random variables with mgfs $m_{Y_1}(t), m_{Y_2}(t), \dots, m_{Y_n}(t)$. Then the mgf of their sum $Y = Y_1 + Y_2 + \dots + Y_n$ is given by
$$m_Y(t) = m_{Y_1}(t) \times m_{Y_2}(t) \times \dots \times m_{Y_n}(t).$$
This proposition is true for all random variables, but here is a sketch of the argument in the continuous case. It is a consequence of the factorization theorem (Theorem ??) and the fundamental formula (Theorem ??). For simplicity, let us assume that $n = 2$:
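$$m_{Y_1 + Y_2}(t) = E[e^{t(Y_1 + Y_2)}] = E[g(Y_1, Y_2)],$$
where $g(y_1, y_2) = e^{t(y_1 + y_2)}$. The factorization criterion says that $f_{Y_1, Y_2}(y_1, y_2) = f_{Y_1}(y_1)\, f_{Y_2}(y_2)$, and so
$$\begin{aligned}
m_{Y_1 + Y_2}(t) &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{t(y_1 + y_2)} f_{Y_1, Y_2}(y_1, y_2)\, dy_2\, dy_1 \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{t y_1} e^{t y_2} f_{Y_1}(y_1)\, f_{Y_2}(y_2)\, dy_2\, dy_1 \\
&= \int_{-\infty}^{\infty} e^{t y_1} f_{Y_1}(y_1) \left( \int_{-\infty}^{\infty} e^{t y_2} f_{Y_2}(y_2)\, dy_2 \right) dy_1 \\
&= \int_{-\infty}^{\infty} e^{t y_1} f_{Y_1}(y_1)\, m_{Y_2}(t)\, dy_1 = m_{Y_2}(t) \int_{-\infty}^{\infty} e^{t y_1} f_{Y_1}(y_1)\, dy_1 \\
&= m_{Y_2}(t)\, m_{Y_1}(t).
\end{aligned}$$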
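The same product rule can be verified exactly in a small discrete case, where the pmf of the sum is obtained by convolution. This sketch is the discrete analogue of the argument, not the continuous computation itself; the die and coin pmfs and the helper names are hypothetical.

```python
import math
from collections import defaultdict

def mgf_discrete(pmf, t):
    """Sum of exp(t*y) * p(y) over the support; pmf maps y to p(y)."""
    return sum(math.exp(t * y) * prob for y, prob in pmf.items())

def pmf_of_sum(pmf1, pmf2):
    """pmf of Y1 + Y2 for independent Y1, Y2: the joint pmf factorizes,
    so each pair (y1, y2) contributes p1(y1) * p2(y2) to the value y1 + y2."""
    out = defaultdict(float)
    for y1, p1 in pmf1.items():
        for y2, p2 in pmf2.items():
            out[y1 + y2] += p1 * p2
    return dict(out)

die = {k: 1 / 6 for k in range(1, 7)}   # fair six-sided die
coin = {0: 0.5, 1: 0.5}                 # fair coin coded as 0/1

t = 0.3
lhs = mgf_discrete(pmf_of_sum(die, coin), t)
rhs = mgf_discrete(die, t) * mgf_discrete(coin, t)
print(lhs, rhs)
```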
Example 6.2.2. Binomial distribution. Let $Y \sim b(n, p)$. We know that $Y$ counts the number of successes in $n$ independent Bernoulli trials, so we can represent it (in distribution) as $Y = Y_1 + \dots + Y_n$, where the $Y_i$ are independent and each $Y_i$ is a $B(p)$-random variable. We know from Example 6.1.3 that the mgf $m_{Y_i}(t)$ of each $Y_i$ is $q + pe^t$, where $q = 1 - p$. Therefore
$$m_Y(t) = m_{Y_1}(t) \times m_{Y_2}(t) \times \dots \times m_{Y_n}(t) = (q + pe^t)^n.$$
We could have obtained the same formula without the factorization criterion, but the calculation is trickier:
$$m_Y(t) = \sum_{y=0}^{n} e^{ty} p_Y(y) = \sum_{y=0}^{n} e^{ty} \binom{n}{y} p^y q^{n-y} = \sum_{y=0}^{n} \binom{n}{y} (pe^t)^y q^{n-y} = (pe^t + q)^n,$$
where the last equality follows from the binomial formula
$$(a + b)^n = \sum_{y=0}^{n} \binom{n}{y} a^y b^{n-y}.$$
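Both routes to the binomial mgf are easy to compare numerically. A minimal sketch, with illustrative function names and parameter values:

```python
import math

def binomial_mgf_sum(n, p, t):
    """Direct sum of exp(t*y) * C(n, y) * p^y * q^(n-y) over y = 0..n."""
    q = 1 - p
    return sum(math.comb(n, y) * (p * math.exp(t)) ** y * q ** (n - y)
               for y in range(n + 1))

def binomial_mgf_closed(n, p, t):
    """Closed form (q + p*e^t)^n obtained from the product of Bernoulli mgfs."""
    return (1 - p + p * math.exp(t)) ** n

n, p, t = 12, 0.35, 0.4
print(binomial_mgf_sum(n, p, t), binomial_mgf_closed(n, p, t))
```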
The terminology “moment generating function” comes from the following nice fact:
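Proposition 6.3.1. Suppose that the moment-generating function $m_Y(t)$ of a random variable $Y$ admits an expansion into a power series. Then the coefficients of that series are determined by the moments $\mu_k = E[Y^k]$ of $Y$:
$$m_Y(t) = \sum_{k=0}^{\infty} \frac{\mu_k}{k!}\, t^k.$$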
For the uniform distribution on $(l, r)$ we have
$$m_Y(t) = \frac{e^{tr} - e^{tl}}{t(r - l)}.$$
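We expand this into a series by expanding the numerator first:
$$e^{tr} - e^{tl} = \sum_{k=0}^{\infty} \frac{1}{k!} (tr)^k - \sum_{k=0}^{\infty} \frac{1}{k!} (tl)^k = \sum_{k=0}^{\infty} \frac{r^k - l^k}{k!}\, t^k,$$
where the $k = 0$ term vanishes, since $r^0 - l^0 = 0$.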
Then we divide by the denominator t(r − l) to get
$$m_Y(t) = \sum_{k=0}^{\infty} \frac{r^k - l^k}{k!\,(r - l)}\, t^{k-1} = 1 + \frac{r^2 - l^2}{2!\,(r - l)}\, t + \frac{r^3 - l^3}{3!\,(r - l)}\, t^2 + \dots$$
It follows that $\mu_k = \frac{r^{k+1} - l^{k+1}}{(k + 1)(r - l)}$.
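The moment formula can be compared with a direct numerical evaluation of $E[Y^k] = \int_l^r y^k \frac{1}{r - l}\, dy$. This is a sketch assuming a composite Simpson rule; the function names and the endpoints $l = 1$, $r = 3$ are illustrative.

```python
def uniform_moment_formula(k, l, r):
    """k-th moment of the uniform distribution on (l, r), read off the series."""
    return (r ** (k + 1) - l ** (k + 1)) / ((k + 1) * (r - l))

def uniform_moment_numeric(k, l, r, n=2000):
    """Composite Simpson approximation of the integral of y^k / (r - l) over [l, r]."""
    h = (r - l) / n
    total = l ** k + r ** k
    for i in range(1, n):
        y = l + i * h
        total += (4 if i % 2 == 1 else 2) * y ** k
    return total * h / 3 / (r - l)

l, r = 1.0, 3.0
for k in range(1, 6):
    print(k, uniform_moment_formula(k, l, r), uniform_moment_numeric(k, l, r))
```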
$$e^{\frac{1}{2} t^2} = \sum_{k=0}^{\infty} \frac{1}{k!} \left(t^2/2\right)^k = \sum_{k=0}^{\infty} \frac{1}{2^k\, k!}\, t^{2k}.$$
The coefficients of the odd powers of $t$ are all $0$, so
$$\mu_k = 0 \text{ if } k \text{ is odd}.$$
For a moment of an even order $2k$, we get
$$\mu_{2k} = \frac{(2k)!}{2^k\, k!}.$$
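As a quick check, the even moments of the standard normal can be tabulated and compared with the product of the odd numbers $1 \cdot 3 \cdot 5 \cdots (2k - 1)$, which is an equivalent way of writing the same quantity. The function names below are illustrative.

```python
import math

def normal_even_moment(k):
    """E[Z^(2k)] for a standard normal Z, from the coefficient of t^(2k)."""
    return math.factorial(2 * k) // (2 ** k * math.factorial(k))

def odd_product(k):
    """1 * 3 * 5 * ... * (2k - 1)."""
    result = 1
    for j in range(1, 2 * k, 2):
        result *= j
    return result

for k in range(1, 5):
    print(2 * k, normal_even_moment(k), odd_product(k))
# prints the moments 1, 3, 15, 105 for orders 2, 4, 6, 8
```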
In all examples above we managed to expand the mgf into a power series without using Taylor’s theorem, i.e., without derivatives. Sometimes, the easiest approach is to differentiate (the notation in (6.3.2) means “take $k$ derivatives in $t$, and then set $t = 0$”):
Proposition 6.3.3. The $k$-th moment $\mu_k$ of a random variable with the moment-generating function $m_Y(t)$ is given by
$$\mu_k = \frac{d^k}{dt^k}\, m_Y(t)\Big|_{t=0}, \qquad (6.3.2)$$
as long as $m_Y$ is defined for $t$ in some neighborhood of $0$.
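When symbolic differentiation is inconvenient, the derivatives at $0$ can be approximated by finite differences. The sketch below is a numerical approximation, not an exact method; it uses the exponential mgf $\frac{1}{1 - \tau t}$, whose first two moments are $\tau$ and $2\tau^2$. The helper names and the step size `h` are assumptions made for illustration.

```python
def exponential_mgf(t, tau):
    # mgf of the exponential distribution with parameter tau, valid for t < 1/tau
    return 1.0 / (1.0 - tau * t)

def first_two_moments(mgf, h=1e-3):
    """Central-difference approximations of m'(0) and m''(0)."""
    m_plus, m_zero, m_minus = mgf(h), mgf(0.0), mgf(-h)
    mu1 = (m_plus - m_minus) / (2 * h)
    mu2 = (m_plus - 2 * m_zero + m_minus) / (h * h)
    return mu1, mu2

tau = 2.0
mu1, mu2 = first_two_moments(lambda t: exponential_mgf(t, tau))
print(mu1, tau)            # approximate vs exact first moment
print(mu2, 2 * tau ** 2)   # approximate vs exact second moment
```

The step `h` must be small enough that $\pm h$ lies inside the interval where the mgf is defined.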
Example 6.3.4. Let $Y$ be a Poisson random variable with parameter $\lambda$, so that
$$m_Y(t) = e^{\lambda(e^t - 1)}.$$
The first derivative $m_Y'(t)$ is given by $e^{\lambda(e^t - 1)} \lambda e^t$, and so
$$\mu_1 = E[Y] = m_Y'(t)\big|_{t=0} = e^{\lambda(e^0 - 1)} \lambda e^0 = \lambda.$$
We can differentiate again to obtain
$$m_Y''(t) = \lambda (1 + e^t \lambda)\, e^{t + \lambda(e^t - 1)},$$
which yields $\mu_2 = \lambda(1 + \lambda)$. One can continue and compute higher moments
$$\mu_3 = \lambda(1 + 3\lambda + \lambda^2), \quad \mu_4 = \lambda(1 + 7\lambda + 6\lambda^2 + \lambda^3), \quad \text{etc.}$$
There is no simple formula for the general term μ k.
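The listed Poisson moments can be cross-checked by summing $y^k\, p_Y(y)$ directly over the pmf $p_Y(y) = e^{-\lambda} \lambda^y / y!$. This is a sketch that truncates the infinite sum after a fixed number of terms, which is an assumption that holds comfortably for moderate $\lambda$; `poisson_moment` and $\lambda = 2.5$ are illustrative.

```python
import math

def poisson_moment(k, lam, terms=200):
    """Truncated sum of y^k * P(Y = y) for a Poisson(lam) random variable."""
    prob = math.exp(-lam)        # P(Y = 0)
    total = 0.0
    for y in range(terms):
        total += (y ** k) * prob
        prob *= lam / (y + 1)    # P(Y = y + 1) from P(Y = y)
    return total

lam = 2.5
formulas = {
    1: lam,
    2: lam * (1 + lam),
    3: lam * (1 + 3 * lam + lam ** 2),
    4: lam * (1 + 7 * lam + 6 * lam ** 2 + lam ** 3),
}
for k, value in formulas.items():
    print(k, poisson_moment(k, lam), value)
```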
It is clear that different distributions come with different pdfs (pmfs) and cdfs. It is also true for mgfs, but it is far from obvious and the proof is way outside the scope of these notes:
Theorem 6.4.1 (Uniqueness theorem). If two random variables $Y_1$ and $Y_2$ have the same moment-generating functions, i.e., if
$$m_{Y_1}(t) = m_{Y_2}(t) \text{ for all } t,$$
then they have the same distribution. In particular,
1. If $Y_1$ is discrete, then so is $Y_2$, and $Y_1$ and $Y_2$ have the same support and the same pmf, i.e.,
$$S_{Y_1} = S_{Y_2} \text{ and } p_{Y_1}(y) = p_{Y_2}(y) \text{ for all } y.$$
2. If $Y_1$ is continuous, then so is $Y_2$, and $Y_1$ and $Y_2$ have the same pdf, i.e.,
$$f_{Y_1}(y) = f_{Y_2}(y), \text{ for all } y.$$
The way we use this result is straightforward:
Problem 6.5.3. The standard deviation of the random variable $Y$ whose mgf is given by $m_Y(t) = \frac{1}{\sqrt{1 - 2t}}$ is:
(a) $2$ (b) $\sqrt{2}$ (c) $\frac{4}{3}$ (d) $\frac{\sqrt{3}}{2}$ (e) none of the above
Problem 6.5.4. Compute the standard deviation of the random variable $Y$ whose mgf is given by $m_Y(t) = \frac{1}{(2 - e^t)^3}$.
Problem 6.5.5. Let $Y_1, \dots, Y_n$ be independent random variables with distribution $N(\mu, \sigma)$. Then the mgf of $\frac{1}{n}(Y_1 + \dots + Y_n)$ is
(a) $e^{\frac{\mu}{n} t + \frac{1}{2}\sigma^2 t^2}$
(b) $e^{\mu t + \frac{1}{2} n \sigma^2 t^2}$
(c) $e^{\mu t + \frac{1}{2n} \sigma^2 t^2}$
(d) $e^{n \mu t + \frac{1}{2}\sigma^2 t^2}$
(e) none of the above
Problem 6.5.6. What is the distribution of the sum $S = Y_1 + Y_2 + \dots + Y_k$, if $Y_1, \dots, Y_k$ are independent and, for $i = 1, \dots, k$,
1. $Y_i \sim B(p)$, with $p \in (0, 1)$,
2. $Y_i \sim b(n_i, p)$, with $n_1, \dots, n_k \in \mathbb{N}$, $p \in (0, 1)$,
3. $Y_i \sim P(\lambda_i)$, with $\lambda_1, \dots, \lambda_k > 0$,
Problem 6.5.7 (The mgf of the $\chi^2$ distribution). We learned in class that the distribution of the random variable $Y = Z^2$, where $Z$ is a standard normal, is called the $\chi^2$-distribution. We also computed its pdf using the cdf method. The goal of this exercise is to compute its mgf.
Compute $\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\beta x^2}\, dx$, where $\beta > 0$ is a constant. (Hint: Use the fact that $\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}(y - \mu)^2/\sigma^2}\, dy = 1$, as the expression inside the integral is the pdf of $N(\mu, \sigma)$. Plug in $\mu = 0$ and an appropriate value of $\sigma$.)
Problem 6.5.8. (*) Let $Y_1, Y_2, \dots, Y_n$ be independent random variables, each with the Bernoulli $B(p)$ distribution, for some $p \in (0, 1)$.
Show that the mgf of the random variable
$$W_n = \frac{Y_1 + Y_2 + \dots + Y_n - np}{\sqrt{np(1 - p)}}$$
can be written in the form
$$m_{W_n}(t) = \left(p\, e^{\frac{t}{\sqrt{n}}\alpha} + (1 - p)\, e^{-\frac{t}{\sqrt{n}}\alpha^{-1}}\right)^n, \qquad (6.5.1)$$
for some $\alpha$, and find its value.
Write down the Taylor approximations in $t$ around $0$ for the functions $\exp\left(\frac{t}{\sqrt{n}}\alpha\right)$ and $\exp\left(-\frac{t}{\sqrt{n}}\alpha^{-1}\right)$, up to and including the term involving $t^2$. Then, substitute those approximations in (6.5.1) above. What do you get? When $n$ is large, $\frac{t}{\sqrt{n}}\alpha$ and $\frac{t}{\sqrt{n}}\alpha^{-1}$ are close to $0$ and it can be shown that the expression you got is the limit of $m_{W_n}(t)$, as $n \to \infty$.
What distribution is that limit the mgf of?
(Note: Convergence of mgfs corresponds to a very important mode of convergence, called the weak convergence. We will not talk about it in this class, but it is exactly the kind of convergence that appears in the central limit theorem, which is, in turn, behind the effectiveness of the normal approximation to binomial random variables. In fact, what you just did is a fundamental part of the proof of the central limit theorem.)
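The convergence of $m_{W_n}(t)$ can also be observed numerically before it is proved. The sketch below evaluates $E[e^{tW_n}]$ directly from the binomial pmf of $Y_1 + \dots + Y_n$, working in log space to avoid overflow; the function name and the choices $t = 1$, $p = 0.3$ are illustrative, and the printed values are meant as a check on a hand computation rather than a replacement for it.

```python
import math

def mgf_standardized_binomial(t, n, p):
    """E[exp(t * W_n)] with W_n = (S - n*p) / sqrt(n*p*(1-p)) and S ~ b(n, p),
    computed as a sum over the binomial pmf (log space avoids overflow)."""
    scale = math.sqrt(n * p * (1 - p))
    total = 0.0
    for y in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
                   + y * math.log(p) + (n - y) * math.log(1 - p))
        total += math.exp(log_pmf + t * (y - n * p) / scale)
    return total

# The values settle down as n grows.
values = [mgf_standardized_binomial(1.0, n, 0.3) for n in (10, 100, 1000, 10000)]
print(values)
```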