S&DS 242/542: Theory of Statistics — Spring 2022

Lecture 3 — Moment generating functions, multivariate normal

3.1 Moment generating functions

A tool from probability that will be particularly useful for us is the moment generating function (MGF) of a random variable $X$. This is a function of $t \in \mathbb{R}$ defined by
\[
M_X(t) = E[e^{tX}].
\]
Depending on the distribution of the random variable $X$, it is possible for $M_X(t)$ to be $\infty$ for some values of $t$. Here are two examples:

Example 3.1 (Normal MGF). Suppose $X \sim \mathcal{N}(0, 1)$. Then
\[
M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{\frac{-x^2 + 2tx}{2}} \, dx.
\]
To compute this integral, we complete the square:
\[
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{\frac{-x^2 + 2tx}{2}} \, dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{\frac{-x^2 + 2tx - t^2}{2} + \frac{t^2}{2}} \, dx = e^{\frac{t^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-\frac{(x - t)^2}{2}} \, dx.
\]
The integrand in the last integral above is the PDF of the $\mathcal{N}(t, 1)$ distribution, hence this last integral equals 1. So the MGF of $X$ is simply
\[
M_X(t) = e^{\frac{t^2}{2}}.
\]
Now suppose $X \sim \mathcal{N}(\mu, \sigma^2)$. This means that $X - \mu \sim \mathcal{N}(0, \sigma^2)$ and $\frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1)$, so we may represent $X$ as $X = \mu + \sigma Z$ where $Z \sim \mathcal{N}(0, 1)$. The MGF of $X$ is
\[
M_X(t) = E[e^{tX}] = E[e^{\mu t + \sigma t Z}] = e^{\mu t} E[e^{\sigma t Z}] = e^{\mu t} M_Z(\sigma t) = e^{\mu t + \frac{\sigma^2 t^2}{2}},
\]
where in the last step we have applied the MGF for $Z \sim \mathcal{N}(0, 1)$ computed above. In this normal example, $M_X(t) < \infty$ for all $t \in \mathbb{R}$.

Example 3.2 (Gamma MGF). Suppose $X \sim \text{Gamma}(\alpha, \beta)$, where $\alpha, \beta > 0$. Then
\[
M_X(t) = E[e^{tX}] = \int_0^{\infty} e^{tx} \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x} \, dx = \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^{\infty} x^{\alpha - 1} e^{(t - \beta)x} \, dx.
\]
We consider three cases:

• If $t > \beta$, then the integrand $x^{\alpha - 1} e^{(t - \beta)x}$ increases to infinity as $x \to \infty$. So the integral is $\infty$, and $M_X(t) = \infty$.

• If $t = \beta$, the integral is simply $\int_0^{\infty} x^{\alpha - 1} \, dx = \frac{1}{\alpha} x^{\alpha} \big|_0^{\infty}$. Since $\alpha > 0$, we have $\lim_{x \to \infty} x^{\alpha} = \infty$, so again $M_X(t) = \infty$.

• If $t < \beta$, let us rewrite the above as
\[
M_X(t) = \frac{\beta^\alpha}{(\beta - t)^\alpha} \int_0^{\infty} \frac{(\beta - t)^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-(\beta - t)x} \, dx.
\]
This integrand is now the PDF of the $\text{Gamma}(\alpha, \beta - t)$ distribution (where both $\alpha > 0$ and $\beta - t > 0$ as required). So the integral equals 1, and $M_X(t) = \frac{\beta^\alpha}{(\beta - t)^\alpha}$.

Combining these cases,
\[
M_X(t) =
\begin{cases}
\infty & \text{if } t \geq \beta \\
\frac{\beta^\alpha}{(\beta - t)^\alpha} & \text{if } t < \beta
\end{cases}
=
\begin{cases}
\infty & \text{if } t \geq \beta \\
(1 - t/\beta)^{-\alpha} & \text{if } t < \beta.
\end{cases}
\]

One reason for studying the MGF is that it provides a different, and sometimes more convenient, way of describing the distribution of a random variable $X$. Last lecture, we explained how the distribution of $X$ may be specified by its PMF/PDF or by its CDF. If the MGF $M_X(t)$ is finite for all $t$ in some open interval around 0, as in both of the above examples, then the MGF also uniquely specifies the distribution of $X$. This is the content of the following theorem (which we will not prove in this class):

Theorem 3.3. Let $X$ and $Y$ be two random variables such that, for some $h > 0$ and every $t \in (-h, h)$, both $M_X(t)$ and $M_Y(t)$ are finite and $M_X(t) = M_Y(t)$. Then $X$ and $Y$ have the same distribution.

For example, this means that if $X$ is any random variable for which $M_X(t) = e^{\frac{t^2}{2}}$, then the distribution of $X$ must be $\mathcal{N}(0, 1)$, by this theorem and our calculation in Example 3.1.

The MGF is particularly useful in statistics because we often deal with independent data. If $X_1, \ldots, X_n$ are independent random variables, then the MGF of their sum satisfies
\[
M_{X_1 + \ldots + X_n}(t) = E[e^{t(X_1 + \ldots + X_n)}] = E[e^{tX_1}] \cdots E[e^{tX_n}] = M_{X_1}(t) \cdots M_{X_n}(t).
\]
This is the product of the individual MGFs of $X_1, \ldots, X_n$. In contrast, the PDF or CDF of the sum $X_1 + \ldots + X_n$ may be quite complicated to express and derive. Thus, even though the MGF of a distribution is less intuitive to interpret than the PDF or CDF, it provides a more convenient tool for understanding sums of independent random variables. We will see a couple of examples of this below.
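The two key facts here — the closed form of the normal MGF and the product rule for sums of independent variables — are easy to sanity-check numerically. The sketch below is not part of the lecture notes; it is a minimal Monte Carlo check in Python, and the parameter values, sample size, and grid of $t$ values are arbitrary choices.

```python
# Monte Carlo check of (1) the N(mu, sigma^2) MGF formula e^{mu t + sigma^2 t^2 / 2}
# and (2) the product rule M_{X+Y}(t) = M_X(t) M_Y(t) for independent X and Y.
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1_000_000
mu, sigma = 1.0, 2.0

x = rng.normal(mu, sigma, size=n_samples)
for t in (-0.3, 0.1, 0.2):
    mc = np.exp(t * x).mean()                      # Monte Carlo estimate of E[e^{tX}]
    exact = np.exp(mu * t + sigma**2 * t**2 / 2)   # closed-form normal MGF
    print(f"t = {t:+.1f}: MC = {mc:.4f}, exact = {exact:.4f}")

# Two independent Gamma(alpha, beta) variables: for t < beta, each MGF is
# (1 - t/beta)^(-alpha), so the MGF of their sum should be the square of that.
alpha, beta, t = 2.0, 3.0, 1.0
g1 = rng.gamma(shape=alpha, scale=1 / beta, size=n_samples)  # numpy uses scale = 1/beta
g2 = rng.gamma(shape=alpha, scale=1 / beta, size=n_samples)
mc_sum = np.exp(t * (g1 + g2)).mean()
exact_product = (1 - t / beta) ** (-2 * alpha)
print(f"sum of two Gammas: MC = {mc_sum:.4f}, product of MGFs = {exact_product:.4f}")
```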
3.2 The Multivariate Normal distribution

The Multivariate Normal distribution in dimension $k$ is a joint distribution for $k$ continuous random variables $(X_1, \ldots, X_k) \in \mathbb{R}^k$, which generalizes the normal distribution for a single random variable. [...]

On the left, for $\rho = 0$, these contours are circular. The joint PDF has a peak at 0 and decays radially away from 0. On the right, for $\rho = 0.75$, the contours are ellipses. As $\rho$ increases to 1, the contours concentrate more and more around the line $y = x$. More generally, the joint PDF of $\mathcal{N}(\mu, \Sigma)$ in $k$ dimensions has a single peak at the mean vector $\mu \in \mathbb{R}^k$, and its contours are ellipsoids around $\mu$ whose shape depends on $\Sigma$.

Recall from Lecture 2 that, in general, $\text{Cov}[X, Y] = 0$ does not imply that the random variables $X$ and $Y$ are independent. However, this implication is true when $(X, Y)$ are bivariate normal. More generally, we have the following:

Theorem 3.8. Suppose $X = (X_1, \ldots, X_k)$ is multivariate normal. Let $\mathbf{X}_1$ and $\mathbf{X}_2$ be disjoint subvectors of $X$ such that each entry of $\mathbf{X}_1$ is uncorrelated with each entry of $\mathbf{X}_2$. Then the vectors $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent.

3.3 Sampling distributions of statistics

For data $X_1, \ldots, X_n$, which we will model as random variables, a statistic $T(X_1, \ldots, X_n)$ is any real-valued function of the data. In other words, it is any number computed from the data. For example, the sample mean
\[
\bar{X} = \frac{1}{n}(X_1 + \ldots + X_n),
\]
the sample variance
\[
S^2 = \frac{1}{n - 1}\left( (X_1 - \bar{X})^2 + \ldots + (X_n - \bar{X})^2 \right),
\]
and the range
\[
R = \max(X_1, \ldots, X_n) - \min(X_1, \ldots, X_n)
\]
are all statistics. Any statistic is itself also a random variable. One use of probability theory in this course will be to understand the distribution of a statistic, called its sampling distribution, based on the distribution of the original data $X_1, \ldots, X_n$. Here, let us discuss two examples. The second example will introduce the important chi-squared distribution, which we will see again in later lectures.

Example 3.9 (Sample mean of IID normals). Suppose $X_1, \ldots, X_n \overset{\text{IID}}{\sim} \mathcal{N}(\mu, \sigma^2)$. The sample mean $\bar{X}$ is a special case of the quantity $a_1 X_1 + \ldots + a_n X_n$ from Example 3.6, where $a_i = \frac{1}{n}$, $\mu_i = \mu$, and $\sigma_i^2 = \sigma^2$ for all $i = 1, \ldots, n$. Then from that example, $\bar{X} \sim \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$.
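Example 3.9 lends itself to a quick simulation check. The following sketch is not part of the notes; it is a minimal Python simulation, with the values of $\mu$, $\sigma$, $n$, and the number of replications chosen arbitrarily, that draws many samples of size $n$ and compares the empirical mean and variance of $\bar{X}$ with $\mu$ and $\sigma^2 / n$.

```python
# Simulation check of Example 3.9: for IID N(mu, sigma^2) data, the sample
# mean Xbar is distributed as N(mu, sigma^2 / n).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 2.0, 3.0, 25
n_reps = 200_000

samples = rng.normal(mu, sigma, size=(n_reps, n))  # each row is one dataset of size n
xbar = samples.mean(axis=1)                        # one sample mean per dataset

print(f"empirical mean of Xbar: {xbar.mean():.4f}   (theory: {mu})")
print(f"empirical var  of Xbar: {xbar.var():.4f}   (theory: {sigma**2 / n})")
```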
Example 3.10 (Chi-squared distribution). Suppose $X_1, \ldots, X_n \overset{\text{IID}}{\sim} \mathcal{N}(0, 1)$. Then the distribution of the statistic $X_1^2 + \ldots + X_n^2$ is called the chi-squared distribution with $n$ degrees of freedom, abbreviated as $\chi^2_n$.

Since this is a sum of $n$ independent random variables, this distribution is easiest to study using the MGF rather than the PDF. By independence of $X_1^2, \ldots, X_n^2$,
\[
M_{X_1^2 + \ldots + X_n^2}(t) = M_{X_1^2}(t) \cdots M_{X_n^2}(t).
\]
We may compute, for each $X_i$, the MGF
\[
M_{X_i^2}(t) = E[e^{tX_i^2}] = \int_{-\infty}^{\infty} e^{tx^2} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{(t - \frac{1}{2})x^2} \, dx.
\]
If $t \geq \frac{1}{2}$, then $M_{X_i^2}(t) = \infty$. Otherwise, let us write
\[
M_{X_i^2}(t) = \frac{1}{\sqrt{1 - 2t}} \int_{-\infty}^{\infty} \sqrt{\frac{1 - 2t}{2\pi}}\, e^{-\frac{1}{2}(1 - 2t)x^2} \, dx.
\]
We recognize this integrand as the PDF of the $\mathcal{N}\left(0, \frac{1}{1 - 2t}\right)$ distribution, and hence the integral equals 1. Then
\[
M_{X_i^2}(t) =
\begin{cases}
\infty & t \geq \frac{1}{2} \\
(1 - 2t)^{-1/2} & t < \frac{1}{2}.
\end{cases}
\]
Comparing this result with Example 3.2, we observe that this is the MGF of the $\text{Gamma}(\frac{1}{2}, \frac{1}{2})$ distribution, so $X_i^2 \sim \text{Gamma}(\frac{1}{2}, \frac{1}{2})$. That is to say, the distribution $\chi^2_1$ of the square of a single $\mathcal{N}(0, 1)$ random variable is another name for $\text{Gamma}(\frac{1}{2}, \frac{1}{2})$.

Returning now to the sum $X_1^2 + \ldots + X_n^2$,
\[
M_{X_1^2 + \ldots + X_n^2}(t) = M_{X_1^2}(t) \cdots M_{X_n^2}(t) =
\begin{cases}
\infty & t \geq \frac{1}{2} \\
(1 - 2t)^{-n/2} & t < \frac{1}{2}.
\end{cases}
\]
Comparing again with Example 3.2, this is the MGF of the $\text{Gamma}(\frac{n}{2}, \frac{1}{2})$ distribution, so $X_1^2 + \ldots + X_n^2 \sim \text{Gamma}(\frac{n}{2}, \frac{1}{2})$. That is to say, the distribution $\chi^2_n$ is another name for $\text{Gamma}(\frac{n}{2}, \frac{1}{2})$. Its PDF is the Gamma PDF: for $x > 0$,
\[
f(x) = \frac{1}{2^{n/2}\, \Gamma(n/2)}\, x^{n/2 - 1} e^{-x/2}.
\]
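The identification of $\chi^2_n$ with $\text{Gamma}(\frac{n}{2}, \frac{1}{2})$ can also be verified by simulation. The sketch below is not from the notes; it is a minimal Python check, with $n$ and the number of simulated draws chosen arbitrarily, that compares simulated sums of squared standard normals against both the chi-squared and Gamma distributions in scipy (scipy parameterizes the Gamma by shape and scale, where scale is the reciprocal of the rate $\beta$).

```python
# Check that X_1^2 + ... + X_n^2 for IID N(0,1) variables matches both the
# chi-squared(n) distribution and Gamma(n/2, rate 1/2), which are the same law.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, n_reps = 5, 100_000

z = rng.standard_normal(size=(n_reps, n))
s = (z**2).sum(axis=1)                     # simulated chi-squared_n draws

# Kolmogorov-Smirnov tests against each reference CDF; large p-values indicate
# the simulated draws are consistent with the reference distribution.
print(stats.kstest(s, stats.chi2(df=n).cdf))
print(stats.kstest(s, stats.gamma(a=n / 2, scale=2).cdf))  # scale = 1/beta = 2
```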