Fall 2009 Instructor: W. D. Gillam
(1) Calculate the moment generating function g_X(t) = E(e^{tX}) when X has a gamma distribution with parameters α, β.
Solution. Make the change of variables y = (1 − βt)β^{−1}x to compute:

g_X(t) = ∫_0^∞ e^{tx} x^{α−1} e^{−x/β} / (β^α Γ(α)) dx
       = (1 − βt)^{−α} ∫_0^∞ y^{α−1} e^{−y} / Γ(α) dy
       = (1 − βt)^{−α},

valid for t < 1/β.
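As a numerical sanity check on this closed form (my own sketch, not part of the original solution; the parameter values and the crude midpoint quadrature are arbitrary choices), one can compare the defining integral against (1 − βt)^{−α}:

```python
# Compare the integral defining the gamma MGF with the closed form (1 - beta*t)^(-alpha).
import math

def gamma_pdf(x, alpha, beta):
    # density of the gamma distribution with parameters alpha, beta
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

def mgf_numeric(t, alpha, beta, upper=200.0, steps=200000):
    # midpoint rule on [0, upper]; crude, but adequate for a sanity check
    h = upper / steps
    return sum(math.exp(t * (i + 0.5) * h) * gamma_pdf((i + 0.5) * h, alpha, beta) * h
               for i in range(steps))

alpha, beta, t = 2.5, 1.5, 0.2          # arbitrary, with t < 1/beta
closed_form = (1 - beta * t) ** (-alpha)
print(closed_form, mgf_numeric(t, alpha, beta))
```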
(2) Use the above result to prove that when X_1, ..., X_n are IIDRV with standard normal distribution, then X_1^2 + ··· + X_n^2 has a chi-squared distribution with n degrees of freedom. Do this by showing that the moment generating functions coincide. Recall that we proved this in class by another method (more-or-less explicit calculation of density functions...).

(3) Calculate the moment generating function g_X(t) when X has a normal distribution with expected value μ and variance σ^2. Use this to prove that a sum of independent random variables with (possibly different!) normal distributions again has a normal distribution. As in the previous problem, do this by equating moment generating functions instead of explicitly calculating the convolution of densities as we did in class.
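For (2), one way the MGF argument can be written (a sketch; the completing-the-square evaluation of E(e^{tX_i^2}) is the same computation as for the standard normal MGF done in class):

```latex
E\big(e^{tX_i^2}\big)
  = \int_{-\infty}^{\infty} e^{tx^2}\,\frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx
  = (1-2t)^{-1/2} \qquad (t < 1/2),
\qquad\text{so}\qquad
g_{X_1^2+\cdots+X_n^2}(t) = \big[(1-2t)^{-1/2}\big]^{n} = (1-2t)^{-n/2}.
```

By (1), this is the MGF of the gamma distribution with α = n/2, β = 2, i.e. of the chi-squared distribution with n degrees of freedom; since the MGF determines the distribution, the claim follows.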
Solution. Any normally distributed random variable is obtained from a random variable with standard normal distribution by a linear change of variables; keeping track of the effect of this change of variables on MGFs and using the computation for the standard normal that we did in class, we find

g_X(t) = exp(μt + σ^2 t^2 / 2).
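The second part of (3) then follows by multiplying MGFs; a sketch for two summands (the general case is induction). If X and Y are independent with X ~ N(μ_1, σ_1^2) and Y ~ N(μ_2, σ_2^2), then

```latex
g_{X+Y}(t) = g_X(t)\,g_Y(t)
  = \exp\!\big((\mu_1+\mu_2)\,t + (\sigma_1^2+\sigma_2^2)\,t^2/2\big),
```

which is the MGF of a normal distribution with expected value μ_1 + μ_2 and variance σ_1^2 + σ_2^2; since the MGF determines the distribution, X + Y is normal.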
(4) A random variable X is said to have a t-distribution with n degrees of freedom if X has the same distribution as

X_0 / √((X_1^2 + ··· + X_n^2)/n),

where X_0, X_1, ..., X_n are IIDRV with standard normal distribution. This just differs by a factor of √n from Student's distribution, but it is sometimes more natural. Calculate the variance of X and observe that it is > 1 (assuming that n > 2 so that it exists). Plot the standard normal density and the t-densities with n = 1, 2, 3 on the same axes. Observe that the t-density is similar to the standard normal bell curve, but slightly more spread out, accounting for its slightly larger variance.
Figure 1. Graphs of the standard normal density (blue), and the t- densities with n = 1, 2 , 3 (red, yellow, green, respectively).
Solution. Since the variance of Student's distribution is 1/(n − 2) (when n > 2), the variance of the t-distribution will be n/(n − 2) > 1 (for n > 2). The density f_n(x) for the t-distribution with n degrees of freedom is

f_n(x) = [Γ((n+1)/2) / (√(πn) Γ(n/2))] (1 + x^2/n)^{−(n+1)/2}.
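As a quick numerical check of this density (a sketch of my own, not in the original; the choice n = 5 and the midpoint quadrature are arbitrary): f_n should integrate to 1 and, for n > 2, have variance n/(n − 2):

```python
# Check that the t-density integrates to 1 and has variance n/(n-2).
import math

def t_density(x, n):
    # density of the t-distribution with n degrees of freedom
    return (math.gamma((n + 1) / 2)
            / (math.sqrt(math.pi * n) * math.gamma(n / 2))
            * (1 + x * x / n) ** (-(n + 1) / 2))

def integrate(f, a, b, steps=200000):
    # midpoint rule; crude, but sufficient for a sanity check
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) * h for i in range(steps))

n = 5
total = integrate(lambda x: t_density(x, n), -400, 400)
var = integrate(lambda x: x * x * t_density(x, n), -400, 400)
print(total, var, n / (n - 2))
```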
Plots are shown in Figure 1. These were constructed in Mathematica using the command

graph = Plot[{(1/Sqrt[2 Pi]) Exp[-x^2/2],
  (1/Sqrt[Pi*1]) (Gamma[1]/Gamma[1/2]) (1 + x^2)^(-1),
  (1/Sqrt[Pi*2]) (Gamma[3/2]/Gamma[1]) (1 + x^2/2)^(-3/2),
  (1/Sqrt[Pi*3]) (Gamma[2]/Gamma[3/2]) (1 + x^2/3)^(-2)}, {x, -3, 3}]

(5) When we sketched the proof of Stirling's Formula in class, recall that we approximated n! by writing
n! = (n/e)^n √(2πn) + ε_1 + ε_2,
where the “errors” ε_i were supposed to be small (compared to the leading term) in the sense that

lim_{n→∞} (n/e)^{−n} ε_i / √(2πn) = 0

for i = 1, 2. The error ε_1 arose when we approximated the integral
n! = ∫_0^∞ x^n e^{−x} dx
by using the series expansion

ln(x^n e^{−x}) ≈ n ln n − n − (x − n)^2 / (2n)

to approximate the integrand; I will not ask you to prove this error is small, but try to prove it if you want. The second error
ε_2 = (n/e)^n ∫_{−∞}^0 exp(−(x − n)^2 / (2n)) dx
arose when we wanted to change the limits of integration. Calculate this integral explicitly and verify that

lim_{n→∞} ∫_{−∞}^0 exp(−(x − n)^2 / (2n)) dx = 0.
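One way the calculation can be carried out (a sketch): substitute u = (x − n)/√n to get

```latex
\int_{-\infty}^{0} \exp\!\left(\frac{-(x-n)^2}{2n}\right) dx
  = \sqrt{n}\int_{-\infty}^{-\sqrt{n}} e^{-u^2/2}\,du
  = \sqrt{2\pi n}\;\Phi(-\sqrt{n}),
```

where Φ is the standard normal CDF; the standard tail bound Φ(−a) ≤ e^{−a^2/2}/(a√(2π)) for a > 0 then gives √(2πn) Φ(−√n) ≤ e^{−n/2} → 0.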
(The following problems are taken from a statistics textbook and are intended to give you some “real world” examples. Use a computer to evaluate the normal probabilities. You can easily find web-based software that will do integrals of the normal density functions.)
(6) A forester studying the effects of fertilization on certain pine forests in the Southeast is interested in estimating the average basal area of pine trees. In studying the basal areas of similar pine trees for many years, he has discovered these measurements (in square inches) to be normally distributed with standard deviation^1 approximately 4 square inches. If the forester samples n = 9 trees at random, find the probability that their average basal area will be within 2 square inches of the population mean.

(7) Suppose the forester wants to be at least 90 percent certain that the sample mean will be within 2 square inches of the population mean. How many pine trees must he measure?

(8) Suppose this forester is transferred to Maine and is interested in estimating the basal area of pine trees there. He assumes the basal areas of pine trees in Maine will have a normal distribution, but he isn't sure about the average basal area of pine trees in Maine, or its standard deviation. He measures the basal areas of some randomly selected pine trees and obtains: 162 in^2, 171 in^2, 155 in^2, 189 in^2, 148 in^2, 195 in^2, 165 in^2, 189 in^2, 150 in^2. What is his best guess for the expected value and standard deviation of the basal area of Maine pine trees?

(9) The EPA is concerned with the problem of setting criteria for the amounts of certain toxic chemicals to be allowed in freshwater lakes and rivers. A common measure of toxicity for a pollutant is the LC50: the concentration of the pollutant that will kill half of the test species in a given amount of time (usually 96 hours for fish species).^2 In many studies, the (natural) logs of LC50 measurements are normally distributed, and hence the analysis is based on ln LC50 data.
^1 Standard deviation is the square root of the variance. Note that the units make sense: variance would be measured in inches^4, since it is the difference of the expected value of the square of a quantity measured in square inches and the square of the expected value of that same quantity.
^2 You are probably familiar with the LD50 measure of poisons.
Studies of the effects of copper on trout show the variance of log LC50 measurements to be around .4 with concentrations measured in mg/L. If n = 10 studies on LC50 for copper are to be carried out, find the probability that the sample mean of log LC50 will differ from the true population mean by no more than .5. If the EPA wants to be 95 percent sure that the sample mean of log LC50 will differ from the population mean by no more than .5, how many tests should they carry out?

(10) Suppose X_1, ..., X_n, Y_1, ..., Y_m are independent random variables, and the X_i are normally distributed with expected value μ_1 and variance σ_1^2, while the Y_i are normally distributed with expected value μ_2 and variance σ_2^2. Calculate the distribution of Z = X_1 + ··· + X_n + Y_1 + ··· + Y_m.

(11) Suppose the effects of copper on bass show the variance of the log LC50 measurements to be .8. If the population means of the log LC50 are the same for bass and trout, find the probability that, with random samples of ten log LC50 measurements for each species, the sample mean for trout exceeds the sample mean for bass by at least 1.
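The problems suggest web-based software for the normal probabilities, but they can also be computed directly: Φ can be written in terms of math.erf. A sketch of my own (not part of the original) covering the numerical parts of (6)-(9) and (11):

```python
# Numerical answers for problems (6)-(9) and (11).
import math
import statistics

def Phi(z):
    # standard normal CDF, via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# (6): sigma = 4, n = 9, so the sample mean has standard deviation 4/3;
# P(|Xbar - mu| <= 2) = 2*Phi(2 / (4/3)) - 1.
p6 = 2 * Phi(2 / (4 / 3)) - 1

# (7): smallest n with P(|Xbar - mu| <= 2) >= 0.90.
n7 = next(n for n in range(1, 1000)
          if 2 * Phi(2 * math.sqrt(n) / 4) - 1 >= 0.90)

# (8): sample mean and sample standard deviation of the Maine data.
data = [162, 171, 155, 189, 148, 195, 165, 189, 150]
m8, s8 = statistics.mean(data), statistics.stdev(data)

# (9): Var(ln LC50) = 0.4, n = 10, so the mean has sd sqrt(0.4/10) = 0.2;
# then the smallest n giving P(|Xbar - mu| <= 0.5) >= 0.95.
p9 = 2 * Phi(0.5 / math.sqrt(0.4 / 10)) - 1
n9 = next(n for n in range(1, 1000)
          if 2 * Phi(0.5 / math.sqrt(0.4 / n)) - 1 >= 0.95)

# (11): Xbar_trout - Xbar_bass is normal with mean 0 and variance 0.4/10 + 0.8/10.
p11 = 1 - Phi(1 / math.sqrt(0.4 / 10 + 0.8 / 10))

print(p6, n7, m8, s8, p9, n9, p11)
```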