








The Normal distribution has two parameters, the mean, μ, and the variance, σ^2. μ and σ^2 satisfy −∞ < μ < ∞, σ^2 > 0. We write X ∼ Normal(μ, σ^2), ...









The Normal distribution is the familiar bell-shaped distribution. It is probably the most important distribution in statistics, mainly because of its link with the Central Limit Theorem.
Before studying the Central Limit Theorem, we look at the Normal distribution and some of its general properties.
5.1 The Normal Distribution
Probability density function, fX (x)
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\} \quad \text{for } -\infty < x < \infty.$$
Distribution function, FX (x)
There is no closed form for the distribution function of the Normal distribution.
R command: FX (x) = pnorm(x, mean=μ, sd=sqrt(σ^2 )).
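Although there is no closed form, the distribution function can be written in terms of the error function, which numerical libraries evaluate directly. The following is a minimal Python sketch of the same calculation as the R command above. The function name `normal_cdf` and the example values are illustrative assumptions, not part of the notes.

```python
import math
from statistics import NormalDist

def normal_cdf(x, mu=0.0, sigma2=1.0):
    """F_X(x) for X ~ Normal(mu, sigma2), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * sigma2)))

# Cross-check against the standard library's Normal distribution object.
# Note that NormalDist takes the standard deviation, like R's `sd` argument.
x, mu, sigma2 = 1.5, 1.0, 4.0   # illustrative values
print(normal_cdf(x, mu, sigma2))
print(NormalDist(mu=mu, sigma=math.sqrt(sigma2)).cdf(x))
```

Both lines should print the same probability up to floating-point rounding.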
[Figures: probability density function fX (x) and distribution function FX (x) of the Normal distribution]
Mean and Variance
For X ∼ Normal(μ, σ^2), E(X) = μ and Var(X) = σ^2.
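Linear transformations
If X ∼ Normal(μ, σ^2), then for any constants a and b,
$$aX + b \sim \text{Normal}\left(a\mu + b,\; a^2\sigma^2\right).$$
In particular, putting $a = \frac{1}{\sigma}$ and $b = -\frac{\mu}{\sigma}$ gives
$$\frac{X - \mu}{\sigma} \sim \text{Normal}(0, 1).$$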
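Standardising X by subtracting μ and dividing by σ gives a Normal(0, 1) variable, so any probability about X can be rewritten as a probability about Z = (X − μ)/σ. A small Python sketch of this check follows. The values of `mu`, `sigma` and `x` are invented for illustration.

```python
from statistics import NormalDist

mu, sigma = 10.0, 2.0   # illustrative values, not from the notes
x = 13.0

# P(X <= x) computed directly for X ~ Normal(mu, sigma^2).
direct = NormalDist(mu, sigma).cdf(x)

# The same probability via the standardised variable Z = (X - mu) / sigma ~ Normal(0, 1).
via_z = NormalDist(0.0, 1.0).cdf((x - mu) / sigma)

print(direct, via_z)
```

The two printed values should agree up to rounding, since here (x − μ)/σ = 1.5 and both calls evaluate the standard Normal distribution function at 1.5.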
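Sums of Normal random variables
If X and Y are independent, and $X \sim \text{Normal}(\mu_1, \sigma_1^2)$, $Y \sim \text{Normal}(\mu_2, \sigma_2^2)$, then
$$X + Y \sim \text{Normal}\left(\mu_1 + \mu_2,\; \sigma_1^2 + \sigma_2^2\right).$$
More generally, if $X_1, X_2, \ldots, X_n$ are independent, and $X_i \sim \text{Normal}(\mu_i, \sigma_i^2)$ for $i = 1, \ldots, n$, then
$$a_1 X_1 + a_2 X_2 + \ldots + a_n X_n \sim \text{Normal}\left( (a_1\mu_1 + \ldots + a_n\mu_n),\; (a_1^2\sigma_1^2 + \ldots + a_n^2\sigma_n^2) \right).$$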
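The rule for a linear combination of independent Normal random variables amounts to two sums: the coefficients weight the means, and the squared coefficients weight the variances. A minimal Python sketch follows, where the helper name `linear_combination_params` and the example numbers are hypothetical.

```python
def linear_combination_params(coeffs, means, variances):
    """Return (mean, variance) of a1*X1 + ... + an*Xn for independent Normal Xi."""
    if not (len(coeffs) == len(means) == len(variances)):
        raise ValueError("coeffs, means and variances must have equal length")
    mean = sum(a * m for a, m in zip(coeffs, means))
    # Variances are weighted by the squared coefficients, so signs do not matter.
    variance = sum(a * a * v for a, v in zip(coeffs, variances))
    return mean, variance

# Example: 2*X1 - X2 with X1 ~ Normal(1, 4) and X2 ~ Normal(3, 9).
print(linear_combination_params([2, -1], [1, 3], [4, 9]))   # (-1, 25)
```

Note that subtracting X2 still adds to the variance: 2^2 × 4 + (−1)^2 × 9 = 25.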
For mathematicians: properties of the Normal distribution
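Proof that $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
The full proof that
$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = 1$$
relies on the following result:
FACT:
$$\int_{-\infty}^{\infty} e^{-y^2}\,dy = \sqrt{\pi}.$$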
This result is non-trivial to prove. See Calculus courses for details.
Using this result, the proof that $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ follows by using the change of variable $y = \frac{x - \mu}{\sqrt{2}\,\sigma}$ in the integral.
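The change of variable can be written out in one line. This is a sketch that takes the FACT above as given and uses σ > 0:

```latex
% With y = (x - \mu) / (\sqrt{2}\,\sigma) we have dx = \sqrt{2}\,\sigma\,dy
% and (x - \mu)^2 / (2\sigma^2) = y^2, so
\begin{align*}
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx
  &= \frac{\sqrt{2}\,\sigma}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{-y^2}\,dy \\
  &= \frac{1}{\sqrt{\pi}} \cdot \sqrt{\pi} \\
  &= 1.
\end{align*}
```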
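Proof that E(X) = μ.
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_{-\infty}^{\infty} x\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$
Change variable of integration: let $z = \frac{x - \mu}{\sigma}$: then $x = \sigma z + \mu$ and $\frac{dx}{dz} = \sigma$.
Then
$$E(X) = \int_{-\infty}^{\infty} (\sigma z + \mu) \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-z^2/2} \cdot \sigma\,dz$$
$$= \underbrace{\int_{-\infty}^{\infty} \frac{\sigma z}{\sqrt{2\pi}} \cdot e^{-z^2/2}\,dz}_{\text{odd function of } z \text{ (i.e. } g(-z) = -g(z)\text{), so it integrates to 0 over } -\infty \text{ to } \infty} \;+\; \mu \underbrace{\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz}_{\text{p.d.f. of } N(0,1) \text{ integrates to 1}}$$
Thus E(X) = 0 + μ × 1 = μ.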
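Proof that Var(X) = σ^2.
$$\text{Var}(X) = E\left\{ (X - \mu)^2 \right\} = \int_{-\infty}^{\infty} (x - \mu)^2\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$
$$= \sigma^2 \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, z^2 e^{-z^2/2}\,dz \qquad \left( \text{putting } z = \frac{x - \mu}{\sigma} \right)$$
$$= \sigma^2 \left\{ \frac{1}{\sqrt{2\pi}} \left[ -z e^{-z^2/2} \right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz \right\} \qquad \text{(integration by parts)}$$
$$= \sigma^2 \{0 + 1\}$$
$$= \sigma^2.$$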
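The three results (total probability 1, mean μ, variance σ^2) can also be checked numerically. The sketch below uses a plain trapezoid rule over a wide finite range. The helper names `normal_pdf` and `integrate`, the parameter values and the ±12 standard deviation cut-off are all assumptions made for illustration.

```python
import math

def normal_pdf(x, mu, sigma2):
    """Density of Normal(mu, sigma2) at x."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def integrate(g, lo, hi, steps=20_000):
    """Composite trapezoid rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (g(lo) + g(hi))
    for i in range(1, steps):
        total += g(lo + i * h)
    return total * h

mu, sigma2 = 3.0, 4.0                        # illustrative parameter values
sigma = math.sqrt(sigma2)
lo, hi = mu - 12 * sigma, mu + 12 * sigma    # tails beyond 12 sd are negligible

total_prob = integrate(lambda x: normal_pdf(x, mu, sigma2), lo, hi)
mean = integrate(lambda x: x * normal_pdf(x, mu, sigma2), lo, hi)
var = integrate(lambda x: (x - mu) ** 2 * normal_pdf(x, mu, sigma2), lo, hi)
print(total_prob, mean, var)   # close to 1, mu and sigma2 respectively
```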
5.2 The Central Limit Theorem (CLT)
also known as... the Piece of Cake Theorem
The Central Limit Theorem (CLT) is one of the most fundamental results in statistics. In its simplest form, it states that if a large number of independent random variables are drawn from any distribution with finite mean and variance, then the distribution of their sum (or alternatively their sample average) is approximately Normal, with the approximation improving as the number of variables grows.
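Distribution of the sample mean, $\bar{X}$, using the CLT
Let $X_1, \ldots, X_n$ be independent, identically distributed with mean $E(X_i) = \mu$ and variance $\text{Var}(X_i) = \sigma^2$ for all $i$.
The sample mean, $\bar{X}$, is defined as:
$$\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n}.$$
So $\bar{X} = \frac{S_n}{n}$, where $S_n = X_1 + X_2 + \ldots + X_n$, and for large $n$ the CLT gives
$$\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n} \sim \text{approx Normal}\left( \mu,\; \frac{\sigma^2}{n} \right).$$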
The following three statements of the Central Limit Theorem are equivalent:
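$$\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n} \sim \text{approx Normal}\left( \mu,\; \frac{\sigma^2}{n} \right) \quad \text{as } n \to \infty.$$
$$S_n = X_1 + X_2 + \ldots + X_n \sim \text{approx Normal}\left( n\mu,\; n\sigma^2 \right) \quad \text{as } n \to \infty.$$
$$\frac{S_n - n\mu}{\sqrt{n\sigma^2}} = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim \text{approx Normal}(0, 1) \quad \text{as } n \to \infty.$$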
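The equivalence rests on the linear transformation property from Section 5.1. A short sketch, writing S_n = n X̄ and treating the approximate Normality of X̄ as given:

```latex
\begin{align*}
\bar{X} \sim \text{approx Normal}\!\left(\mu, \tfrac{\sigma^2}{n}\right)
  &\;\Longrightarrow\; S_n = n\bar{X} \sim \text{approx Normal}\!\left(n\mu,\; n^2 \cdot \tfrac{\sigma^2}{n}\right)
   = \text{approx Normal}\!\left(n\mu,\; n\sigma^2\right), \\
S_n \sim \text{approx Normal}\!\left(n\mu,\; n\sigma^2\right)
  &\;\Longrightarrow\; \frac{S_n - n\mu}{\sqrt{n\sigma^2}} \sim \text{approx Normal}(0, 1), \\
\frac{S_n - n\mu}{\sqrt{n\sigma^2}}
  &= \frac{n(\bar{X} - \mu)}{n\sqrt{\sigma^2/n}} = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}}.
\end{align*}
```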
The essential point to remember about the Central Limit Theorem is that large sums or sample means of independent random variables converge to a Normal distribution.
More general version of the CLT
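A more general form of the CLT states that, if $X_1, \ldots, X_n$ are independent, and $E(X_i) = \mu_i$, $\text{Var}(X_i) = \sigma_i^2$ (not necessarily all equal), then, under mild additional regularity conditions,
$$Z_n = \frac{\sum_{i=1}^{n} (X_i - \mu_i)}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}} \to \text{Normal}(0, 1) \quad \text{as } n \to \infty.$$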
Other versions of the CLT relax the condition that X 1 ,... , Xn are independent.
The Central Limit Theorem in action : simulation studies
The following simulation study illustrates the Central Limit Theorem, making use of several of the techniques learnt in STATS 210. We will look particularly
Example 1: Triangular distribution: fX (x) = 2x for 0 < x < 1.
[Figure: the density f (x) = 2x plotted against x for 0 < x < 1]
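Find E(X) and Var(X):
$$\mu = E(X) = \int_0^1 x f_X(x)\,dx = \int_0^1 2x^2\,dx = \left[ \frac{2x^3}{3} \right]_0^1 = \frac{2}{3}.$$
$$\sigma^2 = \text{Var}(X) = E(X^2) - \{E(X)\}^2 = \int_0^1 x^2 f_X(x)\,dx - \left( \frac{2}{3} \right)^2 = \int_0^1 2x^3\,dx - \frac{4}{9} = \left[ \frac{2x^4}{4} \right]_0^1 - \frac{4}{9} = \frac{1}{18}.$$
Let $S_n = X_1 + \ldots + X_n$, where $X_1, \ldots, X_n$ are independent with this distribution.
Then $E(S_n) = E(X_1 + \ldots + X_n) = n\mu = \frac{2n}{3}$, and $\text{Var}(S_n) = n\sigma^2 = \frac{n}{18}$ by independence.
So, by the CLT, for large $n$,
$$S_n \sim \text{approx Normal}\left( \frac{2n}{3},\; \frac{n}{18} \right).$$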
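A simulation along these lines can be sketched in Python. The sampler below uses the inverse distribution function: since F_X(x) = x^2 on (0, 1), taking X = sqrt(U) with U uniform on (0, 1) gives the required density. The function name `simulate_sums`, the seed, and the choices n = 10 and 20,000 repetitions are assumptions for illustration, not the settings used in the original study.

```python
import math
import random
from statistics import fmean, pvariance

def simulate_sums(n, reps, seed=0):
    """Draw `reps` realisations of S_n = X_1 + ... + X_n with f_X(x) = 2x on (0, 1)."""
    rng = random.Random(seed)
    # Inverse-CDF sampling: F_X(x) = x^2 on (0, 1), so X = sqrt(U) with U ~ Uniform(0, 1).
    return [sum(math.sqrt(rng.random()) for _ in range(n)) for _ in range(reps)]

n, reps = 10, 20_000
sums = simulate_sums(n, reps)
print("sample mean of S_n:    ", fmean(sums), " theory:", 2 * n / 3)
print("sample variance of S_n:", pvariance(sums), " theory:", n / 18)
```

The sample mean and variance should land close to 2n/3 and n/18. A histogram of `sums` could then be compared with the Normal(2n/3, n/18) density.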
Normal approximation to the Binomial distribution, using the CLT
Let Y ∼ Binomial(n, p).
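We can think of Y as the sum of n Bernoulli random variables: $Y = X_1 + X_2 + \ldots + X_n$, where
$$X_i = \begin{cases} 1 & \text{if trial } i \text{ is a success (probability } p), \\ 0 & \text{otherwise (probability } 1 - p), \end{cases}$$
so that $E(X_i) = p$ and $\text{Var}(X_i) = p(1 - p)$.
Thus by the CLT,
$$Y = X_1 + X_2 + \ldots + X_n \sim \text{approx Normal}\left( np,\; np(1 - p) \right) \quad \text{as } n \to \infty.$$
Thus,
$$\text{Bin}(n, p) \to \text{Normal}\Big( \underbrace{np}_{\text{mean of Bin}(n,p)},\; \underbrace{np(1 - p)}_{\text{var of Bin}(n,p)} \Big) \quad \text{as } n \to \infty \text{ with } p \text{ fixed}.$$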
The Binomial distribution is therefore well approximated by the Normal distribution when n is large, for any fixed value of p.
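The quality of the approximation can be checked by comparing exact Binomial probabilities with their Normal counterparts. The Python sketch below is a rough check: the function names are hypothetical, and the +0.5 continuity correction is an addition that the notes do not discuss.

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(0, k + 1))

def normal_approx_cdf(k, n, p):
    """CLT approximation to P(Y <= k) using Normal(np, np(1-p)).

    The +0.5 continuity correction is an assumption added here, not part of the notes.
    """
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)

n, p = 100, 0.5
for k in (40, 50, 60):
    print(k, binomial_cdf(k, n, p), normal_approx_cdf(k, n, p))
```

For n = 100 and p = 0.5 the two columns should agree to about two or three decimal places.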
The Normal distribution is also a good approximation to the Poisson(λ) distribution when λ is large:
[Figure: two panels, Binomial(n = 100, p = 0.5) and Poisson(λ = 100)]
Why the Piece of Cake Theorem?...
5.3 Confidence intervals
Example: Remember the margin of error for an opinion poll?
An opinion pollster wishes to estimate the level of support for Labour in an upcoming election. She interviews n people about their voting preferences. Let p be the true, unknown level of support for the Labour party in New Zealand. Let X be the number of the n people interviewed by the opinion pollster who support Labour.
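At the end of Chapter 2, we said that the maximum likelihood estimator for p is $\hat{p} = \frac{X}{n}$.
In a large sample (large n), we now know that
$$X \sim \text{approx Normal}(np,\; npq), \quad \text{where } q = 1 - p.$$
So
$$\hat{p} = \frac{X}{n} \sim \text{approx Normal}\left( p,\; \frac{pq}{n} \right).$$
So
$$\frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} \sim \text{approx Normal}(0, 1).$$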
Now if Z ∼ Normal(0, 1), we find (using a computer) that the 95% central probability region of Z is from −1.96 to +1.96:
$$P(-1.96 < Z < 1.96) = 0.95.$$
Check in R: pnorm(1.96, mean=0, sd=1) - pnorm(-1.96, mean=0, sd=1)
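The same check can be sketched in Python with the standard library. The example also recovers the value 1.96 from the other direction, as the point that cuts off the upper 2.5% tail. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)

# Probability that a standard Normal variable lies between -1.96 and 1.96.
central_prob = Z.cdf(1.96) - Z.cdf(-1.96)

# Going the other way: the point with 97.5% of the probability below it.
upper_point = Z.inv_cdf(0.975)

print(central_prob, upper_point)
```

The first value should be very close to 0.95 and the second very close to 1.96.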
Confidence intervals for the Poisson λ parameter
We saw in section 3.6 that if $X_1, \ldots, X_n$ are independent, identically distributed with $X_i \sim \text{Poisson}(\lambda)$, then the maximum likelihood estimator of λ is
$$\hat{\lambda} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$
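Thus, when n is large,
$$\hat{\lambda} = \bar{X} \sim \text{approx Normal}\left( \mu,\; \frac{\sigma^2}{n} \right)$$
by the Central Limit Theorem. In other words,
$$\hat{\lambda} \sim \text{approx Normal}\left( \lambda,\; \frac{\lambda}{n} \right),$$
since $\mu = E(X_i) = \lambda$ and $\sigma^2 = \text{Var}(X_i) = \lambda$ for the Poisson distribution.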
We use the same transformation as before to find approximate 95% confidence intervals for λ as n grows large:
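Let $Z = \dfrac{\hat{\lambda} - \lambda}{\sqrt{\frac{\lambda}{n}}}$. Then $Z \sim \text{approx Normal}(0, 1)$.
Thus:
$$P\left( -1.96 < \frac{\hat{\lambda} - \lambda}{\sqrt{\frac{\lambda}{n}}} < 1.96 \right) \approx 0.95.$$
Rearranging to put λ in the middle:
$$P\left( \hat{\lambda} - 1.96 \sqrt{\frac{\lambda}{n}} < \lambda < \hat{\lambda} + 1.96 \sqrt{\frac{\lambda}{n}} \right) \approx 0.95.$$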
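So our estimated 95% confidence interval for the unknown parameter λ, replacing the unknown λ under the square root by its estimate $\hat{\lambda}$, is:
$$\hat{\lambda} - 1.96 \sqrt{\frac{\hat{\lambda}}{n}} \quad \text{to} \quad \hat{\lambda} + 1.96 \sqrt{\frac{\hat{\lambda}}{n}}.$$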
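As a rough illustration, the interval can be computed from a sample in a few lines of Python. The function name `poisson_ci_95` and the data are hypothetical, invented only to show the arithmetic. The sketch assumes the unknown λ under the square root is replaced by its estimate.

```python
import math

def poisson_ci_95(counts):
    """Approximate 95% confidence interval for lambda from i.i.d. Poisson counts."""
    n = len(counts)
    if n == 0:
        raise ValueError("need at least one observation")
    lam_hat = sum(counts) / n       # maximum likelihood estimate: the sample mean
    se = math.sqrt(lam_hat / n)     # estimated standard error of lam_hat
    return lam_hat - 1.96 * se, lam_hat + 1.96 * se

# Hypothetical data: 25 observed counts, invented purely for illustration.
counts = [4, 6, 5, 3, 7, 5, 4, 6, 5, 5, 2, 8, 5, 4, 6, 5, 3, 7, 5, 5, 6, 4, 5, 5, 5]
low, high = poisson_ci_95(counts)
print(low, high)
```

Here the counts sum to 125, so the estimate is 5 and the interval is 5 ± 1.96 × sqrt(5/25), roughly 4.12 to 5.88. Because the interval relies on the CLT, it is only trustworthy when n is reasonably large.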
Why is this so good?
It’s clear that it’s important to measure precision, or reliability, of an estimate, otherwise the estimate is almost worthless. However, we have already seen various measures of precision: variance, standard error, coefficient of variation, and now confidence intervals. Why do we need so many?
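The standard error is $\text{se}(\hat{\lambda}) = \sqrt{\widehat{\text{Var}}(\hat{\lambda})}$. It is an estimate of the square root of the true variance, $\text{Var}(\hat{\lambda})$. Because of the square root, the standard error is a direct measure of deviation from the mean, rather than squared deviation from the mean. This means it is measured in more intuitive units. However, it is still unclear how we should comprehend the information that the standard error gives us.
The approximate 95% confidence interval is calculated from the standard error: it runs from
$$\hat{\lambda} - 1.96 \sqrt{\widehat{\text{Var}}(\hat{\lambda})} \quad \text{to} \quad \hat{\lambda} + 1.96 \sqrt{\widehat{\text{Var}}(\hat{\lambda})},$$
or equivalently, from $\hat{\lambda} - 1.96\, \text{se}(\hat{\lambda})$ to $\hat{\lambda} + 1.96\, \text{se}(\hat{\lambda})$.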
So the Central Limit Theorem has given us an incredibly simple and powerful way of converting from a hard-to-understand measure of precision, $\text{se}(\hat{\lambda})$, to a measure that is easily understood and relevant to the problem at hand. Brilliant!