The Normal Distribution and the Central Limit Theorem, Slides of Statistics


Typology: Slides

2022/2023

Uploaded on 02/28/2023

shally_866
Chapter 5: The Normal Distribution and the Central Limit Theorem

The Normal distribution is the familiar bell-shaped distribution. It is probably the most important distribution in statistics, mainly because of its link with the Central Limit Theorem, which states that any large sum of independent, identically distributed random variables with finite variance is approximately Normal:

\[ X_1 + X_2 + \dots + X_n \;\sim\; \text{approx Normal} \quad \text{if } X_1, \dots, X_n \text{ are i.i.d. and } n \text{ is large.} \]

Before studying the Central Limit Theorem, we look at the Normal distribution and some of its general properties.

5.1 The Normal Distribution

The Normal distribution has two parameters, the mean, $\mu$, and the variance, $\sigma^2$. They satisfy $-\infty < \mu < \infty$ and $\sigma^2 > 0$.

We write $X \sim \text{Normal}(\mu, \sigma^2)$, or $X \sim N(\mu, \sigma^2)$.

Probability density function, $f_X(x)$

\[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\mu)^2 / (2\sigma^2)} \quad \text{for } -\infty < x < \infty. \]

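Distribution function, $F_X(x)$

There is no closed form for the distribution function of the Normal distribution. If $X \sim \text{Normal}(\mu, \sigma^2)$, then $F_X(x)$ can only be evaluated numerically, in practice by computer.

R command: $F_X(x)$ = pnorm(x, mean=μ, sd=sqrt(σ^2)). Note that pnorm takes the standard deviation, not the variance, hence the square root.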
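As a small sketch of what evaluating the distribution function by computer looks like, the R code below computes $F_X(x)$ with pnorm and compares it with a direct numerical integration of the density dnorm using integrate. The values μ = 10, σ^2 = 4 and x = 12 are arbitrary choices for illustration and do not come from the notes.

```r
# Illustrative values (assumptions, not taken from the notes)
mu     <- 10
sigma2 <- 4
x      <- 12

# Distribution function using the R command given in the text
F_builtin <- pnorm(x, mean = mu, sd = sqrt(sigma2))

# The same probability, obtained by numerically integrating the
# density from -Inf up to x
F_numeric <- integrate(dnorm, lower = -Inf, upper = x,
                       mean = mu, sd = sqrt(sigma2))$value

print(c(pnorm = F_builtin, integrate = F_numeric))
```

The two numbers should agree to several decimal places, which is all that "no closed form" costs in practice.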

[Figure: plots of the probability density function $f_X(x)$ and the distribution function $F_X(x)$.]

Mean and Variance

For X ∼ Normal(μ, σ^2 ), E(X) = μ, Var(X) = σ^2.

Linear transformations

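If $X \sim \text{Normal}(\mu, \sigma^2)$, then for any constants $a \neq 0$ and $b$,

\[ aX + b \;\sim\; \text{Normal}\left(a\mu + b, \; a^2\sigma^2\right). \]

In particular, put $a = \frac{1}{\sigma}$ and $b = -\frac{\mu}{\sigma}$; then

\[ X \sim \text{Normal}(\mu, \sigma^2) \;\Rightarrow\; \frac{X - \mu}{\sigma} \sim \text{Normal}(0, 1). \]

$Z \sim \text{Normal}(0, 1)$ is called the standard Normal random variable.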
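The standardisation above can be checked numerically. In this minimal sketch, the values of μ, σ and x are invented for illustration; the point is only that a probability for X equals the corresponding probability for the standard Normal Z.

```r
# Illustrative values (assumptions, not taken from the notes)
mu    <- 170
sigma <- 8
x     <- 180

# Standardise: z = (x - mu) / sigma
z <- (x - mu) / sigma

# P(X <= x) computed directly, and via the standard Normal
p_direct   <- pnorm(x, mean = mu, sd = sigma)
p_standard <- pnorm(z)            # defaults are mean = 0, sd = 1

print(c(direct = p_direct, standardised = p_standard))
```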

Sums of Normal random variables

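If $X$ and $Y$ are independent, and $X \sim \text{Normal}(\mu_1, \sigma_1^2)$, $Y \sim \text{Normal}(\mu_2, \sigma_2^2)$, then

\[ X + Y \;\sim\; \text{Normal}\left(\mu_1 + \mu_2, \; \sigma_1^2 + \sigma_2^2\right). \]

More generally, if $X_1, X_2, \dots, X_n$ are independent, and $X_i \sim \text{Normal}(\mu_i, \sigma_i^2)$ for $i = 1, \dots, n$, then

\[ a_1 X_1 + a_2 X_2 + \dots + a_n X_n \;\sim\; \text{Normal}\left(a_1\mu_1 + \dots + a_n\mu_n, \; a_1^2\sigma_1^2 + \dots + a_n^2\sigma_n^2\right). \]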
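As a worked illustration with made-up numbers (not from the notes), suppose $X \sim \text{Normal}(1, 4)$ and $Y \sim \text{Normal}(2, 9)$ are independent, and take $a_1 = 2$, $a_2 = -1$. Then

\[ 2X - Y \;\sim\; \text{Normal}\left(2 \cdot 1 + (-1) \cdot 2, \; 2^2 \cdot 4 + (-1)^2 \cdot 9\right) = \text{Normal}(0, 25). \]

The variances add even though $Y$ is subtracted, because each coefficient enters the variance squared.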

For mathematicians: properties of the Normal distribution

  1. Proof that $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

The full proof that

\[ \int_{-\infty}^{\infty} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = 1 \]

relies on the following result:

FACT:

\[ \int_{-\infty}^{\infty} e^{-y^2}\,dy = \sqrt{\pi}. \]

This result is non-trivial to prove. See Calculus courses for details.

Using this result, the proof that $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ follows by using the change of variable $y = \frac{x - \mu}{\sqrt{2}\,\sigma}$ in the integral.
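Spelling out that change of variable, as a sketch of the step the notes leave to the reader: with $y = (x - \mu)/(\sqrt{2}\,\sigma)$ we have $dx = \sqrt{2}\,\sigma\,dy$ and $(x-\mu)^2/(2\sigma^2) = y^2$, so

\[ \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \frac{\sqrt{2}\,\sigma}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-y^2}\,dy = \frac{1}{\sqrt{\pi}} \cdot \sqrt{\pi} = 1. \]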

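  2. Proof that $E(X) = \mu$.

\[ E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_{-\infty}^{\infty} x \, \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\mu)^2/(2\sigma^2)}\,dx. \]

Change variable of integration: let $z = \frac{x - \mu}{\sigma}$; then $x = \sigma z + \mu$ and $\frac{dx}{dz} = \sigma$. Then

\[
\begin{aligned}
E(X) &= \int_{-\infty}^{\infty} (\sigma z + \mu) \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-z^2/2} \cdot \sigma\,dz \\
     &= \underbrace{\int_{-\infty}^{\infty} \frac{\sigma z}{\sqrt{2\pi}} \, e^{-z^2/2}\,dz}_{\text{(i)}} \;+\; \mu \underbrace{\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2}\,dz}_{\text{(ii)}}.
\end{aligned}
\]

The integrand in (i) is an odd function of $z$ (i.e. $g(-z) = -g(z)$), so it integrates to 0 over the range $-\infty$ to $\infty$. The integrand in (ii) is the p.d.f. of $N(0, 1)$, which integrates to 1.

Thus $E(X) = 0 + \mu \times 1 = \mu$.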

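  3. Proof that $\text{Var}(X) = \sigma^2$.

\[
\begin{aligned}
\text{Var}(X) &= E\left\{(X - \mu)^2\right\} \\
 &= \int_{-\infty}^{\infty} (x - \mu)^2 \, \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\mu)^2/(2\sigma^2)}\,dx \\
 &= \sigma^2 \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \, z^2 e^{-z^2/2}\,dz \qquad \left(\text{putting } z = \frac{x - \mu}{\sigma}\right) \\
 &= \sigma^2 \left\{ \frac{1}{\sqrt{2\pi}} \left[ -z e^{-z^2/2} \right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2}\,dz \right\} \qquad \text{(integration by parts)} \\
 &= \sigma^2 \{0 + 1\} \\
 &= \sigma^2.
\end{aligned}
\]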
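The three properties proved above can also be checked numerically. The sketch below uses R's integrate on the Normal density for one arbitrary choice of parameters (μ = 2, σ = 3, chosen for illustration only). It is a sanity check, not a substitute for the proofs.

```r
# Illustrative parameter values (assumptions, not taken from the notes)
mu    <- 2
sigma <- 3

# Normal density with these parameters
f <- function(x) dnorm(x, mean = mu, sd = sigma)

# 1. The density integrates to 1
total  <- integrate(f, -Inf, Inf)$value

# 2. E(X) = integral of x f(x)
mean_x <- integrate(function(x) x * f(x), -Inf, Inf)$value

# 3. Var(X) = integral of (x - mu)^2 f(x)
var_x  <- integrate(function(x) (x - mu)^2 * f(x), -Inf, Inf)$value

print(c(total = total, mean = mean_x, variance = var_x))
```

Up to numerical error, the three values should be close to 1, μ and σ^2 respectively.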

5.2 The Central Limit Theorem (CLT)

also known as... the Piece of Cake Theorem

The Central Limit Theorem (CLT) is one of the most fundamental results in statistics. In its simplest form, it states that if a large number of independent random variables are drawn from any distribution with finite mean and variance, then the distribution of their sum (or alternatively their sample average), suitably standardised, converges to the Normal distribution.

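Distribution of the sample mean, $\overline{X}$, using the CLT

Let $X_1, \dots, X_n$ be independent, identically distributed with mean $E(X_i) = \mu$ and variance $\text{Var}(X_i) = \sigma^2$ for all $i$.

The sample mean, $\overline{X}$, is defined as:

\[ \overline{X} = \frac{X_1 + X_2 + \dots + X_n}{n}. \]

So $\overline{X} = \frac{S_n}{n}$, where $S_n = X_1 + \dots + X_n \sim \text{approx Normal}(n\mu, n\sigma^2)$ by the CLT.

Because $\overline{X}$ is a scalar multiple of $S_n$, which is approximately Normal when $n$ is large, $\overline{X}$ itself is approximately Normal for large $n$:

\[ \frac{X_1 + X_2 + \dots + X_n}{n} \;\sim\; \text{approx Normal}\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{as } n \to \infty. \]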

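The following three statements of the Central Limit Theorem are equivalent:

  1. \[ \overline{X} = \frac{X_1 + X_2 + \dots + X_n}{n} \;\sim\; \text{approx Normal}\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{as } n \to \infty. \]

  2. \[ S_n = X_1 + X_2 + \dots + X_n \;\sim\; \text{approx Normal}\left(n\mu, n\sigma^2\right) \quad \text{as } n \to \infty. \]

  3. \[ \frac{S_n - n\mu}{\sqrt{n\sigma^2}} = \frac{\overline{X} - \mu}{\sqrt{\sigma^2/n}} \;\sim\; \text{approx Normal}(0, 1) \quad \text{as } n \to \infty. \]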
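To see why these statements say the same thing, here is a short sketch using only $S_n = n\overline{X}$ and the linear transformation rule from Section 5.1. Dividing numerator and denominator by $n$,

\[ \frac{S_n - n\mu}{\sqrt{n\sigma^2}} = \frac{n\overline{X} - n\mu}{\sqrt{n}\,\sigma} = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} = \frac{\overline{X} - \mu}{\sqrt{\sigma^2/n}}, \]

and applying $aX + b$ with $a = 1/n$, $b = 0$ to the statement about $S_n$ gives

\[ \frac{S_n}{n} \;\sim\; \text{approx Normal}\left(\frac{n\mu}{n}, \frac{n\sigma^2}{n^2}\right) = \text{approx Normal}\left(\mu, \frac{\sigma^2}{n}\right). \]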

The essential point to remember about the Central Limit Theorem is that large sums or sample means of independent random variables converge to a Normal distribution, whatever the distribution of the original r.v.s (provided it has finite variance).

More general version of the CLT

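A more general form of the CLT states that, if $X_1, \dots, X_n$ are independent, with $E(X_i) = \mu_i$ and $\text{Var}(X_i) = \sigma_i^2$ (not necessarily all equal), then, under mild additional conditions on the $X_i$,

\[ Z_n = \frac{\sum_{i=1}^{n} (X_i - \mu_i)}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}} \;\to\; \text{Normal}(0, 1) \quad \text{as } n \to \infty. \]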

Other versions of the CLT relax the condition that X 1 ,... , Xn are independent.

The Central Limit Theorem in action : simulation studies

The following simulation study illustrates the Central Limit Theorem, making use of several of the techniques learnt in STATS 210. We will look particularly at how fast the distribution of $S_n$ converges to the Normal distribution.

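Example 1: Triangular distribution: $f_X(x) = 2x$ for $0 < x < 1$.

[Figure: plot of $f(x)$ against $x$ for $0 < x < 1$.]

Find $E(X)$ and $\text{Var}(X)$:

\[ \mu = E(X) = \int_0^1 x f_X(x)\,dx = \int_0^1 2x^2\,dx = \left[ \frac{2x^3}{3} \right]_0^1 = \frac{2}{3}. \]

\[
\begin{aligned}
\sigma^2 = \text{Var}(X) &= E(X^2) - \{E(X)\}^2 \\
 &= \int_0^1 x^2 f_X(x)\,dx - \left(\frac{2}{3}\right)^2 \\
 &= \int_0^1 2x^3\,dx - \frac{4}{9} \\
 &= \left[ \frac{2x^4}{4} \right]_0^1 - \frac{4}{9} \\
 &= \frac{1}{18}.
\end{aligned}
\]

Let $S_n = X_1 + \dots + X_n$ where $X_1, \dots, X_n$ are independent.

Then $E(S_n) = E(X_1 + \dots + X_n) = n\mu = \frac{2n}{3}$, and

$\text{Var}(S_n) = \text{Var}(X_1 + \dots + X_n) = n\sigma^2$ by independence, so $\text{Var}(S_n) = \frac{n}{18}$.

So $S_n \sim \text{approx Normal}\left(\frac{2n}{3}, \frac{n}{18}\right)$ for large $n$, by the Central Limit Theorem.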
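A rough sketch of such a simulation in R follows. The sampling method is an assumption of this sketch rather than something stated in the notes: since $F_X(x) = x^2$ on $(0, 1)$, a draw from the triangular distribution can be obtained as the square root of a Uniform(0, 1) draw. The helper names r_triangular and simulate_sums, the choice n = 10, and the number of replicates are all hypothetical.

```r
set.seed(1)  # for reproducibility

# Draw m values from f(x) = 2x on (0, 1) by inverting F(x) = x^2
r_triangular <- function(m) sqrt(runif(m))

# Simulate `reps` independent copies of S_n = X_1 + ... + X_n
simulate_sums <- function(n, reps = 10000) {
  draws <- matrix(r_triangular(n * reps), nrow = reps, ncol = n)
  rowSums(draws)
}

n <- 10
s <- simulate_sums(n)

# Compare simulated moments with the CLT values 2n/3 and n/18
print(c(sim_mean = mean(s), clt_mean = 2 * n / 3))
print(c(sim_var  = var(s),  clt_var  = n / 18))

# To compare shapes visually, uncomment:
# hist(s, freq = FALSE, breaks = 40)
# curve(dnorm(x, mean = 2 * n / 3, sd = sqrt(n / 18)), add = TRUE)
```

Rerunning with different values of n (for example 1, 2, 3, 10) shows how quickly the histogram of $S_n$ approaches the Normal curve.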

Normal approximation to the Binomial distribution, using the CLT

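Let $Y \sim \text{Binomial}(n, p)$.

We can think of $Y$ as the sum of $n$ Bernoulli random variables:

\[ Y = X_1 + X_2 + \dots + X_n, \quad \text{where } X_i = \begin{cases} 1 & \text{if trial } i \text{ is a ``success'' (prob} = p), \\ 0 & \text{otherwise (prob} = 1 - p). \end{cases} \]

So $Y = X_1 + \dots + X_n$ and each $X_i$ has $\mu = E(X_i) = p$, $\sigma^2 = \text{Var}(X_i) = p(1 - p)$.

Thus by the CLT,

\[ Y = X_1 + X_2 + \dots + X_n \;\to\; \text{Normal}(n\mu, n\sigma^2) = \text{Normal}\left(np, \; np(1 - p)\right). \]

Thus,

\[ \text{Bin}(n, p) \;\to\; \text{Normal}\Big( \underbrace{np}_{\text{mean of Bin}(n,p)}, \; \underbrace{np(1 - p)}_{\text{var of Bin}(n,p)} \Big) \quad \text{as } n \to \infty \text{ with } p \text{ fixed.} \]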

The Binomial distribution is therefore well approximated by the Normal distribution when n is large, for any fixed value of p.

The Normal distribution is also a good approximation to the Poisson($\lambda$) distribution when $\lambda$ is large:

\[ \text{Poisson}(\lambda) \;\to\; \text{Normal}(\lambda, \lambda) \quad \text{when } \lambda \text{ is large.} \]

[Figure: two panels showing the Binomial(n = 100, p = 0.5) and Poisson(λ = 100) distributions.]
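To get a feel for how close these approximations are, the sketch below compares exact Binomial and Poisson probabilities with their Normal counterparts for the two cases named above, Binomial(n = 100, p = 0.5) and Poisson(λ = 100). The cut-off values 55 and 110 are arbitrary choices for illustration, and no continuity correction is applied, so some discrepancy is expected.

```r
# Binomial(n = 100, p = 0.5): exact vs Normal(np, np(1-p))
n <- 100
p <- 0.5
y <- 55                                   # arbitrary cut-off (assumption)
exact_binom  <- pbinom(y, size = n, prob = p)
approx_binom <- pnorm(y, mean = n * p, sd = sqrt(n * p * (1 - p)))

# Poisson(lambda = 100): exact vs Normal(lambda, lambda)
lambda <- 100
k <- 110                                  # arbitrary cut-off (assumption)
exact_pois  <- ppois(k, lambda = lambda)
approx_pois <- pnorm(k, mean = lambda, sd = sqrt(lambda))

print(rbind(binomial = c(exact = exact_binom, normal = approx_binom),
            poisson  = c(exact = exact_pois,  normal = approx_pois)))
```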

Why the Piece of Cake Theorem?...

  • The Central Limit Theorem makes whole realms of statistics into a piece of cake.

  • After seeing a theorem this good, you deserve a piece of cake!

5.3 Confidence intervals

Example: Remember the margin of error for an opinion poll?

An opinion pollster wishes to estimate the level of support for Labour in an upcoming election. She interviews $n$ people about their voting preferences. Let $p$ be the true, unknown level of support for the Labour party in New Zealand. Let $X$ be the number of the $n$ people interviewed by the opinion pollster who plan to vote Labour. Then $X \sim \text{Binomial}(n, p)$.

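At the end of Chapter 2, we said that the maximum likelihood estimator for $p$ is $\hat{p} = \frac{X}{n}$.

In a large sample (large $n$), we now know that

\[ X \;\sim\; \text{approx Normal}(np, npq) \quad \text{where } q = 1 - p. \]

So

\[ \hat{p} = \frac{X}{n} \;\sim\; \text{approx Normal}\left(p, \frac{pq}{n}\right) \quad \text{(linear transformation of Normal r.v.)} \]

So

\[ \frac{\hat{p} - p}{\sqrt{pq/n}} \;\sim\; \text{approx Normal}(0, 1). \]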

Now if $Z \sim \text{Normal}(0, 1)$, we find (using a computer) that the 95% central probability region of $Z$ is from $-1.96$ to $+1.96$:

\[ P(-1.96 < Z < 1.96) = 0.95. \]

Check in R: pnorm(1.96, mean=0, sd=1) - pnorm(-1.96, mean=0, sd=1)
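The text as extracted does not show the resulting interval for $p$. By the same rearrangement that is used for the Poisson parameter later in this section, and replacing the unknown $p$ in the standard error by $\hat{p}$, one would expect an approximate 95% interval of the form $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$. The sketch below computes it for a hypothetical poll; the numbers n = 1000 and x = 350 are invented for illustration.

```r
# Hypothetical poll results (invented for illustration)
n <- 1000     # people interviewed
x <- 350      # number who plan to vote Labour

p_hat  <- x / n                             # maximum likelihood estimate of p
z      <- qnorm(0.975)                      # upper 2.5% point of N(0, 1), about 1.96
se_hat <- sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error (plug-in)

ci <- c(lower = p_hat - z * se_hat,
        upper = p_hat + z * se_hat)
print(round(ci, 3))
```

The half-width z * se_hat is what opinion polls usually report as the margin of error.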

Confidence intervals for the Poisson λ parameter

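We saw in section 3.6 that if $X_1, \dots, X_n$ are independent, identically distributed with $X_i \sim \text{Poisson}(\lambda)$, then the maximum likelihood estimator of $\lambda$ is

\[ \hat{\lambda} = \overline{X} = \frac{1}{n} \sum_{i=1}^{n} X_i. \]

Now $E(X_i) = \mu = \lambda$, and $\text{Var}(X_i) = \sigma^2 = \lambda$, for $i = 1, \dots, n$.

Thus, when $n$ is large,

\[ \hat{\lambda} = \overline{X} \;\sim\; \text{approx Normal}\left(\mu, \frac{\sigma^2}{n}\right) \]

by the Central Limit Theorem. In other words,

\[ \hat{\lambda} \;\sim\; \text{approx Normal}\left(\lambda, \frac{\lambda}{n}\right) \quad \text{as } n \to \infty. \]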

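We use the same transformation as before to find approximate 95% confidence intervals for $\lambda$ as $n$ grows large:

Let $Z = \dfrac{\hat{\lambda} - \lambda}{\sqrt{\lambda/n}}$. We have $Z \sim \text{approx Normal}(0, 1)$ for large $n$.

Thus:

\[ P\left( -1.96 < \frac{\hat{\lambda} - \lambda}{\sqrt{\lambda/n}} < 1.96 \right) \simeq 0.95. \]

Rearranging to put the unknown $\lambda$ in the middle:

\[ P\left( \hat{\lambda} - 1.96\sqrt{\frac{\lambda}{n}} < \lambda < \hat{\lambda} + 1.96\sqrt{\frac{\lambda}{n}} \right) \simeq 0.95. \]

The end points still involve the unknown $\lambda$ under the square root, so we replace it there by its estimate $\hat{\lambda}$. Our estimated 95% confidence interval for the unknown parameter $\lambda$ is therefore:

\[ \hat{\lambda} - 1.96\sqrt{\frac{\hat{\lambda}}{n}} \quad \text{to} \quad \hat{\lambda} + 1.96\sqrt{\frac{\hat{\lambda}}{n}}. \]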
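As a minimal sketch of this interval in practice, the R code below simulates Poisson data and computes the estimated 95% confidence interval, with $\hat{\lambda}$ plugged in under the square root. The true value λ = 4, the sample size n = 200 and the seed are assumptions made purely for illustration.

```r
set.seed(210)  # for reproducibility

# Simulated data (lambda_true and n are illustrative assumptions)
lambda_true <- 4
n <- 200
x <- rpois(n, lambda = lambda_true)

lambda_hat <- mean(x)                  # maximum likelihood estimate
se_hat     <- sqrt(lambda_hat / n)     # estimated standard error of lambda_hat

ci <- c(lower = lambda_hat - 1.96 * se_hat,
        upper = lambda_hat + 1.96 * se_hat)
print(ci)
```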

Why is this so good?

It’s clear that it’s important to measure precision, or reliability, of an estimate, otherwise the estimate is almost worthless. However, we have already seen various measures of precision: variance, standard error, coefficient of variation, and now confidence intervals. Why do we need so many?

  • The true variance of an estimator, e.g. Var(̂λ), is the most convenient quantity to work with mathematically. However, it is on a non-intuitive scale (squared deviation from the mean), and it usually depends upon the unknown parameter, e.g. λ.
  • The standard error is $\text{se}(\hat{\lambda}) = \sqrt{\widehat{\text{Var}}(\hat{\lambda})}$. It is an estimate of the square root of the true variance, $\text{Var}(\hat{\lambda})$. Because of the square root, the standard error is a direct measure of deviation from the mean, rather than squared deviation from the mean. This means it is measured in more intuitive units. However, it is still unclear how we should comprehend the information that the standard error gives us.

  • The beauty of the Central Limit Theorem is that it gives us an incredibly easy way of understanding what the standard error is telling us, using Normal-based asymptotic confidence intervals as computed in the previous two examples. Although it is beyond the scope of this course to see why, the Central Limit Theorem guarantees that almost any maximum likelihood estimator will be approximately Normally distributed as long as the sample size n is large enough, subject only to fairly mild conditions.

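Thus, if we can find an estimate of the variance, e.g. $\widehat{\text{Var}}(\hat{\lambda})$, we can immediately convert it to an estimated 95% confidence interval using the Normal formulation:

\[ \hat{\lambda} - 1.96\sqrt{\widehat{\text{Var}}(\hat{\lambda})} \quad \text{to} \quad \hat{\lambda} + 1.96\sqrt{\widehat{\text{Var}}(\hat{\lambda})}, \]

or equivalently, $\hat{\lambda} - 1.96\,\text{se}(\hat{\lambda})$ to $\hat{\lambda} + 1.96\,\text{se}(\hat{\lambda})$.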

The confidence interval has an easily-understood interpretation: on approximately 95% of the occasions on which we conduct a random experiment and build a confidence interval, the interval will contain the true parameter.
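This interpretation can be illustrated by simulation. The sketch below repeats a Poisson experiment many times, builds the Normal-based interval $\hat{\lambda} \pm 1.96\,\text{se}(\hat{\lambda})$ each time, and records how often the interval contains the true λ. The function name coverage and the values λ = 4, n = 200 and 5000 replicates are hypothetical choices for illustration.

```r
set.seed(1)  # for reproducibility

# Proportion of simulated experiments whose 95% interval contains lambda
coverage <- function(lambda, n, reps = 5000) {
  covered <- replicate(reps, {
    x          <- rpois(n, lambda)
    lambda_hat <- mean(x)
    se_hat     <- sqrt(lambda_hat / n)
    (lambda_hat - 1.96 * se_hat < lambda) &&
      (lambda < lambda_hat + 1.96 * se_hat)
  })
  mean(covered)
}

cov_est <- coverage(lambda = 4, n = 200)
print(cov_est)
```

The printed proportion should be close to 0.95, though not exactly equal to it, because the interval is based on a large-sample approximation and the simulation itself is random.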

So the Central Limit Theorem has given us an incredibly simple and powerful way of converting from a hard-to-understand measure of precision, $\text{se}(\hat{\lambda})$, to a measure that is easily understood and relevant to the problem at hand. Brilliant!