The Normal Distribution and the Central Limit Theorem, Slides of Statistics

University of California - Los Angeles (UCLA), Statistics


Typology: Slides

2022/2023

Uploaded on 02/28/2023

shally_866 🇺🇸


Chapter 5: The Normal Distribution and the Central Limit Theorem

The Normal distribution is the familiar bell-shaped distribution. It is probably the most important distribution in statistics, mainly because of its link with the Central Limit Theorem, which states that any large sum of independent, identically distributed random variables is approximately Normal:

$$X_1 + X_2 + \ldots + X_n \sim \text{approx Normal}$$

if $X_1, \ldots, X_n$ are i.i.d. and $n$ is large.

Before studying the Central Limit Theorem, we look at the Normal distribution and some of its general properties.

5.1 The Normal Distribution

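The Normal distribution has two parameters, the mean, $\mu$, and the variance, $\sigma^2$.

$\mu$ and $\sigma^2$ satisfy $-\infty < \mu < \infty$, $\sigma^2 > 0$.

We write $X \sim \text{Normal}(\mu, \sigma^2)$, or $X \sim N(\mu, \sigma^2)$.

Probability density function, $f_X(x)$

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\mu)^2/(2\sigma^2)} \quad \text{for } -\infty < x < \infty.$$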

Distribution function, $F_X(x)$

There is no closed form for the distribution function of the Normal distribution.

If $X \sim \text{Normal}(\mu, \sigma^2)$, then $F_X(x)$ can only be calculated numerically, by computer.

R command: $F_X(x)$ = pnorm(x, mean=μ, sd=sqrt(σ^2)).
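As a minimal sketch of the R command above, the following compares pnorm with direct numerical integration of the density, using dnorm and integrate from base R. The values of mu, sigma2 and x are illustrative assumptions, not taken from the notes.

```r
# Illustrative values (assumptions, not from the notes)
mu <- 1        # mean
sigma2 <- 4    # variance
x <- 3         # point at which to evaluate F_X(x)

# Distribution function via pnorm. Note that pnorm takes the standard
# deviation, so the variance must be square-rooted.
F_pnorm <- pnorm(x, mean = mu, sd = sqrt(sigma2))

# The same probability found by numerically integrating the density f_X
F_integrated <- integrate(dnorm, lower = -Inf, upper = x,
                          mean = mu, sd = sqrt(sigma2))$value

print(c(pnorm = F_pnorm, integrated = F_integrated))
```

The two numbers should agree to several decimal places, which illustrates that pnorm is simply evaluating the area under the density up to x.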


[Figure: plots of the probability density function, $f_X(x)$, and the distribution function, $F_X(x)$.]

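Mean and Variance

For $X \sim \text{Normal}(\mu, \sigma^2)$, $E(X) = \mu$, $\text{Var}(X) = \sigma^2$.

Linear transformations

If $X \sim \text{Normal}(\mu, \sigma^2)$, then for any constants $a$ and $b$ (with $a \neq 0$),

$$aX + b \sim \text{Normal}\left(a\mu + b, \; a^2\sigma^2\right).$$

In particular, put $a = \frac{1}{\sigma}$ and $b = -\frac{\mu}{\sigma}$; then

$$X \sim \text{Normal}(\mu, \sigma^2) \;\Rightarrow\; \frac{X - \mu}{\sigma} \sim \text{Normal}(0, 1).$$

$Z \sim \text{Normal}(0, 1)$ is called the standard Normal random variable.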
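The standardisation result can be checked numerically. The sketch below, with assumed values for mu, sigma and the evaluation points x, shows that probabilities for X computed directly agree with probabilities for the standard Normal evaluated at (x − μ)/σ.

```r
# Assumed parameters and evaluation points, for illustration only
mu <- 3
sigma <- 2
x <- c(0, 3, 5.5)

# P(X <= x) computed directly for X ~ Normal(mu, sigma^2)
p_direct <- pnorm(x, mean = mu, sd = sigma)

# P(Z <= (x - mu)/sigma) for the standard Normal Z ~ Normal(0, 1)
z <- (x - mu) / sigma
p_standard <- pnorm(z)

print(cbind(x, z, p_direct, p_standard))
```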

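Sums of Normal random variables

If $X$ and $Y$ are independent, and $X \sim \text{Normal}(\mu_1, \sigma_1^2)$, $Y \sim \text{Normal}(\mu_2, \sigma_2^2)$, then

$$X + Y \sim \text{Normal}\left(\mu_1 + \mu_2, \; \sigma_1^2 + \sigma_2^2\right).$$

More generally, if $X_1, X_2, \ldots, X_n$ are independent, and $X_i \sim \text{Normal}(\mu_i, \sigma_i^2)$ for $i = 1, \ldots, n$, then

$$a_1 X_1 + a_2 X_2 + \ldots + a_n X_n \sim \text{Normal}\left((a_1\mu_1 + \ldots + a_n\mu_n), \; (a_1^2\sigma_1^2 + \ldots + a_n^2\sigma_n^2)\right).$$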
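A quick simulation can make the rule for sums concrete. This is only a sketch: the parameter values, the seed and the number of simulated draws are assumptions chosen for illustration.

```r
set.seed(5)            # for reproducibility
N <- 100000            # number of simulated pairs (assumed)

mu1 <- 1;  sigma1 <- 2     # X ~ Normal(mu1, sigma1^2)
mu2 <- -3; sigma2 <- 1.5   # Y ~ Normal(mu2, sigma2^2)

# Independent draws of X and Y, and their sum
x <- rnorm(N, mean = mu1, sd = sigma1)
y <- rnorm(N, mean = mu2, sd = sigma2)
s <- x + y

# Compare simulated mean and variance of X + Y with the stated parameters
sim_mean <- mean(s)
sim_var <- var(s)
print(c(simulated = sim_mean, theory = mu1 + mu2))
print(c(simulated = sim_var, theory = sigma1^2 + sigma2^2))
```

Note that the variances add, not the standard deviations: here the theoretical variance is 4 + 2.25 = 6.25.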

For mathematicians: properties of the Normal distribution

Proof that $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.

The full proof that

$$\int_{-\infty}^{\infty} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\mu)^2/(2\sigma^2)}\,dx = 1$$

relies on the following result:

FACT: $\int_{-\infty}^{\infty} e^{-y^2}\,dy = \sqrt{\pi}$.

This result is non-trivial to prove. See Calculus courses for details.

Using this result, the proof that $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ follows by using the change of variable $y = \frac{x - \mu}{\sqrt{2}\,\sigma}$ in the integral.
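Although the FACT is hard to prove, it is easy to check numerically. The sketch below uses base R's integrate; the values of mu and sigma are assumptions chosen for illustration.

```r
# Numerical check of the FACT: integral of exp(-y^2) over the real line
gauss_integral <- integrate(function(y) exp(-y^2),
                            lower = -Inf, upper = Inf)$value
print(c(numerical = gauss_integral, sqrt_pi = sqrt(pi)))

# Numerical check that a Normal density integrates to 1
mu <- 1       # assumed mean
sigma <- 2    # assumed standard deviation
total_area <- integrate(dnorm, lower = -Inf, upper = Inf,
                        mean = mu, sd = sigma)$value
print(total_area)
```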

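Proof that $E(X) = \mu$.

$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx = \int_{-\infty}^{\infty} x \, \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$

Change variable of integration: let $z = \frac{x - \mu}{\sigma}$; then $x = \sigma z + \mu$ and $\frac{dx}{dz} = \sigma$.

Then

$$E(X) = \int_{-\infty}^{\infty} (\sigma z + \mu) \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-z^2/2} \cdot \sigma \, dz$$

$$= \int_{-\infty}^{\infty} \frac{\sigma z}{\sqrt{2\pi}} \cdot e^{-z^2/2}\,dz \; + \; \mu \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2}\,dz.$$

The first integrand is an odd function of $z$ (i.e. $g(-z) = -g(z)$), so it integrates to 0 over the range $-\infty$ to $\infty$. The second integrand is the p.d.f. of $N(0, 1)$, which integrates to 1.

Thus $E(X) = 0 + \mu \times 1 = \mu$.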

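Proof that $\text{Var}(X) = \sigma^2$.

$$\text{Var}(X) = E\left\{(X - \mu)^2\right\} = \int_{-\infty}^{\infty} (x - \mu)^2 \, \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(x-\mu)^2/(2\sigma^2)}\,dx$$

$$= \sigma^2 \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \, z^2 e^{-z^2/2}\,dz \quad \left(\text{putting } z = \frac{x - \mu}{\sigma}\right)$$

$$= \sigma^2 \left\{ \frac{1}{\sqrt{2\pi}} \left[ -z e^{-z^2/2} \right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2}\,dz \right\} \quad \text{(integration by parts)}$$

$$= \sigma^2 \{0 + 1\}$$

$$= \sigma^2.$$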
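The two results just proved, E(X) = μ and Var(X) = σ², can also be confirmed by evaluating the defining integrals numerically. This is a sketch with assumed values of mu and sigma, using base R's integrate.

```r
mu <- 1       # assumed mean
sigma <- 2    # assumed standard deviation, so sigma^2 = 4

# Normal(mu, sigma^2) density
f_X <- function(x) dnorm(x, mean = mu, sd = sigma)

# E(X) = integral of x f_X(x) dx
mean_numeric <- integrate(function(x) x * f_X(x),
                          lower = -Inf, upper = Inf)$value

# Var(X) = integral of (x - mu)^2 f_X(x) dx
var_numeric <- integrate(function(x) (x - mu)^2 * f_X(x),
                         lower = -Inf, upper = Inf)$value

print(c(mean = mean_numeric, variance = var_numeric))
```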

5.2 The Central Limit Theorem (CLT)

also known as... the Piece of Cake Theorem

The Central Limit Theorem (CLT) is one of the most fundamental results in statistics. In its simplest form, it states that if a large number of independent random variables are drawn from any distribution with finite mean and variance, then the distribution of their sum (or alternatively their sample average), suitably standardised, converges to the Normal distribution.

Distribution of the sample mean, $\bar{X}$, using the CLT

Let $X_1, \ldots, X_n$ be independent, identically distributed with mean $E(X_i) = \mu$ and variance $\text{Var}(X_i) = \sigma^2$ for all $i$.

The sample mean, $\bar{X}$, is defined as:

$$\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n}.$$

So $\bar{X} = \frac{S_n}{n}$, where $S_n = X_1 + \ldots + X_n \sim \text{approx Normal}(n\mu, n\sigma^2)$ by the CLT.

Because $\bar{X}$ is a scalar multiple of $S_n$, which is approximately Normal as $n$ grows large, $\bar{X}$ itself is approximately Normal for large $n$:

$$\frac{X_1 + X_2 + \ldots + X_n}{n} \sim \text{approx Normal}\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{as } n \to \infty.$$

The following three statements of the Central Limit Theorem are equivalent:

$$\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n} \sim \text{approx Normal}\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{as } n \to \infty.$$

$$S_n = X_1 + X_2 + \ldots + X_n \sim \text{approx Normal}\left(n\mu, n\sigma^2\right) \quad \text{as } n \to \infty.$$

$$\frac{S_n - n\mu}{\sqrt{n\sigma^2}} = \frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}} \sim \text{approx Normal}(0, 1) \quad \text{as } n \to \infty.$$

The essential point to remember about the Central Limit Theorem is that large sums or sample means of independent random variables converge to a Normal distribution, whatever the distribution of the original r.v.s.
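The equivalence of the standardised sum and the standardised sample mean can be seen on simulated data. The sketch below assumes an Exponential(rate = 1) distribution (mean 1, variance 1) purely as an example of a non-Normal distribution; the sample size, seed and number of repetitions are also assumptions.

```r
set.seed(1)      # for reproducibility
n <- 200         # sample size (assumed)
mu <- 1          # mean of Exponential(rate = 1)
sigma2 <- 1      # variance of Exponential(rate = 1)

x <- rexp(n, rate = 1)
S_n <- sum(x)
X_bar <- mean(x)

# The two standardisations in the third statement give the same value
z_from_sum <- (S_n - n * mu) / sqrt(n * sigma2)
z_from_mean <- (X_bar - mu) / sqrt(sigma2 / n)
print(c(z_from_sum = z_from_sum, z_from_mean = z_from_mean))

# Repeat many times. If the standardised mean is roughly Normal(0, 1),
# about 95% of the values should fall between -1.96 and 1.96.
z_reps <- replicate(5000, (mean(rexp(n, rate = 1)) - mu) / sqrt(sigma2 / n))
prop_central <- mean(abs(z_reps) < 1.96)
print(prop_central)
```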

More general version of the CLT

A more general form of the CLT states that, if $X_1, \ldots, X_n$ are independent, and $E(X_i) = \mu_i$, $\text{Var}(X_i) = \sigma_i^2$ (not necessarily all equal), then, under suitable regularity conditions,

$$Z_n = \frac{\sum_{i=1}^{n} (X_i - \mu_i)}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}} \to \text{Normal}(0, 1) \quad \text{as } n \to \infty.$$

Other versions of the CLT relax the condition that $X_1, \ldots, X_n$ are independent.

The Central Limit Theorem in action: simulation studies

The following simulation study illustrates the Central Limit Theorem, making use of several of the techniques learnt in STATS 210. We will look particularly at how fast the distribution of $S_n$ converges to the Normal distribution.

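Example 1: Triangular distribution: $f_X(x) = 2x$ for $0 < x < 1$.

[Figure: plot of $f(x)$ against $x$ for $0 < x < 1$.]

Find $E(X)$ and $\text{Var}(X)$:

$$\mu = E(X) = \int_0^1 x f_X(x)\,dx = \int_0^1 2x^2\,dx = \left[\frac{2x^3}{3}\right]_0^1 = \frac{2}{3}.$$

$$\sigma^2 = \text{Var}(X) = E(X^2) - \{E(X)\}^2 = \int_0^1 x^2 f_X(x)\,dx - \left(\frac{2}{3}\right)^2 = \int_0^1 2x^3\,dx - \frac{4}{9} = \left[\frac{2x^4}{4}\right]_0^1 - \frac{4}{9} = \frac{1}{18}.$$

Let $S_n = X_1 + \ldots + X_n$ where $X_1, \ldots, X_n$ are independent.

Then $E(S_n) = E(X_1 + \ldots + X_n) = n\mu = \frac{2n}{3}$,

$\text{Var}(S_n) = \text{Var}(X_1 + \ldots + X_n) = n\sigma^2$ by independence

$\Rightarrow \text{Var}(S_n) = \frac{n}{18}$.

So $S_n \sim \text{approx Normal}\left(\frac{2n}{3}, \frac{n}{18}\right)$ for large $n$, by the Central Limit Theorem.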
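A simulation along the following lines illustrates Example 1. It is a sketch, not the notes' own code: it assumes the inverse-transform method for generating from the triangular distribution (since F_X(x) = x² on (0, 1), X = sqrt(U) with U uniform on (0, 1)), and the values of n, the seed and the number of repetitions are arbitrary choices.

```r
set.seed(210)    # for reproducibility
n <- 10          # number of terms in each sum (assumed)
reps <- 10000    # number of simulated sums (assumed)

# Each X_i has density 2x on (0, 1): F_X(x) = x^2, so X = sqrt(U)
S_n <- replicate(reps, sum(sqrt(runif(n))))

# Compare with the CLT parameters: mean 2n/3 and variance n/18
sim_mean <- mean(S_n)
sim_var <- var(S_n)
print(c(simulated = sim_mean, theory = 2 * n / 3))
print(c(simulated = sim_var, theory = n / 18))

# Compare one probability under the simulation and the Normal approximation
p_sim <- mean(S_n <= 7)
p_normal <- pnorm(7, mean = 2 * n / 3, sd = sqrt(n / 18))
print(c(simulated = p_sim, normal_approx = p_normal))
```

Increasing n should bring the simulated probabilities closer to the Normal approximation, which is one way to see how fast the convergence is.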

Normal approximation to the Binomial distribution, using the CLT

Let $Y \sim \text{Binomial}(n, p)$.

We can think of $Y$ as the sum of $n$ Bernoulli random variables:

$$Y = X_1 + X_2 + \ldots + X_n, \quad \text{where } X_i = \begin{cases} 1 & \text{if trial } i \text{ is a ``success'' (prob} = p\text{)}, \\ 0 & \text{otherwise (prob} = 1 - p\text{)}. \end{cases}$$

So $Y = X_1 + \ldots + X_n$ and each $X_i$ has $\mu = E(X_i) = p$, $\sigma^2 = \text{Var}(X_i) = p(1 - p)$.

Thus by the CLT,

$$Y = X_1 + X_2 + \ldots + X_n \to \text{Normal}(n\mu, n\sigma^2) = \text{Normal}\left(np, \; np(1 - p)\right).$$

Thus,

$$\text{Bin}(n, p) \to \text{Normal}\Big(\underbrace{np}_{\text{mean of Bin}(n,p)}, \; \underbrace{np(1 - p)}_{\text{var of Bin}(n,p)}\Big) \quad \text{as } n \to \infty \text{ with } p \text{ fixed}.$$

The Binomial distribution is therefore well approximated by the Normal distribution when $n$ is large, for any fixed value of $p$.

The Normal distribution is also a good approximation to the Poisson($\lambda$) distribution when $\lambda$ is large:

$$\text{Poisson}(\lambda) \to \text{Normal}(\lambda, \lambda) \quad \text{when } \lambda \text{ is large}.$$

[Figure: two panels, Binomial(n = 100, p = 0.5) and Poisson(λ = 100).]
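The quality of both approximations can be checked for the two cases named above, Binomial(n = 100, p = 0.5) and Poisson(λ = 100). The sketch below compares exact and approximate cumulative probabilities at one point each; the evaluation points 55 and 110 are assumptions chosen for illustration.

```r
# Binomial(n = 100, p = 0.5) versus Normal(np, np(1 - p))
n <- 100
p <- 0.5
exact_binom <- pbinom(55, size = n, prob = p)
approx_binom <- pnorm(55, mean = n * p, sd = sqrt(n * p * (1 - p)))
print(c(exact = exact_binom, normal_approx = approx_binom))

# Poisson(lambda = 100) versus Normal(lambda, lambda)
lambda <- 100
exact_pois <- ppois(110, lambda = lambda)
approx_pois <- pnorm(110, mean = lambda, sd = sqrt(lambda))
print(c(exact = exact_pois, normal_approx = approx_pois))
```

The small remaining gaps arise largely because a discrete distribution is being approximated by a continuous one.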

Why the Piece of Cake Theorem?...

The Central Limit Theorem makes whole realms of statistics into a piece of cake.

After seeing a theorem this good, you deserve a piece of cake!

5.3 Confidence intervals

Example: Remember the margin of error for an opinion poll?

An opinion pollster wishes to estimate the level of support for Labour in an upcoming election. She interviews $n$ people about their voting preferences. Let $p$ be the true, unknown level of support for the Labour party in New Zealand. Let $X$ be the number of the $n$ people interviewed by the opinion pollster who plan to vote Labour. Then $X \sim \text{Binomial}(n, p)$.

At the end of Chapter 2, we said that the maximum likelihood estimator for $p$ is

$$\hat{p} = \frac{X}{n}.$$

In a large sample (large $n$), we now know that

$$X \sim \text{approx Normal}(np, npq) \quad \text{where } q = 1 - p.$$

So

$$\hat{p} = \frac{X}{n} \sim \text{approx Normal}\left(p, \frac{pq}{n}\right) \quad \text{(linear transformation of Normal r.v.)}$$

So

$$\frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} \sim \text{approx Normal}(0, 1).$$

Now if $Z \sim \text{Normal}(0, 1)$, we find (using a computer) that the 95% central probability region of $Z$ is from $-1.96$ to $+1.96$:

$$P(-1.96 < Z < 1.96) = 0.95.$$

Check in R: pnorm(1.96, mean=0, sd=1) - pnorm(-1.96, mean=0, sd=1)
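The check can also be run in the other direction. As a small sketch, base R's qnorm (the inverse of pnorm) recovers the value 1.96 from the requirement that 2.5% of the probability lies in each tail.

```r
# Central probability between -1.96 and 1.96 for Z ~ Normal(0, 1)
central_prob <- pnorm(1.96, mean = 0, sd = 1) - pnorm(-1.96, mean = 0, sd = 1)
print(central_prob)

# The point with 97.5% of the probability below it (2.5% in each tail)
z_975 <- qnorm(0.975, mean = 0, sd = 1)
print(z_975)
```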

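Confidence intervals for the Poisson $\lambda$ parameter

We saw in section 3.6 that if $X_1, \ldots, X_n$ are independent, identically distributed with $X_i \sim \text{Poisson}(\lambda)$, then the maximum likelihood estimator of $\lambda$ is

$$\hat{\lambda} = \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

Now $E(X_i) = \mu = \lambda$, and $\text{Var}(X_i) = \sigma^2 = \lambda$, for $i = 1, \ldots, n$.

Thus, when $n$ is large,

$$\hat{\lambda} = \bar{X} \sim \text{approx Normal}\left(\mu, \frac{\sigma^2}{n}\right)$$

by the Central Limit Theorem. In other words,

$$\hat{\lambda} \sim \text{approx Normal}\left(\lambda, \frac{\lambda}{n}\right) \quad \text{as } n \to \infty.$$

We use the same transformation as before to find approximate 95% confidence intervals for $\lambda$ as $n$ grows large:

Let $Z = \frac{\hat{\lambda} - \lambda}{\sqrt{\lambda / n}}$. We have $Z \sim \text{approx Normal}(0, 1)$ for large $n$.

Thus:

$$P\left(-1.96 < \frac{\hat{\lambda} - \lambda}{\sqrt{\frac{\lambda}{n}}} < 1.96\right) \simeq 0.95.$$

Rearranging to put the unknown $\lambda$ in the middle:

$$P\left(\hat{\lambda} - 1.96\sqrt{\frac{\lambda}{n}} < \lambda < \hat{\lambda} + 1.96\sqrt{\frac{\lambda}{n}}\right) \simeq 0.95.$$

So our estimated 95% confidence interval for the unknown parameter $\lambda$ is:

$$\hat{\lambda} - 1.96\sqrt{\frac{\hat{\lambda}}{n}} \quad \text{to} \quad \hat{\lambda} + 1.96\sqrt{\frac{\hat{\lambda}}{n}}.$$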
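The interval is straightforward to compute in R. The sketch below applies it to simulated Poisson data; the true value lambda = 4, the sample size and the seed are assumptions for illustration, and the unknown λ inside the square root is replaced by its estimate, the sample mean.

```r
set.seed(36)                 # for reproducibility
n <- 100                     # sample size (assumed)
x <- rpois(n, lambda = 4)    # simulated data; true lambda = 4 is assumed

# Maximum likelihood estimate of lambda is the sample mean
lambda_hat <- mean(x)

# Estimated 95% confidence interval: lambda_hat -/+ 1.96 * sqrt(lambda_hat / n)
half_width <- 1.96 * sqrt(lambda_hat / n)
ci <- c(lower = lambda_hat - half_width, upper = lambda_hat + half_width)

print(lambda_hat)
print(ci)
```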

Why is this so good?

It’s clear that it’s important to measure precision, or reliability, of an estimate, otherwise the estimate is almost worthless. However, we have already seen various measures of precision: variance, standard error, coefficient of variation, and now confidence intervals. Why do we need so many?

The true variance of an estimator, e.g. $\text{Var}(\hat{\lambda})$, is the most convenient quantity to work with mathematically. However, it is on a non-intuitive scale (squared deviation from the mean), and it usually depends upon the unknown parameter, e.g. $\lambda$.

The standard error is $\text{se}(\hat{\lambda}) = \sqrt{\widehat{\text{Var}}(\hat{\lambda})}$. It is an estimate of the square root of the true variance, $\text{Var}(\hat{\lambda})$. Because of the square root, the standard error is a direct measure of deviation from the mean, rather than squared deviation from the mean. This means it is measured in more intuitive units. However, it is still unclear how we should comprehend the information that the standard error gives us.

The beauty of the Central Limit Theorem is that it gives us an incredibly easy way of understanding what the standard error is telling us, using Normal-based asymptotic confidence intervals as computed in the previous two examples. Although it is beyond the scope of this course to see why, the Central Limit Theorem guarantees that almost any maximum likelihood estimator will be approximately Normally distributed as long as the sample size $n$ is large enough, subject only to fairly mild conditions.

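Thus, if we can find an estimate of the variance, e.g. $\widehat{\text{Var}}(\hat{\lambda})$, we can immediately convert it to an estimated 95% confidence interval using the Normal formulation:

$$\hat{\lambda} - 1.96\sqrt{\widehat{\text{Var}}(\hat{\lambda})} \quad \text{to} \quad \hat{\lambda} + 1.96\sqrt{\widehat{\text{Var}}(\hat{\lambda})},$$

or equivalently, $\hat{\lambda} - 1.96\,\text{se}(\hat{\lambda})$ to $\hat{\lambda} + 1.96\,\text{se}(\hat{\lambda})$.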

The confidence interval has an easily understood interpretation: on 95% of the occasions on which we conduct a random experiment and build a confidence interval, the interval will contain the true parameter.
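This interpretation can be illustrated by simulation: repeat the experiment many times, build an interval each time, and count how often the interval contains the true parameter. The sketch below does this for the Poisson case; the true lambda, sample size, number of repetitions and seed are all assumptions for illustration.

```r
set.seed(95)     # for reproducibility
lambda <- 5      # true parameter (known here only because we simulate)
n <- 50          # sample size in each experiment (assumed)
reps <- 2000     # number of repeated experiments (assumed)

# For each experiment, record whether the interval contains the true lambda
covered <- replicate(reps, {
  x <- rpois(n, lambda)
  lambda_hat <- mean(x)
  se <- sqrt(lambda_hat / n)    # estimated standard error of lambda_hat
  (lambda_hat - 1.96 * se < lambda) && (lambda < lambda_hat + 1.96 * se)
})

# Proportion of intervals containing the true parameter
coverage <- mean(covered)
print(coverage)
```

The proportion should be close to 0.95, though not exactly equal to it, because the interval is based on a large-sample approximation and the simulation itself has random error.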

So the Central Limit Theorem has given us an incredibly simple and powerful way of converting from a hard-to-understand measure of precision, $\text{se}(\hat{\lambda})$, to a measure that is easily understood and relevant to the problem at hand. Brilliant!
