





















This document from a statistics for scientists and engineers course explores the normal distribution, its utility, characteristics, computing probabilities using standard normal tables, and the central limit theorem. The normal distribution is a bell-shaped curve with wide applicability in various fields due to its symmetry, unimodality, and the fact that sums and averages of random samples tend to follow it.






















Stat 3000 – Statistics for Scientists and Engineers
Dr. Corcoran, Summer 2009
The normal distribution (a.k.a. the Gaussian distribution or “bell curve”) is by far the best-known probability distribution. Its discovery has had such a far-reaching impact in modeling quantitative phenomena across the physical, social, and biological sciences that its founder even found his way onto a major currency (before the Euro).
Utility of the Normal Distribution
The normal distribution has such broad applicability in part because
• phenomena in the natural world that result from the interaction of many environmental and genetic factors tend to follow the normal distribution (e.g., height, weight, measurable intelligence).
• the sums and averages of random samples have distributions that look roughly normal; as the sample size gets larger, the normal approximation gets better. This result is known as the Central Limit Theorem. As we will discuss later, this applies even to samples of categorical variables!
Computing Probabilities using the Normal Distribution
Recall that for a continuous random variable X with probability density function (pdf) f(x), the value f(x) is not P(X = x). That is, the pdf does not yield probabilities as does a discrete probability mass function. Technically, for a continuous random variable X, P(X = x) = 0.
However, we can compute probabilities over intervals of X. That is, the probability that X lies between two numbers a and b is equal to the area under the density curve between a and b:
$P(a \le X \le b) = \int_a^b f(x)\,dx$
[Figure: density curve with the points a and b marked on the horizontal axis]
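As a rough illustration of "probability equals area under the density curve", the sketch below approximates the area numerically with a trapezoid rule and compares it with a difference of cumulative probabilities. The distribution (standard normal), the endpoints, and the helper name `area_under_pdf` are illustrative assumptions, not part of the handout.

```python
from statistics import NormalDist

def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate the area under dist.pdf between a and b (trapezoid rule)."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width

X = NormalDist(mu=0.0, sigma=1.0)  # illustrative choice of distribution
a, b = -1.0, 2.0                   # illustrative endpoints

area = area_under_pdf(X, a, b)
via_cdf = X.cdf(b) - X.cdf(a)      # P(a <= X <= b) from the cumulative distribution
point_area = area_under_pdf(X, 1.0, 1.0)  # zero-width interval, i.e. P(X = 1)

print(f"area under pdf between a and b: {area:.6f}")
print(f"cdf(b) - cdf(a):                {via_cdf:.6f}")
print(f"area over a single point:       {point_area:.6f}")
```

The zero-width case reflects the statement that P(X = x) = 0 for a continuous random variable.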
Computing Probabilities using the Normal Distribution
To this point, we have computed areas under a density curve by using integration. However, since the normal density (i) cannot be integrated in closed form and (ii) is used by researchers with access to modern computing tools, probabilities based on the normal distribution can be obtained using tables or computer software.
A normal probability table looks something like what is shown on the last pages of this handout (reproduced from a previous edition of the text). Such a table is based on the standard normal distribution, or the normal distribution with zero mean and variance of 1.
Using this table, what is the probability that a randomly sampled N(0,1) variable is less than 1.34? Less than -0.28? Between -2.54 and 1.68? For what x does P(X ≤ x) = 0.975?
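For readers using software rather than the printed table, the table lookups above can be sketched with Python's standard-library `statistics.NormalDist`. The variable names are illustrative; the results should agree with the table up to rounding.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_below_1_34 = Z.cdf(1.34)              # P(Z < 1.34)
p_below_neg_0_28 = Z.cdf(-0.28)         # P(Z < -0.28)
p_between = Z.cdf(1.68) - Z.cdf(-2.54)  # P(-2.54 < Z < 1.68)
x_975 = Z.inv_cdf(0.975)                # x such that P(Z <= x) = 0.975

print(f"P(Z < 1.34)         = {p_below_1_34:.4f}")
print(f"P(Z < -0.28)        = {p_below_neg_0_28:.4f}")
print(f"P(-2.54 < Z < 1.68) = {p_between:.4f}")
print(f"x with P(Z <= x) = 0.975: {x_975:.4f}")
```

Here `cdf` plays the role of reading a probability from the table, and `inv_cdf` plays the role of reading the table "backwards" from a probability to a value.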
Example IV.A
Data from a study of king crabs on Kodiak Island, AK (carried out by the Alaska Department of Fish and Game) show that male crab length is normally distributed with a mean of 134.7 mm and a standard deviation of 25.5 mm.
What proportion of the male crab population on Kodiak Island is less than 140 mm? What proportion is between 100 and 140 mm? What is the probability that a randomly selected male crab will measure at least 170 mm?
What is the 75th percentile of this population? The 99th percentile?
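One way to check hand calculations for Example IV.A is sketched below, using the mean and standard deviation stated in the example. The variable names are illustrative, and table-based answers may differ slightly in the last digit because the table rounds z to two decimals.

```python
from statistics import NormalDist

length = NormalDist(mu=134.7, sigma=25.5)  # male crab length in mm (from the example)

p_under_140 = length.cdf(140)                     # proportion shorter than 140 mm
p_100_to_140 = length.cdf(140) - length.cdf(100)  # proportion between 100 and 140 mm
p_at_least_170 = 1 - length.cdf(170)              # probability of at least 170 mm
pct_75 = length.inv_cdf(0.75)                     # 75th percentile, in mm
pct_99 = length.inv_cdf(0.99)                     # 99th percentile, in mm

print(f"P(X < 140)       = {p_under_140:.4f}")
print(f"P(100 < X < 140) = {p_100_to_140:.4f}")
print(f"P(X >= 170)      = {p_at_least_170:.4f}")
print(f"75th percentile  = {pct_75:.1f} mm")
print(f"99th percentile  = {pct_99:.1f} mm")
```

The same answers follow by standardizing, z = (x - 134.7) / 25.5, and using the standard normal table.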
Sums of Normally Distributed Random Variables
Yet another interesting feature of the normal distribution is that sums of normally distributed independent variables are also normally distributed.
Suppose we have two independent random variables $X_1$ and $X_2$ such that $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$, and we define $Y$ such that $Y = c_1 X_1 + c_2 X_2$, where $c_1$ and $c_2$ are constants. Then
$Y \sim N(c_1\mu_1 + c_2\mu_2,\; c_1^2\sigma_1^2 + c_2^2\sigma_2^2)$.
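The rule for a linear combination of two independent normal variables can be sketched as a small helper. The function name `linear_combination` and the numerical parameters are hypothetical, chosen only to illustrate the formula; the cross-check uses the arithmetic that `statistics.NormalDist` itself supports for independent normal variables.

```python
from math import sqrt
from statistics import NormalDist

def linear_combination(c1, X1, c2, X2):
    """Distribution of Y = c1*X1 + c2*X2 for independent normal X1 and X2."""
    mean = c1 * X1.mean + c2 * X2.mean
    variance = c1**2 * X1.variance + c2**2 * X2.variance
    return NormalDist(mean, sqrt(variance))

# Hypothetical parameters: NormalDist takes the standard deviation, not the variance.
X1 = NormalDist(mu=10.0, sigma=3.0)  # variance 9
X2 = NormalDist(mu=4.0, sigma=4.0)   # variance 16

Y = linear_combination(2.0, X1, -1.0, X2)
print(f"mean of Y = {Y.mean:.4f}, variance of Y = {Y.variance:.4f}")

# Cross-check: NormalDist supports scaling by a constant and adding
# two instances, treating them as independent.
Y_check = X1 * 2.0 + X2 * -1.0
print(f"cross-check mean = {Y_check.mean:.4f}, variance = {Y_check.variance:.4f}")
```

Note that the constants enter the variance squared, so a negative constant still adds variance.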
Example IV.B
Consider again the population of Kodiak crabs discussed in Example IV.A. Suppose that we randomly sample 20 specimens from this population. What is the probability that the sample mean will lie between 124.7 and 144.7 mm?
Compute an interval centered at the mean μ such that a sample average of 20 male crabs will lie within that interval with 95% probability. What sample size is required to reduce the total width of this interval to 20 mm?
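A possible way to check Example IV.B numerically is sketched below. It relies on the fact that the mean of n independent N(μ, σ²) observations is a linear combination of normal variables and is therefore N(μ, σ²/n). The variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu, sigma, n = 134.7, 25.5, 20          # population values from Example IV.A
xbar = NormalDist(mu, sigma / sqrt(n))  # distribution of the sample mean

# Probability that the sample mean lies between 124.7 and 144.7 mm.
p_within_10 = xbar.cdf(144.7) - xbar.cdf(124.7)

# Interval centered at mu that contains the sample mean with probability 0.95.
z = NormalDist().inv_cdf(0.975)
half_width = z * sigma / sqrt(n)
interval = (mu - half_width, mu + half_width)

# Smallest n for which the total width 2*z*sigma/sqrt(n) is at most 20 mm.
target_width = 20.0
n_required = ceil((2 * z * sigma / target_width) ** 2)

print(f"P(124.7 < mean < 144.7) = {p_within_10:.4f}")
print(f"95% interval: ({interval[0]:.2f}, {interval[1]:.2f}) mm")
print(f"sample size for total width 20 mm: {n_required}")
```

The sample size is rounded up because a fractional crab cannot be sampled and rounding down would leave the interval slightly too wide.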
The Central Limit Theorem
Suppose that we have a sample $X_1, \ldots, X_n$ from some distribution with mean $\mu$ and variance $\sigma^2$. If $n$ is sufficiently large, then the sample mean $\bar{X}$ is approximately distributed as $N(\mu, \sigma^2/n)$. This is true even if the underlying population is not normal, although a non-normal population may require a relatively larger $n$. We refer to this result as the Central Limit Theorem, or CLT. It represents one of the most remarkable results in mathematical statistics.
The CLT applies even to samples from some categorical distributions, including the binomial and Poisson distributions.
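The CLT can be explored with a small simulation. The sketch below assumes an exponential population with mean 1 and variance 1 (a deliberately skewed, non-normal choice), along with an arbitrary seed, sample size, and number of repetitions; none of these come from the handout.

```python
import random
from math import sqrt
from statistics import fmean, pvariance

def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size n from Exponential(1); return the sample means."""
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(3000)  # fixed seed so the run is repeatable
n, reps = 40, 4000         # assumed sample size and number of repetitions
mu, sigma2 = 1.0, 1.0      # mean and variance of the Exponential(1) population

means = simulate_sample_means(n, reps, rng)
se = sqrt(sigma2 / n)      # standard deviation of the sample mean predicted by the CLT

# Fraction of sample means within 1.96 standard errors of mu (about 0.95 if normal).
coverage = sum(abs(m - mu) < 1.96 * se for m in means) / reps

print(f"average of sample means:  {fmean(means):.4f} (CLT predicts {mu})")
print(f"variance of sample means: {pvariance(means):.5f} (CLT predicts {sigma2 / n:.5f})")
print(f"fraction within 1.96 SE:  {coverage:.3f}")
```

Rerunning with a smaller n should show the approximation degrading, which matches the remark that non-normal populations need larger samples.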
Example IV.C, cont’d
Note further that the underlying distribution is not at all normal: it’s a binary distribution with just two points of probability mass at zero and one. The density of the normal distribution is continuous, unimodal, bell-shaped, symmetric, and has a domain over the entire real line.
However, the CLT claims that if n is sufficiently large, the distribution of the sample mean (here, the sample proportion) will be approximately normal. What will the mean and variance of this distribution be?
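The mean and variance asked about above can be computed exactly by enumerating the binomial probabilities for the number of ones in the sample. The sketch assumes a fair coin (p = 0.5) and n = 30; the helper name `sample_proportion_moments` is hypothetical.

```python
from math import comb

def sample_proportion_moments(n, p):
    """Exact mean and variance of the proportion of ones among n Bernoulli(p) draws."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    variance = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, variance

n, p = 30, 0.5  # assumed: 30 flips of a fair coin
mean_30, var_30 = sample_proportion_moments(n, p)

print(f"mean of sample proportion:     {mean_30:.6f} (compare p = {p})")
print(f"variance of sample proportion: {var_30:.6f} (compare p(1-p)/n = {p * (1 - p) / n:.6f})")
```

This matches the CLT statement with μ = p and σ² = p(1 - p), the mean and variance of a single zero-one observation.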
Example IV.C, cont’d
Take a coin, and flip it 30 times. Record the results of the flips in order!
What is the proportion of your first 10 flips that were heads?
What is the proportion of the first 20 flips (the first 10 combined with the second 10) that were heads?
What is the proportion of all 30 that were heads?
Example IV.C, cont’d
Plot below the distribution of sample proportions for the whole class, with n = 20:
[Figure: blank axes for plotting the class's sample proportions]
Example IV.C, cont’d
Plot below the distribution of sample proportions for the whole class, with n = 30:
[Figure: blank axes for plotting the class's sample proportions]
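The classroom experiment can be mimicked in software. The sketch below assumes a fair coin and 500 simulated "students" (far more than a real class, chosen so the pattern is easy to see); the seed and helper name are also assumptions. Each simulated student flips 30 times and records the running proportion of heads after 10, 20, and 30 flips.

```python
import random
from math import sqrt
from statistics import pstdev

def running_proportions(flips, checkpoints=(10, 20, 30)):
    """Proportion of heads among the first n flips, for each checkpoint n."""
    return {n: sum(flips[:n]) / n for n in checkpoints}

rng = random.Random(2009)  # fixed seed so the run is repeatable
students = 500             # hypothetical number of simulated students
results = {10: [], 20: [], 30: []}

for _ in range(students):
    flips = [rng.random() < 0.5 for _ in range(30)]  # True means heads
    for n, proportion in running_proportions(flips).items():
        results[n].append(proportion)

# Spread of the class's sample proportions at each sample size.
spread = {n: pstdev(props) for n, props in results.items()}
for n in (10, 20, 30):
    print(f"n = {n}: std. dev. of proportions = {spread[n]:.3f} "
          f"(theory: {sqrt(0.25 / n):.3f})")
```

The proportions cluster more tightly around 0.5 as n grows, which is the narrowing one would expect to see in the class plots.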
Approximating the Binomial and Poisson Distributions
In light of the CLT, it’s not surprising that the normal distribution can provide a fairly accurate approximation of binomial and Poisson probabilities under certain (not necessarily uncommon) circumstances. Can you explain why?
For example, the plots below superimpose normal curves on binomial distributions with different values of n and p. For what sorts of binomial distributions will the normal distribution prove more accurate?
[Figure: normal curves superimposed on three binomial distributions, with n = 10, n = 10, and n = 100; the values of p are not legible]
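To get a feel for when the approximation works well, the sketch below measures the largest gap between the exact binomial cumulative probabilities and the normal curve with the same mean and variance. The three (n, p) pairs are assumptions chosen for illustration; they are not the values from the handout's plots. No continuity correction is applied here.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Bin(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def max_cdf_error(n, p):
    """Largest gap between the binomial cdf and the matching normal cdf."""
    approx = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return max(abs(binom_cdf(k, n, p) - approx.cdf(k)) for k in range(n + 1))

# Assumed settings: small n with skewed p, small n with p = 0.5, large n with p = 0.5.
errors = {(n, p): max_cdf_error(n, p) for n, p in [(10, 0.1), (10, 0.5), (100, 0.5)]}
for (n, p), err in errors.items():
    print(f"n = {n:3d}, p = {p}: largest cdf error = {err:.3f}")
```

Under these assumed settings the error shrinks as n grows and as p moves toward 0.5, where the binomial distribution is symmetric.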
Example IV.D
What are the mean and variance of the number of diabetic seniors in Example III.E? Use a normal approximation to compute P(X ≥ 8).
When using a normal approximation to compute binomial probabilities, we can improve our accuracy with a continuity correction. That is, if X ~ Bin(n, p), and we wish to compute P(a ≤ X ≤ b), then using the normal distribution we would approximate this with P(a - ½ < X < b + ½).
For the diabetes example, use the continuity correction to compute P(X ≥ 8) as well as P(X = 2). How accurate are these approximations?
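The effect of the continuity correction can be sketched numerically. The parameters of Example III.E are not given in this section, so the values n = 30 and p = 0.2 below are hypothetical placeholders; substitute the actual values from that example to check your own answers.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 30, 0.2  # hypothetical values, NOT taken from Example III.E

def binom_pmf(k):
    """Exact P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

approx = NormalDist(n * p, sqrt(n * p * (1 - p)))  # same mean and variance as Bin(n, p)

# P(X >= 8): exact, plain normal approximation, and continuity-corrected.
exact_tail = sum(binom_pmf(k) for k in range(8, n + 1))
plain_tail = 1 - approx.cdf(8)
corrected_tail = 1 - approx.cdf(7.5)

# P(X = 2): exact, and continuity-corrected as P(1.5 < X < 2.5).
exact_point = binom_pmf(2)
corrected_point = approx.cdf(2.5) - approx.cdf(1.5)

print(f"P(X >= 8): exact {exact_tail:.4f}, plain {plain_tail:.4f}, corrected {corrected_tail:.4f}")
print(f"P(X = 2):  exact {exact_point:.4f}, corrected {corrected_point:.4f}")
```

Without the correction, a single-value probability such as P(X = 2) would be approximated by zero, since a continuous distribution assigns no probability to a point.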
Example IV.E
What are the mean and variance of the number of Logan traffic accidents in Example III.H? Use a normal approximation to compute the same probabilities computed in that example.
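A Poisson count with rate λ has mean and variance both equal to λ, so the matching normal curve is N(λ, λ). The sketch below compares an exact Poisson tail probability with its continuity-corrected normal approximation. The rate λ = 10 and the event X ≥ 15 are hypothetical, since the details of Example III.H are not given in this section.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

lam = 10.0  # hypothetical rate, NOT taken from Example III.H

def poisson_pmf(k, rate):
    """Exact P(X = k) for X ~ Poisson(rate)."""
    return exp(-rate) * rate**k / factorial(k)

approx = NormalDist(lam, sqrt(lam))  # mean lam, variance lam

# P(X >= 15): exact, and continuity-corrected normal approximation.
exact_tail = 1 - sum(poisson_pmf(k, lam) for k in range(15))
corrected_tail = 1 - approx.cdf(14.5)

print(f"P(X >= 15): exact {exact_tail:.4f}, normal approximation {corrected_tail:.4f}")
```

The approximation generally improves as λ grows, since a Poisson distribution with a larger rate is less skewed.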