MATH 243, Lecture 3: Standard Deviation and Normal Distributions - Prof. Thomas Bell | Study notes Probability and Statistics

MATH 243, LECTURE 3

1. Standard deviation

We have seen that the mean and the median both measure the “middle” of a set of data. The mean is easier to compute, while the median is reliably in the middle.

The situation is similar when measuring the “spread” of data. Last time we defined the interquartile range Q3 − Q1, which tells us how wide an interval is needed to account for the middle half of the data. The way to measure spread in a computable way, akin to the average, is with the standard deviation.

Definition 1. Let $x_1, \ldots, x_n$ be a list of data. Let $\bar{x}$ be the mean. The standard deviation is given by

$$\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

Why is this a reasonable measure of spread? If it is small then one expects the quartiles to be close together, and if it is large then the quartiles should be spread apart.
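As a quick check of the formula in Definition 1, here is a worked computation on a made-up data set, 1, 2, 3, 4, 5 (chosen only for illustration, not taken from the lecture), so that n = 5:

```latex
\begin{align*}
\bar{x} &= \frac{1 + 2 + 3 + 4 + 5}{5} = 3, \\
\sum_{i=1}^{5} (x_i - \bar{x})^2 &= (-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2 = 10, \\
\sigma &= \sqrt{\frac{10}{5 - 1}} = \sqrt{2.5} \approx 1.58.
\end{align*}
```

Multiplying every deviation from the mean by 10 would multiply the sum of squares by 100 and hence σ by 10, which matches the intuition that more widely spread data has a larger standard deviation.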

Example 2 (Excel example). Excel can compute the standard deviation with the STDEV function. We can see how the deviation changes for data sets with larger and smaller “spread.”
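The same comparison can be sketched outside Excel. The Python snippet below is an illustration only and not part of the lecture: the two data sets are made up, and `sample_std` is a hypothetical helper that applies Definition 1 directly. The standard library's `statistics.stdev`, which also divides by n − 1, is printed alongside as a cross-check.

```python
import math
import statistics

def sample_std(data):
    """Standard deviation with the n - 1 denominator, as in Definition 1."""
    n = len(data)
    mean = sum(data) / n
    sum_of_squares = sum((x - mean) ** 2 for x in data)
    return math.sqrt(sum_of_squares / (n - 1))

# Two made-up data sets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, data in [("tight", tight), ("wide", wide)]:
    # The hand-written formula and the library function should agree.
    print(name, round(sample_std(data), 3), round(statistics.stdev(data), 3))
```

Both columns should agree, and the value for `wide` should be much larger than the value for `tight`, even though the two data sets share the same mean.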

1.1. Which description of data is better: five-number summary or mean and standard deviation? The mean and standard deviation are always easier to compute; the five-number summary is always more accurate. Use x̄ and σ when you have a symmetric distribution of data. Use the five-number summary otherwise.

If the distribution is approximately symmetric, the median and the mean will be close, and the quartiles will be about equally placed around the mean. In that case, the mean and the standard deviation provide a similar level of information to the five-number summary.
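To see the difference between the two descriptions, one can compare them on a symmetric and a skewed data set. The sketch below uses made-up data and a hypothetical helper `describe`; it relies on Python's `statistics.quantiles`, whose default quartile convention may differ slightly from the one used in this course.

```python
import statistics

def describe(data):
    """Return both descriptions of a data set: five-number summary, and mean with standard deviation."""
    ordered = sorted(data)
    # Quartiles with the library's default ("exclusive") convention; an assumption, not the course's rule.
    q1, _, q3 = statistics.quantiles(ordered, n=4)
    median = statistics.median(ordered)
    return {
        "five_number": (ordered[0], q1, median, q3, ordered[-1]),
        "mean": statistics.mean(ordered),
        "median": median,
        "stdev": statistics.stdev(ordered),
    }

# Made-up data: the second list replaces the largest value with an outlier.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
skewed = [1, 2, 3, 4, 5, 6, 7, 8, 90]

for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(name, describe(data))
```

For the symmetric list the mean and the median coincide. For the skewed list the single outlier drags the mean and the standard deviation upward while the median and the quartiles stay put, which is why the five-number summary is the safer description there.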

2. First manipulations with normal distributions

Last time we were introduced to the all-important “Bell curve,” otherwise known as a normal distribution. There are only three numbers needed to describe a normal distribution:

(1) The total number of data points. This is usually dealt with by “setting it to one and multiplying at the end.”
(2) µ, its mean, which is the center of the distribution, around which it is symmetric.
(3) σ, its standard deviation (hinted at towards the end of the first lecture).

We will learn more about what σ is later, but for now let’s see how it can be used.

Theorem 3. In a normal distribution with mean µ and standard deviation σ,
(1) approximately 68% of the observations fall within σ of µ (within one standard deviation of the mean);
(2) approximately 95% of the observations fall within 2σ of µ (within two standard deviations of the mean);
(3) approximately 99.7% of the observations fall within 3σ of µ (within three standard deviations of the mean).
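The three percentages in Theorem 3 can be checked numerically. The sketch below is an illustration, not part of the lecture: `fraction_within` is a hypothetical helper built on the standard library's `statistics.NormalDist`, and the values µ = 70, σ = 5 and a total of 1000 observations are made up. It also shows the “set the total to one and multiply at the end” idea: the fraction is computed first and only then scaled by the number of data points.

```python
from statistics import NormalDist

def fraction_within(k, mu=0.0, sigma=1.0):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Hypothetical setting: 1000 observations with mean 70 and standard deviation 5.
total = 1000
for k in (1, 2, 3):
    fraction = fraction_within(k, mu=70, sigma=5)
    # Multiply by the total only at the end to turn a fraction into a count.
    print(f"within {k} sigma: {fraction:.4f} of the data, about {fraction * total:.0f} of {total}")
```

The fractions do not depend on the particular µ and σ chosen, which is why the rule can be stated for every normal distribution at once.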
