From Histograms to Density Curves: Understanding Normal Distributions - Prof. N. Phillips, Study notes of Probability and Statistics

A lecture file from Math 243 on turning histograms into density curves, with a focus on Normal distributions. It covers rescaling the vertical axis, the meaning of density curves, and the relationship between median, mean, and standard deviation.

Typology: Study notes

Pre 2010

Uploaded on 09/17/2009

Math 243: Lecture File 2
N. Christopher Phillips
2 April 2009
N. Christopher Phillips () Math 243: Lecture File 2 2 April 2009 1 / 48
From histograms to density curves
Data: 12 17 21 23 24 26 29 31 31 39
Histogram with class width 10:
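[Figure: histogram of the data with class width 10; horizontal axis from 10 to 40, vertical axis (counts) from 1 to 6; one bar is shaded red.]
Area of red bar: 10 · 2 = 20.
Total area: 10 · 2 + 10 · 5 + 10 · 3 = 100.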
Fraction of the data covered by the red bar: 20/100 = 0.2.
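The bar-area bookkeeping above can be reproduced in a few lines of Python. This is a minimal sketch, not part of the lecture: the function name `class_counts` and the convention that a class contains its left endpoint but not its right one are assumptions made for illustration. The red bar is taken to be the bar of height 2, which for this data is the class from 10 to 20.

```python
# Data from the slide.
data = [12, 17, 21, 23, 24, 26, 29, 31, 31, 39]

def class_counts(values, start, stop, width):
    """Count how many values fall in each class [left, left + width).

    Assumption: a value on a class boundary belongs to the class on its right.
    """
    n_classes = (stop - start) // width
    counts = [0] * n_classes
    for v in values:
        if start <= v < stop:
            counts[(v - start) // width] += 1
    return counts

width = 10
counts = class_counts(data, 10, 40, width)   # one count per class
areas = [width * c for c in counts]          # bar area = width * height
total_area = sum(areas)
red_fraction = areas[0] / total_area         # the bar of height 2 (class 10 to 20)

print(counts, areas, total_area, red_fraction)
```

The fraction is the same number you get by counting directly: 2 of the 10 data points lie in that class.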
Rescale vertical axis so that total area is 1
[Figure: the histogram drawn twice, first with counts from 1 to 6 on the vertical axis, then with the vertical axis rescaled to run from 0.01 to 0.06; horizontal axis from 10 to 40 in both.]
Rescale vertical axis so that total area is 1 (continued)
[Figure: the rescaled histogram; horizontal axis from 10 to 40, vertical axis from 0.01 to 0.06; one bar is shaded red.]
Area of red bar: 10 · 0.02 = 0.2.
Total area: 10 · 0.02 + 10 · 0.05 + 10 · 0.03 = 1.
The fraction of the data covered by the red bar is still 0.2, but this is now just the area of the red bar.
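One way to see the rescaling: divide every bar height by the original total area, so that each new height equals count / (n · width). The Python sketch below is illustrative only; the counts 2, 5, 3 are the bar heights of the width-10 histogram, and the variable names are chosen here, not taken from the lecture.

```python
counts = [2, 5, 3]      # bar heights of the width-10 histogram
width = 10

total_area = sum(width * c for c in counts)      # 100 before rescaling
heights = [c / total_area for c in counts]       # rescaled bar heights
areas = [width * h for h in heights]             # rescaled bar areas

print(heights)          # rescaled heights: about 0.02, 0.05, 0.03
print(sum(areas))       # total area after rescaling: about 1
```

After this step the area of any bar can be read directly as a fraction of the data, with no further division.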

Data: 12 17 21 23 24 26 29 31 31 39

[Figure: two plots of the data, each with horizontal axis from 10 to 40; the first has a vertical axis (counts) from 1 to 6.]

Data: 12 17 21 23 24 26 29 31 31 39

Histogram with class width 5:

[Figure: histogram of the data with class width 5; horizontal axis from 10 to 40; a section of the histogram is shaded red.]

Area of red section: 5 · 1 + 5 · 1 = 10.
Total area: 5 · 1 + 5 · 1 + 5 · 3 + · · · = 50.
Fraction of the data covered by the red section: 10/50 = 0.2.

Rescale vertical axis so that total area is 1

[Figure: the width-5 histogram before and after rescaling the vertical axis; horizontal axis from 10 to 40 in both.]

Rescale vertical axis so that total area is 1 (continued)

[Figure: the rescaled histogram with class width 5; horizontal axis from 10 to 40; the red section is shaded.]

Area of red section: 5 · 0.02 + 5 · 0.02 = 0.2.
Total area: 5 · 0.02 + 5 · 0.02 + 5 · 0.06 + · · · = 1.

The fraction of the data covered by the red section is still 0.2, but this is now just the area of the red section.
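Both class widths give 0.2 because, once the heights are rescaled, the area over a range is simply the fraction of data points in that range. The following Python sketch checks this for the range from 10 to 20. The helpers `density_heights` and `area_between`, and the left-closed class convention, are assumptions introduced here for illustration.

```python
data = [12, 17, 21, 23, 24, 26, 29, 31, 31, 39]

def density_heights(values, start, stop, width):
    """Rescaled bar heights, count / (n * width), for classes [left, left + width).

    Assumes every value lies in [start, stop), which holds for this data.
    """
    n_classes = (stop - start) // width
    counts = [0] * n_classes
    for v in values:
        if start <= v < stop:
            counts[(v - start) // width] += 1
    return [c / (len(values) * width) for c in counts]

def area_between(heights, start, width, lo, hi):
    """Total area of the bars whose classes lie entirely inside [lo, hi)."""
    total = 0.0
    for i, h in enumerate(heights):
        left = start + i * width
        if lo <= left and left + width <= hi:
            total += width * h
    return total

for w in (10, 5):
    h = density_heights(data, 10, 40, w)
    print(w, area_between(h, 10, w, 10, 20))   # about 0.2 for both widths
```

With width 10 the range is covered by one bar; with width 5 it is covered by two narrower bars whose areas add up to the same value.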

100 classes.

[Figure: histogram with 100 classes; horizontal axis from −5 to 15, vertical axis (counts) up to 400 000.]

300 classes.

[Figure: histogram with 300 classes; horizontal axis from −5 to 10, vertical axis (counts) up to 100 000.]

1000 classes.

[Figure: histogram with 1000 classes; horizontal axis from −5 to 10, vertical axis (counts) up to 40 000.]

One way to think about density curves

When you see a density curve, imagine that it is a histogram in which the classes are so narrow that each individual bar in the histogram is too small to see.

An example of a density curve

Here is a density curve:

[Figure: a density curve; horizontal axis from 1 to 5.]

Conditions:
The curve is always on or above the horizontal axis.
The total area under the curve is 1. (If you have seen Math 242: The integral of the function is 1.)
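The two conditions can be checked numerically for any candidate curve. The sketch below uses a made-up triangular density on the interval from 1 to 5 (a hypothetical example, not the curve pictured in the lecture) and approximates the area with a midpoint Riemann sum. The names `triangle_density` and `area_under` are invented here.

```python
def triangle_density(x):
    """Hypothetical density: rises from 0 at x = 1 to 0.5 at x = 3, then falls to 0 at x = 5."""
    if 1 <= x <= 3:
        return (x - 1) / 4
    if 3 < x <= 5:
        return (5 - x) / 4
    return 0.0

def area_under(f, a, b, steps=10_000):
    """Midpoint Riemann sum approximating the area under f between a and b."""
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx

# Condition 1: the curve is never below the horizontal axis (checked on a grid from 0 to 6).
nonnegative = all(triangle_density(i / 100) >= 0 for i in range(0, 601))
# Condition 2: the total area under the curve is 1.
total = area_under(triangle_density, 1, 5)

print(nonnegative, round(total, 6))
```

A grid check cannot prove nonnegativity in general; it is only a quick sanity test for a curve you already understand.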

Meaning of the density curve

A density curve represents the distribution of a very large data set. (In principle, the data set should be infinite; the curve is only an approximation for a large finite data set.)

Meaning of the density curve (continued)

[Figure: the density curve, horizontal axis from 1 to 5, with the area between 2 and 3 shaded.]

The shaded area is the fraction of the data that lies between 2 and 3.

Meaning of the density curve (continued)

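[Figure: the density curve, horizontal axis from 1 to 5, with the areas between 1 and 2 and between 3 and 3.5 shaded.]

The shaded area is the fraction of the data that lies between 1 and 2 or between 3 and 3.5.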
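Reading a fraction off a density curve means computing an area, and areas over non-overlapping ranges add. As an illustration only, the Python sketch below uses a made-up density f(t) = 3(t − 1)(5 − t)/32 on the interval from 1 to 5; this is a hypothetical stand-in, not the lecture's curve, chosen because its area can be written down exactly. The function names are invented here.

```python
def _antiderivative(t):
    """Antiderivative of (t - 1)(5 - t) = -t**2 + 6*t - 5."""
    return -t**3 / 3 + 3 * t**2 - 5 * t

def cumulative_area(x):
    """Area under the hypothetical density 3(t - 1)(5 - t)/32 from 1 up to x."""
    x = min(max(x, 1.0), 5.0)          # the density is zero outside [1, 5]
    return 3 / 32 * (_antiderivative(x) - _antiderivative(1.0))

def fraction_between(a, b):
    """Fraction of the data between a and b = area under the curve from a to b."""
    return cumulative_area(b) - cumulative_area(a)

one_range = fraction_between(2, 3)
two_ranges = fraction_between(1, 2) + fraction_between(3, 3.5)   # disjoint ranges add

print(one_range, two_ranges, fraction_between(1, 5))
```

For a curve without a convenient formula, the same fractions would have to be approximated numerically or read from a table.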


From histograms to density curves, for the normal distribution.

The following histograms show larger and larger numbers of data points chosen randomly from the standard Normal distribution, sometimes with the Normal curve superimposed. Observe that small numbers of normally distributed data points are somewhat irregular, but very large numbers are very regular.
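The same experiment can be simulated. The Python sketch below draws n values from the standard Normal distribution, builds a rescaled histogram, and reports the largest gap between a bar height and the average height of the Normal curve over that class. Everything here is an illustrative assumption: the function name `max_density_error`, the class width 0.5, the range from −4 to 4, and the fixed seed. It relies on `statistics.NormalDist`, available in Python 3.8 and later.

```python
import random
from statistics import NormalDist

def max_density_error(n, seed=0, width=0.5, lo=-4.0, hi=4.0):
    """Largest gap between rescaled histogram heights and the Normal curve's
    average height over each class, for n simulated standard Normal values."""
    rng = random.Random(seed)
    n_classes = int(round((hi - lo) / width))
    counts = [0] * n_classes
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        if lo <= x < hi:                       # values outside the range are not counted
            counts[int((x - lo) // width)] += 1
    std = NormalDist()                         # standard Normal: mean 0, standard deviation 1
    worst = 0.0
    for i, c in enumerate(counts):
        left = lo + i * width
        expected = (std.cdf(left + width) - std.cdf(left)) / width
        worst = max(worst, abs(c / (n * width) - expected))
    return worst

for n in (100, 1_000, 10_000, 100_000):
    print(n, round(max_density_error(n), 4))
```

The printed gaps depend on the random seed, but they should tend to shrink as n grows, which is the regularity the histograms are meant to show.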


100 observations.

1000 observations.

10^4 observations.

10^6 observations, curve not shown.

[Figure: histogram of 10^6 observations; horizontal axis from −4 to 4, vertical axis (counts) up to 40 000.]

10^6 observations, curve shown.

[Figure: histogram of 10^6 observations with the Normal curve superimposed; horizontal axis from −4 to 4, vertical axis (counts) up to 40 000.]

10^7 observations, curve not shown.

[Figure: histogram of 10^7 observations; horizontal axis from −4 to 4, vertical axis (counts) up to 80 000.]

Comparing Normal distributions (continued)

Jane Wang’s z-score was about 1.24561.

Math ACT scores are roughly N(20.7, 5.0). Quincy Michaels got 27 on the math ACT. Assuming the tests measure the same thing, did he do better or worse than Jane Wang?

Compare z-scores: his z-score is

z = (x − μ)/σ = (27 − 20.7)/5.0 = 1.26.

So he did slightly better.
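The comparison can be written as a short Python sketch. The helper name `z_score` is chosen here for illustration; the numbers are the ones given in the example (score 27, mean 20.7, standard deviation 5.0, and Jane Wang's z-score of about 1.24561).

```python
def z_score(x, mean, sd):
    """Standardized value: how many standard deviations x lies above the mean."""
    return (x - mean) / sd

jane_z = 1.24561                       # given in the lecture
quincy_z = z_score(27, 20.7, 5.0)      # math ACT, roughly N(20.7, 5.0)

print(round(quincy_z, 2))              # 1.26
print(quincy_z > jane_z)               # True: his z-score is slightly higher
```

Standardizing first is what makes scores from two different tests comparable at all.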


Comparing Normal distributions (continued)

John Doe’s z-score on the Math SAT (roughly N(518, 114)) was −2. What was his actual score?

In z = (x − μ)/σ, solve for x to get x = μ + zσ. So John Doe’s score was

x = μ + zσ = 518 − 2 · 114 = 290.

(Compare: 2 standard deviations below the mean.)
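Going from a z-score back to a raw score is the same formula solved for x. A minimal Python sketch, with the helper name `score_from_z` invented here:

```python
def score_from_z(z, mean, sd):
    """Invert z = (x - mean) / sd to recover the raw score x."""
    return mean + z * sd

x = score_from_z(-2, 518, 114)          # Math SAT, roughly N(518, 114)
print(x)                                # 290

# Round trip: standardizing the recovered score gives back the z-score.
assert (x - 518) / 114 == -2
```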


The rule of thumb (“68–95–99.7 rule”)

In the distribution N(μ, σ) (Normal with mean μ and standard deviation σ):
About 68% of the observations are within one standard deviation of the mean.
About 95% of the observations are within two standard deviations of the mean.
About 99.7% of the observations are within three standard deviations of the mean.

Example: Math SAT scores are roughly N(518, 114). What fraction of Math SAT scores are in the range (404, 632)?

The interval is (μ − σ, μ + σ), so the answer is about 68%.

Note that in this idealized continuous model, the fraction of scores exactly equal to 404 or 632 is zero, so it does not matter whether the endpoints of the interval are included.
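The percentages in the rule of thumb are rounded. As a hedged check, the Python sketch below computes the fractions from the Normal cumulative distribution using `statistics.NormalDist` (Python 3.8 and later). The helper name `fraction_within` is invented here.

```python
from statistics import NormalDist

def fraction_within(k):
    """Fraction of a Normal distribution within k standard deviations of its mean."""
    std = NormalDist()                  # standard Normal: mean 0, standard deviation 1
    return std.cdf(k) - std.cdf(-k)

for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))   # roughly 0.68, 0.95, 0.997

# The SAT example: the range (404, 632) is one standard deviation either side of 518.
sat = NormalDist(mu=518, sigma=114)
sat_fraction = sat.cdf(632) - sat.cdf(404)
print(round(sat_fraction, 4))
```

The answer does not depend on μ or σ, only on how many standard deviations wide the interval is, which is why one rule covers every Normal distribution.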

The rule of thumb (“68–95–99.7 rule”) (continued)

Example: Math SAT scores are roughly N(518, 114). What fraction of Math SAT scores are larger than 746?

z = (x − μ)/σ = (746 − 518)/114 = 2,

so we want to know what fraction of scores are more than two standard deviations above the mean.

The rule of thumb says that about 95% are within two standard deviations of the mean. So about 100% − 95% = 5% are more than two standard deviations away from the mean. By symmetry, half of these are more than two standard deviations above the mean (and the other half are more than two standard deviations below the mean). So the answer is about 2.5%.

The rule of thumb (“68–95–99.7 rule”) (continued)

[Figure: Normal curve for Math SAT scores; horizontal axis marked at 400, 600, and 800; the two tails are shaded.]

By the rule of thumb, the unshaded region has area about 0.95. So the two shaded regions together have area about 0.05. We are interested in the one on the right, which has half the area, or area about 0.025.

Using Table A

Table A (pages 684 and 685) gives the area under the standard Normal curve to the left of (below) the specified value of z. Example:

[Figure: standard Normal curve; horizontal axis from −4 to 4; the region to the left of −1.32 is shaded.]

The shaded region is at −1.32 and below. Look at the row in Table A labelled “−1.3” and the column labelled “0.02”, and read off the number 0.0934 for the shaded area.
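A table lookup of this kind can be imitated in Python. The sketch below is an illustration, not the textbook's table: the function name `table_a` is invented here, and it assumes the printed table lists z to two decimal places and areas to four. It uses `statistics.NormalDist` (Python 3.8 and later).

```python
from statistics import NormalDist

def table_a(z):
    """Area under the standard Normal curve to the left of z.

    z is rounded to two decimals and the area to four, as in a printed table.
    """
    return round(NormalDist().cdf(round(z, 2)), 4)

print(table_a(-1.32))   # area to the left of -1.32
```

The row label supplies the first decimal of z and the column label supplies the second, so row “−1.3” with column “0.02” corresponds to `table_a(-1.32)`.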

Using Table A (continued)

Note: One can do problems like this directly with most calculators. However, the standardization idea is important anyway. Table A will be provided on exams (without the pictures).

Example: Math SAT scores are roughly N(518, 114). What fraction of Math SAT scores are larger than 746?

z = (x − μ)/σ = (746 − 518)/114 = 2,

as before. Look at the row in Table A labelled “2.0” and the column labelled “0.00”, and read off the number 0.9772. This tells you that about 0.9772 of the data has z-scores less than 2. Therefore about 1 − 0.9772 = 0.0228 of the data, or about 2.28%, has z-scores above 2. Thus, about 2.28% of math SAT scores are above 746.

The rule of thumb gave about 2.5%.
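The two answers can be put side by side in Python. This is a sketch under the same N(518, 114) model, using `statistics.NormalDist` (Python 3.8 and later); the variable names are chosen here for illustration.

```python
from statistics import NormalDist

sat = NormalDist(mu=518, sigma=114)

z = (746 - sat.mean) / sat.stdev        # standardize: z = 2
below = NormalDist().cdf(z)             # area to the left of z, as in Table A
above = 1 - below                       # the table gives "below", so subtract from 1

rule_of_thumb = (1 - 0.95) / 2          # half of the 5% outside two standard deviations

print(round(below, 4), round(above, 4), round(rule_of_thumb, 4))
```

The gap between the two answers comes from the rule of thumb rounding the central area to 95%.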

Using Table A (continued)

Example: Math SAT scores are roughly N(518, 114). What fraction of Math SAT scores are less than 600?

z = (x − μ)/σ = (600 − 518)/114 ≈ 0.72.

Look at the row in Table A labelled “0.7” and the column labelled “0.02”, and read off the number 0.7642. This tells you that about 0.7642 of the data has z-scores less than 0.72. Thus, about 76.42% of math SAT scores are below 600.
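As noted earlier, a calculator or computer can skip the table. The Python sketch below compares the direct calculation with the table-style route, in which z is rounded to two decimals first. It assumes the same N(518, 114) model and uses `statistics.NormalDist` (Python 3.8 and later); the variable names are invented here.

```python
from statistics import NormalDist

sat = NormalDist(mu=518, sigma=114)

# Direct route: ask for the area to the left of 600 in N(518, 114), no rounding of z.
direct = sat.cdf(600)

# Table route: standardize, round z to two decimals, then look up the standard Normal area.
z = round((600 - 518) / 114, 2)
via_table = round(NormalDist().cdf(z), 4)

print(z, via_table, round(direct, 4))
```

The two routes differ only slightly, and the difference is due to rounding z before the lookup.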