Central Limit Theorem: Computing Margins of Error and Confidence Intervals | Study notes Probability and Statistics

MATH 243, LECTURE 14

1. Using the Central Limit Theorem

We will repeat our main goal many times in many ways. Here’s an easy question to remember: how

does one compute the “margin of error” for a poll? How does Gallup know that 65% plus or minus 4% of

Americans like chocolate chip cookies? Do they really know that, anyways?

Remember from last time that we are trying to understand a parameter (like the true average of purchase

prices for cars in the U.S.) from a statistic (the average of, say, 1000 of those purchases picked at random).

The main conceptual turn: we think of a statistic as a random variable. After all, choosing 1000 car

purchases at random and taking (for example) the mean price is akin to rolling dice - unpredictable, but

over many (thousands of) random samples we expect to see the true mean purchase price on average.

The main - the only, really - theoretical basis we’ll see for statistical inference is the Central Limit

Theorem.

Theorem 1. The sampling distribution of means of random samples of size n from a population with

mean µ and standard deviation σ is approximately

N(µ, σ/√n)

when n is large.
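
The theorem can be checked numerically. The sketch below is an illustration only: the exponential population (mean 1, standard deviation 1), the sample size of 50, and the 5000 repetitions are all assumptions chosen for the demo, not part of the lecture. It draws many random samples and compares the spread of their means with σ/√n.

```python
import math
import random
import statistics

def simulate_sample_means(n, trials, seed=0):
    """Return the means of `trials` random samples of size n.

    The population is exponential with mean 1 and standard deviation 1
    (an assumed, deliberately non-normal population for this demo).
    """
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

n = 50
means = simulate_sample_means(n, trials=5000)

# The theorem predicts a center near mu = 1 and a spread near sigma / sqrt(n).
print("mean of sample means:", round(statistics.fmean(means), 3))
print("sd of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(1 / math.sqrt(n), 3))
```

Even though the population here is strongly skewed, the first two printed values should land close to µ and σ/√n respectively.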

Example 2. Suppose that the average price of a new car purchase is $24145 with a standard deviation

of $3615. Suppose you take a survey of 1000 car purchases. What is the probability that the average over

your survey is over $25000?
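
One way to work Example 2 is sketched below, assuming the Central Limit Theorem approximation applies at n = 1000. The helper name `normal_upper_tail` is made up for this illustration; it computes P(Z > z) from the complementary error function in Python's standard library.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma, n = 24145, 3615, 1000

standard_error = sigma / math.sqrt(n)   # standard deviation of the sample mean
z = (25000 - mu) / standard_error       # how many standard errors 25000 lies above mu
p = normal_upper_tail(z)

print(f"standard error = {standard_error:.1f}")
print(f"z = {z:.2f}")
print(f"P(sample mean > 25000) = {p:.1e}")
```

The standard error is only about $114, so $25000 sits roughly 7.5 standard errors above the mean and the probability is on the order of 10^-14: essentially zero.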

Example 3. If SAT scores are distributed normally according to N(1630,100), what is the chance of six

randomly sampled students having an average score above 1800? (If you keep on “sampling” students and

finding a larger average than 1800, what should you deduce?)
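
Example 3 can be worked the same way; the sketch below is one possible calculation, with `normal_upper_tail` again a made-up helper name. Because the population itself is normal, the sample mean is normally distributed even for a sample as small as six.

```python
import math

def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma, n = 1630, 100, 6

standard_error = sigma / math.sqrt(n)   # standard deviation of the mean of six scores
z = (1800 - mu) / standard_error
p = normal_upper_tail(z)

print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
print(f"P(sample mean > 1800) = {p:.1e}")
```

The average of six scores would have to land more than 4 standard errors above 1630, which has probability on the order of 10^-5. Seeing such averages repeatedly suggests the students are not really a random sample from N(1630, 100).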

1.1. Example: Process Control. The Central Limit Theorem has many applications, since sampling

can be useful well beyond the realms of surveys and opinion polls.

Imagine a manufacturing process for, say, ball bearings. The bearings are supposed to be 10 mm in

diameter. When the manufacturing process is working correctly, they are distributed normally N(10, .7).

We can’t check every bearing. Every hour we take a sample of 10 bearings, and take the mean diameter.

The sampling distribution of the means, x̄, should be N(10, .7/√10) = N(10, .221).

This means (by the 68-95-99.7 rule) that 99.7% of the means will fall in the range

10 − 3(.221) < x̄ < 10 + 3(.221)

9.337 < x̄ < 10.663
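
The limit arithmetic can be reproduced with a short sketch. The function name `control_limits` and its parameter `k` (the number of standard deviations) are assumptions introduced for this illustration.

```python
import math

def control_limits(target, sigma, n, k=3):
    """Return (lower, upper) limits: target -/+ k standard deviations of the sample mean."""
    standard_error = sigma / math.sqrt(n)
    return target - k * standard_error, target + k * standard_error

lower, upper = control_limits(target=10, sigma=0.7, n=10)
print(f"{lower:.3f} < sample mean < {upper:.3f}")
```

Without intermediate rounding the limits come out as roughly 9.336 and 10.664. They differ from the figures above in the third decimal place only because the notes round .7/√10 to .221 before multiplying by 3.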

We are alarmed if we see any x̄ outside of these limits, and suspect that our manufacturing process has

been disturbed.

We keep track with an x̄ control chart. This is a graph, with a mark each hour for the value of x̄ for that

hour.

The graph includes an upper control line 3 standard deviations of x̄ above the mean, and a lower control

line 3 standard deviations of x̄ below the mean.
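
The bookkeeping behind such a chart might look like the sketch below. The hourly means are made-up numbers for illustration, and `out_of_control_hours` is a hypothetical helper name; the defaults use the bearing process described above (target 10, σ = .7, samples of 10).

```python
import math

def out_of_control_hours(hourly_means, target=10.0, sigma=0.7, n=10, k=3):
    """Return the 1-based hours whose sample mean falls outside the control lines."""
    standard_error = sigma / math.sqrt(n)
    lower = target - k * standard_error
    upper = target + k * standard_error
    return [
        hour
        for hour, mean in enumerate(hourly_means, start=1)
        if not (lower < mean < upper)
    ]

# Made-up hourly sample means (mm); hour 5 is deliberately too large.
hourly_means = [10.05, 9.88, 10.21, 9.95, 10.71, 10.02]
print(out_of_control_hours(hourly_means))  # [5]
```

Only the flagged hours would prompt an inspection of the process; marks inside the control lines are treated as ordinary sampling variation.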
