Estimating Means: Determining Sample Size and Confidence Intervals, Exams of Data Analysis & Statistical Methods

These notes introduce estimating means and the two questions behind choosing a sample: how close to the population mean we need to be, and how sure we need to be. They also introduce confidence intervals and the use of critical values to construct them, with an example of calculating a 95% confidence interval for the mean body temperature in Utica, NY.

Typology: Exams

Pre 2010

Uploaded on 08/09/2009

koofers-user-0bm
STA100 Lecture16
Text: Section 8.1
Estimating Means
It is human nature to try to put things into context. Whenever I give an exam, the first
question is always "What was the average?" (This is even from people who should
know better, including me!)
People like to know the average value for any number of situations, such as blood
pressure, cholesterol, temperature, bowling scores, etc. For instance, suppose you
wanted to know what the average body temperature in Utica, NY is. What would you
do?
If you said, "Go to the internet and look it up" then, good luck. If you said, "Form a
simple random sample of Uticans and then use the sample average as a proxy for the
population" then, congratulations! You've been paying attention.
Here, however, is where a little complication comes in. Most of us believe there is such
a thing as "mean body temperature of a Utican" even though we realize the concept is
a little fuzzy. For instance, define Utican. Also, shouldn't we break this up into age or
gender categories to have something meaningful? One could go on.
More technically, if we sample with a sample size of $n = 10$ this is certainly better than a
sample size of $n = 5$. A sample size of $n = 20$ is better than one of $n = 10$, and so on.
How large a sample should we take? This question is easy to overlook until you do
some practical work on your own. Once you do, you realize that you need to confront
the following issues:
1. How close to the unknown population mean $\mu_{temp}$ do you need to be in order for
your efforts to be worthwhile? For instance, I'm pretty sure before I start that the
average will be somewhere near 98.6°F. Do I need to be within 1 degree? 0.1
degree? 0.01 degree?

2. How sure do I need to be that my sample mean $\bar{x}$ will be within, say, 0. degrees? 90% sure? 95% sure? 99% sure?

The two issues we need to face, then, are (1) How close and (2) How sure.
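To see how "how close" and "how sure" trade off against sample size, here is a small sketch. The notes do not state a sample-size formula at this point, so the sketch assumes the standard result for a normal population with known standard deviation: the margin $z \frac{\sigma}{\sqrt{n}}$ is at most $E$ once $n \geq (z\sigma/E)^2$. The function name `required_sample_size` is made up for illustration, and the value 0.74°F is the standard deviation these notes assume a little later.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin, confidence):
    """Smallest n with z * sigma / sqrt(n) <= margin (sigma assumed known)."""
    # Two-sided critical value: leave (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


sigma_temp = 0.74  # degrees F, the value assumed later in these notes

# "How close" (margin) against "how sure" (confidence).
for margin in (1.0, 0.1, 0.01):
    for confidence in (0.90, 0.95, 0.99):
        n = required_sample_size(sigma_temp, margin, confidence)
        print(f"within {margin} F, {confidence:.0%} sure: n >= {n}")
```

Under these assumptions, asking to be ten times closer costs roughly a hundred times as many people, which is why both questions need an answer before you start sampling.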

There is another issue which is "hiding in plain sight" from us. How will we calculate our average? Common sense says to take all our sample temperatures, add them up, and divide by the sample size. But that's not the only possibility. Why not just go halfway between the largest and the smallest? Why not use the median? If you take more courses in statistics you will think about how we choose our estimators. (I'll just mention in passing that I'm dealing with this on a project right now: I wasn't able to use Maximum Likelihood Estimation and instead had to develop Method of Moments estimators.) The point is that life isn't always gift wrapped.
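One way to get a feel for the choice of estimator is to simulate it. The sketch below is not from the course: it assumes a normal population with a made-up true mean of 98.6 and standard deviation 0.74, draws many samples of size 20, and compares how much the mean, the median, and the "halfway between largest and smallest" value (called `midrange` here) bounce around from sample to sample.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA, N, TRIALS = 98.6, 0.74, 20, 2000  # illustrative values only


def midrange(values):
    """Halfway between the smallest and the largest observation."""
    return (min(values) + max(values)) / 2


estimators = {
    "mean": statistics.mean,
    "median": statistics.median,
    "midrange": midrange,
}

# For each estimator, collect its value on many independent samples.
estimates = {name: [] for name in estimators}
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    for name, estimator in estimators.items():
        estimates[name].append(estimator(sample))

# A smaller spread means the estimator lands near MU more reliably.
spread = {name: statistics.stdev(vals) for name, vals in estimates.items()}
for name, value in spread.items():
    print(f"{name:9s} spread of estimates: {value:.4f}")
```

For normally distributed data the sample mean should show the smallest spread of the three. For other populations the ranking can change, which is one reason the choice of estimator is a real question.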

STA100, however, is always gift wrapped and we'll just use the sample mean as a stand-in for the population mean.

Getting back to temperatures, let's suppose that human temperatures are normally distributed with a standard deviation of $\sigma_{temp} = 0.74$°F. Note that a terrific site for temperature data is:

http://www.amstat.org/publications/jse/v4n2/datasets.shoemaker.html

Take some time to read the paper there. It will set us up for the rest of the course.

Suppose you sample 20 people and find the following temperatures: 98.4 97.2 98.7 99.4 97.7 98.8 99.1 97. 97.1 98.6 97.9 98.7 98.8 98.7 99.2 98. 99.1 97.3 98.2 98.

I'm not sure that is a great help. This interval is much too wide. Let's think about it like this: How wide an interval would you need to construct in order to be 95% sure you have captured $\mu_{temp}$?

Since we know about the sampling distribution of the sample mean, assume the population of human temperatures is normally distributed. Then the sample means will also follow a normal distribution and we can construct our $z$ statistic:

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

Looking up from our $z$ table I see that the $z$ value -1.96 has an area to the left of 0.0250 and the $z$ value 1.96 has an area to the left of 0.9750 and consequently an area to the right of 0.0250.

Make sure you color in the "tail areas" and label them with areas of 0.0250. The reason I chose these values is that between the black lines we must have an area of 0.95. Make sure you see this before moving on; it is crucial for what follows. The rest is just algebra.
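If you do not have a $z$ table handy, the same areas can be checked in software. This is only a sketch using `NormalDist` from Python's standard `statistics` module. The variable names are mine, not the text's.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

left_of_minus = z.cdf(-1.96)          # area to the left of z = -1.96
left_of_plus = z.cdf(1.96)            # area to the left of z = 1.96
right_of_plus = 1 - left_of_plus      # upper tail area
between = left_of_plus - left_of_minus  # area between the two lines

print(f"area left of -1.96:  {left_of_minus:.4f}")
print(f"area left of  1.96:  {left_of_plus:.4f}")
print(f"area right of 1.96:  {right_of_plus:.4f}")
print(f"area between them:   {between:.4f}")

# Going the other way: which z value leaves 0.0250 in the upper tail?
z_crit = z.inv_cdf(0.975)
print(f"critical value:      {z_crit:.4f}")
```

The inverse lookup at the end is the step you repeat for other confidence levels: put half of what is left over in each tail and ask for the matching $z$.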

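[Figure: normal curve on a $z$ axis marked from -4 to 4]

Since we now know that

$P(-1.96 < z < 1.96) = 0.95$

we can substitute in for $z$ and get

$P\left(-1.96 < \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} < 1.96\right) = 0.95$

If you are comfortable working with inequalities you can push terms around to get

$P\left(-1.96 \frac{\sigma}{\sqrt{n}} < \bar{x} - \mu < 1.96 \frac{\sigma}{\sqrt{n}}\right) = 0.95$

And then

$P\left(\bar{x} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96 \frac{\sigma}{\sqrt{n}}\right) = 0.95$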

This is what we were looking for. It says that when we come to a population and sample it, it is 95% likely that our sample mean will be such that if we add and subtract $1.96 \frac{\sigma}{\sqrt{n}}$ we will capture the population mean $\mu_{temp}$.

Come back to our example. We had $\bar{x} = 98.335$°F on a sample of size $n = 20$ and I told you we could use $\sigma_{temp} = 0.74$°F. So if you want to be 95% confident of capturing $\mu_{temp}$ on an interval, you should take your interval to be

$\bar{x} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu_{temp} < \bar{x} + 1.96 \frac{\sigma}{\sqrt{n}}$

or

$98.335 - 1.96 \frac{0.74}{\sqrt{20}} < \mu_{temp} < 98.335 + 1.96 \frac{0.74}{\sqrt{20}}$

or

$98.0107 < \mu_{temp} < 98.6593$
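The arithmetic for the temperature example is easy to check by machine. The sketch below just plugs the numbers quoted in the example (sample mean 98.335, $n = 20$, standard deviation 0.74, critical value 1.96) into the interval formula. The variable names are mine.

```python
import math

x_bar = 98.335   # sample mean, degrees F
sigma = 0.74     # assumed population standard deviation, degrees F
n = 20           # sample size
z_crit = 1.96    # critical value for 95% confidence

# Margin of error: how far we reach on either side of the sample mean.
margin = z_crit * sigma / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"margin of error: {margin:.4f}")
print(f"95% interval: {lower:.4f} < mu_temp < {upper:.4f}")
```

Changing `n` or `z_crit` and rerunning is a quick way to see how the width of the interval responds to sample size and to the confidence level.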

We interpret all this in English by saying that our procedure for estimating $\mu$, which builds the interval $\bar{x} \pm 1.96 \frac{\sigma}{\sqrt{n}}$, will capture $\mu$ in that interval 95% of the time.
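The phrase "95% of the time" describes the procedure, not any single interval, and a simulation can make that concrete. The sketch below is my own illustration, not part of the course: it invents a "true" mean of 98.6 so that we can check each interval against it, draws many samples of size 20 from a normal population with standard deviation 0.74, and counts how often the interval captures the true mean.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA, N = 98.6, 0.74, 20   # MU is a made-up "true" mean for the simulation
Z_CRIT, TRIALS = 1.96, 5000

margin = Z_CRIT * SIGMA / math.sqrt(N)
captured = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    # Did this particular interval capture the true mean?
    if x_bar - margin < MU < x_bar + margin:
        captured += 1

coverage = captured / TRIALS
print(f"fraction of intervals that captured MU: {coverage:.3f}")
```

The fraction should land close to 0.95. Any one interval either contains the true mean or it does not. The 95% is the long-run success rate of the recipe.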

Note that the Central Limit Theorem tells us that when $n \geq 30$ the sampling distribution of the sample means is approximately normally distributed for most commonly encountered populations. So, we have the following:

For a general distribution (not necessarily normal), if $n \geq 30$ we may form a confidence interval as

$\bar{x} - z_c \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_c \frac{\sigma}{\sqrt{n}}$

where $z_c$ is the critical value for the chosen confidence level (1.96 for 95%).

Note that we will often use this formula for large sample sizes even when the population standard deviation $\sigma$ is not known. In this case we just use the sample standard deviation $s$ in place of $\sigma$ and know we are degrading our result a little.

First Presentation Example of the week: Due to variation in laboratory techniques, impurities in materials, and other unknown factors, the results of an experiment in a chemistry laboratory will not always yield the same numerical answer. In an electrolysis experiment, a class measured the amount of copper precipitated from a saturated solution of copper sulfate over a 30-minute period. The $n = 42$ students acquired a sample mean and standard deviation equal to 0.15 and 0.01 mole, respectively. Find a 90% confidence interval for the mean amount of copper precipitated from the solution over the period of time.
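As a way to check your own work on this example, here is one possible setup. It is only a sketch and it makes two assumptions: since $n = 42$ is at least 30, it uses the large-sample $z$ interval, and it uses the sample standard deviation $s$ in place of $\sigma$. The critical value is looked up with `NormalDist` from Python's standard library rather than a table.

```python
import math
from statistics import NormalDist

n = 42            # number of students
x_bar = 0.15      # sample mean, mole
s = 0.01          # sample standard deviation, mole (stands in for sigma)
confidence = 0.90

# Critical value: leave (1 - confidence) / 2 in each tail.
z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_c * s / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"critical value z_c: {z_c:.3f}")
print(f"90% interval: {lower:.5f} < mu < {upper:.5f}")
```

Work the problem by hand first and compare. The only change from the temperature example is the critical value and the use of $s$.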