



This section covers estimating means and the importance of deciding how close to the population mean we need to be and how sure we need to be when we sample. It also introduces confidence intervals and the use of critical values to construct them, with an example of calculating a 95% confidence interval for the mean body temperature in Utica, NY.
Typology: Exams




Section 8. Estimating Means
It is human nature to try to put things into context. Whenever I give an exam, the first question is always "What was the average?" (This is even from people who should know better, including me!)
People like to know the average value for any number of situations, such as blood pressure, cholesterol, temperature, bowling scores, etc. For instance, suppose you wanted to know what the average body temperature in Utica, NY is. What would you do?
If you said, "Go to the internet and look it up" then, good luck. If you said, "Form a simple random sample of Uticans and then use the sample average as a proxy for the population" then, congratulations! You've been paying attention.
Here, however, is where a little complication comes in. Most of us believe there is such a thing as "mean body temperature of a Utican" even though we realize the concept is a little fuzzy. For instance, define Utican. Also, shouldn't we break this up into age or gender categories to have something meaningful? One could go on.
More technically, if we sample with a sample size of $n = 10$ this is certainly better than a sample size of $n = 5$. A sample size of $n = 20$ is better than one of $n = 10$, and so on. How large a sample should we take? This question is easy to overlook until you do some practical work on your own. Once you do, you realize that you need to confront the following issues:
The two issues we need to face, then, are (1) How close and (2) How sure.
There is another issue which is "hiding in plain sight" from us. How will we calculate our average? Common sense says to take all our sample temperatures, add them up, and divide by the sample size. But that's not the only possibility. Why not just go halfway between the largest and the smallest? Why not use the median? If you take more courses in statistics you will think about how we choose our estimators. (I'll just mention in passing that I'm dealing with this on a project right now: I wasn't able to use Maximum Likelihood Estimation and instead had to develop Method of Moments estimators.) The point is that life isn't always gift wrapped.
STA100, however, is always gift wrapped and we'll just use the sample mean as a stand-in for the population mean.
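If you are curious how the three candidates mentioned above (sample mean, median, and halfway between largest and smallest) compare, here is a small Python sketch. It is only an illustration: the population mean and standard deviation are hypothetical values I picked for the demo, and the helper name `midrange` is mine. It draws many samples of size 20 from a normal population and measures how much each estimator bounces around from sample to sample.

```python
import random
from statistics import mean, median, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population values, chosen for illustration only.
MU, SIGMA = 98.2, 0.74
N, TRIALS = 20, 5_000

def midrange(values):
    """Halfway between the smallest and largest observation."""
    return (min(values) + max(values)) / 2

# Collect each estimator's value over many simulated samples.
estimates = {"mean": [], "median": [], "midrange": []}
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    estimates["mean"].append(mean(sample))
    estimates["median"].append(median(sample))
    estimates["midrange"].append(midrange(sample))

# A smaller spread means the estimator tends to land closer to MU.
spread = {name: stdev(vals) for name, vals in estimates.items()}
for name, sd in spread.items():
    print(f"{name:>8}: spread of estimates = {sd:.3f}")
```

For a normal population like this one, the sample mean should show the smallest spread of the three, which is one reason it is the usual default. For other populations the ranking can change.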
Getting back to temperatures, let's suppose that human temperatures are normally distributed with a standard deviation of $\sigma_{\text{temp}} = 0.74^\circ\text{F}$. Note that a terrific site for temperature data is:
http://www.amstat.org/publications/jse/v4n2/datasets.shoemaker.html
Take some time to read the paper there. It will set us up for the rest of the course.
Suppose you sample 20 people and find the following temperatures: 98.4 97.2 98.7 99.4 97.7 98.8 99.1 97. 97.1 98.6 97.9 98.7 98.8 98.7 99.2 98. 99.1 97.3 98.2 98.
I'm not sure that is a great help. This interval is much too wide. Let's think about it like this: How wide an interval would you need to construct in order to be 95% sure you have captured $\mu_{\text{temp}}$?
Since we know about the sampling distribution of the sample mean, assume the population of human temperatures is normally distributed. Then the sample means will also follow a normal distribution and we can construct our $z$ statistic, $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
Looking it up in our $z$ table, I see that the $z$ value $-1.96$ has an area to the left of 0.0250, and the $z$ value $1.96$ has an area to the left of 0.9750 and consequently an area to the right of 0.0250.
Make sure you color in the "tail areas" and label them with areas of 0.0250. The reason I chose these values is that between the black lines we must have an area of 0.95. Make sure you see this before moving on; it is crucial for what follows. The rest is just algebra.
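If you would rather check the table values than take my word for it, a few lines of Python will do it. This is just a sketch using `NormalDist` from Python's standard `statistics` module; the variable names are my own.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
std_normal = NormalDist()

left_tail = std_normal.cdf(-1.96)      # area to the left of z = -1.96
left_of_upper = std_normal.cdf(1.96)   # area to the left of z = 1.96
right_tail = 1 - left_of_upper         # area to the right of z = 1.96
middle = left_of_upper - left_tail     # area between -1.96 and 1.96

print(f"area left of -1.96:  {left_tail:.4f}")
print(f"area left of  1.96:  {left_of_upper:.4f}")
print(f"area right of 1.96:  {right_tail:.4f}")
print(f"area in the middle:  {middle:.4f}")
```

Rounded to four places, the two tails each come out to 0.0250 and the middle area to 0.9500, matching the table lookup.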
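[Figure: standard normal curve, horizontal axis from -4 to 4]
Since we now know that
$$P(-1.96 < z < 1.96) = 0.95$$
we can substitute in for $z$ and get
$$P\left(-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$
If you are comfortable working with inequalities you can push terms around to get
$$P\left(-1.96\,\frac{\sigma}{\sqrt{n}} < \bar{x} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$
And then
$$P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$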
This is what we were looking for. It says that when we come to a population and sample it, it is 95% likely that our sample mean will be such that if we add and subtract $1.96\,\frac{\sigma}{\sqrt{n}}$ we will capture the population mean $\mu_{\text{temp}}$.
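Come back to our example. We had $\bar{x} = 98.335^\circ\text{F}$ on a sample of size $n = 20$ and I told you we could use $\sigma_{\text{temp}} = 0.74^\circ\text{F}$. So if you want to be 95% confident of capturing $\mu_{\text{temp}}$ on an interval, you should take your interval to be
$$\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu_{\text{temp}} < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}$$
or
$$98.335 - 1.96\,\frac{0.74}{\sqrt{20}} < \mu_{\text{temp}} < 98.335 + 1.96\,\frac{0.74}{\sqrt{20}}$$
or
$$98.0107 < \mu_{\text{temp}} < 98.6593$$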
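The arithmetic for this interval is easy to check by machine. Here is a minimal Python sketch that uses only the summary numbers quoted in this section (sample mean 98.335, sample size 20, standard deviation 0.74, critical value 1.96); the variable names are my own.

```python
from math import sqrt

x_bar = 98.335   # sample mean, degrees F
sigma = 0.74     # population standard deviation we agreed to use, degrees F
n = 20           # sample size
z_crit = 1.96    # critical value for 95% confidence

# Half-width of the interval: 1.96 * sigma / sqrt(n)
margin = z_crit * sigma / sqrt(n)
lower = x_bar - margin
upper = x_bar + margin

print(f"margin = {margin:.4f}")
print(f"{lower:.4f} < mu_temp < {upper:.4f}")
# prints: 98.0107 < mu_temp < 98.6593
```

Notice that the half-width is only about a third of a degree, which is far more useful than a very wide interval.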
We interpret all this in English by saying that our procedure for estimating $\mu$ will capture $\mu$ in the interval we build as $\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$ 95% of the time.
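The phrase "95% of the time" is about the procedure, not about any single interval, and a simulation makes that concrete. The sketch below is an illustration under stated assumptions: it pretends we know the true mean (`TRUE_MU` is a hypothetical value I made up for the demo), samples 20 temperatures from a normal population over and over, builds the interval each time, and counts how often the interval captures the true mean.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MU = 98.2   # hypothetical population mean, chosen only for this demo
SIGMA = 0.74     # population standard deviation used in this section
N = 20           # sample size
Z_CRIT = 1.96    # critical value for 95% confidence
TRIALS = 10_000  # number of repeated samples

half_width = Z_CRIT * SIGMA / sqrt(N)
captured = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    # Did this particular interval capture the true mean?
    if x_bar - half_width < TRUE_MU < x_bar + half_width:
        captured += 1

coverage = captured / TRIALS
print(f"captured mu in {coverage:.1%} of {TRIALS} intervals")
```

The proportion should land close to 95%. Each individual interval either contains the true mean or it does not; the 95% describes how often the method succeeds in the long run.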
Note that the Central Limit Theorem tells us that when $n \geq 30$ the sampling distribution of the sample means is approximately normally distributed for most commonly encountered populations. So, we have the following:
For a general distribution (not necessarily normal), if $n \geq 30$ we may form a confidence interval as
$$\bar{x} - z_c\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_c\,\frac{\sigma}{\sqrt{n}}$$
where $z_c$ is the critical value for the confidence level we want (for example, $z_c = 1.96$ for 95%).
Note that we will often use this formula for large sample sizes even when the population standard deviation $\sigma$ is not known. In this case we just use the sample standard deviation $s$ in place of $\sigma$ and know we are degrading our result a little.
First Presentation Example of the week: Due to variation in laboratory techniques, impurities in materials, and other unknown factors, the results of an experiment in a chemistry laboratory will not always yield the same numerical answer. In an electrolysis experiment, a class measured the amount of copper precipitated from a saturated solution of copper sulfate over a 30-minute period. The $n = 42$ students obtained a sample mean and standard deviation equal to 0.15 and 0.01 mole, respectively. Find a 90% confidence interval for the mean amount of copper precipitated from the solution over the period of time.
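One way to set up a problem like this on a computer is sketched below. It is not the only way to do it: the function name `z_interval` is my own, the critical value comes from `NormalDist.inv_cdf` in Python's standard `statistics` module instead of a printed table, and the sample standard deviation stands in for the population one because the sample is large.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, s, n, confidence):
    """Large-sample z interval for a mean, with s standing in for sigma.

    Assumes n is large enough (roughly n >= 30) for the Central Limit
    Theorem to make the normal approximation reasonable.
    """
    # Critical value: leaves (1 - confidence) / 2 in each tail.
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_c * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Numbers from the copper example: n = 42, mean 0.15 mole, s = 0.01 mole.
lower, upper = z_interval(x_bar=0.15, s=0.01, n=42, confidence=0.90)
print(f"{lower:.4f} < mu < {upper:.4f}")
```

Before trusting the output, work the problem by hand with the critical value from your $z$ table and make sure the two answers agree to the precision of the table.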