Estimating Means: Determining Sample Size and Confidence Intervals, Exams of Data Analysis & Statistical Methods

State University of New York Polytechnic - Utica-Rome Data Analysis & Statistical Methods

These notes introduce the concept of estimating means and explain why we must decide how close to the population mean we need to be and how sure we need to be of our sample. They also introduce confidence intervals and the use of critical values to construct them, and work through an example: calculating a 95% confidence interval for the mean body temperature in Utica, NY.

Typology: Exams

Pre 2010

Uploaded on 08/09/2009

koofers-user-0bm 🇺🇸

STA100 Lecture 16

Text: Section 8.1

Estimating Means

It is human nature to try to put things into context. Whenever I give an exam, the first question is always “What was the average?” (This is even from people who should know better, including me!)

People like to know the average value for any number of situations, such as blood pressure, cholesterol, temperature, bowling scores, etc. For instance, suppose you wanted to know what the average body temperature in Utica, NY is. What would you do?

If you said, “Go to the internet and look it up” then, good luck. If you said, “Form a simple random sample of Uticans and then use the sample average as a proxy for the population” then, congratulations! You’ve been paying attention.

Here, however, is where a little complication comes in. Most of us believe there is such a thing as “mean body temperature of a Utican” even though we realize the concept is a little fuzzy. For instance, define Utican. Also, shouldn’t we break this up into age or gender categories to have something meaningful? One could go on.

More technically, if we sample with a sample size of $n = 10$ this is certainly better than a sample size of $n = 5$. A sample size of $n = 20$ is better than one of $n = 10$, and so on. How large a sample should we take? This question is easy to overlook until you do some practical work on your own. Once you do, you realize that you need to confront the following issues:

1. How close to the unknown population mean $\mu_{temp}$ do you need to be in order for your efforts to be worthwhile? For instance, I’m pretty sure before I start that the average will be somewhere near 98.6 °F. Do I need to be within 1 degree? 0.1 degree? 0.01 degree?

2. How sure do I need to be that my sample mean $\bar{x}$ will be within, say, 0. degrees? 90% sure? 95% sure? 99% sure?

The two issues we need to face, then, are (1) How close and (2) How sure.

There is another issue which is “hiding in plain sight” from us. How will we calculate our average? Common sense says to take all our sample temperatures, add them up, and divide by the sample size. But that’s not the only possibility. Why not just go halfway between the largest and the smallest? Why not use the median? If you take more courses in statistics you will think about how we choose our estimators. (I’ll just mention in passing that I’m dealing with this on a project right now: I wasn’t able to use Maximum Likelihood Estimation and instead had to develop Method of Moments estimators.) The point is that life isn’t always gift-wrapped.

STA100, however, is always gift-wrapped and we’ll just use the sample mean as a stand-in for the population mean.
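To get a feel for why the choice of estimator matters, here is a minimal simulation sketch comparing the three candidates mentioned above: the sample mean, the median, and the point halfway between the largest and smallest values. It assumes normally distributed temperatures; the center of 98.2 and the spread of 0.74 are illustrative values chosen for the simulation, and the helper names are hypothetical.

```python
import random
import statistics


def midrange(values):
    """Halfway between the smallest and the largest observation."""
    return (min(values) + max(values)) / 2


def estimator_spread(true_mean=98.2, sigma=0.74, n=20, trials=5000, seed=1):
    """Draw many samples and report how much each estimator varies.

    true_mean and sigma are assumed values for the simulation only.
    """
    rng = random.Random(seed)
    estimates = {"mean": [], "median": [], "midrange": []}
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        estimates["mean"].append(statistics.fmean(sample))
        estimates["median"].append(statistics.median(sample))
        estimates["midrange"].append(midrange(sample))
    # Standard deviation of each estimator across the simulated samples
    return {name: statistics.pstdev(vals) for name, vals in estimates.items()}


spread = estimator_spread()
for name, sd in spread.items():
    print(f"{name:9s} varies by about {sd:.3f} degrees from sample to sample")
```

Under these assumptions the sample mean should bounce around the least of the three, which is one reason it is a comfortable default for normally distributed data.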

Getting back to temperatures, let’s suppose that human temperatures are normally distributed with a standard deviation of $\sigma_{temp} = 0.74$ °F. Note that a terrific site for temperature data is:

http://www.amstat.org/publications/jse/v4n2/datasets.shoemaker.html

Take some time to read the paper there. It will set us up for the rest of the course.

Suppose you sample 20 people and find the following temperatures: 98.4 97.2 98.7 99.4 97.7 98.8 99.1 97. 97.1 98.6 97.9 98.7 98.8 98.7 99.2 98. 99.1 97.3 98.2 98.

I’m not sure that is a great help. This interval is much too wide. Let’s think about it like this: How wide an interval would you need to construct in order to be 95% sure you have captured 𝝁𝒕𝒆𝒎𝒑?

Since we know about the sampling distribution of the sample mean, assume the population of human temperatures is normally distributed. Then the sample means will also follow a normal distribution and we can construct our 𝑧 statistic.

Looking it up in our $z$ table, I see that the $z$ value $-1.96$ has an area to the left of 0.0250, and the $z$ value 1.96 has an area to the left of 0.9750 and consequently an area to the right of 0.0250.

Make sure you color in the “tail areas” and label them with areas of 0.0250. The reason I chose these values is that between the black lines we must have an area of 0.95. Make sure you see this before moving on: it is crucial for what follows. The rest is just algebra.
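If you do not have a $z$ table handy, the same areas can be checked in software. The sketch below uses `NormalDist` from Python's standard `statistics` module as a stand-in for the printed table; the variable names are just for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

left_tail = z.cdf(-1.96)               # area to the left of z = -1.96
right_tail = 1 - z.cdf(1.96)           # area to the right of z = 1.96
middle = z.cdf(1.96) - z.cdf(-1.96)    # area between the two cutoffs

print(f"left tail  = {left_tail:.4f}")
print(f"right tail = {right_tail:.4f}")
print(f"middle     = {middle:.4f}")
```

The two tails each come out to about 0.0250 and the middle to about 0.95, matching the table values.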

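[Figure: standard normal curve with horizontal axis from -4 to 4]

Since we now know that

$$P(-1.96 < z < 1.96) = 0.95$$

we can substitute in for $z$ and get

$$P\left(-1.96 < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$$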

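If you are comfortable working with inequalities you can push terms around to get

$$P\left(-1.96\,\frac{\sigma}{\sqrt{n}} < \bar{x} - \mu < 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

And then

$$P\left(\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$$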

This is what we were looking for. It says that when we come to a population and sample it, it is 95% likely that our sample mean will be such that if we add and subtract $1.96\,\sigma/\sqrt{n}$ we will capture the population mean $\mu_{temp}$.
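The half-width $1.96\,\sigma/\sqrt{n}$ also ties together the “how close” and “how sure” questions raised earlier. If we want that half-width to be no larger than some margin $E$, solving $z_c\,\sigma/\sqrt{n} \le E$ for $n$ gives $n \ge (z_c\,\sigma/E)^2$. This step is not spelled out in the notes; the sketch below is one way to turn it into a calculation, with a hypothetical function name and the margins of 1, 0.1, and 0.01 degrees borrowed from the earlier discussion.

```python
import math
from statistics import NormalDist


def sample_size(sigma, margin, confidence=0.95):
    """Smallest n for which z_c * sigma / sqrt(n) <= margin."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_c * sigma / margin) ** 2)


for margin in (1.0, 0.1, 0.01):
    n_needed = sample_size(sigma=0.74, margin=margin)
    print(f"within {margin} degrees at 95% confidence: n >= {n_needed}")
```

Notice how quickly the required sample size grows: each time the margin shrinks by a factor of 10, the sample size grows by a factor of about 100.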

Come back to our example. We had $\bar{x} = 98.335$ °F on a sample of size $n = 20$ and I told you we could use $\sigma_{temp} = 0.74$ °F. So if you want to be 95% confident of capturing $\mu_{temp}$ on an interval, you should take your interval to be

$$\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu_{temp} < \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}$$

or

$$98.335 - 1.96\,\frac{0.74}{\sqrt{20}} < \mu_{temp} < 98.335 + 1.96\,\frac{0.74}{\sqrt{20}}$$

or

$$98.0107 < \mu_{temp} < 98.6593$$
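The arithmetic for this interval is easy to check with a few lines of code. The sketch below simply plugs the numbers from the example into the formula; the variable names are illustrative.

```python
import math

x_bar = 98.335   # sample mean from the example, in degrees F
sigma = 0.74     # population standard deviation given in the notes
n = 20           # sample size
z = 1.96         # critical value for 95% confidence

# Half-width of the interval: z * sigma / sqrt(n)
margin = z * sigma / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(f"margin = {margin:.4f}")
print(f"{lower:.4f} < mu_temp < {upper:.4f}")
```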

We interpret all this in English by saying that our procedure for estimating $\mu$ will capture $\mu$ in the interval we build as $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ 95% of the time.
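One way to see what “95% of the time” means is to simulate the procedure many times and count how often the interval captures the true mean. The sketch below does this under the normal model with $\sigma = 0.74$ and $n = 20$; the true mean of 98.2 is a made-up value for the simulation, and the function name is hypothetical.

```python
import math
import random


def coverage(true_mean=98.2, sigma=0.74, n=20, z=1.96, trials=10000, seed=7):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain true_mean."""
    rng = random.Random(seed)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and compute its mean
        x_bar = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        if x_bar - margin < true_mean < x_bar + margin:
            hits += 1
    return hits / trials


rate = coverage()
print(f"captured the true mean in {rate:.1%} of the simulated samples")
```

The capture rate should land close to 95%. Note that the 95% describes the procedure over many samples, not any single interval.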

Note that the Central Limit Theorem tells us that when 𝑛 ≥ 30 the sampling distribution of the sample means is approximately normally distributed for most commonly encountered populations. So, we have the following:

For a general distribution (not necessarily normal), if $n \geq 30$ we may form a confidence interval as

$$\bar{x} - z_c\,\frac{\sigma}{\sqrt{n}} < \mu_{temp} < \bar{x} + z_c\,\frac{\sigma}{\sqrt{n}}$$

Note that we will often use this formula for large sample sizes even when the population standard deviation 𝜎 is not known. In this case we just use the sample standard deviation 𝑠 in place of 𝜎 and know we are degrading our result a little.

First Presentation Example of the week: Due to a variation in laboratory techniques, impurities in materials, and other unknown factors, the results of an experiment in a chemistry laboratory will not always yield the same numerical answer. In an electrolysis experiment, a class measured the amount of copper precipitated from a saturated solution of copper sulfate over a 30 minute period. The n = 42 students acquired a sample mean and standard deviation equal to 0.15 and 0.01 mole respectively. Find a 90% confidence interval for the mean amount of copper precipitated from the solution over the period of time.
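As a starting point for an example like this one, here is a hedged sketch of the large-sample formula with the sample standard deviation $s$ standing in for $\sigma$. The function name is hypothetical, and the critical value $z_c$ is computed in software rather than read from a table, so it may differ from a table value in the third decimal place.

```python
import math
from statistics import NormalDist


def large_sample_ci(x_bar, s, n, confidence):
    """Interval x_bar +/- z_c * s / sqrt(n), intended for n >= 30."""
    # Two-sided critical value, e.g. about 1.645 for 90% confidence
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_c * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Numbers from the copper example: n = 42, mean 0.15 mole, standard deviation 0.01 mole
lower, upper = large_sample_ci(x_bar=0.15, s=0.01, n=42, confidence=0.90)
print(f"{lower:.4f} < mu < {upper:.4f}")
```

Work the problem by hand first and use the sketch only to check your arithmetic.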
