








Material Type: Notes; Professor: Sorola; Class: Elementary Statistical Methods; Subject: STAT-Statistics; University: Purdue University - Main Campus; Term: Unknown 1989;
Typology: Study notes









Chapter 6 Confidence Intervals and Hypothesis Testing
Why do we even bother analyzing data? We want to draw conclusions from the data.
Why can’t we just accept our sample mean or sample proportion as the official mean or proportion for the population? Every time we compute the statistics $\bar{x}$ and $\hat{p}$ (the sample mean and sample proportion) from a new sample, we get a different answer due to sampling variability.
The two most common types of formal statistical inference are confidence intervals and hypothesis tests.
Confidence Intervals allow us to estimate a range of values for the population mean or population proportion.
The true mean or proportion for the population exists and is a fixed number, but we just don’t know what it is. Using our sample statistic, we can create a “net” to give us an estimate of where to expect the population parameter to be.
Confidence interval = net
Population parameter = invisible, stationary butterfly
We don’t know exactly where the butterfly is, but from our sample, we have a pretty good estimate of the location.
If you increase the sample size (n), you decrease the size of your “net” (your margin of error).
If you increase your confidence level (C), you increase the size of your “net” (your margin of error).
If we take a single sample, our single confidence interval “net” may or may not include the population parameter.
However, if we take many samples of the same size and create a confidence interval from each sample statistic, then over the long run 95% of our confidence intervals will contain the true population parameter (if we are using a 95% confidence level).
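The long-run claim above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is normal, σ is known, and the values mu = 50, sigma = 10, n = 25 are made up for illustration (they do not come from these notes). The helper names `z_interval` and `coverage` are hypothetical.

```python
import random
from statistics import NormalDist, mean


def z_interval(sample, sigma, confidence=0.95):
    """Return (low, high) for the interval x_bar ± z* · sigma / sqrt(n)."""
    n = len(sample)
    # z* is the standard normal critical value for the chosen confidence level
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sigma / n ** 0.5
    x_bar = mean(sample)
    return x_bar - margin, x_bar + margin


def coverage(mu, sigma, n, confidence=0.95, trials=10_000, seed=0):
    """Fraction of simulated intervals (the "nets") that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample of size n from the (assumed normal) population
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma, confidence)
        if low <= mu <= high:
            hits += 1
    return hits / trials


# Illustrative values only; the result should land close to 0.95
print(coverage(mu=50.0, sigma=10.0, n=25))
```

Any single interval either catches the "butterfly" or misses it. The 95% describes how often the procedure succeeds over many repeated samples.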
What if your margin of error is too large? Here are ways to reduce it: increase the sample size n, use a lower confidence level C, or reduce σ.
Sample Size, n, for Desired Margin of Error, m:
$$n = \left( \frac{z^* \sigma}{m} \right)^2$$
Note that it is the sample size, n, that influences the margin of error. The population size has nothing to do with it.
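As a rough sketch, the sample-size calculation n = (z*σ/m)^2 can be written in a few lines of Python. The function names are hypothetical, and the numbers in the usage line (σ = 10, m = 2, 95% confidence) are made up for illustration. Because n must be a whole number, the result is rounded up so that the margin of error does not exceed m.

```python
import math
from statistics import NormalDist


def z_star(confidence):
    """Standard normal critical value z* for a given confidence level C."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(sigma, n, confidence):
    """Margin of error m = z* · sigma / sqrt(n)."""
    return z_star(confidence) * sigma / math.sqrt(n)


def required_sample_size(sigma, m, confidence):
    """Smallest whole n whose margin of error is at most m."""
    return math.ceil((z_star(confidence) * sigma / m) ** 2)


# Illustrative values only: sigma = 10, desired margin m = 2, 95% confidence
n = required_sample_size(sigma=10.0, m=2.0, confidence=0.95)
print(n)                                   # 97
print(margin_of_error(10.0, n, 0.95))      # just under 2
```

Notice that no function takes the population size as an argument: only n, σ, and the confidence level enter the calculation.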
Be careful!!!! You can only use the formula $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$ under certain circumstances:
Examples:
a) Give a 90% confidence interval for the mean number of beers drunk by fraternity members in the past month.
b) Is it true that 90% of the fraternity members each month drink the number of beers that lie in the interval you found in part (a)? Explain your answer.
c) What is the margin of error for the 90% confidence interval?
d) How many students should you sample if you want a margin of error of 1 for a 90% confidence interval?
Hypothesis Testing
The 4 steps common to all tests of significance:
STEP 1: State the null hypothesis $H_0$ and the alternative hypothesis $H_a$.
To do a significance test, you need 2 hypotheses:
Hypotheses always refer to some population or model, not to a particular outcome.
Hypotheses can be one-sided or two-sided.
Even though $H_a$ is what we hope or believe to be true, our test gives evidence for or against $H_0$ only.
We never prove $H_0$ true. We can only state whether we have enough evidence to reject $H_0$ (which is evidence in favor of $H_a$, but not proof that $H_a$ is true) or that we don’t have enough evidence to reject $H_0$.
Example (Exercise 6.37, p. 418): Each of the following situations requires a significance test about a population mean. State the appropriate null hypothesis $H_0$ and alternative hypothesis $H_a$ in each case:
a. Census Bureau data shows that the mean household income in the area served by a shopping mall is $72,500 per year. A market research firm questions shoppers at the mall to find out whether the mean household income of mall shoppers is higher than that of the general population.
b. Last year, your company’s service technicians took an average of 1.8 hours to respond to trouble calls from business customers who had purchased service contracts. Do this year’s data show a different average response time?
Z-Test for a Population Mean
$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
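The P-value for a test of $H_0: \mu = \mu_0$, where $Z_0$ is the observed value of the test statistic and $Z$ is a standard normal variable, against:
$H_a: \mu > \mu_0$ is $P(Z \ge Z_0)$
$H_a: \mu < \mu_0$ is $P(Z \le Z_0)$
$H_a: \mu \ne \mu_0$ is $2 P(Z \ge |Z_0|)$
These P-values are exact if the population is normally distributed, and are approximately correct for large n in other cases.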
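The three P-value rules can be sketched in Python as follows. This is an illustration, not the course's official method: the function name `z_test` and its `alternative` labels are hypothetical. In the usage line, only the null value 1.8 hours comes from the response-time example above; the sample mean, σ, and n are made-up numbers.

```python
import math
from statistics import NormalDist


def z_test(x_bar, mu_0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for a z-test of H0: mu = mu_0 with known sigma."""
    # Observed test statistic: (x_bar - mu_0) / (sigma / sqrt(n))
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":        # Ha: mu > mu_0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "less":         # Ha: mu < mu_0
        p_value = std_normal.cdf(z)
    elif alternative == "two-sided":    # Ha: mu != mu_0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, p_value


# Hypothetical data: x_bar = 1.9 hours, sigma = 0.5, n = 100, tested against 1.8
z, p = z_test(x_bar=1.9, mu_0=1.8, sigma=0.5, n=100, alternative="two-sided")
print(round(z, 2), round(p, 4))   # z is about 2.0, P-value about 0.0455
```

For the same data, the one-sided P-value in the direction of the observed difference is half the two-sided P-value.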
Examples:
Annual Drinking Water Quality Report, 2004, Town of Brookston, IN
“I’m pleased to report that our drinking water is safe and meets federal and state requirements.”
Test Results (MCL is the maximum contaminant level, the highest level of a contaminant that is allowed in drinking water.)
Contaminant            Violation Y/N   Level Detected   Unit measurement   MCL
Beta/photon emitters   N               2.1 ± 3.2        mrem/yr            4
Alpha emitters         N               0 ± 1.6          pCi/l              15
Barium                 N               0.216            ppm                2
Copper                 N               0.039 to …       ppm                1. …
Fluoride               N               0.01             ppm                4
Sodium                 N               0.0              ppm                N/A
One of these violation reports should actually be a “yes” instead of a “no.” Which one is it and why? What hypotheses go along with these confidence intervals?
Note: When I called the town of Brookston office to ask them about this, the water manager called the state EPA office to get more information. What they told him was that, yes, technically I was correct, but that they don’t use the confidence intervals that are reported. Apparently these are the FEDERAL EPA rules. They only use the mean. I tried to get sample size or other information, but I wasn’t able to learn anything more.
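P-values can be more informative than a reject/do not reject $H_0$ decision based on a fixed significance level. The smaller the P-value, the stronger the evidence against $H_0$.
The 0.05 significance level is not one you have to use; it’s just the most common. There’s nothing particularly special about that level.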
In a large sample, even tiny deviations from the null hypothesis can be statistically significant, whether or not they are important in practice.
If we fail to reject $H_0$, it may be because $H_0$ is true or because our sample size is insufficient to detect the alternative.
Plot your data and look at your P-value to determine your conclusions. Could outliers be part of the problem?
A confidence interval actually estimates the size of an effect rather than simply asking if it is too large to reasonably occur by chance alone.
You must have a well-designed experiment in order for statistical inference to work. Randomization is important.
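The cautions above about sample size can be illustrated with a short sketch. All numbers are made up: the null value is 100, σ = 10, and the observed sample mean is held at 100.1 for every n, which is a simplification (a real sample mean would vary from sample to sample). The function name `two_sided_p_value` is hypothetical.

```python
import math
from statistics import NormalDist


def two_sided_p_value(x_bar, mu_0, sigma, n):
    """Two-sided z-test P-value, 2 * P(Z >= |z|), with known sigma."""
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same tiny deviation (0.1 on a scale where sigma = 10), different sample sizes
for n in (10, 100, 10_000, 1_000_000):
    p = two_sided_p_value(x_bar=100.1, mu_0=100.0, sigma=10.0, n=n)
    print(f"n = {n:>9,}: P-value = {p:.4f}")
```

With a small n, the deviation is nowhere near significant, so failing to reject $H_0$ says little. With a very large n, the same deviation gives a P-value near zero even though the effect itself is tiny, which is why a confidence interval for the size of the effect is worth reporting alongside the test.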