Chapter 6 Confidence Intervals and Hypothesis Testing

Why do we even bother analyzing data? We want to draw conclusions from the data. Why can't we just accept our sample mean or sample proportion as the official mean or proportion for the population? Every time we compute the statistics x̄ and p̂ (the sample mean and sample proportion) from a new sample, we get a different answer because of sampling variability.

The two most common types of formal statistical inference are:
• Confidence intervals: used when we want to estimate a population parameter.
• Significance tests: used when we want to assess the evidence the data provide in favor of some claim about the population (a yes/no question about the population).

Confidence intervals let us estimate a range of plausible values for the population mean or population proportion. The true population mean or proportion exists and is a fixed number; we just don't know what it is. Using our sample statistic, we can create a "net" that gives us an estimate of where to expect the population parameter to be.

Confidence interval = net
Population parameter = invisible, stationary butterfly

We don't know exactly where the butterfly is, but from our sample we have a pretty good estimate of its location.

If you increase the sample size (n), you decrease the size of your "net" (your margin of error). If you increase your confidence level (C), you increase the size of your "net" (your margin of error).

If we take a single sample, our single confidence interval "net" may or may not include the population parameter. However, if we take many samples of the same size and create a confidence interval from each sample statistic, then over the long run 95% of those intervals will contain the true population parameter (when we use a 95% confidence level).

Examples:

1.
A questionnaire about drinking habits was given to a random sample of fraternity members, and each student was asked to report the number of beers he had drunk in the past month. The sample of 30 students had an average of 22 beers with a standard deviation of 9 beers.
a) Give a 90% confidence interval for the mean number of beers drunk by fraternity members in the past month.
b) Is it true that 90% of the fraternity members drink a number of beers each month that lies in the interval you found in part (a)? Explain your answer.
c) What is the margin of error for the 90% confidence interval?
d) How many students should you sample if you want a margin of error of 1 for a 90% confidence interval?

2. A sample of 12 STAT 301 students yields the following Exam 1 scores:

78 62 99 85 94 53 88 90 86 92 75 92

Assume that the population standard deviation is 10. The sample mean, calculated with SPSS or a calculator, is 82.83. (Note: Do NOT use any SPSS confidence intervals; they are good only for Chapter 7, not for this type of CI. You must compute these Z confidence intervals by hand.)
a) Find the 90% confidence interval for the mean score µ of STAT 301 students.
b) Find the 95% confidence interval.
c) Find the 99% confidence interval.
d) How do the margins of error in (a), (b), and (c) change as the confidence level increases? Why?

Hypothesis Testing

The 4 steps common to all tests of significance:
1. State the null hypothesis H0 and the alternative hypothesis Ha.
2. Calculate the value of the test statistic.
3. Draw a picture of what Ha looks like, and find the P-value.
4. State your conclusion about the data in a sentence, using the P-value and/or comparing the P-value to a significance level α.

STEP 1: State the null hypothesis H0 and the alternative hypothesis Ha.

To do a significance test, you need 2 hypotheses:
• H0, null hypothesis: the statement being tested, usually phrased as "no effect" or "no difference".
• Ha, alternative hypothesis: the statement we hope or suspect is true instead of H0.

Hypotheses always refer to some population or model, not to a particular outcome. Hypotheses can be one-sided or two-sided.
• One-sided hypothesis: covers just part of the range for your parameter, for example
  H0: µ = 10 vs. Ha: µ > 10, OR
  H0: µ = 10 vs. Ha: µ < 10

Z-Test for a Population Mean

To test the hypothesis H0: µ = µ0 based on an SRS of size n from a population with unknown mean µ and known standard deviation σ, compute the test statistic

$$Z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

The P-value for a test of H0 against:
Ha: µ > µ0 is P(Z ≥ Z0)
Ha: µ < µ0 is P(Z ≤ Z0)
Ha: µ ≠ µ0 is 2·P(Z ≥ |Z0|)

These P-values are exact if the population is normally distributed and are approximately correct for large n in other cases.

Examples:
1. Last year the government claimed that the average income of the American people was $33,950. However, a recent sample of 50 people showed an average income of $34,076; assume the population standard deviation is $324. Is the government's estimate too low? Conduct a significance test to see whether the true mean is more than the reported average. Use α = 0.01.
2. An environmentalist collects a liter of water from 45 different locations along the banks of a stream and measures the amount of dissolved oxygen in each specimen. The mean oxygen level is 4.62 mg, with an overall standard deviation of 0.92 mg. A water purifying company claims that the mean oxygen level in the water is 5 mg. Conduct a hypothesis test with α = 0.001 to determine whether the mean oxygen level is less than 5 mg.

How does α relate to confidence intervals? For a two-sided test, if α and the confidence level C add to 100% (that is, C = 1 − α), you can reject H0 at level α if µ0 (the value you are testing) is not in the level C confidence interval, and you fail to reject H0 if it is.

Example: An agro-economist examines the cellulose content of a variety of alfalfa hay. Suppose that the cellulose content in the population has a standard deviation of 8 mg.
A sample of 15 cuttings has a mean cellulose content of 145 mg.
a) A previous study claimed that the mean cellulose content was 140 mg. Perform a hypothesis test with α = 0.05 to determine whether the mean cellulose content is different from 140 mg.
b) Find a 95% confidence interval for the mean cellulose content.
c) Now redo the test from part (a), this time using the confidence interval from part (b). (The result should be the same.)
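As a rough check on the cellulose example, the sketch below computes the Z statistic, its two-sided P-value, and the 95% Z interval x̄ ± z*·σ/√n using Python's standard-library `statistics.NormalDist`. The helper names `z_test_two_sided` and `z_interval` are hypothetical, chosen only for illustration. This is a way to verify hand calculations, not a replacement for them.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def z_test_two_sided(xbar, mu0, sigma, n):
    """Return (z, two-sided P-value) for H0: mu = mu0 vs Ha: mu != mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p = 2 * (1 - STD_NORMAL.cdf(abs(z)))  # 2 * P(Z >= |z|)
    return z, p


def z_interval(xbar, sigma, n, conf):
    """Return the Z confidence interval xbar +/- z* * sigma / sqrt(n)."""
    z_star = STD_NORMAL.inv_cdf(1 - (1 - conf) / 2)  # upper (1-C)/2 critical value
    margin = z_star * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Cellulose example: sigma = 8 mg, n = 15, xbar = 145 mg, mu0 = 140 mg
alpha = 0.05
z, p = z_test_two_sided(xbar=145, mu0=140, sigma=8, n=15)
lo, hi = z_interval(xbar=145, sigma=8, n=15, conf=1 - alpha)

print(f"(a) z = {z:.2f}, P-value = {p:.4f}, reject H0: {p < alpha}")
print(f"(b) 95% CI: ({lo:.2f}, {hi:.2f})")
print(f"(c) 140 outside the CI, so reject H0: {not (lo <= 140 <= hi)}")
```

With these inputs it should give z ≈ 2.42, a two-sided P-value of about 0.015, and a 95% interval of roughly (140.95, 149.05). Because 140 lies outside the interval and P < 0.05, both approaches reject H0, as the duality between two-sided tests and confidence intervals predicts.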
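The confidence-interval examples (the beer survey and the Exam 1 scores) can be checked the same way. This is a minimal sketch that assumes the standard Z interval x̄ ± z*·σ/√n and the sample-size rule n = (z*·σ/m)², rounded up. For the beer example it also assumes the sample standard deviation of 9 can stand in for σ. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean


def z_star(conf):
    """Critical value z* leaving area conf between -z* and z*."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)


def z_interval(xbar, sigma, n, conf):
    """Return (lower, upper, margin) for xbar +/- z* * sigma / sqrt(n)."""
    margin = z_star(conf) * sigma / sqrt(n)
    return xbar - margin, xbar + margin, margin


def sample_size(sigma, margin, conf):
    """Smallest n whose Z margin of error is at most `margin`."""
    return ceil((z_star(conf) * sigma / margin) ** 2)


# Beer example (assumption: the sample sd of 9 is used as sigma)
beer_lo, beer_hi, beer_m = z_interval(xbar=22, sigma=9, n=30, conf=0.90)
print(f"Beer 90% CI: ({beer_lo:.2f}, {beer_hi:.2f}), margin {beer_m:.2f}")
print("n for a margin of 1:", sample_size(sigma=9, margin=1, conf=0.90))

# Exam 1 example: sigma = 10 is given
scores = [78, 62, 99, 85, 94, 53, 88, 90, 86, 92, 75, 92]
xbar = mean(scores)
exam_margins = []
for conf in (0.90, 0.95, 0.99):
    lo, hi, m = z_interval(xbar, sigma=10, n=len(scores), conf=conf)
    exam_margins.append(m)
    print(f"Exam {conf:.0%} CI: ({lo:.2f}, {hi:.2f}), margin {m:.2f}")
```

For the exam scores, the margins come out to about 4.75, 5.66, and 7.44 at 90%, 95%, and 99%. A higher confidence level C means a larger z*, so the "net" gets wider. For the beer example, a sample of 220 students gives a 90% margin of error of at most 1 beer.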
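The P-value rules in the Z-test box translate directly into code. The sketch below is one way to write them, again using `statistics.NormalDist`. The function name `z_test` and its `alternative` argument are hypothetical. For the oxygen example it assumes the reported standard deviation of 0.92 can be used as σ.

```python
from math import sqrt
from statistics import NormalDist


def z_test(xbar, mu0, sigma, n, alternative):
    """One-sample Z test; alternative is 'greater', 'less', or 'two-sided'."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    phi = NormalDist().cdf  # standard normal CDF
    if alternative == "greater":      # Ha: mu > mu0  ->  P(Z >= z0)
        p = 1 - phi(z)
    elif alternative == "less":       # Ha: mu < mu0  ->  P(Z <= z0)
        p = phi(z)
    elif alternative == "two-sided":  # Ha: mu != mu0 ->  2 * P(Z >= |z0|)
        p = 2 * (1 - phi(abs(z)))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p


# Income example: H0: mu = 33950 vs Ha: mu > 33950, alpha = 0.01
z_inc, p_inc = z_test(34076, 33950, sigma=324, n=50, alternative="greater")
print(f"Income: z = {z_inc:.2f}, P = {p_inc:.4f}, reject H0: {p_inc < 0.01}")

# Oxygen example (assumption: sd of 0.92 used as sigma): Ha: mu < 5, alpha = 0.001
z_oxy, p_oxy = z_test(4.62, 5, sigma=0.92, n=45, alternative="less")
print(f"Oxygen: z = {z_oxy:.2f}, P = {p_oxy:.4f}, reject H0: {p_oxy < 0.001}")
```

Here the income test should give z ≈ 2.75 and P ≈ 0.003, which is below α = 0.01, so we reject H0. The oxygen test gives z ≈ −2.77 and P ≈ 0.003, which is not below α = 0.001, so we fail to reject H0 at that level.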