Chapter 7
Section 7.1: Inference for the Mean of a Population
Section 7.2: Comparing Two Means

Learning goals for this chapter:
• Understand what inference is and why it is needed.
• Know that all inference techniques give us information about the population parameter.
• Explain what a confidence interval is and when it is needed.
• Calculate a confidence interval for the population mean when the population standard deviation is unknown.
• Know the assumptions that must be met for doing inference for the population mean when the population standard deviation is unknown (robustness) for 1-sample mean, matched pairs, and 2-sample comparison of means.
• Know how to write hypotheses, calculate a test statistic and P-value, and write conclusions in terms of the story.
• Draw Normal curve pictures to match the hypothesis test.
• Understand the logic of hypothesis testing and when a hypothesis test is needed.
• Use the confidence interval to perform a two-sided hypothesis test.
• Explain sampling variability and the difference between the population mean and the sample mean.
• Explain the difference between the population standard deviation and the sample standard deviation.
• Know which technique is most appropriate for a story: confidence interval, hypothesis test, or simple summary statistics.
• Know which inference technique is most appropriate for a story: 1-sample mean using Z, 1-sample mean using t, matched pairs, or 2-sample comparison of means.
• Interpret Normal quantile plots and histograms to determine whether the t procedures are appropriate.
• Know how to do all calculations (listed above) by hand with the t table and using SPSS.

In Chapter 6, we knew the population standard deviation σ.
• Confidence interval for the population mean μ: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
• Hypothesis test statistic for the population mean μ: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
• Used the distribution $\bar{x} \sim N(\mu, \sigma / \sqrt{n})$.

In Chapter 7, we don’t know the population standard deviation σ.
• Use the sample standard deviation (s) in its place.
• Confidence interval for the population mean μ: $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$
• Hypothesis test statistic for the population mean μ: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
• The t distribution uses n - 1 degrees of freedom.
• Sometimes you’ll see the symbol for standard error: $\hat{\sigma}_{\bar{x}} = \frac{s}{\sqrt{n}}$

Using the t-distribution:
• Suppose that an SRS of size n is drawn from a N(μ, σ) population.
• There is a different t distribution for each sample size, so t(k) stands for the t distribution with k degrees of freedom.
• Degrees of freedom = k = n - 1 = sample size - 1.
• As k increases, the t distribution looks more like the normal distribution (because as n increases, s → σ).
• t(k) distributions are symmetric about 0 and bell shaped; they are just a bit wider than the normal distribution.
• The table shows upper tails only, so
  o if t* is negative, P(t < t*) = P(t > |t*|).
  o if you have a 2-sided test, multiply P(t > |t*|) by 2 to get the area in both tails.
  o the Normal table showed lower tails only, so the t-table is backwards.

Finding t* on the table: Start at the bottom line to get the right column for your confidence level, and then work up to the correct row for your degrees of freedom.

What happens if your degrees of freedom isn’t on the table, for example df = 79? Always round DOWN to the next lowest degrees of freedom to be conservative.

Using this SPSS output, what would your t-curve with shaded P-value look like if you had hypotheses of:

H0: μ = 105, Ha: μ ≠ 105
H0: μ = 105, Ha: μ > 105
H0: μ = 105, Ha: μ < 105

You must choose your hypotheses BEFORE you examine the data. When in doubt, do a two-sided test.

One-Sample Test (Test Value = 105)

                                                                      90% Confidence Interval
                                                                         of the Difference
                          t      df   Sig. (2-tailed)  Mean Difference   Lower     Upper
radon detector readings   -.319  11   .755             -.8667            -5.739    4.005

How do you know when it is appropriate to use the t procedures?
• Very important! Always look at your data first. Histograms and Normal quantile plots (pgs. 80-83 in your book) will help you see the general shape of your data.
• t procedures are quite robust against non-normality of the population except in the case of outliers or strong skewness.
• Larger samples (n) improve the accuracy of the t distribution.

Some guidelines for inference on a single mean:
• n < 15: Use t procedures if the data are close to normal. If the data are nonnormal or if outliers are present, do not use t.
• 15 ≤ n < 40: Use t procedures except in the presence of outliers or strong skewness.
• n ≥ 40: Use t procedures even if the data are skewed.

Normal quantile plots: In SPSS, go to Graphs → Q-Q. Move your variable into the “variable” column and hit “OK.”

[Figure: Normal Q-Q Plot of Radon Detector Reading. Horizontal axis: Observed Value; vertical axis: Expected Normal Value.]

Look to see how closely the data points (dots) follow the diagonal line. The line will always be a 45-degree line. Only the data points will change. The closer they follow the line, the more normally distributed the data are.

What happens if the t procedure is not appropriate? What if you have outliers or skewness with a smaller sample size (n < 40)?

Outliers: Investigate the cause of the outlier(s).
  o Was the data recorded correctly? Is there any reason why that data might be invalid (an equipment malfunction, a person lying in their response, etc.)? If there is a good reason why that point could be disregarded, try taking it out and compare the new confidence interval or hypothesis test results to the old ones.
  o If you don’t have a valid reason for disregarding the outlier, you have to leave the outlier in and not use the t procedures.

Skewness:
  o If the skewness is not too extreme, the t procedures are still appropriate if the sample size is at least 15.
  o If the skewness is extreme or if the sample size is less than 15, you can use nonparametric procedures. One type of nonparametric test is similar to the t procedures except that it uses the median instead of the mean. Another possibility would be to transform the data, possibly using logarithms. A statistician should be consulted if you have data that don’t meet the requirements of the t procedures. We won’t cover nonparametric procedures or transformations for non-normal data in this course, but your book has supplementary chapters (14 and 15) on these topics online if you need them later in your own research. They are also discussed on pages 465-470 of your book.

Enter the pre- and post-training scores into SPSS. Then Analyze → Compare Means → Paired-Samples T-test. Then input both variable names and hit the arrow key. If you need to change the confidence level, go to “Options.” SPSS always computes the difference as (left column of data) - (right column of data). If this bothers you, just be careful how you enter the data into the program.

Data entered as written above with pre-training in left column and post-training in right column:

Paired Samples Test

                                      Paired Differences
                                                                   95% Confidence Interval
                                                                      of the Difference
                                      Mean      Std. Deviation  Std. Error Mean  Lower     Upper    t       df  Sig. (2-tailed)
Pair 1  pretraining - posttraining    -1.05125  1.47417         .52120           -2.28369  .18119   -2.017  7   .084

Data entered backwards from how it is written above with post-training in left column and pre-training in right column:

Paired Samples Statistics

                               Mean    N  Std. Deviation  Std. Error Mean
Pair 1  Post-training score    6.3212  8  1.82086         .64377
        Pre-training score     5.2700  8  2.01808         .71350

Paired Samples Test

                                                    Paired Differences
                                                                                 95% Confidence Interval
                                                                                    of the Difference
                                                    Mean     Std. Deviation  Std. Error Mean  Lower    Upper    t      df  Sig. (2-tailed)
Pair 1  Post-training score - Pre-training score    1.05125  1.47417         .52120           -.18119  2.28369  2.017  7   .084

What’s different? What’s the same? Which one matches the way that you defined diff?

2. 2-Sample Comparison of Means (covered in 7.2)
• A group of individuals is divided into 2 different experimental groups.
• No one unit can be in both groups.
• Each individual receives only one treatment and/or is measured only once.
• Responses from each sample are independent of each other.
• Examples: treatment vs. control groups, male vs. female, 2 groups of different women.

Goal: To do a hypothesis test based on
H0: μA = μB (same as H0: μA - μB = 0)
Ha: μA > μB or Ha: μA < μB or Ha: μA ≠ μB (pick one)

2-Sample t Test Statistic is used for hypothesis testing when the standard deviations are ESTIMATED from the data (these are approximately t distributions, but not exact):

$t = \frac{(\bar{x}_A - \bar{x}_B) - 0}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}$ ~ t distribution with df = min(n_A - 1, n_B - 1)

Confidence Interval for μA - μB:

$(\bar{x}_A - \bar{x}_B) \pm t^* \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$ where t* ~ t distribution with df = min(n_A - 1, n_B - 1)

***Equal sample sizes are recommended, but not required.

Use the same guidelines for determining whether the t procedures are appropriate that you used for 1-sample mean and matched pairs, but use n = n_A + n_B for the sample size.

Example of 2-Sample Comparison of Means: A group of 15 college seniors is selected to participate in a manual dexterity skill test against a group of 20 industrial workers. Skills are assessed by scores obtained on a test taken by both groups. Conduct a hypothesis test to determine whether the industrial workers had significantly better average manual dexterity skills than the students. Descriptive statistics are listed below. Also construct a 95% confidence interval for this problem.

group      n    x̄      s
students   15   35.12  4.31
workers    20   37.32  3.83

Example of 2-Sample Comparison of Means (Exercise 7.84): The SSHA is a psychological test designed to measure the motivation, study habits, and attitudes towards learning of college students.
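As an aside before the SPSS steps, the hand calculations for the matched pairs output and the manual dexterity example can be checked with a short script. This is only a sketch using the summary statistics given in these notes: the function names are made up for illustration, the choice of A = workers and B = students is an assumption about how the difference is defined, and the t* values (2.365 for df = 7 and 2.145 for df = 14, both at 95% confidence) are assumed to be read from a standard t table rather than computed.

```python
from math import sqrt


def one_sample_t(mean, sd, n, mu0=0.0):
    """Return (t, df, standard error) for H0: mu = mu0.

    For matched pairs, pass the mean and standard deviation of the differences.
    """
    se = sd / sqrt(n)
    return (mean - mu0) / se, n - 1, se


def two_sample_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df, standard error) for H0: mu_A - mu_B = 0.

    Uses the conservative df = min(n_A - 1, n_B - 1) from these notes.
    """
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    df = min(n_a - 1, n_b - 1)
    return (mean_a - mean_b) / se, df, se


def t_interval(estimate, se, t_star):
    """Return (lower, upper) for estimate +/- t* x standard error."""
    margin = t_star * se
    return estimate - margin, estimate + margin


# Matched pairs: differences defined as pretraining - posttraining (n = 8 pairs).
paired_t, paired_df, paired_se = one_sample_t(-1.05125, 1.47417, 8)
paired_ci = t_interval(-1.05125, paired_se, 2.365)  # t* assumed from table, df = 7

# Manual dexterity: A = workers, B = students (assumed order of the difference).
dex_t, dex_df, dex_se = two_sample_t(37.32, 3.83, 20, 35.12, 4.31, 15)
dex_ci = t_interval(37.32 - 35.12, dex_se, 2.145)  # t* assumed from table, df = 14

print("paired:", round(paired_t, 3), paired_df, [round(v, 3) for v in paired_ci])
print("2-sample:", round(dex_t, 3), dex_df, [round(v, 3) for v in dex_ci])
```

The matched pairs results should agree with the SPSS output above (t = -2.017, df = 7, interval roughly -2.28 to 0.18), up to rounding of t*. The script stops at the test statistic; the P-value still comes from the t table or from SPSS.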
These factors, along with ability, are important in explaining success in school. A selective private college gives the SSHA to an SRS of both male and female first-year students. The data for the women are as follows:

154 109 137 115 152 140 154 178 101
103 126 126 137 165 165 129 200 148

Here are the scores for the men:

108 140 114  91 180 115 126  92 169 146
109 132  75  88 113 151  70 115 187 104

a) Test whether the population mean SSHA score for men is different from the population mean score for women. State your hypotheses, carry out the test using SPSS, obtain a P-value, and give your conclusions.

When you enter your data into SPSS, have 2 variables: gender (type: string) and score (numeric). In the gender column, state whether a score is from a man or a woman, and in the score column, enter all 38 scores. Analyze → Compare Means → Independent-Samples T Test. Move score into “Test Variable(s)” box. Move gender into “Grouping