Statistical Inference with t-Distributions: Confidence Intervals and Hypothesis Tests (Study notes, Data Analysis & Statistical Methods)

An overview of statistical inference using t-distributions, including the calculation of confidence intervals and hypothesis tests for population means. Covers the use of the t-distribution when the population standard deviation is unknown, and the concept of degrees of freedom. Examples are given for calculating confidence intervals and hypothesis tests for the population mean yield of tomatoes and vitamin C content loss in wheat soy blend. The document also discusses the appropriateness of using t-procedures for different sample sizes and in the presence of outliers or skewness.

Typology: Study notes. Uploaded on 07/30/2009 by koofers-user-ljp.
Chapter 7
Section 7.1: Inference for the Mean of a Population
Section 7.2: Comparing Two Means

In Chapter 6, we knew the population standard deviation σ.
• Confidence interval for the population mean μ: x̄ ± z*·(σ/√n)
• Hypothesis test statistic for the population mean μ: z = (x̄ − μ₀) / (σ/√n)
• Used the distribution x̄ ~ N(μ, σ/√n).

In Chapter 7, we don't know the population standard deviation σ.
• Use the sample standard deviation s.
• Confidence interval for the population mean μ: x̄ ± t*·(s/√n)
• Hypothesis test statistic for the population mean μ: t = (x̄ − μ₀) / (s/√n)
• (x̄ − μ) / (s/√n) ~ t(n − 1).
• Sometimes you'll see the symbol for the standard error: σ̂_x̄ = s/√n.

Using the t-distribution:
• Suppose that an SRS of size n is drawn from a N(μ, σ) population.
• There is a different t distribution for each sample size, so t(k) stands for the t distribution with k degrees of freedom.
• Degrees of freedom = k = n − 1 = sample size − 1.
• As k increases, the t distribution looks more like the normal distribution (because as n increases, s → σ). The t(k) distributions are symmetric about 0 and bell-shaped; they are just a bit wider than the normal distribution.
• The table shows upper tails only, so:
  o If t* is negative, P(t < t*) = P(t > |t*|).
  o If you have a 2-sided test, multiply P(t > |t*|) by 2 to get the area in both tails.
  o The normal table showed lower tails only, so the t-table is backwards by comparison.

Finding t* on the table: Start at the bottom line to get the right column for your confidence level, and then work up to the correct row for your degrees of freedom.

What happens if your degrees of freedom isn't on the table, for example df = 79? Always round DOWN to the next lowest degrees of freedom on the table to be conservative.
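The one-sample formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the data and the null value μ₀ = 10 are made up, and t* = 2.365 for df = 7 at 95% confidence is read from Table D.

```python
import math

# Hypothetical sample of n = 8 measurements (not from the notes)
x = [9.2, 10.1, 9.8, 10.5, 9.9, 10.3, 9.6, 10.0]
n = len(x)
xbar = sum(x) / n
s = math.sqrt(sum((v - xbar) ** 2 for v in x) / (n - 1))  # sample std dev

# 95% confidence interval: xbar +/- t* * s / sqrt(n), with df = n - 1 = 7
t_star = 2.365              # from Table D, df = 7, confidence level 95%
se = s / math.sqrt(n)       # standard error s / sqrt(n)
ci = (xbar - t_star * se, xbar + t_star * se)

# Test statistic for H0: mu = 10
t_stat = (xbar - 10) / se
```

Because the null value 10 falls inside the 95% confidence interval here, a two-sided test at the 5% level would fail to reject H0 — the duality between intervals and tests that the notes rely on.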
Using this SPSS output, what would your t-curve with shaded P-value look like if you had hypotheses of:

One-Sample Test (Test Value = 105)
                           t     df   Sig. (2-tailed)   Mean Difference   90% CI Lower   90% CI Upper
radon detector readings  -.319   11          .755             -.8667          -5.739          4.005

  H0: μ = 105 vs. Ha: μ ≠ 105
  H0: μ = 105 vs. Ha: μ < 105
  H0: μ = 105 vs. Ha: μ > 105

You must choose your hypotheses BEFORE you examine the data. When in doubt, do a two-sided test.

How do you know when it is appropriate to use the t procedures? Very important! Always look at your data first. Histograms and normal quantile plots (pgs. 80-83 in your book) will help you see the general shape of your data.
• The t procedures are quite robust against non-normality of the population except in the case of outliers or strong skewness. Larger samples (n) improve the accuracy of the t distribution.

Some guidelines for inference on a single mean:
• n < 15: Use t procedures if the data are close to normal. If the data are nonnormal or if outliers are present, do not use t.
• 15 ≤ n ≤ 40: Use t procedures except in the presence of outliers or strong skewness.
• n ≥ 40: Use t procedures even if the data are skewed.

Normal quantile plots: In SPSS, go to Graphs → Q-Q. Move your variable into the "variable" column and hit "OK."

[Figure: Normal Q-Q Plot of Radon Detector Reading (Observed Value vs. Expected Normal Value)]

Look to see how closely the data points (dots) follow the diagonal line. The line will always be a 45-degree line; only the data points will change. The closer they follow the line, the more normally distributed the data is.

What happens if the t procedure is not appropriate? What if you have outliers or skewness with a smaller sample size (n < 40)?
• Outliers: Investigate the cause of the outlier(s).
  o Was the data recorded correctly? Is there any reason why that data might be invalid (an equipment malfunction, a person lying in their response, etc.)?
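As a sanity check on how the pieces of an SPSS one-sample output fit together, the sketch below (not from the notes) recovers the standard error implied by the reported t statistic and mean difference, then reconstructs the 90% margin of error; the value t* = 1.796 for df = 11 is an assumption read from Table D.

```python
import math

# Values reported in the SPSS one-sample output (n = df + 1 = 12)
n = 12
mean_diff = -0.8667   # xbar - 105
t_stat = -0.319       # reported t

# Since t = mean_diff / (s / sqrt(n)), the standard error is recoverable:
se = mean_diff / t_stat        # s / sqrt(n)
s = se * math.sqrt(n)          # implied sample standard deviation

# Margin for the 90% interval: t*(df = 11) * se, with t* = 1.796 from Table D
margin = 1.796 * se
```

The recovered margin (about 4.88) matches the half-width of the reported interval (−5.739 to 4.005) up to the rounding in the printed output.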
  o If there is a good reason why that point could be disregarded, try taking it out and compare the new confidence interval or hypothesis test results to the old ones.
  o If you don't have a valid reason for disregarding the outlier, you have to leave the outlier in and not use the t procedures.
• Skewness:
  o If the skewness is not too extreme, the t procedures are still appropriate if the sample size is bigger than 15.
  o If the skewness is extreme or if the sample size is less than 15, you can use nonparametric procedures. One type of nonparametric test is similar to the t procedures except it uses the median instead of the mean. Another possibility would be to transform the data, possibly using logarithms.

A statistician should be consulted if you have data which doesn't fit the t procedure requirements. We won't cover nonparametric procedures or transformations for non-normal data in this course, but your book has supplementary chapters (14 and 15) on these topics online if you need them later in your own research. They are also discussed on pages 465-470 of your book.

Example of Matched Pairs: In an effort to determine whether sensitivity training for nurses would improve the quality of nursing provided at an area hospital, the following study was conducted. Eight nurses were selected and their nursing skills were given a score from 1-10. After this initial screening, a training program was administered, and then the same nurses were rated again. Below is a table of their pre- and post-training scores. Conduct a test to determine whether the training could, on average, improve the quality of nursing provided in the population.

Individual   Pre-training score   Post-training score
    1              2.56                  4.54
    2              3.22                  5.33
    3              3.45                  4.32
    4              5.55                  7.45
    5              5.63                  7.00
    6              7.89                  9.80
    7              7.66                  5.33
    8              6.20                  6.80

a. What are your hypotheses?
b. What is the test statistic?
c. What is the P-value?
d. What is your conclusion in terms of the story?
e.
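The matched-pairs calculation for this example can be checked by hand with a short Python sketch using only the standard library (this is an illustration added to the notes, with the differences defined as post − pre so that positive values mean improvement):

```python
import math

pre  = [2.56, 3.22, 3.45, 5.55, 5.63, 7.89, 7.66, 6.20]
post = [4.54, 5.33, 4.32, 7.45, 7.00, 9.80, 5.33, 6.80]

# Differences, defined as post - pre (positive = improvement)
d = [b - a for a, b in zip(pre, post)]
n = len(d)
dbar = sum(d) / n
s_d = math.sqrt(sum((v - dbar) ** 2 for v in d) / (n - 1))

# One-sample t on the differences: H0: mu_diff = 0 vs. Ha: mu_diff > 0
t_stat = dbar / (s_d / math.sqrt(n))   # dbar ~ 1.051, t ~ 2.017, df = 7
```

With df = 7, comparing t ≈ 2.017 against Table D gives a one-sided P-value between 0.025 and 0.05 (the SPSS two-tailed value is .084, so one-tailed is .042).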
What is the 95% confidence interval of the population mean difference in nursing scores?

Enter the pre- and post-training scores into SPSS. Then Analyze → Compare Means → Paired-Samples T-test. Then input both variable names and hit the arrow key. If you need to change the confidence interval, go to "Options."

SPSS will always compute (left column of data) − (right column of data) for the order of the difference. If this bothers you, just be careful how you enter the data into the program.

Paired Samples Statistics
                       Mean    N   Std. Deviation   Std. Error Mean
Post-training score   6.3212   8        1.82086           .64377
Pre-training score    5.2700   8        2.01808           .71350

Data entered as written above, with pre-training in the left column and post-training in the right column:

Paired Samples Test: pretraining − posttraining
    Mean    Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper       t    df   Sig. (2-tailed)
-1.05125         1.47417            .52120        -2.28369         .18119   -2.017     7            .084

Data entered backwards, with post-training in the left column and pre-training in the right column:

Paired Samples Test: Post-training score − Pre-training score
   Mean    Std. Deviation   Std. Error Mean   95% CI Lower   95% CI Upper      t    df   Sig. (2-tailed)
1.05125         1.47417            .52120         -.18119        2.28369   2.017     7            .084

What's different? What's the same? Which one matches the way that you defined μdiff?

2. 2-Sample Comparison of Means (covered in 7.2)
• A group of individuals is divided into 2 different experimental groups.
• Each group has different individuals who may receive different treatments.
• Responses from each sample are independent of each other.
• Examples: treatment vs. control groups, male vs.
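The 95% confidence interval reported by SPSS can be reproduced from the paired-difference summary with a short sketch (added for illustration; t* = 2.365 for df = 7 is an assumption read from Table D, slightly rounder than the value SPSS uses internally):

```python
import math

# Paired-difference summary from the nurse example (post - pre)
n = 8
dbar = 1.05125
s_d = 1.47417
t_star = 2.365          # t*(df = 7), 95% confidence, from Table D

se = s_d / math.sqrt(n)
lower = dbar - t_star * se
upper = dbar + t_star * se
```

Since the interval (about −0.18 to 2.28) contains 0, a two-sided test at the 5% level fails to reject H0: μdiff = 0, consistent with the two-tailed Sig. of .084.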
female, or 2 groups of different women.
• Goal: To do a hypothesis test based on
    H0: μA = μB  (same as H0: μA − μB = 0)
    Ha: μA > μB  or  Ha: μA < μB  or  Ha: μA ≠ μB  (pick one)
• The 2-sample t test statistic is used for hypothesis testing when the standard deviations are ESTIMATED from the data (these are approximately t distributions, but not exact):

    t = (x̄A − x̄B − 0) / √(sA²/nA + sB²/nB)  ~  t distribution with df = min(nA − 1, nB − 1)

• Confidence interval for μA − μB:

    (x̄A − x̄B) ± t*·√(sA²/nA + sB²/nB),  where t* is from the t distribution with df = min(nA − 1, nB − 1)

***Equal sample sizes are recommended, but not required.

To summarize Chapters 6 and 7:

Z vs. t? Z if you know the population standard deviation; t if you only know the sample standard deviation.

Matched pairs vs. 2-sample comparison of means? Matched pairs if all subjects are in one group and receive both treatments. Two-sample comparison of means if you have 2 distinct groups of subjects.

Other notes:
• If you have a small sample, be careful! If you don't have enough observations to do boxplots or normal quantile plots, you might have trouble looking for outliers, too. If the effect is large, you can probably still see it, but you might miss small effects.
• For the 2-sample degrees of freedom in a t-test, we are taking df = min(nA − 1, nB − 1). There is also a software approximation for the degrees of freedom that does a better job than the "minimum" way.
• The pooled 2-sample t procedures won't be covered now, but if you need to compare more than 2 groups, you would need them (Chapter 12).

[Table D (t distribution critical values) excerpt omitted: for upper-tail probability p and confidence level C, the table entry is the critical value t* with probability p lying to its right and probability C lying between −t* and t*.]
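The 2-sample test statistic and the conservative "minimum" degrees of freedom can be sketched as follows. This is an added illustration: the two samples are hypothetical, not data from the notes.

```python
import math

# Hypothetical data for two independent groups (not from the notes)
a = [21.3, 24.1, 19.8, 22.5, 23.0, 20.7]   # group A responses
b = [18.9, 20.2, 17.5, 19.8, 18.1]         # group B responses

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    m = mean(xs)
    return sum((v - m) ** 2 for v in xs) / (len(xs) - 1)

na, nb = len(a), len(b)
se = math.sqrt(sample_var(a) / na + sample_var(b) / nb)

# 2-sample t statistic for H0: mu_A = mu_B
t_stat = (mean(a) - mean(b)) / se

# Conservative degrees of freedom, as in the notes
df = min(na - 1, nb - 1)
```

With df = min(5, 4) = 4, the computed t is compared against the t(4) row of Table D; statistical software would instead use the (larger) Welch approximation for df, which is less conservative.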